>Kevin McKernan, Applied Biosystems – “The whole Methylome: Sequencing a Bisulfite Converted Genome using SOLiD”

>Background on methylation. It’s not rare, but it is clustered. This is begging for enrichment. You can use Restriction Enzymes. Uses Mate Pairs to set this up. People can also use MeDIP and a new third method: methyl binding protein from Invitrogen. (Seems to be more sensitive.)

MeDIP doesn’t grab CpG, tho… just leaving single stranded DNA, which is a pain for making libraries. Using only 90ng. There is a slight bias on adaptors, tho. Not yet optimized. If they’re bisulfite converting, it has issues (protecting adaptors, requires densely methylated DNA, etc). They get poor alignment because methylation areas tend to be repetitive. Stay tuned, though, for more developments.

MethylMiner workflow: Shear genomic DNA, and put adaptors on it, and then biotin bind methyls? You can titrate methyl fractions off the solid support, so you can then sequence and know how many methyls you’ll have. Thus, mapping becomes much easier, and sensitivity is better.

When you start getting up to 10-methyls in a 50mer, bisulfite treating + mapping is a problem. It’s also worth mentioning that methylation is not binary when you have a large cell population.

The MethylMiner system was tested on A. thaliana, SOLiD fragments generated… good results obtained, and salt titration seems to have worked well, and mapping reads show that you get the right number (approx.) of methyl Cs. – but mapping is easy, since you don’t need to bisulfite.

Showed examples where some of genes are missed by MeDIP, but found by MethylMiner.

(Interesting note, even though they only have 3 bases after conversion (generally), it’s still 4 colour.)
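A minimal sketch of why a three-base read can still show all four colours, assuming the usual SOLiD-style di-base scheme in which each colour encodes the transition between two adjacent bases. The function name and the example read are made up for illustration and were not shown in the talk.

```python
# Two-base (colour space) encoding sketch: each colour describes the
# transition between two adjacent bases, not a single base.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}

def to_colourspace(seq):
    """Return the list of colours (0-3), one per adjacent base pair."""
    # XOR of the 2-bit base codes reproduces the di-base colour table:
    # same base -> 0, A<->C or G<->T -> 1, A<->G or C<->T -> 2,
    # A<->T or C<->G -> 3.
    return [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]

# Hypothetical fully converted read: every C became T, so only A, G, T remain.
converted = "AAGTTAGAT"
colours = to_colourspace(converted)
print(colours, sorted(set(colours)))
```

Because the colours describe pairs of bases, losing one base from the alphabet removes some transitions but not any of the four colours.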

Do you still get the same number of reads on both strands? Yes…

Apparently methylation is easier to align in colourspace. [Not sure I caught why.] Doing 50mers with 2MM. (Seems to keep % align-able in colourspace, but bisulfite treated base space libraries can only be aligned about 2/3rds as well.)

When bisulfite converted, 5mCTP will appear as a SNP in alignment. To approach that, you can do fractionation in MethylMiner kit, which gives you a more rational approach to alignments.
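To make the "appears as a SNP" point concrete, here is a small in-silico sketch. It assumes the common trick of aligning against a fully C-to-T converted reference, which may or may not be what the speaker's pipeline does. The sequence and the methylated positions are invented.

```python
def bisulfite_convert(seq, methylated=()):
    """Convert every C to T except at the given 0-based methylated positions."""
    keep = set(methylated)
    return "".join(
        "T" if base == "C" and i not in keep else base
        for i, base in enumerate(seq)
    )

def mismatches(read, ref):
    """Positions where an equal-length read and reference differ."""
    return [i for i, (r, g) in enumerate(zip(read, ref)) if r != g]

genome = "ACGTCCGATCGA"                                # made-up reference fragment
read = bisulfite_convert(genome, methylated=[1, 9])    # two protected (methyl) Cs
converted_ref = bisulfite_convert(genome)              # fully converted reference
print(read, converted_ref, mismatches(read, converted_ref))
```

Every protected C shows up as a C/T mismatch against the converted reference, so a densely methylated 50mer quickly runs past a two-mismatch alignment budget.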

You can also make a LMP library, and then treat with 5mCTP when extending, so you get two tags, then separate tags – (they keep a barcode) and then pass over methylminer kit… etc etc… barcoded mapping to detect methyl C’s better.

Also have a method in which you do something the same way, but ligate hairpins on the ends… then put on adaptors, and then sequence the ends, to get mirror imaged mate pairs. (Stay tuned for this too.)

There are many tools to do Methylation mapping: colourspace, lab kits and techniques.

>Stephen Kingsmore, National Centre for Genome Resources – “Digital Gene Expression (DGE) and Measurement of Alternative Splice Isoforms, eQTLs and cSN

>[Starts with apologizing for chewing out a guy from Duke… I have no idea what the back story is on that.]

Developed their own pipeline, with a web interface called Alpheus, which is remotely accessible. They have an Ag biotech focus, which is their niche. Would like to get into personal genome sequencing.

Application 1: Schizophrenia DGE.
pipeline: ends with ANOVA analysis. Alignment to several references: transcripts and genome. 7% span exon junctions. mRNA-Seq Coverage. Read count gene based expression analysis is as good as or better than arrays or similar tech. Using principal component analysis. Using mRNA-Seq, you can clearly separate their controls and cases, which they couldn’t do with Arrays. It improves diagnosis component of Variance.
Showing “Volcano Plots”.
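For anyone who hasn't met a volcano plot: each gene becomes one point, with log2 fold change on the x axis and -log10 of the p-value on the y axis. A rough sketch of computing those coordinates follows. The gene names, means and p-values are invented, and the p-values would come from whatever test the pipeline runs (ANOVA in their case).

```python
import math

def volcano_point(mean_case, mean_control, p_value):
    """Return (log2 fold change, -log10 p) for one gene."""
    return math.log2(mean_case / mean_control), -math.log10(p_value)

# Invented read-count means and p-values, purely for illustration.
genes = {"geneA": (400.0, 100.0, 0.001), "geneB": (90.0, 100.0, 0.5)}
for name, (case, ctrl, p) in genes.items():
    lfc, neglogp = volcano_point(case, ctrl, p)
    print(f"{name}: log2FC={lfc:.2f}, -log10(p)={neglogp:.2f}")
```

Genes that are both strongly changed and statistically convincing end up in the upper left and upper right corners of the plot.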

Many of genes found for schizophrenia converged on a single pathway, known to be involved in neurochemistry.

Have a visualization tool, and showed that you can see junctions and retained introns, and then wanted to do it more high throughput. Started a collaboration to focus on junctions, to quantify alternative transcript isoforms. Working on first map of splicing of transcriptome in human tissues. 94% of human genes have multiple exons. Every one had alternative splicing in at least one of the tissues examined.

92% have biochemically relevant splicing. (minimum 15%?)

8 types of alternative splicing… 63% of alternative splicing is tissue regulated. 30% of splicing occurs between individuals. (So tissue splicing trumps individuals)

[Brief discussion of 454 based experiment… similar results, I think.]

1.cost effective,
3.biologically relevant
4.identified stuff missed by genome sequencing

Finally, also compared genotypes from individuals looking at cSNPs. Cis-acting SNPs causing allelic imbalance. Used it to find eSNPs (171 found). Finally, you can also Fine Map eQTN within eQTL.

>Jesse Gray, Harvard Medical School – “Neuronal Activity-Induced Changes in Gene Expression as Detected by ChIP-Seq and RNA-Seq”

>Now “widespread overlapping sense/antisense transcription surrounding mRNA transcriptional start sites.”

Thousands of promoters exhibit divergent transcriptional initiation. Annotated TSS come from NCBI. There are 25,000 genes. There is an additional anti-sense TSS (TSSa) 200 bp upstream. [Nifty, I hadn’t heard about that.]

Do RNA-Seq and ChIP-Seq. Using SOLiD. SOLiD or Ambion [not sure which] plans to sell the method as a kit for WTSS/WT-Seq.

Using RNA Pol II ChIP-Seq.

Anti-sense transcription peaks ~400 bases upstream of TSS. When looking at the genome browser, you see overlapping TSS-associated transcription. (You see negative strand reads in the other direction, upstream from the TSS, and forward strand reads at the TSS, with a small overlap in the middle.)

It is a small amount of RNA being produced.

Did a binomial statistical test, fit to 4 models:
1.sense only initiation
2.divergent initiation (overlap)
3.anti-sense only initiation
4.divergent (no overlap)

The vast majority are TSSs with divergent overlap, 380 with divergent (no overlap), 900 sense only, 140 anti-sense only. Many other sites were discarded because it was unclear what was happening. This is apparently a wide-spread phenomenon.
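The actual model fitting wasn't shown in any detail, so the following is only a guess at the flavour of such a test: a one-sided binomial test per strand, asking whether a strand carries more reads than a background rate would predict. The background rate, the significance cutoff and the function names are all assumptions, and the overlap/no-overlap distinction is not modelled here.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def classify_tss(sense, antisense, background=0.05, alpha=0.01):
    """Label a TSS by which strands carry more reads than background predicts."""
    n = sense + antisense
    if n == 0:
        return "no initiation"
    has_sense = binom_tail(sense, n, background) < alpha
    has_anti = binom_tail(antisense, n, background) < alpha
    if has_sense and has_anti:
        return "divergent"
    if has_sense:
        return "sense only"
    if has_anti:
        return "anti-sense only"
    return "unclear"

for counts in [(40, 0), (30, 25), (0, 20), (1, 1)]:
    print(counts, classify_tss(*counts))
```

Sites with too few reads fall into "unclear", which is consistent with the remark that many sites were discarded.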

Might this be important? Went back to ChIP-Seq to classify the peaks into these categories from RNA Pol II expt. (Same categories.) Is this a meaningful way to classify sites, and what does it tell us?

How many of those peaks have a solid PhastCons score, which should tell something about the read. No initiation has the lowest scores… the ones with the antisense models have the highest conservation at the location of antisense initiation.

Where do the peaks fall, when they have anti-sense? Anti-sense are bimodal, sense only and bi-directional are just before the TSS, and non-bi-modal.

Tentatively, yes, it seems like this anti-sense is functionally important.

Does TSSo change efficiency of initiation?

Break into two categories. Non-overlap TSSs, and overlap TSSs. It appears that overlap TSSs produce more than twice the RNA of non-overlap. This could be a bias… could be selecting for highly expressed genes. Plot the RNA Pol II occupancy at the start sites, and there is a big difference at the overlapping TSS. Non-overlap has higher occupancy at non-overlap, but lower up or down stream than overlap. Thus the transition to elongation may be less efficient.

Does TSSo change efficiency of initiation? Tentatively, yes.

Comment from audience: this was discovered a year ago in a paper by Kaplan (Kaparov?). Apparently it was lately described that these are cleaved into 31nt capped reads. Thus, the fate of the small RNA should be of interest. 50% of genes had this phenomenon.

Question from audience: what aligner was used, and how were repetitive sections handled? Only uniquely mapping reads, using the SOLiD pipeline. (Audience member thinks that you can’t do this analysis with that data set.) Apparently, someone else claims it doesn’t matter.

My Comment: This is pretty cool. I wasn’t aware of the anti-sense transcription in the reverse direction from the TSS. It will be interesting to see where this goes.

>Terrence Furey, Duke University – “A Genome-Wide Open Chromatin Map in Human Cell Types in the ENCODE Project”

>2003: initial focus on 1% of genome. Where are all the DNA elements.
2007: Scale up from 1% to 100%

Where are all of the regulatory elements in the genome: a parts list of all functional elements.

We now know: 53% unique, 45% repetitive, 2% are genes. Somehow, the 98% controls the other 2%.

Focussed on regions of open chromatin. Open chromatin is not bound to nucleosomes.

5.locus control regions
6.meiotic recombination hotspots.

Use two assays: DNase hyper-sensitivity. Used at single site in the past, now used for high throughput genome wide assays. The second method is FAIRE: formaldehyde assisted identification of regulatory elements. It’s a ChIP-Seq. [I don’t know why they call it FAIRE… it’s exactly a ChIP experiment – I must be missing something.]

Also explaining what ChIP-Seq/ChIP-chip is. They now do ChIP-Seq. Align sequences with MAQ. Filter on number of aligned locations. (keep up to 4 alignments). Use F-Seq. Then call peaks with a threshold. Use a continuous value signal.

The program is F-Seq, created by Alan Boyle. Outputs in Bed and Wig format. Also deals with alignability “ploidy”. (Boyle et al, Bioinformatics 2008). They use Mappability to calculate smoothing.
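Since F-Seq is described as producing a continuous signal that then gets thresholded, here is a toy version of that idea using a kernel density estimate. This is not F-Seq's code: the Gaussian kernel, the fixed bandwidth, the threshold and the read positions are all assumptions, and the mappability correction is left out.

```python
import math

def density(read_starts, length, bandwidth=2.0):
    """Gaussian kernel density over positions 0..length-1."""
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in read_starts)
        for x in range(length)
    ]

def call_peaks(signal, threshold):
    """Return half-open (start, end) runs where the signal exceeds threshold."""
    peaks, start = [], None
    for i, value in enumerate(signal):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            peaks.append((start, i))
            start = None
    if start is not None:
        peaks.append((start, len(signal)))
    return peaks

reads = [10, 11, 11, 12, 13, 30]        # made-up aligned read start positions
signal = density(reads, length=40)
print(call_peaks(signal, threshold=0.4))
```

The cluster of reads around position 11 stacks up into a peak, while the lone read at position 30 stays below the threshold.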

[This all sounds familiar, somehow… yet I’ve never heard of F-Seq. I’m going to have to look this up!]

Claim you need normalization to do proper calling. Normalization can also be applied if you know regions of duplications.

[as I think about it, continuous read signals must create MASSIVE wig files. I would think that would be an issue.]

Peak calling validation: ROC analysis. False positive along bottom axis, true positives on vertical axis. Show chip-seq and chip-array have very high concordance.
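As a reminder of how such a curve is built, here is a minimal sketch: rank the calls by score, lower the threshold one call at a time, and record the false positive and true positive rates at each step. The scores and truth labels are invented. In their case the "truth" would presumably come from the array data.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one call at a time, from strongest score to weakest.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.2]      # made-up peak scores
labels = [1, 1, 0, 1, 0]                # 1 = peak confirmed by the other assay
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

A good peak caller hugs the top-left corner, picking up most true peaks before many false ones.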

DNase I HS – 72 Million sequences, 149,000 regions, 58.5Mb – 2.0%
FAIRE – 70 Million sequences, 147,000 regions, 53Mb – 1.8%

Compare them – and you see the peaks correspond with the peaks in the other. Not exact, but similar. Very good coverage by FAIRE of the DNase peaks. Not as good the other way, but close.
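The asymmetry ("good one way, not as good the other") falls out naturally when the overlap is measured per peak set. A small sketch, with invented half-open intervals standing in for the two peak lists:

```python
def fraction_covered(query, reference):
    """Fraction of query intervals overlapping at least one reference interval."""
    hits = sum(
        any(q_start < r_end and r_start < q_end for r_start, r_end in reference)
        for q_start, q_end in query
    )
    return hits / len(query)

# Made-up peaks as half-open (start, end) intervals on one chromosome.
dnase = [(100, 200), (500, 650), (900, 950)]
faire = [(120, 220), (600, 700), (1500, 1600), (2000, 2100)]

print(fraction_covered(dnase, faire))   # share of DNase peaks hit by FAIRE
print(fraction_covered(faire, dnase))   # share of FAIRE peaks hit by DNase
```

This all-against-all comparison is quadratic, which is fine for a toy but would want sorted intervals or an interval tree for ~150,000 regions.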

Goal of project should be done on a huge list of cells (92 types?? – 20 cell lines, add 50 to 60 more, including different locations in body, disease, cells exposed to different agents… etc etc.) RNA is tissue specific, so that changes what you’ll see.

Using DNase and FAIRE assays to define open chromatin map
exploring many cell types,
discovery of ubiquitous and cell specific elements.

Note: Data is available as quickly as possible – next month or two, but may not be used for publication for the first 9 months.

>Kai Lao, Applied Biosystems – “Deep Sequencing-Based Whole Transcriptome Analysis of Single Early Embryos”

>I think all sequencing was done with ABI SOLiD.

To get answers about early life stages, you need to do single cells – early life is in single cells, or close to it. When you separate a two-cell embryo, miRNAs are symmetrically distributed (measured by array). T1 and T2 have similar profiling. When you separate in 4 cells – it’s still the same….

Can you do the same thing with next gen sequencing to do whole transcriptome? (Yes, apparently, but the slide is too dark to see what the method is.) Quantified cDNA libraries on gel, then started looking at results.

If you do everything perfectly, concordance between forward and reverse strand should be same. However, if you do the concordance between two blastomeres, you see different results. [not sure what the difference is, but things aren’t concordant between two samples….]

First, showed that libraries have very high concordance – same oocyte gives excellent concordance. However, between dicer knock out and wt, you get several genes that do not have same expression in both. Many genes are co-up-regulated or co-down-regulated.

One gene was Dppa5. In wt, it had low expression, in Dicer-KO and ago2-KO, they were upregulated.

After Dicer Genes were KO at day 5, only 2% of maternal miRNAs survived in a mature Dicer KO oocyte (30 days.) Dicer-KO embryos can not form viable organisms (beyond first few cell stages.)

Deeper sequencing is better. With 20M reads, you get array level data. You get excellent data beyond 100M reads.

No one ever proved that multiple isoforms are expressed at the same time in a cell – used this data to map junctions, and showed they do exist. 15% of genes expressed in a single cell as different isoforms.

>Matthew Bainbridge, Baylor College of Medicine – “Human Variant Discovery Using DNA Capture Sequencing”

>overview: technology + pipeline, then genome pilot 3, snp calling, verification.

Use solid phase capture – Nimblegen array + 454 sequencing
map with BLAT and cross_match. SNP calling (ATLAS-SNP).

All manner of snp filtering.
1.Remove duplicates with same location
2.Then filter on p value.
3.More.. [missed it]
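The first two filtering steps are simple enough to sketch. The record fields, the strand handling and the p-value cutoff below are hypothetical and not taken from ATLAS-SNP. The third step is omitted because it wasn't captured.

```python
def filter_reads_and_snps(reads, snp_calls, max_p=0.01):
    """Drop reads sharing a start position and strand, then keep confident SNPs."""
    seen, unique_reads = set(), []
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"])
        if key not in seen:          # the first read at a location is kept
            seen.add(key)
            unique_reads.append(read)
    confident = [snp for snp in snp_calls if snp["p_value"] <= max_p]
    return unique_reads, confident

reads = [
    {"chrom": "chr1", "start": 100, "strand": "+"},
    {"chrom": "chr1", "start": 100, "strand": "+"},   # likely duplicate
    {"chrom": "chr1", "start": 100, "strand": "-"},
]
snps = [{"pos": 150, "p_value": 0.001}, {"pos": 300, "p_value": 0.2}]
unique_reads, confident = filter_reads_and_snps(reads, snps)
print(len(unique_reads), [s["pos"] for s in confident])
```

In a real pipeline the duplicate removal would happen before SNP calling, since duplicated reads inflate the apparent evidence for a variant.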

226 samples of 400.

Rebalanced Arrays.. Some exons pull down too much, and others grab less. You can change concentrations, then, and then use the rebalanced array.

Average coverage came down, but overall coverage went up.. Much less skew with rebalanced array. 3% of target region just can’t get sequence. 90% of sequence ends up covered 10x or better.
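"Average down, overall coverage up" sounds contradictory until you separate depth from breadth. A toy illustration with invented per-base depths:

```python
def coverage_summary(depths, min_depth=10):
    """Return (mean depth, fraction of bases at >= min_depth, fraction at zero)."""
    n = len(depths)
    mean = sum(depths) / n
    at_min = sum(d >= min_depth for d in depths) / n
    uncovered = sum(d == 0 for d in depths) / n
    return mean, at_min, uncovered

# Toy per-base depths: a skewed capture versus a rebalanced one.
skewed = [100, 90, 80, 5, 3, 2, 0, 0, 60, 60]
rebalanced = [30, 28, 25, 15, 12, 11, 10, 0, 20, 22]

print(coverage_summary(skewed))
print(coverage_summary(rebalanced))
```

The rebalanced set has a lower mean depth but far more positions at 10x or better, which is the number that matters for variant calling.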

Started looking at SNPs – frequency across individuals.

Interested in Ataxia, hereditary neurological disorder. Did 2 runs in first pilot test on 2 patients. Now do 4. Found 18,000 variants. Found one in the gene named for that disease – turned out to be novel, and non-synonymous. Followed up on it, and it looked good: sequenced it in the rest of the family, but it didn’t actually exist outside that patient.

So that brings us to validation: Concordance to HapMap, etc etc, but they only tell you about false negatives, not false positives. You have to go learn more about false positives with other methods, but the traditional ones can’t do high throughput. So, to verify, they suggest using other platforms: 454 + SOLiD.

When they’re done, you get a good concordance, but the false positives drop out. The interesting thing is “do you need high quality in both techniques?” The answer seems to be no. You just need high quality in one… but do you need even that? Apparently, no, you can do this with two low quality runs from different platforms. Call everything a SNP (errors, whatever.. call it all a SNP.) When you do that and then build your concordance, you can get a very good job of SNP calling! (60% are found in dbSNP.)
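The logic, as far as it came across, is that errors from two different chemistries rarely land on the same base, so intersecting two noisy call sets removes most of them. A sketch with invented calls, where a small set stands in for a dbSNP lookup:

```python
def concordant_calls(platform_a, platform_b):
    """Keep only SNP calls, keyed by (chrom, pos, alt), seen on both platforms."""
    return set(platform_a) & set(platform_b)

# Made-up "call everything" outputs from two platforms.
calls_454 = {("chr1", 1000, "T"), ("chr1", 2000, "G"), ("chr2", 500, "A")}
calls_solid = {("chr1", 1000, "T"), ("chr2", 500, "A"), ("chr3", 42, "C")}
known = {("chr1", 1000, "T")}            # stand-in for a dbSNP lookup

both = concordant_calls(calls_454, calls_solid)
in_known = len(both & known) / len(both)
print(sorted(both), in_known)
```

Calls seen on only one platform are dropped, which costs some sensitivity in exchange for the lower false positive rate.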

My Comments: Nifty.

>Keynote: Richard Gibbs, Baylor College of Medicine – “Genome Sequencing to Health and Biological Insight”

>Repetitive things coming up in genomics, and comments about the knowledge pipeline. Picture of snake that ate two lightbulbs…. [random, no explanation]

“cyclic” meeting history: used to be GSAC, then stopped when it became too industrial. Then switched to AMS, and then transitioned to AGBT. We’re coming back to the same position, but it’s much more healthy this time.

We should be more honest about our conflicts.

The pressing promise in front of us – making genomics accessible. Get yourself genotyped… (he did), the information presented is just “completely useless!”

We know it can be really fruitful to find variants. So how do we go do that operationally? Targeted sequencing versus whole genome. What platform (compared to coke vs. Pepsi.)

They use much less Solexa, historically. They just had good experiences with the other two platforms.

16% of Watson SNPs are novel, 15% of Venter SNPs are novel. ~10,500 novel variants.(?) [not clear on slide]

Mutations in Human gene mutation database. We already know the databases just aren’t ready yet.. not for functional use.

Switch to talking about SOLiD platform:

SNP detection and validation. Validation is difficult – but having two platforms do the same thing, it’s MUCH easier to knock out false positives. Same thing on indels. You get much higher confidence data. Two platforms is better than one.

Another cyclic event: Sanger, then next-gen then base-error modelling. We used to say “just do both strands”, and now it’s coming back to “just sequence it twice”. (calls it “just do it twice” sequencing.)

Knowledge chain value: sequencing was the problem, then it became the data management, and soon, it’ll shift back to sequence again.

Capture: it’s finally “getting there”. Exon capture and nimblegen work very well in their hands. Coverage is looking very well.

Candidate mutation for ataxia. In one week got to a list. Of course, they’re still working on the list itself.

How to make genotyping useful?
1.develop physicians and genetics connection
2.retain faith in genotypic effects
3.need to develop knowledge of *every* base.
4.Example, function, orthology…and…

Other issues that have to do with the history of each base. HapMap3/ENCODE. Sanger based methods, about 1Mb each patient. Bottom line: found a lot of singletons. They found a few sites that were mutated independently, not heritable.

Other is MiCorTex. 15,200 people (2 loci). Looking for atherosclerosis. Bottom line: we find a lot of low frequency variants. Sequenced so many people, you can make predictions (“The coalescent”). Sample size is now a significant fraction of population, so the statistics change. All done with Sanger!

Change error modeling – went back to original sequencing and got more information on nature of calls. Decoupling of Ne and Mu in a large sample data.

In the works: represent SNP error rate estimates with genotype likelihood.
1000 genomes pilot 3 project. If high penetrance variants are out there, wouldn’t it be nice to know what they’re doing and how. 250 samples accumulated so far.

Some early data: propensity for non-sense mutations.
Methods have evolved considerably
whole exome
variants will be converted to assays
data merged with other functional variants.

Whole genome and capture are both doing well.
Focus is now back on rare variants
platform comparison also good
Db’s still need work
site specific info is growing
major challenge of understanding variants can be met by ongoing functional studies and improved context.

>John Todd, University of Cambridge – “The Identification of Susceptibility Genes in Common Diseases Using Ultra-Deep Sequencing”

>Type 1 diabetes: a common multifactorial disease. One of many immune-mediated diseases that in total affect ~5% of the population. Distinct epidemiological & clinical features. Genome wide association success… but.. What’s next?

There is a pandemic increase in type 1 diabetes. Since 1950’s, there’s an abrupt 3% increase each year. Age at diagnosis has been decreasing. Now 10-15% are diagnosed under 5 years old.

There is a strong north-south and seasonality bias to it. Something about this disease tracks with seasons.. vitamin D? Viruses?

Pathology: massive infiltration of beta cell islets.

In 1986: 1000 genotypes. In 1996: multiplexing allowed 1,000,000 genotypes, now allows full genome association.

Crohn’s and diabetes are “the big winners” from the Wellcome Trust – most heritable and easily diagnosed of the seven diseases originally selected.

Why do people get type 1 diabetes? Large effect at the HLA class II = immune recognition of beta cells. 100’s of other genes in common and rare alleles of SNPs and SV in immune homeostasis.

Disease = a threshold of susceptibility alleles and a permissive environment.

What will the next 20 years look like: National registers of diseases. (linkage to records and samples where available.) Mobile phone text health, identification of causal genes and their pathways (mechanisms), natural history of disease susceptibility and newborn susceptibility by their T1D genetic profile. What dietary, infectious, gut flora-host interactions modify these and which can we affect?

Can we slow the disease spread down?

There are 42 chromosome regions in type 1 diabetes, with 96 genes. Which are causal? What are the pathways? What are the rare variants? Genome-wide gene-isoform expression. Genotype to protein information.

Ultra-deep sequencing study: 480 patients and 480 controls, PCR of exons and did 454. 95% probability of allele at .3%.
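One way to read the "95% at .3%" figure is as the chance of sampling at least one copy of an allele at 0.3% frequency among the 960 chromosomes of 480 diploid patients. That reading is an assumption, and the sketch ignores sequencing depth and error, but the arithmetic lands in the right neighbourhood:

```python
def detection_probability(allele_freq, individuals, ploidy=2):
    """Chance that at least one sampled chromosome carries the allele."""
    chromosomes = individuals * ploidy
    return 1 - (1 - allele_freq) ** chromosomes

p = detection_probability(0.003, 480)
print(f"{p:.3f}")   # a little over 0.94 under these assumptions
```

Halving the cohort to 240 patients drops the probability to roughly three in four, which is why rare-variant studies need so many samples.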

Found one hit: IFIH1. Followed up in 8000+ patients – found this gene was not associated with disease, but with protection from disease! Knock it out, and you become susceptible!

It’s possible that this is associated with protection of viral infections. The 1000 genome project may also help give us better information for this type of study.

The major prevention trial to prevent type 1 diabetes is ingestion of insulin to restore immune tolerance to insulin.

Do we know enough about type 1 diabetes?

Maybe one of the pathways in type1 diabetes is a defect in oral tolerance?

Type 1 diabetes co-segregates with stuff like celiac disease (wheat intolerance.) One of the rare auto immune diseases for which we know the environmental factor (gluten). Failure of gut immune to be tolerant of gluten.

The majority of loci between type 1 diabetes and celiac are similar. (sister diseases)

Compared genes in Type 1 and Type 2 diabetes – they are not overlapping. No molecular basis for the grouping of these two diseases.

Common genotypes are ok for predicting type 1. ROC curve presented. Can identify population that is likely to develop T1D, but…. how do you treat?

Going from genome to treatment is not obvious, tho.

Healthy volunteers – recallable by genotype, age, etc (Near Cambridge).

Most susceptibility variants affect gene regulation & splicing. Genome wide expression analysis of mRNA and isoforms in pure cell population. Need to get down to lower volume of input material and lower costs.

Using high throughput sequencing with allele-specific expression (ASE). Looking for eQTLs for disease and biomarkers. Doing work on other susceptibility genes. (Using volunteers recallable by genotype).

Looking for new recruits: Chair of biomedical stats, head of informatics, chair of genomics clinical….

>Kathy Hudson, The Johns Hopkins University – “Public Policy Challenges in Genomics”

>Challenges: getting enough evidence is difficult: Analytic validity, clinical validity.. etc etc

Personal value is there theoretically – but will it work?

Two different approaches: who offers them, and then who makes the tests?

Types: either performed with or without consent. Results returned.. or not. There are now a large number of people offering tests for a wide number of conditions.

Are the companies medical miracles, or just a marketing scam? Are the predictions really medically relevant? FTC is supposed to stop companies that lie… but for genetic testing they just put out a warning.

Role of states in regulating: States dictate who can authorize a test. However, in some states anyone can order it, not just medical personnel.

How they’re made:
Two types of tests: Lab tests (homebrews) and test “kits”. The level of regulatory oversight is disparate. Difference is not apparent to people ordering them, but they have different types of oversight.

[flow charts on who regulates what] Lab tests are not under FDA (done through the CMS)… and it makes no sense to be there. you can’t get access to basic science information through CMS, whereas in FDA, that’s a key part of mandate(?)

Example about proficiency testing – which was poorly implemented in law, and is still not well done. The list is now out of date – and none of the list of diseases being tested have genetic basis. CMS can’t give information on what the numbers in the reported values mean (labs get 0’s for multi-year tests, but CMS can’t explain it.)

FDA regulation of test kits is much more rigorous.

Genentech started arguing that the two path system should not be there. Should be regulated based on risk, not manufacturer. Obama-Burr introduced genetic medicine bill in 2007, and something more recently by Kennedy. (Also biobanking?)

Steps to effective testing:
1.level of oversight based on risk
2.tests should give answer nearly all the time
3.data linking genotype to phenotype should be publicly accessible
4.high risk tests should be subject to independent review before entering market
5.pharmacogenetics should be on label
6.[missed this point]

Privacy: should it be public? Who perceives it as what?

More people are concerned about financial privacy than medical privacy. 1/3 think that medical record should be “super secret” : and what part of it they thought should be most private, most people thought it was social security number! Genetic test and family history is way down the list of what needs to be protected.

People trust doctors and researchers well, but not employers. Genetic information nondiscrimination act is a consequence of that trust level. (not a direct result?)

The new Privacy Problem? DNA snooping. Who is testing your DNA? (Something about a half-eaten waffle left by Obama that ended up on ebay… claiming it had his DNA on it.)

Many actions: testing, implementing laws, modernizing laws, transparency, better testing

My comments: It was a really engaging talk, with great insight into US law in genetics. I’d love to see a more global view, but still, quite interesting.

>Howard McLeod, University of North Carolina, Chapel Hill – “Using the Genome to Optimize Drug Therapy”

>“A surgeon who uses the wrong side of the scalpel cuts her own fingers and not the patient.
If the same applied to drugs they would have been investigated very carefully a long time ago.”
Rudolf Buchheim. (1849)

The clinical problem: Multiple active regimens for the treatment of most diseases. Variation in response to therapy, unprecedented toxicity + and cost issues! With choice comes decision. How do you know which drug to provide?

“We only pick the right medicine for a disease 50% of the time”. Eventually we find the right drug, but it may take 4-5 tries. Especially in cancer.

“toxicity happens to the patient, not the prescriber”

[Discussion of importance of genetics. – very self-deprecating humour… Keeps the talk very amusing. Much Nature vs. Nurture again]

“Many Ways To Skin a Genome”. Tidbit: Up to half of the DNA being measured can come from the lab personnel handling the sample. [Wha?] DNA testing is being done in massive amounts: newborns, transplants..

“you can get DNA from anything but OJ’s glove.”

We also see applications of genetics in drug metabolism. Eg, warfarin. Too much: bleeding, too little: clotting. One of only two drugs that has its own clinic. [yikes.] Apparently methadone is the other. Why does it have its own clinic? “That’s because this drug sucks.” Still the best thing out there, though. Discussion of CYP based mechanisms and the Vitamin K reductase target. Showed family tree – too much crossing of left and right hand…

Some discussion of results – showing that there are differences in genetics that strongly influence metabolism of warfarin.

Genetics has now become part of litigation – Warfarin is one of the most litigated drugs.

We need tools that translate genetics into lay-speak. It doesn’t help to tell people they have a CYP2C*8.. they need a way to understand and interpret that.

If we used genetics, we’d be able to go from 11% to 57% of “proper dose” at the first time with warfarin.

Pharmacogenomics have really started to take off and there are now at least 10 examples.

What is becoming important is pathways… but there are MANY holes. We know what we know, We don’t know what we don’t know.

We can do much of the phenotyping in cell lines – we can ask “is this an inheritable trait?” This should focus our research efforts in some areas.

Better systematic approach to sampling patients.

What do we do after biomarker validation? Really, we do nothing – we assume someone else will pick it up (Through osmosis… that’s faith based medicine!) We need to talk to the right people and then hand it off – we need to do biomarker-driven studies with the goal of knowing who to hand it off to.

Take home message:

Pharmacogenetic analysis of patient DNA is ready for prime time.

My Comment: Very amusing speaker! The message is very good, and it was engaging. The Science was well presented and easily understandable, and the result is clear: there’s lots more room for improvement, but we’re making a decent start and there is promise for good pharmacogenomics.