>AGBT 2009 – Thoughts and reflections

>Note: I wrote this last night on the flight home, and wasn’t able to post it until now. Since then, I’ve received some corrections and feedback, so I’ll go through my blog posts and make corrections as needed. In the meantime, here’s what I wrote last night.


This was my second year at AGBT, and I have to admit that I enjoyed this year a little more than the last. Admittedly, it’s probably because I knew more people and was more familiar with the topics being presented than I was last year. Of course, comprehensive exams and last year’s AGBT meeting were very good motivators to come up to speed on those topics.

Still, there were many things this year that made the meeting stand out, for which the organizing committee deserves a round of applause.

One of the things that worked really well this year was the mix of people. There were a lot of industry people there, but they didn’t take over or monopolize the meeting. The industry people did a good job of using their numbers to host open houses, parties and sessions without seeming “short-staffed”. Indeed, there were enough of them that it was fairly easy to find them to ask questions and learn more about the “tools of the trade.”

On the other hand, the seminars were mainly hosted by academics – so it didn’t feel like you were sitting through half-hour infomercials. In fact, the sessions that I attended were all pretty decent, with a high level of novelty and entertainment factor. The speakers were nearly all excellent, with only a few that felt of “average” presentation quality. (I managed to take notes all the way through, so clearly I didn’t fall asleep during anyone’s talk, even if I had the occasional momentary zone-out caused by the relentless 9am-9pm talk schedule.)

At the end of last year’s conference, I returned to Vancouver – and all I could talk about was Pacific Biosciences SMRT technology, which dominated the “major announcement” factor for me for the past year. At this year’s conference, there were several major announcements that really caught my attention. I’m not sure if it’s because I have a better grasp of the field, or if there really was more of the “big announcement” category this year, but either way, it’s worth doing a quick review of some of the major highlights.

Having flown in late on the first day, I missed the Illumina workshop, where they announced the extension of their read length to 250 bp, which brings them up to the same range as the 454 technology platform. Of course, technology doesn’t stand still, so I’m sure 454 will have a few tricks up their sleeves. At any rate, when I started talking with people on Thursday morning, it was definitely the hot topic of debate.

The second topic that was getting a lot of discussion was the presentation by Complete Genomics, which I’ve blogged about – and which I’m sure several of the other bloggers will be covering in the next few days. I’m still not sure if their business model is viable, or if the technology is ideal… or even if they’ll find a willing audience, but it sure is an interesting concept. The era of the $5000 genome is clearly here, and as long as you only want to study human beings, they might be a good partner for your research. (Several groups announced they’ll do pilot studies, and I’ll be in touch with at least one of them to find out how it goes.)

And then, of course, there was the talk by Marco Marra. I’m still in awe about what they’ve accomplished – having been involved in the project officially (in a small way) and through many many games of ping-pong with some of the grad students involved in the project more heavily, it was amazing to watch it all unfold, and now equally amazing to find out that they had achieved success in treating a cancer of indeterminate origin. I’m eagerly awaiting the publication of this research.

In addition to the breaking news, there were other highlights for me at the conference. The first of many was talking to the other bloggers who were in attendance. I’ve added all of their blogs to the links on my page, and I highly suggest giving their blogs a look. I was impressed with their focus and professionalism, and learned a lot from them. (Tracking statistics, web layout, ethics, and content were among a few of the topics upon which I received excellent advice.) I would really suggest that this be made an unofficial session in the future. (You can find the links to their blogs as the top three in my “blogs I follow” category.)

The Thursday night parties were also a lot of fun – and a great chance to meet people. I had long talks with people all over the industry, where I might not otherwise have had a chance to ask questions. (Not that I talked science all evening, although I did apologize several times to the kind Pacific Biosciences guy I cornered for an hour and grilled with questions about the company and the technology.) And, of course, the ABI party where Olena got the picture in which Richard Gibbs has his arm around me is definitely another highlight. (Maybe next year I’ll introduce myself before I get the hug, so he knows who I am…)

One last highlight was the panel session sponsored by Pacific Biosciences, in which Charlie Rose (I hope I got his name right) mediated the discussion on a range of topics. I’ve asked a guest writer to contribute a piece based on that session, so I won’t talk too much about it. (I also don’t have any notes, so I probably shouldn’t talk about it too much anyhow.) It was very well done with several controversial topics being raised, and lots of good stones were turned over. One point is worth mentioning, however: One of the panel guests was Eric Lander, who has recently come to fame in the public’s eye for co-chairing a science committee requested by the new U.S. President Obama. This was really the first time I’d seen him in a public setting, and I have to admit I was impressed. He was able to clearly articulate his points, draw people into the discussion and dominate the discussion while he had the floor, but without stifling anyone else’s point of view. It’s a rare scientist who can accomplish all of that – I am now truly a fan.

To sum up, I’m very happy I had the opportunity to attend this conference and am looking forward to seeing what the next few years bring. I’m going back to Vancouver with an added passion to get my work finished and published, to get my code into shape, and to keep blogging about a field going through so many changes.

And finally, thanks to all of you who read my blog and said hi. I’m amazed there are so many of you, and thrilled that you take the time to stop by my humble little corner of the web.

>Stephan Schuster, Penn State University – “Genomics of Extinct and Endangered Species”

>Last year, introduced nanosequencing of complete extinct species. What are the implications of extinct genomes for endangered species?

Mammoth: went extinct 3 times… 45,000 ya, 10,000 ya, and 3,500 ya. Woolly rhino: 10,000 years ago, Moa 500 years ago (were eaten), Thylacine 73 years ago… And Tasmanian devils, which are expected only to last another 10 years.

Makes you wonder about dinosaurs.. maybe dinosaurs just tasted like chicken.

Looking at population structure and biological diversity from a genomic perspective. (Review of Genotyping Biodiversity.) Mitochondrial genome is generally higher copy, and thus was traditionally the one used, but now with better sequencing, we can target nuclear DNA.

Mammoth mitochondrial genome has been done. ~16,500bp. Includes ribosomal, coding and noncoding regions. In 2008, can get 1000x coverage on the mitochondrial genome. You need extra coverage to correct for damaged DNA.

This has now allowed 18 mammoth mitochondrial genome sequences. 20-30 SNPs between members of same groups, and 200-300 between groups. WAY more sequencing than is available for African elephants!

Have now switched to using hair instead of bone, and can use hair shaft. (not just follicle)

Ancient DNA = highly fragmented. 300,000 sequences, 45% was nuclear DNA.

Now: Sequenced bases: 4.17Gb. Genome size is 4.7Gb. 77 Runs, got 32.6 million bases.
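As a quick sanity check on those figures (my own arithmetic, not something shown in the talk), average fold coverage is just sequenced bases divided by genome size. The sketch below assumes every sequenced base is mammoth nuclear DNA, which overstates things for ancient samples, and the function name is made up for illustration.

```python
def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average depth: total bases sequenced divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sequenced_bases / genome_size


# Figures quoted in the talk: 4.17 Gb sequenced, 4.7 Gb genome.
mammoth_coverage = fold_coverage(4.17e9, 4.7e9)
print(f"{mammoth_coverage:.2f}x")  # a little under 1x on average
```

At a bit under 1x, plenty of positions will still have no read at all, which is worth keeping in mind when reading the later comparisons.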

Can visit mammoth.psu.edu for more info.

Sequenced mammoth orthologs of human genes. Compared to Watson/Venter… rate of predicted genes of chromosomes (“No inferences here”). Complete representation of genome available. SAP = Single Amino acid Polymorphism.

(Discussion: divergence for mammoth.) Coalescence time for human and Neanderthal: 600,000 years. Same thing happens for mammoth, but not really well accepted because the biological evidence doesn’t show it.

Did the same thing for the Tasmanian Tiger. Two complete genomes – only 5 differences between them.

Hair for one sample was taken from what fell off when preserved in a jar of ethanol!

Moa: did it from egg shell!

Woolly rhino: did the woolly rhino from hair – did other rhinos too. (Woolly is the only extinct one.) Rhinos radiated only a million years ago, so couldn’t resolve the phylogenetic tree. Tried: hair, horn, hoof, and bone… bone was by far the worst.

Now, to jump to the living: the Tasmanian devil. Highly endangered. 1996 infectious cancer discovered (not figured out till 2004). Devils protected since 1941. Isolations with fences, islands, mainland, insurance population. Culling and vaccination are also possible.

Genome markers will be very useful. Problem is probably because there is nearly no diversity in population. Sequenced museum sample devils, and show mitochondrial DNA had more diversity in non-living population.

Project for full genome is now underway – two animals. (More information on plans on what to do with this data and how to save them.) SNP info for genotyping to direct captive breeding program.
(“Project Arc”) Trying to breed resistant animals.

>Len Pennacchio, Lawrence Berkeley National Laboratory – “ChIP-Seq Accurately Predicts Tissue Specific Enhancers in Vivo”

>The Noncoding Challenge: 50% of GWAS hits are falling into non-coding regions. CAD and Diabetes fall in gene deserts, so how do they work? Regulatory regions. Build a category of distal enhancers.

Talk about Sonic Hedgehog, involved in limb formation. Regulation of expression is a million bases away from the gene. There are very few examples. We don’t know if we’ll find lots, or if this is just the tip of the iceberg. How do we find more?

First part: work going on in the lab for the past 3 years. Using conservation to identify regions that are likely involved. Using ChIP-Seq to do this.

Extreme conservation. Either things conserved over huge spans (human to fish) or within a smaller group (human, mouse, chimp).

Clone the regions into vectors, put them in mouse eggs, and then stain for Beta-galactosidase. Tested 1000 constructs, 250,000 eggs, 6000 blue mice. About 50% of them work as reproducible enhancers. Do everything at whole mouse level. Each one has a specific pattern. [Hey, I’ve seen this project before a year or two ago… nifty! I love to see updates a few years later.]

Bin by anatomical pattern. Forebrain enhancers is one of the big “bins”. Working on forebrain atlas.

All data is at enhancer.lbl.gov. Also in Genome Browser. There is also an Enhancer Browser. Comparative genomics works great at finding enhancers in vivo. No shortage of candidates to test.

While this works, it’s not a perfect system. Half of the things don’t express, and the system is slow and expensive. The comparative genomics also tells you nothing about where it expresses, so this is ok for wide scans, but not great if you want something random.

Enter ChIP-Seq. (brief model of how expression works) Collaboration with Bing Ren. (brief explanation of ChIP-Seq). Using Illumina to sequence. Looking at bits of mouse embryo. Did chipseq, got peaks. What’s the accuracy?

Took 90 of the predictions, used same assay. When p300 was used, now up to 9/10 of conserved sequences work as enhancers. Also tissue specific.

Summarize: using comparative gives you 5-16% active things in one tissue. Using ChIP-Seq, you get 75-80%.

How good or bad is comparative genomics at ranking targets? 5% of exons are constrained, almost the rest are moderately constrained. [I don’t follow this slide. Showing better conserved in forebrain and other tissues].

P300 peaks are enriched near genes that are expressed in the same tissues.

Conclusion: p300 is a better way of predicting enhancers.
P300 occupancy circumvents the DNA-conservation-only approach.

What about negatives? For ones that don’t work, it’s even better, but mouse orthologs bind, while human does not bind any more in mice.

Conclusion II: Identified 500 more enhancers with first method, and now a few reads done 9 months ago have 5000 new elements using ChIP-Seq.

Many new things can be done with this system, and integrating it with WGAS.

>Bruce Budowle, Federal Bureau of Investigation – “Detection by SOLiD Short-Read Sequencing of Bacillus Anthracis and Yersinia Pestis SNPs for Strain Id

>We live in a world with a “heightened sense of concern.” The ultimate goal is to reduce risk, whether it’s helping people with flooding, or otherwise. Mainly, they work on stopping crime and identifying threat.

Why do we do this? We’ve only had one anthrax incident since 2001… but we’ve been using bioterrorism for 2000 years. (several examples given.)

Microbial Forensics. We don’t just want knee jerk responses. Essentially the same as any other forensic discipline, again, to reduce risk. This is a very difficult problem. Over 1000 agents known to infect humans: 217 viruses, 538 bacterial species, 307 fungi, 66 parasitic protozoa. Not all are effective, but there are different diagnostic challenges. Laid out on the tree of life… it’s pretty much the whole thing.

Biosynthetic technology. New risks are accruing due to advances in DNA synthesis. Risks are vastly outweighed by benefits of synthesis… bioengineering also plays a role.

Forensic genetic questions:
What is the source?
Is it endemic?
What is the significance?
How confident can you be in results?
Are there alternative explanations?

So, a bit of history on the “Amerithrax” case. VERY complex case, changed the way the government works on this type of case. Different preparations in different envelopes.

Goals and Objectives:
Could they reverse engineer the process, to figure out how it was done? No, too complex, didn’t happen.

First sequencing – did a 15 locus 4-colour genotyping system. Was not a validated process – but helped identify strain. That helped narrow down the origin of the strain. Some came from Texas, but it was more likely to have come from a lab than to come from the woods.

Identifying informative SNPs. Don’t need to know the evolution – just the signature. That can then be used for diagnostics. Whole genome sequencing for genotyping was a great use. Back in 2001, most of this WGS wasn’t possible. They had a great deal from TIGR – only $125,000 to sequence the genome. From the Florida isolate: took 2 weeks, found out interesting details about copy number of plasmids. The major cost was then to validate and understand what was happening.

Florida was compared to Ames and to one from the UK, which gave only 11 SNPs. Many evolution challenges came up. The strain they used was “cured” of its plasmid, so it evolved to have other SNPs… a very poor reference genome.

The key to identification: one of the microbiologists discovered that some cultures had different morphology. That was then used as another signature for identifying the source.

Limited Strategy: it didn’t give the whole answer – only allows them to rule out some colonies. It would be more useful to sequence full genomes… so entered into deal with ABI SOLiD for genome sequencing.

Some features were very appealing. One of them is the emulsion PCR, which helped to improve quality and reliability of the assay. The beads were useful too.

Multiplex value was very useful. Could test 8 samples simultaneously using barcoding, including the reference Ames strain. Coverage was 16x-80x, depending on DNA concentration. Multiple starting points give more confidence, and help to find better SNPs.

Compare to reference: found 12 SNPs in resequenced reference. When you look at the SNP data, you see that there is a lot of confidence if a SNP is in both directions… however, some only turn up on the one strand. That becomes a major way to remove false positive results. That was really only possible by using higher coverage.
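The strand check described above can be sketched in a few lines. This is only my illustration of the idea, not the FBI pipeline: the `SnpCall` record, the `passes_strand_filter` name, the threshold of two reads per strand and the example calls are all invented.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    position: int
    ref: str
    alt: str
    forward_reads: int  # reads supporting the alt allele on the forward strand
    reverse_reads: int  # reads supporting the alt allele on the reverse strand


def passes_strand_filter(call: SnpCall, min_per_strand: int = 2) -> bool:
    """Keep a call only if the alt allele is seen on both strands."""
    return (call.forward_reads >= min_per_strand
            and call.reverse_reads >= min_per_strand)


# Invented example calls: the second has support on one strand only.
calls = [
    SnpCall(1042, "A", "G", forward_reads=14, reverse_reads=11),
    SnpCall(2310, "C", "T", forward_reads=9, reverse_reads=0),
    SnpCall(5120, "G", "A", forward_reads=1, reverse_reads=12),
]
kept = [call for call in calls if passes_strand_filter(call)]
print([call.position for call in kept])
```

A filter like this only makes sense at decent depth, since at low coverage a real SNP can easily miss one strand by chance, which fits the comment that higher coverage is what made it possible.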

Not going to talk about Pestis… (almost out of time.) Similar points, 130-180X coverage. Found multidrug transporter in the strain which has been a lab strain for 50 years. Plasmids were also higher coverage. SNPs were fewer in the North American, etc.
An interesting point. If you go to the ref in GenBank, there are known errors in the sequence. Several have been corrected, and the higher coverage was helpful in figuring out the real sequence past the errors.

$1000/strain using multiplex, using equipment that is not yet available. This type of data really changes the game, and can now screen samples VERY quickly (a week).

Every project is a large scale sequencing project
depth is good
multiplexing is good
keep moving to higher accuracy sequencing.

>Andy Fire, Stanford University – “Understanding and Clinical Monitoring of Immune-Related Illnesses Using Massively-Parallel IgH and TcR Sequencing”

>The story starts off with a lab that works on small RNAs, which they believe form a small layer of immunity. [did I get that right?] They work in response to foreign DNA.

Joke Slide: by 2018, we’ll have an iSequencer.

Question: can you sequence the immunome. [new word for me.] Showing a picture of lymphoma cells, which to me looks like a test to see if you’re colour blind. There are patches of slightly different shades…

Brief intro to immunology. “I got an F in immunology as a grad student.” [There’s hope for me, then!]
Overview of VDJ Recombination, control by B-Cell differentiation. This is really critical – responsible for our health. One Model: If something recognizes both a virus and self, then you can end up with autoimmune response.

There is a continuum based on this. It’s not necessarily an either/or relationship.

There is a PCR/454 test for VDJ distribution. Under some cases, you get a single dominating size class, and that is usually a sign of disease, such as lymphoma. You can also use 454 for this, since you need longer reads, and read the V, D and J units in the amplified fragment. Similar to email, you can get “spam”, and you can use similar technologies to drop out the “spam” from the run.

To show the results of the tests for B-cell recombination species, you put V on one axis, J on the other. D is dropped to make it more viewable. In lymphoma, a single species dominates the chart.
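Here is a rough sketch of how that V-by-J table could be tallied from a list of reads. It is my own toy version, not Fire’s pipeline: the function names are made up, the reads are assumed to be already labelled with their V, D and J segments, and the counts are invented.

```python
from collections import Counter


def vj_usage(rearrangements):
    """Count reads per (V, J) pair; D is dropped, as in the plot from the talk."""
    return Counter((v, j) for v, _d, j in rearrangements)


def dominant_clone(usage):
    """Return the most common (V, J) pair and its share of all reads."""
    total = sum(usage.values())
    if total == 0:
        raise ValueError("no rearrangements to summarise")
    pair, count = usage.most_common(1)[0]
    return pair, count / total


# Toy reads as (V, D, J) labels; the counts are invented for illustration.
reads = (
    [("IGHV3-23", "IGHD3-10", "IGHJ4")] * 8
    + [("IGHV1-69", "IGHD2-2", "IGHJ6")]
    + [("IGHV4-34", "IGHD6-13", "IGHJ5")]
)
usage = vj_usage(reads)
pair, share = dominant_clone(usage)
print(pair, f"{share:.0%}")
```

In a healthy sample you would expect the share of the top pair to be small; a single pair taking most of the reads is the pattern described for lymphoma.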

An interesting experiment – dilute with regular blood to see detection limit – it’s about 1:100. For some lymphomas, you can’t use these primers, and they don’t show up. There are other primers for the other diseases.

So what happens in normal distributions? Did the same thing with VDJ (D included, so there are way more spots). Neat image… Do this experiment with two aliquots of blood from the same person. Look for concordance. You find lots of spots fail to correspond well at the different time points, but many do.

On another project, Bone Marrow transplant. Recipient has a funny pattern, mostly caused by “spam” because the recipient really has very little immune system left. The patient eventually gets the donor VDJ types, which is a completely donor response. You can also do something like this for autoimmune disorders.

Malfunctioning Lymphoid cells cause many human diseases and medical side-effects. (several examples given.)

>Keynote Speaker: Rick Wilson, Washington University School of Medicine – “Sequencing the Cancer Genome”

>Interested in:
1. Somatic mutations in protein coding genes, including indels.
2. Also like to find: non-coding mutations, miRNAs and lincRNAs.
3. Like to learn about germ line variations.
4. Differential transcription and splicing.
5. Structural variation.
6. Epigenetic changes.
7. Big problem: integrate all of this data… and make sense of it.

Paradigm for years: exon focus for large collection of samples. Example: EGFR mutations in Lung Cancer. Large number of patients (some sample) had EGFR mutations. Further studies carry on this legacy in Lung cancer using new technology. However, when you look at pathways, you find that the pathways are more important than individual genes.

Description of “The Cancer Genome Atlas”

Initial lists of genes mutated in cancer. Mutations were found, many of which were new. (TCGA Research Network, Nature, 2008)

Treatment-related hypermutation. Another example of TCGA’s work: glioblastoma. Although they didn’t want treated samples, in the end they took a look and saw that treated samples have interesting changes in methylation sites, when MMR genes and MGMT were mutated. If you know the status of the patient’s genome, you can better select the drug (eg, not use an alkylation-based drug).

Pathways analysis can be done… looking for interesting accumulations of mutations. Network view of the Genome… (just looks like a mess, but a complex problem we need to work on.)

What are we missing? What are we missing by focusing on exons? There should be mutations in cancer cells that are outside exons.

Revisit the first slide: now we do “Everything” from the sample of patients, not just the list given earlier.

(Discussion of AML cancer example.) (Ley et al., Nature 2008)
Found 8 heterozygous somatic mutations, 2 somatic insertion mutations. Are they cause or effect?
The verdict is not yet in. Ultimately, functional experiments will be required.

There are things we’re not doing with the technology: Digital gene expression counts. Can PCR up gene of interest from tumour, sequence and do a count: how many cells have the genotype of interest?
Did the same thing for several genes, and generally got a ratio around 50%.
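The counting itself is simple enough to sketch. This is my own illustration with invented read counts; the function names are made up, and the doubling step assumes a heterozygous mutation in a diploid region with no copy number change, which may well not hold in a tumour.

```python
def variant_read_fraction(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def cell_fraction_if_heterozygous(read_fraction: float) -> float:
    """Rough share of cells carrying the variant, assuming it is heterozygous
    (one of two copies) wherever it is present."""
    return min(1.0, 2.0 * read_fraction)


# Invented counts from a deep amplicon: 480 variant reads out of 1000.
fraction = variant_read_fraction(480, 1000)
cells = cell_fraction_if_heterozygous(fraction)
print(f"{fraction:.0%} of reads, roughly {cells:.0%} of cells")
```

Under that assumption, a ratio near 50% is what you would see if nearly every cell in the sample carried the heterozygous mutation.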

Started looking at GBM1. 3,520,407 tumour variants passing SNP filter. Broke down to coding NS/splice sites, coding silent, conserved regions, regulatory regions including miRNA targets, non-repetitive regions, everything else (~15,000). Many of the first class were validated.

CNV analysis also done. Add coverage to sequence variants, and the information becomes more interesting. Can then use read pairs to find breakpoints/deletions/insertions.

What’s next for cancer genomics? More AML (Doing more structural variations, non-coding information, more genomes), more WGS for other tumours, and more lung cancer, neuroblastoma… etc.

“If the goal is to understand the pathogenesis of cancer, there will never be a substitute for understanding the sequence of the entire cancer genome” – Renato Dulbecco, 1986

Need ~25X coverage of WGS tumour and normal – also transcriptome and other data. Fortunately, costs are dropping rapidly.

>Peter Park, Harvard Medical School – “Statistical Issues in ChIP-Seq and its Application to Dosage Compensation in Drosophila”

>(brief overview of ChIP-Seq, epigenomics again)

ChIP-Seq not always cost-competitive yet. (can’t do it at the same cost as chip-chip)

Issues in analysis: generate tags, align, remove anomalous, assemble, subtract background, determine binding position, check sequencing depth.

Map tags in strand specific manner: (Like directional flag in Findpeaks). Scoring tags accounting for that profile. Can be incorporated into peak caller.

Do something called Cross-correlation analysis. (look at peaks in both directions.) use this to rescue more tags. Peaks get better if you add good data, and worse if you add bad data. Use it to learn something about histone modification marks. (Tolstorukov et al, Genome Research).
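To make the cross-correlation idea concrete: forward-strand tags pile up on one side of a binding site and reverse-strand tags on the other, so sliding one set against the other and scoring the overlap gives a peak at roughly the fragment length. The sketch below is my own simplified version (a raw overlap count rather than a normalised correlation coefficient, with invented tag positions and made-up function names), not the method from the paper.

```python
from collections import Counter


def strand_cross_correlation(forward_starts, reverse_starts, max_shift):
    """For each shift, count forward tags that land on a reverse tag
    once moved downstream by that many bases."""
    fwd = Counter(forward_starts)
    rev = Counter(reverse_starts)
    return {
        shift: sum(n * rev.get(pos + shift, 0) for pos, n in fwd.items())
        for shift in range(max_shift + 1)
    }


def best_shift(scores):
    """Shift with the highest overlap score."""
    return max(scores, key=scores.get)


# Invented tag positions: two binding sites, reverse tags 150 bp downstream.
forward = [100, 105, 110, 500]
reverse = [250, 255, 260, 650]
scores = strand_cross_correlation(forward, reverse, max_shift=300)
shift = best_shift(scores)
print(shift, scores[shift])
```

Adding noisy tags flattens the peak in the score profile, which is one way to read the comment that peaks get better with good data and worse with bad data.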

How deep to sequence? 10-12M reads is current. That’s one lane on Illumina, but is it enough? What quality metric is important? Clearly this depends on the marks you’re seeing (narrow vs broad, noise, etc). Brings you to saturation analysis. Shows no saturation for STAT1, CTCF, NRSF. [not a surprise, we knew that a year ago… We’re already using this analysis method; however, as you add new reads, you add new sites, so you have to threshold to make sure you don’t keep adding new peaks that are insignificant. Oh, he just said that. Ok, then.]

Talking about using “fold enrichment” to show saturation. This allows you to estimate how many tags you need to get a certain tag enrichment ratio.
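A toy version of a saturation analysis with a fold-enrichment threshold might look like the following. Everything here is an assumption for illustration: “peaks” are just fixed-size bins, the background model is a uniform spread of tags, and the function names, bin size and 5-fold cutoff are invented rather than taken from the talk.

```python
import random
from collections import Counter


def count_enriched_bins(tags, genome_length, bin_size, min_fold):
    """Count bins with at least min_fold times the tags expected
    if all tags were spread uniformly over the genome."""
    if not tags:
        return 0
    expected_per_bin = len(tags) * bin_size / genome_length
    bins = Counter(pos // bin_size for pos in tags)
    return sum(1 for n in bins.values() if n / expected_per_bin >= min_fold)


def saturation_curve(tags, genome_length, fractions,
                     bin_size=1000, min_fold=5.0, seed=0):
    """Call enriched bins on nested random subsamples of the tags."""
    shuffled = list(tags)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for fraction in fractions:
        n = int(len(shuffled) * fraction)
        found = count_enriched_bins(shuffled[:n], genome_length, bin_size, min_fold)
        curve.append((n, found))
    return curve


# Toy data: flat background plus one strongly enriched 1 kb region.
genome_length = 100_000
background = list(range(0, genome_length, 100))   # 10 tags in every bin
peak = [50_000 + 5 * i for i in range(200)]       # 200 extra tags in one bin
curve = saturation_curve(background + peak, genome_length, [0.25, 0.5, 1.0])
print(curve)
```

Because the cutoff scales with depth, background bins never qualify no matter how many tags are added, so the count of enriched bins can level off instead of creeping upward with every new lane.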

See paper they published last year.

Next topic: Dosage compensation.

(Background on what dosage compensation is.)

In Drosophila, the X chromosome is up-regulated in XY, unlike in humans, where the 2nd copy of the X is quashed in the XX genotype. Several models available. Some evidence that there’s something specific and sequence related. Can’t find anything easily in ChIP based methods – just too much information. Comparing ChIP-seq, you get sharp enrichment, whereas on ChIP-chip, you don’t see it. Seems to be saturation issue (dynamic range) on ChIP-chip, and the sharp enrichments are important.
You get specific motifs.

Deletion and mutation analysis. The motif is necessary and sufficient.

Some issues: Motif on X is enriched, but only by 2-fold. Why is X so much upregulated, then? Seems Histone H3 signals depleted over the entry sites on X chr. May also be other things going on, which aren’t known.

Refs: Alekseyenko et al., Cell, 2008 and Sural et al., Nat Str Mol Bio, 2008

>Alex Meissner, Harvard University- “From reference genome to reference epigenome(s)”

>Background on ChIP-Seq.

High-throughput Bisulfite Sequencing. At 72 bp, you can still map these regions back without much loss of mapping ability. You get 10% loss at 36bp, 4% at 47bp and less at 72bp.
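For anyone wondering why read length matters so much here: bisulfite treatment turns unmethylated C into T, so reads are effectively mapped in a three-letter alphabet, and short reads become ambiguous much faster. A common way to handle this is to convert both the read and the reference in silico before matching. The sketch below is my own minimal illustration (forward strand only, exact matching, invented sequences and function names), not the method used in the talk.

```python
def to_three_letter(seq: str) -> str:
    """Fully convert C to T, as if no cytosine were methylated."""
    return seq.replace("C", "T")


def map_bisulfite_read(read: str, reference: str) -> int:
    """Offset of the first exact match in three-letter space, or -1."""
    return to_three_letter(reference).find(to_three_letter(read))


def call_methylation(read: str, reference: str, offset: int) -> dict:
    """For every reference C under the read: True if the read still shows C
    (protected, so methylated), False if it shows T (converted)."""
    calls = {}
    for i, base in enumerate(read):
        if reference[offset + i] == "C" and base in "CT":
            calls[offset + i] = (base == "C")
    return calls


# Invented example: the read covers reference positions 2-9,
# with only the C at position 3 methylated.
reference = "TTACGGCATCGATTGCA"
read = "ACGGTATT"
offset = map_bisulfite_read(read, reference)
calls = call_methylation(read, reference, offset)
print(offset, calls)
```

A real aligner also has to deal with the reverse strand, mismatches and reads that match in more than one place, which is where the loss of mappability at shorter read lengths comes from.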

This was done with a m-CpG cutting enzyme, so you know all fragments come with at least a single Methylation. Some update on technology recently, including drops in cost and longer reads, and lower amounts of starting material.

About half of the CpG is found outside of CpG islands.

“Epigenomic space”: look at all marks you can find, and then external differences. Again, many are in gene deserts, but appear to be important in disease association. Also remarkable is the degree of conservation of epigenetic patterns as well as genes.

where are the functional elements?
when are they active?
when are they available

Also interested in Epigenetic Reprogramming (Stem cell to somatic cell).

Recap: Takahashi and Yamanaka: induce pluripotent stem cells with 4 transcription factors: Oct4, Sox2, c-Myc & Klf4. General efficiency is VERY low (0.0001% – 5%). Why are not all cells reprogramming?

To address this: ChIP-Seq before and after induction with the 4 transcription factors. Strong correlation with chromatin state and iPS. Clearly see that genes in open chromatin are responsive. Chromatin state in MEFs correlates with reactivation.

Is loss of DNA methylation at pluripotency genes the critical step to fully reprogram? Test hypothesis that by demethylation, you could cause more cells to become pluripotent. Loss of DNA methylation does indeed allow the transition to pluripotency, as shown. [lots of figures, which I can’t copy down.]

Finally: loss of differentiation potential in culture. Embryonic stem cell to neural progenitor, but eventually can not differentiate to neurons, just astrocytes. (Figure from Jaenisch and Young, Cell 2008)

Human ES cell differentiation: often fine in morphology, correct markers… etc etc, but specific markers are not consistent. Lose methylation and histone marks, which cause significant changes in pluripotency.

Can’t yet make predictions, but on the way towards it in the future where you can assess cell type quality using this information.

>**BREAKING NEWS** Marco Marra, BC Cancer Agency – “Sequencing Cancer Genomes and Transcriptomes: From New Pathology to Cancer Treatment.”

>Why sequence Cancer-ome: Most important: treatment-response difference. To match treatments to patients. Going to focus on that last one.

Two Anecdotes: Neuroblastoma (Olena Morozova and David Kaplan), and Papillary adenocarcinoma (tongue), primary unknown. 70 year difference in age. They have nothing in common except for “can sequence analysis present new treatment options?”

Background on Neuroblastoma. Most common cancer in infants, but not very common. 75 cases per year in Canada. Patients often have relapse and remission cycles after chemotherapy. Little improvement until recently, when Kaplan was able to show ability to enrich for tumour initiating cells (TICs). This gave a great model for more work.

Decided to have a look at Kaplan’s cells, and did transcriptome libraries (RNA-Seq) using PET, and sequenced a flow cell worth: giving 5Gb of raw seq from one sample, 4 from the other. Align to reference genome using custom database. (Probably Ryan Morin’s?) Junctions, etc.

Variants found that are B-cell related. Olena found markers, worked out lineage, and showed it was closer to B-cell malignancy than brain cancer sample. These cells also show neuroblastomas, when reintroduced to mice. So, is neuroblastoma like B-cell in expression? Yes, they seem to have a lot of traits in common. It appears as though the neuroblastoma is expressing early markers.

Thus, if you can target B-Cell markers, you’d have a clue.

David Kaplan verified and made sure that this was not contamination (several markers). Showed that yes, the neuroblastoma cells are expressing B-cell markers, and that these are not B-cells. Thus, it seems that a drug that targets B-cell markers could be used. (Rituximab, and Milatuzumab) Thus, we now have an insight that we wouldn’t have had before. (Very small sample, but lots of promise.)

Anecdote 2: 80 year old male with adenoma of the tongue. Salivary gland origin possibly? Has had surgery and radiation, and a CAT scan revealed lung nodules (no local recurrence). There is no known standard chemotherapy that exists… so several guesses were made, and an EGFR inhibitor was tried. Nothing changed. Thus, BC Cancer was approached: what can genome studies do? Didn’t know, but willing to try. Genome from formalin fixed sample (which is normally not done), and handful of WTSS from fine-needle aspirates (nanograms, which required amplification). 134Gb of aligned sequence across all libraries – about 110Gb to genome. (22X genome, 4X transcriptome.)

Data analysis, compared across many other in-house tumours, and looked for evidence of mutation. CNV was done from Genome. Integration with drug bank, to then provide appropriate candidates for treatment.

Comment on CNV: histograms shown: Showed that as many bases are found in single allele as diploid and then again, just as many in triploid and then some places at 4 and 5s. Was selected pressure involved in picking some places for gain, whereas much of the genome was involved in loss?

Investigated a few interesting high CNV regions, one of which contains RET. Some amplifications are highly specific, containing only a single gene, whereas they are surrounded by loss of CNV regions.

Looking at Expression level, you see a few interesting things. There is a lack of absolute correlation between changes in CNV and the expression of the gene.

When looking for intersection, ended up with some interesting features:
30 amplified genes in cancer pathways (kegg)
76 deleted genes in cancer pathways
~400 upregulated, ~400 downregulated genes
303 candidate non-synonymous snps
233 candidate novel coding SNPs
… more.

Went back to drugbank.ca (Yvonne and Jianghong?) When you merge that with target genes, you can find drugs specific to those targets. One of the key items on the list was RET.
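The merge step is essentially a set intersection between the altered genes and each drug’s targets. Here is a minimal sketch of that idea; it is not the BC Cancer pipeline, the function name is made up, and the little drug-target table is hand-typed for illustration (it was not pulled from DrugBank), as is the placeholder gene `GENE_X`.

```python
def candidate_drugs(altered_genes, drug_targets):
    """Map each drug to the altered genes it targets, skipping drugs with no overlap."""
    hits = {}
    for drug, targets in drug_targets.items():
        overlap = sorted(altered_genes & targets)
        if overlap:
            hits[drug] = overlap
    return hits


# Hand-typed toy table for illustration only, not a DrugBank export.
drug_targets = {
    "sunitinib": {"RET", "KIT", "PDGFRA"},
    "erlotinib": {"EGFR"},
}
# Genes flagged as amplified or over-expressed; GENE_X is a placeholder.
altered_genes = {"RET", "GENE_X"}

hits = candidate_drugs(altered_genes, drug_targets)
print(hits)
```

A list like this is only a starting point for discussion, since it says nothing about whether the altered gene is actually driving the tumour.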

Back to patient, the patient was using EGFR targeting drug. Why weren’t they responsive? Turns out that PTEN and RB1 are lost in this patient… (see literature… didn’t catch paper).

Pathway diagram made by Yvonne Li. Shows where mutations occur in pathways, gains and losses of expression are shown as well. Notice lots of expression from RET, and no expression from PTEN. PTEN regulates (negative) the RET pathway. Also increases of Mek and Ras. Suggests that in this tumour, activation of RET could be driving things.

Thus, came up with a short list of drugs. Favorite was Sunitinib. It’s fairly non-specific and is used for renal cell carcinoma. It is currently in clinical trials, being tested for other cancers. RET is implicated in some of those diseases (MEN2A, MEN2B and thyroid cancers). The RET sequence in this patient was not likely to be mutated.

CAT scans: response to Sunitinib and Erlotinib. When on the EGFR-targeting drug, the nodule grew. On Sunitinib, the cancer retreated!

Lots of unanswered questions: Is RET really driving this tumour? Is the drug really acting on RET? Is PTEN/RB1 loss responsible for erlotinib resistance in this tumour?

We don’t think we know everything, but can we use genome analysis to suggest treatment: YES!

First question: how did this work with ethics boards? How did they let you pass that test? Answer: this is not a path to treatment, it is a path towards making a suggestion. In some cancers there is something called Hair Analysis. It can be considered or ignored. Same thing here: we didn’t administer… we just proposed a treatment.

>Keynote Speaker: Rick Myers, Hudson-Alpha Institute – “Global Analysis of Transcriptional Control in Human Cells”

>Talking about gene regulation – it has been well studied for a long time, but only recently on a genomic scale. The field still wants comprehensive, accurate, unbiased, quantitative measurements (DNA methylation, DNA binding protein, mRNA) and they want it cheap, fast and easy to get.

Next gen has revolutionized the field: ChIP-Seq, mRNA-Seq and Methyl-Seq are just three of them. Also need to integrate them with genome-wide genetic analysis.

Many versions of each of those technologies.

RNA-Seq: 20M reads give 40 reads per 1kb-long mRNA present at as low as 1-2 mRNA per cell. Thus, 2-4 lanes are needed for deep transcriptome measurement. PET + long reads is excellent for phasing and junctions.
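[Aside: a back-of-envelope check of that depth figure. This is my own sketch, not something shown in the talk. It assumes reads are sampled in proportion to transcript copies times length, and it uses rough round numbers that I am supplying myself: about 300,000 mRNA molecules per cell with a mean length of about 2kb. The function name expected_reads is made up for the example.]

```python
def expected_reads(total_reads, copies_per_cell, transcript_len_bp,
                   mrnas_per_cell=300_000, mean_mrna_len_bp=2_000):
    """Expected read count for one transcript.

    Assumes reads are drawn in proportion to each transcript's share of the
    total mRNA bases in the cell. The default cell-level numbers are rough
    assumptions, not measurements from the talk.
    """
    total_mrna_bases = mrnas_per_cell * mean_mrna_len_bp
    share = copies_per_cell * transcript_len_bp / total_mrna_bases
    return total_reads * share


# A 1kb transcript at 1 and 2 copies per cell, with 20M reads.
for copies in (1, 2):
    print(copies, round(expected_reads(20_000_000, copies, 1_000), 1))
# prints:
# 1 33.3
# 2 66.7
```

Under those assumptions a rare 1kb transcript lands in the tens of reads, which is the same ballpark as the 40 reads quoted, and the count scales linearly with the number of reads sequenced.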

ChIP-Seq: transcription factors and histones, but it should also be usable for any DNA binding protein. (Explanation of how ChIP-Seq works.) Using no-antibody control generally gives you no background [?] ChIP without control gets you into trouble.

Methylation: Methyl-seq. Cutting at unmethylated sites, then ligate to adaptors and fragment. Size select and run. (Many examples of how it works.)

Studying human embryonic stem cells. (Cell lines are old and very different… hopefully there will be new ones available soon.) Using it for gene expression versus methylation status: when you cluster by gene expression, they cluster by pathways. The DNA methylation patterns did not correlate well, clustering more along the lines of individual cell lines than pathways. Thus, they believe it’s not controlling the pathways, but that could be an artifact of the cell lines.

26,956 methylation sites. Many of them (7,572) are in non CpG regions.

Another study: Studying Cortisol. Steroid hormone made by adrenal gland. Controls 2/3rds of all biology, helps restore homeostasis and affects a LOT of pathways: blood pressure, blood sugar, suppress immune system, etc. Fluctuates throughout the day. Pharma is very interested in this.
Levels are also tied to mood, etc.

Glucocorticoid receptor binds hormone in cytoplasm, translocates to nucleus. Activates and represses transcription of thousands of genes.

ChIP-Seq in A549: GR (-hormone): 579 peaks. GR (+ hormone): 3,608 peaks. Low levels of endogenous cortisol in the cell probably account for the background. (Of the peaks, ~60% are repressive, ~40% are inducing.) When investigating the motifs, the top 500 hits really change the binding site motif! No longer as set as originally thought – and led to discovery of new genes controlled by GRE. Also showed that there’s co-occupancy with AP1.

[Method for expression quantification: Use windows over exons.]
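[Aside: the talk didn’t go into detail, so here is only a minimal sketch of what counting reads in windows over exons could look like. It assumes each read is reduced to its start coordinate on a single chromosome and that each exon is a half-open interval (start, end). The function name exon_window_counts and the toy coordinates are made up for the example.]

```python
from bisect import bisect_left


def exon_window_counts(read_starts, exons):
    """Count reads whose start position falls inside each exon window.

    read_starts: iterable of integer read start coordinates.
    exons: list of (start, end) half-open intervals on the same chromosome.
    """
    starts = sorted(read_starts)
    return [bisect_left(starts, end) - bisect_left(starts, start)
            for start, end in exons]


# Toy data: two exons of one hypothetical gene and a handful of read starts.
exons = [(100, 300), (500, 1000)]
read_starts = [90, 100, 150, 299, 300, 520, 999, 1000]

counts = exon_window_counts(read_starts, exons)
print(counts, sum(counts))
# prints [3, 2] 5
```

Summing the per-exon counts gives a simple gene-level number; any normalization by exon length or sequencing depth would be a further step that the notes don’t cover.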

Finally: a few more little stories. Mono-allelic transcription factor binding. Turns out to occur frequently, where only one allele is bound in ChIP, and the other is not binding at all. (in the shown case, turns out the SNP causes a methylation site, which changes binding.) Same type of event also happens to methylation sites.

Still has time, so just raises the point of Copy Number Variation. Interpretation is very important, and can be skewed by CNVs. Cell lines are particularly bad for this. If you don’t model this, it will be a significant problem. Just on the verge of incorporating this.

They are going to 40-80M reads for RNA-Seq. Their version of RNA-Seq is good, and doesn’t give background. The deeper you go, the more you learn. Not so much with ChIP-Seq, where you saturate sooner.