1. Somatic mutations in protein-coding genes, including indels.
2. Would also like to find: non-coding mutations, miRNAs and lincRNAs.
3. Would like to learn about germline variations.
4. Differential transcription and splicing.
8. Big problem: integrate all of this data and make sense of it.
Paradigm for years: exon focus for a large collection of samples. Example: EGFR mutations in lung cancer. A large number of patients (some sample) had EGFR mutations. Further studies carry on this legacy in lung cancer using new technology. However, when you look at pathways, you find that the pathways are more important than individual genes.
Description of “The Cancer Genome Atlas”
Initial lists of genes mutated in cancer. Mutations were found, many of which were new. (TCGA Research Network, Nature, 2008)
Treatment-related hypermutation. Another example of TCGA’s work: glioblastoma. Although they didn’t want treated samples, in the end they took a look and saw that treated samples have interesting changes in methylation sites when MMR genes and MGMT were mutated. If you know the status of the patient’s genome, you can better select the drug (e.g., not use an alkylation-based drug).
Pathways analysis can be done… looking for interesting accumulations of mutations. Network view of the Genome… (just looks like a mess, but a complex problem we need to work on.)
What are we missing? What are we missing by focusing on exons? There should be mutations in cancer cells that are outside exons.
Revisit the first slide: now we do “Everything” from the sample of patients, not just the list given earlier.
(Discussion of AML cancer example.) (Ley et al., Nature 2008)
Found 8 heterozygous somatic mutations, 2 somatic insertion mutations. Are they cause or effect?
The verdict is not yet in. Ultimately, functional experiments will be required.
There are things we’re not doing with the technology: digital gene expression counts. Can PCR-amplify a gene of interest from the tumour, sequence it, and do a count: how many cells have the genotype of interest?
Did the same thing for several genes, and generally got a ratio around 50%.
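A minimal sketch of the counting idea, not the speaker's actual pipeline: given read counts for the variant and reference alleles at one site, compute the variant allele fraction. Under the assumptions of a diploid locus, a heterozygous mutation and no normal-cell contamination (none of which are stated in the talk), a ratio near 50% would be consistent with nearly every tumour cell carrying the mutation. The function names and the read counts are hypothetical.

```python
def variant_allele_fraction(variant_reads, reference_reads):
    """Fraction of reads at a site that carry the variant allele."""
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return variant_reads / total


def mutant_cell_fraction(vaf, copies_per_cell=2, mutant_copies=1):
    """Estimate the fraction of cells carrying the mutation.

    Assumes every cell has `copies_per_cell` copies of the locus and that
    mutated cells carry `mutant_copies` mutant copies (heterozygous by
    default). Assumes a pure tumour sample. Capped at 1.0.
    """
    return min(1.0, vaf * copies_per_cell / mutant_copies)


# Hypothetical counts from deep sequencing of one PCR amplicon.
vaf = variant_allele_fraction(variant_reads=480, reference_reads=520)
cells = mutant_cell_fraction(vaf)
print(f"VAF = {vaf:.2f}, estimated mutant cell fraction = {cells:.2f}")
```

Copy-number changes or normal-cell contamination would shift the ratio, so the estimate only holds under the stated assumptions.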
Started looking at GBM1. 3,520,407 tumour variants passing SNP filter. Broken down into: coding NS/splice sites, coding silent, conserved regions, regulatory regions including miRNA targets, non-repetitive regions, everything else (~15,000). Many of the first class were validated.
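The breakdown above can be sketched as a first-match-wins classification. This is one plausible reading of the category order in the notes, not the group's actual method; the annotation labels and the example variants are hypothetical.

```python
from collections import Counter

# Assumed priority order, following the order of the categories in the notes.
TIER_ORDER = [
    "nonsynonymous_or_splice",
    "coding_silent",
    "conserved",
    "regulatory",       # includes miRNA targets
    "non_repetitive",
]


def assign_tier(annotations):
    """Return the highest-priority category present in a variant's annotations."""
    for label in TIER_ORDER:
        if label in annotations:
            return label
    return "everything_else"


def tier_counts(variants):
    """Count variants per category; `variants` is an iterable of annotation sets."""
    return Counter(assign_tier(a) for a in variants)


# Hypothetical annotated variants.
example = [
    {"nonsynonymous_or_splice", "conserved"},
    {"conserved", "non_repetitive"},
    {"non_repetitive"},
    set(),
]
print(tier_counts(example))
```

Because the first match wins, a variant that is both coding and conserved is counted once, in the higher-priority class.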
CNV analysis also done. Add coverage to sequence variants, and the information becomes more interesting. Can then use read pairs to find breakpoints/deletions/insertions.
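A rough sketch of how coverage can feed a copy-number estimate, assuming a simple log2 ratio of library-size-normalised tumour depth to normal depth per window. The talk does not say which method was used; the function names, thresholds and pseudocount are illustrative assumptions.

```python
import math


def log2_copy_ratio(tumour_depth, normal_depth, tumour_total, normal_total,
                    pseudocount=0.5):
    """log2 ratio of normalised tumour depth to normal depth in one window.

    Depths are read counts in the window; totals are the read counts of the
    whole library, used to correct for different sequencing depth.
    The pseudocount avoids division by zero in empty windows.
    """
    tumour = (tumour_depth + pseudocount) / tumour_total
    normal = (normal_depth + pseudocount) / normal_total
    return math.log2(tumour / normal)


def call_window(log2_ratio, gain=0.3, loss=-0.3):
    """Label a window using arbitrary, illustrative thresholds."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "neutral"


# Hypothetical window: twice as many tumour reads as normal reads.
ratio = log2_copy_ratio(200, 100, tumour_total=1_000_000, normal_total=1_000_000)
print(call_window(ratio))
```

Read-pair evidence for breakpoints would be a separate step; this only covers the depth signal.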
What’s next for cancer genomics? More AML (Doing more structural variations, non-coding information, more genomes), more WGS for other tumours, and more lung cancer, neuroblastoma… etc.
“If the goal is to understand the pathogenesis of cancer, there will never be a substitute for understanding the sequence of the entire cancer genome” – Renato Dulbecco, 1986
Need ~25X coverage of WGS tumour and normal – also transcriptome and other data. Fortunately, costs are dropping rapidly.
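To get a feel for what ~25X means, a back-of-the-envelope calculation using mean coverage = reads × read length / genome size. The genome size (about 3.1 Gb) and the 100 bp read length are assumptions for illustration, not figures from the talk.

```python
def mean_coverage(num_reads, read_length, genome_size):
    """Mean depth of coverage, ignoring unmapped and duplicate reads."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, genome_size, read_length):
    """Smallest number of reads giving at least the target mean coverage."""
    total_bases = target_coverage * genome_size
    return -(-total_bases // read_length)  # integer ceiling division


GENOME_SIZE = 3_100_000_000  # assumed human genome size in bp
READ_LENGTH = 100            # assumed read length in bp

per_sample = reads_needed(25, GENOME_SIZE, READ_LENGTH)
print(f"reads per sample: {per_sample:,}")
print(f"tumour + normal:  {2 * per_sample:,}")
```

Since both tumour and normal are sequenced, the total is double the per-sample figure, before adding transcriptome or other data.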