Complete Genomics CEO:
– sequence only human genomes – 1 Million genomes in the next 5 years
– build out tools to gain a good understanding of the human genome
– done 50 genomes last year
– Recent Science publication
– expect to do 500 genomes/month
Lots of Customers.
– Deep projects
– don’t waste pixels,
– use ligases to read
– very high quality reads – low cost reagents
– provide all bioinformatics to customers
– don’t sell technology, just results.
– just return all the processed calls (snps, snv, sv, etc)
– more efficient to outsource the “engineering” for groups who just want to do biology
– fedex sample, get back results.
– high throughput “on demand” sequencing
– 10 centres around the world
– Sequence 1 Million genomes to “break the back” of the research problem
– they do the bioinformatics
– first wave: understand functional genomics
– second wave: pharmaceutical – patient stratification
– third wave: personal genomics – use that for treatment
Focus on research community
Two customers to present results:
Jared Roach, Senior Research Scientist, Institute for Systems Biology (Rare Genetic Disease Study)
– studied coverage in four genomes
– 85-92% of genome
– 96% coverage in at least one individual
– Excellent coverage in unique regions.
– within 25bp, and some places down to 10bp
– identified 125 breakpoints
– 90/125 occur at hotspots
– can reconstruct breakpoints in the family
Since they have twins, they can do some nice tests
– infer error rate: 1×10^-5
– excluded regions with compression blocks (error goes up to 1.1×10^-5)
– Homozygous only: 8.0×10^-6 (greater than 90% of genome)
– Heterozygous only: 1.7×10^-4
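The notes do not record how the twin comparison was actually done. As a minimal sketch of one way such an estimate could be formed: identical twins should share essentially the same genome, so a site where their calls disagree implies an error in one of the two. If errors are independent and rare, the per-genome error rate is roughly half the discordance rate. The function name and the counts below are hypothetical, chosen only to land on the order of magnitude quoted above.

```python
def per_genome_error_rate(discordant_sites: int, compared_sites: int) -> float:
    """Estimate a per-base error rate from twin-vs-twin discordance.

    Assumption (not from the talk): errors in the two genomes are
    independent and rare, so discordance is about 2 * error_rate.
    """
    discordance = discordant_sites / compared_sites
    return discordance / 2


# Hypothetical counts, for illustration only.
rate = per_genome_error_rate(discordant_sites=40_000, compared_sites=2_000_000_000)
print(f"estimated error rate: {rate:.1e}")
```

Restricting `compared_sites` to homozygous or heterozygous positions would give the kind of split reported in the bullets above.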
[Discussion of genes found – no names, so there’s no point in taking notes. They claim they get results that make sense.]
[Time’s up – on to next speaker.]
Zemin Zhang, Senior Scientist, Genentech/Roche (Lung Cancer Study)
Cancer and Mutations
[Skipping overview of what cancer is…. I think that’s been well covered elsewhere.]
– lung cancer is the leading cause of cancer-related mortality worldwide…
– significant unmet need for treatment
Start with one patient
– non-small-cell lung adenocarcinoma.
– 25 cigarettes/day
– tumour: 95% cancer cells
Genomic characterization on Affy and Agilent arrays
– lots of CNV and LOH
– circos diagrams!
– 131Gb mapped sequence in normal, 171Gb mapped sequence in tumour
– 46x coverage normal, 60x tumour
[Skipping some info on coverage…]
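A quick sanity check on the coverage figures, assuming both mapped-sequence numbers are gigabases and that fold coverage is simply mapped bases divided by the covered reference span. The function name is illustrative, not from the talk.

```python
def implied_reference_span_gb(mapped_gb: float, fold_coverage: float) -> float:
    """Mapped bases (Gb) divided by mean fold coverage gives the implied span (Gb)."""
    return mapped_gb / fold_coverage


normal = implied_reference_span_gb(131, 46)
tumour = implied_reference_span_gb(171, 60)
print(f"normal: {normal:.2f} Gb, tumour: {tumour:.2f} Gb")
# prints "normal: 2.85 Gb, tumour: 2.85 Gb"
```

Both samples imply roughly the same span, a little under 3 Gb, so the mapped-sequence and coverage figures are consistent with each other.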
KRAS G12C mutation
What about the rest of the 2.7M SNVs?
– SomaticScore predicts SNV validation rates
– 67% are somatic by prediction
– more than 50,000 somatic SNVs are projected
Selection and bias observed in the lung cancer genome by comparing somatic and germline mutations
GC to TA changes: Tobacco-associated DNA damage signature
Protection against mutations in coding and promoter regions.
– look at coding regions only – mutations are dramatically less than expected – there is probably strong selection pressure and/or repair
Fewer mutations in expressed genes.
– expressed genes have fewer mutations, with even fewer on the transcribed strand
– non-expressed genes have mutation rate similar to non-genic regions
Positive selection in subsets of genes
– KRAS is the only previously known mutation
– Genes also mutated in other lung cancers…
Finding structural variation by paired end reads
– median dist between pairs 300bp.
– distance almost never goes beyond 1kb.
Look for clusters of sequence reads where one arm is on a different chromosome or more than 1kb away
– small number of reads
– 23 inter-chr
– 56 intra-chr
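The discordant-pair idea above can be sketched as follows. This is a simplified illustration, not the speaker's pipeline: the 1 kb cutoff comes from the notes, while the bin width, the minimum support, the `ReadPair` type, and the example coordinates are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


MAX_INSERT = 1000   # from the notes: pair distance almost never exceeds 1 kb
BIN_SIZE = 1000     # assumption: bin width used to group nearby pairs
MIN_SUPPORT = 3     # assumption: minimum pairs needed to call a cluster


def is_discordant(pair: ReadPair) -> bool:
    """A pair is discordant if its arms map to different chromosomes or > 1 kb apart."""
    if pair.chrom1 != pair.chrom2:
        return True
    return abs(pair.pos2 - pair.pos1) > MAX_INSERT


def cluster_discordant(pairs):
    """Group discordant pairs by the coarse bins of both arms; keep supported groups."""
    clusters = defaultdict(list)
    for pair in pairs:
        if is_discordant(pair):
            key = (pair.chrom1, pair.pos1 // BIN_SIZE,
                   pair.chrom2, pair.pos2 // BIN_SIZE)
            clusters[key].append(pair)
    return {k: v for k, v in clusters.items() if len(v) >= MIN_SUPPORT}


# Hypothetical coordinates, for illustration only.
pairs = [
    ReadPair("chr4", 10_100, "chr9", 50_200),
    ReadPair("chr4", 10_250, "chr9", 50_350),
    ReadPair("chr4", 10_400, "chr9", 50_500),
    ReadPair("chr1", 5_000, "chr1", 5_300),    # concordant: 300 bp apart
    ReadPair("chr2", 7_000, "chr2", 90_000),   # discordant but unsupported
]
candidates = cluster_discordant(pairs)
print(candidates.keys())
```

Fixed-width binning can split a true cluster that straddles a bin boundary; a real caller would more likely merge overlapping windows, but the filtering logic is the same.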
– use FISH + PCR
– validate results
– 43/65 test cases are found to be somatic and have nucleotide level breakpoint junctions
– chr 4 to 9 translocation
– 50% of cells showed this fusion (FISH)
Possible scenario of Chr15 inversion and deletion investigated.
[got distracted, missed point.. oops.]
– very nice Circos diagram
– > 1 mutation for every 3 cigarettes
In the process of doing more work with Complete Genomics