Enabling Large Scale Exome & Transcriptome Studies through Science as a Service (ScaaS)
Justin J. Johnson, EdgeBio
[EDIT: You can find his slides here – http://www.slideshare.net/justinhjohnson/enabling-large-scale-sequencing-studies-through-science-as-a-service ]
[Background on EdgeBio – I’ll let you visit their site…]
How do you become proficient without a large sequencing center behind you?
The landscape has changed significantly as technologies change, and it continually adjusts. They've worked with the authors of open-source tools to help guide them in improving their products.
[Plug for SeqAnswers, and a slide showing the list of aligners available… it’s long.]
Real-world examples: a 1500+ sample epigenetic study. Challenges include automation, tracking, sample prep, QC, and delivery. Standards are critically important.
Transcriptome: RNA-seq projects include collaborations with Scripps.
As applications become less of an issue, the biology becomes more and more important.
You frequently have smaller and smaller samples as the application becomes more important; hence, amplification is often used. An example was shown with unamplified vs. amplified: you do get a difference, but the problem is worse than that.
They found less correlation between the two platforms than there was when the biology was disrupted. Hence, what you do in your sample prep is just as important as any other factor.
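The comparison being described is essentially correlating expression estimates across conditions. A minimal sketch of that kind of check, with invented numbers (none of these values or names come from the talk):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up expression values for five genes, purely for illustration:
platform_a = [10.0, 5.0, 8.0, 2.0, 7.0]   # same prep, platform A
platform_b = [11.0, 4.5, 8.5, 2.5, 6.0]   # same prep, platform B
amplified  = [14.0, 2.0, 12.0, 0.5, 3.0]  # same platform, amplified prep

print("cross-platform r:", pearson(platform_a, platform_b))
print("cross-prep r:    ", pearson(platform_a, amplified))
```

The point of the exercise is that you compare both axes (platform vs. platform, prep vs. prep) on the same samples before trusting either.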
The analysis platform can also have a huge impact on your results. You really have to think through the entire project to get the right analysis tools AND the right sample prep.
Solution: there probably isn't one. You have to do a lot of work, use multiple pipelines, and do proper sample prep.
Exome & targeted resequencing: there are many different methods to do this. Ultimately, it comes down to the variations you're looking for and what pipeline you apply. Pick the proper tools and prep methods.
SNP calling, btw, is not a solved problem.
Even coverage you require is dependent on the biology.
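One way to make the coverage point concrete, as a back-of-the-envelope sketch (this rule of thumb and the function are my illustration, not from the talk): to see a variant present at allele fraction f with at least k supporting reads, mean depth must scale roughly as k/f, ignoring sampling variance.

```python
def min_mean_depth(k_reads, allele_fraction):
    """Very rough minimum mean depth to expect k_reads supporting an
    allele present at the given fraction (ignores sampling variance)."""
    return k_reads / allele_fraction

# The required depth differs by an order of magnitude with the biology:
print(min_mean_depth(10, 0.5))    # germline heterozygote -> 20.0x
print(min_mean_depth(10, 0.05))   # rare subclonal variant -> 200.0x
```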
An example was done with the Venter genome, run through different pipelines… each one comes out unique, and wow, poor overlap. (No software names given.) This was done with optimized settings after years of experience with these tools; this isn't just default settings.
Each software works best on different biology data sets. Need to pick appropriately.
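The pipeline-overlap comparison described above boils down to set operations on variant calls. A hypothetical sketch, with invented (chrom, pos, alt) call sets standing in for real pipeline output:

```python
# Invented call sets for three unnamed pipelines -- not the talk's data.
calls = {
    "pipeline_a": {("chr1", 100, "A"), ("chr1", 250, "T"),
                   ("chr2", 30, "G"), ("chr2", 500, "T")},
    "pipeline_b": {("chr1", 100, "A"), ("chr2", 30, "G"),
                   ("chr3", 7, "C"), ("chr4", 1, "A")},
    "pipeline_c": {("chr1", 100, "A"), ("chr1", 250, "T"),
                   ("chr3", 7, "C"), ("chr5", 9, "G")},
}

consensus = set.intersection(*calls.values())   # called by every pipeline
union = set.union(*calls.values())              # called by at least one
print(f"consensus calls: {len(consensus)} of {len(union)} total")

# Calls unique to a single pipeline -- the "poor overlap" problem.
for name, callset in calls.items():
    others = set.union(*(s for k, s in calls.items() if k != name))
    print(name, "unique calls:", len(callset - others))
```

Running several pipelines and inspecting the consensus vs. the pipeline-unique calls is one practical way to act on the "use multiple pipelines" advice.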
On to Ion Torrent.
They have 2 machines; longer, accurate reads in 2.5 hours. Some metagenomics, microbial resequencing, etc. Reads will be 200 bp soon, which will be a game changer.
DH10B done on the Ion Torrent. Consistent high quality and good reads. [Can’t copy the data from the slide, but results look very good.]
Overall, quality is high right out of the gate.
An example: the E. coli strain from the European outbreak. They did the whole thing in under 2 hours using CLC Bio software and got results similar to BGI's.
As the talk was very fast: if you have more questions, find Justin at lunch or during breaks. [Yes, it was VERY fast!]