AGBT talk: David Jaffe, Broad Institute of MIT and Harvard

Title: “High-Quality Draft Assemblies of a Dozen Vertebrate Genomes from Massively Parallel Sequence Data”

[EDIT (2011-02-10): Note the speaker has provided some clarifications to these notes in the comments below.  I have struck out comments that have been clarified, and made references to the author’s comments where warranted.]

[I walked in at 2:00 as the 454 seminar was wrapping up, instead of 2:30, so I’ve been waiting a while for this talk… the suspense has been building!]

In 2000, vertebrate sequencing cost ~$2,000,000 per genome.  29 were done.

By 2010, BGI used SOAPdenovo to do more genomes using Illumina alone.  Coverage of segmental duplications, etc., was raised as an issue.

Goal: “Convince you that it is possible”

evidence: 1.  control genomes (human + mouse) 2. new vertebrate genomes.

How do we get there?

  • ALLPATHS-LG (lg is for large genomes)
  • new algorithms need to evolve with the new lab techniques
  • approach: sequence and assemble blind, then compare to reference when available.
  • Goal: each new genome should not be a research project.

Lab recipe: PET 200bp, 2,000bp, 6,000bp and 40kbp (which required development of new methods – poster by Louise Williams).

Algorithm philosophy: Discussion of small/large k values for assembly (use large k for specificity, small k for sensitivity).  Goal with ALLPATHS is to use all possible biological information – don’t lose reads that are relevant.

Challenge 1: Reads are inaccurate.  Remove errors, but don’t remove SNPs.  (Algorithm discussed: use k-mers to identify read stacks, then look outside the k-mer to try to correct poor-quality non-matching positions.)
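[A rough sketch of the general idea, not the ALLPATHS-LG algorithm itself: count k-mers across all reads, then flag read positions that are not covered by any well-supported k-mer.  The function names and the `min_count` cutoff are my own assumptions for illustration.]

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def weak_positions(read, counts, k, min_count=2):
    """Return positions not covered by any k-mer seen at least min_count times.

    min_count is a hypothetical cutoff; real tools derive it from the data.
    """
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] >= min_count:
            for j in range(i, i + k):
                covered[j] = True
    return [i for i, ok in enumerate(covered) if not ok]

# Toy data: three identical reads plus one read with a mismatch at position 4.
reads = ["ACGTACGTGA", "ACGTACGTGA", "ACGTACGTGA", "ACGTTCGTGA"]
counts = kmer_counts(reads, k=4)
print(weak_positions(reads[3], counts, k=4))  # -> [4]
```

A real SNP would show up in many reads, so its k-mers would pass the count cutoff and not be flagged, which is roughly why errors can be removed without removing SNPs.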

[I missed challenge 2… I thought that was all challenge 1.]

Challenge 3: Coverage is uneven.

Solutions: Use high coverage!  Improve lab process, improve algorithms. (See Gnirke talk for better methods for GC-rich areas. [Unfortunately, I didn’t go to that session.])

[Again, I missed challenge 4, which I couldn’t differentiate from part of challenge 3.]

Challenge 5: You don’t know the exact answer for what the genome says.

Challenge 6: Computations are large: billions of reads, and you need all of your reads in memory at once for assembly.

Solution: Buy more RAM. [Ok, really?  That’s not a challenge; if throwing money at it solves the problem that easily, it’s just a funding issue – EDIT: See comments below.]

On to the experiment: (control)

Sequenced Ms.B6 mouse (finished genome available) and NA12878 cell line (sequenced by 1000 Genomes).  Three-way comparison: capillary, ALLPATHS-LG, SOAPdenovo. (No other one is available. [Really?  I thought Trans-ABySS and Velvet were also available and able to do this… maybe not, I’m not an expert on assembly.])

Criteria: Continuity, Completeness, Accuracy.

Comparison summary: Capillary and ALLPATHS-LG do OK (capillary is slightly better) and SOAPdenovo is always trailing the pack.

[A whole LOT of slides that look like identical bar graphs – you’ll have to make do with the summary above.]

Doing de-novo assemblies for 15 genomes (7 fish, 8 mammals).  [ooo coelacanth genome!]

Excluding genomes [did he mean scaffolds?] in which 40kb data sets are not good (“good” defined as physical coverage of 20x or greater).  This removes a lot of the variability in the quality.
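[For anyone unfamiliar with physical coverage: as I understand it, it is the number of read pairs times the span of each pair, divided by the genome size.  The sketch below assumes that definition; the function names and the example numbers are mine, not from the talk.]

```python
def physical_coverage(num_pairs, span_bp, genome_size_bp):
    """Average number of paired-end fragments spanning each genome position."""
    return num_pairs * span_bp / genome_size_bp

def library_is_good(num_pairs, span_bp, genome_size_bp, cutoff=20.0):
    """Apply the 20x physical coverage cutoff mentioned in the talk."""
    return physical_coverage(num_pairs, span_bp, genome_size_bp) >= cutoff

# Hypothetical numbers: 1.5 million 40 kb pairs on a 3 Gb genome.
print(physical_coverage(1_500_000, 40_000, 3_000_000_000))  # -> 20.0
print(library_is_good(1_000_000, 40_000, 3_000_000_000))    # -> False
```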

“Fish are hard”

Assembly discovers sequences you can’t get any other way.

Even if you do alignment-based stuff, there is great value in assembly as a complementary approach.

Allpaths-lg future:  lots of improvement, error can be driven towards zero.

Yes, you can do whole genome assembly of vertebrates.

[I don’t think I learned much new in this talk – but I don’t really see much difference between the assemblers – EDIT:See comments below.  My take home message is that ALLPATHS-LG can be used on larger data sets, but then again, they claimed that they need massive amounts of RAM…  I’ll just suggest people check out the software and decide for themselves.]

AGBT talk: Sophien Kamoun – The Sainsbury Laboratory

Title: Genome Evolution in the Irish Potato Famine Pathogen Lineage

We face a crisis in food production and food prices.  (New York Times article from Feb 4th) If you can’t eat, you don’t worry about cancer.

World population is expected to peak at 9 billion, and we’re not producing enough food.  One aspect of that is crop diseases; overcoming them could allow millions more to be fed.

One of the most important is the oomycete Phytophthora (Greek: plant destroyer), which kills dicots – tens of billions of dollars’ worth annually.  World’s biggest potato producer is China.

P. infestans suppresses or triggers plant immunity.  Able to invade cells, and forms structures between cells (hyphae).

Some plants carry resistance and have an apoptosis-like response.  Resistant plants also suppress the immunity suppressors.  [strange phrasing is all mine.]

Genome sequence of Phytophthora infestans is complete.  Published last year. (Cover of Nature – a rotten potato!)

Compare P. infestans genome to others of the family – very large expansion. Number of genes is about the same, but 240Mbp vs. 65-95Mbp.  Much of it is repeat driven.

Effectors (immunity suppressors) typically occur in expanded repeat-rich and gene-poor loci. (Examples: RXLR, AVR4)

Most of the genes in the genome are clustered within 1kb of each other, except for a smattering that occur in the repeat-rich regions.  This is an unusual distribution.

Core orthologs are all in the clustered regular regions; effector genes all seem to be in the repeat-rich regions – again, an unusual distribution.

Some discussion of how the parasite evolved along with “host jumping”.  Resequenced several isolates of 4 related strains (all of which have the large genome expansion), and compared them.

Four times as many genes are missing in repeat regions as in non-repeat regions.  Repeat regions are more plastic.  Looking at dN/dS, there are also different selection pressures between the two.

Repeat regions are also highly enriched in genes induced during colonization of tomato and potato (Raffaele et al, Science 2010)


  • Core genome- high density region/low repeat content
  • “plastic” region – low gene density/high repeat content
  • high rates of gene turnover and positive selection in the plastic genome
  • “niches” in the genome for rapid effector evolution.
  • Unexpected rapidly-evolving “plastic” genome families – cell wall hydrolases, histone and rRNA methyltransferases. [wasn’t discussed in the talk, as far as I can tell, but interesting nonetheless.]


Using Genomics to improve disease resistance.  Emergence of P. infestans “blue 13” clone, which is dominating UK isolates but was barely present 10 years ago.

Core effectors as targets for resistance.

Synthetic R genes with expanded effector recognition (modify potato genes to improve resistance).  Expansion: an R3a mutant that recognizes both forms of AVR3a (KI and EM) is expected to be effective against all P. infestans isolates.  They did create this in the lab, with some success: some clones were able to trigger a cell death response.


Single residue mutations expand effector recognition.

Non-GM solution through genome editing?

The knowledge of pathogen effectors and comparative genomes is essential.

[Again, a neat talk on a topic I knew nothing about.  Well delivered and very clearly explained.]

AGBT talk: James Giovannoni, Cornell University

Title: Utilization of Next Generation Sequencing for Creation and Exploitation of the Tomato Genome.

[He’s creating tomato genomes?  Odd phrasing, but we’ll see how this works out.]

Think of this as a side dish to the genome technology you’ve been hearing. [nice…]

Tomato has a lot of uses: eating, ketchup, de-skunking.  Most important source of vitamin A and C, simply because of the amount of it that we eat.

Looking for a reference genome – a great biological system for studying fruit ripening.  Synteny of tomato and potato is high.  So, high quality reference for one of these will be a big help.  Related to pepper & eggplant as well.

Also a wide variety of tomato species, in a wide variety of environments.  (Picture shown of the wild progenitor of the tomato, but it’s really hard to see, unfortunately.) Originated in South America, but domestication happened in Europe after Cortez brought it back. Modern breeding of tomatoes has all descended from European stock, so there is a LOT of diversity in the Americas that has not yet been tapped.

International consortium working on this. Sequencing efforts started in 2004.  Originally started with a BAC approach.  1500 BACs were sequenced by the consortium.  Things have changed fast, and there are now 454 (31x), Sanger (3.6x), Illumina (82x) and SOLiD (141x) reads.

Assembler strategy covered – (sequencing, filtering and assembly – using all available information).

Very pretty genetic map and FISH slides. Also a summary of metrics from version 1.0 to 2.3, which is currently frozen for publication. Validation summary as well. [I’m sure all of this will be in the paper, so I’m not copying out details.]

An automated annotation pipeline, run by collaborators in Belgium, also frozen for publication.

Sequence is available:

Still working on the sequence – setting a high target.  Approx. 1/3 of gaps can be closed by in silico means (using Celera CABOG assembly).  Using 100 BACs that spanned gaps, most of the sequences match the gap. [I may have missed something]

IMAGE2 used for closure and finishing – Iterative mapping and assembly for gap elimination.  Closed 11 of 12 gaps, and was able to reduce size of 12th gap.  Very resource intensive, tho.


  • duplications common to plant genomes are found here.
  • triplication event for dicot clade
  • etc.

New carotenoid genes with novel tissue-specificities.

Neat explanation/slide of the genetic regulation of the development and maturation of fleshy fruit. Chlorophyll degradation is inversely related to non-photosynthetic pigments.

Decoding the fruit transcriptome using large-scale strand-specific transcriptome sequencing.  Looking at both strands has caused them to revise how they believe expression occurs in the fruit. (skipped over examples in the interest of time.) About 5% of genes needed to be revised when strand-specific was taken into account.

Also doing ChIP-Seq, for tomato epigenome, both TF and histone data. (plant specific transcription factors)

Only fleshy fruits ripen, histone methylation seems to be correlated with regulation of genes involved – processes are tied together. [again, I missed part of the explanation] (Manning et al, 2006, Nat Gen 38)


  • high quality assembly of tomato genome
  • continuing to refine it
  • 97% of assembly is in 91 scaffolds, linked to 12 tomato chromosomes
  • annotation of 35,000 genes.
  • evidence of whole-genome duplication events.
  • epigenetic and RNA-Seq providing novel insights into control of fruit ripening.

[Interesting talk… I knew nothing about the tomato, and little about fruit ripening beyond what you learn in undergrad… There seem to be good papers on this, which would make a nice blog entry one day.]

AGBT – page views

In case anyone was wondering what traffic to my blog looks like during AGBT season, I think this image is relatively informative.   I don’t have the ones from the last couple of years’ AGBT conferences, but it looks about the same.  Fortunately, my computer seems to be coping with the load – and my wife hasn’t emailed me to say that its fan is whining yet.  (Yes, every time you view my blog, you’re actually visiting my living room.)

Graph shows Jan 19 – Feb 4, 2011.  Y-axis is page views.

AGBT Talk: Joseph Petrosino, Baylor College of Medicine

Title: Toward Improved Bacterial and Viral Metagenomic Sequencing and Analysis Strategies in Healthy and Diseased Individuals.

[EDIT: I found this talk really hard to take notes on – many of the slides did not have easily extractable messages, despite being interesting.  Errors are likely in the content below.]

Will focus more on viruses, as Dr. Knight focused more on bacteria.

NIH Human Microbiome Project (HMP).  Genomes from 900 microbiome bacteria – has grown to 3000.  Characterize microbiome from healthy people (baseline), doing transcriptome, viral and eukaryotic microbiome. 15 disease-oriented Demonstration Projects.

Sample sites are 15/18 locations on the body, depending on gender of subject.

[skipping poo joke….   right… carrying on.]

Sample -> enrich bacteria, viruses, fungi -> extract DNA -> sequence (whichever strategy) -> community structure and other valuable info (pathways/etc).

Description of sample sites and collection techniques – you need to do a lot of standardization.  [I’m not going into details, and the speaker is covering it very briefly.]

Moving on to the bacterial community dendrograms.  Samples cluster by location of collection, and are very specific to environment (tongue different from saliva, etc.).

Many new disease relationship projects (long list…), including astronaut microbiomes.

Viral Metagenomics: detect encapsidated viruses in clinical samples to discover relationships to health and disease.  (Virus hunting.)

In healthy patients, you have small virus loads.

Upstream processing covered – much filtering done. [review of cDNA library construction]  Can require over 80 PCR cycles.

Do random primer designs sample viruses equally well?  How much depth is needed to capture viral diversity?  454 vs illumina?  (Huge Human contamination.)

Some slides comparing results [couldn’t pull out take home message fast enough]. Plateau out around 30-40% of reads of a lane. [GAII?  not sure.]

Sampling is difficult, you don’t know if you’re capturing the whole population, but what you see caps out at 30-40%.

Random primer construction- does it work? Compared 6 different strategies.  [No take home message that I heard.]

Does more sample = more viruses?  Maybe.  You don’t need huge amounts of sample.

Virus families captured by random primers: many of them.  [I’m not listing, but there’s a difference by which primers are used.]

Data section:

Viral families detected in 4 subjects.  Patterns starting to emerge. [I can’t see them, though] Both DNA and RNA viruses detected.  Hits need to be verified. Are these colonizing, or are they just “passing through”?

Phage: 48 phages in 1st pass query against database.  Phage population can give you info about the microbiome.

Virus protocol differentiates stool and nasal wash viruses.   [yes, you can tell the difference, qualitatively.]

Some Diseases:

  • Kawasaki disease
    • children’s disease, usually found in children of Asian descent. Cause is unknown.
    • [unpublished data] – seems to be a few viruses associated – still needs to be validated.
  • Elephant Herpes virus
    • all 6 calves born at Houston Zoo in the last 2 decades have died from EEHV.
    • At zoo, they named the baby elephant “Baylor” to up the ante.
    • Did the usual process to try to pull out virus
    • Able to assemble EEHV1
    • research still underway

Many other projects ongoing.  Upward trend for viral metagenomic strategies.

Many areas to improve still, including improved curation of viral db. Better measures for colonization/passing-through viruses.

AGBT talk: Rob Knight, University of Colorado at Boulder

Title: Spatially and Temporally Explicit Studies of the Human Microbiome

Sequencing is getting dramatically cheaper, as we all know.  What we can do now is dramatically different from what we could do a decade ago.

We have known since the invention of the microscope (van Leeuwenhoek in 1683) that the human body is covered in bacteria.

Why should you care about your microbes? They can have interesting effects, e.g., determine whether Tylenol is toxic to your liver (PNAS).  If you’re a fruit fly, they can determine your partner preferences (PNAS).  They can also steal genes from your food to help you digest it.

There are as many E. coli in your gut as there are people on earth.  It’s not the dominant member in the gut, though – it’s just best at growing on a petri dish.

Any two people you pick have 99.9% the same genome, but E. coli genomes can differ up to 40%.  Humans may not be unique like snow flakes, but our symbionts are!

Can start asking intelligent questions about our microbial selves.

How human are we?  In terms of cells, we’re 10% of the cells in our body, and only one percent of the DNA [if I got that right].

Most of the world is made of bacteria – animals and plants are a very small number of organisms – and 99% aren’t culturable.

How do we look at them, then?  Get samples and extract DNA -> PCR amplify (usually SSU rRNA gene) -> sequence -> BLAST against GenBank (but this is less and less useful. You now get a lot of hits on uncultured stuff, so skip this.) -> align and build trees to figure out what you can.

Problem: big trees are hard to understand and analyze.

Issue: need to interpret vast amounts of sequence/tree data.  Interpretation isn’t trivial as trees become massive.

Experiment: microbial biogeography on the keyboard?  (Are keys deserts for bacteria, different from fingertips?)  Result: we have distinct cultures, each of us, but our keyboards mirror our fingertip bacteria. (PNAS) – it was on CSI: Miami, so you know it’s true.

Darwin’s “Origin” has the first phylogenetic tree. [I did not know that]

Calculating a community distance metric. If trees are identical, distance =0.  If complete separation right from root, then distance = 1. [Very visually informative slides – discussing how we perceive the data in the metric.]
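[This sounds like an unweighted UniFrac-style distance to me, so here is a toy sketch under that assumption: the fraction of total branch length that leads to members of only one community.  The tree encoding (a list of branch length plus the leaves below that branch) and the function name are my own simplifications, not how any real tool stores trees.]

```python
def unique_fraction_distance(branches, community_a, community_b):
    """Fraction of branch length leading to only one of the two communities.

    branches: list of (length, set_of_leaf_names_below_that_branch).
    Branches leading to neither community are ignored.
    """
    unique = 0.0
    total = 0.0
    for length, leaves in branches:
        in_a = bool(leaves & community_a)
        in_b = bool(leaves & community_b)
        if in_a or in_b:
            total += length
            if in_a != in_b:
                unique += length
    return unique / total if total else 0.0

# Toy tree ((t1,t2),(t3,t4)) with unit branch lengths (hypothetical data).
branches = [
    (1.0, {"t1"}), (1.0, {"t2"}), (1.0, {"t3"}), (1.0, {"t4"}),
    (1.0, {"t1", "t2"}), (1.0, {"t3", "t4"}),
]

print(unique_fraction_distance(branches, {"t1", "t3"}, {"t1", "t3"}))  # -> 0.0
print(unique_fraction_distance(branches, {"t1", "t2"}, {"t3", "t4"}))  # -> 1.0
print(unique_fraction_distance(branches, {"t1", "t2"}, {"t2", "t3"}))  # -> 0.6
```

Identical communities share every branch (distance 0), and communities that split right at the root share none (distance 1), which matches the two extremes described on the slides.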

(Lozupone & Knight 2007, PNAS) [hope I got the name right – it’s jammed into the corner of a slide.]  Experiment looking for related-ness among a large number of samples.  They did see a significant divide between saline/non-saline.

Interesting: Extreme environments are not outliers.  However, there are outliers: they’re in the vertebrate gut!

QIIME: integrating analysis of hundreds of samples using barcodes.  Use 454 mostly, but also illumina. Use sequences to build phylogenetic trees.

[Joke about why we still call it “454”…. because that’s the temperature your money burns at when you do these experiments…. ]

[Joke section on sequencing technologies to watch out for… I can’t do it justice.]

but i digress…..

Different body habitats are very different from each other.  (2009 Science)  [I recall seeing this last year at AGBT, I think.] When on antibiotics, your communities change dramatically.  We are getting a picture of overall human microbiome variability.

You don’t need a lot of sequences per samples to see the patterns.  same pattern in 10 seq/sample as 1500seq/sample.

Have done these studies over time – over 3 months , visualized in a live 3D graph. [worth seeing, actually, very cool.]

Picture of Rodrigo Salvadore Dali painting. (It’s a pretty picture, but doesn’t tell you the whole story)

Detailed biogeography of the human face.

[Nifty visualizations of the distribution of bacteria on the face.  Obviously can’t blog that.]

Where do the bacteria come from?  (Which raises privacy issues.)  Babies who come out vaginally  all have vaginal communities,  those that are born by c-section have a very different population.

Diversity of babies’ bacterial communities increases by day, and by the end of 3 years, they resemble their mother’s bacterial communities.

Do differences in the microbiome matter?  Fat mouse experiment says yes. (Two examples – Leptin and TLR5)  With TLR5 knockout mouse, the bacteria are different and seem to make the mouse hungrier – you can “rescue” the mouse by changing the bacteria. Same applies to Burmese pythons.

Fat vs thin are Bacteroidetes vs Firmicutes [missed which one is which, tho.]

Future directions: personalized medicine in developing nations. Pilot studies in “humanized mice” measuring input microbes, diet change and BMI.  Can you develop test from gut microbes to predict effects of diet/obesity/etc?

Much of the work is in developing systems for measuring and recording environmental conditions, etc.

Earth Microbiome project coming…

Conclusion: we all have a microbiome, and anyone can do this type of work now that sequencing is so cheap  – much of the cost of experiment is now in DNA extraction.

[A neat talk, summarizing a lot of published work.  Unfortunately, I couldn’t read most of the citations.  Talk was memorable for its good visualization tools and the excellent speaker.]

AGBT talk: Praveen Cherukuri, NHGRI

Title: Massively Parallel Sequencing of Exomes and Transcriptomes in ClinSeq Participants.

ClinSeq: large-scale sequencing project of 1000 patients who have been phenotyped for clinical symptoms of coronary disease. Started in Jan 2007, participants between 45-65 years old.

Nice slide illustrating balance between: Clinical data, genome breadth and # subjects. Hard to get all 3.

Project started with Targeted Gene approach, switched to Whole Exome and Whole Transcriptome. (403 exomes and 14 transcriptomes already done.)

Data analysis and workflow slide – very similar to everyone else – and they have a nextgen variant database. [no description given here for the db, unfortunately.] ERANGE and Cufflinks used for processing reads.

Many novel variants are singleton – most do not show up in multiple data sets. [expected, I suppose, given what we see elsewhere.  Only polymorphisms (not novel) saturate quickly, by definition.]

Focus on differential allele expression: when each copy of a chromosome carries different alleles, they may be expressed differently, and that may relate to disease.

Whole exome gives you the ability to count reads and count frequency. [as you’d expect, really.]  Distribution is generally similar (looks kinda like a normal distribution); stuff on the tails is allele-specific expression.

High amount of correlation of allele frequency for both variants, but at greater than 100x, you see more variation.
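[The speaker didn’t say which statistic they use to call the tails, so this is just one common way to do it: an exact two-sided binomial test against a 50/50 expectation at a heterozygous site.  Function names and the example read counts are mine.]

```python
from math import comb

def allele_fraction(ref_reads, alt_reads):
    """Fraction of reads carrying the reference allele (None if no coverage)."""
    total = ref_reads + alt_reads
    return ref_reads / total if total else None

def binomial_two_sided_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a 50/50 allele expectation.

    With p = 0.5 every outcome k has probability comb(n, k) / 2**n, so
    outcomes "at least as unlikely" can be compared using comb() alone.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    observed = comb(n, ref_reads)
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n

# Hypothetical read counts at two heterozygous sites.
print(allele_fraction(5, 5), binomial_two_sided_p(5, 5))    # balanced: 0.5 1.0
print(allele_fraction(30, 10), binomial_two_sided_p(30, 10) < 0.01)  # skewed: 0.75 True
```

In practice you would also correct for multiple testing and for reference mapping bias; neither is handled here.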

Example gene: ERAP2, which has previously been published and known to have differential ASE.


  • refining methodologies… [I think I missed something with this point.]
  • ASE is reproducible
  • implementing integrative computational approaches on participants with both exome and transcriptome data.

AGBT talk: Kateryna Makova, Penn State University

Title: Dynamics of Mitochondrial Heteroplasmy in three families

Brief overview of Mt DNA:  37 genes, 16.5kb long.  Maternally transmitted, sperm MT is destroyed upon fertilization, and also a high mutation rate (poor repair mechanism, environmental effects.)

Heteroplasmy: presence of more than one mtDNA variant in an individual.

Mitochondrial bottleneck during oogenesis.

Makes MT DNA interesting.

More than 200 diseases are caused by mutations in mtDNA.  Can be severe; the frequency of disease/normal mtDNA in one person can determine severity.  There is no cure – so focus is on prevention of transmission.  (Nuclear transfer of maternal DNA into an enucleated oocyte of a healthy female could be done… but hasn’t been proved.)

mtDNA mutations are predisposing to features of aging. (alzheimers, parkinsons, diabetes, etc). Also possible link to autism.

mtDNA mutations are also markers for cancer – link not yet determined.

Recent studies with NGS get you further in detecting heteroplasmy, but there are outstanding disagreements. (He et al., 2010: heteroplasmy is common (from cell lines); Li et al., 2010: heteroplasmy is rare (from 131 individuals).)

question: how does heteroplasmy affect individuals, and how does it change during transmission to offspring?

Study design [not blogging this part – go read the research. (-: ]

Real challenge: how to distinguish low-frequency heteroplasmy from sequencing errors?  Tackled with lots of simulations, clonal samples and spike-ins. Result: a detection threshold of 2% or greater (conservatively; probably closer to 1%).
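[To make the threshold idea concrete, a minimal sketch: compute the minor allele frequency at a site from per-base read counts and call the site heteroplasmic only if it clears the 2% cutoff.  The `min_depth` guard and the function names are my own assumptions; the real study calibrated its threshold with simulations and spike-ins, which this does not reproduce.]

```python
def minor_allele_frequency(base_counts):
    """Frequency of the second most common base at a site."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    ordered = sorted(base_counts.values(), reverse=True)
    return ordered[1] / total if len(ordered) > 1 else 0.0

def is_heteroplasmic(base_counts, threshold=0.02, min_depth=1000):
    """Call a site heteroplasmic if depth and minor allele frequency suffice.

    threshold=0.02 follows the 2% figure from the talk; min_depth is a
    hypothetical guard against calling sites with too few reads.
    """
    total = sum(base_counts.values())
    return total >= min_depth and minor_allele_frequency(base_counts) >= threshold

# Hypothetical pileups at two mtDNA positions.
print(is_heteroplasmic({"A": 9700, "G": 300}))  # 3% minor allele -> True
print(is_heteroplasmic({"A": 9950, "G": 50}))   # 0.5%, likely error -> False
```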


6 heteroplasmic sites – [nice map of chrMT, but um… yeah, not bloggable without a camera.]

“Static heteroplasmy” never observed.  Frequency shift without mutation, somatic (shifts in minor allele frequency between tissues) and germline (passed on to child) mutations observed.

Found: One germline, 3 frequency shifts, 2 somatic mutations.

The one germline was different between two children from one mother, suggested that mutation happened early on.

Use Galaxy for this project.  History and log are available for this project – can be run on the cloud if you like. [cool.]


  • Heteroplasmic frequency shifts happen frequently,
  • analysis is reproducible,
  • objective determination of frequency thresholds,
  • and a framework exists for repeating this work.

AGBT talk: Daniel Neafsey, Broad Institute of MIT and Harvard

Title: Hybrid selection for sequencing pathogen genomes from clinical samples

Sample bottleneck for pathogen sequencing applications.  Host DNA monopolizes any sample, but culturing pathogens takes 6 weeks and can introduce bias.  Instead, use Hybrid selection.

Example: Plasmodium falciparum malaria, but generalizes to any pathogen. (Some background presented.)

  • Seq first in 2003,
  • AT Rich: 81%!
  • hard to sequence

“Agilent’s SureSelect system” is the new name of Hybrid Selection.

  • Use bait, wash away non-target DNA, then wash away the rest, unhybridize.

40-fold enrichment when done on parasite DNA.  But, starting from blood spots, you don’t always have a lot.  However, no penalty for Whole Genome Amplification. (Same results as for straight blood.)

Technology helps remove human contamination, and gives much better data on regions covered by baits.

New approach: Whole Genome Baits (Rogov & Melnikov).

Whole Genome -> adaptors, (T7 tail) -> bait.

Whole genome bait results are comparable to synthetic baits, and again, no penalty for simulated clinical samples.

Coverage Similar to pure parasite DNA.  There are peaks and troughs, but comparable to pure DNA.  Dips correspond to AT rich regions, which are hard to sequence anyhow, and can’t be improved much over what you get with regular sequencing.

Tested on real clinical sample – blood spot, one year old.  Amplified, then whole genome selection.  Used HiSeq, and got 5.9% plasmodium DNA.

Figure of “Accurate SNP calling from Hyb-Sel data.”  Not just picking up parts that look like ref plasmodium, but also finding other stuff, which influences SNP calling – but still reasonable.

Other examples, with 50x enrichment.  Did other examples from 12 clinical samples, again saw good enrichment.
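[For reference, the way I’d compute fold enrichment: the pathogen fraction of reads after selection divided by the fraction before.  That definition is my assumption, since it wasn’t spelled out, and the read counts below are made up.]

```python
def fold_enrichment(target_before, total_before, target_after, total_after):
    """Ratio of the pathogen read fraction after selection to that before.

    Written as a single division of products so that round numbers stay exact.
    """
    return (target_after * total_before) / (target_before * total_after)

# Hypothetical counts: 0.1% parasite reads before selection, 4% after.
print(fold_enrichment(1_000, 1_000_000, 40_000, 1_000_000))  # -> 40.0
```

The same ratio is roughly how much less sequencing you need for a given parasite coverage, which is where the cost saving comes from.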

Summary: This is a good method for enriching pathogens, which helps reduce the amount of sequencing needed – and that means cost is also reduced.  Good for everyone.

Used on malaria for a model, but could work well on other pathogens.  Will enable sequencing of clinical samples collected in drug & vaccine trials.

AGBT Talk: Life Tech – Ion Torrent

Title: Scaling Semiconductor Sequencing.

  • Sequencing on a semiconductor chip.  Uses known technology for chips, but changes the way we do sequencing.  Chip is actually the machine.
  • Should allow scaleup every 6 months.
  • Chemistry is “post-light” sequencing.  Chemistry requires no optics, so no quenching.  Coverage is very uniform.
  • Native polymerases and native nucleotides – which also makes it low cost.
  • Chips start @ $250
  • Speed is also a benefit.  2 hour seq. runs.
  • Sequencing is decoupled from sample prep, which can be done in batch.   Many improvements coming in sample prep – currently takes about 8 hours, should be about 2 hours by the time optimizations are done.
  • Rapid performance improvements [nice graphic, but can’t copy it down for you… I’m sure they’d share it, however.]

Chad Nusbaum from Broad Institute will give the main talk:

Title: Implementation and applications…. something like that.  [ (-:  ]

  • Key tech:
    • Does it do something we need
    • can it do it better
    • can it provide special benefit?
    • Is there enough need to make it worth the investment.
  • What’s appealing about ion torrent:
    • fast, simple, cheap
  • “[Speed] matters more now than it did then.”
  • Requires little infrastructure.
  • “lightweight process”
  • Easier to trouble shoot
  • machine is small and inexpensive
  • supply chain is simple too.
  • “Tactical”
    • Speed – quick turn around on processes.
    • simplicity means easier optimization, etc
  • Applications:
    • Usual stuff
    • Viral sequencing,
    • QC large sequencing pools
    • QC of targeted capture samples
    • Exon capture experiments
    • Transcriptomics/cDNA
    • Genome assembly.
  • Did a tumour sequencing validation expt.
    • After filtering, you get 10-100 variations per tumour.
    • showing validation results. 72/93 variants were called correctly (from the “hardest” set).  [not sure why they were hard… microphone keeps cutting out, and may have missed something.]
    • more than 99% of reads yield assignable barcodes (for a pooled library).
  • Lab optimization: (slide on optimization cycle….  not particularly informative.)
  • current chips have 7M wells.
  • Observed low GC bias.
  • good representation of homopolymers
  • some computational approaches being worked on… wont talk about it.
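[On the “assignable barcodes” point in the list above: a toy sketch of what assigning reads in a pooled library can look like.  This is a generic prefix-matching approach with one allowed mismatch, not the Ion Torrent software; the barcode sequences, sample names and function names are all hypothetical.]

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the sample whose barcode uniquely matches the read prefix, else None."""
    hits = [
        name for name, bc in barcodes.items()
        if len(read) >= len(bc) and hamming(read[:len(bc)], bc) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None

def assignable_fraction(reads, barcodes):
    """Fraction of reads that can be assigned to exactly one sample."""
    if not reads:
        return 0.0
    assigned = sum(assign_barcode(r, barcodes) is not None for r in reads)
    return assigned / len(reads)

barcodes = {"sample1": "ACGT", "sample2": "TGCA"}   # hypothetical barcodes
reads = ["ACGTTTTT", "TGCTAAAA", "GGGGAAAA"]        # exact, 1 mismatch, no match
print([assign_barcode(r, barcodes) for r in reads])  # -> ['sample1', 'sample2', None]
print(assignable_fraction(reads, barcodes))
```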

Summary: fast, significant performance, demonstrated utility.

[Neat talk, and very quick.. now off to the meetup!]