BlueSEQ revisited

On the first day of the Copenhagenomics 2011 conference, I took notes on a presentation made by Peter Jabbour of BlueSEQ in which I interlaced some comments of my own. I was particularly disappointed in the presentation, which completely failed, in my opinion, to demonstrate the value of the company.  This prompted BlueSEQ marketer Shawn Baker to post a reply that addressed some of my points, but failed to get to the heart of the matter.  However, I had the opportunity to speak to BlueSEQ CEO Michael Heltzen on Friday morning, and he set me straight on several facts.  Given what I’d learned, I thought it was important to take the time to revisit what I had said about BlueSEQ.

I understand some people thought my criticism of BlueSEQ was targeted.  Let me set the record straight: Of all of the companies that presented or attended at Copenhagenomics 2011, the only one I have any relationship at all with is CLC bio, and that is – to this point – entirely informal.  Any criticisms I have made about BlueSEQ, or any other company, are simply my own opinion based on the information presented – and for the record, I do have a little experience with business models.

In this case, the presentation led me to believe there were a lot of holes in the BlueSEQ business model.  Fortunately, CEO Michael Heltzen was kind enough to patiently answer my questions and explain the business model to me, which has prompted me to change my opinion.

In case you haven’t heard of BlueSEQ, they’re an organization that serves to match users that have unmet sequencing needs (“users”) with groups that have surplus sequencing capacity (“providers”). This is a simplified version of what they do, at least – and was the focus of their presentation at Copenhagenomics 2011.

Initially, BlueSEQ set themselves up during the presentation as a young company that just “went live” recently.  While there’s nothing wrong with that, I have spent time as an entrepreneur and am aware that young companies have a tendency to be a little overly optimistic about their markets and potential for finding customers.  Although BlueSEQ did boast of about a hundred users signing up for their services, I listened carefully but didn’t hear anything about providers having signed up as well.  That set off flags for me.  BlueSEQ CEO Michael Heltzen patiently explained to me that they do, in fact, have 25 providers already signed up – a very impressive number for just over a month of operations.

Having paying clients, or providers in this case, is 90% of the battle for any match-making company and knowing that there are groups paying for BlueSEQ’s services should be music to the ears of any potential investors.  That, on its own, provided some significant validation of the company’s business model for me.  Obviously, if people are currently paying for it, then clearly there is value.

And speaking of paying, the presentation did not explain what it was that providers were paying for.  A 10% service fee – charged to providers – was mentioned during the presentation, which seems a little high for nothing more than a service linking buyers with sellers.  I heard the same comment from other people who saw the presentation and voiced their concern (albeit more quietly than I did) that it was a bit disproportional.  However, again, BlueSEQ’s Michael Heltzen provided the explanation:  BlueSEQ doesn’t just match sequencing providers with users –  they provide a complete front-office service, not only promoting the sequencing centre’s business by matching them with the users, but also by handling the initial steps of any inquiries and working with the user to sort out the wet and dry lab requirements of any potential sequencing project.  Suddenly, I think the value of BlueSEQ’s services should be apparent.

Many groups with excess sequencing capacity may find themselves in a position where they have the ability to provide sequencing services, but not the facilities to handle customer requests or promote themselves to find the users who could take advantage of the sequencing services.  Enter BlueSEQ.

This explanation, diametrically opposite to the “web portal” model described during the business presentation, suddenly shows where an entrepreneurial group has the potential to build a concrete business.   The analogy used during the BlueSEQ presentation, of a web portal where people can buy airline tickets by comparing prices on-line, was a poor choice, completely diminishing the value that BlueSEQ provides by interpreting, analyzing and, in part, educating the sequencing users.  What a service that could be!

With good experimental design being one of the most difficult parts of science, BlueSEQ is in fact sitting in the wonderful position of being the early entry into a completely new business model.  They are able to transform the disjointed requests of novice users into complete experimental plans and then match those experiments with labs that have experience and capacity for performing those experiments well.  The user gains by getting competitive quotes and help in setting up the product they want, while the provider gains by being able to focus on the service they provide without the complexities of dealing with customers that may not know what they want or need.

Pure genius.

Of course, there are still pitfalls ahead with this type of business model.  There really is no bar to entry for other competitors, other than the experience of the current group. (I’m sure it’s extensive, but there are others out there who could do the same.)  There is also no real guarantee that what they are doing will be cost effective in the long run.  As sequencing becomes cheaper and cheaper, it might actually come to a point where it will be more cost efficient to turn to a professional sequencing company like Complete Genomics that does provide a full service than to a portal and matchmaking service like BlueSEQ.  Of course, those are concerns that I’m sure BlueSEQ has put more thought into than I have – and will be up to them to solve.

As I said last time, and I meant it quite sincerely: Good luck to the business.  I’ll be looking forward to hearing their presentations in the future – and I hope they have only good things to report.

11 tips for blogging talks and conferences

While I’m still not quite recovered from the jet-lag from the flight home, I thought I’d take a quick shot at answering a question I was asked frequently last week:  “How do you blog a scientific conference?”  So I thought I’d take a stab at some of the key points in case anyone else has any interest in trying.

  1. Focus! The hardest thing about blogging a conference is the amount of attention it takes.  If you are easily distracted, you’ll miss things – and it can be really hard to get back into a talk once you’ve missed a couple of key points.  Checking your email, twitter or surfing the web are all bad ideas.
  2. Listen! The speaker is really the best source for getting the key points.  If they’re doing a good job, then you don’t even need to see the slides – they’ll summarize the main points and make your job easy.
  3. Know your limits. If you don’t understand something, you’re not going to be able to summarize and explain it.  Frankly, product talks are pretty much impossible to blog – just point to the catalog.
  4. Read the slides.  A really bad speaker can make it hard to blog their talk, but fortunately, that’s what slides are for: summarizing the presenter’s points.  If you can’t follow along with what they’re saying, you can always interpret the slides for yourself.
  5. Know what to omit.  A really good speaker can be incredibly distracting, wandering away from the main point of the talk to tell stories or insert asides.  You don’t need to write down everything, especially if you can’t reproduce it well.  Capturing speakers’ jokes can be next to impossible.
  6. Think! It may sound odd, but the process of writing notes is about what you think is important.  You have to carefully interpret what the speaker is saying and decide what it is that you feel is central to the arguments.  Blindly copying things frequently fails to tell the story well.
  7. Don’t guess! It’s easy to miss something (and yes, you will miss things), but how you handle the things you miss is important.  If you can’t remember a number or an exact phrasing, just summarize it – if you guess about the value or quote someone incorrectly, it can really upset both the speaker whose work you’ve misrepresented and the audience, who may rely on what you’ve told them.  If you’re not sure on a point, be clear about that as well.  It’s better to err on the side of caution.
  8. Keep your thoughts separate. This can be challenging.  With all that’s going on, it’s easy to mix up your opinions with the speaker’s points, since your notes are really just your interpretation of what you’re hearing.   However, to preserve the integrity of the speaker’s points, you need to ensure that they don’t get confused.  I use a system of brackets to do so but any other clearly marked system will work as well.
  9. Type fast! This should be obvious.  The faster you can type, the more complete your notes will be.  Conference blogging is not for slow typers.
  10. Use the right tools. I blog directly in my blog’s editor, but you can use any other system that works for you.  The most challenging part is to make sure you have autosave on, and that it works well.  There’s nothing worse than losing something you’ve written – especially since you can’t go back to ask a speaker to do their first 10 slides over if something goes wrong.
  11. Practice! This isn’t a skill you develop overnight – the more you do this, the easier it becomes.  Start with a single talk and learn from your mistakes.

So, there you have it.  The top 11 tips I’d give for anyone who would like to blog a talk – or even a whole conference.   And, of course, don’t forget to enjoy the talks.  If you’re not getting something out of listening to someone else speaking, why are you taking notes on it? (-:

Copenhagenomics 2011, in review

It’s early Saturday morning in Copenhagen and Copenhagenomics 2011 is done.  I was going to say that the sun has set on it, but the city is far enough north that the sun really doesn’t do much more than sink a bit below the horizon at night.  That said, the bright summer sunshine has me up early – and ready to write out a few thoughts about the conference.

[Yes, for what it’s worth, I was invited to blog the conference so I may not be completely impartial in my evaluation, but I think my comments also reflect the general consensus of the other attendees I spoke to as well.  Dissenters are welcome to comment below.]

First, I have to say that I think it was an unqualified success.  Any comments I might have can’t possibly amount to more than suggestions for the next year.  The conference successfully brought together a lot of European bioinformaticians and biologists and provided a forum in which some great science could be shown off.

The choice of venue was inspired and the execution was flawless, despite a few last minute cancellations.  These things happen, and the conference rolled on without a pause.  Even the food was good (I didn’t even hear Sverker, a vegetarian Swede, complain much on that count) and the weather cooperated, clearing up after the first morning.

As well, the conference organizers’ enlightened blogging and twittering policy was nothing short of brilliant, as it provided ways for people to engage in the conversation without being here first hand.  Of course, notes and tweets can only give you so much of the flavour – so those who did attend had the benefits of the networking sessions and the friendly discussions over coffee and meals.  The online presence of the conference seemed disproportionately high for such a young venue and the chat on the #CPHx hashtag was lively.  I was impressed.

With all that said, there were things that could be suggested for next year.  Personally, I would have liked to have seen a poster session as part of the conference.  It would have been a great opportunity to showcase next-gen and bioinformatics work from across Europe.  I know that the science must be there, hiding in the woodwork somewhere, but it didn’t have the opportunity to shine as brightly as it might have.  It also would have served to bring out more graduate students, who made up a small proportion of the attendees (as far as I could tell). Next year, I imagine that this conference will be an ideal place for European companies and labs to do some recruiting of young scientists – and encouraging more graduate students to attend by submitting posters and abstracts would be a great way to facilitate that.

Another element that seemed slightly off for me was the vendors.  They certainly had a presence and were able to make it noticed, but the booths at the back of the room might not have been the best way for companies to showcase their contributions.  That said, I suspect that Copenhagenomics will have already outgrown this particular venue by next year anyhow and that it won’t be a concern moving forward.

While I’m on the subject of vendors, what happened to European companies like Oxford Nanopore, or the usual editor or two from Nature?  Were some UK attendees scared off by the name of the conference?  I’m just putting it out there – it’s entirely possible that I simply failed to bump into their reps.

In any case, the main focus of the conference, the science, was excellent.  There were a few fantastic highlights for me.  Dr. John Quackenbush‘s talk challenged everyone to seriously re-consider how we make sense of our data – and more importantly, the biology it represents.  Dr. Elizabeth Murchison‘s talk on transmissible cancers was excellent as well and became a topic of much conversation.  Heck, three of my fellow twitter-ers were there and each one did a great job with their respective talks. (@rforsberg, @dgmacarthur and @bioinfo)

In summary, I think the conference came off about as smoothly as any I’ve seen before – and better than most.  If I were given the opportunity, this would be a conference I’d pick to come back to again. Congratulations to the organizers and the speakers!

CPHx: Morten Rasmussen, National High-Throughput Sequencing Centre, sponsored by Illumina – Exploring ancient human genomes

Exploring ancient human genomes
Morten Rasmussen, National High-Throughput Sequencing Centre, sponsored by Illumina


Why study ancient DNA?  By studying modern species, we can only add leaves to the end of the phylogenetic tree; we can’t study the nodes or the extinct branches. [my interpretation.]

How do you get ancient DNA? Bones and Teeth, mainly.  Coprolites are now used as well, and soft tissue, if available.  Ice and sediments can also be used in some cases.

Characteristics: The colder and dryer the environment, the better quality of the DNA preservation.  Age is also a factor.  The older the DNA, the less likely it is to have survived.  More than 1 million years is the limit, if conditions were optimal.

Goldilocks principle.  There is a sensitivity limit – you need enough.  Some is too short – you need longer strands.  You also need to worry about modern DNA contamination – mostly microbial.  Thus, within those constraints, you need to work carefully.

Some advantages in next-gen seq tho – no need for internal primers, size constraints are ok, etc.

DNA barcodes are frequently used to look at biodiversity.  Align the sequences to look for conserved regions surrounding a variable region – allowing primers to be designed for either end of the variable region.  If sequences are identical, you can’t distinguish the origin of the DNA.  [obviously a different type of bar-coding than what we usually discuss in NGS.]
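
[The primer-placement idea above can be sketched in a few lines of Python.  This is only my own toy illustration, not anything shown in the talk: the aligned sequences are made up, the function names are hypothetical, and the rule that a primer site needs `flank` fully conserved columns on each side is an assumption chosen for simplicity.]

```python
def conserved_columns(aligned):
    """Return one boolean per alignment column: True if every sequence agrees."""
    return [len(set(column)) == 1 for column in zip(*aligned)]


def variable_region(aligned, flank=5):
    """Return (start, end) of the variable span if conserved flanks exist, else None.

    Columns outside start..end are conserved by construction, so a primer
    site fits only if at least `flank` columns remain on each side.
    """
    cons = conserved_columns(aligned)
    variable = [i for i, is_conserved in enumerate(cons) if not is_conserved]
    if not variable:
        return None  # identical sequences: origins cannot be distinguished
    start, end = variable[0], variable[-1]
    if start >= flank and len(cons) - 1 - end >= flank:
        return (start, end)
    return None


# Made-up, pre-aligned sequences of equal length (no gaps).
aligned = [
    "ACGTACGGATTCCAGTCA",
    "ACGTACGCTTACCAGTCA",
    "ACGTACGGTAACCAGTCA",
]
print(variable_region(aligned))
```

[Identical sequences return None here, which mirrors the point above: without a variable region there is nothing to tell the sources apart.]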

Ice core genetics.  Willerslev et al, Science (2007).  Interesting results found in the “silty” ice, which included DNA from warmer climate plants.

Late survival of mammoth and horse…  can use similar techniques as ice cores to soil cores.

Paleogenomics.  DNA is often highly fragmented and full of bacterial contamination.  A big part of this is finding the right sample.  E.g., look in Greenland for good samples, where the cold will have preserved them well.  Hair sample found, which was eventually moved to Denmark.

The big issue of contamination, however, still has to be dealt with.  Fortunately, DNA is held inside the hair, so washing hair with bleach removes most surface contaminants without harming the DNA sample.  Gives good results – vastly better than bone results that can’t use that method.  (84% in this case is Homo sapiens, versus 1% recovery for Neanderthal bone.)

DNA damage:  Expected damage from ancient DNA as previously observed, but bioinformaticians did not see significant damage.  Turns out that Pfu was used in protocol in this round, and Pfu does not amplify Uracil.  This has the unexpected side effect of “removing” the damage.

Standard pipeline was used, mapping to hg18.  Only 46% of reads mapped, because only uniquely mapped reads were used for the analysis.  Multi-mapped reads were discarded, and clonal reads were also “collapsed”.  Still, 2.4 billion basepairs covered, 79% of hg18, 20X depth.
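
[For anyone unfamiliar with this kind of filtering, here is a rough Python sketch of the two steps just described.  It is not the pipeline from the talk: the `Read` record and its fields are hypothetical, and treating reads that share chromosome, start and strand as clonal copies is a simplifying assumption.]

```python
from collections import namedtuple

# Hypothetical minimal read record; a real pipeline would parse BAM/SAM files.
Read = namedtuple("Read", ["name", "chrom", "start", "strand", "n_hits"])


def filter_reads(reads):
    """Keep uniquely mapped reads and collapse clonal duplicates."""
    seen = set()
    kept = []
    for read in reads:
        if read.n_hits != 1:
            continue  # drop unmapped (0 hits) and multi-mapped (>1 hits) reads
        key = (read.chrom, read.start, read.strand)
        if key in seen:
            continue  # same position and strand: treated as a clonal copy
        seen.add(key)
        kept.append(read)
    return kept


reads = [
    Read("r1", "chr1", 100, "+", 1),
    Read("r2", "chr1", 100, "+", 1),  # clonal copy of r1
    Read("r3", "chr1", 100, "-", 1),
    Read("r4", "chr2", 500, "+", 3),  # multi-mapped
    Read("r5", "chr2", 900, "+", 0),  # unmapped
]
kept = filter_reads(reads)
print([r.name for r in kept])
```

[Being this strict throws away a lot of reads, which is consistent with the low mapped fraction quoted above.]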

Inference about phenotypic traits:

  • dark eyes
  • brown hair
  • dry earwax
  • tendency to go bald

Of course, many of those could have been predicted anyhow, but nice to confirm.

Compared to other populations with SNP chip data.  Confirmed that the ancient Greenland DNA places the sequenced individual near the Chukchis and Koryaks (populations from northern Siberia).  That’s good, because it also rules out contamination from the people who did the sequencing. (Europeans.)  Thus, this was probably from an earlier migration than the current Greenlanders, consistent with known data about migrations to the region.

What does the future hold:

  • More ancient genomes
  • Targeted sequencing for larger samples.

Why targeted sequencing of ancient DNA?  If you capture the most important bits of DNA, you would generate more interesting data with less effort, giving the same results.


CPHx: Daniel MacArthur, Wellcome Trust Sanger Institute & Wired Science – Functional annotation of “healthy” genomes: implications for clinical application.

Functional annotation of “healthy” genomes: implications for clinical application.
Daniel MacArthur, Wellcome Trust Sanger Institute & Wired Science


The sequence-function intersection.

What we need are tools and resources for researchers and clinicians to merge information together to utilize this data.  Many things need to be done, including improving annotations, fixing the human reference sequence and improved databases of variation and disease mutations.

Data sets used – single high quality individual genome.  Anonymous European from hapmap project.  One of the most highly sequenced individuals in the world.

Also working on a pilot study with 1000 genomes, 179 individuals from 4 populations.

Focussing on loss of function variants.  SNPs with stop codons, disrupting splice sites, large deletions and frame-shift mutations.  Expected to be enriched for deleterious mutations.  Have been found in ALL published genomes – all genomes are “dysfunctional”.  Some genomes are more dysfunctional than others…  however, it might be an enrichment of sequencing errors.

Functional sites are typically enriched for selective pressures, leading to less variation.  The more likely something is to be functional, the more likely you are to find error. [I didn’t express it well, but the noise has a greater influence on highly conserved regions with low variation than on regions with higher variation.]

Hunting mistakes

  1. sequencing errors.  This gets easier to find as time goes by and tech. improves.
  2. reference or annotation artefacts.  False intron in annotation of genes, or otherwise.
  3. Unlikely to cause true loss of function.  eg, truncation in last amino acid of protein.

Loss of function filtering.  Done with experimental genotyping, manual annotation and informatic filtering.  Finally, after all that filtering, you get down to the “true LOF variations.”

example. 600 raw becomes 200 filtered by any transcript, down to 130 filtered on all transcripts.
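
[My reading of the “any transcript” versus “all transcripts” distinction, written out as a small Python sketch.  The interpretation itself is an assumption on my part, and the variant and transcript names are invented: a variant counts under “any” if it is predicted LOF in at least one overlapping transcript, and under “all” only if it is LOF in every one of them.]

```python
def lof_in_any(calls):
    """True if the variant is predicted LOF in at least one transcript."""
    return any(calls.values())


def lof_in_all(calls):
    """True if the variant is predicted LOF in every overlapping transcript."""
    return bool(calls) and all(calls.values())


# Hypothetical variants: transcript ID -> is the variant LOF in that transcript?
variants = {
    "var1": {"tx1": True, "tx2": True},
    "var2": {"tx1": True, "tx2": False},
    "var3": {"tx1": False, "tx2": False},
}
any_set = sorted(v for v, calls in variants.items() if lof_in_any(calls))
all_set = sorted(v for v, calls in variants.items() if lof_in_all(calls))
print(any_set, all_set)
```

[Under this reading the “all transcripts” set is always a subset of the “any transcript” set, which fits the numbers shrinking from 200 to 130.]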

Homozygous loss of function variants were observed in the high quality genome.  The ones observed cover a range of genes.  The real LOF variations tend to be rare, enriched for mildly deleterious effects.

LOF variants affect RNA expression.  Variants predicted to undergo nonsense mediated decay are less frequent. [I may have made a mistake here.]

Can use LOF variants to inform clinical outcomes.  You can distinguish LOF variant genes from recessive disease genes.  ROC AUC = 0.81 (Reasonably modest but predictive model.) Applying this to disease studies at Sanger.
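
[A quick reminder of what a ROC AUC measures, as a Python sketch.  The scores below are made up and have nothing to do with the Sanger data; the function computes the AUC as the chance that a randomly chosen positive (here, a hypothetical recessive disease gene) scores higher than a randomly chosen negative (a hypothetical LOF-tolerant gene), with ties counted as half.]

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(random positive outscores random negative); ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Invented classifier scores, purely for illustration.
disease_genes = [0.9, 0.8, 0.6, 0.4]
tolerant_genes = [0.7, 0.5, 0.3, 0.2]
print(roc_auc(disease_genes, tolerant_genes))
```

[An AUC of 0.5 would be no better than a coin flip and 1.0 would be perfect separation, so 0.81 sits in useful but far from definitive territory.]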


  • More LOF variants for better classification
  • Improve upstream processes
  • Improve human ref seq
  • Use catalogs of LOF tolerant genes for better disease gene prediction

CPHx: Kevin Davies, Bio-IT World – The $1,000 genome, the $1,000,000 interpretation

“The $1,000 genome, the $1,000,000 interpretation”
Kevin Davies, Bio-IT World


Taking notes on a talk by a journalist is pretty much a bad idea.   Frankly, it would be akin to reducing a work of art to a mere grunt.  The jokes, nuances and elegance would all be lost – and if I were somehow able to do a good job, it would have the nasty side effect of putting Kevin out of work when everyone spends their time reading my blog instead of inviting him to speak himself  – or worse, instead of reading his book.  (Alas, I haven’t read it myself, either.)

However, in the vein of letting people know what’s happening here, Kevin has taken the opportunity to review some of the early history of next gen sequencing.  It’s splashed with all sorts of wonderful artefacts that represent the milestones: the first solexa genome sequenced (A phage), James Watson’s genome, the first prescription for human sequencing, etc.

More importantly, the talk also wandered into some of the more useful applications and work done on building the genomic revolution for personalized medicine.  (You might consider checking for one great example.  Pulitzer prize winning journalism, we’re told.)  Kevin managed to cover plenty of ways in which the new technologies have been applied to human health and disease – as well as to discover common human traits like freckling, hair curl and yes, even Asparagus anosmia!

Finally, the talk headed towards some of the sequencing centres and technologies we’ve seen here, including Complete Genomics, PacBio and a brief sojourn past Oxford Nanopore.  Some of my favourite technologies – and endlessly interesting topics for discussion over beer.  And naturally, as every conversation on next-gen sequencing must do, Kevin reminds us that the cost of the human genome has dropped from millions of dollars for the first set, down to the sub $10,000 specials.  Genomes for all!



CPHx: Anne Palser, Wellcome Trust Sanger Inst., Sponsored by Agilent Technologies – Whole genome sequencing of human herpesviruses

Whole genome sequencing of human herpesviruses
Anne Palser, Wellcome Trust Sanger Inst., Sponsored by Agilent Technologies


Herpes virus review.  dsDNA, enveloped viruses.  3 major classes, alpha, beta, gamma.

Diseases include Kaposi’s sarcoma (KSHV – 140kb genome) and Burkitt’s lymphoma (EBV – 170kb genome).

Hard to isolate viruses to sequence.  In some clinical samples, not all cells are infected.  When you sequence samples, you get more human DNA than you do virus.  Little known about genome diversity.  All sequences come from cell lines and tumours.  There is no wild type full genome sequence.

Target enrichment method used to try to enrich for virus DNA.

Samples of cell lines used.  Tried 5 primary effusion lymphoma cell lines (3 have EBV, all 5 have KSHV) and 2 Burkitt lymphoma cell lines (EBV).

Custom baits designed using 120-mers, each base covered by 5 probes for KSHV.  Similar done for EBV1 and EBV2. [skipping some details of how this was done.]
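
[To make the tiling arithmetic concrete: with 120-mers and every base covered by 5 probes, consecutive baits would start 120 / 5 = 24 bases apart.  The Python sketch below is my own illustration under that assumption of even tiling, with a made-up 1,200 bp target; it is not the actual bait design procedure, whose details I skipped.]

```python
def tile_baits(target_length, bait_length=120, depth=5):
    """Return bait start positions so interior bases are covered `depth` times."""
    if bait_length % depth != 0:
        raise ValueError("bait_length must be divisible by depth for even tiling")
    step = bait_length // depth
    last_start = target_length - bait_length
    if last_start < 0:
        return []  # target is shorter than a single bait
    return list(range(0, last_start + 1, step))


def coverage(target_length, starts, bait_length=120):
    """Count how many baits cover each base of the target."""
    cov = [0] * target_length
    for start in starts:
        for i in range(start, start + bait_length):
            cov[i] += 1
    return cov


starts = tile_baits(1200)          # hypothetical 1,200 bp target
cov = coverage(1200, starts)
print(len(starts), starts[1] - starts[0], max(cov))
```

[Note that the ends of a linear target get fewer than 5 probes per base in this simple scheme; only the interior reaches the full depth.]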

Flow chart for “SureSelect target enrichment system capture process” from illustration.

Multiplexed 6 samples per lane.  Sequenced on Illumina GAII.

Walk through analysis pipeline.  Bowtie and Samtools used at final stages.

Specific capture of virus DNA.

  • KSHV.  77-91% reads map to reference sequence.  Capture looked good.
  • EBV: 52-82% mapping to ref.

Coverage looks good, and high for most of the genome.   Typical for viral sequencing.

SNPs relative to ref. sequence.  500-700 for KSHV, 2-2.5k for EBV relative to reference seq. Nice Circos-like figure showing distribution.


  • Custom SureSelect to isolate virus dna from human dna is successful.
  • full genome sequence viruses obtained.
  • analysing snps and minority species present
  • currently looking at saliva samples, looking to estimate genomic diversity
  • looking at clinical pathologies
  • high throughput, cost effective, applicable as a method to analyse other pathogen sequences.

CPHx: Elizabeth A Worthey, Medical College of Wisconsin – Making a Definitive Diagnosis: Successful clinical application of diagnostic whole genome sequencing

Making a Definitive Diagnosis: Successful clinical application of diagnostic whole genome sequencing
Elizabeth A Worthey, Medical College of Wisconsin


Making a Definitive Diagnosis.

Original request came in 2009: a young child with intractable inflammatory bowel disease.  No known test was diagnostic.   Primary physician went to a talk on WGS, and wanted to know if it would work on the child.

Case: poor weight gain at 15 months, with perianal abscess.  Symptoms consistent with severe Crohn’s disease.  90 trips to the OR by the age of 3. Disease progressed even after severe operations.  [Some very graphic photos here.]

Time is of the essence, bottleneck was in the analysis.  Child was very ill, so had to work fast.  Expected about 15,000 variants.    Used Adobe Flex UI, Java middleware layer, Oracle 11g DB.

CarpeNovo.  Gives variant reports, etc. Used the tool for about 4-5 months to narrow down 16k to just 2 that were highly conserved positions, not found in additional human genome sequences.  Left only 1 variant after more analysis.

XIAP gene.  Mutation changed single amino acid.  Clinical diagnostics done to confirm sequence variant.  Also not in other family members.  Conservation of this position is extreme, including in non-mammal model organisms.

Mutation would be predicted to affect release of inflammatory molecules.  Used assays to confirm this was the case in vitro.

Diagnosis then was made, and compared to other XIAP deficiencies, such as XLP2.  Standard treatment for XLP2 is allogeneic hematopoietic transplant.  After this treatment, child progressed very well, and has few recurring symptoms, etc.  Doing well!

Not the end of the story.  After this, other physicians started to request similar programs.  Did not have resources to do this for everyone who requested it.  Went to the hospital and looked for funds to continue this program with additional children.

Multi-disciplinary, multi-institutional review process.  Patients receiving care at the hospital can be nominated.  Review committee makes decisions.  This is NOT a research project – it’s focussed on treatment of the patient.  Is there a likely outcome for potentially changing treatment, can it reduce the cost of diagnostic testing, etc?

Structure of review board covered.  External expert physicians, committee review and nominating physician.  It takes 8-10 hours of work per patient nominated.

Discussion of ethics of what to return.  Data observed from NGS is not added to electronic health record.

WGS done on 6 individuals since.

Case #2.

Intractable seizures, neurological symptoms.  Also: was the twin sibling at risk?

Found two mutations that cause Joubert syndrome, but presentation was not classic.  Unfortunately, no direct actions were possible.

Case #3

Full term infant born, seemed normal, but at 10 weeks rushed to the hospital.  [missed the why though]  Two mutations in twinkle gene.  Child died at 6 months of age.  Avoided major futile surgery.

Broader findings.

Have pre-authorization from insurance.  Education of providers and patients are necessary.  Large diverse teams required.  Diseases will be redefined:  known phenotype but different gene.  You don’t always have improved treatment options, and sometimes there are none.




CPHx: Lisa D. White – Baylor College of Medicine – Chromosomal Microarray (aCGH) Applications in the Clinical Setting

Chromosomal Microarray (aCGH) Applications in the Clinical Setting
Lisa D. White – Baylor College of Medicine


Work shown here is the work of a large number of people.

Conflict of interest statement.  Baylor does get revenue from its sequencing services.

Custom targeted arrays.  180k postnatal CMA – high resolution, related to MR, DD, DF, autism, heart defects, seizure disorder.  Entire mitochondrial genome.  Recently upgraded to a 400kb postnatal array, has same coverage as other array, but includes 120k new SNPs across the genome.

Interested in detecting absence of heterozygosity.  eg, consanguinity.

How does it work?  uses same label protocol w restriction digestion.  SNPs are recognized by whether it is cut or not.

Absence of heterozygosity is not loss of heterozygosity.  Happens with consanguinity, eg, identical regions inherited, not loss of a chromosome.
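
[One simple way to picture AOH detection from SNP calls is to look for long uninterrupted runs of homozygous genotypes.  The Python sketch below is my own toy version, not Baylor’s method: the genotype strings are invented, and the threshold is a count of consecutive SNPs, whereas a real analysis would presumably work with genomic distance and tolerate the odd miscalled SNP.]

```python
def find_aoh_runs(genotypes, min_snps=5):
    """Return (start, end) index pairs for runs of consecutive homozygous calls."""
    runs = []
    run_start = None
    for i, genotype in enumerate(genotypes):
        homozygous = genotype[0] == genotype[1]
        if homozygous and run_start is None:
            run_start = i  # a new homozygous run begins here
        elif not homozygous and run_start is not None:
            if i - run_start >= min_snps:
                runs.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the last SNP.
    if run_start is not None and len(genotypes) - run_start >= min_snps:
        runs.append((run_start, len(genotypes) - 1))
    return runs


# Made-up two-allele calls along one chromosome ("AB" is heterozygous).
calls = ["AB", "AA", "BB", "AA", "AA", "BB", "AA", "AB", "AA", "BB", "AB"]
print(find_aoh_runs(calls))
```

[Short homozygous stretches are expected by chance in anyone, which is why only runs above a threshold are reported.]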

Also, Uniparental disomy, when one parent gives you both copies of the same chromosome, rather than one from each parent.  [is that correct?]

Examples given, showing Illumina 610 Quad vs Agilent custom.  Looks good.

Discovery of incest in assessment of AOH detection.  In clinical setting, it’s possible to identify cases of incest based on chromosomal data, eg. consanguinity.  Raises ethical issues, however.

Limits to the array.  [general array stats, like regions it doesn’t cover, situations like balanced translocations.  etc.]

Other situations: DNA extraction of uncultured Amniocytes.  Informed Consent.  MD collects sample and ships to lab.  DNA extraction is done (Bi et al, 2008 Prenatal diagnosis.).  3-5ml.  Do three things: Maternal cell contamination test, gender PCR and Quantitation.  Average turn around time is 6 days. (some info about back up culture, from set aside portion of the sample, but it’s rarely needed)

Prenatal example… indication of abnormal hands and feet… a 500kb duplication was detected.  Able to show it was de novo, not tied to either parent.


  • Arrays are important for diagnostics, even given NGS.
  • Can do valuable work, and can be offered more universally for all pregnancies.
  • Recently launched a cancer genetics lab, which will also use array CGH and NGS as part of the test.

Also developing NGS tests, moving forward.  Looking for diagnostic tests that can move into the CLIA lab for proper applications.

Big effort with lots of people working on it.

CPHx: Edwin Cuppen, Hubrecht Inst. and Utrecht University – Are we looking at the right places for pathogenic human genetic variation?

Are we looking at the right places for pathogenic human genetic variation?
Edwin Cuppen, Hubrecht Inst. and Utrecht University

Where are we looking, typically?  Genomes.  Thus, we search for variations across the genome.  We then end up sequencing the whole genome, but then lack the tools to make sense of the sequence.

Reduce work and costs by multiplexing.  Typically, we multiplex sample prep, multiplex enrichment, then barcode.  Instead, multiplex enrichment would be more cost effective.

Barcode Blocking would be the way to go.

Example shown comparing to Agilent SureSelect – exactly the same.  Have pushed so far to 5 samples in this case.

You can also show this scales to 96-fold, however, then you need proportionally more sequencing for large data sets. (eg, wouldn’t want to do this for a genome.)

Average base coverage per sample using 96-plex.  It is between 40-100x, so there’s only a 2-fold distribution.

Do you see allelic competition in enrichment pool? It’s possible, but in practice, you don’t see it.

Example given for X-exome screenome.  Only one enrichment with all of the different families, so it’s more cost effective.  Show ability to identify causative variants.

Are we looking at the right places?  UTRs, promoters, enhancers, insulators, chromatin organizers, non-coding RNA.  There is much more than just protein coding sections in the genome.  However, if we look at the whole genome, there are limitations there too.  And still, what about structural variations?

Detecting structural variation using mate-pair sequencing.  Not only do you get distance of structural variations, you also get direction information.
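
[Here is a rough Python sketch of how distance and direction together can flag a structural event.  It is a heavily simplified illustration of the general idea, not the method from the talk: the insert-size window is invented, a concordant pair is assumed to map to one chromosome on opposite strands after any orientation normalization, and event types such as tandem duplications are ignored.]

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  min_insert=2000, max_insert=4000):
    """Classify one mate pair from its mapping positions and strands.

    Assumes a concordant pair maps to one chromosome, on opposite strands,
    with a separation inside the (hypothetical) library insert-size window.
    """
    if chrom1 != chrom2:
        return "translocation"      # mates land on different chromosomes
    if strand1 == strand2:
        return "inversion"          # direction information is inconsistent
    distance = abs(pos2 - pos1)
    if distance > max_insert:
        return "deletion"           # mates further apart than the library allows
    if distance < min_insert:
        return "insertion"          # mates closer together than expected
    return "concordant"


print(classify_pair("chr1", 1000, "+", "chr1", 4000, "-"))
print(classify_pair("chr1", 1000, "+", "chr5", 4000, "-"))
```

[In practice a single discordant pair proves nothing; you would want clusters of pairs supporting the same event before calling it.]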

Proof of principle.  Detection of a three way translocation.  Started with a diagnosed patient.  It was found by standard cytogenetic analysis, but the question was if they could find it using structural variant detection. Sequenced father, mother, child.

Thousands of predicted structural events.  It includes errors in reference, it includes artefacts.  Some are just found in mother or father – inherited.  Are you finding the known breakpoints?  Yes… but they found more.  It was not just 3, it was far more.

10 of them were then confirmed, including the three that were expected.  Original data set did not make predictions about disrupted genes.  Looking at the new breakpoints observed, however, one was a protocadherin15, which is mutated in Usher syndrome – which explains the phenotype.

Cytogenetics gives you less information, which is simple, but the next gen sequencing gives you way more information, and can then give explanatory power.  In fact, you can use de novo to make sense of the data more effectively and reconstruct the chromosomes.  You can even get single bp resolution.

Chromothripsis.  Shattering and reassembling of chromosomes.  Some pieces are lost, others are mixed, and reassembly occurs giving you information that would be challenging to identify otherwise.  Were able to reconstruct this data.

Mate pair seq in diagnostics.  Tag-density and mate-pair information can be used. Trio based approach used.  They were able to identify exact gene disrupted, where other approaches failed.  Single bp resolution.

You can also use it to resolve complex rearrangements that could not otherwise be visualized with other techniques.  Chromothripsis may be much more common than expected, as it has been observed in other samples.

Can be applied to cancer research.  Tumour specific structural variation.  There are significant differences between two tumours of the same type, even.

Chromothripsis looked for in cancer samples.  Expected 2-4%.  Found it in all but one sample.  (looking in metastatic colorectal cancer.)  Chromothripsis seems to be a common phenomenon driving cancer events.

Able to find expected events as well, and able to find known cancer genes affected by the arrangements.

Also went back to exome sequencing.  Found a few interesting mutations in known cancer genes.


Multiplexed targeted sequencing approaches are effective for large and small sample sets.

Structural variation can be relevant and is largely missed, but can be assayed by using mate pair sequencing.

Chromothripsis is a novel and frequent process that contributes to dramatic somatic and germline structural variation and disease.

For understanding disease, we need to evaluate genomes at the nucleotide AND the structural level.