CPHx: Daniel MacArthur, Wellcome Trust Sanger Institute & Wired Science – Functional annotation of “healthy” genomes: implications for clinical application.

Daniel MacArthur, Wellcome Trust Sanger Institute & Wired Science


The sequence-function intersection.

What we need are tools and resources that let researchers and clinicians integrate these data.  Many things need to be done, including improving annotations, fixing the human reference sequence, and building better databases of variation and disease mutations.

Data sets used: a single high-quality individual genome from an anonymous European in the HapMap project, one of the most highly sequenced individuals in the world.

Also working on the 1000 Genomes Project pilot: 179 individuals from 4 populations.

Focusing on loss-of-function (LOF) variants: SNPs that introduce stop codons or disrupt splice sites, large deletions, and frameshift mutations.  These are expected to be enriched for deleterious mutations.  LOF variants have been found in ALL published genomes, so all genomes are “dysfunctional”.  Some genomes are more dysfunctional than others…  however, this might reflect an enrichment of sequencing errors.
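The four classes of LOF variant listed above can be sketched as a simple classifier. This is a minimal illustration, not the pipeline used in the talk: the `Variant` record, its consequence labels (loosely modelled on Sequence Ontology terms such as `stop_gained`) and the `LARGE_DELETION_BP` cut-off are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Illustrative cut-off for a "large" deletion (assumed, not from the talk).
LARGE_DELETION_BP = 1000

SPLICE_TERMS = {"splice_donor_variant", "splice_acceptor_variant"}


@dataclass
class Variant:
    """Hypothetical, simplified variant record for this sketch."""
    kind: str               # "snv", "indel" or "deletion"
    consequences: Set[str]  # annotation labels attached to the variant
    length: int = 1         # bases inserted or deleted (1 for SNVs)


def lof_class(v: Variant) -> Optional[str]:
    """Return the LOF class of a variant, or None if it is not an LOF candidate."""
    if v.kind == "snv" and "stop_gained" in v.consequences:
        return "nonsense"
    if v.kind == "snv" and v.consequences & SPLICE_TERMS:
        return "splice"
    if v.kind == "deletion" and v.length >= LARGE_DELETION_BP:
        return "large_deletion"
    # An indel inside coding sequence whose length is not a multiple of 3
    # shifts the reading frame.
    if v.kind == "indel" and "coding" in v.consequences and v.length % 3 != 0:
        return "frameshift"
    return None


print(lof_class(Variant("snv", {"stop_gained"})))         # nonsense
print(lof_class(Variant("indel", {"coding"}, length=4)))  # frameshift
print(lof_class(Variant("indel", {"coding"}, length=3)))  # None
```

Real annotation is far messier (multiple transcripts, partial overlaps), which is exactly why the filtering steps that follow are needed.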

Functional sites are typically under stronger selective constraint, which leads to less variation.  So the more likely a site is to be functional, the more likely it is that a variant you find there is an error. [I didn’t express it well, but the noise has a greater influence on highly conserved regions with low variation than on regions with higher variation.]
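A quick back-of-the-envelope calculation shows why. Assuming errors arise at the same per-base rate everywhere while true variation is rarer at constrained sites, the fraction of calls that are errors is error_rate / (error_rate + true_variant_rate). The rates below are made-up illustrative numbers, not figures from the talk.

```python
def error_fraction(error_rate: float, true_variant_rate: float) -> float:
    """Expected fraction of variant calls in a class of sites that are errors.

    Assumes errors occur at the same per-base rate everywhere, while the
    rate of true variation depends on how constrained the sites are.
    """
    return error_rate / (error_rate + true_variant_rate)


# Illustrative per-base rates (assumed values).
ERROR_RATE = 1e-5
neutral = error_fraction(ERROR_RATE, 1e-3)      # weakly constrained sites
constrained = error_fraction(ERROR_RATE, 1e-4)  # highly conserved sites

print(f"neutral: {neutral:.3f}, constrained: {constrained:.3f}")
# → neutral: 0.010, constrained: 0.091
```

With the same error rate, a tenfold drop in true variation raises the error share of calls roughly ninefold, which is the effect described above.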

Hunting mistakes

  1. Sequencing errors.  These get easier to find as time goes by and technology improves.
  2. Reference or annotation artefacts, e.g. a false intron in a gene annotation.
  3. Variants unlikely to cause true loss of function, e.g. a truncation affecting only the last amino acid of a protein.
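The three kinds of mistake above could be flagged per candidate roughly as follows. All inputs are hypothetical: `validated` stands in for independent genotyping, `suspect_annotation` for a manual or informatic annotation check, and the `TAIL_FRACTION` cut-off for "too close to the end of the protein to matter" is an assumed value.

```python
from dataclasses import dataclass
from typing import List

# Assumed cut-off: truncations removing less than 5% of the protein are
# treated as unlikely to abolish function (illustrative only).
TAIL_FRACTION = 0.05


@dataclass
class Candidate:
    """Hypothetical LOF candidate with the evidence needed for filtering."""
    validated: bool           # confirmed by independent genotyping?
    suspect_annotation: bool  # e.g. falls in a dubious annotated intron
    stop_position: int        # codon replaced by the premature stop (1-based)
    protein_length: int       # length of the full-length protein (amino acids)


def mistake_reasons(c: Candidate) -> List[str]:
    """List the reasons a candidate may not be a true LOF variant."""
    reasons = []
    if not c.validated:
        reasons.append("possible sequencing error")
    if c.suspect_annotation:
        reasons.append("reference/annotation artefact")
    # Residues from the stop codon onwards are lost from the protein.
    lost = (c.protein_length - c.stop_position + 1) / c.protein_length
    if lost < TAIL_FRACTION:
        reasons.append("unlikely true LOF (near protein end)")
    return reasons


print(mistake_reasons(Candidate(True, False, 500, 500)))
# → ['unlikely true LOF (near protein end)']
```

A candidate with an empty reason list would survive this step.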

Loss-of-function filtering is done with experimental genotyping, manual annotation and informatic filtering.  After all of that filtering, you get down to the “true LOF variants.”

Example: 600 raw candidates were reduced to 200 after filtering on any transcript, and down to 130 after filtering on all transcripts.
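One way to read the "any transcript" versus "all transcripts" distinction, assuming each candidate carries a per-transcript LOF prediction for its gene, is sketched below. The input layout and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical input: for each candidate variant, whether it is predicted
# to be LOF in each transcript of its gene (True = LOF in that transcript).
Calls = Dict[str, List[bool]]


def lof_in_any(calls: Calls) -> List[str]:
    """Variants predicted LOF in at least one transcript (looser filter)."""
    return [v for v, per_tx in calls.items() if any(per_tx)]


def lof_in_all(calls: Calls) -> List[str]:
    """Variants predicted LOF in every transcript (stricter filter)."""
    return [v for v, per_tx in calls.items() if per_tx and all(per_tx)]


calls = {
    "var1": [True, True],
    "var2": [True, False],   # LOF in only one isoform
    "var3": [False, False],  # not LOF in any transcript
}
print(lof_in_any(calls), lof_in_all(calls))  # ['var1', 'var2'] ['var1']
```

Under this reading, the numbers above mean 200 of 600 raw candidates (one third) pass the looser filter and 130 (about 22%) pass the stricter one.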

Homozygous loss-of-function variants were observed in the high-quality genome, covering a range of genes.  The real LOF variants tend to be rare and enriched for mildly deleterious effects.

LOF variants affect RNA expression.  Variants predicted to undergo nonsense-mediated decay are less frequent. [I may have made a mistake here.]
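The expression effect can be checked per variant by counting, in a heterozygous individual, RNA-seq reads carrying the LOF allele versus the functional allele: with no decay the split should be close to 50:50, while nonsense-mediated decay should deplete the LOF allele. The sketch below uses an exact two-sided binomial test against a 50:50 null; the test choice and the function names are assumptions for illustration, not necessarily the method behind the analysis in the talk.

```python
from math import comb
from typing import Tuple


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes
    no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so equally likely outcomes are all counted.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def allelic_imbalance(lof_reads: int, ref_reads: int) -> Tuple[float, float]:
    """Fraction of reads from the LOF allele, and p-value against 50:50."""
    n = lof_reads + ref_reads
    return lof_reads / n, binom_two_sided_p(lof_reads, n)


frac, p = allelic_imbalance(lof_reads=5, ref_reads=25)
print(f"LOF allele fraction {frac:.2f}, p = {p:.1e}")
```

A strongly depleted LOF allele gives a small p-value; variants predicted to trigger decay that still show a balanced split are the cases where the prediction model needs improving.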

LOF variants can be used to inform clinical outcomes.  A model can distinguish LOF-tolerant genes (those carrying LOF variants in healthy genomes) from recessive disease genes, with ROC AUC = 0.81 (a reasonably modest but predictive model).  This is being applied to disease studies at Sanger.
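For reference, an ROC AUC is the probability that a randomly chosen positive is scored above a randomly chosen negative: 0.5 is chance and 1.0 is perfect separation, so 0.81 means the model ranks a random recessive disease gene above a random LOF-tolerant gene about 81% of the time (assuming disease genes are the positive class). A minimal pairwise implementation, with made-up scores:

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores (higher = more likely a recessive disease gene).
disease_genes = [0.9, 0.8, 0.4]
lof_tolerant_genes = [0.7, 0.3, 0.2]
print(roc_auc(disease_genes, lof_tolerant_genes))  # 8 of 9 pairs ranked correctly
```

The pairwise loop is quadratic but transparent; for large gene sets a rank-based (Mann-Whitney) formulation gives the same value faster.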


  • Find more LOF variants for better classification
  • Improve upstream processes
  • Improve the human reference sequence
  • Use catalogs of LOF-tolerant genes for better disease gene prediction

Comments

  1. Thanks Anthony! Your summary is clearer than my talk was, I suspect.

    One minor correction (to the bit where you already noted you might have made a mistake) – that plot of the effect of stop codons on RNA expression showed the number of reads mapping to the loss-of-function allele vs the functional allele in heterozygous individuals. What we found is that variants predicted to undergo NMD showed significantly lower expression of the stop codon allele. That’s what you would expect – however, this didn’t affect all variants predicted to undergo NMD, indicating that we still need to improve our models for predicting which stop codons will trigger this process.

  4. Great summary and great blog. Apologies for being a bastard in the past. You are performing a wonderful service here, the things I complained about in the past are obviously not a part of your psyche. Deeds mean more than words. Thanks for doing this.

    • I’m glad I’ve managed to change your opinion of me – and I appreciate the change in tone. I’ve also learned a lot about tone and editing from that experience, so I think it’s all good.

      At any rate, I’m glad you’re finding the posts to be useful. I had fun doing them, and it’s rewarding to know that others appreciate it as well.

