Issues with Python 3.6.3 Multiprocessing. Anyone else see the same?

On the ferry once more, and wanted to share a hard-fought lesson that I learned today, which somewhat updates my post from the other day on multiprocessing and python 3.6.3.

Unfortunately, the lessons weren’t nice at all.

First, I discovered that using the new Manager object is a terrible idea, even for incredibly simple objects (e.g. an incrementing value, incremented every couple of seconds). The implementation is significantly slower than creating your own object out of a lock and a shared value, just to have two processes take turns incrementing the value. Ouch. (I don’t have my benchmarks handy, unfortunately, but IIRC the hand-rolled version ran in about 10% of the time.)
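
For what it’s worth, here’s a minimal sketch of the kind of comparison I mean – two processes taking turns bumping a shared counter, once with a raw Value and Lock, and once with the Manager’s proxies. This isn’t my original benchmark (that’s long gone), and the iteration count is arbitrary, but it should reproduce the gap on most machines.

```python
import time
from multiprocessing import Lock, Manager, Process, Value

N = 10000  # increments per process; arbitrary, just enough to see the gap

def bump(counter, lock):
    # Increment a shared counter under a lock. Works for both the raw
    # Value/Lock pair and the Manager-served proxies.
    for _ in range(N):
        with lock:
            counter.value += 1

def timed_run(counter, lock):
    # Two processes take turns incrementing the same counter.
    procs = [Process(target=bump, args=(counter, lock)) for _ in range(2)]
    start = time.time()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.time() - start

if __name__ == '__main__':
    print('raw Value + Lock:', timed_run(Value('i', 0, lock=False), Lock()))
    with Manager() as mgr:
        print('Manager proxies :', timed_run(mgr.Value('i', 0), mgr.Lock()))
```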

Worse still, using a manager.Queue object is horrifically bad. I created an app where one process reads from a file and puts things into a queue, and a second process reads from that queue and does some operations on the object. Now, my objects are just small lists with one integer in them, so they’re pretty small. Switching from a multiprocessing Queue to a Manager Queue caused a 3-fold increase in the time to execute (5 seconds to 15 seconds). Given that the whole reason for writing multiprocessing code is to speed up the processing of my data, the Manager is effectively a non-starter for me.
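
The setup looked roughly like this – a simplified sketch, not my actual code, with the payload and item count arbitrary: one producer fills the queue, one consumer drains it, and you time the whole thing once with a multiprocessing.Queue and once with a Manager().Queue().

```python
import time
from multiprocessing import Manager, Process, Queue

SENTINEL = None
N_ITEMS = 50000  # arbitrary; my real runs used a file's worth of lines

def producer(q):
    # Stand-in for the file reader: push small one-integer lists.
    for i in range(N_ITEMS):
        q.put([i])
    q.put(SENTINEL)

def consumer(q):
    # Pull items until the sentinel arrives and do a trivial bit of work.
    total = 0
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        total += item[0]

def timed_run(q):
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    start = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return time.time() - start

if __name__ == '__main__':
    print('multiprocessing.Queue:', timed_run(Queue()))
    with Manager() as mgr:
        print('Manager().Queue()    :', timed_run(mgr.Queue()))
```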

I understand, of course, that that overhead might be worth it if your Manager runs on a separate server, and can make use of multiple machines, but I’m working on the opposite problem, with one machine and several cores.

The second big discovery, of course, was that multiprocessing Queues really don’t work well in python 3.6.3. I don’t know when this happened, but somewhere along the line, their behaviour has changed.

In 2.7, I could create one process that fills the Queue, and then create a second type of process that reads from the queue. As long as process 1 is much faster than process 2, the rate-limiting step would be process 2. Thus, doubling the number of process 2’s should double the throughput of the job.

Unfortunately, in 3.6.3, this is no longer the case – the speed with which the processes obtain data from the queue is now the rate-limiting step. Process 2 can call Queue.get(), but get only serves the data at a constant rate, no matter how many process 2’s are calling the Queue.get() function.

That means that you can’t get any speed up from multiprocessing Queues…. unless you have a single queue for every process 2. Yep… that’s what I did this afternoon: replaced the single queue with a list of queues, so that each processing process gets its own Queue.
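
Something along these lines – a rough sketch of the workaround, with the worker count, the payloads and the round-robin dispatch purely illustrative:

```python
from itertools import cycle
from multiprocessing import Process, Queue

N_WORKERS = 4
SENTINEL = None

def worker(q):
    # Each worker drains only its own private queue.
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        # ... process the item here ...

if __name__ == '__main__':
    queues = [Queue() for _ in range(N_WORKERS)]
    workers = [Process(target=worker, args=(q,)) for q in queues]
    for w in workers:
        w.start()

    # Deal the work out round-robin across the per-worker queues.
    targets = cycle(queues)
    for i in range(100000):            # stand-in for lines read from a file
        next(targets).put([i])

    for q in queues:
        q.put(SENTINEL)                # one sentinel per queue shuts its worker down
    for w in workers:
        w.join()
```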

Bad design, you say? Yes! I agree. In fact, since I now have a set of queues in which there’s only one writer and one reader, I shouldn’t be using queues at all. I should be using Pipes!

So, tomorrow, I’ll rip out all of my queues, and start putting in pipes. (Except where I have multiple processes writing to a single queue, of course.)
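
The plan looks roughly like this – again just a sketch, with one private Pipe per worker and the main process dealing the work out:

```python
from multiprocessing import Pipe, Process

SENTINEL = None
N_WORKERS = 4

def worker(conn):
    # Read from this worker's private end of the pipe until the sentinel.
    while True:
        item = conn.recv()
        if item is SENTINEL:
            break
        # ... process the item here ...
    conn.close()

if __name__ == '__main__':
    parent_conns = []
    workers = []
    for _ in range(N_WORKERS):
        parent_conn, child_conn = Pipe()
        w = Process(target=worker, args=(child_conn,))
        w.start()
        child_conn.close()             # the child keeps its own copy open
        parent_conns.append(parent_conn)
        workers.append(w)

    # Deal the work out round-robin, exactly as with the per-worker queues.
    for i in range(100000):
        parent_conns[i % N_WORKERS].send([i])

    for conn in parent_conns:
        conn.send(SENTINEL)
        conn.close()
    for w in workers:
        w.join()
```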

I don’t know where multiprocessing in python went wrong, but that was a severely disappointing moment this morning when I discovered this issue. For now, I’ll resist the urge to return to python 2.7.

(If anyone knows where I went wrong, please let me know – we all make mistakes, and I’m really hoping I’m wrong on this one.)

Bioinformatics toolchain

Once again, it’s a monday morning, and I’ve found myself on the ferry headed across the bay, thinking to myself, what could be better than crowdsourcing my bioinformatics toolchain, right?

Actually, this serves two purposes: it’s a handy guide for myself of useful things to install on a clean system, as well as an opportunity to open a conversation about things that a bioinformatician should have on their computer. Obviously we don’t all do the same things, but the concepts should be the same.

My first round of installs were pretty obvious:

  • An IDE (Pycharm, community edition)
  • A programming language (Python 3.6)
  • A text editor (BBEdit… for now, and nano)
  • A browser (Chrome)
  • A package manager (Brew)
  • A python package manager (pip)
  • Some very handy tools (virtualenv, cython)
  • A code cleanliness tool (pylint)

I realized I also needed at least one source control tool, so the obvious choice was a private github repository.

My first order of business was to create a useful wrapper for running embarrassingly parallel processes on computers with multiple cores – I wrote a similar tool at my last job, and it was invaluable for getting compute-heavy tasks done quickly, so I rebuilt it from scratch, including unit tests. The good thing about that exercise was that it also gave me an opportunity to deploy my full toolchain, including configuring pylint (“Your code scores 10.0/10.0”), and github, so that I now have some basic organization and working environment. Unit testing also forced me to configure the virtual environment and the dependency chains of libraries, and ensured that what I wrote was doing what I expect.

All in all, a win-win situation.

I also installed a few other programs:

  • Slack, with which I connect with other bioinformaticians
  • Twitter, so I can follow along with stuff like #AMA17, which is going on this weekend.
  • Civ V, because you can’t write code all the time. (-:

What do you think, have I missed anything important?

A few hints about moving to Python 3.6 (from 2.7) with Multiprocessing

To those who’ve worked with me over the past couple years, you’ll know I’m a big fan of multiprocessing, which is a python package that effectively spawns new processes, much the same way you’d use threads in any other programming language.  Mainly, that’s because python’s GIL (global interpreter lock) more or less throttles any attempt you might seriously make to get threads to work.  However, multiprocessing is a nice replacement and effectively sidesteps those issues, allowing you to use as much of your computer’s resources as are available to you.
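
As a toy illustration (not anything from my actual code), a CPU-bound loop farmed out over a Pool of processes will happily use several cores at once, where the threaded equivalent would be serialized by the GIL:

```python
from multiprocessing import Pool

def busy(n):
    # A deliberately CPU-bound function: threads would fight over the GIL
    # here, but separate processes each get their own interpreter and core.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        results = pool.map(busy, [5_000_000] * 4)
    print(sum(results))
```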

Consequently, I’ve spent part of the last couple days building up a new set of generic processes that will let me parallelize pretty much any piece of code that can work with a queue.  That is to say, if I can toss a bunch of things into a pile, and have each piece processed by a separate running instance of code, I can use this library.  It’ll be very handy for processing individual lines in a file (e.g. VCF or fastq, or anything where the lines are independent).
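
The skeleton of the pattern is roughly this – a simplified sketch rather than my actual library, with the file name and the per-line function as placeholders:

```python
from multiprocessing import Process, Queue

SENTINEL = None

def filler(path, q, n_workers):
    # Read the file and push one line at a time onto the queue,
    # then send one sentinel per worker so they all shut down.
    with open(path) as handle:
        for line in handle:
            q.put(line)
    for _ in range(n_workers):
        q.put(SENTINEL)

def worker(q, func):
    # Apply the user-supplied function to every line pulled off the queue.
    while True:
        line = q.get()
        if line is SENTINEL:
            break
        func(line)

def count_fields(line):
    # Placeholder per-line operation: count tab-separated fields.
    return len(line.rstrip('\n').split('\t'))

if __name__ == '__main__':
    n_workers = 4
    q = Queue()
    procs = [Process(target=worker, args=(q, count_fields))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    filler('example.vcf', q, n_workers)    # placeholder file name
    for p in procs:
        p.join()
```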

Of course, this post only has any relevance because I’ve also decided to move from python 2.7 to 3.6 – and to no one’s surprise, things have changed.  In 2.7, I spent time creating objects that had built in locks, and shared c_type variables that could be passed around.  In 3.6, all of that becomes irrelevant.  Instead, you create a new object, a Manager().

The Manager is a relatively complex object, in that it has built in locks – for which I haven’t figured out how efficient they are yet, that’s probably down the road a bit – which makes all of the Lock wrapping I’d done in 2.7 obsolete.  My first attempt at making it work was a failure, as it constantly threw errors that you can’t put Locks into the Manager.  In fact, you also can’t put objects containing locks (such as multiprocessing Value) into the Manager. You can, however, replace them with Value objects from the manager class.
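
To make that concrete, here’s a small sketch (not my actual code) of the failure and the fix: handing a raw multiprocessing Value to the Manager blows up, while the Manager’s own Value and Lock proxies can be passed around freely.

```python
from multiprocessing import Manager, Process, Value

def increment(shared, lock):
    # Bump a Manager-served Value under a Manager-served Lock.
    for _ in range(1000):
        with lock:
            shared.value += 1

if __name__ == '__main__':
    with Manager() as mgr:
        state = mgr.dict()
        try:
            # A raw Value contains a lock, so it can't be sent to the Manager.
            state['counter'] = Value('i', 0)
        except RuntimeError as err:
            print('raw Value rejected:', err)

        # The Manager's own Value and Lock proxies work fine.
        counter = mgr.Value('i', 0)
        lock = mgr.Lock()
        procs = [Process(target=increment, args=(counter, lock))
                 for _ in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print('final count:', counter.value)   # 2000
```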

The part of the Manager that I haven’t played with yet is that it also seems to have the ability to share information across computers, if you launch it as a server process.  Although likely overkill (and network latency makes me really shy away from that), it seems like it could be useful for building big cluster jobs.  Again, something much further down the road for me.
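
For reference, the server-process mode looks roughly like the BaseManager pattern in the standard library docs. I haven’t actually tried this yet, and the host name, port and authkey below are placeholders:

```python
import queue
import sys
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    """Manager subclass used to expose a shared queue over the network."""

job_queue = queue.Queue()

if __name__ == '__main__':
    if sys.argv[1:] == ['server']:
        # On the server machine: serve the queue to remote clients.
        QueueManager.register('get_queue', callable=lambda: job_queue)
        manager = QueueManager(address=('', 50000), authkey=b'changeme')
        manager.get_server().serve_forever()
    else:
        # On a worker machine: connect and use the queue like a local one.
        QueueManager.register('get_queue')
        manager = QueueManager(address=('server.example.org', 50000),
                               authkey=b'changeme')
        manager.connect()
        remote_q = manager.get_queue()
        remote_q.put('hello from a remote worker')
```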

Although not a huge milestone, it’s good to have at least one essential component back in my toolkit: My unit test suite passes, doing some simple processing using the generic processing class.  And yes, good code requires good unit tests, so I’ve also been writing those.

Lessons learned the hard way are often remembered the best.  Writing multiprocessing code out from scratch was a great exercise, and learning some of the changes between 2.7 and 3.6 was definitely worthwhile.

Dealing with being a lone bioinformatician – social media.

As I settle into my new job, I’ve quickly realized that I’m going to be a “lone bioinformatician” for a little while, and that I’m going to have to go back to my old habits of twitter and blogging, in order to keep up with the world around me.  In addition, I’m finding myself on slack as well, in the reddit bioinformatics channel.  The idea is that I’ll be able to keep in touch with developments in my field better this way.

That said, my current following list is heavily tilted towards non-bioinformatics, so I’ve begun the long journey of purging my list.  (If I’ve unfollowed you… sorry!)  The harder part will be trying to figure out who it is that I should be following.

The bright side of this is that the long ferry rides at either end of my day are giving me time to do some of this work, which is an unexpected bonus. I had no idea that adding to my commute time would also add to my productivity.

That said, if anyone has any suggestions about who I should be following on twitter or in blog format, please let me know – I’ll cheerfully compile a list of twittering/blogging bioinformaticians, or if you already know of a current list, I’d love to hear about it.

In the meantime, if you’re interested in joining a bioinformatics slack, please let me know, and I’d be happy to add you.

On a boat – and starting new things.

Well, it’s a ferry.  Does that count?

My new commute plan takes me from Oakland to South San Francisco by boat, across the bay on the ferry, with a short bike ride on either side.  Given that this is still day 2 of taking the ferry, I’ve still got that warm glow of disbelief that I get to watch the sunrise and sunset from a boat.  Too cool.

Anyhow the important thing is why I’m doing this, which should obviously be because I have a new job.  After three and a half years with Fabric Genomics, it was time to move on.  I left all of my work there in good hands, and the projects I wanted to finish were all wrapped up… and now I’ve got an opportunity to do some real bioinformatics, and not just engineering.  That’s a huge draw for me, really.  I miss doing algorithm design and working with the data, which is pretty much the part of bioinformatics that drew me to the field in the first place.  It’s nice to know that I can do kick-ass engineering, but it’s hard to see myself doing it much longer.

Anyhow, I’m very excited about my new job at Tenaya Therapeutics, and super thrilled to be working with a really awesome group of people.  Unlike many pharmaceutical companies, they’re thinking about their data right from the start.  That may seem obvious, but it honestly wasn’t – I’ve spoken to a lot of companies that had grown to 300+ people, with tons of research programs, and were just now thinking that they should hire someone who understands large scale data.  At that point, it’s way way way too late.  No matter how fast you’ll run, one bioinformatician will never be able to keep up with 60+ scientists generating data.

At any rate, I’d love to say more about what I’m doing, but that’s a conversation I’ll have to start up.  As I’ve learned over the years, surprises aren’t good for anyone, unless it’s a birthday.

Stay tuned for more.

#AGBTPH – Kenna Mills Shaw, Precision oncology decision support: Building a tool to deliver the right drug(s) at the right time(s) to the right patient(s).

[I have to get to the airport to catch a flight, so I can’t stay for the whole talk…. d’oh]


Very narrow definition of precision medicine:  Use NGS to find patients who may respond better to one drug or another, or be resistant to a class of drugs: just matching patients to drugs.

Precision medicine is completely aspirational for patients.  We still do a bad job of figuring out how to match patients with drugs.  Right now, we don’t do it well – or at all.

We’re all bad at it, actually.

  • which patients should get tested?
  • use data to impact care
  • demonstrating data changes outcome
  • deciding how much of genome to sequence
  • how do we pay for it?

Why was MD Anderson bad at it?

Patients of concern are those who have exhausted standard therapies, for instance.

Drop in cost leads to increases in data generation.  We all suck at using this data to impact outcomes for patients.  MD Anderson was only able to impact 11% of patients with potentially actionable information.

Whole exome sequencing at other institutes was getting 5% (Beltran et al.)

There are only 125 “actionable” genes.

NGS is not sufficient or necessary to drive personalized medicine.


  • solid tumours, behind liquid tumours because it’s hard to get the DNA.
  • Accessibility  – timing of data
  • Attitudes of doctors as well.

Leukaemia docs also use the molecular signature as well as other data to clarify.  Solid tumour docs do not.

Ignoring copy number, only 40% of patients have actionable variants.  (goes way up with copy number.)

Clinical trials categorized by type of match – even broadly, that’s 11% of patients.  Lack of enrolment not due to lack of available matched trials.

[Ok… time to go… alas, can’t stay to see the end of this talk.]

#AGBTPH – Imran Haque, Overcoming artificial selection to realize the potential of germ line cancer screening

@imranshaque – Counsyl

Selfie-related deaths:  Indiscriminate killer – equal risk for men vs. women.  40% of related deaths occurred in India.  10% of those who sing in the car….   About on par with shark attack deaths.

Cancer genomics is about 30 years old.  RB1 (1986).  Today many genes are known to be implicated in cancer.  Many of the more recent ones are less penetrant.

You can now get a commercial NGS test for 39-42 genes – and it’s relatively cheap.  How to get it:  1: get cancer, or 2: be related to someone who had cancer.

The model is under strain.

Access to “free” genetic testing for cancer risk is gated by personal and family history.

Very complicated decision tree.  Personal history of breast cancer (long list of tree)… or other cancers or many many other factors.  Why is this bad?  Requires a 3rd degree pedigree, which may be too complex for an appointment.  Only a small number of patients who qualify actually get the test: 7%.

Counsyl – First Care. (Product)  Helps you do your pre-test consult before you go into the clinic.  Then, offer follow up with genetic counsellor.  Reports it back to physician for appropriate treatment.  Anecdotally doing very well and increasing the number of patients who qualify for free testing.

Some insurers require additional barriers to get testing.  Patients may also be required to do pre-testing.  This helps to bring genetic counselling into the picture, and guarantees that the right tests are being used.

Counsyl can evaluate that – A large segment of population cancels the test if the requirements of pre-counselling are put in place.  Pre-test counselling is not being seen as a bonus.


A good amount of cancers are driven by the same 2 genes (BRCA1/2).

Ability to integrate all high-risk genes into a single test + discovery of new “moderate risk” genes has nearly doubled the yield of diagnostic germline testing.  Expanded tests help, but still, total yields are around 7%.

Twin study: 1/3 of cancer risk comes from genetics.  Up to 55% for prostate cancer, but depends widely on the type of cancer.

Breast cancer: 20% heritability from single-gene penetrant alleles

Prostate Cancer: 55% heritability, but <5% from known single gene effects.

[Survey of literature, covering screening, risk and actionability.]


Most genetic studies are done on non-diverse cohorts.  VUS rates differ systematically by ethnicity: BRCA1/2 ~3% for Europeans, ~7% for Africans and Asians. Similar for larger cancer panels.  Correlate to panel size as well, and systematic across other diagnostic panels.

Lack of diversity in the discovery cohort leads to a seriously skewed ability to process non-European populations. Worse, possible misdiagnoses for non-white populations.


better systems to improve access;  better studies to demonstrate utility of bringing testing to wider population.

Polygenic risk is important and needs to be studied.

Issues of diversity are still plaguing us.  Need to include much more diverse populations.

#AGBTPH – Nikolas Papadopoulos, Detection of rare somatic mutations in bodily fluids in the era of precision medicine: Challenges and opportunities

Johns Hopkins School of Medicine

Data Analysis in Cancer Management: Risk Assessment.  Looking for germ line variants in families.

In cancer management: on tissue.  Prognosis, classification and response.

In the absence of tissue: are there still cancer cells present?  Are there actionable changes?  Are there signs of cancer?

Use ctDNA to answer some of these questions.  ctDNA is a minority of the DNA that’s present.  Half-life is 30 min to one hour.


  • Technical – specificity and sensitivity
  • Biological: is there detectable amount of DNA in fluid
  • Interpretation: How do you do it?

Favourite markers: somatic mutations, indels and rearrangements.  Very good specificity.

Technical challenges – finding the needle in the haystack.  How do you do it?  Used to be that you would search the hay one by one… digital PCR for instance.  Now, we marry the digital signal to NGS.  Disadvantages: error rate is high for the clinical applications.  Early detection.

Safe-SEQs: Method

  • assignment of unique identifier to each molecule
  • a UID is used to group reads from a common template.
  • Use a PCR based approach. (Allows detection at 0.01%.)
  • divide results into mutant, non-mutant and artifact.

Which tumour types can we detect in cancer?  sensitivity depends on cancer type.  Best for bladder, colorectal, ovarian, gastroesophageal, etc.  Bad for Thyroid, Glioma, prostate, renal cell, etc.

ctDNA is a dynamic biomarker.  Can be used to follow tumour.

Stage II colorectal cancer.  80% of patients cured by surgery alone.  Good setting for ctDNA.  230 patients.  Plasma for 4 to 10 weeks.  >1000 plasmas analyzed.   Physicians blinded to ctDNA.

Panel used to probe small number of variants.  ctDNA is a strong marker for recurrence.  If it’s not there, patients fare very well.

ctDNA detects minimal residual disease and predicts recurrence.  Provides real-time measurement of disease burden.

Detection of Resistance.

Liquid biopsy resistance mutations.  Pre-treatment, no mutations, post treatment, many did. 0-12 mutations.

Early detection: the holy grail

The most difficult and the longest to get out of the lab.  We know it works.  People will be missed if not part of screening.  Localized disease is frequent.

For early detection, the source makes a difference.  Stool works better than plasma for colorectal cancer.  CNS tumours are not detectable in plasma, pap smear tests work best for ovarian and endometrial cancer.

Doing both plasma and pap smear, improves over either one alone.

Pancreatic cancers are tough – However, a lot of cysts are being found.  Cyst types have different genes mutated.  Cyst fluid gathered in a test – SNP analysis done, 1026 resected cysts.

Goal: reduce surgeries, without leaving anything behind.

Summary: It can be done, it can be done well.

Vision: Prevent advanced tumours by integrating such tests into routine physical exams.

#AGBTPH – The Great Debate (Panels vs. Exomes vs Genomes)

  • Richard Gibbs, Baylor College of Medicine
  • Heidi Rehm, Harvard Medical School, Partners Healthcare
  • Steven Kingsmore, Rady Children’s Hospital

Heidi Rehm is up first.

  • No one test fits all.  Should be influenced by phenotype, insurance, gene of interest, etc. Some genes are incredibly hard to work with (e.g. hearing loss genes) for genome and exome approaches, so panels would be better.  30% of hearing loss issues are CNV, which are very difficult to detect by exome.  If diagnosis is really tough, and there is no panel that caters to the symptoms, then go for genome.
  • We used to operate with labs that focused on single genes – but now our labs are broad, and we’ve lost all of the experts on single genes.  Knowledge management is very hard in this environment.  Sometimes genomes miss things that panels catch, and vice versa.   At the end of the day, it’s all context specific.  You also have to factor in insurance coverage – and insurance companies really aren’t interested in secondary findings, etc.
  • Address all of the other complexities: SVs, CNVs, etc.  But these are expensive, and physicians will still go with the cheapest labs.

Richard Gibbs is next

  • The two points of view here are :
    • whole genomes are better
    • no, it’s more nuanced, it depends on the context
  • Most of us are probably in the second camp, if we were pressed.
  • The quality of the genomes, exomes and panels is still a huge factor.   Genomes DO miss things.
  • Exomes are getting better – we’re now up to about 84% coverage of every base in every gene in the coding genome. All genes are there, and most are only missing a small number of bases.
  • Genomes cover everything, but calling is a big deal.
  • CNV coverage is poor on Exomes, but good on genomes and panels.
  • Cost is linear between panels, exomes and genomes.  Informativeness relatively linear as well, but you can explode cost if you want to do more on a genome.
  • If you have a choice between trio of exomes vs genomes, you need to consider that too.
  • The vendor is selling instruments that have distorted prices for exomes, which could be as cheap as panels, if they wanted.

Steven Kingsmore:

  • Going to be provocative, and push hard. [devil’s advocate]
  • The Genome IS the ultimate test, and it’s incredibly informative.  Everyone needs a genome!
  • Genomes
    • Doesn’t miss the exons of all the genes
    • You don’t miss the intronic variants you’d like to see.
    • You can look at all of the reads to see what you’re missing in the gene of interest.  can’t really look at that in panels and exomes.
  • Problem is that they’re expensive.  At least 4 times more than an exome (and that’s 4 times more than a panel).  Payers don’t pay us much.  How much money do you want to lose?  People don’t mine the genome either – they’re doing genomes, but only get back what they’re looking for.
  • Special cases, though: Cancer needs to go REALLY deep – 30x isn’t enough for somatic mutations.  Exomes will win there.
  • Hybrid Genomes – the next great things: matching long read sequencing with short read sequencing. Rich mining of indels/SVs.  SO EXCITING!
  • Panel < Exome < Genome << Rich/Hybrid Genome (EVERYTHING ON STEROIDS….  which costs even more.)

Moderator: Why are hybrids a good thing? Why is it so much better?  We can already get SVs, but what we can’t see, why would we expect to see it then?

RG: Just on the cusp of giving you all the rich information that we want.  If cost wasn’t a limit, we’d probably come up with a $40,000 bill for each genome.  Yes, we want rockets to Mars, but the cost is much more nuanced now.

SK: Everyone is spending the same amount on a battery of tests, but genomes give us that anyhow.

RG: But we’re still missing stuff because of the technology.  You’ll still miss critical expansion regions anyhow.

Q: We’re so far beyond where the physicians are, but ultimately, we’re dealing with uninterpretable content in the genome.  Why is this really relevant?  The real question: there’s value in all of these, but if we want this to translate to the clinic, how do we convince them any of this is useful?

HR: The strategy for reporting is very different between panel, exome, genome.  Panels require an investigation of every variant in the gene.  Even if you have a VUS, you can follow up. In genome/exome, you cannot interpret every variant, so you filter.  Sometimes those filters don’t do a good job.  So, we miss things not because they’re technically missed, but because the clinicians and bioinformatics are failing to do what they should.

First lawsuit for misinterpretation of a variant – the lab reported the variant, and requested parental testing in order to upgrade it to pathogenic.  The doctor didn’t pass any of that on to the family, and the family found out years later.    The physicians have to be brought into the conversation.

SK: Have to respond to HR.  Panels tend to overdiagnose.  You have 10 genes, and darn it, you’re going to make a diagnosis.  You tend to want to call something.  Under-diagnosis is the other end, because you miss other genes. Panels are cheap, and that’s why we do it; genomes will eventually be there.  Responding to the question on physicians: moving to hybrid genomes will help, because then physicians just have to know one word: hybrid-genome.   They don’t need to know protocols and panels… Patients shouldn’t have to bounce through specialists until the right test is run.

RG: Is it true that panels lead to overdiagnosis?  It is a persistent problem.

HR: See just as many overdiagnoses from genomes and exomes – it’s not unique to panels.

Q: (Fawzan) Point 1: litigation, Point 2: variant is there, but we fail to see it.  That combination is terrible, so we’re amplifying the odds of being sued!  We aren’t discussing that enough.

HR: I don’t think any of us are liable for doing the tests that are requested or doing our best for the patient.  Whoever it is, if you’re doing your best and following your protocol, then you’re not open to liability.  If we diverge from protocol, or don’t validate, that’s when we become liable.  There’s not always a right or wrong.  We should be doing better, but that’s not unique to medicine.

Q (Follow up): If you’re in a court room, that 30% -40% difference in diagnosis rate may play differently.

HR: That’s why we do so many validation tests – The bioinformatics is maturing, and early versions didn’t do a great job – but it’s improving all the time.  It’s easy to say what should be done, but technically doing the pipeline is much more complex, data is more complex, nomenclature is not perfect.  Filters are very very hard to do, and pipelines need to be validated extensively.  it’s a challenge.

Q (Follow up):  When we miss something it’s usually because the filters are wrong – and with genomes this is just again opening up to liability.

SK: Your argument is silly.

Q (Follow up) : just being provocative!

SK: [reductio ad absurdum]  Patient goes to doctor with a headache…

Q: It’s contextual, we all agree.  In the practice setting, where diagnostics are being ordered, the constraints on the doctors are even tighter.  How do you think about getting any of these tests (even panels!) accepted outside the academic environment?

RG: Kingsmore promotes genetic literacy by not burying people – applying filters that make the data less complex for physicians. We’ve even put filters in place to mimic that.

HR: A panel done on an exome or genome backbone is good – that is a good transition.  A virtual panel constructed in a way that appeals to the physician and mirrors the standard of care for an off-the-shelf test is a good intermediate – and allows physicians to return to that data and unmask the next set of genes.  Iteratively go forth to reduce the cost, but not change complexity for the physician.

(@notSoJunkDNA): This debate ignored cancer – for another day.  We see resources like ExAC, which help aggregate data.  Thus whole genome is the only investment in the future – which should be factored into the cost.

HR: Yes, by building that resource, we are making a huge impact. ExAC is the single most useful resource of the last 20 years.  However, we can’t put that expense on our patients.  We can’t be short-sighted; we still need to care for patients.

RG: Want to use a different argument to disagree with SK.  Whole genomes are not good enough yet.  Let’s not burn all our dollars now before the genomes are great quality; let’s get to the great genome, then do it.

SK:  Yes, there are flaws with genomes, but the point is well made: in an ideal world we should be getting genomes on everyone, and putting them, aggregated, into the public domain to allow us to tackle major issues.  Really like that idea, and willing to help subsidize the incremental cost of that. The idea that you get one report, and can return to the genome and reanalyze it, makes a huge impact.

M: We can generate them, but we can’t analyze them.  It’s not a great genome if you can’t analyze it.

RG: Eventually  someone is going to go back and use better methods to reanalyze what we’ve done.

Q: Hard studies are tough, but worth doing.  Q to RG: if you had a family member, would you really do 30x WGS with existing technology?  Wouldn’t you do germline/somatic?

RG: It’s different between cancer and mendelian.  It’s also different for family, for discovery, and for managing a health care portfolio.  Hypothetical emotional questions have to be separated from the data questions.

HR: we have limited resources to do analysis anyhow.

RG: Have to ask what’s advancing the research agenda.

SK: we do have a lack of objective evidence. Most of our community is doing panels like this one, where we see who argues the best.  Analysis used to be an art – parameterization takes a huge amount of work and investment.  What we’re doing with sequencing is the same, and when the hard data shows up, we’ll convince the payers.

Q:  At what price does it shift over from WES to WGS?

SK: I don’t think we’ve done clinical utility studies – it’s the missing piece.  There are studies starting to look at this.  There are way more studies needed for cost effectiveness.

HR: it’s not just prices, we’re losing something with the broader scope.  It’s a tradeoff that has to be examined, outside of cost.

M: RG, if you could put a certain test on a certain machine – if you could run your exome on a different machine – are policies limiting the best, highest quality health care?

RG: Conflicting answer:  SK is right, we haven’t got all the studies done.  Even if it’s free, we don’t know how it would work.  However, there are medical tests that have a price inflection that is convincing.  At <$100 exomes, we would open up many new avenues.

HR: Need to look at numbers, what do the old vs. new numbers say about the changing tests.

M: We could drive down costs if we could use exomes on different machines…. that’s a major issue.

Q: How do you think consumer genomics is going to change our field?   (Like GMO?)

SK: Great point – while we argue over which is best, there is an impending nightmare where traditional medicine becomes eclipsed.  Like Herbal medicine, chiropractors… there are things that medical fields didn’t embrace, but public wants it.  Patients will start wandering into their physicians with results  and looking for information.

Q (Follow up): Especially if quality they bring in is terrible.

HR: Terrible backlash to GMO because there was no public debate, and that lack of debate had huge negative impact.    It’s really important that we have those discussions.

Q: Moving target scenarios: cancer mutations, antibiotic resistance.  How do we balance coverage and cost – what’s best approach in that setting?

SK: Cancer, you want both exome and genome, match germline, tumour….. [everything!]

M: Out of time!  What should reimbursement do?

HR: It should be paying, all the hype has missed the fact that this is incredibly helpful for those who get diagnosis.  It’s on the community to do a better job to communicate and share resources so we can decide which tests are the right tests.

RG: Swayed that the top-down tests actually are good.  We shouldn’t be comparing exome vs panel.  Genomes aren’t there yet, but we should be looking to them in the future.

SK: We’re privileged to be living in the day of NGS, and we should be looking at enabling all of the options for clinicians!

#AGBTPH – Stephan Kingsmore, Delivering 26-hour diagnostic genomes to all NICU infants who will benefit in California and Arizona: Potential impact bottlenecks and solutions.

Rady Children’s Hospital.

Translating rapid whole genome sequencing into precision medicine for infants in intensive care units.

60 slides, 30 minutes… buckle up.

Largely, this was triggered by Obama and Collins.  In San Diego, Rady donated $160M and said “make this a reality.”

This is all still an early stage.  We’re at the 0.25%… it’s going to take 10 years to deliver on this dream and make it medicine.

Scope: 35M people in California, and we can make it into a precision medicine centre.  Focus on newborns – when a baby is born, doctors will do anything to save a baby’s life.  In CA, all babies feed into a network of hospitals down to specialized centres for expert care.  It’s a small number of health care systems that deliver care for babies.

Can we provide a scalable service like the NIH’s, and make an impact?

Why?  14% of newborns are admitted to the NICU or PICU. Leading cause of death is genetic diseases: 8250 genetic diseases.  Individually, they are rare, but aggregated they are common.  Conventional testing is too slow, and the cost of care is $4000/day, so genomics is cheap comparatively.

Surviving: 35 babies in level 5 NICU… median survival is 60 days with genetic diseases…

Why single gene diseases?  They are tractable.  Looking for 1-2 highly penetrant variants that will poison a protein.  We have infrastructure that can deal with this information.  Orphan drugs are becoming a part of the scene.  Potentially, gene therapy might be scalable and real.

GAP: how do you scale the 26-hour diagnosis nationally?    Any clinic?  Where there are no geneticists… etc.

It is possible to have dynamic EHR agents that monitor constantly.  How do you do it for babies?  [Review case presented earlier in conference.]

Disease heterogeneity is an issue – children may not have yet grown into phenotype.  Vast number of diseases, limited number of presentations.  So, start by Data mining medical record, then translate into a differential diagnosis.  Use HPO to calculate a projection of symptoms, which can be checked against other disorders.

Computer-generated list of 341 diseases that may fit the features.

Also, then, need a genome/exome.  Which one do we do?  Speed, sensitivity and specificity.  Genomes: one day faster, exomes are cheaper.

[An old Elaine Mardis slide: Fiscal environment:  $1000 genome is still a $100,000 analysis.]

Have a big bioinformatics infrastructure.  Analytics are very good.  But, diagnostic metrics may not be as good.  Use standard filtering tools to work out causative variants.

Major goal should be to automate ACMG style classification.

Structural variants should be included.  Not yet applied in clinical practice.  We are also missing de novo genome assemblies… but that’s coming as well.

When 26 hour process works, it really works.

Big gap: Genome reimbursement.  Quality of evidence is pretty poor.  Need more original research, more randomized control studies.  Standard testing of new diagnostic tests is not good enough.  Payers are far more interested in other metrics.

Other groups have studied this around the world, using exome sequencing.  Diagnosis rate ~28%,  making it most effective method.  (Can be 25-50%, depending on unknown characteristics.)  Quality of phenotype may be a big issue.

WES + EHR can help to raise to 51% diagnosis.

de novo mutations are leading cause of genetic diseases in infants.  Really, forced to test trios.  This is a “sea-change” for the field.

Study: Trio exome sequencing yields 7.4% more diagnoses over sequencing the proband alone.  [Not entirely convincing…]

Another Study: 58% by WES vs. 14% standard methods.   [ And more studies – can’t show numbers fast enough.]

The faster you can turn around the diagnostic, the faster you can get a change in care.

No recurrent mutations in infants treated… but some presentations are enriched for successful diagnoses.

Move on to randomized control study:  just completed, admitted any NICU patient with a phenotype suggestive of genetic disease.  15% molecular diagnosis by standard tests.  41% diagnosis with rapid WGS.  Had to end the trial early because it was clear that WGS was making a massive impact.

Problems and solutions: Focus back on parents and families, who may have different impressions/understanding of testing or methods.  Don’t have enough experts to fill the gap: 850,000 MDs, but only 1100 medical geneticists and 4000 genetic counsellors. (Solution: more training, and possibly other experts?)

Triangle of priorities: Pick 2…

-> Scalable Clinical Utility <-> Rapid <-> Low Cost.  <-


  • Process engineering – scalable highly efficient methods
  • Clinical Research – much better evidence than we have now
  • Education and Engagement – med students need more training, for instance.  (Currently only get a day or a week of genetics…)