Design course – post 1

As promised, here are some updates on my learning UI/UX design… and I have a few thoughts on the subject. Don’t expect this to be coherent. It’s not.

First off, I spent some time looking at courses on the subject and quickly realized I don’t have 20 weeks to dedicate to this, so I’m just going to have to do it on my own, the way I’d have done it in grad school: find a reference, and beat my head on it as much as possible. So, after some digging, I settled on this:

Each lesson seems to take a couple of hours, and I’ve managed to get through the first 5. Beware: like grad school, you’re only going to get out of it what you take the time to learn. Design appears to be something that takes a lot of practice.

The first lesson is pretty abstract – it mostly focuses on the idea that design and web coding are separate skill sets. Given that I’m lacking in both, that doesn’t bother me – I’m here to acquire both! More troublesome, though, is the insistence that the two be done separately. Given that I’m a one-man full-stack developer on all of my current projects, there’s no real arm’s length possible, so I’m just going to have to do the best I can. Mostly, that means drawing things out on paper, and then abandoning those plans as I work out what I can actually accomplish with my limited CSS/JS coding skills.

The next few lessons, focusing on the basics, were pretty useful. I skipped some of the tutorials on the tools of design: I can use a pen and paper, and since I’m not going to be showing my lousy designs to a panel of judges, I haven’t worried too much about that part. However, the sections on typography and layouts were fascinating.

Typography, after reading several essays on the subject, strikes me as entirely subjective. I can summarize it as this: if it makes your page readable and clean-looking, you’re doing well. Don’t go overboard with more than 2 fonts, don’t pick fonts that don’t do what you want, and don’t try to use CSS/HTML to make fonts do things they weren’t designed to do. The take-away message is that you just have to go by what you think looks OK.

Considering that I previously just used the default fonts for everything, though, that’s already a good lesson for me.

The real take-away from all of this was the section on responsive UIs. I didn’t know that was a thing: basically, you use CSS to allow your page to seamlessly resize itself as you grow or shrink the window. Trivial, you might say, but it was eye-opening to me. I didn’t know that was possible – and I didn’t know that there are frameworks and pre-built CSS/HTML examples of it. It completely changed the way I think about layouts.

In fact, so much so that I have started practicing with it already. As a demo workspace, I’ve been templating up a replacement for my domain. What’s there now is 1997 technology. Hopefully in a day or two, I can apply what I’ve learned for a revamp. I don’t know how that’ll go, but it’s a great place to start learning.

If you were expecting a conclusion, however, I’m going to have to disappoint you. After 10 hours of delving into HTML/CSS/JS today, I’m still blindly flailing around. I have much to learn, but at least I can say that I have successfully applied the lessons in the first 5 chapters of the design tutorial. On the bright side, I only have another 45 chapters to go!

UX/UI design time

So, this is something new – and eminently blog-able.  I’ve been given a challenge, which I take seriously.  After decades of working on back ends, it has been brought to my attention that my UI/UX design skills are, shall we say, lacking.

Thus, I am going to embark on a brief journey to learn some design.  Now, I could do this by taking a course and spending 24 weeks on it, but a brief reading of online and on-campus courses tells me that most of the time is spent learning such useful skills as using a text editor, and “HTML”.  And, for my purposes, it’s not that helpful to learn jQuery.  I mostly need to learn how to make a decent page that engages users – aka, just the design part of it.

Obviously, something that I’m not so good at.  No, I’m not going to show you screenshots.  I’ll admit they’re embarrassing… and if I get good at this, I’ll post a before and after picture.

So, the challenge: 5 days to learn how to make a passable web page that encourages use (and doesn’t look like industrial HTML from 1993).  By the time everyone comes back from holidays, I want to have a much firmer grasp of design, and of how to execute that design.

Game on! 

If anyone has any recommendations… uh… yeah, they’re very welcome.

Bioinformatics toolchain

Once again, it’s a Monday morning, and I’ve found myself on the ferry headed across the bay, thinking to myself: what could be better than crowdsourcing my bioinformatics toolchain, right?

Actually, this serves two purposes: it’s a handy guide for myself of useful things to install on a clean system, as well as an opportunity to open a conversation about things that a bioinformatician should have on their computer. Obviously we don’t all do the same things, but the concepts should be the same.

My first round of installs were pretty obvious:

  • An IDE (Pycharm, community edition)
  • A programming language (Python 3.6)
  • A text editor (BBEdit… for now, and nano)
  • A browser (Chrome)
  • A package manager (Brew)
  • A python package manager (pip)
  • Some very handy tools (virtualenv, cython)
  • A code cleanliness tool (pylint)

I realized I also needed at least one source control tool, so the obvious choice was a private GitHub repository.

My first order of business was to create a useful wrapper for running embarrassingly parallel processes on computers with multiple cores – I wrote a similar tool at my last job, and it was invaluable for getting compute-heavy tasks done quickly – so I rebuilt it from scratch, including unit tests. The good thing about that exercise was that it also gave me an opportunity to deploy my full toolchain, including configuring pylint (“Your code scores 10.0/10.0”) and GitHub, so that I now have some basic organization and a working environment. Unit testing also forced me to configure the virtual environment and the dependency chains of libraries, and ensured that what I wrote was doing what I expect.
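
The wrapper itself isn’t something I can share here, but a minimal sketch of the idea – the names and structure below are mine for illustration, not the actual code – looks something like this:

```python
from multiprocessing import Pool, cpu_count

def run_parallel(func, items, workers=None):
    """Run func over items using a pool of worker processes.

    Embarrassingly parallel: each item is independent, so the pool
    can hand them out to however many cores are available.
    """
    with Pool(processes=workers or cpu_count()) as pool:
        return pool.map(func, items)

def square(x):
    return x * x

if __name__ == "__main__":
    print(run_parallel(square, range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The unit tests wrap exactly this kind of behaviour – checking that the parallel output matches what a serial run would produce.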

All in all, a win-win situation.

I also installed a few other programs:

  • Slack, with which I connect with other bioinformaticians
  • Twitter, so I can follow along with stuff like #AMA17, which is going on this weekend.
  • Civ V, because you can’t write code all the time. (-:

What do you think, have I missed anything important?

A few hints about moving to Python 3.6 (from 2.7) with Multiprocessing

To those who’ve worked with me over the past couple of years, you’ll know I’m a big fan of multiprocessing, a Python package that effectively spawns new processes, much the same way you’d use threads in any other programming language.  Mainly, that’s because Python’s GIL (global interpreter lock) more or less throttles any serious attempt to get threads to work.  However, multiprocessing is a nice replacement that effectively sidesteps those issues, allowing you to use as much of your computer’s resources as are available to you.
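
A quick illustration of the idea – this is a generic sketch, not code from my library:

```python
from multiprocessing import Process, Queue

def cpu_bound(n, out):
    # Each Process runs in its own interpreter with its own GIL,
    # so CPU-bound loops like this one genuinely run in parallel.
    out.put(sum(i * i for i in range(n)))

if __name__ == "__main__":
    out = Queue()
    procs = [Process(target=cpu_bound, args=(50_000, out)) for _ in range(4)]
    for p in procs:
        p.start()
    results = [out.get() for _ in procs]  # drain before joining
    for p in procs:
        p.join()
    print(len(results))  # 4
```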

Consequently, I’ve spent part of the last couple of days building up a new set of generic processes that will let me parallelize pretty much any piece of code that can work with a queue.  That is to say, if I can toss a bunch of things into a pile and have each piece processed by a separate running instance of code, I can use this library.  It’ll be very handy for processing individual lines in a file (e.g. VCF or FASTQ, or anything where the lines are independent).
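
The shape of the library is roughly this – a simplified sketch, where the function and variable names are made up for illustration:

```python
from multiprocessing import Process, Queue

SENTINEL = None  # tells a worker there's nothing left in the pile

def worker(in_q, out_q):
    """Process items from in_q until the sentinel arrives."""
    while True:
        line = in_q.get()
        if line is SENTINEL:
            break
        # Stand-in for real per-line work (e.g. parsing a VCF record):
        out_q.put(line.upper())

def process_lines(lines, n_workers=2):
    """Fan lines out to n_workers processes and collect the results."""
    in_q, out_q = Queue(), Queue()
    procs = [Process(target=worker, args=(in_q, out_q)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    for line in lines:
        in_q.put(line)
    for _ in procs:
        in_q.put(SENTINEL)  # one sentinel per worker
    results = [out_q.get() for _ in lines]  # drain before joining
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(sorted(process_lines(["acgt", "ttga", "ccag"])))  # ['ACGT', 'CCAG', 'TTGA']
```

Because the workers only touch whatever comes off the queue, the same scaffolding works for any independent-line processing job.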

Of course, this post only has any relevance because I’ve also decided to move from Python 2.7 to 3.6 – and to no one’s surprise, things have changed.  In 2.7, I spent time creating objects that had built-in locks, and shared c_type variables that could be passed around.  In 3.6, all of that becomes irrelevant.  Instead, you create a new object: a Manager().

The Manager is a relatively complex object, in that it has built-in locks – I haven’t figured out how efficient they are yet; that’s probably down the road a bit – which makes all of the Lock wrapping I’d done in 2.7 obsolete.  My first attempt at making it work was a failure, as it constantly threw errors that you can’t put Locks into the Manager.  In fact, you also can’t put objects containing locks (such as a multiprocessing Value) into the Manager.  You can, however, replace them with Value objects from the Manager class.
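
In other words, something like this works in 3.6 – a minimal sketch, with names of my own choosing:

```python
from multiprocessing import Manager, Process

def tally(shared_count, lock, n):
    # The manager-backed proxies carry their own synchronization,
    # replacing the hand-rolled Lock wrapping needed in 2.7.
    for _ in range(n):
        with lock:
            shared_count.value += 1

if __name__ == "__main__":
    with Manager() as manager:
        count = manager.Value("i", 0)  # Manager's Value, safe to share
        lock = manager.Lock()          # Manager's Lock, safe to pass around
        procs = [Process(target=tally, args=(count, lock, 500)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(count.value)  # 2000
```

Passing a plain multiprocessing.Value or Lock in their place is what triggers the errors described above; the manager-created proxies are the versions designed to cross process boundaries.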

The part of the Manager that I haven’t played with yet is that it also seems to have the ability to share information across computers, if you launch it as a server process.  Although likely overkill (and network latency makes me really shy away from that), it seems like it could be useful for building big cluster jobs.  Again, something much further down the road for me.
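
I haven’t actually tried that mode yet, but judging from the standard library docs, a remote manager is sketched roughly like this – the class name, the shared queue, and the loopback address are all just for illustration:

```python
import queue
from multiprocessing.managers import BaseManager

job_queue = queue.Queue()

def get_job_queue():
    return job_queue

class QueueManager(BaseManager):
    """Manager subclass that can expose objects over a socket."""

QueueManager.register("get_queue", callable=get_job_queue)

if __name__ == "__main__":
    # "Server" side – in a cluster this would run on one machine...
    server = QueueManager(address=("127.0.0.1", 0), authkey=b"secret")
    server.start()

    # "Client" side – ...and this would run on another, with the same authkey.
    client = QueueManager(address=server.address, authkey=b"secret")
    client.connect()
    jobs = client.get_queue()
    jobs.put("job-1")
    print(jobs.get())  # job-1

    server.shutdown()
```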

Although not a huge milestone, it’s good to have at least one essential component back in my toolkit: My unit test suite passes, doing some simple processing using the generic processing class.  And yes, good code requires good unit tests, so I’ve also been writing those.

Lessons learned the hard way are often remembered the best.  Writing multiprocessing code out from scratch was a great exercise, and learning some of the changes between 2.7 and 3.6 was definitely worthwhile.

Dealing with being a lone bioinformatician – social media.

As I settle into my new job, I’ve quickly realized that I’m going to be a “lone bioinformatician” for a little while, and that I’m going to have to go back to my old habits of Twitter and blogging in order to keep up with the world around me.  In addition, I’m finding myself on Slack as well, in the reddit bioinformatics channel.  The idea is that I’ll be able to keep in touch with developments in my field better this way.

That said, my current following list is heavily tilted towards non-bioinformatics, so I’ve begun the long journey of purging my list.  (If I’ve unfollowed you… sorry!)  The harder part will be trying to figure out who it is that I should be following.

The bright side of this is that the long ferry rides at either end of my day are giving me time to do some of this work, which is an unexpected bonus. I had no idea that adding to my commute time would also add to my productivity.

That said, if anyone has any suggestions about who I should be following on Twitter or in blog format, please let me know – I’ll cheerfully compile a list of twittering/blogging bioinformaticians; or, if you already know of a current list, I’d love to hear about it.

In the meantime, if you’re interested in joining a bioinformatics slack, please let me know, and I’d be happy to add you.

#AGBTPH – Kenna Mills Shaw, Precision oncology decision support: Building a tool to deliver the right drug(s) at the right time(s) to the right patient(s).

[I have to catch a flight, so I can’t stay for the whole talk… d’oh]


Very narrow definition of precision medicine:  Use NGS to find patients who may respond better to one drug or another, or be resistant to a class of drugs: just matching patients to drugs.

Precision medicine is completely aspirational for patients.  We still do a bad job of figuring out how to match patients with drugs.  Right now, we don’t do it well – or at all.

We’re all bad at it, actually.

  • which patients should get tested?
  • use data to impact care
  • demonstrating that data changes outcome
  • deciding how much of genome to sequence
  • how do we pay for it?

Why was MD Anderson bad at it?

The patients of concern are those who have exhausted standard therapies, for instance.

Drop in cost leads to increases in data generation.  We all suck at using this data to impact outcomes for patients.  MD Anderson was only able to impact 11% of patients with potentially actionable information.

Whole-exome efforts at other institutes were getting 5% (Beltran et al.).

There are only 125 “actionable” genes.

NGS is not sufficient or necessary to drive personalized medicine.


  • Solid tumours are behind liquid tumours, because it’s hard to get the DNA.
  • Accessibility  – timing of data
  • Attitudes of doctors as well.

Leukaemia docs also use the molecular signature as well as other data to clarify.  Solid tumour docs do not.

Ignoring copy number, only 40% of patients have actionable variants.  (goes way up with copy number.)

Clinical trials categorized by type of match – even broadly, that’s 11% of patients.  Lack of enrolment not due to lack of available matched trials.

[Ok… time to go… alas, can’t stay to see the end of this talk.]

#AGBTPH – Imran Haque, Overcoming artificial selection to realize the potential of germ line cancer screening

@imranshaque – Counsyl

Selfie-related deaths:  An indiscriminate killer – equal risk for men vs. women.  40% of selfie-related deaths occurred in India.  10% of those who sing in the car…  About on par with shark attack deaths.

Cancer genomics is about 30 years old.  RB1 (1986).  Today many genes are known to be implicated in cancer.  Many of the more recent ones are less penetrant.

You can now get a commercial NGS test for 39–42 genes – and it’s relatively cheap.  How to get it: 1) get cancer, or 2) be related to someone who had cancer.

The model is under strain.

Access to “free” genetic testing for cancer risk is gated by personal and family history.

Very complicated decision tree.  Personal history of breast cancer (long list of branches)… or other cancers or many many other factors.  Why is this bad?  It requires a 3rd-degree pedigree, which may be too complex for an appointment.  Only a small number of patients who qualify actually get the test: 7%.

Counsyl – First Care (product).  Helps you do your pre-test consult before you go into the clinic.  Then, they offer follow-up with a genetic counsellor, and report back to the physician for appropriate treatment.  Anecdotally doing very well and increasing the number of patients who qualify for free testing.

Some insurers require additional barriers to get testing.  Patients may also be required to do pre-testing.  This helps to bring genetic counselling into the picture, and guarantees that the right tests are being used.

Counsyl can evaluate that – a large segment of the population cancels the test if pre-counselling requirements are put in place.  Pre-test counselling is not being seen as a bonus.


A good fraction of cancers are driven by the same 2 genes (BRCA1/2).

The ability to integrate all high-risk genes into a single test + discovery of new “moderate risk” genes has nearly doubled the yield of diagnostic germline testing.  Expanded tests help, but still, total yields are around 7%.

Twin study: 1/3 of cancer risk comes from genetics.  Up to 55% for prostate cancer, but it depends widely on the type of cancer.

Breast cancer: 20% heritability from single-gene penetrant alleles

Prostate Cancer: 55% heritability, but <5% from known single gene effects.

[Survey of literature, covering screening, risk and actionability.]


Most genetic studies are done on non-diverse cohorts.  VUS rates differ systematically by ethnicity: BRCA1/2 ~3% for Europeans, ~7% for Africans and Asians.  Similar for larger cancer panels.  Correlates with panel size as well, and is systematic across other diagnostic panels.

Lack of diversity in discovery cohorts leads to a seriously skewed ability to process non-European populations.  Worse: possible misdiagnoses for non-white populations.


Better systems to improve access; better studies to demonstrate the utility of bringing testing to a wider population.

Polygenic risk is important and needs to be studied.

Issues of diversity are still plaguing us.  Need to include much more diverse populations.

#AGBTPH – Stephan Kingsmore, Delivering 26-hour diagnostic genomes to all NICU infants who will benefit in California and Arizona: Potential impact bottlenecks and solutions.

Rady Children’s Hospital.

Translating rapid whole genome sequencing into precision medicine for infants in intensive care units.

60 slides, 30 minutes… buckle up.

Largely, this was triggered by Obama and Collins.  In San Diego, Rady donated $160M and said “make this a reality.”

This is all still an early stage.  We’re at the 0.25%… it’s going to take 10 years to deliver on this dream and make it medicine.

Scope: 35M people in California, and we can make it into a precision medicine centre.  Focus on newborns – when a baby is born, doctors will do anything to save the baby’s life.  In CA, all babies feed through a network of hospitals down to specialized centres for expert care.  It’s a small number of health care systems that deliver care for babies.

Can we provide a scalable service like the NIH’s, and make an impact?

Why?  14% of newborns are admitted to the NICU or PICU.  The leading cause of death is genetic disease: 8,250 genetic diseases.  Individually they are rare, but aggregated they are common.  Conventional testing is too slow, and the cost of care is $4,000/day, so genomics is comparatively cheap.

Surviving: 35 babies in level 5 NICU… median survival is 60 days with genetic diseases…

Why single gene diseases?  They are tractable.  Looking for 1-2 highly penetrant variants that will poison a protein.  We have infrastructure that can deal with this information.  Orphan drugs are becoming a part of the scene.  Potentially, gene therapy might be scalable and real.

Gap: how do you scale the 26-hour diagnosis nationally?  To any clinic, where there are no geneticists, etc.

It is possible to have dynamic EHR agents that monitor constantly.  How do you do it for babies?  [Review case presented earlier in conference.]

Disease heterogeneity is an issue – children may not yet have grown into their phenotype.  Vast number of diseases, limited number of presentations.  So, start by data mining the medical record, then translate into a differential diagnosis.  Use HPO to calculate a projection of symptoms, which can be checked against other disorders.

Computer-generated list of 341 diseases that may fit the features.

Also, then, you need a genome/exome.  Which one do we do?  Speed, sensitivity and specificity.  Genomes: one day faster; exomes are cheaper.

[An old Elaine Mardis slide: Fiscal environment:  $1000 genome is still a $100,000 analysis.]

They have a big bioinformatics infrastructure.  Analytics are very good, but diagnostic metrics may not be as good.  Use standard filtering tools to work out causative variants.

Major goal should be to automate ACMG style classification.

Structural variants should be included.  Not yet applied in clinical practice.  We are also missing de novo genome assemblies… but that’s coming as well.

When 26 hour process works, it really works.

Big gap: genome reimbursement.  Quality of evidence is pretty poor.  Need more original research, more randomized control studies.  Standard testing of new diagnostics is not good enough.  Payers are far more interested in other metrics.

Other groups have studied this around the world, using exome sequencing.  Diagnosis rate ~28%, making it the most effective method.  (Can be 25–50%, depending on unknown characteristics.)  Quality of phenotype may be a big issue.

WES + EHR can help to raise to 51% diagnosis.

de novo mutations are the leading cause of genetic disease in infants.  Really, we’re forced to test trios.  This is a “sea change” for the field.

Study: trio exome sequencing yields 7.4% more diagnoses than sequencing the proband alone.  [Not entirely convincing…]

Another study: 58% by WES vs. 14% by standard methods.  [And more studies – can’t show numbers fast enough.]

The faster you can turn around the diagnosis, the faster you can get a change in care.

No recurrent mutations in infants treated… but some presentations are enriched for successful diagnoses.

Moving on to a randomized control study: just completed; admitted any NICU patient with a phenotype suggestive of genetic disease.  15% molecular diagnosis by standard tests; 41% diagnosis with rapid WGS.  Had to end the trial early because it was clear that WGS was making a massive impact.

Problems and solutions: focus back on parents and families, who may have a different impression/understanding of testing or methods.  We don’t have enough experts to fill the gap: 850,000 MDs, but only 1,100 medical geneticists and 4,000 genetic counsellors.  (Solution: more training, and possibly other experts?)

Triangle of priorities: pick 2…

Scalable Clinical Utility <-> Rapid <-> Low Cost


  • Process engineering – scalable highly efficient methods
  • Clinical Research – much better evidence than we have now
  • Education and Engagement – med students need more training, for instance.  (Currently they only get a day or a week of genetics…)


#AGBTPH – Mary Majumder, Prenatal testing

Baylor College of Medicine

Major worries: conveying the screening vs. diagnostic distinction.  (Do we convey that well to those who need to know?)  Also, what to test for and report.  (How to support pregnant women and their partners.)

It’s hard to really communicate the difference between a diagnostic and a screen, when the screen is 99% accurate.

The personal toll of screens vs. diagnostics can be significant.

When results come in, sometimes even the counsellors have to do research online.  Definitive information can be hard to come by.

[This presentation is being told through comments from people who went through the process – entirely anecdotally based.  Hard to take notes on. Basically, support is lacking, and information is frequently unclear and difficult to communicate.]

Responses to challenges:  Professional societies are trying hard to improve on current state.  General predictive power calculator.  Still some distance to go.

[I’m way out of my depth – this talk is delving into social problems in the U.S. as much as the technology and the biology.  Much of this is related to terminating pregnancies, which carries social stigma here.  It’s interesting, but I can’t separate the salient points from the asides.  The solutions to the problem mainly involve U.S.-specific government structures.  I can follow, but I don’t feel that I can take notes for others that accurately reflect what’s being communicated.]



#AGBTPH – Nicolas Robine, NYGC glioblastoma clinical outcome study: Discovering therapeutic potential in GBM through integrative genomics.

Nicolas Robine, New York Genome Center  (@NotSoJunkDNA)

Collaborate with IBM to study Glioblastoma.

Big workup: Tumour normal WGS, tumour RNA-Seq, methylation array.

Pipeline: FASTQ, BAM, 3 callers each for {SNV, INDEL, SV}.  RNA-Seq uses FusionCatcher and STAR-Fusion; alignment with STAR.

It’s hard to do a tumour-normal comparison for expression, so you need an estimate of each gene’s baseline.  Use TCGA RNA-Seq as a background so you can compare.  Some z-score normalizations were suspicious, corresponding to regions of high GC content.  Used EDASeq to do normalization, with batch-effect correction by ComBat.  Z-scores change over the course of the study, which is uncomfortable for clinicians.

Interpretation: 20h FTE/sample.  Very time-consuming, with lots of steps, culminating in a clinical report delivered to the referring physician.  They use Watson for Genomics to help.  An oncoprint is created as well.

Case study presented: a very nice example of evidence, with variants found and RNA-Seq used to identify complementary deletion events, which culminated in the patient being enrolled in a clinical trial.

Watson was fed the same data – and solved the case in 9 minutes!  (Recommendations were slightly different, but the same issues were found.)  The same thing happens if the same sample is given to two different people.  It’s not perfect, but it’s not completely crazy either.

Note: don’t go cheap!  Sequence the normal sample.

[Wow]: 2/3rds of recommendations were based on CNVs.

Now in second phase, with 200 cases, any cancer type.  29 cases complete.

What was learned: novel variants were identified in most samples; there are big differences between gene panel testing and WGS; they built a great infrastructure; and Watson for Genomics can be a great resource for scaling this.

More work needed, incorporating more data – and more data needed about the biology – and more drugs!

[During questions – First project: 30 recommendations, zero got the drugs.  Patients are all at advanced stages of cancer, and it has been difficult to convince doctors to start new therapies.  Better response with the new project.]