A bit of due diligence… Endra Life Sciences

So, yes, my blog has been unused for a while. I don’t tend to have much to post here anymore, so it’s become more tumbleweeds than relevant content. I find myself writing a lot on Slack and Reddit, which has sucked the wind out of my blog. Ah well.

So, if you’re not interested in investing, you’re just going to want to skip this post entirely.

In the name of finding a purpose for the blog, it’s going to be a bit more of a dumping ground for a while. For today, it’s a dump for some due diligence I did on a company called Endra Lifesciences, which was rejected by the publisher because the company is too small (yes, really – it’s a public company, but apparently too small for most people to care about) and because I didn’t adequately discuss the downside of the company. For a company this size, the worst-case downside is obviously 100%. The company is worth about $10M, and if they fail, I would assume the worst. What’s there to discuss about the downside?

Anyhow: below is the follow-up post to one I wrote elsewhere.


A quick summary of where we are:

I’ve already discussed Endra, a company focused on a thermo-acoustic add-on for ultrasound (TAEUS), bringing a host of new capabilities to the cheap and efficient ultrasound platform that is already ubiquitous in the clinic.  Endra’s first order of business has been to use it for detecting fat around the liver, a major component of Non-Alcoholic Fatty Liver Disease (NAFLD).  This is currently a major market, with several companies working on treatments.  However, the diagnostic space around it is dominated by Magnetic Resonance Imaging (MRI) and biopsies.

Compared to MRI and biopsies, TAEUS and ultrasound are cheaper and non-invasive, making them a far preferable alternative to either one.  MRIs are an order of magnitude more expensive, and biopsies involve someone taking a piece of your liver.  TAEUS simply scans your liver, much the same way you’d have an ultrasound for any other condition – or during pregnancy to check on the health of the fetus.

I’m going to leave out the explanation of how TAEUS works, and some of the market background – those are all in my previous article on Endra (and on the company’s website).  Instead, I want to focus on the latest press release: ENDRA Life Sciences Reports Completion and Top Level Findings of Second Phase of Robarts Research Institute Liver Fat Feasibility Study

What metric to use?

Endra, in previous press releases, has used a metric known as R^2 (R-squared) to compare the values they get with those of other technologies.  It’s a basic statistical tool that tells you how well one set of measurements tracks another.  It’s one of the things you learn in the first few weeks of Stats 101.  It makes perfect sense if you have a value that’s the “ground truth” to compare with (e.g. if one of your measures has very little error, and you want to compare against that).

However, statisticians generally don’t use this metric for complex analyses.  It’s best left for use by undergraduates studying artificial populations. Instead, most analyses tend to use receiver operating characteristic (ROC) analysis as a better way of comparing two different methods.  It yields two important bits of information: sensitivity and specificity.  If you read any papers comparing algorithms, software or hardware, you’re going to get an analysis that shows these values – and for good reason: they actually tell you what’s going on, where R^2 does not.

In this case, the released R^2 value for the correlation between MRI and TAEUS (R^2 = 0.54) effectively says “these results don’t look the same”, but the explanation for why is just a few lines below.  Let me explain:

Sensitivity:

Sensitivity is the measure of how often you find the true positives – the people who are correctly identified as having the disease. If you have a test with a sensitivity of 50%, it will catch only half of the people who actually have the disease.  Obviously a high sensitivity is desirable, because you don’t want to send sick people home without treatment.

From the recent press release, they claim TAEUS has a sensitivity of 90%, which is pretty good for a non-invasive test.  Even better, they provide estimates that MRI (the next best competition, at more than 10x the price) has a sensitivity between 68% and 87%.  We should interpret that as indicating that the TAEUS device is roughly 1.03 to 1.32 times better than MRI (90/87 to 90/68) at identifying patients with NAFLD.

That’s already the first indication of why the R^2 value is so poor – MRI tends to miss patients with NAFLD, which skews the “similarity” between the MRI and TAEUS datasets.  I had bought into “R^2 is a good measure” with the last article, assuming MRI had both good specificity and sensitivity, but the current trials have shown that’s not the case.  Thus, R^2 is bundling up the error from both techniques, making it look pretty bad.

Specificity:

Specificity is the other key, describing the true negatives.  In diagnostics, this is the group of people who you correctly diagnose as not having the disease.  Here’s where MRI shines: according to Endra’s PR, MRI comes with a specificity of 83-98%, meaning you rarely accidentally diagnose someone with NAFLD when they don’t have it.  TAEUS comes in at 75%, meaning that roughly a quarter of healthy patients will be flagged as having the disease when they don’t.

Is that bad?  Well, no!  Actually, it isn’t.

Complementary techniques:

Having two tools with complementary sensitivity and specificity is actually the desired outcome!  Think about this: I have 100 patients that need to be screened, of which 10 have NAFLD. (Yes, I’m making up these numbers, since we don’t actually know how many people have NAFLD – it’s too expensive to screen everyone right now.)  I have a cheap tool that can identify the patients with the disease, but might include a few extras who don’t, and I have an expensive tool that can rule out people who don’t have the disease.  How do you work this?

Easy!  You screen everyone with the cheap tool.  At 90% sensitivity, it flags 9 of the 10 true cases, and at 75% specificity it also flags about 22 of the 90 healthy patients, leaving a group of roughly 31 people.  That group can then be screened on the expensive tool to rule out the false positives.  The overall costs decrease, because you’ve spared roughly 70% of the people from needing the expensive tool.
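To make the arithmetic concrete, here’s a quick sketch of that screening cascade in Python. The numbers (10% prevalence, 90% sensitivity, 75% specificity) are the same illustrative assumptions as above, not trial data:

```python
# Back-of-envelope screening cascade: how many patients get flagged by the
# cheap first-line test, and how many are spared the expensive follow-up.
# All inputs are made-up, illustrative numbers.

def screen(n_patients, prevalence, sensitivity, specificity):
    sick = n_patients * prevalence
    healthy = n_patients - sick
    true_pos = sick * sensitivity             # sick patients correctly flagged
    false_pos = healthy * (1 - specificity)   # healthy patients flagged anyway
    return true_pos, false_pos, true_pos + false_pos

tp, fp, flagged = screen(100, 0.10, 0.90, 0.75)
print(f"Flagged for follow-up: {flagged:.1f} "
      f"({tp:.1f} true cases, {fp:.1f} false positives)")
print(f"Patients spared the expensive scan: {100 - flagged:.1f}")
```

Run it and about 31.5 of the 100 patients get flagged (9 true cases plus 22.5 false positives), so roughly 68 of them never need the expensive scan.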

Even better, you no longer need to do biopsies, because you have two tools that will give you the right answer when combined – at a fraction of the cost of the biopsy + MRI tool kit that exists right now.

De-risking:

Much of Endra’s current share price is discounting the TAEUS platform because of the poor R^2 value, while completely ignoring the golden nugget they left in the PR right next to it.  If you’re not a statistician, you probably missed it entirely.

Thus, Endra is currently sitting at a bargain price after de-risking the platform.

GE, one of the biggest ultrasound manufacturers, remains a significant partner for commercialization, and many of the management talent acquisitions have come from GE in recent months (e.g. Amy Sitzler, Vice President of Engineering and Programs).  What’s left, however, are publications and certifications, which we know are now on deck.  The publication for the full data set, from which this latest PR was an extract, should come out in about a month, and will hopefully explain the data better than the PR does.

Certifications should also be on their way, given that two of Endra’s goals for the remainder of 2019 include filing for both a CE mark (the European mark for safety) and US 510(k) clearance (the U.S. certification for safety and efficacy) (see slide 12).

Current Financial Status:

On top of the $2.2M cash on hand at the end of June, Endra raised a further $2.8M in July, as predicted; however, they chose to do it via a private placement. From recent shareholder updates, we also know that their R&D costs ($1.3M for 3 months, p.4) are supposedly dropping rapidly, now that the bulk of the R&D is behind them.  That, however, leaves a pretty big question mark over how long the $5M will last.  If R&D costs have completely dissipated, we can expect it to last well into the new year. Additionally, if outstanding warrants are exercised (the latest batch have an exercise price of $1.49), Endra could pull in close to another $3M at any time.  Most likely, they should have received their CE mark, and possibly their 510(k), before they need to raise again – and by then, we can expect to see the first units rolling out into Europe, making them revenue generating.

Summary:

Endra, as before, remains a risky investment, given its small size, but the amount of risk just decreased dramatically, despite the drop in share price. They are well positioned for success, should have no problem raising money if they need to, and should be transitioning into commercial distribution in the next 4-6 months in Europe, and in under a year in the U.S.

Design course – post 1

As promised, here are some updates on my learning UI/UX design… and I have a few thoughts on the issue. Don’t expect this to be coherent. It’s not.

First off, I spent some time looking at courses on the subject and quickly realized I don’t have 20 weeks to dedicate to this, so I’m going to have to do it on my own, the way I’d have done it in grad school: find a reference, and beat my head on it as much as possible. So, after some digging, I settled on this: https://hackdesign.org/lessons101

Each lesson seems to take a couple of hours, and I’ve managed to get through the first 5. Beware: like grad school, you’re only going to get out of it what you take the time to learn. Design appears to be something that takes a lot of practice.

The first lesson is pretty abstract – it mostly focuses on the idea that design and web coding are separate skill sets. Given that I’m lacking in both, that doesn’t bother me – I’m here to acquire both! More troublesome, though, is the insistence that the two be done separately. Given that I’m a one-man full-stack developer on all of my current projects, there’s no real arm’s length possible… so I’m just going to have to do the best I can. Mostly, that means drawing things out on paper, and then abandoning those plans as I work out what I can actually accomplish with my limited CSS/JS coding skills.

The next few lessons, focusing on the basics, were pretty useful. I skipped some of the tutorials on the tools of design. Yes, I can use a pen and paper because I’m not going to be showing my lousy designs to a panel of judges, so I haven’t worried too much about that part. However, the sections on typography and layouts were fascinating.

Typography, after reading several essays on the subject, turns out to be entirely subjective. I can summarize it as this: if it makes your page readable and clean looking, you’re doing well. Don’t go overboard with more than 2 fonts, don’t pick fonts that don’t do what you want, and don’t try to use CSS/HTML to make fonts do things they weren’t designed to do. The takeaway message is that you just have to go by what you think looks OK.

Considering that I previously just used the default fonts for everything, though, that’s already a good lesson for me.

The real takeaway from all of this was the section on responsive UIs. I didn’t know that was a thing: basically, you use CSS to allow your page to seamlessly resize itself as you grow or shrink the window. Trivial, you might say, but it was eye opening to me. I didn’t know that was possible – or that there are frameworks and pre-built CSS/HTML examples of it. It completely changed the way I think about layouts.

In fact, so much so, that I have started practicing with it already. As a demo workspace, I’ve been templating up a replacement for my fejes.ca domain. What’s there now is 1997 technology. Hopefully in a day or two, I can apply what I’ve learned for a revamp. I don’t know how that’ll go, but it’s a great place to start learning.

If you were expecting a conclusion, however, I’m going to have to disappoint you. After 10 hours of delving into HTML/CSS/JS today, I’m still blindly flailing around. I have much to learn, but at least I can say that I have successfully applied the lessons in the first 5 chapters of the design tutorial. On the bright side, I only have another 45 chapters still to go!

UX/UI design time

So, this is something new – and eminently blog-able.  I’ve been given a challenge, which I take seriously.  After decades of working on back ends, it has been brought to my attention that my UI/UX design skills are, shall we say, lacking.

Thus, I am going to embark on a brief journey to learn some design.  Now, I could do this by taking a course and spending 24 weeks on it, but a brief reading of online and on-campus course outlines tells me that most of the time is spent learning such useful skills as using a text editor, and “HTML”.  And, for my purposes, it’s not that helpful to learn jQuery.  I mostly need to learn how to make a decent page that engages users – aka, just the design part of it.

Obviously, something that I’m not so good at.  No, I’m not going to show you screenshots.  I’ll admit they’re embarrassing… and if I get good at this, I’ll post a before and after picture.

So, the challenge: 5 days to learn how to make a passable web page that encourages use (and doesn’t look like industrial HTML from 1993).  By the time everyone comes back from holidays, I want to have a much more engaging grasp of design, and how to execute that design.

Game on! 

If anyone has any recommendations… uh… yeah, they’re very welcome.

Trader Joe’s Slug Bait – Grapefruit Ale

Totally off topic for my usual posts, but I had to share this, since it’s well worth passing along my discovery.

The back story for this, in a nutshell: My daughter brought home a strawberry plant from school, which we planted in the back garden. Shortly after planting it, it started growing a single strawberry, which my daughter was absolutely excited to eat – figuring it was the first thing she’d grown from seed that had produced a fruit. However, as soon as it turned the slightest bit red, the berry was eaten by a slug, in what must have been a magnificent feast for the slug, and a very sad moment for my daughter.

Consequently, I declared war on the backyard slug. I was going to take it out – and I was going to keep the next strawberry safe, as long as it didn’t require me staying up all night with a pair of tweezers. If you look online, there are a lot of things you can do to get rid of slugs, which include covering your garden in diatomaceous earth or chalk, and a host of other somewhat annoying remedies. I opted for the least destructive and least costly: beer. If you put out a bowl of beer, the slugs (theoretically) will come drink some, fall in, and die in the alcohol. While I rarely celebrate things dying, this sounded like the perfect vengeance on the slug in the back yard… and of course, revenge is a dish best served cold, even for slugs.

First day, I put out a bit of beer from some fancy micro-brew that I was trying, with moderate success. I think we caught a slug or two overnight. That blew my theory that there was only one slug in the back yard, so obviously, I had to keep going. The next time, I put out a bit of cheap Trader Joe’s Simpler Times beer… again, it caught a couple of slugs.

The next night, my wife and I were sampling Trader Joe’s Grapefruit Ale, since it sounded interesting, and we do like a good white beer with orange in it – a classic pairing for a hot summer’s day. However, to my dismay, grapefruit ale is possibly one of the most horrifically unpalatable beers I’ve ever tasted. I have no idea how this got onto Trader Joe’s shelves, though it seems to get decent ratings from other people… which sounds like a reason to have your taste buds checked. Rather than pour it out, however, I used it to refill the slug trap in the back yard.

Instant results! Where I was catching one or two slugs a day before, suddenly, I was catching 4-5 slugs every time I went outside to check on the trap. Over 24 hours, the slug trap was catching 20-30 slugs. They’re flocking to the grapefruit beer, and I can simply walk up and poke them into their deaths with a trowel. After a week of this, I’ve easily caught close to a hundred slugs in a garden that’s not much bigger than a couple of sandboxes.

Anyhow, the point of this story is that you should stock up on grapefruit-flavoured ale at Trader Joe’s before they sell out. Just in case the slug invasion ever happens, you’ll be prepared. I don’t, however, suggest drinking it, unless you’re out of other beers and you need the slugs on your side when they begin their quest for world domination.

Issues with Python 3.6.3 Multiprocessing. Anyone else see the same?

On the ferry once more, and wanted to share a hard-fought lesson that I learned today, which somewhat updates my post from the other day on multiprocessing and python 3.6.3.

Unfortunately, the lessons weren’t nice at all.

First, I discovered that using the new Manager object is a terrible idea, even for incredibly simple objects (e.g. an incrementing value, incremented every couple of seconds). The implementation is significantly slower than creating your own object out of a lock and a shared value, just to have two processes take turns incrementing the value. Ouch. (I don’t have my benchmarks, unfortunately, but the lock-and-value version ran in about 10% of the Manager version’s time, IIRC.)

Worse still, using a manager.Queue object is horrifically bad. I created an app where one process reads from a file and puts things into a queue, and a second process reads from that queue and does some operations on the objects. Now, my objects are just small lists with one integer in them, so they’re pretty small. Switching from a multiprocessing Queue to a Manager Queue caused a 3-fold increase in the time to execute (5 seconds to 15 seconds). Given that the whole reason for writing multiprocessing code is to speed up the processing of my data, the Manager is effectively a non-starter for me.
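For anyone who wants to reproduce the comparison, here’s a minimal sketch of the kind of benchmark described above. The item count and payload are my illustrative stand-ins, not the original code:

```python
# Compare the time to push/pull small items through a multiprocessing.Queue
# versus a manager.Queue, with one producer (this process) and one consumer.
import multiprocessing as mp
import time

N_ITEMS = 2_000  # illustrative; not the original benchmark's workload

def consumer(queue):
    # Drain the queue until the None sentinel arrives.
    while queue.get() is not None:
        pass

def run(queue):
    # Time a full producer/consumer pass over N_ITEMS small list payloads.
    start = time.perf_counter()
    proc = mp.Process(target=consumer, args=(queue,))
    proc.start()
    for i in range(N_ITEMS):
        queue.put([i])      # small one-integer list, as in the post
    queue.put(None)         # sentinel: tell the consumer to stop
    proc.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    plain = run(mp.Queue())
    with mp.Manager() as manager:
        managed = run(manager.Queue())
    print(f"mp.Queue: {plain:.2f}s  manager.Queue: {managed:.2f}s")
```

The slowdown makes some sense from the docs: a manager.Queue lives in a separate server process, so every put and get is a round trip to that process, where a plain multiprocessing.Queue uses a pipe directly.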

I understand, of course, that that overhead might be worth it if your Manager runs on a separate server, and can make use of multiple machines, but I’m working on the opposite problem, with one machine and several cores.

The second big discovery, of course, was that multiprocessing Queues really don’t work well in python 3.6.3. I don’t know when this happened, but somewhere along the line, someone changed their behaviour.

In 2.7, I could create one process that fills the Queue, and then create a second type of process that reads from the queue. As long as process 1 is much faster than process 2, the rate limiting step would be process 2. Thus, doubling the number of process 2’s should double the processing speed of the job.

Unfortunately, in 3.6.3, this is no longer the case – the speed with which the processes obtain data from the queue is now the rate limiting step. Process 2 can call Queue.get(), but get only serves the data at a constant rate, no matter how many process 2’s are calling the Queue.get() function.

That means that you can’t get any speed up from multiprocessing Queues… unless you have a single queue for every process 2. Yep, that’s what I did this afternoon: replaced the single queue with a list of queues, so that there’s one Queue for every consumer process.

Bad design, you say? Yes! I agree. In fact, since I now have a set of queues in which there’s only one writer and one reader, I shouldn’t be using queues at all. I should be using Pipes!

So, tomorrow, I’ll rip out all of my queues, and start putting in pipes. (Except where I have multiple processes writing to a single pipe, of course.)
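Here’s a rough sketch of what that pipe-per-worker layout looks like: one Connection pair per consumer, so each pipe has exactly one writer and one reader. Worker count and payloads are illustrative, not my actual code:

```python
# One pipe per consumer process; the parent round-robins work across pipes,
# mirroring the list-of-queues design described above.
import multiprocessing as mp

def worker(conn):
    total = 0
    while True:
        item = conn.recv()
        if item is None:        # sentinel: no more work
            break
        total += item[0]        # payload is a one-integer list, as in the post
    conn.send(total)            # report the result back on the same pipe

if __name__ == "__main__":
    n_workers = 3
    pipes = [mp.Pipe() for _ in range(n_workers)]
    procs = [mp.Process(target=worker, args=(child,)) for _, child in pipes]
    for p in procs:
        p.start()
    # Round-robin the work, exactly as with a list of queues.
    for i in range(30):
        parent, _ = pipes[i % n_workers]
        parent.send([i])
    for parent, _ in pipes:
        parent.send(None)
    results = [parent.recv() for parent, _ in pipes]
    for p in procs:
        p.join()
    print(sum(results))  # sum of 0..29
```

With a single writer and single reader per Connection, no extra locking is needed, which is exactly the case where the docs say Pipes are safe.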

I don’t know where multiprocessing in python went wrong, but that was a severely disappointing moment this morning when I discovered this issue. For now, I’ll resist the urge to return to python 2.7.

(If anyone knows where I went wrong, please let me know – we all make mistakes, and I’m really hoping I’m wrong on this one.)

Bioinformatics toolchain

Once again, it’s a Monday morning, and I’ve found myself on the ferry headed across the bay, thinking to myself: what could be better than crowdsourcing my bioinformatics toolchain, right?

Actually, this serves two purposes: it’s a handy guide for myself of useful things to install on a clean system, as well as an opportunity to open a conversation about things that a bioinformatician should have on their computer. Obviously we don’t all do the same things, but the concepts should be the same.

My first round of installs were pretty obvious:

  • An IDE (Pycharm, community edition)
  • A programming language (Python 3.6)
  • A text editor (BBEdit… for now, and nano)
  • A browser (Chrome)
  • A package manager (Brew)
  • A python package manager (pip)
  • Some very handy tools (virtualenv, cython)
  • A code cleanliness tool (pylint)

I realized I also needed at least one source control tool, so the obvious choice was a private GitHub repository.

My first order of business was to create a useful wrapper for running embarrassingly parallel processes on computers with multiple cores – I wrote a similar tool at my last job, and it was invaluable for getting compute-heavy tasks done quickly, so I rebuilt it from scratch, including unit tests. The good thing about that exercise was that it also gave me an opportunity to deploy my full toolchain, including configuring pylint (“Your code scores 10.0/10.0”) and GitHub, so that I now have some basic organization and a working environment. Unit testing also forced me to configure the virtual environment and the dependency chains of libraries, and ensured that what I wrote was doing what I expect.
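The actual wrapper lives in a private repo, but the core pattern is just the standard embarrassingly-parallel idiom. A toy version of the idea (function and names are mine, not the real tool):

```python
# A minimal embarrassingly-parallel wrapper: fan independent work items out
# across all available cores and collect the results in order.
import multiprocessing as mp

def parallel_map(func, items, processes=None):
    """Apply func to each item using a pool of worker processes.

    processes=None lets multiprocessing use one worker per CPU core.
    """
    with mp.Pool(processes=processes) as pool:
        return pool.map(func, items)

def square(x):
    return x * x

if __name__ == "__main__":
    print(parallel_map(square, range(10)))  # [0, 1, 4, 9, ..., 81]
```

This only pays off when each item is independent (no shared state between calls), which is exactly the "toss things into a pile" case described above.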

All in all, a win-win situation.

I also installed a few other programs:

  • Slack, with which I connect with other bioinformaticians
  • Twitter, so I can follow along with stuff like #AMA17, which is going on this weekend.
  • Civ V, because you can’t write code all the time. (-:

What do you think, have I missed anything important?

A few hints about moving to Python 3.6 (from 2.7) with Multiprocessing

To those who’ve worked with me over the past couple years, you’ll know I’m a big fan of multiprocessing, which is a python package that effectively spawns new processes, much the same way you’d use threads in any other programming language.  Mainly, that’s because python’s GIL (global interpreter lock) more or less throttles any attempt you might seriously make to get threads to work.  However, multiprocessing is a nice replacement and effectively sidesteps those issues, allowing you to use as much of your computer’s resources as are available to you.

Consequently, I’ve spent part of the last couple days building up a new set of generic processes that will let me parallelize pretty much any piece of code that can work with a queue.  That is to say, if I can toss a bunch of things into a pile, and have each piece processed by a separate running instance of code, I can use this library.  It’ll be very handy for processing individual lines in a file (e.g. VCF or FASTQ, or anything where the lines are independent).

Of course, this post only has any relevance because I’ve also decided to move from python 2.7 to 3.6 – and to no one’s surprise, things have changed.  In 2.7, I spent time creating objects that had built in locks, and shared c_type variables that could be passed around.  In 3.6, all of that becomes irrelevant.  Instead, you create a new object, a Manager().

The Manager is a relatively complex object, in that it has built-in locks – I haven’t yet figured out how efficient they are; that’s probably down the road a bit – which makes all of the Lock wrapping I’d done in 2.7 obsolete.  My first attempt at making it work was a failure, as it constantly threw errors that you can’t put Locks into the Manager.  In fact, you also can’t put objects containing locks (such as multiprocessing Value) into the Manager. You can, however, replace them with Value objects from the manager class.
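A minimal sketch of that pattern: the manager-created Value and Lock replace the multiprocessing.Value/Lock pair I’d have used in 2.7. (The counter and iteration counts here are illustrative, not my actual code.)

```python
# Shared counter via a Manager: create the shared value and the lock from the
# manager itself, rather than passing lock-bearing objects into it.
import multiprocessing as mp

def increment(shared, lock, times):
    for _ in range(times):
        with lock:                  # manager-provided lock guards read+write
            shared.value += 1

if __name__ == "__main__":
    with mp.Manager() as manager:
        counter = manager.Value('i', 0)  # manager-owned, proxy-backed int
        lock = manager.Lock()
        procs = [mp.Process(target=increment, args=(counter, lock, 1000))
                 for _ in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter.value)  # 2000
```

Note that `shared.value += 1` is a read-then-write round trip through the manager’s proxy, which is why the lock is still needed even though the Manager has locks internally.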

The part of the Manager that I haven’t played with yet is its apparent ability to share information across computers, if you launch it as a server process.  Although likely overkill (and network latency makes me really shy away from that), it seems like it could be useful for building big cluster jobs.  Again, something much further down the road for me.

Although not a huge milestone, it’s good to have at least one essential component back in my toolkit: My unit test suite passes, doing some simple processing using the generic processing class.  And yes, good code requires good unit tests, so I’ve also been writing those.

Lessons learned the hard way are often remembered the best.  Writing multiprocessing code out from scratch was a great exercise, and learning some of the changes between 2.7 and 3.6 was definitely worthwhile.

Dealing with being a lone bioinformatician – social media.

As I settle into my new job, I’ve quickly realized that I’m going to be a “lone bioinformatician” for a little while, and that I’m going to have to go back to my old habits of Twitter and blogging, in order to keep up with the world around me.  In addition, I’m finding myself on Slack as well, in the reddit bioinformatics channel.  The idea is that I’ll be able to keep in touch with developments in my field better this way.

That said, my current following list is heavily tilted towards non-bioinformatics, so I’ve begun the long journey of purging my list.  (If I’ve unfollowed you… sorry!)  The harder part will be trying to figure out who it is that I should be following.

The bright side of this is that the long ferry rides at either end of my day are giving me time to do some of this work, which is an unexpected bonus. I had no idea that adding to my commute time would also add to my productivity.

That said, if anyone has any suggestions about who I should be following on Twitter or in blog format, please let me know – I’ll cheerfully compile a list of twittering/blogging bioinformaticians, or if you already know of a current list, I’d love to hear about it.

In the meantime, if you’re interested in joining a bioinformatics slack, please let me know, and I’d be happy to add you.

On a boat – and starting new things.

Well, it’s a ferry.  Does that count?

My new commute plan takes me from Oakland to South San Francisco by boat, across the bay on the ferry, with a short bike ride on either side.  Given that this is still day 2 of taking the ferry, I’ve still got that warm glow of disbelief that I get to watch the sunrise and sunset from a boat.  Too cool.

Anyhow, the important thing is why I’m doing this, which is, obviously, that I have a new job.  After three and a half years with Fabric Genomics, it was time to move on.  I left all of my work there in good hands, and the projects I wanted to finish were all wrapped up… and now I’ve got an opportunity to do some real bioinformatics, and not just engineering.  That’s a huge draw for me, really.  I miss doing algorithm design and working with the data, which is pretty much the part of bioinformatics that drew me to the field in the first place.  It’s nice to know that I can do kick-ass engineering, but it’s hard to see myself doing it much longer.

Anyhow, I’m very excited about my new job at Tenaya Therapeutics, and super thrilled to be working with a really awesome group of people.  Unlike many pharmaceutical companies, they’re thinking about their data right from the start.  That may seem obvious, but it honestly wasn’t – I’ve spoken to a lot of companies that had grown to 300+ people, with tons of research programs, and were just now thinking that they should hire someone who understands large scale data.  At that point, it’s way way way too late.  No matter how fast you’ll run, one bioinformatician will never be able to keep up with 60+ scientists generating data.

At any rate, I’d love to say more about what I’m doing, but that’s a conversation I’ll have to start up.  As I’ve learned over the years, surprises aren’t good for anyone, unless it’s a birthday.

Stay tuned for more.

#AGBTPH – Kenna Mills Shaw, Precision oncology decision support: Building a tool to deliver the right drug(s) at the right time(s) to the right patient(s).

[I have to head to the airport to catch a flight, so I can’t stay for the whole talk… d’oh]

@kennamshaw

Very narrow definition of precision medicine:  Use NGS to find patients who may respond better to one drug or another, or be resistant to a class of drugs: just matching patients to drugs.

Precision medicine is completely aspirational for patients.  We still do a bad job of figuring out how to match patients with drugs.  Right now, we don’t do it well – or at all.

We’re all bad at it, actually.

  • which patients should get tested?
  • use data to impact care
  • demonstrating that data changes outcome
  • deciding how much of genome to sequence
  • how do we pay for it?

Why was MD Anderson bad at it?

The patients we’re concerned about are those who have exhausted standard therapies, for instance.

Drop in cost leads to increases in data generation.  We all suck at using this data to impact outcomes for patients.  MD Anderson was only able to impact 11% of patients with potentially actionable information.

Whole exome sequencing at other institutes was getting 5% (Beltran et al).

There are only 125 “actionable” genes.

NGS is not sufficient or necessary to drive personalized medicine.

Why?

  • solid tumours lag behind liquid tumours, because it’s hard to get the DNA
  • Accessibility  – timing of data
  • Attitudes of doctors as well.

Leukaemia docs also use the molecular signature as well as other data to clarify.  Solid tumour docs do not.

Ignoring copy number, only 40% of patients have actionable variants.  (goes way up with copy number.)

Clinical trials categorized by type of match – even broadly, that’s 11% of patients.  Lack of enrolment not due to lack of available matched trials.

[Ok… time to go… alas, can’t stay to see the end of this talk.]