>Searching for SNPs… a disaster waiting to happen.

>Well, I’m postponing my planned article, because I just don’t feel in the mood to work on that tonight. Instead, I figured I’d touch on something a little more important to me this evening: WTSS SNP calls. Well, as my committee members would say, they’re not SNPs, they’re variations or putative mutations. Technically, that makes them Single Nucleotide Variations, or SNVs. (They’re only polymorphisms if they’re common to a portion of the population.)

In this case, they’re from cancer cell lines, so after I filter out all the real SNPs, what’s left are SNVs… and they’re bloody annoying. This is the second major project I’ve done where SNP calling has played a central role. The first was based on very early 454 data, where homopolymers were frequent, and thus finding SNVs was pretty easy: they were all over the place! After much work, it turned out that pretty much all of them were fake (false positives), and I learned to check for homopolymer runs – a simple trick, easily accomplished by visualizing the data.
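For anyone who would rather script the homopolymer check than eyeball it, here is a minimal sketch. It is not what I actually did (I did it by visualizing the data), and the function names, the run-length threshold of 4 and the one-base flank are all assumptions picked for illustration, not tuned values.

```python
def homopolymer_run_length(seq: str, pos: int) -> int:
    """Return the length of the run of identical bases that contains seq[pos]."""
    base = seq[pos].upper()
    left = pos
    while left > 0 and seq[left - 1].upper() == base:
        left -= 1
    right = pos
    while right < len(seq) - 1 and seq[right + 1].upper() == base:
        right += 1
    return right - left + 1


def near_homopolymer(seq: str, pos: int, min_run: int = 4, flank: int = 1) -> bool:
    """True if pos, or any base within `flank` of it, sits in a run of >= min_run.

    min_run and flank are hypothetical defaults, not values from any pipeline.
    """
    lo = max(0, pos - flank)
    hi = min(len(seq) - 1, pos + flank)
    return any(homopolymer_run_length(seq, i) >= min_run for i in range(lo, hi + 1))


# Hypothetical reference fragment with a run of five A's at positions 4-8.
reference = "ACGTAAAAAGCT"
suspect = near_homopolymer(reference, 9)   # candidate SNV right beside the run
clean = near_homopolymer(reference, 1)     # candidate SNV far from any run
```

A candidate flagged this way isn't necessarily fake, of course; it just goes to the front of the "look at this one by hand" pile.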

We moved on to Illumina after that. Actually, it was still Solexa at the time. Yes, this is older data – nearly a year old. It wasn’t particularly reliable, and I’ve now used several different aligners, references and otherwise, each time (I thought) improving the data. We came down to a couple of very intriguing variations, and decided to sequence them. After several rounds of primer design, we finally got one that worked… and lo and behold: 0/2. Neither of them is real. So, now comes the post-mortem: Why did we get the false positives this time? Is it bias from the platform? Bad alignments? Or something even more suspicious… do we have evidence of edited RNA? Who knows. The game begins all over again, in the quest for answering the question “why?” Why do we get unexpected results?

Fortunately, I’m a scientist, so that question is really something I like. I don’t begrudge the last year’s worth of work – which apparently is now more or less down the toilet – but I hope that the why leads to something more interesting this time. (Thank goodness I have other projects on the go, as well!)

Ah, science. Good thing I’m hooked, otherwise I’d have tossed in the towel long ago.

>My Geneticist dot com

>A while back, I received an email from a company called mygeneticist.com that is doing genetic testing to help patients identify adverse drug reactions. I’m not sure what the relationship is, but they seem to be a part of something called DiscoverMe technologies. I bring mygeneticist up, because I had an “interview” with one of their partners, to determine if I am a good subject for their genetic testing program. It seems I’m too healthy to be included, unless they later decide to include me as a control. Nuts-it! (I’m still trying to figure out how to get my genome sequenced here at the GSC too, but I don’t think anyone wants to fund that…)

At any rate, I spoke with the representative of their clinical side of operations this morning and had an interesting conversation about my background. In typical fashion, I also took the time to ask a few specific questions about their operations. I’m pretty sure they didn’t tell me much more than was available on their various web pages, but I think there was some interesting information that came out of it.

When I originally read their email, I had assumed that they were going to be doing WTSS on each of their patients. At about $8000 per patient, it’s expensive, but a relatively cheap form of discovery – if you can get around some of the challenges involved in tissue selection, etc. Instead, it seems that they’re doing specific gene interrogation, although I wasn’t able to find out what type of platform they’re using. This leads me to believe that they’re probably doing some form of literature check for genes related to the drugs of interest, followed by a PCR or array-based validation across their patient group. Considering the challenges of associating drug reactions with SNPs and genomic variation, I would be very curious to see what they have planned for “value-added” resources. Any drug company can find out (and probably does already know) what’s in the literature, and any genetic testing done without approval from the FDA will probably be sued/litigated/regulated out of existence… which doesn’t leave a lot of wiggle room for them.

And that led me to thinking about a lot of other questions, which went un-asked. (I’ll probably email the Genomics expert there to ask some questions, though I’m mostly interested in the business side of it, which they probably won’t answer.) What makes them think that people will pay for their services? How can they charge a low-enough fee to make the service attractive while still making a profit? And, from the scientific side, assuming they’re not just a diagnostic application company, I’m not sure how they’ll get a large enough cohort through their recruitment strategy to make sense of the data they receive.

Anyhow, I’ll be keeping my eyes on this company – if they’re still around in a year or two, I’d be very interested in talking to them again about their plans in the next-generation sequencing field.

>Nothing like reading to stimulate ideas

>Well, this week has been exciting. The house sale completed last night, with only a few hiccups. Both we and the seller of the house we were buying got low-ball offers during the week, which gave the real estate agents lots to talk about, but never really made an impact. We had a few sleepless nights waiting to find out if the seller would drop our offer and take the competing one that came in, but in the end it all worked out.

On the more science-related side, despite the fact I’m not doing any real work, I’ve learned a lot, and had the chance to talk about a lot of ideas.

There’s been a huge ongoing discussion about the qcal values, or calibrated base call scores that are appearing in Illumina runs these days. It’s my understanding that in some cases, these scores are calibrated by looking at the number of perfect alignments, 1-off alignments, and so on, and using the SNP rate to identify some sort of metric which can be applied to identify an expected rate of mismatched base calls. Now, that’s fine if you’re sequencing an organism that has a genome identical to, or nearly identical to the reference genome. When you’re working on cancer genomes, however, that approach may seriously bias your results for very obvious reasons. I’ve had this debate with three people this week, and I’m sure the conversation will continue on for a few more weeks.
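To put a rough number on those "very obvious reasons", here is a toy calculation. It assumes the calibrated scores are Phred-scaled (Q = -10·log10 of the error probability) and that calibration simply treats every mismatch against the reference as a sequencing error; that is a simplification of whatever the real pipeline does. The error rate and variant rate below are invented for illustration, not measured from any run.

```python
import math


def phred(error_rate: float) -> float:
    """Phred-scaled quality for a given per-base error probability."""
    if error_rate <= 0:
        raise ValueError("error_rate must be positive")
    return -10.0 * math.log10(error_rate)


def empirical_quality(mismatches: int, aligned_bases: int) -> float:
    """Quality implied by counting every mismatch to the reference as an error."""
    return phred(mismatches / aligned_bases)


# Hypothetical numbers: 1,000,000 aligned bases at a given raw score,
# with 1,000 genuine sequencing errors (a true error rate of 1 in 1,000).
aligned_bases = 1_000_000
true_errors = 1_000

# Sample that matches the reference: calibration recovers Q30.
q_matching = empirical_quality(true_errors, aligned_bases)

# Sample carrying 1,000 real variants in the same bases: those mismatches
# get counted as errors too, so the same reads are calibrated to about Q27.
real_variants = 1_000
q_divergent = empirical_quality(true_errors + real_variants, aligned_bases)
```

In this sketch the reads are exactly as good in both cases; only the sample's distance from the reference changed, and the calibrated score dropped by about three points. Whether real cancer genomes diverge enough to matter is exactly what the debate is about.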

In terms of studying for my comprehensive exam, I’m now done with the first 12 chapters of the Weinberg “The Biology of Cancer” textbook, and I seem to be retaining it fairly well. My girlfriend quizzed me on a few things last night, and I did reasonably well answering the questions. 6 more days, 4 more chapters to go.

The most interesting part of the studying was Thursday’s seminar day. In preparation for the Genome Sciences Centre’s bi-annual retreat, there was an all-day seminar series, in which many of the PIs spoke about their research. Incidentally, 3 of my committee members were speaking, so I figured it would be a good investment of my time to attend. (Co-incidentally, the 4th committee member was also speaking that day, but on campus, so I missed his talk.)

Indeed – having read so many chapters of the textbook on cancer biology, I was FAR better equipped to understand what I was hearing – and many of the research topics presented picked up exactly where the textbook left off. I also have a pretty good idea what questions they will be asking now: I can see where the questions during my committee meetings have come from; it’s never far from the research they’re most interested in. Finally, the big picture is coming together!

Anyhow, two specific things this week have stood out enough that I wanted to mention them here.

The first was the keynote speaker’s talk on Thursday. Dr. Morag Park spoke about the environment of tumours, and how it has a major impact on the prognosis of the cancer patient. One thing that wasn’t settled was why the environment is responding to the tumour at all. Is the reaction of the environment dictated by the tumour, making this just another element of the cancer biology, or does the environment have its own mechanism to detect growths, which is different in each person? This is definitely an area I hadn’t put much thought into until seeing Dr. Park speak. (She was a very good speaker, I might add.)

The second item was something that came out of the textbook. They have a single paragraph at the end of chapter 12, which was bothering me. After discussing cancer stem cells, DNA damage and repair, and the whole works (500 pages of cancer research into the book…), they mention progeria. In progeria, children age dramatically quickly, such that a 12-14 year old has roughly the appearance of an 80-90 year old. It’s a devastating disease. However, the textbook mentions it in the context of DNA damage, suggesting that the progression of this disease may be caused by general DNA damage sustained by the majority of cells in the body over the short course of the life of a progeria patient. This leaves me of two minds: 1) the DNA damage to the somatic cells of a patient would cause them to lose tissues more rapidly, which would have to be regenerated more quickly, causing more rapid degradation of tissues – shortening telomeres would take care of that. This could cause a more rapid aging process. However, 2) the textbook just finished describing how stem cells and rapidly reproducing progenitor cells, which are the precursors involved in tissue repair, are dramatically more sensitive to DNA damage. Wouldn’t it be more likely, then, that people suffering from this disease are actually drawing down their supply of stem cells more quickly than people without DNA repair defects? All of their tissues may also suffer more rapid degradation than normal, but it’s the stem cells which are clearly required for long term tissue maintenance. An interesting experiment could be done on these patients requiring no more than a few milliliters of blood – has their ratio of CD34+ cells dropped compared to non-sufferers of the disease? Alas, that’s well outside of what I can do in the next couple of years, so I hope someone else gives this a whirl.

Anyhow, just some random thoughts. 6 days left till the exam!

>How many biologists does it take to fix a radio?

>I love Google Analytics. You can get all sorts of information about traffic to your web page, including the Google searches people use to get there. Admittedly, I really enjoy seeing when people link to my web page, but the Google searches are a close second.

This morning, though, I looked through the search terms, and discovered that someone had found my page by googling for “How many biologists does it take to fix a radio?” And that had me hooked. I’ve been toying with the idea all morning, and figured I had to try to blog an answer to that. (I’ve already touched on the subject once, with less humour, but it’s worth revisiting.)

Now, bear in mind that I’m actually a biochemist and possibly a bioinformatician – and by some stretch of imagination, a microbiologist – so I enjoy poking fun at biologists, but it’s all in good humour. Biology is infinitely more complicated than radios, but it makes for a fun analogy.


This is how I see it going.

  • A Nobel Prize winner makes a keynote speech, expounding on the subject that biologists have completely ignored the topic of radios. Radios deserve to be studied and are a long-neglected topic that is key to understanding the universe. The Nobel Prize winner further suggests, as the model organism, his/her own type of broken radio that he/she has been tinkering with in his/her garage for several months.
  • After the speech, several prominent biologists go to the bar, drink a lot, and then decide that the general consensus is that they should look at fixing broken radios.
  • Several opinion papers and notes appear on the subject, and a couple grad student written reviews pop up in the literature.
  • A legion of taxonomists appear, naming broken radios according to some principle that makes perfect sense to them (e.g. Monoantenna smithii, Nullamperage robertsoniaii). High school students are forced to learn the correct spellings of several prominent strains.
  • A Nature paper appears, describing the glossy casing of the radio, the interaction of the broken radio with an electrical socket and the failed attempt to sequence the genome. Researchers around the world have been scooped by this first publication, and all subsequent attempts to publish descriptions of broken radios are not sufficiently novel to warrant publication in a big name journal.
  • Biologists begin to specialize in radio parts. Journal articles appear on components such as “purple red purple gold” (PrpG), which is shown to differ dramatically from a similar appearing component, “blue green purple gold” (BgpG), and both are promptly given new names by ex-drosophila researchers: “Brothers for the Preservation of Tequila Based Drinks 12” and “Trombone.”
  • Someone tries to patent a capacitor, just in case it’s ever useful. This spawns three biotech companies, two of which spend $120 million in less than 3 years and fold.
  • Someone does a knock out on a working radio and promptly discovers and names the component “Signal Silencing Subcomponent 1” or “Sss1”. 25 more are discovered in a high-throughput screen.
  • X-ray studies are done on Sss22, resulting in a widely acclaimed paper which will later result in a Nobel prize nomination. No one has the faintest idea how Sss22 works or what it does.
  • Science fiction writers publish several fantastic novels that one day we might be able to fix radios by replacing individual parts.
  • The religious right declares biologists are playing god, and that fixing radios is beyond the capacity of humans. The moral dilemmas are too complex. Ethicists get involved. The US president tries to cut funding for biologists doing research on broken radios.
  • A researcher invents a method of doing in-situ component complementation, which allows a single element to be bypassed and replaced with a new one. All new components come with green flags attached to them to make studying them easier.
  • Someone else invents a method of replacing a frayed power cord, producing a working phenotype from a broken radio. The resulting media storm declares the discovery of the cure for broken radios.
  • The technique for fixing power cords begins the long process of getting FDA approval. 10 years later (and with a $1bn investment showing that the technique also works on lamps and doesn’t cause side effects in electric toothbrushes) the fix is allowed to go to market.
  • Marketing is conducted, telling people (with working and broken radios alike) that maybe they should try the cure, just in case they might have a frayed power cord some day too. They should talk to their doctor about whether it’s right for them.
  • Advertisements appear on TV showing silent smiling people holding on to power cords.
  • Long term studies after the fact show that the new part wasn’t as good as it could have been. Sucking on it may cause liver damage.
  • Religious right takes recall as sign that science has failed again. Holistic fixes for frayed power cords appear, as well as organic electricity and antenna adjustment therapies, which work for some people. Products appear on the shopping channel.
  • Technology moves on, the radio becomes obsolete. Several biotech companies acquire each other in blockbuster mergers and begin working on new target components for computer sound cards.

Have a good weekend, everyone. (=