>An interesting conversation on bioinformatics business models

>Every once in a while, I suddenly remember SeqAnswers.com, and rush over there to see what I’ve been missing. (My occasional lapses generally coincide with my bi-weekly meetings with my supervisor, an upcoming talk or something of that sort…) SeqAnswers is easily the best resource on Next-Gen sequencing, and I truly enjoy the people that hang out on that forum.

Anyhow, I’ve been participating in an interesting conversation on the business of bioinformatics and next-gen sequencing. It started off with a question about market research, and then blossomed into a much wider-ranging conversation. One recurring thread in the discussion is whether there are valid bioinformatics business models in which the bioinformatics application is the commodity. I maintain that there aren’t, but clearly other people disagree.

In the name of encouraging a wider audience to contribute, I thought I’d ask anyone who’s reading my blog what they think. Join in here or on the forums.

Cheers!

>A sunny day in Vancouver

>It’s a weekend, so I’m going to stay off topic for a little while longer. People often believe in the stereotype that it rains all the time in Vancouver. Well, that might be true from November to February, but the summers here are fantastic. Here’s just a little proof of that. (=

Now, you’ll have to excuse me – I have some sunshine to enjoy.

>m-based hierarchy

>An IRC friend of mine proposed the following hierarchy of terms for reactiveness and I liked it so much, I figured I’d have to post it here so that I wouldn’t forget it.

minimal < minor < mild < moderate < marked < major < maximal

It’s not newsworthy, but I really liked it and figured other people might get some use out of it. Thanks Jasabella!
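For anyone who wants to use the scale in code, here is a minimal sketch of one way to encode it as an ordered enumeration in Python. The class name `Reactiveness` and the integer values are assumptions made for illustration; only the seven terms and their order come from the hierarchy above.

```python
from enum import IntEnum


class Reactiveness(IntEnum):
    """Hypothetical encoding of the m-based scale; the integers only fix the order."""
    MINIMAL = 1
    MINOR = 2
    MILD = 3
    MODERATE = 4
    MARKED = 5
    MAJOR = 6
    MAXIMAL = 7


# IntEnum members compare by value, so sorting reproduces the scale.
ordered = sorted(Reactiveness)
scale = " < ".join(level.name.lower() for level in ordered)
print(scale)
```

Because the members compare like integers, a check such as `Reactiveness.MILD < Reactiveness.MARKED` works directly.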

And, in case you’re wondering, you can find me on Efnet (#chemistry) and Freenode (#bioinformatics). I don’t watch the window all the time, but if you say my name, you’ll get my attention.

>CSHL: Personal Genomes

>For all that I’ve been ranting about how much I dislike Cold Spring Harbor’s policies on blogging (or at least the rumours about how they’ll be changing them in the future), I have to admit that they do have the coolest topics for conferences.

I just received an advertisement in the mail for their upcoming “Personal Genomes” conference in September. I’d like to reprint their ad’s description for anyone who’s interested (I’m citing fair use here, just in case anyone wonders why I feel free to reproduce it):

“This second meeting builds on last year’s presentations showing a significant milestone in human genetics – the first production of “personal genomes.” Ultra high throughput sequencing strategies have now been used to study more individual genomes – and yet few scientists and even fewer clinical geneticists, are familiar with the implications of this new data. This meeting will address the issues of individual genomes being part of research and routine clinical medicine within the new years.”

Far too cool. Here’s a link to the web page.

They have applied for funding to partially support postdocs and graduate students, so you’d better start working on that abstract if you’re interested: they’re due July 1st.

By the way, the conference runs from Sept 14-17, 2009.

>FindPeaks 4.0

>Well, I’ve finally gotten to it: the tag for FindPeaks 4.0. At this point, I’m more or less satisfied with what made it into this release: Saturation, Controls, Compares and a whole lot of changes to the underlying machinery. The documentation is still going through some changes (I have another two flags to add in, and a lot more clarification to do on what some of the parameters actually accomplish), but it’s now in a reasonable state.

Despite the milestone, this project is really a constant evolution. I’m already thinking about what should be in the next version (4.1?): Support for SAM/BAM, “peakless peak calling” for regions instead of peaks, a vastly upgraded FindFeatures code and a host of small changes that I had thought weren’t worth the effort for this particular release. I’m even considering a GUI, if I can squeeze it in. (If anyone would like to help out on that project, I’d be thrilled to add them to the project!)

At this point, I’m happy to say I’m not aware of any outstanding coding bugs – although I do take it seriously that there is an open bug remarking that the documentation is insufficient. I’ve been working on improving it, and reorganizing the manual, which should be done in the next couple of days. Once that’s done, I’ll jump back into using my code to do some analysis of my own. There are a few really neat things, based on work on my poster, that I’d like to play with. I guess that’s what they say about coders: when you write software for yourself, you never lack the motivation to add in one more feature. (=

>4 Freedoms of Research

>I’m going to venture off the beaten track for a few minutes. Ever since the discussion about conference blogging started to take off, I’ve been thinking about what the rights of scientists really are – and then came to the conclusion that there really aren’t any. There is no scientist’s manifesto or equivalent oath that scientists take upon receiving their degree. We don’t wear the iron ring like engineers, which signifies our commitment to integrity…

So, I figured I should do my little part to fix that. I’d like to propose the following 4 basic freedoms of research, without which science cannot flourish.

  1. Freedom to explore new areas
  2. Freedom to share your results
  3. Freedom to access findings from other scientists
  4. Freedom to verify findings from other scientists

Broadly, these rights should be self-evident. They are tightly intermingled, and cannot be separated from each other:

  • The right to explore new ideas depends on us being able to trust and verify the results of experiments upon which our exploration is based.
  • The right to share information is contingent upon other groups being able to access those results.
  • The purpose of exploring new research opportunities is to share those results with people who can build upon them.
  • Being able to verify findings from other groups requires that we have access to their results.

In fact, they are so tightly intermingled that they are a direct consequence of the scientific method itself.

  1. Ask a question that explores a new area
  2. Use your prior knowledge, or access the literature to make a best guess as to what the answer is
  3. Test your guess and verify whether it matches the outcome
  4. Share your results with the community.

(I liked the phrasing on this site) Of course if your question in step 1 is not new, you’re performing the verification step.

There are constraints on what we are allowed to do as scientists as well: we have to respect the ethics of the field in which we do our exploring, and we have to respect the fact that ultimately we are responsible to report to the people who fund the work.

However, that’s where we start to see problems. To the best of my knowledge, funding sources define the directions science is able to explore. We saw the U.S. restrict funding to science in order to throttle research in various fields (violating Research Freedom #1) for the past 8 years, which was effectively able to completely halt stem cell research, and suppress alternative fuel sources, etc. In the long term, this technique won’t work, because the scientists migrate to where the funding is. As the U.S. restores funding to these areas, the science is returning. Unfortunately, it’s Canada’s turn, with the conservative government (featuring a science minister who doesn’t believe in evolution) removing all funding from genomics research. The cycle of ignorance continues.

Moving along, and clearly in a related vein, Freedom #4 is also a problem of funding. Researchers who would like to verify other groups’ findings (a key responsibility of the basic peer-review process) aren’t funded to do this type of work. While admitting my lack of exposure to granting committees, I’ve never heard of a grant being given to verify someone else’s findings. However, this is the basic way by which scientists are held accountable. If no one can repeat your work, you will have many questions to answer – and yet the funding for ensuring accountability is rarely present.

The real threat to an open scientific community occurs with the remaining two Freedoms: sharing and access. If we’re unable to discuss the developments in our field, or are not even able to gain information on the latest work done, then science will come grinding to a halt. We’ll waste all of our time and money exploring areas that have been exhaustively covered, or worse yet, come to the wrong conclusions about what areas are worth exploring in our ignorance of what’s really going on.

Ironically, Freedoms 2 and 3 are the most eroded in the scientific community today. Even considering only the academic world, where freedoms are taken for granted, our interaction with the forums for sharing (and accessing) information is horribly stunted:

  • We do not routinely share negative results (causing unnecessary duplication and wasting resources)
  • We must pay to have our results shared in journals (limiting what can be shared)
  • We must pay to access other scientists’ results in journals (limiting what can be accessed)

It’s trivial to think of other examples of how these two freedoms are being eroded. Unfortunately, it’s not so easy to think of how to restore these basic rights to science, although there are a few things we can all do to encourage collaboration and sharing of information:

  • Build open source scientific software and collaborate to improve it – reducing duplication of effort
  • Publish in open access journals to help disseminate knowledge and bring down the barriers to access
  • Maintain blogs to help disseminate knowledge that is not publishable

If all scientists took advantage of these tools and opportunities to further collaborative research, I think we’d find a shift away from conferences towards online collaboration and the development of tools favoring faster and more efficient communication. This, in turn, would provide a significant speed-up in the generation of ideas and technologies, leading to more efficient and productive research – something I believe all scientists would like to achieve.

To close, I’d like to propose a hypothesis of my own:

By guaranteeing the four freedoms of research, we will be able to accomplish higher quality research, more efficient use of resources and more frequent breakthroughs in science.

Now, all I need to do is to get someone to fund the research to prove this, but first, I’ll have to see what I can find in the literature…

>More on conference blogging…

>If you’ve been following along with the debate on conference blogging, you’ve surely been reading Daniel MacArthur’s blog, Genetic Future. His latest post on the subject provides a nifty idea: presenters who are ok with their talks being discussed should have an icon in the conference proceedings beside the announcement of their talks so that members of the audience know it’s safe to discuss their work. He even goes so far as to present a few icons that could be used.

On the whole, I’m not opposed to such a scheme – particularly at a conference like Cold Spring, where unpublished information is commonly presented and even encouraged by the organizers. However, Cold Spring is one of the rare venues where the attendance is “open”, but the policy on disclosing the information is restricted. It’s entirely regulated for journalists, but in the past has not been an issue for scientists. However, if a conference begins to restrict what the scientists are allowed to disclose outside of the meetings, the organizers are really removing themselves from the free and open scientific debate. A conference that does that isn’t technically a conference – at best it’s a closed door meeting – and the material should explicitly be labeled as confidential.

Assuming that the vast majority of presentations can’t be discussed without explicit permission is anathema to science. If you look at the way technology is handled in western society, you’ll see a general trend: the patent system is based around the idea of disclosure, copyright is based on the idea of retaining rights after disclosure, and even our publication/peer review system demands full disclosure as the minimum standard. (Well, that plus a wad of cash for most journals…) For most conferences, then, I suggest we use a more fitting model than opting in to allow disclosure, as proposed by Daniel. Rather, we should provide the opportunity to opt out.

All presenters should have the option of choosing “I do not want my presentation disclosed.” We can even label their presentation with a nice little doohickey that indicates that the material is not for public discussion.


Audience members who attend the talk then agree that they are not allowed to discuss this information after leaving the room. Why operate in half measures? It’s either confidential or it’s not. Why should we forbid people from discussing it online, and then turn a blind eye to someone reading their notes in front of the non-attending members of their institution?

Hyperbole aside, what we’re all after here is a common middle ground. Science bloggers don’t want to bite the hands of the conference organizers, and I can’t really imagine conference organizers not being interested in fostering a healthy discussion. After all, conferences like AGBT have done well because of the buzz that surrounds their organization.

As I said in my last post on the topic, science does well when the free and open exchange of ideas is allowed to take place, and people presenting at conferences should be aware of why they’re presenting. (I leave figuring out those reasons as an exercise to the student.)

Let’s not throw the blogger out with the bathwater in our haste to find a solution. Conferences are about disclosure and blogs are about communication: aren’t we all working towards the same goal?

>Another day, another result…

>I had the urge to just sit down and type out a long rant, but then common sense kicked in and I realized that no one is really interested in yet another graduate student’s rant about their project not working. However, it only took a few minutes for me to figure out why it’s relevant to the general world – something that’s (unfortunately) missing from most grad student projects.

If you follow along with Daniel MacArthur’s blog, Genetic Future, you may have caught the announcement that Illumina is getting into the personal genome sequencing game. While I can’t say that I was surprised by the news, I will admit that I am somewhat skeptical about how it’s going to play out.

If your business is using arrays, then you’ll have an easy time sorting through the relevance of the known “useful” changes to the genome – there are only a couple hundred or thousand that are relevant at the moment, and several hundred thousand more that might be relevant in the near future. However, when you’re sequencing a whole genome, interpretation becomes a lot more difficult.

Since my graduate project is really the analysis of transcriptome sequencing (a subset of genome sequencing), I know firsthand the frustration involved. Indeed, my project was originally focused on identifying changes to the genome common to several cancer cell lines. Unfortunately, this is what brought on my need to rant: there is vastly more going on in the genome than small sequence changes.

We tend to believe blindly what we were taught as the “central paradigm of molecular biology”. Genes are copied to mRNA, mRNA is translated to proteins, and the protein goes off to do its work. However, cells are infinitely more complex than that. Genes can be inactivated by small changes, can be chopped up and spliced together to become inactivated or even deregulated, interference can be run by distally modified sequences, gene splicing can be completely co-opted by inactivating genes we barely even understand yet and desperately over-expressed proteins can be marked for deletion by over-activating garbage collection systems so that they don’t have a chance to get where they were needed in the first place. And here we are, looking for single nucleotide variations, which make up a VERY small portion of the information in a cell.

I don’t have the solution, yet, but whatever we do in the future, it’s not going to involve $48,000 genome re-sequencing. That information on its own is pretty useless – we’ll have to study expression (WTSS or RNA-Seq, so figure another $30,000), changes to epigenetics (of which there are many histone marks, so figure 30 x $10,000) and even DNA methylation (I don’t begin to know what this process costs.)
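To put those figures side by side, here is a rough tally in Python. It is only a back-of-the-envelope sketch using the ballpark numbers quoted above; the variable names are made up for illustration, and DNA methylation is left out because its cost is unknown.

```python
# Ballpark figures quoted in the post (USD); these are rough estimates, not price quotes.
genome_resequencing = 48_000
expression_rna_seq = 30_000      # WTSS / RNA-Seq
histone_marks = 30               # assumed number of marks to profile
cost_per_histone_mark = 10_000

# DNA methylation is deliberately excluded: its cost is not known here.
epigenetics = histone_marks * cost_per_histone_mark
total_known = genome_resequencing + expression_rna_seq + epigenetics

print(f"Histone marks: ${epigenetics:,}")
print(f"Total before DNA methylation: ${total_known:,}")
```

By this tally the histone marks alone come to $300,000, and the known pieces add up to $378,000 before methylation is even considered.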

So, yes, while I’m happy to see genome re-sequencing move beyond the confines of array-based SNP testing, I’m pretty confident that this isn’t the big step forward it might seem. The early adopters might enjoy having a pretty piece of paper that tells them something unique about their DNA, and I don’t begrudge it. (In fact, I’d love to have my DNA sequenced, just for the sheer entertainment value.) Still, I don’t think we’re seeing a revolution in personal genomics – not quite yet. Various experiments have shown we’re on the cusp of a major change, but this isn’t the tipping point: we’re still going to have to wait for real insight into the use of this information.

When Illumina offers a nice toolkit that allows you to get all of the SNVs, changes in expression and full ChIP-Seq analysis – and maybe even a few mutant transcription factor ChIP-Seq experiments thrown in – and all for $48,000, then we’ll have a truly revolutionary system.

In the meantime, I think I’ll hold out on buying my genome sequence. $48,000 would buy me a couple more weeks in Tahiti, which would currently offer me a LOT more peace of mind. (=

And on that note, I’d better get back to doing the things I do…. new FindPeaks tag, anyone?

>Poster – reprise

>In an earlier post, I said I’d eventually get around to putting up a thumbnail of the poster that I presented at the Canadian Institutes of Health Research National Research Poster Competition. (Yes, the word “research” appears twice in that sentence.) After a couple days of being busy with other stuff, I’ve finally gotten around to it.

I’m also happy to say that the poster was well received, despite the unconventional appearance. It was awarded an Award of Excellence (Silver category) from the judges.

thumbnail of poster

>Once more into the breach…

>I haven’t been able to follow the whole conversation going on with respect to conference blogging, since I’m still away at a conference for another day. Technically, the conference ended on Thursday, but I’m still here visiting with some of the more important people in my life – so that is my excuse.

At any rate, I received an interesting comment from someone posting as “such.ire”, to which I wrote a reply. In the name of keeping the argument going (since it is such a fascinating topic), I thought I’d post my reply to the front page. For context, I suggest reading such.ire’s comment first:

click here for his comment.

My reply is below:

——-

Hi Such.ire,

I really appreciate your comment – it’s a great counterpoint to what I said, and really emphasizes the fact that this debate will have plenty of nuances, which will undoubtedly carry this conversation on long after the blogosphere has finished with it.

To rebut a few of your points, however, I should point out that your examples aren’t all correct.

Yes, conferences are well within their rights to ask you to sign NDAs as an attendee – or to require that confidentiality is a part of the conference – there is no debate on that point. However, if you attend a conference that is open and does not have an explicit policy, then it really is an open forum, and they do not have the right to retroactively dictate what you can (or can’t) do with the information you gathered at the conference.

I think all of us would agree that the boundaries for a conference should be clearly specified at the time of registration.

As for lab talks for your lab members – those are not “public disclosures” in the eyes of the law. All of your lab colleagues are bound by the rules that govern your institution, and I would be surprised if your institution hadn’t asked you to sign various confidentiality rules or policies about disclosure at the time you joined them.

Department seminars are somewhat different – if they are advertised outside the department to individuals that are not members of the institution, then again, I would suggest they are fair game.

I don’t blog departmental talks or RIP talks for that reason. They are not public disclosures of information.

Finally, my last point was not that journalists and bloggers do anything different up front, but that the method of their publishing should have a major impact on how they are treated. Bloggers can make corrections that reach all of their audience members and can update their stories, while journalists cannot.

If a conference demands to see the material a journalist publishes up front, it makes sense. If they demand to do the same thing for a blogger, it completely ignores the context of the media in which the communication occurs.