>new repository of second generation software

>I finally have a good resource for locating second-gen (next-gen) sequencing analysis software. For a long time, people have just been collecting it in a single thread in the bioinformatics section of the SeqAnswers.com forum; however, the brilliant people at SeqAnswers have now spun off a wiki for it, with an easy-to-use submission form. I highly recommend you check it out, and possibly even add your own package.


>Complete Genomics, part 2

>Ok, I couldn’t resist – I visited the Complete Genomics “open house” today… twice. As a big fan of start-up companies, and an avid follower of 2nd-gen (and possibly now 3rd-gen) sequencing, it’s not every day that I get the chance to talk to the people who are working on the bleeding edge of the field.

After yesterday’s talk, where I missed the first half of the technology that Complete Genomics is working on, I had a LOT of questions, and a significant amount of doubt about how things would play out with their business model. In fact, I would say I didn’t understand either particularly well.

The technology itself is interesting, mainly because of the completely different approach to generating long reads… which also explains the business model, in some respects. Instead of developing a better way to “skin the cat”, as they say, they went with a strategy of tagging and assembling short reads. That is to say, their individual reads are in the range of 36-mers, but that’s almost irrelevant, because they can figure out which sequences are contiguous. (At least, as I understood the technology.) Ok, so highly reliable short reads with an ability to align using various clues is a neat concept.

If you’re wondering why that explains their business model, it’s because I think the technique requires a much more difficult pipeline to implement than any of the other sequencing suppliers demand. Of course, I’m sure that’s not the only reason – what will make them competitive is the low cost of the technology, which only happens when they do all the sequencing for you. If they had to box reagents and ship them out, I can’t imagine it would be significantly cheaper than any of the other setups, and it would probably be much more difficult to work with.

That said, I imagine that in their hands, the technology can do some pretty amazing things. I’m very impressed with the concept of phasing whole chromosomes (they’re not there yet, but eventually they will be, I’m sure), and the nifty way they’re using a hybridization-based technique to do their sequencing. Unlike the SOLiD, it’s based on longer fragments, which answers some of the (really geeky, but probably uninformed) thermal questions I had always wondered about with the SOLiD platform. (Have you ever calculated the binding energy of a 2-mer? It’s less than the thermal energy at room temperature.) Of course the cell manages to incorporate single bases (as does Pacific Biosciences), but that uses a different mechanism.
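To put a rough number on that 2-mer aside: here’s a back-of-envelope sketch comparing thermal energy at room temperature to the net free energy of a 2-mer duplex. The stack and initiation values below are my own illustrative assumptions, roughly in the range of published nearest-neighbor parameters, not exact figures:

```python
# Back-of-envelope: is a 2-mer DNA duplex stable at room temperature?
R = 1.987e-3          # gas constant, kcal/(mol*K)
T = 298.0             # room temperature, K
RT = R * T            # ~0.59 kcal/mol of thermal energy

# Illustrative, assumed values (order-of-magnitude only): one base stack
# contributes roughly -1 to -2 kcal/mol, while duplex initiation costs
# roughly +2 kcal/mol.
stack_dG = -1.5       # kcal/mol, single nearest-neighbor stack (assumed)
init_dG = 2.0         # kcal/mol, helix initiation penalty (assumed)

duplex_dG = stack_dG + init_dG   # net free energy of the 2-mer duplex
print(f"RT at 298 K: {RT:.2f} kcal/mol")
print(f"Net 2-mer duplex dG: {duplex_dG:+.1f} kcal/mol")
# A net dG that is positive, or negative but comparable to RT, means the
# duplex melts essentially instantly -- hence longer hybridization probes.
```

With these rough numbers the initiation penalty swamps the single stack, which is the point of the aside: a 2-mer simply can’t stay bound at room temperature.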

Just to wrap up the technology: someone left an anonymous comment the other day saying that they need a good ligase, and I checked into that. Actually, they really don’t. They don’t use an extension-based method, which is both the advantage and the Achilles heel of the approach: they get highly reliable reads, but VERY short fragments, which they then have to process back into their 36- to 40-ish-mers.

Alright, so just to touch on the last point of their business model, I was extremely skeptical when I heard they were going to only sequence human genomes, which is a byproduct of their scale/cost model approach. To me, this meant that any of the large sequencing centres would probably not become customers – they’ll be forced to do their own sequencing anyhow for other species, so why would they treat humans any differently? What about cell lines, are they human enough?…

Which left, in my mind, hospitals. Hospitals, I could see buying into this – whoever supplies the best and least expensive medical diagnostics will obviously win this game and get their business – but that wouldn’t be enough to make this a Google-sized or even Microsoft-sized company. It would probably be enough to make them a respected company like MDS Metro or other medical service providers. Will their investors be happy with that… I have no idea.

On the other hand, I forgot pharma. If drug companies start moving this way, it could be a very large segment of their business. (Again, if it’s inexpensive enough.) Think of all the medical trials, disease discovery and drug discovery programs… and then I can start seeing this taking off.

Will researchers ever buy in? That, I don’t know. I certainly don’t see a genome science centre relinquishing control over their in-house technology – it would be much like asking Microsoft to outsource its IT division. Plausible… but I wouldn’t count on it.

So, in the end, all I can say is that I’m looking forward to seeing where this is going. I don’t see this concept disappearing any time soon, and, as it stands, there’s room for more competition in the sequencing field. The next round of consolidation isn’t due for another two years or so.

So… Good luck! May the “best” sequencer win.


>A strange title, no?

I just discovered Google’s Knol project. Imagine an author-accountable version of Wikipedia. That’s quite the concept. It’s like a free encyclopedia, written by experts, where the experts make money by putting Google ads on their pages (optional), and the encyclopedia itself is free. I can’t help but like this concept.

This, to me, is about the influence of Open Source on business models other than software.

People used to claim, back in the 90’s, that the internet would eventually become nothing but ads, because no one in their right mind would contribute content for free, and content generation would become the exclusive domain of major companies. That was the old thinking, which led to the “subscription models” favoured by online subscription-based dictionaries and expert-advice services, both of which I find lacking in so many respects.

Subsequently, people began to shift in the other direction, where it was assumed that services could harness the vast power of the millions of online people. If each one contributed something to wikipedia, we’d have a mighty resource. Of course, they forgot the chaotic nature of society. There are always a bunch of idiots to ruin every party.

So where does this leave us? With Knol! This model is vastly more like the way software is created in the Open Source world. The Linux kernel is edited by thousands of people, creating an excellent software platform, and it’s not by letting just anyone edit the software. Many people create suggestions for new patches, and the best (or most useful, or most easily maintained…) are accepted. Everyone is accountable along the way, and the source of every patch is recorded. If you want to add something to the Linux kernel, you’d better know your stuff, and be able to demonstrate you do.

I think the same thing goes for Knol. If you want to create a page, fine, but you’ll be accountable for it, and your identity will be used to judge the validity of the page. If an anonymous person wants to edit it, great, that’s not a problem, but the page maintainer will have to agree to those changes. This is a decentralized, expert-based system, fueled by volunteers and self-sponsored (via the Google ads) content providers. It’s a fantastic starting point for a new type of business model.

Anyhow, I have concerns about this model, as I would about any new product. What if someone hijacks a page or “squats” on it? I could register the page for “Coca-Cola”, write an article on it, and become the de facto expert on something that has commercial value. Ouch.

That said, I started my first Knol article on ChIP-Seq. If anyone is interested in joining in, let me know. There’s always room for collaboration on this project.


>Synthetic genomes

>A nifty announcement this morning pre-empted my transcriptome post:

Scientists at the J. Craig Venter Institute have succeeded in creating a fully synthetic bacterial genome, which they have named Mycoplasma genitalium JCVI-1.0. This DNA structure is the largest man-made molecule in existence, measuring 582,970 base pairs.

Kind of neat, really. Unfortunately, I think it’s putting the cart before the horse. We don’t understand 95% of what’s actually going on in the genome, so making an artificial genome is more like a Finnish speaker making a copy of the English dictionary by leaving out a random word or two, and then seeing if Englishmen can still have a decent conversation with what he’s left them. When he finds that leaving out two words still results in a reasonable discussion on toothpaste, he declares he’s created a new dialect.

Still, it’s an engineering feat to build a genome from scratch, much like the UBC engineers hanging VW bugs off of bridges. Pointless and incomprehensible, but neat.

>Pacific Biosciences’ new sequencing technology

>I have some breaking news. I doubt I’m the first to blog this, but I find it absolutely amazing, so I had to share.

Steve Turner from Pacific Biosciences (PacBio) just gave the final talk of the AGBT session, and it was damn impressive. They have a completely new method of doing sequencing that uses DNA polymerase as a sequencing engine. Most impressively, they’ve completed their proof of concept, and they presented data from it in the session.

The method is called Single Molecule Real Time (SMRT) sequencing. It’s capable of producing 5,000–25,000 base pair reads, at a rate of 10 bases/second. (They apparently have 25 bases/second chemistry in development, and expect to release when they have 50 bases/second working!)
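Working out what those two quoted numbers mean per molecule is a quick calculation:

```python
# Time to sequence one read at the quoted 10 bases/second polymerase speed.
rate_bp_per_s = 10
for read_len in (5_000, 25_000):
    seconds = read_len / rate_bp_per_s
    print(f"{read_len} bp read: {seconds / 60:.1f} minutes")
# 5,000 bp -> 8.3 minutes; 25,000 bp -> 41.7 minutes per molecule, so the
# overall throughput has to come from watching huge numbers of polymerases
# in parallel, not from per-read speed.
```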

The machinery has zero moving parts, and once everything is in place, they anticipate a sequencing rate of greater than 100 Gb per hour! As they were proud to mention, that’s about a full draft genome for a human being in 15 minutes, at a cost of about $100. Holy crap!
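The 15-minute claim checks out as a rough coverage calculation (the ~3.1 Gb haploid genome size is my assumption; the talk only quoted the throughput):

```python
# Sanity-check "a draft human genome in 15 minutes" at 100 Gb/hour.
throughput_gb_per_hour = 100
genome_gb = 3.1                # approximate haploid human genome (assumed)
minutes = 15

bases_in_15_min = throughput_gb_per_hour * (minutes / 60)   # 25 Gb
coverage = bases_in_15_min / genome_gb
print(f"{bases_in_15_min:.0f} Gb in {minutes} min -> ~{coverage:.0f}x coverage")
```

At roughly 8x coverage in 15 minutes, “draft genome” is a fair description.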