Aligning DNA – comments from above

I’ve been pretty bad about continuing my posts on how the different aligners work. It’s a lot of work keeping up with them, since I seem to hear about a new one each week. However, a post-doc in my lab gave a presentation comparing the various aligners and discussing the strengths and weaknesses of each for short (Illumina) read alignments.

Admittedly, I don’t know how accurate the presenter’s data was. Most of the presentation was used to set up his own in-house aligner development, and thus all of the aligners were painted in a poor light, except his, of course. That being said, there’s some truth to what he found: most of the aligners out there have some pretty serious issues.

Eland is still limited by its 32-base limit, which you’d think they’d have gotten past by now. For crying out loud, the company that produces it is trying to sell kits for doing 36-base alignments. It’s in their best interest to have an aligner that does more than 32 bases. (Yes, they have a new work-around in their Gerald program, but it’s hardly ideal.)

MAQ, apparently, has a weird “feature” that if multiple alignments are found, it just picks one at random as the “best”. Hardly ideal for most experiments.

Mosaik provides output in .ace files – which are useless for any further work, unless you want to reverse engineer converters to other, more reasonable, formats.

SOAP only aligns against the forward strand! (How hard can it be to map the reverse complement???)
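For anyone wondering what mapping the reverse complement actually involves, here is a minimal Python sketch. It is not how SOAP or any of the other aligners works internally; the function names `reverse_complement` and `find_on_both_strands` are made up for illustration, and the search is a naive exact match rather than a real alignment. The point is only that covering the reverse strand amounts to running the same search a second time with the read reverse-complemented.

```python
# Translation table mapping each base to its complement.
# N (unknown base) is assumed to stay N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_on_both_strands(read: str, reference: str) -> list:
    """Naive exact-match search of a read against both strands.

    Returns (position, strand) tuples, where position is the 0-based
    offset on the forward reference and strand is "+" or "-".
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = reference.find(query)
        while start != -1:
            hits.append((start, strand))
            # Step forward one base so overlapping matches are also found.
            start = reference.find(query, start + 1)
    return hits


if __name__ == "__main__":
    # A read that only matches once it has been reverse-complemented.
    print(find_on_both_strands("GCTT", "CCAAGCCC"))
```

A real aligner would use an index and tolerate mismatches, but the strand handling itself would look much the same: one extra query per read.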

Exonerate is great when run in “slow mode”, at which point it’s hardly usable for 40M reads, and when it’s run in “fast mode”, its results are hardly usable at all.

SHRiMP, I just don’t know enough about to comment on.

And yes, even the post-doc’s in-house aligner (called Slider) has some serious issues: it’s going to miscall all SNPs, unless you’re aligning fragments from the reference sequence back to itself. (That’s not counting the 20 hours I’ve already put in to translate the thing into proper Java, patching memory leaks, and the like…)

Seriously, what’s with all of these aligners? Why hasn’t anyone stepped up to the plate and come up with a decent open-source aligner? There have got to be hundreds of groups out there who are struggling to make these work, and not one of them is ideal for use with Illumina reads. Isn’t there one research group out there dog-fooding their own Illumina sequence aligner?

At this rate, I may have to build my own. I know what they say about software, though: You can have fast, efficient or cheap – pick any two. With aligners, it seems that’s exactly where we’re stuck.
