>Has Apple gone too far?

>As a bioinformatician, I enjoy a good-looking piece of computer hardware and, for the last few years, the best-looking hardware around has been the Apple Macs. I’ve even thought about buying one of their new MacBooks, although for the same specs you can pick up a Dell on sale at a third of the price, so it’s hardly a good deal. I really can’t see myself running anything other than Linux on it, though, so despite the beautiful engineering, I can’t see myself paying ~$300 for an OS I’d just remove. (I was even upset at paying ~$50 for a copy of Windows XP with my current laptop. Drop me a line if you want to buy the license. It doesn’t even have a valid EULA… but that’s another story.)

Anyhow, I’ve got to admit, Apple has finally managed to turn me off completely. Check out this article. To paraphrase, Apple has decided to follow Microsoft and Intel in preventing you from enjoying the content you own in the way you’d like to use it. In other words, Mac OS X is now claiming control over your media files. (And, I might add, this is not about copyright, because the article shows restrictions on uses that clearly fall under “fair use” as well.) DRM is now built right into your hardware, and if your hardware isn’t DRM enabled, you can’t use it. Ouch.

I feel sorry for those people who have jumped the Microsoft ship just to end up in the Apple camp and are about to discover that Apple doesn’t have their best interests at heart either. Why shouldn’t you be able to show a movie on an external monitor or projector?

In the long run, this is probably good advertising for GNU/Linux, which doesn’t enforce media company greed on its users. So, if anyone wants a free Ubuntu disk to make their Apple hardware work for them instead of against them, here you go.

>Synergy!

>Studying for my comprehensive exam is moving along slowly, rather disrupted by the poster I’m creating for the annual Cancer Conference taking place this week. I’m a little behind, but I’m getting there. Anyhow, I thought I’d take a minute to mention something that’s come up several times in conversation this week: Synergy.

This is one of those applications that is an absolute must for bioinformatics students and researchers, or anyone who uses more than one computer. (Don’t we all, these days?) I’ve been using it, myself, for about a year now, and it’s one of the most useful applications on my computers.

Synergy is an open source software implementation of a KVM switch. Like a KVM switch, it can be used across operating systems, anything from Windows 95 to XP to OS X to Linux/*nix. It’s not even hard to install. The beauty of it is really in its simplicity. Not only can your mouse and keyboard move across your computers, but they also carry a clipboard with them. Cutting and pasting between computers, on its own, is worth its weight in gold. (Though that probably depends on how much you have on your clipboard…)

Anyhow, just because not everyone is aware of this nifty little tool, I figured I’d mention it. Hopefully it’s useful to a few people out there!

>Processing Paired End Reads

>Ok, I’m taking a break from journals. I didn’t like the overly negative tone I was taking in those reviews, so I’m rethinking how I write about articles. Admittedly, I think criticism is in order for papers, but I can definitely focus on papers that are worth reviewing. Unfortunately, I’m rarely a fan of papers discussing bioinformatics applications, as I always feel like there’s a hidden agenda behind them. Whether it’s simply proving their application is the best, or just getting it out first, computer application papers are rarely impartial.

Anyhow, I have three issues to cover today:

  • FindPeaks is now on SourceForge.
  • Putting the “number of reads under a peak” to rest. Permanently, I hope.
  • BED files for different data sources.

The first one is pretty obvious. FindPeaks is now available under the GPL on SourceForge, and I hope people will participate in using and improving the software. We’re aiming for our first tagged release on Friday, with frequent tags thereafter. Since I’m no longer the only developer on this project, it should continue moving forward quickly, even while I’m busy studying for my comps.

The second point is this silly notion that keeps coming up: “How many reads were found under each peak?” I’m quite sick of that question, because it really makes no sense. Unfortunately, this was a metric produced in Robertson et al.’s STAT1 data set, and I think other people have included it or copied it. In general, though, it’s rubbish.

The reason it worked in STAT1 was that they used a fixed length (or XSET) value on their data set. This allowed them to determine the exact length of each read, which allowed them to figure out how many reads were “contiguously linked in each peak.” Readers who are paying attention will also realize what the second problem is… They didn’t use subpeaks either. Once you start identifying subpeaks, a read that spans two of them can no longer be assigned to one or the other. Beyond that, what do you do with reads in a long tail? Are they part of the peak or not?

Anyhow, the best measure for a peak, at the moment at least, is the height of the peak. This can also include weighted reads, so that reads which are unlikely to contribute to a peak actually contribute less, bringing in a scaled value. After all, unless you have paired end tags, you really don’t know how long the original DNA fragment was, which means you don’t know where it ended.
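To make the idea concrete, here is a minimal sketch of peak height with and without weighting. It is not the FindPeaks code: the function names, the 200 bp extension, and the linear decay are all assumptions I picked for illustration.

```python
from collections import defaultdict


def peak_height(reads, extend_to=200, weight=None):
    """Return (position, height) at the tallest point of the coverage.

    reads: iterable of (start, strand) tuples, strand is '+' or '-'.
    extend_to: assumed fragment length (hypothetical default).
    weight: function mapping an offset (0 .. extend_to - 1) along the
            extended read to that base's contribution; flat 1.0 if None.
    """
    if weight is None:
        weight = lambda offset: 1.0
    coverage = defaultdict(float)
    for start, strand in reads:
        step = 1 if strand == "+" else -1
        for offset in range(extend_to):
            coverage[start + step * offset] += weight(offset)
    best = max(coverage, key=coverage.get)
    return best, coverage[best]


def linear_decay(offset, full=100, limit=200):
    """Full weight for the first `full` bases, then a linear fall to 0 at `limit`.

    An illustrative weighting only; any function of the offset would do.
    """
    if offset < full:
        return 1.0
    return max(0.0, (limit - offset) / (limit - full))


# Three single-end reads whose extensions overlap.
reads = [(100, "+"), (150, "+"), (350, "-")]
print(peak_height(reads))                       # flat: every extended base counts as 1
print(peak_height(reads, weight=linear_decay))  # weighted: far ends count for less
```

With the flat weighting, the height is just the number of overlapping extended reads. With the decay, bases far from the sequenced end count for less, so the height comes out as a scaled value.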

That also makes a nice segue to my last point. There are several ways of processing paired end tags. When it comes to peak calling, it’s pretty simple: you use the default method, since you know where the two ends are and together they span the full fragment. For other applications, however, there are complexities.

If the data source is a transcriptome, your read didn’t cover the intron, so you need to process the transcript to include breaks when mapping it back to the genome. That’s really a pain, but it is clearly the best way to visualize transcriptome PETs.
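As a rough sketch of what “including breaks” means, assuming you already have intron coordinates from an annotation: the function below cuts a PET’s genomic span into blocks that skip the introns. The name `split_on_introns` and the half-open (start, end) coordinates are my own choices for the example.

```python
def split_on_introns(start, end, introns):
    """Split the genomic span [start, end) into blocks that skip introns.

    introns: list of (intron_start, intron_end) half-open intervals,
             assumed to come from an existing annotation.
    Returns a list of (block_start, block_end) tuples in genomic order.
    """
    blocks = []
    cursor = start
    for i_start, i_end in sorted(introns):
        if i_end <= cursor or i_start >= end:
            continue  # this intron does not overlap the remaining span
        if i_start > cursor:
            blocks.append((cursor, i_start))  # exonic stretch before the intron
        cursor = max(cursor, i_end)           # resume after the intron
    if cursor < end:
        blocks.append((cursor, end))          # trailing exonic stretch
    return blocks


# A PET spanning 100..500 with one annotated intron at 200..300.
print(split_on_introns(100, 500, [(200, 300)]))
```

Each block is then drawn as a solid segment, with the skipped introns left as gaps.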

If the data source is unclear, or you don’t know where the introns are (which is a distinct possibility), then you have to be more conservative, and not assume the extension of each tag. Thus, you end up with a “tag and bridge” format. I’ve included an illustration to make it clear.
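For anyone who wants to see the difference in file terms, here is a small sketch that writes one PET as a 12-column BED line in either mode. The function name, the `mode` argument, and the 36 bp tag length in the usage lines are assumptions for illustration, not what my code currently does. Only the column layout is standard BED.

```python
def pet_to_bed12(chrom, left_start, right_end, tag_len, name, mode="default"):
    """Format one paired end tag as a BED12 line.

    left_start: 0-based start of the left tag.
    right_end:  end (exclusive) of the right tag.
    mode: "default" draws one solid block over the whole span;
          "tag_and_bridge" draws the two tags as blocks and leaves the
          middle as a gap, which browsers typically render as a thin line.
    """
    if mode not in ("default", "tag_and_bridge"):
        raise ValueError("unknown mode: %s" % mode)
    span = right_end - left_start
    if mode == "default" or span <= 2 * tag_len:
        # Either requested, or the tags touch/overlap so there is no bridge.
        sizes, starts = [span], [0]
    else:
        sizes = [tag_len, tag_len]
        starts = [0, span - tag_len]  # block starts are relative to left_start
    fields = [
        chrom, left_start, right_end, name,
        0,                      # score (unused here)
        ".",                    # strand: not meaningful for the pair as a whole
        left_start, right_end,  # thickStart, thickEnd
        0,                      # itemRgb
        len(sizes),
        ",".join(str(s) for s in sizes),
        ",".join(str(s) for s in starts),
    ]
    return "\t".join(str(f) for f in fields)


print(pet_to_bed12("chr1", 1000, 1200, 36, "pet1"))
print(pet_to_bed12("chr1", 1000, 1200, 36, "pet1", mode="tag_and_bridge"))
```

The second line makes no claim about what lies between the two tags, which is the whole point of the conservative format.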

So why bring it up? Because I’ve had several requests for the tag-and-bridge format, and my code only works on the default mode. Time to make a few adjustments.