One more blog for the day. I was postponing writing this one because it’s been driving me nuts, and I thought I might be able to work around it… but clearly I can’t.
With all the work I’ve put into the controls and compares in FindPeaks, I thought I was finally clear of the bugs and pains of working on the software itself – and I think I am. Unfortunately, what I didn’t count on was that the data sets themselves may not be amenable to this analysis.
My control finally came off the sequencer a couple of weeks ago, and I’ve been working with it for various analyses (SNPs and the like – it’s a WTSS data set)… and I finally plugged it into my FindPeaks/FindFeatures pipeline. Unfortunately, while the analysis is good, the sample itself is looking pretty bad. Looking at the data sets, the only thing I can figure is that a year and a half of sequencing chemistry changes has made a big impact on the number of aligning reads and the quality of the reads obtained. I no longer get a linear correlation between the two libraries – it looks partly sigmoidal.
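For what it’s worth, here is a minimal sketch in Python of the kind of linearity check I mean. It is not FindPeaks code: the function names and the toy numbers are made up for illustration, and it assumes you already have paired peak heights for the same regions in the sample and the control.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys against xs.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("one of the inputs has no variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def residuals(xs, ys, slope, intercept):
    """Observed minus fitted values; a systematic pattern here suggests curvature."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Toy data (hypothetical): peak heights in the sample for 20 regions.
sample = [float(i) for i in range(1, 21)]
# A well-behaved control scales linearly with the sample.
control_linear = [2.0 * x + 1.0 for x in sample]
# A badly behaved control follows a sigmoid instead.
control_sigmoid = [40.0 / (1.0 + math.exp(-(x - 10.0))) for x in sample]

slope_lin, intercept_lin, r2_lin = linear_fit(sample, control_linear)
slope_sig, intercept_sig, r2_sig = linear_fit(sample, control_sigmoid)

print("linear control:  R^2 = %.3f" % r2_lin)
print("sigmoid control: R^2 = %.3f" % r2_sig)
```

On real data, a high R² alone doesn’t rule out a sigmoidal shape, so plotting the output of `residuals` against the sample heights is worth the extra minute.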
Unfortunately, there’s nothing to do except re-sequence the sample. But really, I guess that makes sense. If you’re doing a comparison between two data sets, you need them to have as few differences as possible.
I just never realized that the time between samples also needed to be controlled. Now I have a new question when I review papers: How much time elapsed between the sequencing of your sample and its control?