I saw an interesting seminar today, which I thought I’d like to comment on. Unfortunately, I didn’t bring my notes home with me, so I can only report on the details I recall. My apologies in advance for any errors: as always, any mistakes are with my recall, and not the fault of the presenter.
Ironically, I almost skipped the talk – it was billed as discussing Epigenetics using “ChIP-on-Chip”, which I wrote off several months ago as being a “poor man’s ChIP-Seq.” I try not to say that too loud, usually, since there are still people out there who put a lot of faith in it, and I have no evidence to say it’s bad. Or, at least, I didn’t until today.
The presenter was Dr. Stunnenberg, from Nijmegen Center for Molecular Sciences, whose web page doesn’t do him justice in any respect. To begin with, Dr. Stunnenberg gave a big apology for the change in date of his talk – I gather the originally scheduled talk had to be postponed because someone had stolen his bags while he was on the way to the airport. That has got to suck, but I digress…
Right away, we were told that the talk would focus not on “ChIP-on-Chip”, but on ChIP-Seq, instead, which cheered me up tremendously. We were also told that the poor graduate student (Mark?) who had spent a full year generating the first data set based on the ChIP-on-Chip method had had to throw away all of his data and start over again once the ChIP-Seq data had become available. Yes, it’s THAT much better. To paraphrase Dr. Stunnenberg, it wasn’t worth anyone’s time to work with the ChIP-on-Chip data set when compared to the accuracy, speed and precision of the ChIP-Seq technology. Ah, music to my ears.
I’m not going to go over what data was presented, as it would mostly be of interest only to cancer researchers, other than to mention it was based on estrogen receptor mediated binding. However, I do want to raise two interesting points that Dr. Stunnenberg touched upon: the minimum height threshold they applied to their data, and the use of Polymerase occupancy.
With respect to their experiment, they performed several lanes of sequencing on their ChIP-Seq sample, and used standard peak finding to identify areas of enrichment. This yielded a large number of sites, which I seem to recall was in the range of 60-100k peaks, with a “statistically derived” minimum peak height cutoff of around 8-10. No surprise, this is a typical result for a complex interaction with a relatively promiscuous transcription factor: a lot of peaks! The surprise to me was that they decided that this was too many peaks, and so applied an arbitrary threshold of a minimum peak height of 30, which reduced the number of peaks to roughly 6,400. Unfortunately, I can’t come up with a single justification for setting this threshold at 30. In fact, I don’t know that anyone could, including Dr. Stunnenberg, who admitted it was rather arbitrary and was chosen because they thought the first number, in the tens of thousands of peaks, was too many.
I’ll be puzzling over this for a while, but it seems like a lot of good data was rejected for no particularly good reason. Yes, it made the data set more tractable, but considering the number of peaks we work on regularly at the GSC, I’m not really sure this is a defensible reason. I’m personally convinced that there is a lot of biological relevance for the peaks with low peak heights, even if we aren’t aware of what that is yet, and arbitrarily raising the minimum height threshold 3-fold over the statistically justifiable cutoff is a difficult pill to swallow.
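To make the effect of the two cutoffs concrete, here is a minimal sketch in Python. It is not the pipeline used in the talk (I don’t know what peak finder or file format they used). The `Peak` type, the `filter_by_height` helper and the toy peaks are all invented for illustration, and “height” is assumed to mean the maximum read coverage within a peak.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    """Hypothetical peak record; not taken from any real peak finder."""
    chrom: str
    start: int
    end: int
    height: int  # assumed: maximum read coverage within the peak


def filter_by_height(peaks, min_height):
    """Keep only the peaks whose height is at least min_height."""
    return [p for p in peaks if p.height >= min_height]


# Toy peaks, made up for this example only.
toy_peaks = [
    Peak("chr1", 1000, 1200, 9),
    Peak("chr1", 5000, 5300, 12),
    Peak("chr2", 700, 950, 31),
    Peak("chr3", 4200, 4400, 45),
    Peak("chr3", 9000, 9150, 29),
]

# Cutoff in the range of the "statistically derived" one (8-10).
statistical = filter_by_height(toy_peaks, 10)

# The arbitrary, roughly 3-fold higher cutoff.
arbitrary = filter_by_height(toy_peaks, 30)

# Peaks that pass the statistical cutoff but are thrown away at 30.
discarded = [p for p in statistical if p.height < 30]

print(len(statistical), len(arbitrary), len(discarded))
```

The `discarded` list is the part I care about: those peaks pass the statistical cutoff, and they are exactly the ones that disappear when the threshold is raised by hand.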
Moving along, the part that did impress me a lot (one of many impressive parts, really) was the use of Polymerase occupancy ChIP-Seq tracks. Whereas the GSC tends to do a lot of transcriptome work to identify the expression of genes, Dr. Stunnenberg demonstrated that polymerase ChIP can be used to gain the same information, but with much less sequencing. (I believe he said 2-3 lanes of Solexa data were all that were needed, whereas our transcriptomes have been done up to a full 8 lanes.) Admittedly, I’d rather have both transcriptome and polymerase occupancy, since it’s not clear where each one has weaknesses, but I can see obvious advantages to both methods, particularly the benefits of having direct DNA evidence, rather than mapping cDNA back to genomic locations for the same information. I think this is something I’ll definitely be following up on.
In summary, this was clearly a well thought through talk, delivered by a very animated and entertaining speaker. (I don’t think Greg even thought about napping through this one.) There’s clearly some good work being done at the Nijmegen Center for Molecular Sciences, and I’ll start following their papers more closely. In the meantime, I’m kicking myself for not going to the lunch to talk with Dr. Stunnenberg afterwards, but alas, the ChIP-on-Chip poster sent out in advance had me fooled, and I had booked myself into a conflicting meeting earlier this week. Hopefully I’ll have another opportunity in the future.
By the way, Dr. Stunnenberg made a point of mentioning they’re hiring bioinformaticians, so interested parties may want to check out his web page.