Don't you think that controls used for microarray (expression
and ChIP-chip) are well established and that we could use
these controls with NGS?
I think this is a valid question, and one that should be addressed. My committee asked me the same thing during my comprehensive exam, so I’ve had a chance to think about it. Unfortunately, I’m not a statistics expert or a ChIP-chip expert, so I would really value other people’s opinions on the matter.
Anyhow, I think the answer has to be put in perspective: yes, we can learn from the statistics that are used for ChIP-chip and expression arrays, but no, they’re not directly applicable.
Both ChIP-chip and array experiments are based on hybridization to a probe, which makes them cheap and reasonably reliable. Unfortunately, it also leads to a much lower dynamic range: the signal saturates at the high end and can be undetectable at the low end of the spectrum. This alone is a key difference. What signal would be detected from a single hybridization event on a microarray?
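To make the dynamic range point concrete, here is a toy sketch in Python. The floor and ceiling values are made up purely for illustration and are not taken from any real array platform; the point is only that a clipped readout collapses very different abundances onto the same value, while a count does not.

```python
# Toy model only: DETECTION_FLOOR and SATURATION_CEILING are hypothetical
# numbers chosen for illustration, not measurements from a real scanner.
DETECTION_FLOOR = 50.0
SATURATION_CEILING = 65535.0


def array_signal(abundance):
    """Hybridization-style readout: clipped at both ends of the range."""
    return min(max(float(abundance), DETECTION_FLOOR), SATURATION_CEILING)


def read_count(abundance):
    """Sequencing-style readout: idealized as one count per molecule."""
    return int(abundance)


abundances = [1, 10, 1_000, 100_000, 1_000_000]
array_values = [array_signal(a) for a in abundances]
count_values = [read_count(a) for a in abundances]

for a, arr, cnt in zip(abundances, array_values, count_values):
    print(f"abundance={a:>9}  array={arr:>8.1f}  reads={cnt}")
```

In this sketch the two lowest abundances are indistinguishable on the array, as are the two highest, while the counts keep all five apart. Real sequencing has its own biases, of course, so treat the count side as an idealization.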
Additionally, the resolution of a ChIP-chip probe is vastly different from that of a sequencing reaction. In ChIP-Seq or RNA-Seq, we can get distinct signals for reads whose start locations differ by only one base, and those signals should then be interpreted differently. With ChIP-chip, the resolution is closer to 400 bp windows, and thus the statistics take that into account.
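A small sketch of the resolution difference, assuming a handful of invented read start coordinates and a helper I'm calling `bin_starts` (both are hypothetical, just for illustration). The same positions are tallied once at single-base resolution and once in 400 bp windows.

```python
from collections import Counter


def bin_starts(starts, window):
    """Count read starts per window, keyed by the window's first coordinate."""
    return Counter((s // window) * window for s in starts)


# Invented start coordinates, for illustration only.
starts = [1000, 1001, 1001, 1399, 1400]

per_base = bin_starts(starts, 1)      # sequencing-like, 1 bp resolution
per_window = bin_starts(starts, 400)  # array-like, 400 bp windows

print(dict(per_base))
print(dict(per_window))
```

At 1 bp resolution there are four distinct positions, and the starts at 1000 and 1001 stay separate. With 400 bp windows everything collapses into two bins, so any statistic built on the windowed data never sees the single-base differences.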
Another reason I think the statistics are vastly different is the way we handle the data itself when setting up an experiment. With arrays, you run the same experiment several times and treat the results as replicates, in order to quantify the variability (deviation and error) between them. With second-generation sequencing, we pool the results from several different lanes, meaning we always have N=1 in our statistical analysis.
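Here is a rough illustration of why pooling matters, using invented numbers and a hypothetical helper `sample_sd`. With replicate measurements you can estimate a standard deviation; once the lanes are summed into one value, there is nothing left to estimate it from.

```python
import statistics


def sample_sd(values):
    """Sample standard deviation, or None when there are too few points."""
    return statistics.stdev(values) if len(values) >= 2 else None


# Invented values, for illustration only.
array_replicates = [10.2, 9.8, 10.5]  # same experiment, repeated three times
lane_counts = [120, 135, 128]         # reads for one region, per lane

pooled = sum(lane_counts)             # lanes merged into a single observation

print("array SD:", sample_sd(array_replicates))
print("pooled value:", pooled, "SD:", sample_sd([pooled]))
```

The array replicates give a usable spread, while the pooled sequencing value is a single observation with no variance estimate at all, which is the N=1 situation described above.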
So, yes, I think we can learn from other methods of statistical analysis, but we can’t blindly apply the statistics from ChIP-chip and assume they’ll correctly interpret our results. The more statistics I learn, the more I realize how many assumptions go into each method – and how much more work it is to get the statistics right for each type of experiment.
At any rate, these are the three most compelling reasons that I have, but they certainly aren’t the only ones. If anyone would like to add more reasons, or tell me why I’m wrong, please feel free to add a comment!