Last night, I hung around late into the evening to hear Dr. Andrew G. Clark give a talk on how most of the variations we see in the modern human genome are rare variants that haven’t had a chance to equilibrate into the larger population. This enormous expansion of rare variants is courtesy of the population explosion of humans since the dawn of the agricultural age, and particularly over the past 2000 years with the rise of modern science and education.
I think the talk was very well done and managed to hit a lot of points that struck home for me. In particular, my own collected database of human variations in cancers and normals has shown me much of the same information that Dr. Clark illustrated using 1000 Genomes data, as well as information from his 2010 paper on deep re-sequencing.
However interesting the talk was, one particular piece just didn’t click until after it was over. During a conversation prior to the talk, I described my work to Dr. Clark and received a reaction I wasn’t expecting. Paraphrased, this is how the conversation went:
Me: “I’ve assembled a very large database, where all of the cancers and normals that we sequence here at the genome science centre are stored, so that we can investigate the frequency of variations in cancers to identify mutations of interest.”
Dr. Clark: “Oh, so it’s the same as a HapMap project?”
Me: “Yeah, I guess so…”
What I didn’t understand at the time was that Dr. Clark was really asking: “So, you’re just cataloging rare variations, which are more or less meaningless?” Which is exactly what HapMap projects are: nothing more than large surveys of human variation across genomes. While they could be the basis of GWAS studies, the huge number of rare variants in the modern human population means that many of these GWAS studies are doomed to fail. There will not be a large convergence of variations causing the disease, but rather an extreme number of rare variations with similar outcomes.
However, I think the problem was that I handled the question incorrectly. My answer should have touched on the following point:
“In most diseases, we’re stuck using lineages to look for points of interest (variations) passed on from parent to child, and the large number of rare variants in the human population makes this incredibly difficult to do, as each child will have a significant number of variations that neither parent passed on to them. However, in cancer, we have the unique ability to compare diseased cancer cells with a matched normal from the same patient, which allows us to effectively mask all of the rare variants that are not contributing to cancer. Thus, the database does act like a large HapMap database if you’re interested in studying non-cancer diseases, but the matched-normal sample pairing available to cancer studies means we’re not confined to using it as a HapMap-style database. It enables incredibly detailed and coherent information about the drivers and passengers involved in oncogenesis, without the same level of rare variants interfering in the interpretation of the genome.”
Alas, in the way of all things, that answer only came to me after I heard Dr. Clark’s talk and understood the subtext of his question. However, that answer is very important on its own.
It means that while many diseases will be hard slogs through the deep rare variant populations (which SNP chips will never be detailed enough to elucidate, by the way, for those of you who think 23andMe will solve a large number of complicated diseases), cancer is bound to be a more tractable disease in comparison! We will bypass the misery of studying every single rare variant, a category that makes up a sizeable fraction of each new genome sequenced!
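To make the matched-normal idea concrete, here is a minimal sketch of the masking step, not the way my database actually does it. Everything in it is an assumption for illustration: the `Variant` record, the `somatic_candidates` function and the toy coordinates are all made up, and a real pipeline would also have to worry about coverage, tumour cells contaminating the normal, and sequencing errors.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record: position plus the base change."""
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(tumour: Iterable[Variant],
                       normal: Iterable[Variant]) -> List[Variant]:
    """Keep only the tumour variants that are absent from the matched normal.

    Anything the patient's normal tissue also carries is treated as inherited
    (rare or common) and masked out, which is the whole point of the pairing.
    """
    inherited = set(normal)
    return [v for v in tumour if v not in inherited]


# Toy data (made-up coordinates): two inherited variants shared by both
# samples, plus one variant seen only in the tumour.
normal_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 5000, "C", "T"),
]
tumour_calls = normal_calls + [Variant("chr7", 42000, "G", "A")]

candidates = somatic_candidates(tumour_calls, normal_calls)
print(candidates)
```

Only the tumour-specific variant survives the filter. Without the matched normal, all three variants would have to be weighed against a population database full of other people's rare variants.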
Unfortunately, unlike many other human metabolic diseases that target a single gene or pathway, cancer is really a whole genome disease and is vastly more complex than any other disease. Thus, even if our ability to zoom in on the “driver” mutations progresses rapidly as we sequence more cancer tissues (and their matched normal samples, of course!), it will undoubtedly be harder to interpret how all of these work and identify a cure.
So, as with everything, cancer’s somatic nature is a double-edged sword: it can be used to more efficiently sort the wheat from the chaff, but will also be a source of great consternation for finding cures.
Now, if only I could convince other people of the dire necessity of matched normals in cancer research…