>The Noncoding Challenge: 50% of GWAS hits fall in non-coding regions. CAD and diabetes fall in gene deserts, so how do they work? Regulatory regions. Build a category of distal enhancers.
Talk about Sonic Hedgehog, involved in limb formation. Regulation of expression is a million bases away from the gene. There are very few examples. We don’t know if we’ll find lots, or if this is just the tip of the iceberg. How do we find more?
First part: work going on in the lab for the past 3 years. Using conservation to identify regions that are likely involved. Using ChIP-Seq to do this.
Extreme conservation. Either things conserved over huge spans (human to fish) or within a smaller group (human, mouse, chimp).
Clone the regions into vectors, put them in mouse eggs, and then stain for Beta-galactosidase. Tested 1000 constructs, 250,000 eggs, 6000 blue mice. About 50% of them work as reproducible enhancers. Do everything at whole mouse level. Each one has a specific pattern. [Hey, I’ve seen this project before a year or two ago… nifty! I love to see updates a few years later.]
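As a rough sanity check on the scale of that screen, the per-construct averages can be worked out directly. This is only a sketch: the figures are the round numbers quoted in the talk, and the function and variable names are mine, not anything from the speaker.

```python
# Round numbers as quoted in the talk; all names here are illustrative only.
constructs = 1000
eggs_injected = 250_000
blue_mice = 6000
reproducible_fraction = 0.5  # "about 50%" work as reproducible enhancers


def screen_summary(constructs, eggs, blue, fraction):
    """Return per-construct averages and the implied number of enhancers."""
    return {
        "eggs_per_construct": eggs / constructs,
        "blue_mice_per_construct": blue / constructs,
        "blue_per_egg": blue / eggs,
        "implied_enhancers": round(constructs * fraction),
    }


summary = screen_summary(constructs, eggs_injected, blue_mice, reproducible_fraction)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The implied count of roughly 500 enhancers lines up with the figure given for the first method in the conclusions.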
Bin by anatomical pattern. Forebrain enhancers is one of the big “bins”. Working on forebrain atlas.
All data is at enhancer.lbl.gov. Also in Genome Browser. There is also an Enhancer Browser. Comparative genomics works great at finding enhancers in vivo. No shortage of candidates to test.
While this works, it’s not a perfect system. Half of the things don’t express, and the system is slow and expensive. The comparative genomics also tells you nothing about where it expresses, so this is ok for wide scans, but not great if you want something random.
Enter ChIP-Seq. (brief model of how expression works) Collaboration with Bing Ren. (brief explanation of ChIP-Seq). Using Illumina to sequence. Looking at bits of mouse embryo. Did ChIP-Seq, got peaks. What’s the accuracy?
Took 90 of the predictions, used same assay. When p300 was used, now up to 9/10 of conserved sequences work as enhancers. Also tissue specific.
Summarize: using comparative genomics gives you 5-16% active things in one tissue. Using ChIP-Seq, you get 75-80%.
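To put those hit rates in bench terms, here is a small sketch of how many constructs you would expect to test to end up with a given number of tissue-active enhancers. The rates are the ones quoted in the talk. The function name and the target of 100 hits are my own choices for illustration, and it assumes each construct is an independent trial at the stated rate.

```python
def constructs_needed(target_hits, hit_rate):
    """Expected number of constructs to test to obtain target_hits active enhancers."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return target_hits / hit_rate


# Hit rates as quoted in the talk (fraction active in one tissue).
rates = {
    "comparative genomics (low)": 0.05,
    "comparative genomics (high)": 0.16,
    "ChIP-Seq (low)": 0.75,
    "ChIP-Seq (high)": 0.80,
}

for label, rate in rates.items():
    needed = constructs_needed(100, rate)
    print(f"{label}: ~{needed:.0f} constructs per 100 hits")
```

Depending on which ends of the two ranges you compare, that works out to roughly 5 to 16 times fewer constructs for the same number of hits.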
How good or bad is comparative genomics at ranking targets? 5% of exons are constrained, almost the rest are moderately constrained. [I don’t follow this slide. Showing better conserved in forebrain and other tissues].
P300 peaks are enriched near genes that are expressed in the same tissues.
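The talk did not say how that enrichment was measured. One common way to test a claim like this is a hypergeometric tail probability: given how many genes are expressed in the tissue, how surprising is it that so many of the genes nearest to peaks are among them? A minimal sketch, with toy numbers that are made up for illustration and are not from the talk:

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` genes without replacement from
    `population` genes, of which `successes` are expressed in the tissue."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Toy numbers (hypothetical): 20 genes in total, 5 expressed in forebrain,
# 4 genes sit nearest to a peak, and 3 of those 4 are forebrain-expressed.
p_value = hypergeom_tail(population=20, successes=5, draws=4, observed=3)
print(f"P(at least 3 of 4 expressed by chance) = {p_value:.4f}")
```

A small probability here would mean the overlap between peak-adjacent genes and tissue-expressed genes is unlikely to be chance alone.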
Conclusion: p300 is a better way of predicting enhancers.
P300 occupancy circumvents DNA conservation only approach.
What about negatives? For ones that don’t work, it’s even better, but mouse orthologs bind, while human does not bind any more in mice.
Conclusion II: Identified 500 more enhancers with first method, and now a few reads done 9 months ago have 5000 new elements using ChIP-Seq.
Many new things can be done with this system, including integrating it with GWAS.