It seems to be University of Toronto week on my blog. Today, Julie Chih-yu Chen is visiting to give a talk titled:
Identifying tissue specific distal regulatory sequences in the mouse genome.
Enhancer Identification in Mouse Embryonic Stem Cells
(A last-second change)
which, from all indications, is ChIP-Seq related. Julie is currently wrapping up a master's degree in the Mitchell lab at U of T. She has done a lot of coding work in the past, but has been working on more biological questions recently.
I was also fortunate enough to be invited to lunch with Julie before the talk, where I got to ask a few questions and confirm that she was happy to have her talk blogged.
And now, on with the talk.
Distal regulatory elements: non-coding elements. Histone modifications are found to be tissue specific at enhancers, rather than at promoters and insulators. Over 40% of peaks for several transcription factors are in distal regions (more than 10 kb from the TSS).
Because DNA folds, enhancers that are far from a gene in sequence can be brought physically close to it, driving transcription and affecting expression in the way a nearby element would (Carter 2002).
Examples: thalassaemias can result from deletions or rearrangements of beta-globin gene (HBB) enhancers 50 kb upstream. In mice, mutations in an SHH enhancer 1 Mb upstream can cause severely shortened limbs.
How do we find these? ChIP-Seq or ChIP-chip can be used to identify binding sites, and that information can then be used to identify binding motifs.
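For readers newer to motifs: once binding sites have been located and aligned, the simplest way to summarize them is a position frequency matrix (base counts per column) and its consensus. This Python sketch assumes the sites are already aligned and of equal length; real motif-discovery tools also handle alignment, background models and scoring. The function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def position_frequency_matrix(sites):
    """Count each base at every column of equal-length, pre-aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must all have the same length")
    matrix = []
    for i in range(width):
        counts = Counter(s[i].upper() for s in sites)
        matrix.append({b: counts.get(b, 0) for b in BASES})
    return matrix

def consensus(matrix):
    """Most frequent base per column; ties go to the earlier base in ACGT order."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in matrix)
```

A matrix like this is the starting point for position weight matrices, which score new sequences against the motif.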
Other data can strengthen this analysis: high-throughput sequencing data for TFs, p300 and histone methylation, or annotations from comparative genomics, such as highly conserved regions.
Motivation: identify significant markers at known enhancers, predict enhancer regions, and use motif analysis to identify TFs potentially regulating the cell type.
Training data and features: extended sets of known enhancer positives and negatives.
An example at a TSS illustrates that many different kinds of activity frequently occur at the same location.
Method: Binning in 1kb increments. [Gah! Another Binning Method!] Something about input reads +3 for control… missed the detail, tho.
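Since I missed the details, here is only a generic sketch of what counting reads in 1 kb bins and comparing ChIP against input can look like. The function names and the pseudocount are my own assumptions (this is not the "+3" correction mentioned in the talk), and no library-size normalization is done.

```python
from collections import Counter

BIN_SIZE = 1000  # 1 kb bins

def bin_counts(read_starts, bin_size=BIN_SIZE):
    """Map each read start coordinate to a bin index and count reads per bin."""
    return Counter(pos // bin_size for pos in read_starts)

def enrichment(chip_starts, input_starts, pseudocount=1.0, bin_size=BIN_SIZE):
    """Per-bin ratio of ChIP to input reads.

    The pseudocount (an assumed value) keeps bins with no input reads
    from dividing by zero.
    """
    chip = bin_counts(chip_starts, bin_size)
    ctrl = bin_counts(input_starts, bin_size)
    bins = sorted(set(chip) | set(ctrl))
    # Counter returns 0 for bins with no reads in one of the two samples.
    return {b: (chip[b] + pseudocount) / (ctrl[b] + pseudocount) for b in bins}
```

Each bin then becomes one row of features for the classifier, one column per mark or TF.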
Feature extraction improves enhancer prediction… a classifier was used for cross-validation assessment. [Not sure how it works.]
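I'm not sure how the classifier was set up, so this is just a generic picture of k-fold cross-validation assessment: shuffle the labeled bins, hold out one fold at a time, train on the rest and score on the held-out fold. The fit/predict interface and the majority-class baseline are hypothetical stand-ins, not her classifier.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k folds.

    fit(X_train, y_train) -> model; predict(model, X_test) -> list of labels.
    """
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# Hypothetical baseline: always predict the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, X_test):
    return [model] * len(X_test)
```

A real feature set is only interesting if it beats a trivial baseline like this one under the same folds.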
Uses maximum penalized likelihood, with three classes: positive, negative and unknown. As lambda decreases, you can see some features become more important than others. This gives a signature that can be exploited to identify enhancers.
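The talk didn't say which penalty was used, so this sketch assumes an L1 (lasso) penalty on a three-class multinomial logistic regression, fit by proximal gradient descent in plain Python. With a large lambda every weight is shrunk to exactly zero; as lambda decreases, the most informative features pick up nonzero weights first, which is the kind of path described above. Treating "unknown" as an ordinary third label, and all the names here, are my assumptions.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_l1_multinomial(X, y, n_classes, lam, lr=0.1, iters=300):
    """Minimize mean negative log-likelihood + lam * sum|W| by proximal gradient.

    X: list of feature vectors; y: integer labels 0..n_classes-1.
    Intercepts b are left unpenalized.
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(iters):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            scores = [sum(w * x for w, x in zip(W[c], xi)) + b[c] for c in range(n_classes)]
            p = softmax(scores)
            for c in range(n_classes):
                # Gradient of -log p[yi] w.r.t. class-c score, averaged over samples.
                err = (p[c] - (1.0 if c == yi else 0.0)) / n
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * xi[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c]
            for j in range(d):
                # Gradient step on the likelihood, then the L1 proximal step.
                W[c][j] = soft_threshold(W[c][j] - lr * gW[c][j], lr * lam)
    return W, b

def active_features(W, tol=1e-8):
    """Features with a nonzero weight for at least one class."""
    return sorted({j for row in W for j, w in enumerate(row) if abs(w) > tol})
```

Refitting over a decreasing grid (say lam in 1.0, 0.1, 0.01) and recording active_features(W) at each step traces which features enter the model first.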
Enhancer candidates are located farther from the TSS than promoter-like regions. The positive, negative and unknown classes each have a distinct distance distribution: negatives are closest, positives are a bit farther (around 10 kb), and unknowns are farther still.
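To make the distance comparison concrete, here is a small sketch that takes labeled region positions and TSS coordinates and reports the median distance to the nearest TSS for each class. It assumes everything is on a single chromosome and that there is at least one TSS; the names are hypothetical.

```python
import bisect
from statistics import median

def distance_to_nearest_tss(pos, tss_sorted):
    """Absolute distance (bp) from pos to the closest TSS in a sorted list."""
    i = bisect.bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)   # nearest TSS at or after pos
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])  # nearest TSS before pos
    return min(candidates)

def median_distance_by_class(regions, tss):
    """regions: list of (position, label). Returns {label: median distance}."""
    tss_sorted = sorted(tss)
    by_label = {}
    for pos, label in regions:
        by_label.setdefault(label, []).append(distance_to_nearest_tss(pos, tss_sorted))
    return {label: median(d) for label, d in by_label.items()}
```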
Each enhancer is assigned to the closest gene downstream of it.
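A sketch of that assignment rule, assuming a single chromosome and ignoring strand, so "downstream" here simply means a larger coordinate; a real pipeline would take gene orientation into account. The function name and gene names are placeholders.

```python
import bisect

def assign_downstream_genes(enhancer_positions, genes):
    """Map each enhancer position to the first gene whose TSS is at or after it.

    genes: list of (tss, name). Returns None for enhancers with no gene
    downstream. Strand is ignored (an assumption).
    """
    genes = sorted(genes)
    starts = [tss for tss, _ in genes]
    assignment = {}
    for pos in enhancer_positions:
        i = bisect.bisect_left(starts, pos)
        assignment[pos] = genes[i][1] if i < len(genes) else None
    return assignment
```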
Trend: genes with (predicted) enhancers have higher expression than genes without enhancers.
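This was presented as a trend, and I don't know which test, if any, was used. One simple rank-based way to quantify it is the Mann-Whitney U statistic, scaled to the probability that a random gene with an enhancer is more highly expressed than a random gene without one. This sketch computes the statistic only (no p-value); the function names are hypothetical.

```python
from statistics import median

def mann_whitney_u(a, b):
    """U statistic for a vs b: pairs where a wins count 1, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def compare_expression(with_enhancer, without_enhancer):
    """Summarize expression of genes with vs without (predicted) enhancers."""
    u = mann_whitney_u(with_enhancer, without_enhancer)
    return {
        "median_with": median(with_enhancer),
        "median_without": median(without_enhancer),
        # P(random gene with an enhancer out-expresses one without); 0.5 = no trend
        "auc": u / (len(with_enhancer) * len(without_enhancer)),
    }
```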
20% of the top enhancer regions are located near genes encoding transcription factors. [Ok, that's neat.] The top 2,000 highest-ranked regions are enriched in a small number of functions [by GO terms?]
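If the enrichment was done with GO terms (my guess above), it is typically a hypergeometric test per term: of M annotated genes, n carry the term, and k of the N genes near top-ranked regions do. A minimal standard-library sketch; the names are hypothetical, and correcting for the many terms tested is left out.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    total = comb(M, N)
    # math.comb returns 0 when asked to choose more items than exist.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / total

def fold_enrichment(k, M, n, N):
    """Observed annotated fraction in the top list over the background fraction."""
    return (k / N) / (n / M)
```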
Previously identified and validated enhancers from Jothi 2008 (SISSRs) were used for comparison. The predictions compare well, and a few new enhancers were identified.
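Comparisons against a known set usually come down to interval overlap. This sketch, assuming half-open (start, end) intervals on one chromosome, reports the fraction of known enhancers hit by at least one prediction and lists predictions that overlap nothing known. It is a quadratic scan for clarity, and the function names are hypothetical.

```python
def overlaps(a, b):
    """Half-open intervals (start, end) overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]

def recovered_fraction(known, predicted):
    """Fraction of known intervals overlapped by at least one predicted interval."""
    if not known:
        return 0.0
    hits = sum(1 for k in known if any(overlaps(k, p) for p in predicted))
    return hits / len(known)

def novel_predictions(predicted, known):
    """Predicted intervals that overlap no known interval."""
    return [p for p in predicted if not any(overlaps(p, k) for k in known)]
```

For genome-scale interval sets, sorting plus a sweep (or an interval tree) avoids the all-pairs comparison.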
So the method can identify important functional regions… but are they cell-type specific? [I was a bit lost for a minute: the graphs aren't well labeled, so I'm somewhat puzzled about what's coming out of the Venn diagrams.]
Enhancers from embryonic stem cells overlap more with other data sets also generated in embryonic stem cells than with data sets from other cell types, indicating that the TF networks are cell-type specific.
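The talk used Venn diagrams; a single-number alternative is the Jaccard index (shared elements over total distinct elements). This sketch assumes each data set has already been reduced to a set of 1 kb bin IDs and ranks data sets by their similarity to a query set. The names are hypothetical.

```python
def jaccard(a, b):
    """|A intersect B| / |A union B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_by_similarity(query, datasets):
    """Sort named data sets by Jaccard similarity to query, most similar first."""
    scores = {name: jaccard(query, bins) for name, bins in datasets.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Unlike raw overlap counts, the Jaccard index doesn't simply favor whichever data set is largest.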
Various other TFs are identified from the enhancers in this data set, and these can be compared with known TF expression to determine which are already known to be used in mouse ESCs. Good concordance was observed.
Ranked enhancer signatures.
Enhancer candidates: when coupled with promoter-like regions, increase expression of nearby genes; overlap significantly with multiple transcribed loci; potentially regulate genes encoding TFs; and are tissue specific and overlap with active histone marks.
Identified known and novel TFs in mESC with motif enrichment analysis of enhancers.
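The motif enrichment details weren't given. As an illustration only, this sketch counts exact k-mer matches on both strands in enhancer sequences versus background sequences and reports the ratio of hits per kb. Real analyses typically use position weight matrices and composition-matched backgrounds plus a significance test; the function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]

def count_occurrences(seq, kmer):
    """Count occurrences of kmer in seq, allowing overlapping matches."""
    count, start = 0, seq.find(kmer)
    while start != -1:
        count += 1
        start = seq.find(kmer, start + 1)
    return count

def hits_per_kb(seqs, kmer):
    """Exact-match hits per kb on both strands; palindromic kmers counted once."""
    kmer = kmer.upper()
    rc = revcomp(kmer)
    hits = 0
    for s in seqs:
        s = s.upper()
        hits += count_occurrences(s, kmer)
        if rc != kmer:
            hits += count_occurrences(s, rc)
    return hits / (sum(len(s) for s in seqs) / 1000)

def motif_enrichment(enhancers, background, kmer):
    """Ratio of hit rates in enhancer vs background sequences."""
    bg = hits_per_kb(background, kmer)
    return float("inf") if bg == 0 else hits_per_kb(enhancers, kmer) / bg
```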
It is worth noting that some enhancers can interact with insulators, or with genes other than the closest one. Other mechanisms may also be possible.
[Overall, not a bad talk, and very bioinformatics-ish. That is to say, it could have been a little heavier on either the algorithm or the biology; it seemed aimed at both audiences but may not have been detailed enough for either. However, that's pretty typical for the field, and isn't a criticism.
I think it's also clear that the biology, in this case, is now being driven by the development of novel algorithms, and that this is a valid way to gain insight into enhancer biology. It's a great initial foray into the topic, but the data itself shows that there is still lots more to learn about how everything integrates at the molecular level.]