>Annotating LincRNA Transcripts Using Targeted Sequencing
Goal: Identify functional large ncRNAs in the mammalian genome
* they look like mRNAs but are non-coding
* Use ChIP-Seq to separate the genome into regions
* use tiling arrays: hybridize RNA…
* tiling arrays give no information about connectivity (which exons are joined) and have limited resolution
* studying the functions of lincRNAs requires precise sequences for both experimental and computational analyses.
Use RNA-Seq to build the transcriptome
What RNA-Seq gives you:
* RNA reads, mapped to the genome
* introns, from junction reads (reads that span a splice junction)
* transcript 3' ends, from reads whose mate falls in the poly-A tail
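A junction read shows up in a spliced alignment (TopHat writes SAM/BAM) as an `N` operation in the CIGAR string, which marks a skipped reference region, i.e. an intron. Below is a minimal sketch, assuming 1-based alignment positions as in SAM, of pulling intron coordinates out of CIGAR strings and counting how many reads support each junction. The function name `junctions_from_cigar` is hypothetical, not part of any tool mentioned in the talk.

```python
import re
from collections import Counter

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X
_REF_CONSUMING = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def junctions_from_cigar(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive, for every
    N (skipped region) in an alignment that starts at 1-based position pos."""
    junctions = []
    ref = pos  # current 1-based reference coordinate
    for length_str, op in _CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in _REF_CONSUMING:
            ref += length
    return junctions

def junction_support(alignments):
    """Count how many reads support each intron; alignments are (pos, cigar) pairs."""
    return Counter(j for pos, cigar in alignments for j in junctions_from_cigar(pos, cigar))
```

For example, `junctions_from_cigar(100, "50M200N26M")` places the intron at bases 150 to 349. Unspliced reads such as `"76M"` contribute nothing, which is why only gapped alignments carry connectivity information.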
Used TopHat to align reads
* Longer reads provide more junction evidence
* first, use only reads that align with a gap (spliced reads) to build a connectivity map
* this gives a topology map of exons and the junctions between them
* combine the map with ChIP-Seq data to build “paths”
* use the paths to call transcripts
* clean up with paired-end data: join pieces or kill unlikely isoforms.
* Mouse ES cells
* Illumina sequencing (156M 76 bp reads)
* 75% of alignments are exonic
* correctly reconstructs most expressed known genes at single-nucleotide resolution
* works even on overlapping genes
* 81% of genes fully reconstructed
* good recovery of genes at all expression levels
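The connectivity-map and path-calling steps above can be illustrated with a toy graph: exons are nodes, spliced reads supply the edges, and every source-to-sink path is a candidate isoform. This is a simplified sketch under stated assumptions, not the speakers' actual method (which also used ChIP-Seq data, statistical segmentation and paired-end filtering). Exon and intron coordinates are assumed to be 1-based inclusive, and the `min_reads` threshold is a hypothetical parameter standing in for "enough junction evidence".

```python
from collections import defaultdict

def build_graph(exons, intron_support, min_reads=1):
    """exons: list of (start, end) tuples. intron_support: dict mapping
    (intron_start, intron_end) -> number of spliced reads. Adds an edge A -> B
    when a sufficiently supported intron starts right after A and ends right before B."""
    by_end, by_start = defaultdict(list), defaultdict(list)
    for ex in exons:
        by_end[ex[1]].append(ex)
        by_start[ex[0]].append(ex)
    graph = {ex: [] for ex in exons}
    for (s, e), reads in intron_support.items():
        if reads < min_reads:
            continue  # too little junction evidence (illustrative threshold)
        for a in by_end[s - 1]:
            for b in by_start[e + 1]:
                if b not in graph[a]:
                    graph[a].append(b)
    return graph

def enumerate_paths(graph):
    """Return every path from a source exon (no incoming edge) to a sink exon
    (no outgoing edge). Edges always point downstream, so the graph is acyclic."""
    has_incoming = {b for targets in graph.values() for b in targets}
    sources = sorted(ex for ex in graph if ex not in has_incoming)
    paths = []

    def dfs(node, path):
        if not graph[node]:
            paths.append(path)
            return
        for nxt in sorted(graph[node]):
            dfs(nxt, path + [nxt])

    for src in sources:
        dfs(src, [src])
    return paths
```

With exons A, B, C, D and introns A-B, B-C, C-D and B-D (skipping C), the paths are A-B-C-D and A-B-D, i.e. two isoforms differing by an exon skip. The number of paths can grow exponentially with the number of alternative junctions, which is one reason a cleanup step such as paired-end filtering is needed.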
Novel transcripts discovered:
* 800 loci between genes
** 250 out of 317 previously identified ES lincRNAs are reconstructed
* 200 loci overlapping genes
** 131 overlap coding exons (which makes them antisense transcripts)
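The split into "between genes" and "overlapping genes" loci comes down to interval overlap plus strand. A minimal sketch, assuming each transcript and gene is a dict with `chrom`, `strand` and a list of 1-based inclusive `exons`; the category labels (`intergenic`, `antisense`, `overlapping`) are hypothetical names for this illustration, not the talk's exact criteria.

```python
def overlaps(a, b):
    """True if 1-based inclusive intervals a and b share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def span(t):
    """Genomic extent of a transcript or gene: first exon start to last exon end."""
    return (min(s for s, _ in t["exons"]), max(e for _, e in t["exons"]))

def classify(tx, genes):
    """Return 'intergenic' if tx touches no annotated gene locus, 'antisense'
    if any of its exons overlaps a gene exon on the opposite strand, and
    'overlapping' otherwise (e.g. intronic or same-strand overlap)."""
    hits = [g for g in genes
            if g["chrom"] == tx["chrom"] and overlaps(span(tx), span(g))]
    if not hits:
        return "intergenic"
    for g in hits:
        exon_hit = any(overlaps(a, b) for a in tx["exons"] for b in g["exons"])
        if exon_hit and g["strand"] != tx["strand"]:
            return "antisense"
    return "overlapping"
```

A real pipeline would use an interval tree or sorted sweep rather than this linear scan over all genes, but the decision logic is the same.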
Are they protein-coding genes?
* LincRNAs are probably too small to produce proteins [Strange assumption, IMHO… maybe I’m missing something.]
* 650 of the 800 intergenic transcripts have no coding potential
* they have lower expression levels than coding regions
* intergenic transcript conservation: similar to that of previously identified lincRNAs
* antisense transcripts: no coding potential
* antisense expression: very low
* antisense conservation: a little more conserved than sense lincRNAs, because they overlap exons of genes
* but the antisense exons themselves are not conserved.
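The notes don't say how coding potential was scored, so the following is only a crude stand-in: the length of the longest ATG-initiated open reading frame on either strand. Dedicated coding-potential measures, which typically use conservation or codon statistics, are more sensitive; this sketch just makes the "too small to produce proteins" argument concrete. The 100-codon cutoff in the usage note is a hypothetical threshold, not one taken from the talk.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def revcomp(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT bases pass through)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def longest_orf(seq):
    """Length in codons (ATG included, stop excluded) of the longest ORF that
    starts with ATG and ends in a stop codon, over all six reading frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop gives the longest ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best
```

Usage: `longest_orf(transcript_seq) < 100` could serve as a rough "probably non-coding" flag. Long random sequences contain chance ORFs, so ORF length alone cannot prove a transcript is non-coding.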
What do overlapping transcripts do?
* expression is low
* little or no conservation
* correlation with overlapping transcripts
* so: artifacts, noise, fine-tuners? Other ideas?
* novel statistical method takes advantage of longer reads
* novel features of mouse ES coding genes
* intergenic non-coding RNAs (lincRNAs)
* a new family of antisense non-coding RNAs
* validation of 18/20.
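The "correlation with overlapping transcripts" point is presumably an expression correlation between each antisense transcript and the gene it overlaps; the notes don't say across which samples or with which statistic, so Pearson correlation across a set of samples is an assumption here. A minimal, dependency-free sketch:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors
    (e.g. one value per sample for an antisense transcript and its sense gene)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant vector: correlation is undefined")
    return cov / (sx * sy)
```

Expression values are usually log-transformed first (for example `log2(value + 1)`) so a few highly expressed samples don't dominate. A positive correlation would fit co-regulation or read bleed-through from the sense gene, and a negative one would fit a repressive role, which is exactly the "artifacts, noise, fine-tuners?" question.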