Viral Genomes in Whole Genome Shotgun Sequencing of Hepatocellular Carcinoma
– Strong link between cancer and viruses. 12% of cancer cases are caused by 7 viruses (EBV, HPV, HBV, HCV, HTLV-1, HHV8, polyomavirus).
Hepatocellular carcinoma: 5th most common cancer worldwide and 3rd leading cause of cancer death.
Hepatitis B virus: small DNA virus with a circular genome that is part ds, part ss; replicates via an RNA intermediate. 4 overlapping open reading frames, 2 direct repeats of 11 bp. (Brief review of life cycle. The life cycle does not require integration into the host genome.)
Talk today focuses on 3 patients – 1 HBV+, 2 HCV+
30% of the DNA reads from a patient will be unmapped, and any viral sequences will be among those unmapped reads.
WGS at 30x coverage for tumour/normal pairs. Reads aligned to hg19 with BFAST, then a viral db was used (NCBI & JCVI viral genome databases, with viral sequences soft-masked).
Can confirm that in the DNA data you do get the HBV signal but not HCV (since it’s an RNA virus), while you see both in the RNA-seq data.
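To make the "viral sequences are in the unmapped section" idea concrete, here is a minimal sketch of pulling unmapped reads out of SAM-format alignment output so they can be searched against a viral database. It is not the speaker's pipeline: the function name and the toy records are made up for illustration. The only thing it relies on is the SAM convention that FLAG bit 0x4 marks an unmapped read.

```python
def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records with FLAG bit 0x4 (unmapped) set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            # field 1 is the read name, field 10 is the read sequence
            yield fields[0], fields[9]


# Toy SAM records (invented): read1 maps to the host, read2 is unmapped.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAATT\tIIIIIIIIII",
]
candidates = list(unmapped_reads(toy_sam))
print(candidates)
```

The reads collected this way would be the input to the viral database comparison.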
Discussion of signal vs noise – if your signal is all in one spot on the viral genome, it’s just noise, not a real hit. If it maps across the whole viral genome, then it’s probably a good hit. [I tried this 3 years ago for another type of cancer, and saw the same thing with the noise – but never got a signal… nice to see what a real signal looks like.]
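One way to turn the signal-vs-noise rule of thumb into a number is breadth of coverage: the fraction of the viral genome touched by at least one read. The sketch below is my own illustration, not something shown in the talk. The 0.5 cutoff, the function names and the toy alignments are all assumptions; the 3,200 bp length is just roughly HBV-sized.

```python
def coverage_breadth(alignments, genome_length):
    """Fraction of genome positions covered by at least one read.

    alignments: iterable of (start, read_length) pairs, 0-based starts.
    """
    covered = [False] * genome_length
    for start, length in alignments:
        for pos in range(max(start, 0), min(start + length, genome_length)):
            covered[pos] = True
    return sum(covered) / genome_length


def looks_like_real_hit(alignments, genome_length, min_breadth=0.5):
    """Call a hit 'real' when reads spread over enough of the genome.

    min_breadth=0.5 is an arbitrary illustrative threshold.
    """
    return coverage_breadth(alignments, genome_length) >= min_breadth


GENOME_LEN = 3200  # roughly HBV-sized

# 200 reads stacked on one 50 bp spot: lots of reads, almost no breadth.
pileup = [(1000, 50)] * 200
# 79 reads tiled across the genome: fewer reads, near-complete breadth.
spread = [(start, 50) for start in range(0, 3150, 40)]

print(looks_like_real_hit(pileup, GENOME_LEN))  # pile-up in one spot
print(looks_like_real_hit(spread, GENOME_LEN))  # reads across the genome
```

Read count alone would rank the pile-up higher, which is why breadth is the more useful measure here.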
Discussion on how to find the integration site. Also an indication of the host-virus junction site.
There are implications to integration – it could cause structural modification of genes (modifying the product), or modify regulation, promoters, etc.
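A rough sketch of how paired reads can point at an integration site, assuming mate-pair data in which one mate aligns to the host genome and the other to the virus. This is an illustration rather than the speaker's actual method. The function name, the 1 kb bin size, the 3-pair minimum, and the reference names and coordinates are all hypothetical.

```python
from collections import Counter


def candidate_integration_sites(pairs, viral_refs, bin_size=1000, min_pairs=3):
    """Find host regions supported by several host-virus chimeric pairs.

    pairs: iterable of ((ref1, pos1), (ref2, pos2)) mate alignments.
    viral_refs: set of reference names that belong to viral genomes.
    Returns (host_ref, bin_start, pair_count) tuples, most supported first.
    """
    counts = Counter()
    for (ref1, pos1), (ref2, pos2) in pairs:
        first_is_viral = ref1 in viral_refs
        second_is_viral = ref2 in viral_refs
        if first_is_viral == second_is_viral:
            continue  # both mates host or both viral: not chimeric
        host_ref, host_pos = (ref2, pos2) if first_is_viral else (ref1, pos1)
        # Bin the host-side position so nearby pairs support the same site.
        counts[(host_ref, host_pos // bin_size * bin_size)] += 1
    return [
        (ref, start, n)
        for (ref, start), n in counts.most_common()
        if n >= min_pairs
    ]


# Invented alignments: three chimeric pairs cluster on chrA, one stray on chrB.
pairs = [
    (("chrA", 12100), ("HBV", 1500)),
    (("HBV", 1620), ("chrA", 12450)),
    (("chrA", 12800), ("HBV", 1710)),
    (("chrB", 500), ("HBV", 10)),
    (("chrA", 300), ("chrA", 3300)),
]
sites = candidate_integration_sites(pairs, {"HBV"})
print(sites)
```

Requiring several pairs per bin applies the same signal-vs-noise logic as before: a single chimeric pair is more likely an artifact than an integration event. Pinning down the exact junction would additionally need reads that span the breakpoint.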
- Methods work – can find viruses in the virus positive individual.
- Mate-pair data allows you to find the putative site of viral integration (the TERT promoter, in this case)
- Method has applications for other pathogens
- And there is value in unmapped reads!
[neat talk – interesting to find out that unmapped reads can have so much value.]