>Stephan Schuster, Penn State University – “Genomics of Extinct and Endangered Species”

>Last year, he introduced nanosequencing of a complete extinct species. What are the implications of extinct genomes for endangered species?

Mammoth: went extinct three times… 45,000, 10,000, and 3,500 years ago. Woolly rhino: 10,000 years ago. Moa: 500 years ago (they were eaten). Thylacine: 73 years ago. And Tasmanian devils, which are expected to last only another 10 years.

Makes you wonder about dinosaurs… maybe dinosaurs just tasted like chicken.

Looking at population structure and biological diversity from a genomic perspective. (Review of Genotyping Biodiversity.) The mitochondrial genome is generally present at higher copy number, and thus was traditionally the one used, but now with better sequencing we can target nuclear DNA.

The mammoth mitochondrial genome has been done: ~16,500 bp, including ribosomal, coding and noncoding regions. In 2008, you can get 1000x coverage on the mitochondrial genome. You need extra coverage to correct for damaged DNA.

This has now allowed 18 mammoth mitochondrial genome sequences: 20-30 SNPs between members of the same group, and 200-300 between groups. WAY more sequencing than is available for African elephants!

Have now switched to using hair instead of bone, and can use the hair shaft (not just the follicle).

Ancient DNA = highly fragmented. 300,000 sequences, of which 45% was nuclear DNA.

Now: 4.17 Gb of sequenced bases, against a genome size of 4.7 Gb. 77 runs, giving 32.6 million reads.

Can visit mammoth.psu.edu for more info.

Sequenced mammoth orthologs of human genes. Compared to Watson/Venter… rate of predicted genes per chromosome (“No inferences here”). Complete representation of the genome available. SAP = single amino acid polymorphism.

(Discussion of divergence for the mammoth.) Coalescence time for human and Neanderthal: 600,000 years. The same thing happens for the mammoth, but it’s not really well accepted because the biological evidence doesn’t show it.

Did the same thing for the Tasmanian Tiger. Two complete genomes – only 5 differences between them.

Hair for one sample was taken from what fell off a specimen preserved in a jar of ethanol!

Moa: did it from eggshell!

Woolly rhino: did the woolly rhino from hair, and did other rhinos too (the woolly is the only extinct one). Rhinos radiated only a million years ago, so they couldn’t resolve the phylogenetic tree. Tried hair, horn, hoof, and bone… bone was by far the worst.

Now, to jump to the living: the Tasmanian devil. Highly endangered. An infectious cancer was discovered in 1996 (not figured out until 2004). Devils have been protected since 1941. Isolation with fences, islands, the mainland, an insurance population. Culling and vaccination are also possible.

Genome markers will be very useful. The problem is probably that there is nearly no diversity in the population. Sequenced museum-sample devils, and showed that mitochondrial DNA had more diversity in the non-living population.

A project for the full genome is now underway, using two animals. (More information on plans for what to do with this data and how to save them.) SNP info for genotyping to direct the captive breeding program.
(“Project Arc”) Trying to breed resistant animals.

>Len Pennacchio, Lawrence Berkeley National Laboratory – “ChIP-Seq Accurately Predicts Tissue-Specific Enhancers in Vivo”

>The Noncoding Challenge: 50% of GWAS hits are falling into non-coding regions. CAD and diabetes hits fall in gene deserts, so how do they work? Regulatory regions. Build a category of distal enhancers.

Talking about Sonic Hedgehog, which is involved in limb formation. The regulatory element for its expression is a million bases away from the gene. There are very few such examples; we don’t know whether there are only a handful, or whether this is just the tip of the iceberg. How do we find more?

First part: work going on in the lab for the past three years, using conservation to identify regions that are likely involved. (ChIP-Seq comes in the second part.)

Extreme conservation: either things conserved over huge spans (human to fish) or within a smaller group (human, mouse, chimp).

Clone the regions into vectors, put them in mouse eggs, and then stain for beta-galactosidase. Tested 1000 constructs, 250,000 eggs, 6000 blue mice. About 50% of them work as reproducible enhancers. Everything is done at the whole-mouse level, and each one has a specific pattern. [Hey, I’ve seen this project before, a year or two ago… nifty! I love to see updates a few years later.]

Bin by anatomical pattern. Forebrain enhancers are one of the big “bins”. Working on a forebrain atlas.

All data is at enhancer.lbl.gov, and also in the Genome Browser. There is also an Enhancer Browser. Comparative genomics works great at finding enhancers in vivo. No shortage of candidates to test.

While this works, it’s not a perfect system. Half of the things don’t express, and the system is slow and expensive. Comparative genomics also tells you nothing about where an element expresses, so this is OK for wide scans, but not great if you’re after something specific.

Enter ChIP-Seq. (Brief model of how expression works.) Collaboration with Bing Ren. (Brief explanation of ChIP-Seq.) Using Illumina to sequence. Looking at bits of mouse embryo. Did ChIP-Seq, got peaks. What’s the accuracy?

Took ~90 of the predictions and used the same assay. When p300 was used, now up to 9/10 of the sequences work as enhancers, and they’re also tissue-specific.

To summarize: using comparative genomics gives you 5-16% of elements active in a given tissue. Using ChIP-Seq, you get 75-80%.

How good or bad is comparative genomics at ranking targets? 5% of exons are constrained; almost all the rest are moderately constrained. [I don’t follow this slide. It shows better conservation in forebrain and other tissues.]

P300 peaks are enriched near genes that are expressed in the same tissues.

Conclusion: p300 is a better way of predicting enhancers.
p300 occupancy circumvents the DNA-conservation-only approach.

What about negatives? For the ones that don’t work, it’s even better; the mouse orthologs are bound, while the human sequences no longer bind in mice.

Conclusion II: identified ~500 enhancers with the first method, and a few runs done 9 months ago have now yielded 5000 new elements using ChIP-Seq.

Many new things can be done with this system, including integrating it with GWAS.

>Bruce Budowle, Federal Bureau of Investigation – “Detection by SOLiD Short-Read Sequencing of Bacillus anthracis and Yersinia pestis SNPs for Strain Identification”

>We live in a world with a “heightened sense of concern.” The ultimate goal is to reduce risk, whether it’s helping people with flooding, or otherwise. Mainly, they work on stopping crime and identifying threats.

Why do we do this? We’ve only had one anthrax incident since 2001… but bioterrorism has been used for 2,000 years. (Several examples given.)

Microbial forensics. We don’t just want knee-jerk responses. It’s essentially the same as any other forensic discipline: again, to reduce risk. This is a very difficult problem. Over 1000 agents are known to infect humans: 217 viruses, 538 bacterial species, 307 fungi, 66 parasitic protozoa. Not all are effective, but there are different diagnostic challenges. Laid out on the tree of life… it’s pretty much the whole thing.

Biosynthetic technology. New risks are accruing due to advances in DNA synthesis. The risks are vastly outweighed by the benefits of synthesis… bioengineering also plays a role.

Forensic genetic questions:
What is the source?
Is it endemic?
What is the significance?
How confident can you be in the results?
Are there alternative explanations?

So, a bit of history on the “Amerithrax” case. A VERY complex case, which changed the way the government works on this type of case. Different preparations in different envelopes.

Goals and objectives:
Could they reverse-engineer the process, to figure out how it was done? No: too complex, didn’t happen.

First sequencing: did a 15-locus, 4-colour genotyping system. It was not a validated process, but it helped identify the strain, which helped narrow down its origin. Some came from Texas, but it was more likely to have come from a lab than from the woods.

Identifying informative SNPs: you don’t need to know the evolution, just the signature. That can then be used for diagnostics. Whole-genome sequencing for genotyping was a great use. Back in 2001, most of this WGS wasn’t possible. They had a great deal from TIGR: only $125,000 to sequence the genome. For the Florida isolate it took 2 weeks, and they found out interesting details about the copy number of the plasmids. The major cost was then to validate and understand what was happening.

Florida was compared to Ames and to one from the UK, which gave only 11 SNPs. Many evolutionary challenges came up. The strain they used was “cured” of its plasmid, so it evolved to have other SNPs… a very poor reference genome.

The key to identification: one of the microbiologists discovered that some cultures had different morphology. That was then used as another signature for identifying the source.

Limited strategy: it didn’t give the whole answer, it only allowed them to rule out some colonies. It would be more useful to sequence full genomes… so they entered into a deal with ABI for SOLiD genome sequencing.

Some features were very appealing. One of them is the emulsion PCR, which helped improve the quality and reliability of the assay. The beads were useful too.

The multiplexing was very useful. Could test 8 samples simultaneously using barcoding, including the reference Ames strain. Coverage was 16x-80x, depending on DNA concentration. Multiple starting points give more confidence and help find better SNPs.
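
As a sanity check on those coverage numbers, the usual back-of-envelope is reads × read length / genome size; a hedged sketch, with placeholder read counts rather than the actual run parameters:

```python
# Hedged coverage arithmetic (Lander-Waterman style back-of-envelope). The read
# count and read length below are illustrative placeholders, not the real SOLiD
# run statistics.
def fold_coverage(n_reads, read_len, genome_size):
    return n_reads * read_len / genome_size

# e.g. a B. anthracis-sized genome (~5.2 Mb) with 10M 35 bp reads:
print(round(fold_coverage(10_000_000, 35, 5_200_000)))   # ~67x, within the quoted 16x-80x range
```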

Compared to the reference: found 12 SNPs in the resequenced reference. When you look at the SNP data, you have a lot of confidence if a variant shows up in both directions; however, if it only turns up on the one strand, that becomes a major way to remove false-positive results. That was really only possible by using higher coverage.
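
A minimal sketch of that strand-based filter, assuming hypothetical per-strand read counts for each candidate SNP (this is just the idea, not the actual pipeline):

```python
# Toy filter: keep a candidate SNP only if the variant allele is supported by
# reads from both strands; one-strand-only calls are treated as likely
# artifacts. The dictionaries and threshold are hypothetical.
def strand_balanced(snp, min_per_strand=2):
    return (snp["fwd_alt_reads"] >= min_per_strand and
            snp["rev_alt_reads"] >= min_per_strand)

candidates = [
    {"pos": 1_204_332, "fwd_alt_reads": 11, "rev_alt_reads": 9},   # keep
    {"pos": 3_871_015, "fwd_alt_reads": 14, "rev_alt_reads": 0},   # drop: one strand only
]
print([s["pos"] for s in candidates if strand_balanced(s)])
```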

Not going to talk much about Y. pestis… (almost out of time.) Similar points: 130-180X coverage. Found a multidrug transporter in the strain, which has been a lab strain for 50 years. The plasmids were also at higher coverage. There were fewer SNPs in the North American strains, etc.
An interesting point: if you go to the reference in GenBank, there are known errors in the sequence. Several have been corrected, and the higher coverage was helpful in figuring out the real sequence past the errors.

$1000/strain using multiplexing, on equipment that is not yet available. This type of data really changes the game; samples can now be screened VERY quickly (a week).

Conclusions:
Every project is a large-scale sequencing project.
Depth is good.
Multiplexing is good.
Keep moving to higher-accuracy sequencing.

>Andy Fire, Stanford University – “Understanding and Clinical Monitoring of Immune-Related Illnesses Using Massively-Parallel IgH and TcR Sequencing”

>The story starts off with a lab that works on small RNAs, which they believe form a small layer of immunity. [Did I get that right?] They work in response to foreign DNA.

Joke Slide: by 2018, we’ll have an iSequencer.

Question: can you sequence the immunome? [New word for me.] Showing a picture of lymphoma cells, which to me looks like a test to see if you’re colour-blind. There are patches of slightly different shades…

Brief intro to immunology. “I got an F in immunology as a grad student.” [There’s hope for me, then!]
Overview of VDJ recombination, controlled by B-cell differentiation. This is really critical: it’s responsible for our health. One model: if something recognizes both a virus and self, then you can end up with an autoimmune response.

There is a continuum based on this. It’s not necessarily an either/or relationship.

There is a PCR/454 test for VDJ distribution. In some cases you get a single dominating size class, and that is usually a sign of disease, such as lymphoma. You can also use 454 for this, since you need longer reads, and read the V, D and J units in the amplified fragment. Similar to email, you can get “spam”, and you can use similar technologies to drop the “spam” out of the run.

To show the results of the tests for B-cell recombination species, you put V on one axis, J on the other. D is dropped to make it more viewable. In lymphoma, a single species dominates the chart.
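
A toy version of that V-by-J view (not the authors’ actual pipeline): tally reads by V-J pair and ask whether a single class dominates.

```python
# Toy V x J tally: count reads per (V, J) pair and report the fraction taken by
# the most common pair; a single dominant class suggests a clonal (lymphoma-like)
# population. Segment names and read counts are invented.
from collections import Counter

reads = ([("IGHV3-23", "IGHJ4")] * 60 +
         [("IGHV1-69", "IGHJ6")] * 5 +
         [("IGHV4-34", "IGHJ5")] * 5)

vj_counts = Counter(reads)
top_pair, top_n = vj_counts.most_common(1)[0]
print(top_pair, top_n / sum(vj_counts.values()))   # ('IGHV3-23', 'IGHJ4') ~0.86
```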

An interesting experiment – dilute with regular blood to see detection limit – it’s about 1:100. For some lymphomas, you can’t use these primers, and they don’t show up. There are other primers for the other diseases.

So what happens in normal distributions? Did the same thing with VDJ (D included, so there are way more spots). Neat image. Did this experiment with two aliquots of blood from the same person and looked for concordance. You find lots of spots fail to correspond well at the different time points, but many do.

On another project, bone marrow transplant. The recipient has a funny pattern, mostly caused by “spam”, because the recipient really has very little immune system left. The patient eventually gets the donor VDJ types, which is a completely donor-derived response. You can also do something like this for autoimmune disorders.

Malfunctioning Lymphoid cells cause many human diseases and medical side-effects. (several examples given.)

>Keynote Speaker: Rick Wilson, Washington University School of Medicine – “Sequencing the Cancer Genome”

>Interested in:
1. Somatic mutations in protein-coding genes, including indels.
2. Also like to find: non-coding mutations, miRNAs and lincRNAs.
3. Like to learn about germline variations.
4. Differential transcription and splicing.
5. CNV.
6. Structural variation.
7. Epigenetic changes.
8. Big problem: integrate all of this data… and make sense of it.

The paradigm for years: an exon focus for a large collection of samples. Example: EGFR mutations in lung cancer. A large number of patients (in some sample) had EGFR mutations. Further studies carry on this legacy in lung cancer using new technology. However, when you look at pathways, you find that the pathways are more important than individual genes.

Description of “The Cancer Genome Atlas”

Initial lists of genes mutated in cancer. Mutations were found, many of which were new. (TCGA Research Network, Nature, 2008)

Treatment-related hypermutation. Another example of TCGA’s work: glioblastoma. Although they didn’t want treated samples, in the end they took a look and saw that treated samples have interesting changes in methylation sites when MMR genes and MGMT were mutated. If you know the status of the patient’s genome, you can better select the drug (e.g., not use an alkylation-based drug).

Pathway analysis can be done… looking for interesting accumulations of mutations. A network view of the genome… (It just looks like a mess, but it’s a complex problem we need to work on.)

What are we missing? What are we missing by focusing on exons? There should be mutations in cancer cells that are outside exons.

Revisit the first slide: now we do “Everything” from the sample of patients, not just the list given earlier.

(Discussion of an AML cancer example.) (Ley et al., Nature 2008)
Found 8 heterozygous somatic mutations and 2 somatic insertion mutations. Are they cause or effect?
The verdict is not yet in. Ultimately, functional experiments will be required.

There are things we’re not doing with the technology: digital gene expression counts. You can PCR up a gene of interest from the tumour, sequence, and do a count: how many cells have the genotype of interest?
Did the same thing for several genes, and generally got a ratio around 50%.
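
The arithmetic behind that ~50% figure is just a variant-allele fraction; a tiny sketch with made-up read counts:

```python
# Digital counting sketch: for a heterozygous somatic mutation present in every
# tumour cell, roughly half of the reads covering that position should carry
# the variant. The read counts below are invented.
def variant_fraction(ref_reads, alt_reads):
    return alt_reads / (ref_reads + alt_reads)

print(variant_fraction(ref_reads=4_980, alt_reads=5_020))   # ~0.50
```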

Started looking at GBM1: 3,520,407 tumour variants passing the SNP filter. Broke them down into coding non-synonymous/splice sites, coding silent, conserved regions, regulatory regions including miRNA targets, non-repetitive regions, and everything else (~15,000). Many of the first class were validated.
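
A loose sketch of that breakdown, assigning each variant to the first category it matches; the annotation fields are hypothetical and this only approximates the categories listed above:

```python
# Hedged tiering sketch: walk each variant down the category list and report
# the first bucket it falls into. Field names are hypothetical.
def classify(variant):
    if variant.get("coding_nonsyn") or variant.get("splice_site"):
        return "coding NS / splice"
    if variant.get("coding_silent"):
        return "coding silent"
    if variant.get("conserved"):
        return "conserved region"
    if variant.get("regulatory") or variant.get("mirna_target"):
        return "regulatory / miRNA target"
    if not variant.get("repetitive"):
        return "non-repetitive"
    return "everything else"

variants = [{"coding_nonsyn": True}, {"mirna_target": True}, {"repetitive": True}]
print([classify(v) for v in variants])
```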

CNV analysis was also done. Add coverage information to the sequence variants and the picture becomes more interesting. Can then use read pairs to find breakpoints/deletions/insertions.
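
A minimal sketch of the read-depth side of that, assuming simple lists of aligned read start positions for tumour and normal (window size and data are placeholders):

```python
# Read-depth CNV sketch: bin read starts into fixed windows and take the
# tumour/normal log2 ratio per window; ~0 is copy-neutral, >0 gain, <0 loss.
import math
from collections import Counter

def window_counts(read_starts, window=10_000):
    return Counter(pos // window for pos in read_starts)

def log2_ratios(tumour_starts, normal_starts, window=10_000):
    t = window_counts(tumour_starts, window)
    n = window_counts(normal_starts, window)
    return {w: math.log2(t[w] / n[w]) for w in t if w in n}

tumour = [100, 5_000, 12_000, 13_000, 14_000, 15_000]
normal = [200, 6_000, 12_500, 13_500]
print(log2_ratios(tumour, normal))   # window 1 shows a (toy) gain in the tumour
```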

What’s next for cancer genomics? More AML (doing more structural variation, non-coding information, more genomes), more WGS for other tumours, and more lung cancer, neuroblastoma… etc.

“If the goal is to understand the pathogenesis of cancer, there will never be a substitute for understanding the sequence of the entire cancer genome” – Renato Dulbecco, 1986

Need ~25X WGS coverage of tumour and normal, plus transcriptome and other data. Fortunately, costs are dropping rapidly.

>Peter Park, Harvard Medical School – “Statistical Issues in ChIP-Seq and its Application to Dosage Compensation in Drosophila”

>(brief overview of ChIP-Seq, epigenomics again)

ChIP-Seq is not always cost-competitive yet. (Can’t do it at the same cost as ChIP-chip.)

Issues in analysis: generate tags, align, remove anomalous tags, assemble, subtract background, determine binding positions, check sequencing depth.

Map tags in a strand-specific manner (like the directional flag in FindPeaks). Score tags accounting for that profile; this can be incorporated into a peak caller.

Do something called cross-correlation analysis (look at the peaks in both directions) and use this to rescue more tags. Peaks get better if you add good data, and worse if you add bad data. Use it to learn something about histone modification marks. (Tolstorukov et al., Genome Research)
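
A rough sketch of the cross-correlation idea (the gist, not the published method): shift the reverse-strand tag profile against the forward-strand profile and take the shift with the highest correlation, which approximates the fragment length.

```python
# Strand cross-correlation sketch: the shift that best aligns reverse-strand tag
# density onto forward-strand density estimates the fragment length. Profiles
# below are simulated; NumPy is assumed to be available.
import numpy as np

def best_shift(fwd_profile, rev_profile, max_shift=300):
    fwd = fwd_profile - fwd_profile.mean()
    rev = rev_profile - rev_profile.mean()
    scores = [float(np.dot(fwd[:-s], rev[s:])) for s in range(1, max_shift)]
    return int(np.argmax(scores)) + 1

fwd = np.zeros(1000); fwd[95:105] = 1    # forward tags pile up 5' of the binding site
rev = np.zeros(1000); rev[245:255] = 1   # reverse tags pile up 3' of the binding site
print(best_shift(fwd, rev))              # ~150, the simulated fragment length
```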

How deep to sequence? 10-12M reads is current. That’s one lane on Illumina, but is it enough? What quality metric is important? Clearly this depends on the marks you’re seeing (narrow vs broad, noise, etc). This brings you to saturation analysis. They show no saturation for STAT1, CTCF, NRSF. [Not a surprise, we knew that a year ago… We’re already using this analysis method. However, as you add new reads, you add new sites, so you have to threshold to make sure you don’t keep adding new peaks that are insignificant. Oh, he just said that. Ok, then.]

Talking about using “fold enrichment” to show saturation. This allows you to estimate how many tags you need to get a certain tag enrichment ratio.
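
A very rough sketch of that kind of check (a toy, not the paper’s method): subsample the tags at increasing depths, count windows that exceed a fold-enrichment threshold over the genome-wide average, and look for a plateau.

```python
# Toy saturation analysis: at each subsampling fraction, count windows whose
# tag count exceeds a fold-enrichment cutoff over the uniform expectation.
# All parameters and the simulated data are placeholders.
import random
from collections import Counter

def enriched_windows(tags, frac, genome=1_000_000, window=200, min_fold=10):
    sample = random.sample(tags, int(len(tags) * frac))
    counts = Counter(pos // window for pos in sample)
    expected = len(sample) / (genome // window)   # mean tags per window if uniform
    return sum(1 for c in counts.values() if c >= min_fold * expected)

random.seed(0)
tags = [random.randrange(1_000_000) for _ in range(100_000)]      # background tags
tags += [int(random.gauss(500_000, 40)) for _ in range(2_000)]    # one strongly bound site
for frac in (0.25, 0.5, 0.75, 1.0):
    print(frac, enriched_windows(tags, frac))   # a plateauing peak count suggests saturation
```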

See paper they published last year.

Next topic: Dosage compensation.

(Background on what dosage compensation is.)

In Drosophila, the X chromosome is up-regulated in XY animals, unlike in humans, where the second copy of the X is silenced in the XX genotype. Several models are available. There is some evidence that there’s something specific and sequence-related. Can’t find anything easily with ChIP-based methods: just too much information. Comparing the two, with ChIP-Seq you get sharp enrichment, whereas on ChIP-chip you don’t see it. It seems to be a saturation issue (dynamic range) on ChIP-chip, and the sharp enrichments are important.
You get specific motifs.

Deletion and mutation analysis. The motif is necessary and sufficient.

Some issues: the motif is enriched on X, but only 2-fold. Why is X so strongly upregulated, then? The histone H3 signal seems depleted over the entry sites on the X chromosome. There may also be other things going on that aren’t known.

Refs: Alekseyenko et al., Cell, 2008; and Sural et al., Nat Struct Mol Biol, 2008.

>Alex Meissner, Harvard University – “From reference genome to reference epigenome(s)”

>Background on ChIP-Seq.

High-throughput Bisulfite Sequencing. At 72 bp, you can still map these regions back without much loss of mapping ability. You get 10% loss at 36bp, 4% at 47bp and less at 72bp.

This was done with an m-CpG-cutting enzyme, so you know all fragments come with at least a single methylation. Some updates on the technology recently, including drops in cost, longer reads, and lower amounts of starting material.

About half of CpGs are found outside of CpG islands.

“Epigenomic space”: look at all marks you can find, and then external differences. Again, many are in gene deserts, but appear to be important in disease association. Also remarkable is the degree of conservation of epigenetic patterns as well as genes.

Questions:
Where are the functional elements?
When are they active?
When are they available?

Also interested in Epigenetic Reprogramming (Stem cell to somatic cell).

Recap: Takahashi and Yamanaka induced pluripotent stem cells with 4 transcription factors: Oct4, Sox2, c-Myc & Klf4. The general efficiency is VERY low (0.0001% – 5%). Why don’t all cells reprogram?

To address this: ChIP-Seq before and after induction with the 4 transcription factors. Strong correlation between chromatin state and iPS. Clearly see that genes in open chromatin are responsive. Chromatin state in MEFs correlates with reactivation.

Is loss of DNA methylation at pluripotency genes the critical step to fully reprogram? Tested the hypothesis that by demethylation you could cause more cells to become pluripotent. Loss of DNA methylation was indeed shown to allow the transition to pluripotency. [Lots of figures, which I can’t copy down.]

Finally: loss of differentiation potential in culture. Embryonic stem cell to neural progenitor, but eventually it cannot differentiate into neurons, just astrocytes. (Figure from Jaenisch and Young, Cell 2008.)

Human ES cell differentiation: often fine in morphology, correct markers, etc., but specific markers are not consistent. They lose methylation and histone marks, which causes significant changes in pluripotency.

Can’t yet make predictions, but they’re on the way towards a future where you can assess cell-type quality using this information.

>Marco Marra’s Talk

>That was clearly the coolest thing we’ve seen so far. From genome to cancer treatment, which seems to have worked to reduce the tumour size… Wow. I was aware of the work, having been involved in a small way, but I wasn’t aware of the outcome until just last night.

Mind Blowing. Personalized medicine is here.

>**BREAKING NEWS** Marco Marra, BC Cancer Agency – “Sequencing Cancer Genomes and Transcriptomes: From New Pathology to Cancer Treatment.”

>Why sequence the cancer-ome? Most important: treatment-response differences, to match treatments to patients. Going to focus on that last one.

Two anecdotes: neuroblastoma (Olena Morozova and David Kaplan), and papillary adenocarcinoma (tongue), primary unknown. A 70-year difference in age between the patients. They have nothing in common except for “can sequence analysis present new treatment options?”

Background on neuroblastoma. The most common cancer in infants, but not very common overall: 75 cases per year in Canada. Patients often have relapse and remission cycles after chemotherapy. Little improvement until recently, when Kaplan was able to show the ability to enrich for tumour-initiating cells (TICs). This gave a great model for more work.

Decided to have a look at Kaplan’s cells: made transcriptome libraries (RNA-Seq) using PET, and sequenced a flow cell’s worth, giving 5 Gb of raw sequence from one sample and 4 Gb from the other. Aligned to the reference genome using a custom database (probably Ryan Morin’s?), junctions, etc.

Variants were found that are B-cell related. Olena found markers, worked out the lineage, and showed it was closer to a B-cell malignancy than to a brain cancer sample. These cells also form neuroblastomas when reintroduced into mice. So, is neuroblastoma like B-cell in expression? Yes, they seem to have a lot of traits in common. It appears as though the neuroblastoma is expressing early markers.

Thus, if you can target B-cell markers, you’d have a lead.

David Kaplan verified and made sure that this was not contamination (several markers). Showed that yes, the neuroblastoma cells are expressing B-cell markers, and that these are not B-cells. Thus, it seems that a drug that targets B-cell markers could be used (rituximab and milatuzumab). So we now have an insight that we wouldn’t have had before. (A very small sample, but lots of promise.)

Anecdote 2: an 80-year-old male with adenocarcinoma of the tongue. Possibly of salivary gland origin? Has had surgery and radiation, and a CAT scan revealed lung nodules (no local recurrence). There is no known standard chemotherapy… so several guesses were made, and an EGFR inhibitor was tried. Nothing changed. Thus, the BC Cancer Agency was approached: what can genome studies do? They didn’t know, but were willing to try. A genome from a formalin-fixed sample (which is normally not done), and a handful of WTSS libraries from fine-needle aspirates (nanograms, which required amplification). 134 Gb of aligned sequence across all libraries, about 110 Gb to the genome (22X genome, 4X transcriptome).

Data analysis: compared across many other in-house tumours, and looked for evidence of mutation. CNV was done from the genome. Integration with DrugBank, to then provide appropriate candidates for treatment.

Comment on CNV: histograms were shown. They showed that as many bases are found at single copy as at diploid, and then again just as many at triploid, with some places at 4 and 5 copies. Was selective pressure involved in picking some places for gain, whereas much of the genome was involved in loss?

Investigated a few interesting high-CNV regions, one of which contains RET. Some amplifications are highly specific, containing only a single gene, while being surrounded by regions of copy-number loss.

Looking at expression levels, you see a few interesting things. There is a lack of absolute correlation between changes in CNV and the expression of the gene.

When looking for the intersection, ended up with some interesting features:
30 amplified genes in cancer pathways (KEGG)
76 deleted genes in cancer pathways
~400 upregulated, ~400 downregulated genes
303 candidate non-synonymous SNPs
233 candidate novel coding SNPs
… more.

Went back to drugbank.ca (Yvonne and Jianghong?) When you merge that with target genes, you can find drugs specific to those targets. One of the key items on the list was RET.
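
A toy version of that merge step: intersect the tumour’s flagged genes with a drug-to-target table. The gene and target sets below are illustrative only, not an actual DrugBank export or the patient’s full results.

```python
# Toy drug-target intersection: join tumour-flagged genes against a
# drug -> target map. Target sets here are illustrative, not DrugBank content.
drug_targets = {
    "sunitinib": {"RET", "KDR", "KIT", "FLT3", "PDGFRA"},
    "erlotinib": {"EGFR"},
}
tumour_flagged = {"RET", "MYC", "CCND1"}   # e.g. amplified / over-expressed genes

for drug, targets in sorted(drug_targets.items()):
    hits = targets & tumour_flagged
    if hits:
        print(drug, sorted(hits))          # sunitinib ['RET']
```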

Back to the patient: the patient was using an EGFR-targeting drug. Why weren’t they responsive? It turns out that PTEN and RB1 are lost in this patient… (see the literature… didn’t catch the paper).

Pathway diagram made by Yvonne Li. It shows where mutations occur in pathways; gains and losses of expression are shown as well. Notice lots of expression from RET, and no expression from PTEN, which negatively regulates the RET pathway. Also increases of MEK and RAS. This suggests that in this tumour, activation of RET could be driving things.

Thus, came up with a short list of drugs. The favourite was sunitinib. It’s fairly non-specific, used for renal cell carcinoma, and currently in clinical trials being tested for other cancers. There are implications that RET is involved in some of those diseases (MEN2A, MEN2B and thyroid cancers). The RET sequence itself was not likely to be mutated in this patient.

CAT scans: response to sunitinib and erlotinib. When on the EGFR-targeting drug, the nodule grew. On sunitinib, the cancer retreated!

Lots of Unanswered questions: Is RET really driving this tumour? Is drug really acting on RET? Is PTEN/RB1 loss responsible for erlotinib resistance in this tumour?

We don’t think we know everything, but can we use genome analysis to suggest treatment: YES!

First question: how did this work with the ethics boards? How did they let you pass that test? Answer: this is not a path to treatment, it is a path towards making a suggestion. In some cancers there is something called hair analysis; it can be considered or ignored. Same thing here: we didn’t administer anything… we just proposed a treatment.

>Keynote Speaker: Rick Myers, Hudson-Alpha Institute – “Global Analysis of Transcriptional Control in Human Cells”

>Talking about gene regulation. It has been well studied for a long time, but only recently on a genomic scale. The field still wants comprehensive, accurate, unbiased, quantitative measurements (DNA methylation, DNA-binding proteins, mRNA), and they want them cheap, fast and easy to get.

Next-gen sequencing has revolutionized the field: ChIP-Seq, mRNA-Seq and Methyl-Seq are just three examples. We also need to integrate them with genome-wide genetic analysis.

There are many versions of each of those technologies.

RNA-Seq: 20M reads give 40 reads per 1kb-long mRNA present at as low as 1-2 mRNA per cell. Thus, 2-4 lanes are needed for a deep transcriptome measurement. PET plus long reads is excellent for phasing and junctions.
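
One way that arithmetic can work out, under assumed numbers that are not from the talk (roughly 5 × 10^5 mRNA molecules per cell and a ~2 kb average transcript):

```python
# Hedged back-of-envelope for "20M reads -> ~40 reads on a 1 kb mRNA at 1-2
# copies per cell". The per-cell mRNA count and average transcript length are
# assumptions, not figures from the talk.
total_reads        = 20_000_000
mrna_per_cell      = 500_000     # assumed
avg_transcript_len = 2_000       # nt, assumed
target_copies      = 2           # "1-2 mRNA per cell"
target_len         = 1_000       # nt

share_of_mass = (target_copies * target_len) / (mrna_per_cell * avg_transcript_len)
print(round(share_of_mass * total_reads))   # ~40 reads
```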

ChIP-Seq: transcription factors and histones… but it should also be used for any DNA-binding protein. (Explanation of how ChIP-Seq works.) Using a no-antibody control generally gives you no background [?]. ChIP without a control gets you into trouble.

Methylation: Methyl-seq. Cut at unmethylated sites, then ligate to adaptors and fragment. Size-select and run. (Many examples of how it works.)

Studying human embryonic stem cells. (The cell lines are old and very different… hopefully there will be new ones available soon.) Using it for gene expression versus methylation status: when you cluster by gene expression, they cluster by pathways. The DNA methylation patterns did not correlate well, falling more along the lines of individual cell lines than pathways. Thus, they believe it’s not controlling the pathways… but that could be an artifact of the cell lines.

26,956 methylation sites. Many of them (7,572) are in non-CpG regions.

Another study: cortisol. A steroid hormone made by the adrenal gland. It controls two-thirds of all biology, helps restore homeostasis, and affects a LOT of pathways: blood pressure, blood sugar, suppressing the immune system, etc. It fluctuates throughout the day. Pharma is very interested in this.
Levels are also tied to mood, etc.

Glucocorticoid receptor binds hormone in cytoplasm, translocates to nucleus. Activates and represses transcription of thousands of genes.

ChIP-Seq in A549: GR (-hormone): 579 peaks. GR (+hormone): 3,608 peaks. Low levels of endogenous cortisol in the cells probably account for the background. (Of the peaks, ~60% are repressive, ~40% are inducing.) When investigating the motifs, the top 500 hits really change the binding-site motif! It’s no longer as set as originally thought, and this led to the discovery of new genes controlled by GREs. Also showed that there’s co-occupancy with AP1.

[Method for expression quantification: use windows over exons.]
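
A loose sketch of what windowed-exon quantification can look like (an RPKM-style value; not necessarily the lab’s exact method, and the coordinates are invented):

```python
# Count reads starting in any exon window of a gene, then normalize by exonic
# kilobases and by millions of mapped reads (an RPKM-like value). Toy inputs.
def exon_window_expression(read_starts, exons, total_mapped_reads):
    exonic = sum(1 for r in read_starts if any(s <= r < e for s, e in exons))
    exon_kb = sum(e - s for s, e in exons) / 1_000
    return exonic / exon_kb / (total_mapped_reads / 1_000_000)

exons = [(10_000, 10_300), (12_000, 12_700)]     # 1 kb of exon in total
reads = [10_050, 12_100, 12_650, 50_000]
print(exon_window_expression(reads, exons, total_mapped_reads=20_000_000))   # 0.15
```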

Finally: a few more little stories. Mono-allelic transcription factor binding turns out to occur frequently, where only one allele is bound in the ChIP and the other is not bound at all. (In the case shown, it turns out the SNP creates a methylation site, which changes binding.) The same type of event also happens at methylation sites.

Still has time: he just raises the point of copy number variation. Interpretation is very important, and can be skewed by CNVs. Cell lines are particularly bad for this. If you don’t model it, it will be a significant problem. They’re just on the verge of incorporating this.

They are going to 40-80M reads for RNA-Seq. Their version of RNA-Seq is good, and doesn’t give background. The deeper you go, the more you learn. Not so much with ChIP-Seq, where you saturate sooner.