CPHx: Martin Kerick, Max Planck Inst. – (Epi-)Genomics in prostate cancer: Mutations, Copy Numbers & Methylation

(Epi-)Genomics in prostate cancer: Mutations, Copy Numbers & Methylation
Martin Kerick, Max Planck Inst.


Focus on Prostate cancer.  DNA-Seq, MeDIP-seq, RNA-seq.

They have a Polonator, but it is not in use.  If anyone wants it, send them an email.

Sequence enrichment, mainly using in solution hybridization.  Works well for them.

Prostate cancers are multifocal.  It is unsettled whether the foci are clonal or arise from multiple independent events.

ERG under the control of an androgen-responsive promoter – found in 50% of cases, an indicator of poor prognosis, but again, subject of debate.  Commonly observed as the TMPRSS2/ERG fusion.

Methylation studied by MeDIP, Mutations/CNV by targeted sequencing.

  • Mutations: 32 patients, tumour and adjacent non-affected tissue.
  • Methylation: 51 tumour patients/53 non-tumour subjects

Somatic SNV profiles: you get about 8000 variants per patient, but only 0-2 somatic non-synonymous mutations per patient.

Had to step back to look at non-coding somatic mutations.

Some patients show a transition/transversion ratio that differs from the usual.  This is associated with TMPRSS2/ERG status: the ratio is higher for cancers with the fusion, lower where the fusion is missing.
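As a rough illustration of the transition/transversion ratio mentioned above, here is a minimal sketch.  The function names and the example calls are made up for illustration; this is not the speaker's pipeline.

```python
# Purines and pyrimidines; a transition stays within one class (A<->G, C<->T),
# a transversion crosses between the two classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES

def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    transitions = transversions = 0
    for ref, alt in snvs:
        if ref.upper() == alt.upper():
            continue  # not a substitution, skip it
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return float("inf") if transitions else float("nan")
    return transitions / transversions

# Hypothetical somatic calls for one patient (made-up data).
example_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(example_snvs))  # 3 transitions / 2 transversions
```

Comparing this ratio between fusion-positive and fusion-negative patients would be the next step, given per-patient call sets.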

Focussed specifically on fusion events.  Fusions per aligned read are higher in patients with TMPRSS2/ERG fusions.

CNV:  one can do CNV analysis with targeted sequencing.  Demonstrated by showing the same data from exome-seq and whole genome sequencing.  At the macro level, the data look the same, though perhaps noisier with targeted exome.
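One common way to get copy number from targeted data is a per-target log2 ratio of tumour to normal read depth.  The sketch below assumes that approach; the talk did not say which method was used, and the target names, depths and thresholds are all invented.

```python
import math

def log2_ratios(tumour_depth, normal_depth, pseudocount=0.5):
    """Per-target log2(tumour/normal) after scaling each sample by its total depth."""
    t_total = sum(tumour_depth.values())
    n_total = sum(normal_depth.values())
    ratios = {}
    for target, n_reads in normal_depth.items():
        t_reads = tumour_depth.get(target, 0)
        # The pseudocount avoids log(0) for targets with no coverage.
        t_frac = (t_reads + pseudocount) / t_total
        n_frac = (n_reads + pseudocount) / n_total
        ratios[target] = math.log2(t_frac / n_frac)
    return ratios

def call_cnv(ratios, gain=0.4, loss=-0.4):
    """Label each target; the thresholds are arbitrary illustrative cut-offs."""
    calls = {}
    for target, value in ratios.items():
        if value >= gain:
            calls[target] = "gain"
        elif value <= loss:
            calls[target] = "loss"
        else:
            calls[target] = "neutral"
    return calls

# Made-up read depths over four hypothetical capture targets.
normal = {"GENE_A": 100, "GENE_B": 100, "GENE_C": 100, "GENE_D": 100}
tumour = {"GENE_A": 100, "GENE_B": 100, "GENE_C": 100, "GENE_D": 20}
print(call_cnv(log2_ratios(tumour, normal)))
```

Scaling by total depth is the simplest normalisation; with only a few targets it shifts the unchanged genes slightly upward, which is one reason targeted data can look noisier than whole genome data.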

Applied this to 32 patients to get CNV analysis for targeted genes.  Focussing specifically on cancer gene census genes, you can see nearly all patients (all but 5) show CNV changes for 1-14 genes.

Methylation profiling.

Looked at GSTP1 methylation, show hyper-methylation in 49/51 of tumour patients, but not in normals.  Methylation profiling shown as well [ nice pictures, but can’t replicate it].

Promoter methylation is also significant for cancer gene census genes.

Can also do some tumour classification using methylation status of 7 tumour specific markers.  Clustering shows a clear division between tumour and normal samples.  Accuracy is 100% for separating tumours and normals.


  1. Low somatic variation rate in prostate cancer
  2. Transition/transversion ratio is associated with TMPRSS2/ERG fusion
  3. Chromosomal fusion rate tied to TMPRSS2/ERG fusion status
  4. CNV can be detected with targeted sequencing
  5. CNVs vary among patients
  6. Hypermethylation can be used to classify tissue samples
  7. Hypomethylation in intergenic regions
  8. Hypermethylation is found in promoter/CGI regions


CPHx: Elizabeth Murchison, Wellcome Trust Sanger Inst. – Evolution of transmissible cancers

Evolution of transmissible cancers
Elizabeth Murchison, Wellcome Trust Sanger Inst.


Focussing on two cancers – one in Tasmanian devils, the other in dogs.

Every cancer is a somatic outgrowth with its own evolutionary process. Cancer is an evolutionary dead end, however, because eventually it leads to the death of the host.

Tasmanian devil facial tumour disease (DFTD) and Canine transmissible venereal tumour (CTVT).  In both of these tumours, a clone is transmitted to other individuals, making it able to propagate.

Tasmanian devil:  Largest marsupial carnivore, size of a small dog.  Habitat is just Tasmania.

In 1996, a photographer took a photograph of a Tasmanian devil with a tumour.  It was originally thought to be an isolated incident; however, by 2000 there were 2 more cases, and by 2002 a lot more tumours were observed.  It became clear that they were not isolated cases, but rather a spreading disease.  By 2004, it had spread through most of Tasmania.  Only a few pockets now remain disease free.  This has been accompanied by a massive decline in Tasmanian devil populations.  It may in fact lead to extinction by 2030.

[Some images of the tumours.]  Tumours kill the individual within 6 months of first symptoms.  It’s likely of Schwann cell origin.

The devil has 7 pairs of chromosomes, while the cancer is missing chr2 and is haploid for chr6.  All of the tumours studied have almost identical karyotypes.  That’s highly unusual for cancer.  The odds of that happening independently are astronomically small.

Devils tend to bite each other a lot during fighting and mating.  This may be the mechanism of transmission.  Cancer cells are found in the saliva, so it makes sense.

Predictions based on this proposed mechanism: all the cancers should be genetically identical.  The cancers should be genetically different from their hosts.

Microsatellite genotyping.  Tumour and normal DNA was collected from individuals.  Comparing variation, the normals are all variable for microsatellite length; the cancer, however, is uniform for length at each locus, independent of the microsatellite length in its host.
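The comparison above can be sketched as a per-locus variance of allele lengths: normals should vary, a single transmitted clone should not.  The loci, lengths and function name below are invented, and real genotypes are diploid, which this simplification ignores.

```python
from statistics import pvariance

def length_variance(samples):
    """samples: list of {locus: allele_length}. Returns {locus: variance across samples}."""
    loci = sorted(set().union(*[s.keys() for s in samples]))
    return {
        locus: pvariance([s[locus] for s in samples if locus in s])
        for locus in loci
    }

# Made-up repeat lengths at two hypothetical microsatellite loci.
normals = [
    {"locus1": 12, "locus2": 20},
    {"locus1": 15, "locus2": 18},
    {"locus1": 11, "locus2": 23},
]
tumours = [
    {"locus1": 13, "locus2": 21},
    {"locus1": 13, "locus2": 21},
    {"locus1": 13, "locus2": 21},
]

print("normals:", length_variance(normals))
print("tumours:", length_variance(tumours))
```

Zero variance across tumours taken from different hosts, alongside non-zero variance in the hosts themselves, is the pattern expected if the cancer is one clone.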

This, then, is unique: one cancer has been spectacularly successful in propagation, transmission and longevity.

Next step was to start looking at the DNA of the cancer.  This was done at Sanger on Illumina.  De novo assembly was undertaken.  A reference genome was also done for the Tasmanian devil in the process.  It was not perfect, but was good enough to start studying the cancer.

However, before moving on, they harvested DNA and flow-sorted the chromosomes to separate them, then investigated each chromosome separately.  [ok, that’s neat.  I haven’t seen that done before.]  Helped to improve assembly significantly.

On to the cancer.  What’s actually being sequenced is the reference “index case” of the cancer.  However, the variation between the cancer and the first individual is less than between individuals, usually.  Thus, it can be difficult to separate cancer variations from normals.

Sequenced two tumours, as far apart as possible, to aid in finding somatic SNPs.

First individual “Joey”, about 0.5 SNPs different from reference.  Found about 15-18k mutations different from the two cancers. Use this information to build up a phylogeny of the cancers as it spread across the population.

Collected  more than 100 tumours and matched normals from across the region.  Pick variations from the two cancers selected, and then sample them across the populations sampled.  You can use this to identify the heritage patterns of the spread of the cancer.

This can then be used to explain the pattern of the devil tumour movement.

What’s next for the Tasmanian devil?  Captive breeding programs, island sanctuaries, disease prevention, vaccines and cures.

Briefly touch on canine cancer.  It’s a genital tumour, spread by sexual contact.  Found all around the world.  Highly transmissible.  Evidence confirms that this tumour is all a single clone in dogs.

First identified in 1876 in Russia, so it’s at least hundreds of years old, but likely much older than that.  May have originally come from wolves, as its SNPs are more similar to wolves than dogs.  May be the oldest clone in existence today.

Summary:  Only 2 transmissible cancers we know of today.  Although they’re similar in mechanism, they’re very different.  One has a long lineage, but not so much in the other.  Metastasis is common in Tasmanian devil, but not in dogs.  Tasmanian devil cancer is also not at all sensitive to cancer therapies, while dog cancer is.

How about humans?  Can human cancers be transmissible?  It was tested once, under questionable circumstances… fortunately, the cancer didn’t transmit.  However, there is evidence that it can happen, but it requires exceptional circumstances. (Example given: a surgeon cut himself while operating on a cancer.)

[Wow, I really enjoyed that talk.  Very clear, easy to follow, and interesting subject.]

CPHx: Brian Muegge, Washington University School of Medicine – Influence of the host diet on the structure & function of the mammalian gut microbiome.

Influence of the host diet on the structure & function of the mammalian gut microbiome.
Brian Muegge, Washington University School of Medicine


Not part of the sequencing centre, they do their own sequencing.  Small lab.

Working on the Human Microbiome.  Microbiome:  totality of microbes, their genetic elements, and environmental interactions in a defined environment.  Metagenomics is the genomic analysis of a population of microorganisms.

Gut is divided into different segments, and the microbial communities vary across the length.  Stomach is different from small intestine, which is again different from large intestine.  In this case, stool bacteria are studied as a proxy for the large gut.

Past work: researcher went to zoos and started looking at mammalian stool, looking at 16S rRNA.  Strong clustering by diet type, just looking at species of the gut.

Started by looking at 39 mammalian fecal samples from the previous study.  Range of 3 diet classes, both free living and zoos.

Also used 18 humans who are practising long-term caloric restriction with optimal nutrition.  They were selected because they keep great records of what they eat, not because of the caloric restrictions.

Method:  Single fecal sample from each individual.  DNA isolation, then sequencing on 454 FLX.  Two kinds of seq: V2 directed 16S rRNA sequencing, and shotgun gene sequencing with functional annotation.  Used BLAST against KEGG to figure out what the genes are doing.

What they found:

  1. community phylogeny structure predicts functional profile.
  2. Host diet is associated with different bacterial metabolic patterns (eg amino acids)
  3. Human dietary intake correlates with microbiota structure.

Discuss new type of analyses: procrustes analysis.  [I’ve never heard of it before.  seems to involve transforming data until you find a good fit, rotating through various axes to generate the best overlay to find the best clustering… Hope I got that right.]
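For what it’s worth, Procrustes analysis superimposes one set of points on another (translate, scale, rotate) and reports how much misfit is left.  The sketch below is a 2-D, rotation-only version in plain Python, with made-up coordinates; it is not the implementation used in the talk, and full versions also handle reflections and higher dimensions.

```python
import math

def _centre_and_scale(points):
    """Translate to the centroid and scale to unit total sum of squares."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centred))
    return [(x / norm, y / norm) for x, y in centred]

def procrustes_2d(reference, target):
    """Rotate `target` onto `reference`; return (aligned_target, disparity)."""
    ref = _centre_and_scale(reference)
    tgt = _centre_and_scale(target)
    # Closed-form optimal rotation angle for paired 2-D points.
    num = sum(tx * ry - ty * rx for (rx, ry), (tx, ty) in zip(ref, tgt))
    den = sum(tx * rx + ty * ry for (rx, ry), (tx, ty) in zip(ref, tgt))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    aligned = [(c * tx - s * ty, s * tx + c * ty) for tx, ty in tgt]
    # Disparity: leftover squared distance between matched points.
    disparity = sum((ax - rx) ** 2 + (ay - ry) ** 2
                    for (ax, ay), (rx, ry) in zip(aligned, ref))
    return aligned, disparity

# Made-up ordination coordinates for the same four samples in two analyses;
# the second set is the first rotated by 90 degrees, scaled by 3 and shifted.
taxonomy_coords = [(0, 0), (1, 0), (1, 2), (0, 1)]
function_coords = [(5, -2), (5, 1), (-1, 1), (2, -2)]
_, misfit = procrustes_2d(taxonomy_coords, function_coords)
print(misfit)  # close to zero: the two configurations have the same shape
```

A small disparity means the two ordinations (say, one from 16S data and one from gene function data) arrange the samples in the same way; significance is usually judged by permuting sample labels.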

Example of analysis shown for omnivore, herbivore, carnivore.  Clustering is apparent by diet type.  Fit is much better than expected by chance.  Omnivores sometimes cluster with herbivores, sometimes on their own.  Occasionally with carnivores.

Similar results obtained for several different classes of enzymes.

What is the biology separating the different groups?  Used E.C. annotations, looking for things that separate groups.  Amino acid metabolism enzymes not found in carnivores, but (12/20) enriched in herbivores.  Amino acid degradation enzymes found in carnivores (9/20), but only 3/20 in herbivores.  [Nice map shown, indicating which enzymes are found in herbivores and which are in carnivores.]

Herbivores are making amino acids, carnivores are breaking them down.

On to humans.  Regression was core of analysis.  If you know how much protein they consume, you can explain about 30% of the variation.  36% can be explained by OTU.
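The "percent of variation explained" figures are, presumably, R-squared values from the regressions.  A minimal sketch for a single predictor follows; the protein intake and ordination values are invented.

```python
def r_squared(x, y):
    """Fraction of the variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # For simple linear regression, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)

# Made-up data: protein intake vs. position on the first ordination axis.
protein_intake = [1, 2, 3, 4]
first_axis = [1, 3, 2, 4]
print(r_squared(protein_intake, first_axis))  # 0.64, i.e. 64% explained
```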


  1. Adaptations by gut microbiome to host diet is reproducible across diverse mammalian hosts.
  2. Phylogenetic data can be used to predict functional profile
  3. E.C. level metabolic reconstruction reveals reaction pathway differences.
  4. And the human stuff written above.  Diet reflects variation in human gut microbiome.


CPHx: Jose Carlos Clemente, University of Colorado at Boulder – High throughput spatial & temporal studies of the human microbiome

High throughput spatial & temporal studies of the human microbiome
Jose Carlos Clemente, University of Colorado at Boulder


1 trillion human cells, each with 30,000 genes.  Microbiome has 3,000,000 genes.

Many reasons why it’s important.  We heard about its effect on obesity, it also has significant effect on some drug metabolism, and it can help you choose your sexual partner… if you’re a fruit fly.

Key developments in the microbiome community:  Next gen seq, barcoded samples, and the tools now exist to do analysis.  QIIME (open source software for barcoded sequences to give visual interpretation of your results.)  Caporaso et al., Nature Methods, 2010.

Keyboard microbiome studied.  Is there a transfer of bacteria from fingers to keyboards?  Is the transfer permanent, or is it transient?  If it’s permanent, is it correlated to spatial arrangement? etc.

Turns out that, in fact, there is a transfer, and you can identify individuals from the bacterial residue left behind.  Each individual leaves different colonies.

In fact, each individual has different colonies depending on the body site.  Gut and hair bacteria, for instance, are easily distinguished from other parts of the person.  No difference by gender, or laterally (no left/right differences).

Also repeated this across time.  Variation within habitats less than variation between habitats.  same trend for time, etc.
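The within versus between habitat comparison can be sketched with any community distance.  The example below uses Bray-Curtis dissimilarity as a stand-in (studies like this typically use UniFrac, which needs a phylogenetic tree); sample names and counts are invented.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} profiles (0 = identical)."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0

def within_between(samples):
    """samples: list of (habitat, counts). Return (mean within, mean between) distance."""
    within, between = [], []
    for (hab1, c1), (hab2, c2) in combinations(samples, 2):
        (within if hab1 == hab2 else between).append(bray_curtis(c1, c2))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)

# Made-up taxon counts for two gut and two skin samples.
samples = [
    ("gut", {"Bacteroides": 80, "Prevotella": 20}),
    ("gut", {"Bacteroides": 70, "Prevotella": 30}),
    ("skin", {"Propionibacterium": 90, "Staphylococcus": 10}),
    ("skin", {"Propionibacterium": 60, "Staphylococcus": 40}),
]
print(within_between(samples))  # within-habitat mean is far below between-habitat mean
```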

Moving pictures of the Microbiome, visualize differences over time.  Gut samples are relatively stable, but have some variation over time.  Mouth samples are most stable over time.  Skin samples are all over the place, fluctuating on a daily basis.  (Caporaso et al., Genome Biology 2010)

Same results are being captured with Illumina and 454, which is good to know.

[Movie time!  Get to watch the metrics change over time.]

Moving on – face biogeography. Analysis shown for 50 sites sampled – two samples are very different, but the rest cluster.  The two outlying samples are from the lips.  Mapping the PCoA to the face shows the variability. The one sample shows some interesting variability.  The second component shows a distinct outlier on the face, on the nose bridge.  Turns out that the subject constantly adjusted their glasses, probably leaving “hand bacteria” contamination at that location.  [neat visualization.]

Mapping specific types of bacteria to the face.  Propionibacteriaceae, for instance, is all over the face, except lips and ears.  Potentially ears are colder, and not optimal for this type of bacteria?

[neat analysis!]

Quick plug for QIIME – there is a forum (forum.qiime.org) for all support.  It scales well onto large clusters.  [I suggest you visit the site for more info, so I don’t have to copy it down.  www.qiime.org ]

There is a cloud version of the software, available on amazon web services.

example: 70 million reads cost $200, took 3 hours.

The future:  Several large scale projects – re-analyzing part of the human microbiome project, and working on the earth microbiome project. 10,000 samples.  Challenge is to deal with all of the data generated by this project.  Possible solution is to do cloud analysis.  Put the data on a disk, ship it to amazon, then they upload it to their cloud.






CPHx: Torben Hansen, Copenhagen University – The impact of our genomes on metabolic health

The impact of our genomes on metabolic health
Torben Hansen, Copenhagen University


Part 1: human genetic studies of common forms of obesity.

Several features of obesity, which all lead to type 2 diabetes and vascular diabetes.  Triggers: overeating and lack of physical activity, but what about genetics?

Eighteen risk loci associate with obesity or increased BMI with genome-wide significance.  Eight are common, but only increase risk of obesity by 8-33%.  Most of them were discovered in 2009 or later.  The most important was discovered in 2007: FTO.

Most obesity genes are expressed in the brain.  Seems to be a brain disease.  A more recent analysis (Speliotes EK et al. 2010) found 18 additional loci, with 249,796 individuals in the study.

Can also consider distribution of the fat, which is determined by another 13 genes.

Knowing all of this, how do we apply this to clinical settings?

Cumulative effect and predictive value of 32 loci was tested on 8,120 individuals.  There is a correlation between the number of risk loci carried and the BMI, which supports their contribution to human health analysis.  Thus, they should be used for predictive analysis to focus our attention on populations at highest risk.
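The "number of risk loci carried vs. BMI" comparison amounts to a simple unweighted genetic risk score.  A toy sketch follows; only FTO comes from the talk, the second locus name, the genotypes and the BMI values are invented.

```python
from collections import defaultdict

def risk_allele_count(genotype, risk_loci):
    """Sum risk-allele copies (0, 1 or 2 per locus) over the listed loci."""
    return sum(genotype.get(locus, 0) for locus in risk_loci)

def mean_bmi_by_count(cohort, risk_loci):
    """cohort: list of (genotype, bmi). Returns {risk_allele_count: mean BMI}."""
    groups = defaultdict(list)
    for genotype, bmi in cohort:
        groups[risk_allele_count(genotype, risk_loci)].append(bmi)
    return {count: sum(v) / len(v) for count, v in sorted(groups.items())}

# Made-up cohort; "LOCUS_B" is a placeholder name, not a real locus.
risk_loci = ["FTO", "LOCUS_B"]
cohort = [
    ({"FTO": 0, "LOCUS_B": 0}, 22.0),
    ({"FTO": 1, "LOCUS_B": 0}, 24.0),
    ({"FTO": 1, "LOCUS_B": 0}, 26.0),
    ({"FTO": 2, "LOCUS_B": 1}, 29.0),
]
print(mean_bmi_by_count(cohort, risk_loci))
```

A rising mean BMI with the count shows a population-level trend; it says nothing yet about how well the score classifies an individual.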

ROC graph shown (Specificity vs Sensitivity).  [It looks terrible!]  It’s not particularly sensitive or specific, so it cannot be used yet.  (No value given, unfortunately.)  There is obviously more in play than just the risk loci.
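Since no AUC was given for this curve, here is how one would be computed from scores and labels: the area under the ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen control.  The scores below are invented.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, is_case in zip(scores, labels) if is_case]
    neg = [s for s, is_case in zip(scores, labels) if not is_case]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Made-up risk scores; label 1 = obese, 0 = not obese.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))  # 0.75
```

An AUC of 0.5 is a coin toss and 1.0 is perfect separation, which puts the 0.99 and 0.85 values quoted later in the talk into context.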

Is an imbalance in gut bacteria in part to blame for obesity?

Human colon bacteria are collectively about 1.5kg of bacteria, mainly made up of Bacteroidetes and Firmicutes (90%).  Protect against pathogens, control epithelial cell proliferation, etc.

Obese mice have more Firmicutes (Ley RE et al., PNAS 102).  Firmicutes have a higher capacity for fermentation of non-digestible polysaccharides than Bacteroidetes.  This is also transferable: when flora from obese mice were moved into skinny mice, they became obese.

80% of human gut bacteria cannot be cultured.

Major aim of MetaHIT project is to identify the gut microbiome at the gene level, and to study the role of the gut bacterial genes and species.

Massively parallel shotgun sequencing of genes from distal gut bacteria of 86 Danes and 38 from Spain.  Sequencing done at BGI.  Published in Nature, 2010.

3.3 million genes from distal gut identified.  Human gut gene set is at least 150 times larger than the gene set of the human genome.  99% are bacterial.  160 gut bacterial species with 530k genes.

64/160 bacterial species are shared by more than 90% of individuals.

High variability in abundance of shared species, up to 5000-fold.

Do any of these species associate with obesity?

Danish population study undertaken, started in 1999, updated in 2004 and now added to again in 2009. (Study metrics shown, with control methods, including “no yogurt for 5 days prior to collection”.)

Genes of the obese cohort.

177 Danish individuals had their gene presence assessed by Illumina sequencing.  30M PET per individual.  Sequencing again done at BGI.

How many genes do individuals actually have?  The distribution for obese people falls into two sets: a low gene count, and a high gene count.  Overweight and normal individuals only show the high gene count distribution.

Those with low gene counts and obese phenotype are more insulin resistant than those with high gene count and obese phenotype.  This is apparently not affected by gender.

The differences in metabolic profile, however were most pronounced in obese women.

Those obese women with low gene counts also show other issues, such as dyslipidemia. [did I spell that right?]

Food questionnaire was inspected.  Those with low gene counts had a higher incidence of fat in their food.

Looking back over the study from the earliest control, there are other trends.  Over time, low gene count obese women increase their BMI and waist circumference more than high gene count obese women.  Insulin levels also increase over time.

Bimodal distribution of gut bacterial genes seen in both French and Danish groups.  Distribution of genes is different between high and low gene counts, however.  Some are only found in one group or the other.

Phenotypes of low-gene count obese people include other issues such as increased inflammatory parameters, found in the French data set.  However, the Danish study did not include these parameters.

Is low grade inflammation associated with low gene count obesity?

Four meta-species are diagnostic for low and high gene count individuals.  AUC 0.99 for ROC. (Tested on 99 Danes.)

Models may have prognostic value for identifying people at higher risk of cardiovascular disease.

Searching for species associated with BMI.  33 species associate significantly with obesity.  ROC AUC is 0.85.

The same meta-species discriminate micro-obese individuals.

Which molecular mediators are involved in the metabolic dysfunctions?

Could microbiome based interventions work?  Fecal transplants?  Could we customize treatments based on genomic gut profiles?

We must consider gut microbiome as an integral part of the human health equation.

CPHx: Roald Forsberg, Sponsored by CLC bio.

CLC Genomics Gateway

Roald Forsberg

CLC’s take on next generation of genome informatics tools.

At the 10th anniversary of the human genome project.  We now have a common and comprehensible coordinate system.  That’s a real tangible effect of the project.  So much of the work we now do is based upon this first human genome coordinate set.  HUGE list of things we can do, because of it.

However, if you look at pre-NGS work, all of the efforts were focussed on making sense of the data and annotating it – and sometimes looking at it.  It was all centralized, curated, etc.

In the post-NGS era, sequencing has now been democratized.  We have access to big datasets, changing our focus to analysis, mapping, modeling. Our data resources are now many, varied, decentralized and non-curated.  A complete paradigm shift in bioinformatics focus. [my words.]

CLC has built a platform for doing this type of work.  Server based, client based, scalability, etc.  It supports the majority of genomics workflows (de novo, reseq, chip-seq, rna-seq, smrna-seq, tag-seq.)  We lack the tools for pushing further down into integrative biology.  Pathways, diagnosis, systems biology… but there ARE no tools for doing this.

Genome browsing is no longer enough. [Thank you, I’ve been waiting for someone to say this!]

Perl IS NOT A FRAMEWORK.  [Hallelujah!]

You will not get the medical community to use this technology with perl and spreadsheets.

It is time to move downstream: CLC genomics gateway.

  • an integrated framework to visualize, analyze and combine data from the same reference genome
  • seamlessly add own experiments in an integrated manner
  • foundation for systems biology, functional genomics, etc.

Technical details

  • One data object to store and/or point to data sources
  • lazy fetching
  • federation of data sources
  • a graphical user interface
  • completely integrated with other bioinformatics tools
  • part of the clc systems developer kit, so developers can add components and reuse existing ones without having to maintain the gui.
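To make "lazy fetching" and "federation of data sources" concrete, here is a toy sketch of the idea.  None of these class or function names come from CLC’s software; they are hypothetical, and the loaders stand in for remote or local sources.

```python
class LazyTrack:
    """Points at a data source and only loads it on first access."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader      # callable returning (chrom, start, end, label) tuples
        self._cache = None
        self.fetch_count = 0       # how many times the source was actually read

    def features(self):
        if self._cache is None:
            self.fetch_count += 1
            self._cache = list(self._loader())
        return self._cache


class FederatedGenome:
    """Queries several tracks that share one reference coordinate system."""

    def __init__(self, tracks):
        self.tracks = list(tracks)

    def query(self, chrom, start, end):
        """Return {track name: features overlapping the half-open region [start, end)}."""
        hits = {}
        for track in self.tracks:
            hits[track.name] = [
                f for f in track.features()
                if f[0] == chrom and f[1] < end and f[2] > start
            ]
        return hits


def load_gene_annotations():
    # Stand-in for a remote annotation source (made-up features).
    return [("chr1", 100, 500, "geneA"), ("chr2", 50, 90, "geneB")]


def load_local_snps():
    # Stand-in for the user's own flat file (made-up features).
    return [("chr1", 120, 121, "snp1"), ("chr1", 900, 901, "snp2")]


genome = FederatedGenome([
    LazyTrack("annotations", load_gene_annotations),
    LazyTrack("my_snps", load_local_snps),
])
print(genome.query("chr1", 100, 200))
```

Nothing is read until the first query, and repeated queries reuse the cached features, which is the point of deferring the fetch.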

Enables a wide variety of tools and integrates so you can work with other tools like blast, primer design, etc.

Visualization tools:  Bring in all of your tools into one place.  [looks clean, actually.  nice.]

Federation of data: can bring together sources like UCSC, Ensembl, your own data, flat files, etc.

Analysis types are in beta. Some tools for SNP comparison and annotation, filtering SNPs from federated data… basically, it brings together annotations from a variety of sources and puts it all together in one place.

More analysis types to be added, including cancer biology, further advanced features.

Beta tool is out now, with new versions coming out soon.

Also, they’re hiring bioinformaticians.



CPHx: Juan Medrano, UC Davis – Milking the genes: Genetic analysis of milk.

Milking the genes: Genetic analysis of milk.

Juan Medrano, UC Davis


What is the perfect food?  Milk.  Lactation: 120 million years of evolution.  The darwinian engine of nutrition.

Constituents of human milk.  Not a homogeneous mixture.  88% water, 12% solids.  Zivkovic A M et al. PNAS 2011, 108 p4653-4658.  Solids include 70g/L lactose, 40g/L lipids, 8g/L proteins, 5-15g/L Human Milk Oligosaccharides.  (Oligosaccharides vary between animals, etc.)

Overview of milk production.  Mammary tissue sample not possible in human, but you can extract cells from milk itself.

High correlation between somatic cells and expression in mammary gland. [Must have missed something, not sure where that came from.]

RNA-seq technology overview.  De novo, snp, indels, new transcripts.  Pathway analysis and functional annotation.

Look at overlap between mammary specific genes (570), mammary and milk(11.6k), milk specific (1k).  High correlation.

Milk transcriptome at different stages of lactation.  Used cows, sample in three major periods: start of lactation, peak lactation and end of lactation.  There is a skew in this.  10 genes represent 61% of reads at day 15, but only 11% on day 90.
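The skew figure ("10 genes represent 61% of reads") is just the share of all reads taken by the most highly expressed genes.  A small sketch, with invented gene names and counts:

```python
def top_n_read_fraction(read_counts, n=10):
    """Fraction of all reads assigned to the n most highly expressed genes."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    top = sorted(read_counts.values(), reverse=True)[:n]
    return sum(top) / total

# Made-up counts: two dominant genes plus many lowly expressed ones.
counts = {"gene_hi_1": 400, "gene_hi_2": 200}
counts.update({f"gene_lo_{i}": 10 for i in range(40)})
print(top_n_read_fraction(counts, n=2))  # 600 of 1000 reads -> 0.6
```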

Proteolytic enzymes in milk.  Changes dramatically, increasing over course of 3 samples taken.  Important in many processes, including cheese making.  [yes, we’re still talking about cows.]

Milk microbiome also determined for bovine and human milk.  There are beneficial bacteria in milk.  In human, approx. 60 different strains – about 20 uncultured.  In cow, about 2000 strains, 188 uncultured.

Target validation.  Pathway analysis also done.  SNP selection.  Marker trait association studies to identify regulators of the pathway in an experimental way.

One example of this is citrate in milk. It’s a buffer in milk, Ca and P balance.  Heat stability, etc.  74 SNPs in 20 genes used for an association study.  Citrate content in milk was measured in 350 Holstein and 200 Jersey cows.  [how often do I get to write stuff like that? (-:  Moooo.] Worked well to identify a gene of interest.



  • milk is interesting
  • RNA-workflow is powerful and insightful
  • Understanding expression patterns may help for future milk producers and users (bovine)
  • microbiome can have significant impact on our health
  • Pathway analysis can give us insight.


CPHx: Mike Hubank, UCL Genomics, Inst. of Child Health – Genome-wide Transcriptional Modeling

Genome-wide Transcriptional Modeling

Mike Hubank, UCL Genomics, Inst. of Child Health


Challenges in complex transcriptional analysis.

Interpretations beyond the “favorite genes/Top 10” style are less frequent.  Ask better questions:

  • What TF activities control the response?
  • which genes are targets of which TF?
  • How does TF activity affect the expression pattern?
  • How do TF activities interact to shape the response?

Data driven mathematical modeling offers solutions.

[ooo.. actual math!  Can’t copy it, tho.]

Goal is to assign values to the parameters, including correct rate of change of gene, concentration, degradation rate, activity of transcription factor, etc.

[more equations..  rearrangements of the one before]

Everything you can measure on one side, what you can’t on the other.  Discretisation by Lagrange interpolation.  Genome wide transcriptional modeling (GWTM) generates production profiles for every transcript.
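The equations themselves were not captured in these notes.  A form often used for this kind of model, and an assumption here rather than something copied from the slides, is dx/dt = B + S*f(t) - D*x, where x is the transcript level, B a basal production rate, S the sensitivity to the transcription factor activity f(t), and D the degradation rate.  A minimal simulation sketch under that assumption, with arbitrary parameter values:

```python
def simulate_transcript(f, basal, sensitivity, decay, x0, t_end, dt=0.01):
    """Euler integration of dx/dt = basal + sensitivity * f(t) - decay * x."""
    steps = int(round(t_end / dt))
    t, x = 0.0, x0
    trajectory = [(t, x)]
    for _ in range(steps):
        x += dt * (basal + sensitivity * f(t) - decay * x)
        t += dt
        trajectory.append((t, x))
    return trajectory

def constant_activity(t):
    """Hypothetical transcription factor activity: switched on and held at 1."""
    return 1.0

# Arbitrary illustrative parameters, not values from the talk.
traj = simulate_transcript(constant_activity, basal=1.0, sensitivity=2.0,
                           decay=0.5, x0=0.0, t_end=40.0)
final_time, final_level = traj[-1]
print(final_level)  # approaches the steady state (basal + sensitivity) / decay = 6
```

Rearranged as dx/dt + D*x = B + S*f(t), the left side is built from the measured expression profile and the right side holds the hidden activity, which is roughly the "measurable on one side" split described above.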

Example:  DNA damage response model, Radiating T-cell cell-line.  Model based screening of activity profiles.  Identifies jointly regulated transcripts, groups them, with probabilities.

Validation done.  Knock out one gene, see if predictions make sense in taking out other genes as well.  “obviously they went quite well, otherwise I wouldn’t be here talking to you.”

GWTM explains a high proportion of the response.  68% of upregulated genes correctly assigned.  Unexplained transcripts result of co-regulation or measurement error.  Aim to eradicate error with Sequencing based measurement.

There is a problem with NGS data, however, when it gives “zeros”.  When you have a read count of zero, the gene may not actually be completely off, so it’s hard to model around that.  New correction tool – Dirac-sigma truncated log normal. [did I get that right? I’ve never heard of it.]

Conclusions: Dynamic transcriptional modelling can be used for:

  • Data driven deconstruction of complex responses
  • Ability to conduct complex “experiments” in silico (eg, vary TF activity or parameter models to make predictions.)
  • Generation of biologically meaningful parameter values (eg, correlation with apoptosis regulators.)
  • Linkage of functional modules
  • (Benefit of) Economy

Much of this is already available in R, the rest of it will be shortly.

CPHx: John Quackenbush, Dana-Farber Cancer Inst. & Harvard – Network & State Space Models

Network & State Space Models: Science and Science Fiction Approaches to Cell Fate Predictions

John Quackenbush, Dana-Farber Cancer Inst. & Harvard


Challenge the way  you think about biological systems.

“Science is built with facts as a house is with stones,  but a collection of facts is no more a science than a heap of stones is a house” – Jules Henri Poincare

What is a model?

“The purpose of the models is not to fit the data, but to sharpen the questions.”

The question in biology – Is the mean large, given the variance?

Example, determining gender by height.  There is a correlation, but the variance is huge.

We would like small variance compared to the difference in mean.

An alternative:  Is the difference in variance large independent of the mean?
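The two questions can be put side by side numerically: a difference in means scaled by the variance (here Welch’s t statistic, chosen for illustration and not named in the talk) against a plain ratio of variances.  The measurements are invented.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Difference in means scaled by the pooled standard error (Welch's t statistic)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def variance_ratio(a, b):
    """Ratio of sample variances; values far from 1 mean the spread differs."""
    return variance(a) / variance(b)

# Made-up measurements: same mean, very different spread.
group_tight = [9, 10, 11, 10, 10]
group_loose = [5, 10, 15, 8, 12]

print(welch_t(group_tight, group_loose))         # 0.0: the means do not differ
print(variance_ratio(group_loose, group_tight))  # 29.0: the variances differ a lot
```

A mean-based test sees nothing in this toy data, while the variance ratio picks up the difference, which is the kind of signal the later part of the talk is about.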

Modeling cell fate transitions.  How does one cell morph into another cell type based on stimulus. Also want to identify pathways that underlie various cell types. All of this comes from building models.

Referee #3 always contests the use of the word model on all his papers.

Phenomenology tries to look at the past.  Ultimately we look to develop a theory that describes the interactions that drive biological systems.  Build an approximate model that describes a body of knowledge that relates empirical observations of phenomena to each other, consistent with fundamental theory, but not derived from theory.

A journey through Variation.  Jess Mar’s PhD work.

Cells converge to attractive states.  Stuart Kauffman presented the idea of a gene expression landscape with attractors.  Great illustration of gene networks on a landscape.. distinct patterns of gene expression.  States are attractors, and pathways tend to self organize towards them.

There are only 250 stable cell types and each of them represent attractors.

Can we push cells from one state to another based on external stimulus?

An example of Promyelocytes (HL-60) transforming into another cell type.  Arrays done to profile the states of the gene expression between the two end points over several days.

Cells Display Divergent Trajectories That Eventually Converge as they Differentiate.  What accounts for the divergence?

There are multiple processes that are occurring during this observed change.  What you see is actually the sum of all of the different processes.  You can, in fact, divide the genes into different groups: transients and core changing genes.  Transients tend to be related to external stimuli.

Waddington’s hypothesis.  A developmental biologist, with publications of attractor states, etc.

Waddington’s model calls for creation of “canalization” of the landscape, in which you move from start to end in paths.

The paths, however, don’t have to be straight.  You can get paths that wander up the walls of the canals.  Individual cells can follow random courses down that path…  thus when you look at the population, you see the canal, but if you don’t, you’d see a high amount of variation.

Had to come up with a method or pathways that characterize various cell types.  What are the signatures?  “Attract” soon to be published.  Finds core pathways that underlie cell fate transition.  Pull out pathways from KEGG – then built new method of gene set phenotyping.  Ranking pathways based on cell type informativeness.

Need to look at separate expression groups.  Some profiles are common across various states, so you need to deconstruct the pathway profiles to make sense.  This can then be used to define an “informativeness” metric, which in turn can be used for identification of core pathways that identify states.

A variational approach to expression analysis.

A stem cell model for neurological disease, based on olfactory cells.  Nasal biopsies, culture pluripotent stem cells, then allow the stem cells to differentiate.  9 healthy, 9 schizophrenia, 13 Parkinson’s.

What are the pathways that characterize the differentiation of the stem cells?

A bunch of pathways were identified that stood out with significant p-values. One can then ask if anything stood out between the control and the neurological disease patients.  There was no real difference in average pathways… but there were significant differences in their variance!

How important is the difference in variance in defining phenotype?

When overlaid, you can observe skews in the data for pathways.  If the change in variance is important, you should see an even greater skew in the pathways that are key in defining the phenotype.

Indeed, when looking at key pathways, the skew becomes more apparent.  Top 5 pathways show the same skew each time.  There is a robust difference in the profiles, then.

You can also observe the same type of phenomena when using 5% top/5% bottom cutoffs.

High variance genes are cell surface genes and nucleus, low variance tend to be kinases, signalling. etc.

Variance constraints alter network topology.  This suggests schizophrenia and Parkinson’s are at opposite ends of a spectrum of neural disease.  (Referring to variance being high in one, and low in the other.)

Now, trying to understand the mechanisms underlying this variance.

Path integral formulations of quantum mechanics… Newtonian objects follow one path; subatomic particles follow EVERY path.  You must consider cells in the same way: they follow many paths that converge to the average path.

[Ok, I really like this analogy.]

Where are we going?

  • Biology is really driving this
  • integrated data types must be considered intelligently
  • We may be in a position to start developing functional biology models. [My words.. it was expressed more clearly by the speaker.]

Genomics is here to stay.  Even bus drivers have DNA kits to help identify people who spit on them. (-:


CPHx: Peter Jabbour, Sponsored by BlueSEQ – An exchange for next-generation sequencing

An exchange for next-generation sequencing

Peter Jabbour, Sponsored by BlueSEQ


A very new company, just went live last month.

What is an exchange?  A platform that brings together buyers and sellers within a market.  A web portal that helps place researchers, clinicians individuals, etc. with providers of next-gen sequencing services.

[web portals?  This seems very 1990s… time warp!]

Why do users need an exchange?  Users have limited access, need better access to technology, platform, application, etc.

Why do providers need an exchange?  Providers may want to fill their queues.

[This is one stop shopping for next-gen sequencing providers?  How do you make money doing this?]

BlueSEQ platform: 3 parts.

  1. Knowledge Bank:  Comprehensive collection of continuously updated Next Generation Sequencing information, opinions, evaluations, tech benchmarks.
  2. Project Design: Standardized project parameters.  eg, de novo, etc. [How do you standardize the bioinformatics?  Seems… naive.]
  3. Sequencing exchange:  Providers get a list of projects that they can bid on.

[wow… not buying this. Keeps referring back to the model with airline tickets.]

Statistics will come out of the exchange – cost of sequencing, etc.

No cost to users.  Exchange fees for providers. [again, why would providers want to opt in to this?] 100 users have already signed up.

Future directions:  Specialized project design tools, quoting tools, project management tools, comparison tools, customer reviews.

There are extensive tools for giving feedback, and rating other users’ feedback.

[Sorry for my snarky comments throughout.  This just really doesn’t seem like a well thought out business plan.  I see TONs of reasons why this shouldn’t work… and really not seeing any why it should.  Why would any provider want customer reviews of NGS data… the sample prep is a huge part of the quality, and if they don’t control it, it’s just going to be disaster.  I also don’t really see the value added component.  Good luck to the business, tho!]