>Link to all my AGBT 2010 Notes.

>One last blog for today.

If you’re looking for the complete list of my AGBT 2010 notes, look no further. The link below has the full list of talks and workshops I attended. I haven’t indexed it, but searching for “AGBT 2010” within the page will take you from one header to the next; the notes appear in reverse chronological order. Cheers!

AGBT 2010 notes.

>AGBT wrap up.

>So, everyone else has weighed in with their reviews of AGBT 2010 already, and as usual, I’m probably one of the last to write anything down. Perhaps the extreme carpal tunnel syndrome I’ve exposed myself to by typing out my notes should suffice as an excuse…

Anyhow, I wanted to put down a few thoughts on what I saw, heard and discussed before I forget what I wanted to say.

First off, I know everyone has commented on the new technologies already. I’m very disappointed that I wasn’t able to see the Ion Torrent presentation, and that I missed the presentation from Life Technologies. Those were two of the biggest hits, and I didn’t see either of them. While I did get a quick introduction on the Life Technologies platform from a rep in the Life Tech suite, it’s not quite the same.

However, I was there for several of the other workshops and launches, and in particular, the Pacific Biosciences workshop. In general, I think Pac Bio has been served up a lot of criticism for failing to disclose the exact error rate of their Single Molecule Real-Time (SMRT) sequencing platform, as well as for some of the problems they face. Personally, I’m not inclined to think of any of that as failure – simply as engineering problems. Having worked on early 454 data, I saw flaws that were every bit as disastrous as the challenges Pac Bio now faces. Much of the criticism is simply directed at the fact that this is measuring single molecules of DNA, and not clusters. Clearly, there will be challenges for them to overcome. The most obvious are that PacBio will have to lower the wattage of their light source, and they’ll likely have to do some directed evolution (or even rational design) to lower the frequency at which bases are incorporated too quickly to be read – or possibly come up with a chemistry solution. (More viscous solutions? Who knows.) All of the 2nd generation platforms were launched with problems, and Pac Bio certainly isn’t the exception. Each one gets better over time, and I’m certain PacBio will continue to improve. For the moment, they’ve suggested protocols like sequencing circular DNA that dramatically reduce the error rate, so these issues aren’t nearly as big as the hype makes them out to be.
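The circular-DNA trick is easy to see with a toy model: if each pass over the same molecule misreads a base independently, a majority vote across passes drives the residual error down fast. Here’s a minimal simulation sketch – the error rate, pass count, and simple majority-vote rule are illustrative assumptions of mine, not PacBio’s actual consensus algorithm:

```python
import random

def consensus_error_rate(per_pass_error=0.15, passes=5, trials=100_000, seed=42):
    """Estimate residual error after majority-vote consensus when the same
    base is read `passes` times, each pass with an independent
    `per_pass_error` chance of a misread (toy model only)."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        errors = sum(rng.random() < per_pass_error for _ in range(passes))
        if errors * 2 > passes:  # a majority of passes misread this base
            wrong += 1
    return wrong / trials

# With a ~15% raw error rate, five consensus passes drop the per-base
# error by roughly an order of magnitude in this toy model.
```

This overstates the residual error slightly (it assumes all misreads agree on the same wrong base), so the real consensus gain would be even larger.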

Just to finish off on the subject of SMRT sequencing, I thought the results Elaine Mardis presented from PacBio weren’t outstanding. Normally, I get really jealous about PacBio results, and wish I could get my hands on some of them – but this time, I was left a little flat. While there are really neat applications for single molecule sequencing, human SNPs really aren’t one of them. Why they chose to present that particular problem is somewhat beyond me. Not that the presentation was bad, but it failed to really showcase what the platform can be used for, IMHO. Their other presentations (SMRT Biology, for example) were pretty damn cool.

There has also been much talk about Complete Genomics, and how they’re not going to make it, which I’ve already written up in the previous post. I see that as a failure to understand their business model and who they’re competing with (i.e., not the other sequencing companies). I expect that they’ll be the microarrays of the future – cheap diagnostic tools, with even better repeatability than your average microarray. I don’t think they should be written off just yet.

Finally, there has been much ado about the HiSeq 2000(tm), released by Illumina. While I have nothing against it (and am even looking forward to it), I don’t see it as much more than an upgraded version of their last machine, the GAIIx. They’ve changed the form factor and the shape of the flow cell, and enabled some things that were previously disabled (such as two-sided tile scanning), but it’s really just an evolutionary change in a new box, which will give them more room to grow the platform. Fair enough, really – I don’t know how many more upgrades you could put into one of their original boxes, but there’s nothing really new here that would have me running after them to get one. I should mention, however, that increased throughput and lower cost ARE significant and a good thing – they just don’t appeal to my geeky fascination for new technology.

Another criticism I heard was that these companies shouldn’t be calling their tech “3rd generation.” Frankly, I’ve been advocating since last year that they SHOULD be called 3rd generation, so that criticism seems silly, to say the least. Pyrosequencing is clearly synonymous with the 2nd generation of sequencing technologies, while Sanger sequencing is clearly first generation, and hybridization is kind of zero-th generation (although you could make a case for SOLiD being 2nd generation, which would then drag Complete Genomics into that group as well). However, the defining characteristic of 3rd generation, to me, is the move away from sequencing ensembles of molecules. An auxiliary definition is that it’s also the application of enzymes to do the sequencing itself. So, I’m just going to have to laugh at those who claim that 2nd and 3rd generation are all generically “next-generation” sequencing. There is a clear boundary between the two sets of technologies.

A topic I also wanted to mention was the use of technology at AGBT this year. Frankly, I was blown away by the coverage of all of the events through twitter. I enjoyed at least one talk where I left twitter open beside my text editor, and tried to keep notes while listening to what the speaker had to say, while watching the audience’s comments. If I hadn’t been blogging, I think that would be the best way to engage. Insightful comments and questions were plentiful, and having people I respect discuss the topic was akin to having other scientists leave comments in the margins of a paper you’re reading. [Somewhat like reading Sun Tzu’s Art of War, where there are more annotations than original material, at some points.] Alas, it was too distracting to compile notes while reading comments, but it was really cool. Unfortunately, Internet coverage was spotty at best, and in some rooms, I wasn’t able to get any signal at all. The venue is great, but just not equipped for the 21st century scientist. Had I been there at the end of the conference, I would have suggested that perhaps it’s time to identify an alternate venue that can handle the larger crowds, as well as the technological demands of an audience that has 300+ laptop computers going at once. (Don’t get me started on electrical outlets.)

I’d like to end on a few good points.

The poster session was excellent – too short, as always, but the quality of the posters was outstanding, and I had fantastic conversations with a lot of scientists. I won’t mention them by name, but I’m sure they know who they are. I saw several tools I’ll try to follow up on. (By the way, if anyone was looking for me, I spent less than 20 minutes by my poster throughout the conference. There just wasn’t enough time to read all of them and still answer questions and absorb everything out there. Sorry about that – feel free to email me if you have questions.)

I should also mention that the vendors were all very hospitable. One of my enduring memories of this year will be Life Technologies allowing the Canadians to crash their suite and use one of their demo TVs to watch the semi-final Olympic hockey game. (Canada vs. Slovakia.) We were desperately outnumbered by non-Canadians, but they tolerated our screaming pretty well. (A few of them even seemed curious about this weird sport played on ice…) And, of course, anyone who saw my tweets knows about PacBio and the Hawaiian shirt, just to name a few examples (-;

So, again, I think AGBT was a great success and I enjoyed it tremendously. Rarely in my life do I get to pack so many talks, discussions and networking into such a short period of time. It may have left me looking somewhat like a deer caught in the headlights, but unquestionably I’m already looking forward to what will be revealed next year.

>Complete Genomics, Revisited (Feb 2010)

>While I’m writing up my notes on my way back to Vancouver, I thought I’d include one more set of notes – the ones I took while talking to the Complete Genomics team.

Before launching into my notes (which won’t really be in note form), I should give the backstory on how this came to be. Normally, I don’t do interviews, and I was very hesitant about doing one this time. In fact, the format came out more like a chat, so I don’t mind discussing it – with Complete Genomics’ permission.

Going back about a week or so, I received an email from someone working on PR for Complete Genomics, inviting me to come talk with them at AGBT. They were aware of my blog post from last year, written after discussing some aspects of their company with several members of the Complete Genomics team.

I suppose in the world of marketing, any publicity is good publicity, and perhaps they were looking for an update for the blog entry. Either way, I was excited to have an opportunity to speak with them again, and I’m definitely happy to write what I learned. I won’t have much to contribute beyond what they’ve discussed elsewhere, but hey, not everything has to be new, right?

In the world of sequencing, who is Complete Genomics? They’re clearly not 2nd generation technology. Frankly, their technology is the dinosaur in the room. While everyone else is working on single molecule sequencing, Complete Genomics is using technology from the stone age of sequencing – and making it work.

Their technology doesn’t have any bells and whistles – and in fact, the first time I saw their ideas, I was fairly convinced that it wouldn’t be able to compete in the world of the Illuminas and Pac Bios… and all the rest. Actually, I think I was right. What I didn’t know at the time was that they don’t need to compete. They’re clearly in their own niche – and they have the potential to become the 300 pound gorilla.

While they’re never going to be the most nimble or agile technology developers, they do have a shot at dominating the market they’ve picked: low-cost, standardized genomics. As long as they stick with this plan – and manage to keep their costs lower than everyone else’s – they’ve got a shot… Only time will tell.

A lot of my conversation with Complete Genomics revolved around the status of their technology – what it is that they’re offering to their customers. That’s old hat, though. You can look through their web page and get all of the information – you’ll probably even get more up to date information – so go check it out.

What is important is that their company is based on well developed technology. Nothing that they’re doing is bleeding edge, nothing is going to be a surprise show stopper: of all of the companies doing genomics, they’re the only one that can accurately chart the path ahead with clear vision. Pac bio may never solve their missing base problem, Illumina may never get their reads past 100bp, Life Tech may never solve their dark base problem, and Ion Torrent may never have a viable product. You never know… but Complete Genomics is the least likely to hit a snag in their plans.

That’s really the key to their future fate – there are no bottlenecks to scaling up their technology. We’ll all watch as they shrink the distance between the spots on their chips, lower the amount of reagent required, and continue to automate their technology. It’s not rocket science – it’s just engineering. Each time they drop the scale of their technology down, they also drop the cost of the genome. That’s clearly the point – low cost.

The other interesting thing about their company is that they’ve really put an emphasis on automation and value-added services. Their process is one of the more hands-off processes out there. It’s an intriguing concept. You FedEx the DNA to them, and you get back a report. Done.

Of course, I have to say that while this may be their strength, it’s probably also one of their weaknesses. As a scientist, I don’t know that the bioinformatics of the field are well enough developed yet that I trust someone to do everything from alignment to analysis on a sample for me. I’ve seen aligners come and go so many times in the last 3 years that I really believe that there is value in having the latest modifications.
What you’re getting from Complete Genomics is a snapshot of where their technology is at the moment you (figuratively) click the “go” button. Researchers like to play with their data, revisit it, optimize it and squeeze every last drop out of it – something that is not going to be easy with a Complete Genomics dataset. (They aren’t sharing their tools.) However, as I said earlier, they’re not in the business of competing with the other sequencing companies – so really, they may be able to sidestep this weakness entirely by just not targeting those people who feel this way about genomic data.

And that also brings me to their second weakness – they are fixated on doing one thing, and doing it well. That’s often the sign of a good start-up company: a dogged pursuit of a single goal of excellence in one endeavour. However, in this one case, I disagree with Dr. Drmanac. Providing complete genomes is only part of the picture. In the long run, genomic information will have to be placed in the context of epigenetics, and so I wonder if this is an avenue they’ll be forced to travel in the future. For the moment, Dr. Drmanac insists that this is not something they’ll do. But if they haven’t put any thought into it by the time it becomes necessary, customers will be driven towards a company that can provide that information. Not all research questions can be solved by gazing into genomic sequences, and that’s a reality that could bite them hard.

For the moment, at least, Complete Genomics is well positioned to work with researchers who don’t want to do the lab and bioinformatics tweaking themselves. You can’t ask a microbiology lab to give up their PCR machine, and sequencing centres will never drop the 2nd (and now 3rd) generation technology lab to jump on board the 1st generation sequencing provided by Complete Genomics. Despite the few centres that have ordered a few genomes (wow… I just can’t believe I said “a few genomes”), I don’t see any of them committing to it in the long run for all of the reasons I’ve pointed out above.

However, Complete Genomics could take over genomic testing for pharma or hospital diagnostics. Whoever is best able to identify variations (structural or otherwise) in genomes for the lowest cost will be the best bet to do cohort studies for patient stratification studies – and hey, maybe they’ll be the back end for the next 23andMe.

So, to conclude, Complete Genomics has impressed me with their business model, and they have come to know themselves well. I’ll never understand why they think AGBT is the right conference to showcase their company, when it’s not likely to yield that many customers in the long run. But, I’m glad I’ve had the chance to watch them grow. Although they may be a dinosaur in the technology race, the T-Rex is still a fearsome beast, and I’d hate to meet one in a dark alley.

>AGBT 2010 – Illumina Workshop

>[I took these notes on a scrap of paper, when my laptop was starting to run low on batteries. They’re less complete than most of the other talks I’ve taken notes on, but should still give the gist of the talks. Besides, now that I’m at the airport, it’s nice to be able to lose a few pieces of scrap paper.]

Introducing the HiSeq 2000(tm)
– redefining the trajectory of sequencing

First presentation:
– Jared from Marketing

Overview of machine.
– real data of Genome and transcriptome
– more than 2 billion base pairs per run
– more than 25Gb per day
– uses line scanning (scan in rows, like a photocopier, instead of a whole picture at once, like a camera)
– now uses “dual surface engineering”: image both the top and bottom surface, which means you have twice as much area to form clusters
– Machine holds two individual flow cells
– flow cells are held in by a vacuum
– simple insertion – just toggle a switch through three positions – an LED lights up when you’ve turned it on.
– preconfigured reagents – bottles all stacked together: just push in the rack
– touch screen user interface
– “wizard” like set up for runs
– realtime metrics available on interface – even an iPod app (available for iPad too…)
– multimedia help will walk you through things you may not understand.
– major focus on ease of use
– it has the “simplest workflow” of any of the sequencing machines available
– tile size reduced [that’s what I wrote but I seem to recall him saying that the number of tiles is smaller, but the tiles themselves are larger?]
– 1 run can now do a 30x coverage for a cancer and a normal (one in each flow cell.)
– 2 methylomes can be done in a week
– you could do 20 RNA-Seq experiments in 4 days.
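The headline coverage claim is easy to sanity-check with back-of-envelope arithmetic – the genome size below is the usual ~3.1 Gb approximation, and the yield is the 300Gb/run figure quoted later in the session, so treat this as a rough check rather than a spec:

```python
GENOME_SIZE_GB = 3.1   # approximate haploid human genome, in gigabases
RUN_YIELD_GB = 300     # quoted HiSeq 2000 yield per run (both flow cells)

# Split one run evenly between a tumour and a normal sample,
# one sample per flow cell, as described in the talk.
per_sample_gb = RUN_YIELD_GB / 2
fold_coverage = per_sample_gb / GENOME_SIZE_GB
# ~48x of raw yield per sample, comfortably above the quoted 30x
# once mapping and duplicate losses are accounted for.
```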

Next up:
David Bentley

Major points:
– error rates and feel of data are similar if not identical to the GAIIx.
– from a small sampling of experiments shown it looks like error rate is very slightly higher
– Demonstrated 300Gb/run, more than 25Gb per day at release
– PET 2×100 supported.
– Software is same for GAII [Although somewhere in the presentation, I heard that they are working on a new version of the pipeline (v 1.6?)… no details on it, tho.]

Next up:
Elliott Margulies, NHGRI/NIH Sequencing
– talking about projects today for the undiagnosed disease program

work flow
– basically same as in his earlier talk [notes are already posted.]
– use cross match to do realignment of reads that don’t map first time
– use MPG scores

[In a technology talk, I didn’t want to take notes on the experiment itself… main points are on the HiSeq data.]

Data set: concordance with SNP Chips was in the range of 98% for each flow cell, 99% when both are combined (72x coverage)

– Speed: Increased throughput
– more focus on biology rather than on tweaking pipelines and bioinformatic processing. (eg, biological analysis takes front seat.)

Next Up:
Gary Schroth

Working on a project for Body Map 2.0 : Total human transcriptome
– 16 tissues, each PET 2x50bp, 1x75bp
– $8,900 for 1x50bp
– multiplexing will reduce cost further.
– if you only need 7M reads, you could multiplex 192 samples (on both flow cells, I assume), and the cost would be $46. (including sequencing, not sample prep.)

[which just makes the whole cost equation that much more vague in my mind… Wouldn’t it be nice to know how much it costs to do the whole process?]
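For what it’s worth, the quoted $46 does fall out of simple division, if you assume the run price is split evenly across the multiplexed samples (sample prep excluded, as noted – the 96-barcodes-per-flow-cell breakdown is my assumption):

```python
run_cost = 8_900   # quoted price for one 1x50bp run (sequencing only)
samples = 192      # assumed: 96 barcoded samples per flow cell x 2 flow cells

cost_per_sample = run_cost / samples
# ~$46 per sample, matching the figure quoted in the talk
```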

[Many examples of how RNA-seq looks on HiSeq 2000 ™]

– output has 5 billion reads, 300Gb of data.

Next up:
David Bentley

Present a graph
– amount of sequence per run.
– looks like a “hockey stick graph”

[Shouldn’t it be sequence per machine per day? It’d still look good – and wouldn’t totally shortchange the work done on the human genome project. This is really a bad graph…. at least put it on a log scale.]

In the past 5 years:
– 10^4 scale in throughput
– 10^7 scale up in parallelizations

Buzzwords about the future of the technology:
– “Democratizing sequencing”
– “putting it to work”

>AGBT 2010 – Complete Genomics Workshop

>Complete Genomics CEO:

– sequence only human genomes – 1 Million genomes in the next 5 years
– build out tools to gain a good understanding of the human genome
– done 50 genomes last year
– Recent Science publication
– expect to do 500 genomes/month

Lots of Customers.
– Deep projects

– don’t waste pixels,
– use ligases to read
– very high quality reads – low cost reagents
– provide all bioinformatics to customers

– don’t sell technology, just results.
– just return all the processed calls (SNPs, SNVs, SVs, etc.)
– more efficient to outsource the “engineering” for groups who just want to do biology
– fedex sample, get back results.
– high throughput “on demand” sequencing
– 10 centres around the world
– Sequence 1 Million genomes to “break the back” of the research problem

Value add
– they do the bioinformatics

– first wave: understand functional genomics
– second wave: pharmaceutical – patient stratification
– third wave: personal genomics – use that for treatment

Focus on research community

Two customers to present results:
First Customer:

Jared Roach, Senior Research Scientist, Institute for Systems Biology (Rare Genetic Disease Study)

Miller Syndrome
– studied coverage in four genomes
– 85-92% of genome
– 96% coverage in at least one individual
– Excellent coverage in unique regions.

Breakpoint resolution
– within 25bp, and some places down to 10bp
– identified 125 breakpoints
– 90/125 occur at hotspots
– can reconstruct breakpoints in the family

Since they have twins, they can do some nice tests
– infer error rate: 1×10^-5
– excluded regions with compression blocks (error goes up to 1.1×10^-5)
– Homozygous only: 8.0×10^-6 (greater than 90% of genome)
– Heterozygous only: 1.7×10^-4
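The twin trick works because monozygotic twins share a genome, so (rare somatic differences aside) every disagreement between their call sets reflects a sequencing or calling error in one of the two genomes. A hypothetical sketch of that comparison – the dict-of-calls representation and example data are mine, not ISB’s:

```python
def discordance_rate(calls_a, calls_b):
    """Fraction of positions callable in both twins where the calls
    disagree. For identical twins, this bounds the combined
    sequencing/calling error rate of the platform."""
    compared = disagreements = 0
    for pos, base_a in calls_a.items():
        base_b = calls_b.get(pos)
        if base_b is None:
            continue  # position not callable in both genomes
        compared += 1
        if base_a != base_b:
            disagreements += 1
    return disagreements / compared if compared else 0.0

# Illustrative mini call sets: one discordant position out of five.
twin1 = {1: "A", 2: "C", 3: "G", 4: "T", 5: "A"}
twin2 = {1: "A", 2: "C", 3: "G", 4: "C", 5: "A"}
```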

[Discussion of genes found – no names, so there’s no point in taking notes. They claim they get results that make sense.]

[Time’s up – on to next speaker.]

Second Customer:
Zemin Zhang, Senior Scientist, Genentech/Roche (Lung Cancer Study)

Cancer and Mutations
[Skipping overview of what cancer is…. I think that’s been well covered elsewhere.]

– lung cancer is the leading cause of cancer related mortality worldwide…
– significant unmet need for treatment

Start with one patient
– non small cell lung adenocarcinoma.
– 25 cigarettes/day
– tumour: 95% cancer cells

Genomic characterization on Affy and Agilent arrays
– lots of CNV and LOH
– circos diagrams!

– 131Gb mapped sequence in normal, 171Gb mapped seq in tumour
– 46x coverage normal, 60x tumour
[Skipping some info on coverage…]

KRAS G12C mutation

what about rest of 2.7M SNVs?
– SomaticScore predicts SNV validation rates
– 67% are somatic by prediction
– more than 50,000 somatic SNV are projected

Selection and bias observed in the lung cancer genome by comparing somatic and germline mutations

GC to TA changes: Tobacco-associated DNA damage signature

Protection against mutations in coding and promoter regions.
– look at coding regions only – mutations are dramatically less than expected – there is probably strong selection pressure and/or repair

Fewer mutations in expressed genes.
– expressed genes have fewer mutations even lower in transcribed strand
– non-expressed genes have mutation rate similar to non-genic regions

Positive selection in subsets of genes
– KRAS is the only previously known mutation
– Genes also mutated in other lung cancers…
– etc

Finding structural variation by paired end reads
– median dist between pairs 300bp.
– distance almost never goes beyond 1kb.

Look for clusters of sequence reads where one arm is on a different chromosome or more than 1kb away
– small number of reads
– 23 inter-chr
– 56 intra-chr
– use FISH + PCR
– validate results
– 43/65 test cases are found to be somatic and have nucleotide level breakpoint junctions
– chr 4 to 9 translocation
– 50% of cells showed this fusion (FISH)
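The discordant-pair filter described above is straightforward to sketch: keep read pairs whose mates land on different chromosomes, or on the same chromosome but farther apart than the library insert size allows. A toy version, assuming a simple tuple format for mapped pairs – this ignores read orientation and the clustering step, so it’s illustrative only:

```python
def discordant_pairs(read_pairs, max_insert=1000):
    """Split mapped read pairs into inter-chromosomal and long-range
    intra-chromosomal candidates. Each pair is (chrom1, pos1, chrom2, pos2).
    The 1kb cutoff follows the talk: inserts almost never exceed 1kb."""
    inter, intra = [], []
    for chrom1, pos1, chrom2, pos2 in read_pairs:
        if chrom1 != chrom2:
            inter.append((chrom1, pos1, chrom2, pos2))
        elif abs(pos2 - pos1) > max_insert:
            intra.append((chrom1, pos1, chrom2, pos2))
    return inter, intra

# Illustrative pairs: a translocation candidate, a normal ~300bp insert,
# and a long-range intra-chromosomal candidate.
pairs = [
    ("chr4", 100_000, "chr9", 5_000_000),
    ("chr15", 1_000, "chr15", 1_300),
    ("chr15", 1_000, "chr15", 90_000),
]
```

In practice, candidates would only be called where several independent pairs cluster at the same junction, which is why the read counts above (23 inter-chr, 56 intra-chr) are so small.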

Possible scenario of Chr15 inversion and deletion investigated.
[got distracted, missed point.. oops.]

Genomic landscape:
– very nice Circos diagram
– > 1 mutation for every 3 cigarettes

In the process of doing more work with Complete Genomics

>AGBT 2010 – Yardena Samuels – NHGRI

>Mutational Analysis of the Melanoma Genome

Histological progression of Melanocyte Transformation
– too much detail to copy down

– mutational analysis of signal transduction gene families in genome
– evaluate most highly mutated gene family members
– translational

Somatic mutation analysis.
– matched tumor normal
– make cell lines

Tumor Bank establishment
– 100 tumor normal samples
– also have original OCT blocks
– have clinical information
– do SNP detection for matching normal/tumor
– 75% of cells are cancer
– look for highly mutated oncogenes

Start looking for somatic mutations
– looking at TK family (kinome)
– known to be frequently mutated in cancer

Sanger did this in the past, but only did 6 melanomas
– two phases: discovery, validation
– started with 29 samples – all kinase domains
– looked for somatic mutations
– move on to sequence all domains…

– 99 NS mutations
– 19 genes

[She’s talking fast, and running through the slides fast! I can’t keep up no matter how fast I type.]

Somatic mutations in ERBB4 – 19% in total
– one alteration was known in lung cancer

[Pathway diagram – running through the members VERY quickly] (Hynes and Lane, Nature Reviews)

Which mutation to investigate? Able to use crystal structure to identify location of mutations. Select for the ones that were previously found in EGFR1 and (something else?)

Picked 7 mutations, cloned and over-expressed – basic biochemistry followed.

[Insert westerns here – Prickett et al., Nature Genetics 41, 2009]

ERBB4 mutations have increased basal activity – also seen in melanoma cells

Mutant ERBB4 promotes NIH3T3 Transformation

Expression of Mutant ERBB4 Provides an Essential cell Survival Signal in Melanoma
– oncogene addiction

Is this a good target in the clinic?
– used lapatinib.
– showed that it also works here in melanoma. Mutant ERBB4 sensitizes cells to lapatinib
– mechanism is apoptosis
– it does not kill 100% of cells – may be necessary to combine it with other drugs.

– ERBB4 is mutated in 19% of melanomas
– reiterate points
– new oncogene in melanoma
– can use lapatinib
[only got 4 of the 8 or 9]

Future studies
– maybe use in clinics – trying a clinical trial.
– will isolate tumor DNA w/ ICM
… test several hypotheses.
– sensitivity to lapatinib

What else should be sequenced? not taking into account whole genome sequencing.
– look at crosstalk to get good targets
– List of targets. (mainly transduction genes)

Want to look at other cancers, where whole exome was done.
– revealed: few gene alterations in majority of cancers. Limited number of signalling pathways. Pathway-oriented models will work better than gene-oriented models

[ chart that looks like london subway system… have no idea what it was.]

Personalized Medicine
– their next goal.

[great talk – way too fast, and is cool, but no NGS tie in. Seems odd that she’s picking targets this way – WGSS would make sense, and narrow things down faster.]

>AGBT 2010 – Joseph Puglisi – Stanford University School of Medicine

>The Molecular Choreography of Translation

The questions have remained the same, despite recent advances – we still want to understand how the molecular machines work. We always have snapshots, which fail to capture the element of motion; we want animation, not snapshots.

– Converting nucleotides to amino acids.
– ribosome 1-20 aa/s
– 1/10^4 errors
– very complex process (tons of proteins factors, etc, required for the process)
– requires micro-molar concentrations of each component

– we now know the structure of the ribosome
– nobel prize given for it.
– 2 subunits. (50S & 30S)
– 3 sites, E, P & A
– image of 3 tRNAs bound to a ribosome – in the 3 sites…
– all our shots are static – not animated
– The Ribosome selects tRNA for Catalysis – must be correct, and incorrect must be rapidly rejected
– EFTu involved in rejection

[Walking us through how ribosomes work – there are better sources for this on the web, so I’m not going to copy it.]

Basic questions:
– timing of factors
– initiation pathway
– origins of translational fidelity
– mechanisms

Look at it as a high dynamic process
– flux of tRNAs
– movements of the ribosome (internal and external)
– much slower than photosynthesis, so easier to observe.

Can we track this process in real time?
– Try: Label the ligand involved in translation.
– Problem: solution averaging destroys signal (many copies of ribosome get out of sync FAST.) would require single molecule monitoring
– Solution: immobilization of single molecule – also allows us to watch for a long time

Single molecule real time translation
– Functional fluorescent labeling of tRNAs ribosomes and factors
– surface immobilization retains function.
– observation of translation at micromolar conc. fluorescent components
– instrumentation required to resolve multiple colors
– yes, it does work.
– you can tether with biotin-streptavidin, instead of fixing to surface
– immobilization does not modify kinetics

Tried this before talking to Pac Bio – It was a disaster. Worst experiments they’d ever tried.

– use PacBio ZMWs to do this experiment.
– has multiple colour resolution required
– 10ms time resolution

Can you put a 20nm ribosome into a 120nm hole? Use biotin tethering – Yes

Can consecutive tRNA binding be observed in real time? Yes

Fluorescence doesn’t leave right away… the signals overlap because the labeled tRNA must transit through the ribosome.
– at low nanomolar concentrations, you can see the signals move through individual ribosomes
– works at higher conc.
– if you leave EF-G out, you get binding, but no transit – then photobleaching.
– demonstrate Lys-tRNA
– three labeled dyes (M, F, K)… you can see it work.
– timing isn’t always the same (pulse length)
– missing stop codon – so you see a really long stall with labeled dye… and then sampling, as other tRNAs try to fit.
– you can also sequence as you code. [neat]

Decreased tRNA transit time at higher EF-G concentrations
– if you translocate faster, pulses are faster
– you can titrate to get the speed you’d like.
– translation is slowest for first couple of codons, but then speeds up. This may have to do with settling the reading frame? Much work to do here.

Ribosome is a target for antibiotics
– eg. erythromycin
– peptides exit through a channel in the 50S subunit.
– macrolide antibiotics block this channel by binding inside at narrowest point.
– They kill peptide chains at 6 bases. Are able to demonstrate this using the system.

Which model of tRNA dissociation during translation is correct
– tRNA arrival dependent model
– Translocate dependent model

Post-synchronization of tRNA occupancy numbers
– “remix our data”
– data can then be set up to synchronize an activity – eg, the 2nd binding.

Fusidic acid allows the translocation but blocks arrival of subsequent tRNA to A site.
– has no effect on departure rate of tRNA.

only ever 2 tRNAs at once on the ribosome – more can happen, but not normally

Translocation dependent model is correct.

Correlating ribosome and tRNA dynamics
– towards true molecular movies
– label tRNAs… monitor fluctuation and movement

Translational processes are highly regulated
– regulation of initiation (5' and 3' UTR)
– endpoint in signalling pathways (mTOR, PKR)
– programmed changes in reading frames (frameshifts)
– control of translation mode (IRES, normal)
– target of therapeutics (PTC124 [ribosome doesn’t respect stop codons] and antibiotics)

– directly track in real time
– tRNAs dissociate from the E site post translocation and no correlation…

Paper is in Nature today.

>AGBT 2010 – Bing Ren – UCSD

>Epigenomic Landscapes of Pluripotent and Lineage-Committed Human Cells

Sequencing of the human genome has led to
* identification of disease causing genes
* Personalized medicine
* advanced sequencing technologies
* Foundation for understanding the construction of human beings

But DNA is only half the story
* variations in DNA alone not account for all variations in phenotypic traits
* organisms with identical DNA often exhibit distinct phenotypes (eg plants, insects, mammals)
* Epigenetic changes contribute to human diseases, phenotypes, etc

We know about the mechanisms
* DNA is wrapped around histone proteins which can be modified
* DNA is itself modified (methylation)

[paraphrased] DNA is hardware, epigenome is the software (Duke university quote… missed author’s name)

* very complex
* varies among different cell types
* generally reprogrammed during the life cycle of an organism
* Epigenome is also affected by environmental cues

How do we decipher the “epigenetic code”?
* systematic approach
* large-scale profiling of chromatin modifications
* finding common modifications
* validation

* ChIP-Seq based. (started with Tiling arrays)
* use antibodies that recognize chromatin modification.

[beautiful pictures]
* Chromatin signature for the promoter and gene body
* H3K4me3 marks active promoters
* H3K36me3 marks gene body of active genes
* Signature has led to identification of thousands of long non-coding RNA genes.

Chromatin signatures of enhancers
* Can use information about modifications to model patterns
* predict enhancers in the human genome.
* 36,589 enhancer predictions were made
* 56% found in intergenic regions
* test a few with reporter assays – show that 80% of predicted enhancers do drive reporter genes. (Far fewer of the control sequences do – missed number)

Finding chromatin modification patterns in the genome de novo
(Hon et al, PLoS Comp Bio 2009)
* 16 different patterns of chromatin modification
* some are enhancers,
* others have no associations
* one has pattern highly enriched for exons.. regulates alt splicing.

* chromatin modification patterns could be used to annotate …
* Epigenome Roadmap project (Generate reference epigenome maps for a large number of primary human cells and tissues)

Datasets are available at GEO. (NCBI)

Mapping of DNA methylation and 53 histone modifications in human cells
* Human embryonic stem cells (H1)
* Fetal fibroblast cell line

Method for mapping DNA methylation
* Ryan Lister and Joe Ecker (Salk)
* sodium bisulfite (C to U), if not methylated
* Must do deep sequencing. If using HiSeq – could do it in 10 days. Used to take 20 runs
* Methylation status for more than 94% of cytosines determined.
* 75.5% in H1, 99.98% in Fibroblast
* DNA methylation is depleted from functional sequences
* non-CpG methylation is enriched in the gene body of transcribed genes, suggesting a link to the transcription process
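The bisulfite logic described above can be sketched in a few lines: sodium bisulfite converts unmethylated C to U (which reads as T after PCR), while methylated C is protected and still reads as C. A toy per-base methylation caller, assuming a perfectly aligned read (function and variable names are mine, not from the talk):

```python
def call_methylation(reference, bisulfite_read):
    """Toy methylation caller for one aligned bisulfite-converted read.

    Unmethylated C is converted to U by bisulfite and sequenced as T;
    methylated C resists conversion and is still sequenced as C.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, bisulfite_read)):
        if ref_base != "C":
            continue  # only cytosines carry methylation information
        if read_base == "C":
            calls[i] = "methylated"    # protected from conversion
        elif read_base == "T":
            calls[i] = "unmethylated"  # converted C -> U -> T
        else:
            calls[i] = "ambiguous"     # mismatch / sequencing error
    return calls

# Reference has cytosines at positions 0, 3 and 5; this read keeps the
# first C (methylated) and converts the other two to T (unmethylated).
print(call_methylation("CATCGCTA", "CATTGTTA"))
# {0: 'methylated', 3: 'unmethylated', 5: 'unmethylated'}
```

The need for deep sequencing mentioned above follows directly: each cytosine's status has to be called from many overlapping reads, not one.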

11 chromatin modification marks
* comparing cells: different results
* K9me3 and K27me3 become dramatically extended (7% in ES to more than 30% in fibroblast.)
* genes with above marks are highly enriched in developmental genes.

Reduction of repressive chromatins in induced pluripotent cells

Repressive chromatin domains occupy small fraction of genome which is maintained as open structure in stem cells

Repressive chromatin domains occupy large fraction of genome, keeping genes involved in development silenced in differentiated cells.

* widespread difference in epigenomes of ES and fibroblasts
* stem cells are characterized by abundant non-CpG methylation
* Expansion of repressive domains may be a key characteristic of cellular differentiation
* [Missed 2]

>AGBT 2010 – Jesse Gray – Harvard Medical School

>Widespread RNA Polymerase II Recruitment and Transcription at Enhancers During Stimulus-Dependent Gene Expression

Mammalian brain is [paraphrased] awesome technology
* Sensory experience shapes brain wiring via neuronal activation
* Whiskers compete for real estate in somatosensory cortex.
* Brain can re-wire to adapt to environment
* Transcriptional changes in nucleus as brain cells reprogram
* (Discussion in terms of real estate for rat whisker areas of brain.)

Neuronal activation affects circuit function by altering gene expression
* Activity dependent gene expression

* Ca++ influx
* kinases & phosphatases
* recruit Creb binding protein
* Induce about 50-100x expression in genes (eg, fos)
* Can we do genome wide approaches to understand what’s being expressed?

An experimental system for genome-wide analysis of activity-regulated gene expression
* grow in dish
* depolarize with KCl
* do ChIP-seq and RNA-seq

CBP and transcription factor binding at fos locus
* see CBP binding at a conserved region upstream, as well as the promoter for the fos gene
* also see NPAS4, CREB and SRF with similar (but not identical) binding sites

Is activity-dependent CBP binding restricted to the fos locus, or genome-wide?
* compare CBP peaks in both conditions
* binding appears limited to the KCl-stimulated condition only.

Are CBP-bound sites enhancers or promoters or both?
* Promoters don’t necessarily drive transcription
* Promoters have H3K4Me3 histone modifications (enhancers don’t)
* 3D configuration brings enhancers together with promoters.

Most CBP peaks are not at TSSs and do not show H3K4Me3
* 5,079 at TSSs
* 36,069 not at TSSs

Align all sequences that are enhancers
* there is much H3K4Me1 (clear pattern)
* there is not much H3K4Me3

Use known site
* upstream from Arc – used to build a construct

CBP- and H3K4Me1-marked loci function as activity-dependent transcriptional enhancers.
* Found 8 enhancers

* about 20,000 CBP sites that are activity-regulated enhancers
* do not correspond to annotated start sites
* H3K4Me1 modified
* lack H3K4Me3 mark
* do not initiate long RNAs
* confer activity-regulation on the Arc promoter
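The chromatin-signature rule summarized above can be sketched as a simple classifier: a CBP-bound site away from any TSS, carrying H3K4Me1 but lacking H3K4Me3, is a candidate activity-regulated enhancer, while an H3K4Me3-marked site at a TSS looks like a promoter. The thresholds and argument names below are hypothetical, not from the talk:

```python
def classify_cbp_peak(h3k4me1, h3k4me3, at_tss,
                      me1_cutoff=1.0, me3_cutoff=1.0):
    """Toy classifier for a CBP-bound site, following the talk's signature.

    h3k4me1, h3k4me3: enrichment scores for the two marks (arbitrary units)
    at_tss:           whether the peak overlaps an annotated TSS
    """
    if at_tss and h3k4me3 >= me3_cutoff:
        return "promoter"            # H3K4Me3 at a TSS marks active promoters
    if (not at_tss) and h3k4me1 >= me1_cutoff and h3k4me3 < me3_cutoff:
        return "candidate enhancer"  # H3K4Me1 without H3K4Me3, away from TSSs
    return "unclassified"

print(classify_cbp_peak(h3k4me1=3.2, h3k4me3=0.1, at_tss=False))
# candidate enhancer
print(classify_cbp_peak(h3k4me1=0.5, h3k4me3=4.0, at_tss=True))
# promoter
```

Applied genome-wide, a rule like this is what yields the ~20,000 activity-regulated enhancer candidates quoted above.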

Questions about activity-regulated enhancers
* do they play a role in binding RNA Polymerase II?
* Evidence is tending towards saying that most enhancers do not seem to have RNAPII binding.

fos enhancers bind RNAPII
* use chip for RNAPII and CBP
* 10-20% of sites have RNAPII at enhancer
* potential artifact – crosslinking conditions may exaggerate this by tying promoters and enhancers together.

Does RNAPII at enhancers synthesize RNA?
* Enhancers at the fos locus produce enhancer RNAs
* non-polyadenylated RNA? Yes.
* you do get some transcription at enhancers… [doesn’t this start to describe lincRNA?]

Enhancer transcription is correlated with promoter transcription.

The Arc enhancer can be activated without the presence of the Arc promoter
* increases in polymerase binding at enhancer even when promoter is gone.
* preliminary – but may not be transcription when the promoter is gone.
* what is the function of eRNA transcription? (don’t know the answer yet)
* Could be that it helps to lay down epigenetic marks.

>AGBT 2010 – Keynote: Henry Erlich – Roche Molecular Systems

>Applications of Next Generation Sequencing: HLA Typing With the GSFLX System

High Throughput HLA typing
* the allelic diversity is enormous
* Focussing on HLA class I and II genes (germ-line)

Challenging because it’s the most polymorphic region in the genome
* HLA-B has well over 1000 alleles
* only 68 different serological types can be distinguished
* 3,529 alleles at 12 loci as of April 2009
* chromosome 6
* Can’t be typed using existing conventional techniques [I assume in high throughput]
* DR-DQ region – involved in type I diabetes
[Much detail here, which I can’t get down fast enough with any hope at accuracy.]

Polymorphism is highly localized.
* virtually all of the polymorphic amino acid residues are localized to a groove.
* most allelic differences are protein coding.
* critical to distinguish known alleles

* e.g. HLA-A*24020101
* only the first 4 digits distinguish the protein.
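The naming convention above (old-style, pre-2010 HLA nomenclature, without colon separators) can be sketched as follows; the function names are mine, and this only handles the colon-free format shown in the talk:

```python
def protein_level(allele):
    """Truncate an old-style HLA allele name to the digits that determine
    the protein sequence -- the first four, per the talk. Later digits
    encode synonymous and non-coding differences.
    e.g. 'HLA-A*24020101' -> 'HLA-A*2402'
    """
    locus, digits = allele.split("*")
    return f"{locus}*{digits[:4]}"

def same_protein(a, b):
    """Two alleles encode the same protein iff their first four digits match."""
    return protein_level(a) == protein_level(b)

print(protein_level("HLA-A*24020101"))                  # HLA-A*2402
print(same_protein("HLA-A*24020101", "HLA-A*240202"))   # True
```

This is why protein-level matching for transplantation only needs the leading digits, even though the full allele names are much longer.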

Survival curve for bone marrow transplant
* even with 8/8 allele matches, there are WAY more things that need to be matched – and so you need the best possible match.
* a single coding mismatch can cause graft-vs-host disease.
* Bone Marrow matching requires high precision

[List of disease applications – 22 different diseases including Narcolepsy, cancers, drug allergic reactions..]

GWAS in Type 1 diabetes.
* identified disease related genes – HLA SNPs are significant
* DR-DQ haplotypes are strongly associated with the odds ratio for diabetes
* looking at genomic risk factors, risk increases up to 40x

[something about a particular combination of DR-DQ giving VERY high risk, and consequently is never seen in humans…]

* Dot blots… evolved into Probe Array Typing System.
* Even if you have hundreds of probes, you still have “HLA Genotype Ambiguity”
* “Fail to distinguish alleles” without NGS (with or without phasing..)

[Explanation of how 454 works – protocol]

* amplify exons with MID primers/emPCR/sequence

Benefits of clonal sequencing
* set phase to reduce ambiguity
* allow amplification and sequencing of multiple members of multi-gene family with generic primers
* allow sorting /separation of co-amplified sequences from target sequence (signal)
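The phasing benefit listed above can be made concrete: two heterozygous positions with unphased genotypes {A,G} and {C,T} are consistent with two different haplotype pairs (A–C/G–T or A–T/G–C), and hence with two different allele assignments. A single clonal read spanning both sites settles it, since one molecule carries one haplotype. A toy illustration (function names are mine):

```python
def phase_from_clonal_read(unphased_sites, clonal_read):
    """Resolve two-site phase from one clonal (single-molecule) read.

    unphased_sites: list of 2-element sets of bases, one per het position
    clonal_read:    the base observed at each position on ONE molecule
    Returns the two inferred haplotypes.
    """
    hap1 = tuple(clonal_read)  # the clonal read itself is one haplotype
    # the other haplotype takes the remaining base at each het site
    hap2 = tuple((site - {base}).pop()
                 for site, base in zip(unphased_sites, clonal_read))
    return hap1, hap2

sites = [{"A", "G"}, {"C", "T"}]
print(phase_from_clonal_read(sites, ["A", "T"]))
# (('A', 'T'), ('G', 'C'))
```

With unphased Sanger or probe-based typing, both haplotype pairs remain possible, which is exactly the genotype ambiguity described earlier.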

Parallel clonal sequencing of 8 loci x 24 samples

[More protocol… ]

Graph of read length : around 250bp

Connexio Assignment of DRB1 Genotype
* image reassuring to a HLA researcher.
* like the interface (plug for the company)
* aligns sequence, consensus sequence, does genotype assignment
* [Must admit, the information on this interface is rather mysterious to me…]
* [Several more slides of Connexio data and immunology types that mean nothing to me.]
* get a genotype report…


Testing on SCID patient
* patients are potentially chimeric
* look for presence of non-transmitted maternal allele
* can find stuff in “fail layer” because software assumes only two alleles possible.

[Wow… I know I don’t know much immunology, but I’m not getting much out of this. This is a lot of software for immunologists, and I really don’t understand the terminology, making it challenging to get coherent notes.]

Takes about 4 days – [says 5-7 on the slide]
* amplicon prep
* emulsion
* DNA bead process
* loading wells
* sequencing on GS FLX
* Data analysis

[Missed slide on how much data they were getting – 1M reads?]

Multiplex – 500 samples in one run
* Got good results [not copying down seemingly random DRB numbers…]