Sunrise on the West Coast…

While all of you have been busy with work, I was up early to do some photography.  It was a cloudy morning, but the 2cm of fresh snow kinda made up for it.  I uploaded a few others to my photography site on Monday evening, and if I get around to it, I’ll put a few more up tonight.

For now, here’s my favorite from this morning.

Yes, the black dot on the left is my dog.

Monday Afternoon in Tofino.

After a long drive, we’ve made it up to our hotel on the West Coast.  It’s unbelievably pretty.

We’re expecting gale force winds and 4m waves this evening, which should make a wonderful show from our beachside room – a view not too different from the one above.

But if you’ll excuse me, I have some relaxing to do.

If you’re wondering where I am this weekend…

I’m on vacation this week, although I’m not going to be far from the computer, I suspect… but I did manage to take advantage of some of the nice weather on Vancouver Island.  We spent part of our afternoon combing the beach for washed-up bits of blue and green sea glass.

At any rate, tomorrow we head off north to Tofino.  If you’re looking for me, try the beach… or the old growth forests.  Or, maybe sitting by the fireplace.  (-;

Ubuntu 11.04 (natty), xorg 1.10 and Nouveau… it’s alive!

Not a very science-y post, but one of my hobbies is tinkering with my computers and in particular, playing with experimental operating systems.  It’s just one of those things – I like seeing what’s new, trying things out, and learning how to fix them when they’re broken. And, for the last week, my computer has been pretty broken.

When I got back from AGBT, I ran a large batch update, which included the dreaded experimental Xorg version 1.10.  Not that it’s bad code, but it’s completely incompatible with the Nvidia drivers that worked on 1.09 – and with new Nvidia proprietary drivers unlikely to be released in the next couple of weeks, Xorg 1.10 is likely to continue thrashing people’s systems for a while to come.  Had I read the post warning people, I probably would have held off on the update, but alas, I hadn’t.

However, after doing some research last week, I was able to get the “nouveau” driver working on my system.  It isn’t a particularly difficult install, and mainly involves making sure you have the driver installed, then switching your xorg.conf file to say

"Driver         "nouveau"
#"Driver        "nvidea"

I did have to switch to using “Twinview” for my dual monitor setup, but that could all be done through the settings in the control panel. (No messing with the xorg file required.)
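For context, the edit above amounts to changing a single line in the Device section of /etc/X11/xorg.conf; a minimal sketch of what that section ends up looking like (the Identifier here is illustrative – yours will likely differ):

```
Section "Device"
    Identifier  "Card0"        # illustrative; keep whatever yours says
    Driver      "nouveau"      # was "nvidia"
EndSection
```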

However, the nouveau driver, by default, does not come with 3D support enabled.  I’m led to believe it is still experimental, but if you’d like to run compiz, you do require 3D graphics.

Thankfully, after a week of 2D desktop usage, I found out that you can add on 3D acceleration quite easily, simply by installing one package:

sudo apt-get install libgl1-mesa-dri-experimental

Once installed, you simply need to restart X (or the computer, either one works) and 3D support will be there!
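Since “restart X” can mean different things depending on your setup, here’s a hedged sketch of what I mean: restart the display manager.  The service names below are common guesses (kdm for KDE, gdm for GNOME) and may differ on your install; the script just prints the command rather than running it.

```shell
#!/bin/sh
# Work out how to restart X: restart the display manager if we can find
# one (the names below are common guesses), else fall back to a reboot.
cmd="sudo reboot"   # a full reboot also reloads the driver
for dm in kdm gdm lightdm; do
    if [ -e "/etc/init.d/$dm" ]; then
        cmd="sudo service $dm restart"
        break
    fi
done
echo "To restart X, run: $cmd"
```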

I tested out compiz first with the 2008 utility “compizcheck”, which will warn you that nouveau isn’t known to support compiz. (After all, in 2008, nouveau was a pretty lean package.)  But the warning can be suppressed, and if all else is good, it will give you the green light.

At that point, if all looks good, you can give compiz a whirl with the command:

compiz --replace &

If everything is working, you’ll see a bunch of status lines showing things being initialized and updated, and no errors.  If not, you can simply restart your previous window manager (in my case, it’s kwin) with the similar command:

kwin --replace &

Either way, I’m happy to report that everything is now working for me – I’ve got Ubuntu 11.04 (natty) running happily with Xorg 1.10 and the Nouveau drivers.  I won’t say it’s perfect, as I’m still seeing artifacts during 3D animations (for instance, during the rotation of screens for the desktop cube, an extra copy of the desktop in a partially rotated position flashes for about half a second), but for the most part, things are working pretty well.

Way off topic – using nature-inspired technology

I saw this video this afternoon.   Normally, I just listen to this stuff as background noise, but it really captured my attention, and consequently I blew 15 minutes listening to the talk.

It’s described as follows on the TED website: “Michael Pawlyn takes cues from nature to make new, sustainable architectural environments.”  However, it’s also about closing the loop in design, reversing desertification and building a better planet.  If you’ve got 15 minutes, I suggest watching it.

The small things that matter…

I’m sure you can take that title in many different ways, but I have a specific thought in mind – which is probably not what you expect.  And yes, it does eventually come back to science.

About a year and a half ago, the Genome Sciences Centre renovated its lunch room and took out the ping pong table.  Having lost my best procrastination tool, and probably most of the exercise I was getting, I decided it was time to return to fencing, which I’d done in high school and my first year of undergrad.  There’s a fencing club about a 10 minute walk from my house, which makes it pretty convenient too.

After about a year of participating in the intermediate classes on Mondays, I’ve now started going to the free fencing sessions on Thursday.  What’s immediately obvious is that there are two groups of fencers – generally those that are young (<20 years old) and have been fencing for probably less than 5 years, most of whom show up on Mondays, and the older group, most of whom have been fencing more than a decade or two and clearly don’t need lessons anymore. I usually do reasonably well against the younger group – and I’m absolutely slaughtered by the older group.  No surprise.  The older fencers have better technique and that will always win against speed and agility.

In my first match last night, I was devastated 5-1 by a guy who offers private lessons.  No big surprise really, but, in sheer frustration, I stepped back and spent a few minutes watching him in his next match.  Superficially, we do the same things, but upon careful observation I noticed that the biggest difference was simply that he held his blade 15 cm lower, covering his lower body more efficiently.  That’s it.  Just a tiny change in how you stand.

With nothing to lose, I switched my position to mirror his and immediately went from outright losing my matches with the older group to tying them.  I even saw frustration on my opponents’ faces now and then… and I held the Monday evening instructor, who always wins against me without breaking a sweat, to a draw. (Normally you don’t draw in fencing, but the wire in my blade broke when we were tied at 4.  Good enough for me!) All in all, I’m thrilled with the change – and have fewer poke marks to show for it today!

Of course, I frequently use fencing as a metaphor for science, so I’ve been thinking an awful lot about mentors and having good examples to follow today – and how that fits into my future career.  I’ve been incredibly fortunate to be thrust into an environment where I’m surrounded by people who excel at their field.  Now, I think it’s up to me to watch and learn.

For me, this translates into a question of what guidance I’m missing.  The process of writing and planning papers is always done behind closed doors and is hard to watch – and is something I would really benefit from seeing how other people do.  When it comes to thesis writing, or application notes, I’ve got a few under my belt, but for some reason, I find papers more daunting.

To get to my point, all of this had me wondering what other small details graduate students are missing.  What are the tiny details that you’ve discovered that can make all the difference in getting things done right?

URLs as references

This past week, I submitted a final draft of an application note on some software I’d written (and am still writing, for that matter), and had it rejected twice because I’d included a URL as a reference. (The first time, I failed to notice that I’d cited postgresql 8.4 with a URL, in addition to Picard.)   As both a biochemist and a bioinformatician, I can see both sides of the story as to why that would be the case, but it still irked me enough that I thought it worth writing about.

If you look back 30 years ago, there really wasn’t an Internet, so this wasn’t even an issue on the horizon.  You cited non-peer-reviewed material the same way you cited anything else: you gave the author’s name, the date of publication and the publisher – books were books, regardless of who paid to have them published.  Publications were all copyrighted by some journal, and scientists would read articles in the library.  Access to scientific information was restricted to those who had access to universities.

20 years ago, the Internet was a wild frontier, mostly made up of an ever-changing network of modems.  What was on one computer might not be there the next time you connected.  Hard drives failed, computers disconnected – and no one put anything of great value on bulletin boards.

15 years ago, web pages began to pop up, URLs entered into public consciousness and editors may have had to face the issue of what to do about self-published, transient information:  Ban it.   That was the response, as far as I can tell.  Why not?  It might not be there 2 days later, let alone by the time articles went to print.  A perfectly reasonable first reaction to something that failed to meet any of the criteria for being a reference.

Just over 10 years ago, we had Google.  Suddenly, all of the information on the web was indexable and you could find just about anything you needed.  You could before that too, but getting from place to place was a mess.  Does anyone remember Internet Yellow Pages, where URLs were listed for companies?  Still, information then had a short shelf life.  Even the WayBack Machine archive was young, and information disappeared quickly.  Still unsuitable for referencing, really.  You could count on companies being there, but we were still in the days when URLs could change hands for a fortune.

5 years ago, social media invaded – now you had to be online to keep up with your friends.  But there was also a major shift behind that – bioinformatics went from being just a series of Perl scripts to being composed of major projects.  Major projects went from being small team efforts to being massive collections of software.  We also saw the adoption of web tools, many of which weren’t published, and probably never will be.  We went from dial-up to broadband… we went from miscellaneous computers to data centers.  We went from hobbyist software projects to sourceforge.  In short, the Internet matured, and the data it held went from being transient to being a repository of far more knowledge than any book.

It didn’t, however, become peer reviewed.  Many people no longer consider the Internet to be transient, but with major influences like Wikipedia, which is unreliable as a reference at best, we don’t often think of URLs as being good references.   But how is that any different from books?

Unfortunately, somewhere along the line, I think journal editors confused their initial reason for rejecting URLs (their transient nature) with something else: the lack of peer review.  No editor would bat an eye at citing a published book, even if that information was not peer reviewed, but citing Wikipedia seems like such a terrible idea that perhaps the slippery slope fallacy has reared its ugly head.

For bioinformatics, many of our common tools aren’t built by scientists any more, or if they are, they’re open source: the collaborative work of many people, which means they’re not going to be published.  Many of them are useful toolkits that don’t even make sense to publish – but they are available on the web at a fixed address that doesn’t expire.  Unlike commercial products, open source projects may die, but they never disappear when they’re hosted at the likes of sourceforge – which means they’re no longer transient.

While common sense and many colleagues tell me to just get over it and “put the URL in the text”, I fail to see why this is necessary.  Can’t editors see that the Internet is no longer a collection of random articles?

Hey Editors, there’s far more to the Internet than just Wikipedia and Facebook!

(NOTE: Ironically, as I write this, Sourceforge is doing upgrades on its web page, and some of the projects they host have “disappeared” temporarily… but don’t worry, they’ve promised me that they’ll be back shortly.)

Reflections on AGBT2011

[Written Sunday, Feb 6th, while flying home to Vancouver.]

From my vantage point at 30,000 feet, I can take a good look back at this year’s Advances in Genome Biology and Technology conference.  There were certainly a few things that were worth reviewing.

The one change that impacted me the most was the surprise policy on blogging and tweeting.  Judging by the reactions of the people I spoke to, it wasn’t really on anyone’s radar until the slide describing the policy went up – and of course when Elaine Mardis announced that the default of the conference would be opt-in, rather than opt-out.  Some of the surprise in the audience appeared to come from people who had never heard of twittering until the word appeared here, while others appeared quite enraged by the existence of a policy at all.

Not to belabour the point, but AGBT showed a fair amount of forward thinking by having a policy at all – and I was glad that they had one.  However, much of the confusion or “controversy” over the policy could have been avoided by educating the speakers on what twittering is – and what it means.   As trend-setters in the science world, they could have done a better job by providing a more reasoned approach; however, by not being entirely draconian and banning it outright, they’ve already taken the first step, and realized that social media isn’t going away.  I’m certain that AGBT2012 will be a much more dynamic event if social media becomes better integrated.  (I understand that there was even talk of a separate screen showing the tweets in real time, but that the organizing committee wasn’t quite ready for that yet.)

That aside, there were some fantastic highlights, and one that stands out for me is Pacific Biosciences’ lunchtime talk.  It was a great departure from previous years, where they told big stories without a lot of specifications.  Don’t get me wrong – I still love the technology, but I think the more humble approach sits better with me.  Whereas in previous years we weren’t given many details, the candid admission that Pac Bio isn’t going to displace all of the other technologies, but will rather supplement them, is a big step forward in the vision.   Furthermore, it’s also interesting to hear that single molecule sequencing isn’t going to have the same accuracy as the “ensemble” or colony sequencing devices, where base calling can average over the signals of many molecules.  Fortunately for them, they were able to demonstrate exactly why that doesn’t matter.

Pac Bio’s Eric Schadt also gave a talk that created quite a stir on the last day with some big ideas, including constant virus monitoring for effluent management stations and some very “big brother” like monitoring of DNA passing through Pac Bio’s own headquarters.  (I am a little worried about the pig and chicken DNA splattered around their kitchen, but at least they asked their employees to volunteer for the nose swabs used in some of their pet projects.)  Frankly, even if their technology is perfect, I’m not sure I buy into the “sequencing every restaurant” model.  Yes, it would drive sales of their technology, but is it really necessary to have a virus weather map in real time?  What actionable results would it give?

Unfortunately, the other hot topic of the conference was from a session I couldn’t attend myself. In the upcoming-technology session, several groups were discussing work on nanopore sequencing, which appears to be showing significant progress.  While I missed those talks, I did see the talk on replacing alpha hemolysin channels with an MspA protein channel – and the results presented in that talk did look very good.

Other things of interest included the usual travel delays and cancellations – once again the conference organizers managed to plan the conference for days with some of the worst storms – and as usual, they affected just about everyone.  I found myself thinking that Dallas looks better without snow – which would have seemed like a nonsensical thought if I hadn’t seen it first hand.  Fortunately, deicing delays in Dallas were the worst I had to put up with.

The conference was also notable for what wasn’t talked about.  I didn’t see any new methodologies presented, nor were there that many people talking about novel experiments.  People seem to be digesting a year of incremental improvements in sequencing technology and the massive jumps in the amount of data available – which led to a lot of pipeline-focused talks, as well as attempts to apply the information to influence clinical outcomes.  Those are both signs of maturity in the field, but a little disappointing from a conference called “Advances in Genome Biology and Technology”.

Another thing that has dropped off the radar was ChIP-seq.  Two years ago, EVERYONE was talking about it and there were tons of posters on the topic.  As a ChIP-seq software author, the complete lack of epigenetics posters, and the casual way in which epigenetics was mainly absent from talks, was pretty stark – and it makes me wonder if we’ve all just accepted the field the way it is, or if people have simply had enough of it and have given up.

I suppose one also has to mention the lack of bioinformatics talks.  I won’t say there weren’t any, as I did see a few, but in most cases, the presenters simply listed the tools they used on one slide, and moved on. The biology also took a back seat to the clinical applications, when possible.  It almost appeared as if the hierarchy of topics started with tear-jerking stories, moved through some systems biology, passed briefly over the bioinformatics, and maybe included some hardware issues.  Of course, that’s just my sampling – other opinions probably differ.

Probably the best part of the conference, for me, was the networking.  I met people who are pioneering bioinformatics business models or VC funding for the life sciences, in addition to the academics and corporate scientists.  That diversity really makes me appreciate the experience of going to a conference like this.  As a fellow Canadian pointed out during the feedback session, the vendor parties also made for fantastic networking sessions and struck a great balance between networking time and learning time.  (I should also point out that the number of twitterers I met was pretty incredible.  It was really neat to put faces to icons…)

Of course, I probably lost out on some of the free time by trying to blog all of the talks, but I’m told that my notes enabled other people to skip out on a few talks themselves and get in some beach time.  And despite the work that goes into the notes, I really enjoyed the process.  Without them, I wouldn’t have met as many neat people as I did.

Having been to the AGBT conferences for a few years now, I’ve had the opportunity to watch the conference evolve as well.  In the beginning, there were tons of slides on the sequencing technologies being presented – from the structures of the fluorophores to animations of the sequencing process – which have all dried up. All of the companies are now competing and struggling with similar chemistry issues, and of course, this information is now top secret.  But it does mean that the devices have gone from being applied chemistry to being “black boxes”, of which the researcher is just expected to pick the one that best fits the experiment they have planned.  In a way, that completely validates the Complete Genomics business model, about which I expressed so much skepticism when I first met them at past AGBT conferences.

And speaking of Complete Genomics, a last point would have to be the much re-tweeted announcement of 60 publicly available genomes from Complete Genomics.  It was clearly a signal that Complete Genomics’ model is working, and that they have nothing to hide – I know we already have people downloading them and I’m looking forward to seeing the data when I get back to work on Monday.  Whatever skepticism I may have had about what they can do is clearly dissipating – and they definitely have my respect for going out of their way to engage the community. (It’s noteworthy that they’re the only company that actively makes an effort to talk to bloggers and twitterers, even though 8am breakfast meetings with shy and sleepy bioinformaticians can’t be all that exciting.)

I’m sure there were other things that would be worth mentioning, and several of the talks above are worthy of entire discussion sessions, but I think I’d like to wrap this up neatly.

With the rapid advances of the past years giving way to small but constant improvements in sequencing technology (mostly via chemistry improvements), the conference organizers will likely find themselves re-evaluating the focus of their sessions in the near future.  I certainly don’t think that the advances will dry up any time soon, but I’m not sure that they’ll be able to bring in the same crowds in the future.  With the increasingly “black box” technology platforms and the lack of new methods being presented, I’ll be very curious to see where the conference goes in future years – and of course, how it adapts to the pressures of social media.

AGBT talk: Zhong Wang, Joint Genome Institute

Title: Massive Metagenomic Discovery of Biomass-Degrading Genes and Genomes from Cow Rumen

First slide: “biofuels”, “cellulosic Ethanol”, and “genomics” [I think I see where this is going. This was the hot topic in 2003, but I haven’t heard much about it since.]

Overview of lignocellulose structure and cellulase.  [My shorthand – lignocellulose is broken down by a whole series of enzymes, each of which breaks down a different bond in the link depending on branch points, etc.  It is also in a semi-crystalline state, which is hard to break down.]

All the cellulase we use industrially comes from a single fungal source.

[Oh my… fistulated cow.  I remember that from visiting universities when I was in high school. It’s a cow with a hole in its side so you can get into its stomachs any time.]

Using cow to digest switchgrass, and looking for microbes that do the breakdown.

[Odd, wouldn’t you want to do that with corn cellulose, which is plentiful and a waste product of animal feed/etc.?]

Did 3 billion reads, 300,000Mb  (1/4 TB of sequence).  Hoping to find new enzymes in this.  [On the other hand, wtf do you do with that much sequence?]  This was like a monster!

Taming the monster: prediction needed huge hardware. [skipping this…] More cellulases found than in other studies. A comparison to the Carbohydrate-Active Enzymes (CAZy) database: more were found in the rumen than were collected in the database between 1975 and 2009.

Diversity: very pretty picture of family tree of cellulases.  Found many new branches – and those found were highly diverged [ which makes sense to me, since the microbiome sequencing this morning said that gut bacteria were the only ones that were really most strongly diverged…. ]

Functional validation.  Panel of cellulase substrates, plus cow rumen enzymes.  Higher the activity, the more novel.

Did they get to the bottom of the metagenome? From saturation plot, it’s linear, never saturates out.

Image “Look what I found in the cow!”

Summary: a large number of cellulases were predicted, found and tested and many have excellent potential for new industrial uses.

Community complexity: cow is intermediate between extreme environments (mine water run off) and soil communities.

Assembly: Used Velvet, 1.93Gb sequences assembled. 47 scaffolds match NCBI, which is only 0.03%.  We know very little about this community.

[on a side note, does “fistulating” a cow change the gut flora community???  that would add other odd questions about the diversity of the cow, and particularly oxygen sensitive members of the community, but I guess those enzymes are mostly useless to us.]

Were able to estimate completeness of some assemblies – one example shown at 89.8% with “genome binning”.  With random binning, you do worse.

From the cow rumen data, they were able to assemble 15 good draft genomes (1.8-3.3 Mb).

Did “single cell genome sequencing”.  Matched reads to an assembled scaffold from a single organism.  So, it works.

Conclusion: despite super-deep sequencing, they were only able to assemble 15 genomes.  Pac Bio may help.  Have already tested some Pac Bio long reads, which do help further assembly – 90% of Pac Bio reads used to validate and resolve outstanding assembly problems.

[Neat and thought-provoking talk!]

Question: Have you sampled other cows? (nope this was all from one cow!)

Mark Akeson – Baskin School of Engineering, UCSC

Nanopore DNA sequencing: Precision and Control

[I don’t think this is one of my better note sets – the technology is neat, the results are fun to watch, but you can’t capture such a rich data stream in a blog… sorry.]

Two types of nanopore sequencing: Exonuclease Sequencing, Strand Sequencing.

Exonuclease cleaves bases, so you don’t move backwards, strand sequencing converts ssDNA to dsDNA so it doesn’t move backwards through the pore.

Not going to talk much about how channels work – basic idea, charge potential across membrane, resistance changes as things move through. Charge per unit area per unit time is VERY strong.

Good news: non-covalent chemistry determines the currents for G, A, T and C. (Non-covalent chemistry, not size, dictates current.)

History of nanopore seq.

  • (John Kasianowicz @ NIST) alpha hemolysin pore to measure current. Worked molecule by molecule – not single base.
  • Simulation showed single bases are passing through in single file.
  • Wild Type alpha hemolysin pore.  Ten nucleotides contribute to pore resistance.
    • convoluted by 3 “reading heads”
  • Did protein engineering to modify by site directed mutagenesis till they got one that could distinguish all 4 bases. (Bayley group)
  • Jens Gundlach lab: Moved to MspA.  Analogy: alpha hemolysin is more like a champagne flute, where MspA has a 1nt-wide gap at the bottom of a shot-glass-like pore
    • Better separation of C, T and A, but A and G still overlap – still far better than hemolysin.

Polymerase and nanopores:

DNA replication in a crystal (A family polymerase).  [Ok, that’s cool, the crystal is still active, so you can take images over time to observe the chemistry happen!]

Sub-millisecond active control of the DNA template.  Serendipitous discovery: at the end of a peak, there’s a voltage change IF the enzyme is departing, so it works as a good control.

Neat experiment where ssDNA is bounded by dsDNA on both sides of the pore, can watch the processes back and forth.

Tethering polymerase to pore isn’t bad – can kill pores.

Blocking Oligomers [Very graphical flow. I can’t describe this fast enough.]

Polymerase catalyzed synthesis to +12 Endpoint (Klenow fragment) and shown in a movie.  Pulsing 20s intervals.

Is phi29DNAP better? 2ms vs 2 seconds.  Applied voltage causes phi29DNAP to tease apart dsDNA absent catalysis.

[This whole talk is images of results, by the way, nearly impossible to get much down that explains the content.]

Movie and results (Lieberman et al., JACS 2010)

A ‘Branton’ test: use a mix of dNTPs with ddATP.

Basically, all of the requirements for what’s needed to do nanopore sequencing… but lots left to do.

  • Sequencing and re-sequencing individual native DNA
  • Read lengths longer than industry standard
  • Sequencing across Ethernet.