Canadians debating the World Series

My wife and I are both Canadian and baseball to us is about the same as Mayan handball is to the rest of the world – a complete non-issue. So this morning, when my wife announced that the Cardinals had won the World Series, we had an interesting conversation:

Me: “Cardinals… that’s St. Louis, isn’t it?”

Her: “I don’t know.  The TV said Texas and St. Louis yesterday.”

Me: “So it is St. Louis, then.”

Her: “How do you know?”

Me: “The other team had ‘Texas’ written on their shirts.”

Her: “Oh, then I guess St. Louis won!”

And now we know – St. Louis won the World Series.

Letting the Cat out of the Bag.

It is finally official – I’ll be leaving Canada and going to Europe (Denmark) in December – joining the team at CLC bio in just over a month. You’ll have to excuse my holding off on letting everyone know.  Of course, things have been in the works for some time now, but the last few pieces have only clicked into place this week.  And, of course, one doesn’t want to jump the gun by announcing these things before everything is in place.

Of course, this doesn’t mean I’ve finished my PhD yet.  There are still a few more hurdles – my thesis has to go through my committee and the external examiner, and I still need to officially defend it – but it was looking like the soonest that could happen would be February, and with everything going on, my wife and I decided it would be better to just start the process of settling in to Denmark as soon as possible.

So, consequently, if you read my blog, you’ll probably hear a little bit more about some topics that are currently on my mind: learning Danish (lære Dansk), traveling, maybe some cultural collisions (Danish people don’t have closets?)  and possibly some photography, depending on how busy I am.  (Yes, now that I’m not actively writing my thesis for 6-8 hours a day, I seem to have more time.)

But don’t worry – in the next month, I still have a few things I want to blog about, and likely a few papers to review.  Even though I’m leaving Grad School, I’m not leaving science behind.

To be candid, I’m looking forward to starting up at CLC partly because of the job, which already sounds pretty awesome, and partly because of the people.  I’ve met some of the people I’ll be working with – albeit briefly – and I’m excited to have the chance to work with them.  I can honestly say that they are one of the nicest groups of people I’ve ever met.  Must be something in the water. (-;

Anyhow, to complete the circular nature of this post (like all good fugues, which is the way to write a good post, particularly if you’ve read Gödel, Escher, Bach, if that’s not getting way too involved) I have one last point to clarify. As foreshadowed lightly by the title of this post, yes, my pets will be coming with me – and undoubtedly my cat will be thrilled to be let out of the bag once we’ve arrived in Denmark… so the moving process will be bookended, effectively, by letting cats (figurative and literal) out of their respective bags.

Ollie - My wife says we have the same nose.

Where’s the collaboration?

I had another topic queued up this morning, but an email from my sister-in-law reminded me of a more pressing beef: Lack of collaboration in the sciences. And, of course, I have no statistics to back this up, so I’m going to put this out there and see if anyone has anything to comment on the topic.

My contention is that the current method of funding scientists, mixed with a healthy dose of Zero Sum Game thinking, is the culprit driving less efficient science.

First, my biggest pet peeve is that scientists – and bioinformaticians in particular – spend a lot of time reinventing the wheel.  How many SNP callers are currently available?  How many ChIP-Seq packages? How many aligners?  And, more importantly, how can you tell one from the other?  (How many of the hundreds of SNP callers have you actually used?)

It’s a pretty annoying aspect of bioinformatics that people seem to feel the need to start from scratch on a new project every time they say “I could tweak a parameter in this alignment algorithm…”  and then off they go, writing aligner #23,483,337 from scratch instead of modifying the existing aligner.  At some point, we’ll have more aligners than genomes!  (Ok, that’s a shameless hyperbole.)

But, the point stands.  Bioinformaticians create a plethora of software that solves problems that are not entirely new.  While I’m not saying that bioinformaticians are working on solved problems, I am asserting that the creation of novel software packages is an inefficient way to tackle problems that someone else has already invested time/money into building software for. But I’ll come back to that in a minute.

But why is the default behavior to write your own package instead of building on top of an existing one?  Well, that’s clear: Publications.  In science, the method of determining your progress is how many journal publications you have, skewed by some “impact factor” for how impressive the name of the journal is.  The problem is that this is a terrible metric to judge progress and contribution.  Solving a difficult problem in an existing piece of software doesn’t merit a publication, but wasting 4 months to rewrite a piece of software DOES.

The science community, in general, and the funding community more specifically, will reward you for doing wasteful work instead of focusing your energies where they’re needed. This tends to squash software collaborations before they can take off simply by encouraging a proliferation of useless software that is rewarded because it’s novel.

There are examples of bioinformatics packages where collaboration is a bit more encouraged – and those provide models for more efficient ways of doing research.  For instance, in the molecular dynamics community, Charmm and Amber are the two software frameworks around which most people have gathered. Grad students don’t start their degree by being told to re-write one package or the other, but are instead told to learn one and then add modules to it.  Eventually the modules are released along with a publication describing the model.  (Or left to rot in a dingy hard drive somewhere if they’re not useful.)   Publications come from the work done and the algorithm modifications being explained.  That, to me, seems like a better model – and means everyone doesn’t start from scratch.

If you’re wondering where I’m going with this, it’s not towards the Microsoft model where everyone does bioinformatics in Excel, using Microsoft generated code.

Instead, I’d like to propose a coordinated bioinformatics code-base.  Not a single package, but a unified set of hooks instead.  Imagine one code base, where you could write a module and add it to a big git hub of bioinformatics code – and re-use a common (well debugged) core set of functions that handle many of the common pieces.  You could swap out aligner implementations and have modular common output formats.  You could build a ChIP-Seq engine, and use modular functions for FDR calculations, replacing them as needed.  Imagine you could collaborate on code design with someone else – and when you’re done, you get a proper paper on the algorithm, not an application note announcing yet another package.
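
To make the "unified set of hooks" idea a little more concrete, here is a minimal sketch in Python.  Everything in it is hypothetical: the registry, the hook name "aligner" and the two toy aligners are invented for illustration and don't correspond to any existing package.  It only shows how shared pipeline code could pick an implementation by name, so that swapping aligners doesn't mean rewriting everything around them.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a hook name (e.g. "aligner") to named implementations.
_REGISTRY: Dict[str, Dict[str, Callable]] = {}


def register(hook: str, name: str) -> Callable[[Callable], Callable]:
    """Decorator that files a function under a hook so it can be swapped later."""
    def decorator(func: Callable) -> Callable:
        _REGISTRY.setdefault(hook, {})[name] = func
        return func
    return decorator


def get(hook: str, name: str) -> Callable:
    """Look up one implementation of a hook by name."""
    return _REGISTRY[hook][name]


@register("aligner", "exact")
def exact_match_aligner(read: str, reference: str) -> int:
    """Toy aligner: position of the first exact match, or -1 if there is none."""
    return reference.find(read)


@register("aligner", "mismatch1")
def one_mismatch_aligner(read: str, reference: str) -> int:
    """Toy aligner: first position where the read fits with at most one mismatch."""
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= 1:
            return start
    return -1


def align_reads(reads: List[str], reference: str, aligner: str = "exact") -> List[int]:
    """Shared pipeline code: the aligner is chosen by name, not hard-wired."""
    align = get("aligner", aligner)
    return [align(read, reference) for read in reads]
```

A new aligner would be one more decorated function, and a call such as align_reads(reads, reference, aligner="mismatch1") would use it without touching the shared code.  A real framework would need far more than this (common file formats, versioning, tests), but the shape of the hook is the point.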

(We have been better in the past couple years with tool sets like SAMTools, but that deals with a single common file format.  Imagine if that also allowed for much bigger projects like providing core functions for RNA-Seq or CNV analysis…  but I digress.)

Even better, if we all developed around a single set of common hooks, you can imagine that, at the end of the day (once you’ve submitted your repository to the main trunk), someone like the Galaxy team would simply vacuum up your modules and instantly make your code available to every bioinformatician and biologist out there.  Instant usability!

While this model of bioinformatics development would take a small team of core maintainers for the common core and hooks, much the same way Linux has Linus Torvalds working on the Kernel, it would also cut down severely on code duplication, bugs in bioinformatics code and the plethora of software packages that never get used.

I don’t think this is an unachievable goal, either for the DIY bioinformatics community, the Open Source bioinformatics community or the academic bioinformatics community.  Indeed, if all three of those decided to work together, it could be a very powerful movement.  Moreover, corporate bioinformatics could be a strong player in it, providing support and development for users, much the way corporate Linux players have done for the past two decades.

What is needed, however, is buy-in from some influential people, and some influential labs.  Putting aside their own home grown software and investing in a common core is probably a challenging concept, but it could be done – and the rewards would be dramatic.

Finally, coming back to the funding issue.  Agencies funding bioinformatics work would also save a lot of money by investing in this type of framework.  It would ensure more time is spent on more useful coding, more time is spent on publications that do more to describe algorithms and to ensure higher quality code is being produced at the end of the day.  The big difference is that they’d have to start accepting that bioinformatics papers shouldn’t be about “new software” available, but “new statistics”, “new algorithms” and “new methods” – which may require a paradigm change in the way we evaluate bioinformatics funding.

Anyhow, I can always dream.

Notes: Yes, there are software frameworks out there that could be used to get the ball rolling.  I know Galaxy does have some fantastic tools, but (if I’m not mistaken), it doesn’t provide a common framework for coding – only for interacting with the software.  I’m also aware that Charmm and Amber have problems – mainly because they were developed by competing labs that failed to become entirely inclusive of the community, or to invest substantially in maintaining the infrastructure in a clean way.  Finally, yes, the licensing of this code would determine the extent of corporate participation, but the GPL provides at least one successful example of this working.

How to write a PhD thesis

After finishing yet another draft of my thesis, I thought I’d share some of the hard won pro-tips I’ve worked out along the way.  This, being my fourth thesis (one for each of my degrees, including my undergrads), was probably the best of the bunch. Let me tell you, by the time you’re on your fourth thesis, you’ve learned a few things.  So here they are:

Pick your Technology: The first step is to decide what software you’re going to use to write your thesis.  The minute you start, you’re going to be locked in with no hope of switching.  There really isn’t a way to transfer your work from one technology to the next, so pick wisely.  In the past, I’ve used Microsoft products, and I found them to be a very wasteful way to proceed.  This round, I picked LaTeX/Kile, and despite the 3-day intense learning curve at the front, things have gone very smoothly since. (Yes, only a programmer will appreciate the beauty of compiling a document – it’s not for everyone.)

Whatever you do, make sure you understand the ramifications of writing a 100+ page document in whatever software you choose.  A 10 page document in Word is nothing at all like a 100 page document in Word. (Hint: I found scaling with Word to be painful, particularly when illustrations are involved.)

Pick your Referencing System: Similar to the point above, in your thesis, references are your lifeblood – pick a system that works and start using it early!  Nothing says disorganization like trying to find “that paper” in a 50cm high stack.  The sooner you start organizing your papers, the better off you’ll be.

My trick was to use Texmed to get papers in the right format, then toss it into a .bib file with a quick annotation, so that I could figure out why I thought that paper was significant.  It cut down a LOT of time when it came time to add citations.
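
For anyone who hasn’t kept an annotated .bib file before, here is a rough sketch of what such an entry could look like, written as a small Python helper.  The function name, the citation key and the field values are placeholders invented for illustration, not a real reference; the only assumption about BibTeX itself is that `annote` is a standard field that most bibliography styles simply ignore, which makes it a convenient place for a "why did I save this?" note.

```python
from typing import Dict


def format_bibtex_entry(key: str, fields: Dict[str, str], why_it_matters: str) -> str:
    """Render one @article entry, storing the personal note in the `annote` field."""
    all_fields = dict(fields)               # copy so the caller's dict is untouched
    all_fields["annote"] = why_it_matters   # the quick annotation
    lines = [f"@article{{{key},"]
    for name, value in all_fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder values only; a real entry would come from your reference tool.
entry = format_bibtex_entry(
    "example2011",
    {"title": "Placeholder title", "year": "2011"},
    "Motivates the approach in chapter 2",
)
print(entry)
```

Appending the result to the .bib file keeps the citation and the reason it mattered in one place.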

Organization: Writing a thesis isn’t hard, surprisingly.  The biggest, hardest part is organizing it.  My best trick was simply to write out all of the headers, then write out in point form what I thought went into each section, then – one by one – expand the points into paragraphs.  Throwing in illustrations and references at that stage was a great help for writing as well – this helps you build the text around the information you have already gathered.

Once you have manageable small chunks to write, it is always easier to tackle them – and you never stress about having giant blank sections in front of you.

Communication & Feedback: Well, this is more of a wish list for my own thesis, but it would have made progress go quicker.  Once I had my outline of headings, I asked my supervisor to let me know if the topics I had included and left out were appropriate.  Keeping that loop tight can save time – particularly with the comments that said: “Please discuss topic X.”

Working with those who can provide you feedback is always a good idea – and the earlier you get it in the process, the easier it is to fix or prevent problems.

Keep track of your Progress: I found it very useful to keep a list of tasks (aka, sections) that I needed to write, and every time I finished a section, I could check it off the list.  Every time I did a revision, I’d write down a list of changes and then plow through that list, checking completed items off one at a time.  Not only is it better for making sure you aren’t missing things, it’s also easier to track your progress – which is the lifeblood of any major undertaking.

It becomes much easier to tell when you’re falling behind and need to pick up the pace, and it allows you to figure out what works best for you.

Set Goals and hold yourself Accountable: Yes, that may sound like fluff, but it made all the difference for me.  A colleague of mine is also working towards his thesis, and we came up with a great system where we meet once a week to share our accomplishments, and to admit our failures.  For added motivation, failing to accomplish your week’s goals got you a mark on the “beer list” – meaning you owe a beer to the other person.

Admittedly, owing a beer isn’t a huge motivation, but it does remind you of the consequence of not achieving your goals.  And missing your target too often will be a problem down the line, so this is a good way to make sure you’re staying on target.

I also have to say that in a project this big, there’s always the temptation to put off a goal for later because it’s just sliding by one day – and the final goal of the project is a long way away.  Don’t fall for it!  Those days add up, just like the beers.  Set your goals realistically, and then hold yourself to them!

Guilt management: This one always gets me.  When you have a project like this, it’s tempting to think you need to work on it 24/7.  Realistically, that’s not going to happen, so figure out what schedule works for you.  My trick was to get up, spend 20 minutes reviewing my goals for the day – and then take a shower.  I do a lot of my best thinking in the shower, so it always helped me plan out what I wanted to write.  I’d then eat breakfast, while reading the day’s news… and then it was time to work.  On a good day, I’d work from 10:00 – 4:00 without much more than a few breaks.  Having put in a pretty solid 6 hours of writing, I’d go walk the dog and remind myself that that was a good day.  If I had goals to get done (see above), I’d return to it after dinner.

You are not expected to neglect your life to get your thesis done – but you are expected to focus on it during your prime working hours!  If you set aside a reasonable amount of time every day and meet your goals, that’s good enough.

Focus: That said, when you do set aside time to write, try to do it somewhere without distractions.  For me, a desk set in the corner of the room, facing a window that looks at nothing in particular was the trick.  Nothing outside to distract me, natural light to keep me awake and nothing around me to pull my attention away (except the cat) helped me get my focus and keep it for long periods of time.

Exercise: I can’t stress this one enough, really.  For me, getting out of the house a couple times a week for a good evening of fencing was really therapeutic.  When someone is swinging a sword at you, there’s just no room in your head for organizing chapters.  Setting aside my Monday and Thursday evenings to not think about thesis work, to get myself away from being sedentary in front of the desk, made a big difference. To paraphrase one of my favorite books (Microserfs), your body is not just a transportation unit for your brain!  Don’t forget to take care of it.

Cats: Do not let your cat write paragraphs for you!  Mine took a nap on the keyboard, and added 6 pages of “zzzzzzzzzzzzzzzzzz” to my thesis. Which leads me to:

Backups: Actually, since I was using LaTeX, I used revision control (SVN) to manage my documents.  Every hour or so, I’d check in my documents to make sure I had a copy.  I only ever used it once, but the one time I needed it, it was there – and it saved me several hours of trying to recreate something I’d lost.
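
If you’d rather not type the check-in by hand every hour, a few lines of script will do it.  The following is only a sketch, assuming the thesis directory is already an SVN working copy and that the svn command-line client is installed; the function names and the log message format are made up for illustration.  Note that brand new files still need an `svn add` before a commit will pick them up.

```python
import subprocess
from datetime import datetime
from typing import List, Optional


def checkin_command(working_copy: str, when: Optional[datetime] = None) -> List[str]:
    """Build an `svn commit` command with a timestamped log message."""
    when = when or datetime.now()
    message = f"Thesis snapshot {when:%Y-%m-%d %H:%M}"
    return ["svn", "commit", working_copy, "-m", message]


def checkin(working_copy: str) -> None:
    """Run the commit; raises CalledProcessError if svn reports a failure."""
    subprocess.run(checkin_command(working_copy), check=True)
```

Calling checkin("path/to/thesis") from a scheduled job (or just before you stand up for coffee) gives you the same hourly safety net without having to remember it.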

Figures: Keep two copies – the original and the version you use in your thesis. And, keep them in the same place so you can always find them.  You will need to go back to the original many, many times, so having it handy – and separate from the one you’re using in the document – can be a big help.

Remember you’re the Expert: The work you’re writing about is work you did.  No one knows it better than you do.  Write it out as if it’s a manual for someone who’s going to follow in your shoes… because someone probably will, and this is how they’ll continue on your legacy.

Enjoy it: Thesis writing is hard work, but so were the years you put in to get to this point.  Try to enjoy the process and make it as pleasant as possible for yourself – short of taking your laptop into the bathtub, of course.

I’m sure there are other things, but that’s all that comes to mind so far.  Feel free to add to the list in the comments.

Updates – Oct 2011.

So, my thesis with the requested changes has gone back to my supervisor.  The process we have going is pretty…  unstructured.  At this point, I’m not sure what happens next.  In theory, it should now go to my committee, but who knows when that will happen. My external examiner isn’t scheduled to get it until mid-December, which means my thesis defense can’t be scheduled until February at the earliest. Thus, I more or less have 4 months of thumb twiddling penciled in, unless my committee decides they want me to do another experiment. (And I’d probably have some choice words about that plan.)

So, with that said, I have a couple projects to “wrap up”, if you consider it wrapping up to be a) starting a project that won’t appear in my thesis, and b) doing some maintenance work on an open source project that my committee disregards because it’s not biology.  At worst, that’s about a week’s worth of programming (probably 10-15 hours, really), and getting things organized for someone in the lab.  (Hopefully a few emails I can take care of this afternoon.)

So, that has left me pretty focused on the post-thesis phase.  While I do have plans, I’m just waiting for an airplane ticket to be booked before I announce things. Until there’s an actual date, I’m not sure I feel comfortable spreading the news just yet, just in case everything suddenly crashes down and things don’t work out the way I expect them to.  All I can say, for the moment, is that I’m incredibly excited by the job description and the opportunity to work with the people I have already met, and of course, to meet everyone else there.

(I should mention that I was horribly jet lagged when I met them all the first time, but they all left a great impression, even if I’ve forgotten a few names…)

So, on to blogging, which is the next big thing that I’m mulling over.  First, I haven’t discussed blogging with my prospective employer, so I’m not quite sure where things will go.  Of course, I don’t think it’s appropriate to blog about one’s workplace in any great detail, but some corporate bloggers have done a great job of it by discussing issues important to the work place.  In any case, I can see myself doing a few things:

  1. Continuing along the same path, and blogging about next gen sequencing – with a slightly more corporate bent. (which is a hint about what I’ll be doing next.)  There will be plenty of NGS related topics that I’ll be watching, and I’m certain to have an opinion on many of them.  (Who’da guessed?)
  2. Continuing along the same path, but diversifying to other topics in science so as not to focus on science tangential to my work.
  3. Adding on new topics about moving to new lands (is that a hint?).
  4. Adding on new topics about things more personal to me. (photography, music, etc.)

Yes, my version of foreshadowing is a bit heavy handed.

And, last but not least, the other two things on my mind:

I’m seriously considering releasing drafts of my thesis on the web.  Now that my supervisor has agreed that my biology project is not likely to lead to a publication (and yes, that was the bulk of my work for the past year), it’s unlikely to meet much resistance.  I won’t do it until I get a go-ahead from those affected by the work, but it’s a project I’m working on, as I’d love to get more feedback.

And, finally, I had plans to write out summaries of some of the papers I’ve read on my way to wrapping up my thesis.  I haven’t decided if I’ll do this yet, but it would probably be a great way to help study for my defense.

In any case, that’s what’s on my mind this afternoon. And now that I’ve gotten it all down, I can clear my mind and get back to some of the other things I’ve been neglecting this past week.  Whee… errands!

(K)Ubuntu – Oneiric Ocelot is out… meh.

Ok, I wasn’t going to blog until I’ve finished my thesis corrections, but I’ve spent the whole day formatting protein/gene names correctly, and I’m showing signs of brain calcification.  So, I’m going to do one post that has nothing to do with science.  I’m saving those for later.

Instead, I thought I’d share a few thoughts on Ubuntu’s latest release: Oneiric Ocelot (11.10).

First, I have a few caveats.  I’ve been using Oneiric for several months.  It’s something I like to do: load up alpha versions and watch them develop.  It keeps you sharp: when you have to troubleshoot things, you learn lots about how the operating system works and you get the pleasure of finding new software improvements all the time.  What’s not to like? At this point, I’ve been running alpha and beta versions on my laptops for about 4 years, and it has been an enjoyable process – well, at least it was until Natty (11.04).

For the past year, I’ve been increasingly disappointed in Ubuntu because more things get broken than fixed as the development goes along. But, I should explain a few things:  I don’t use the vanilla Ubuntu.

First, I use Kubuntu.  I’ve flip-flopped between KDE and GNOME a few times, but I always find myself gravitating back to KDE.  The ability to customize things so deeply has always kept me coming back. (For instance, I remapped the Eject button on my MacBook to the eject command last night in the KDE settings panel, and I think that’s the cat’s pajamas.  I’ve done similar things to enable my keyboard backlight as well.  Stuff like that just makes me happy.)

Second, I love compiz.  I find KDE’s effects kinda lackluster, whereas compiz has a few modules that make me more productive.  I love scale and the desktop cube, because I think well in terms of a 3D desk, and it just makes it easy for me to remember where I left my windows.

So, with that in mind, my ideal desktop is Ubuntu + KDE + compiz.  A combination that ran REALLY well in 10.10.  Actually, that was the last time it did run well – which is a big part of my beef.

After several months of watching Oneiric evolve, I’m really disappointed. I was hoping for many more bug fixes before today’s release, but I just got more and more bugs.  Here are some of the ones that annoy me on a frequent basis:

  • Compiz no longer runs smoothly in KDE.  Flickering and artifacts that were all cleaned up in Maverick are all back with a vengeance.
  • Compiz has been broken so that it hangs repeatedly whenever KDE panels are used – particularly stuff like the new activity manager, or even just resizing a panel.
  • Compiz itself now prevents panels from appearing when it’s set as the default window manager, so that you must kill compiz and start the KDE window manager on every boot just to see your panels.
  • KDE itself has been broken so that the window manager crashes on EVERY single exit.  You can’t shut down the computer without having to hit close on two separate “KDE window manager has crashed” windows.
  • There’s the dreaded “Kernel is taking too much power” bug that was only recently “fixed”.  Actually, it wasn’t fixed, but with a few custom kernel parameters, becomes manageable.
  • Oneiric decided to go with the brcmsmac driver for my wireless card.  The driver works fine, but it’s in developmental stage, so there’s no power management for the driver, making it take up 3-5W of power.  For a laptop, that’s just inexcusable – it’s about 30-50% of my total battery draw (usually 11-15W after a lot of manual tuning)!
  • Something totally botched up the Nvidia dual monitor support. Even a month ago I was able to drive a second monitor from my video card, but it now fails to do so reliably.  I gave up on the second monitor, because turning it on and off 10-15 times in a row in the Nvidia settings panel just to get it to work is not a solution I’m willing to put up with.
  • Then there is the move to “alternate architecture” – I’m still not sure how this is supposed to work, but try getting Skype to work.  It’s “fun”.  (Fun being defined as a pain in the ass that involves manually installing i386 libraries that are automagically removed every time you upgrade a package, because the 64-bit version just plain fails to work at all.)
  • Oh, and now Skype can’t see my microphone, but I’m not sure that’s an Ubuntu problem, although I doubt anything will be fixed now that Microsoft has bought Skype. (Again, not an Ubuntu problem…)

Anyhow, you get the idea.

It seems that KDE isn’t a priority for Ubuntu developers, and worse, I don’t think that Ubuntu devs are even aware of the breakage they’ve caused in compiz and KDE while re-purposing it for Unity.  Lack of testing might be one problem, but I suspect that they’re not really even interested in keeping compatibility – which was always one of the core virtues of GNU/Linux for me: interchangeable parts.  I have no interest in switching to Unity, but I wish they wouldn’t break everything else for me in the rush to get Unity working for themselves.

Alas, while I’m going to keep using Kubuntu for a little while longer, my love for the Ubuntu distros is fading.  I love bleeding edge, but I’m not a fan of this rampant (avoidable) breakage.

So, my advice – stay away from Kubuntu Oneiric Ocelot – it’s not worth the pain.  With any luck, some of these bugs will be fixed for the LTS release Pulverized Penguin… er, Precise Pangolin.  But I won’t be holding my breath.