The glamour of Pipeline bioinformatics

I’m going to have to eat a bit of humble pie.  When I was a grad student, I may have just slightly looked down on “pipeline bioinformatics”, thinking it was a subject that was boring.  It clearly wasn’t as glamorous as designing new algorithms or plucking hidden bits of information out of giant data sets… I may have even thought it was something you just did as an afterthought.

I was wrong.

I have to admit, now that I’ve had a taste of it, I’m enjoying it for exactly the opposite reasons:  It’s a fascinating game of balancing everything you know about computers and biology all at the same time, while making sure you get the right answer consistently.  It’s a cross between doing jigsaw puzzles and playing Jeopardy…  and I’m kinda liking it.

In order to build a good pipeline, you need infrastructure that glues all the parts together, you need planning to make sure that it has room for growth, and you need to know what constraints the pipeline will face…  And you need to understand how everything, from the bits of data you’re pushing through it to the hardware on all of the machines and wires it’s going to run on, will interact.  That’s no small feat – but it’s an exhilarating challenge.

While I may have thought algorithm design was the cat’s pyjamas, building a pipeline is the same resource management challenge scaled up to include a whole lot more moving parts.  And, to those who manage all of those working parts, I finally grok what it is that drives you.  I am only working on a pipeline that was assembled by others, not even one of my own creation, which just increases my respect for those who have built pipelines out of nothing:

The thrill of watching data cascade through the waterfall that is the pipeline.

The excitement of having each individual piece operating in harmony, squeezing out that last bit of performance.

The fun of adding in three more pieces you thought would never fit, but making it work.

The satisfaction of knowing you managed to tame the mangy electrons that seemed so unruly before they entered your pipeline.

The reward of having someone look at the data afterwards, and learning something new from it.

Yes, pipeline bioinformaticians, I owe you an apology: your product is a magnificent work of art in its own right – and it is only truly completed when people are able to forget that it’s there. Cheers to you!

 

How do you become a bioinformatician?

I’ve been following the bioinformatics subreddit for the past couple of months, ever since I stumbled upon it when a colleague asked me about bioinformatics resources on the web.  It’s a fascinating place to visit, but it’s incredibly repetitive in that people keep asking “How do I become a bioinformatician?”

Unfortunately there is not a single answer, because bioinformatics isn’t a single job – it’s a collection of people who have found a way to live with one foot in each of two worlds: computer programming and biology.  Getting a firm footing in each can be a serious challenge, as people spend years studying just one of those to become proficient at it.

However, I think there are some common threads that tie the field together.  You need to invest the time in at least a handful of basic fields: some basic programming, some elementary cell biology and at least a simple understanding of math or statistics.  You can be incredibly productive with just that little, mostly in terms of automating data processing or modelling your results.

On the other hand, bioinformatics also includes a lot of sub-disciplines.  Great programmers can build incredible pipelines.  Great mathematicians can invent or apply algorithms to create new ways of interpreting data, and great biologists can develop heuristics and re-interpret data in new ways to generate insights that others have overlooked.  There’s even room for “neat freaks” in organizing and imposing order on unruly data.

The challenge of becoming a bioinformatician is learning where your strengths and weaknesses lie, and using them to your advantage.  Finding a research group that shores up your weaknesses – or helps you fill them in – can be a great boost to your career.  After my master’s degree, I felt I had two big gaping holes in my resume: big data and databases, which I made the focus of my PhD research. Coming out of my defence, I felt I was able to bring a more balanced approach to the table – and had simultaneously purged any instinct I might have ever had to reach for a spreadsheet to interpret information. (Spreadsheets and big data don’t mix.)

So, where does that lead an aspiring bioinformatician?  Unless you take the time to do both a computer science degree and a biology degree, you probably won’t be able to shoehorn everything in to become an expert in both, and not everyone wants to get their PhD to fill in the gaps left in an undergrad education.

With that said, let me lay down a few useful points:

  1. Pick and choose subjects to study that interest you, because you’ll at least end up with strengths in things you enjoy, which leads to jobs doing things you enjoy.
  2. You can always learn something new later… but take opportunities to try new things when they come.
  3. Remember that you’re not going to be the expert in every field you put your foot into – so look for opportunities to collaborate with the people who are.  (If you’re going into bioinformatics and expect to do everything yourself, you’re probably doing it wrong.)
  4. Don’t be afraid of the fact that you don’t know stuff.  Your job isn’t to be the best biologist and best computer scientist at the same time – it’s to be the bridge between.  The stronger your foundations, the better a bridge you can be, but unlike a concrete bridge, you can always invest in learning more.
  5. Yes, higher education does help in this field.  Bioinformatics is still dominated by research-based organizations, and the academic hierarchy saturates the mindset of bioinformaticians everywhere.  (Or, almost everywhere.)
  6. Bioinformatics is also about the “soft” skills.  Don’t forget that bioinformaticians are also in a good place to be good leaders – since you’ll be one of the few people who can speak both languages, and tie together groups that would otherwise lack a common language.
  7. Don’t believe the hype about what you should learn:  R isn’t really the only language for doing bioinformatics, Perl isn’t always evil (just most of the time, though it did save the human genome…), Java isn’t the slowest language out there, and C isn’t only for hardcore programmers. (Python, though, is a pretty good all-around language.)  Everyone has an opinion on where bioinformatics is going – but it’s just an opinion, so make your own choices.

In the end, I always give students the same piece of advice:  As you go through life, you will learn new skills that you can apply as you see fit.  Each of these skills will be a tool in your toolbox that you can turn to when you hit a problem.  If you only have a hammer in your toolbox, your repertoire is pretty limited.  On the other hand, if you collect a fantastic assortment of tools, you’ll be equipped to handle just about anything that comes your way.  Your job is to invest your time into building the best toolkit you can, so that when you get out of school, you’ll be ready to solve as many problems as you can.

Bioinformatics is just a special case of toolbox building, in that you need the tools of at least two disciplines in your toolbox.  What you choose to put into your toolbox is entirely up to you, but (to stretch the toolbox analogy just a little too far) take a few minutes to ask if you’d like to be a plumber or a carpenter before you start collecting your tools. Or, without the metaphorical toolkit, ask yourself what kind of bioinformatician you want to be.

Once you know the answer to that question, you’ll figure out pretty quickly which tools you want to start collecting.  And the path towards becoming a bioinformatician will start to become clear.  It may not take you where you expect, but I can guarantee that you’ll be walking down an interesting road.

Ikea furniture and bioinformatics.

I’ll just come out and say it:  I love building Ikea furniture.  I know that sounds strange, but it truly amuses me and makes me happy.  I could probably do it every day for a year and be content.

I realized, while putting together a beautiful wooden FÖRHÖJA kitchen cart, that there is a good reason for it: because it’s the exact opposite of everything I do in my work.  Don’t get me wrong – I love my work, but sometimes you just need to step away from what you do and switch things up.

When you build Ikea furniture, you know exactly what the end result will be.  You know what it will look like, you’ve seen an example in the showroom and you know all of the pieces that will go into putting it together.  Beyond that, you know that all the pieces you need will be in the box, and you know that someone, probably in Sweden, has taken the time to make sure that all of the pieces fit together.  Not only is it possible to build whatever it is you’re assembling, but you probably won’t damage your knuckles putting it together because something just isn’t quite aligned correctly.

Bioinformatics is nearly always the opposite.  You don’t know what the end result will be, you probably will hit at least three things no one else has ever tried, and you may or may not achieve a result that resembles what you expected.  Research and development are often fraught with traps that can snare even the best scientists.

But getting back to my epiphany, I realized that now and then, it’s really nice to know what the outcome of a project should be, and that you will be successful at it, before you start it.  Sometimes it’s just comforting to know that everything will fit together, right out of the box.

I’m looking forward to putting together a dresser tomorrow.

A bit of blogging

I’m more or less sure everyone has forgotten this blog by now… but that’s not a bad thing, really.   I don’t think I had much to say, and life has had a way of keeping me busy. Papers, work, changing work, changing diapers, all of it somehow keeps me from getting a lot of sleep, and that keeps me from having the motivation to write much.

However, I thought I’d start jotting down a few things that are interesting, as I come across them.  One that I’ve recently discovered is that reddit has a bioinformatics subreddit (www.reddit.com/r/bioinformatics), which has been inspiring me to start writing again.

The other is that I’ve learned a LOT about MongoDB recently, which I would like to start writing about.  Mostly under the “lessons learned” category, because scaling up software is just like scaling up in the lab – it doesn’t just work.  Scaling things is tough.

Otherwise, I have a move to Oakland coming up, and there will probably be a few Goodbye Vancouver/Hello Oakland posts as well.  Somehow, I think the urge to write is coming back, and I haven’t had that spark since Denmark ripped it out of me.  Perhaps that’s just a bit of optimism coming back.  I wouldn’t object to that.

What is a bioinformatician?

I’ve been participating in an interesting conversation on LinkedIn, which has re-opened the age-old question of what a bioinformatician is.  It was inspired by a conversation on Twitter that was later blogged.  Hopefully I’ve gotten that chain down correctly.

In any case, it appears that there are two competing schools of thought.  One is that a bioinformatician is a distinct entity, and the other is that it’s a vague term that embraces anyone and anything that has to do with either biology or computer science.  Frankly, I feel the second definition is a waste of a perfectly good word, despite being a commonly accepted usage.

That leads me to the following two illustrations.

How bioinformatics is often used, though I would argue that it’s being used incorrectly:

[Image: bioinformatics_chart2]

And how it should be used, according to me:

[Image: bioinformatics_chart1]

I think the second clearly describes a specific skill set that just isn’t captured by anything else.

In fact, I have often argued that bioinformatician is really a position along a gradient from computer science to biology, where your skills in computer science determine whether you’re a computational biologist (someone who applies computer programs to solve biology problems) or a bioinformatician (someone who designs computer programs to solve biology problems). Those, to me, are entirely different skill sets.  And although bioinformaticians are often the ones who end up implementing the computer programs, implementation is yet another skill, one that can be handled by a programmer who doesn’t understand the biology.

[Image: bioinformatics_chart3]

That, effectively, makes bioinformatician an accurate description of a useful skill set – and further divides the murky field of “people who understand biology and use computers” – which is vague enough to include people who use Excel spreadsheets to curate bacterial strain collections.

I suppose the next step is to get those who do taxonomy into the computational side of things and have them sort us all out.

On 23andMe v. the FDA

Ok, it’s not really a court case… yet. However, from what I’ve read, it’s a pretty adversarial interaction. I’ve read a bunch of articles on the topic, already, and I have to say I’ve yet to see anyone state what I think is the obvious issue with the approach the FDA has taken.

They’re not regulating the equipment that does the testing.
They’re not regulating the interpretation of the information.

What’s left is that they appear to be regulating the business model. It’s ok to do exactly what 23andMe is doing, but it’s not ok to do it if the consumer is uneducated. Were they handing the tests to an MD (who may or may not know what to do with the information) or a researcher (who may or may not have the ability to tell the subject of the test what the results are), it would be fine. As soon as it’s being handed over to a general consumer, it’s now going to be regulated.

I find that pretty hard to swallow.

If the FDA wants to regulate it as a medical device, then fine – regulate access to the medical device itself, and don’t try to regulate the burgeoning field of information interpretation and dissemination.

(Sorry for the lack of links – it’s been a busy week.)

I’ve landed.

So, I think it’s time to return to blogging.  I’ve started in a new group and have begun feeling my way around in a new area – so, for those who followed me for Next Gen Sequencing in the past, you may be surprised that it’s likely to play a diminished role in my new position.  I don’t think I’m done in NGS, but it looks like I’ll have a little break from it, until my new group completes a few upgrades, at least.

Are you curious?  I’m working at the CMMT in Vancouver, in the Kobor Lab.  I can’t say enough how awesome this group is, and how welcoming they’ve been (and I don’t even think they read my blog…).  I’ll also likely be collaborating with a few other groups here – but the extent of that is yet to be determined.

So what will feature prominently in my blog?  Well, that’s a good question.

It seems like ChIP-Seq will come back.  I don’t think I’ll be returning to FindPeaks – I’ve got better ideas and more interesting plans that I hope to move forward on. It seems likely that I have more to contribute in this particular area, so I expect I’ll be starting a new code base that deals a bit more with the statistics of ChIP-Seq.  The FindPeaks code base has become a bit too big for rapid prototyping, so it’s time to step out of it to move forward.

I’m sure that epigenetics will take a front row seat in my work. That’s a major focus in this group, both for histones and DNA methylation, so I can’t see it not playing a significant part.  (I’m looking forward to working with methylation, which I’ve never done before…)

I’ll probably be working with Python – I’ve been thinking that it’s time to move away from Java.  Not that there’s anything wrong with Java, but I’ve heard really good things about Python, and I’m excited to start a language that seems to fit a little more naturally with the way I’d like to approach the problem.

I’m hoping to work with open source…  well, that hasn’t been discussed much yet, but I still believe strongly in the open source philosophy – particularly in the academic world.  I’d rather not work on closed source code in this environment.

I’ll also likely be working with a bit of yeast genomics – it’s a great model system, and there’s still a lot to learn about regulation and epigenetics in that particular organism.  And there’s always the tie-in to beer.  That doesn’t hurt either.

At any rate, things are still evolving, and I have a 500 MB stack of papers to read (yes, I’m saving paper), but I think that I’m back.  I may do a few reviews of the subjects I’ll have to read up on, which include the epigenetics of healthy ageing and childhood development.  Oddly enough, I think we can learn things at both ends of the human age spectrum, so why not?

And yes, I’ll try to keep the disparaging comments about Denmark to a minimum from now on, but I can’t promise there won’t be any.  Not, at least, till the lawyers finish working out who’s owed what.

Starting over

I’m back.  Physically, I’m back in Canada, although not yet home.  I’m visiting family while all our possessions make their way back to Vancouver.  In the meantime, I wanted to get back to blogging.  To re-engage in the community and return my life to some sense of normalcy.

On Denmark, I don’t plan to say much.  It was a terrible experience from start to finish, and I’m leaving with less money, less stability – and none of the bioinformatics experience I had wanted.  All in all, it was a disaster.  If people want specific details or advice about moving to Denmark, of course I’ll share what I know, but this isn’t the right forum for it.  For the moment, I won’t comment on how things went down at the end, although I’ve heard less than accurate versions in circulation.

On the subject of bioinformatics, I feel a bit out of touch.  I’ll be starting to get back into it shortly.  Obviously, it’ll take some time to ramp up and get back into the swing of things.  However, I can say that last night was the first time in a year that I had actual free time. So what did I do?  I started to learn Python.  Honestly, I don’t think Java is the right tool for all occasions, and with about a month of downtime, Python just feels like it might be the best fit for some of the stuff I’ll be working on in the future.

Anyhow, with any luck, things will start to work their way out.  At least, being back in Canada, I can see the light at the end of the tunnel.

Starting again

I’m sure, if you read my blog, that you’ve noticed a conspicuous absence of posts lately.  There were two main reasons for the “gap”.

The first is that I haven’t been blogging about bioinformatics because I didn’t want to blur my work with my blog.  It’s a challenging line to walk, and maintaining it requires a lot of late evenings, which have lately been sucked up by my daughter.  Only in the past week has that time started to be available again. (Thank goodness my daughter is sleeping well, finally!)

The second is that my family has been making some big decisions.  The biggest of the bunch is that we’ll be returning to North America.  Denmark hasn’t worked out for us, and this is really the only logical decision we could make.  Our original timeline was for three years in Denmark, but we’ll be cutting it down to one year.

Yes, that does mean I’m officially looking for a new job for the start of 2013, either in Vancouver or San Francisco – the two places that my family would be happiest.  If anyone knows of a company looking to hire a bioinformatician with experience in Next-gen sequencing in either of those cities, please let me know!  As I’ve discussed with my employer, I’ll be working in Denmark until the end of the year, but will be available in January.

For the moment, I’m making arrangements to do a post-doc, but I’m not sure that’s really a step forward for my career.

As for why things haven’t worked out here, I think it’s a combination of a lot of factors, but most obvious is that Denmark is really a hard country to find your way in.  The language is challenging, and with a young child, we haven’t been able to dedicate the time to the intensive language classes that are available, and we also haven’t been able to really find a social network that comes close to replacing the ones we’d left behind in Canada.  With our closest family members being a minimum of 18 hours away by plane, this just isn’t working well for us given that we have a young child.

Of course, there are other factors, ranging from the trivial (and silly) to complex, which include (in no particular order):

  • Accessibility (It’s much harder to get around in Denmark with a young child, compared to North America, particularly since Canadians cannot get a driver’s licence here.)
  • Availability (We’re forced to order from the UK or further away to buy many of the products we need/want for our child because they’re just not available in Denmark.)
  • Comfort issues (Denmark doesn’t believe in bathtubs, for instance, and I haven’t had a shower in Denmark that didn’t involve either frozen toes or scalding blasts of hot water since arriving.)
  • Pet health (We still haven’t found a replacement food for the cat after searching for half a year.)
  • Community (Everything shuts down on Sundays, which is devastating when you’re living in a “small” town and only have a handful of friends.)

All in all, however, I think these are simply things we take for granted in North America, and a year in Denmark has been an expensive education in recognizing how differently Europeans and North Americans see the world.

Thus, I’ll be returning to North America with a lot more life experience and hoping that someone out there will want to put it to good use.