Bioinformatics toolchain

Once again, it’s a monday morning, and I’ve found myself on the ferry headed across the bay, thinking to myself, what could be better than crowdsourcing my bioinformatics toolchain, right?

Actually, This serves two purposes: It’s a handy guide for myself of useful things to install on a clean system, as well as an opportunity to open a conversation about things that a bioinformatician should have on their computer. Obviously we don’t all do the same things, but the concepts should be the same.

My first round of installs were pretty obvious:

  • An IDE (Pycharm, community edition)
  • A programming language (Python 3.6)
  • A text editor (BBEdit… for now, and nano)
  • A browser (Chrome)
  • A package manager (Brew)
  • A python package manager (pip)
  • A some very handy tools (virtualenv, cython)
  • A code cleanliness tool (pylint)

I realized I also needed at least one source code tool, so the obvious was a private github repository.

My first order of business was to create a useful wrapper for running embarassingly parallel processes on computers with multiple cores – I wrote a similar tool at my last job, and it was invaluable for getting computer heavy tasks done quickly, so I rebuilt it from scratch, including unit tests. The good thing about that exercise was that it also gave me an opportunity to deploy my full toolchain, including configuring pylint (“Your code scores 10.0/10.0”), and github, so that I now have some basic organization and working environment. Unit testing also forced me to configure the virtual environment and the dependency chains of libraries, and ensured that what I wrote was doing what I expect.

All in all, a win-win situation.

I also installed a few other programs:

  • Slack, with which I connect with other bioinformaticians
  • Twitter, so I can follow along with stuff like #AMA17, which is going on this weekend.
  • Civ V, because you can’t write code all the time. (-:

What do you think, have I missed anything important?

5 thoughts on “Bioinformatics toolchain

  1. What about docker? Conda/Bioconda are also pretty nice to use for managing libraries.. I’ve been playing with AWS Batch which uses docker as part of the deployment. It’s pretty cost effective solution (use spot instances) for some embarrassingly parallel problems… If you are doing any workflows, nextflow is pretty nice to use and they now support AWS Batch..

Leave a Reply

Your email address will not be published. Required fields are marked *