New spam for breakfast

I received an interesting piece of spam this morning. It came among the usual flurry of easily filtered spam, which is composed mainly of people trying to do SEO (search engine optimization) to push their links to the top of the search results.  That is to say, it mostly consists of a bogus comment and a link to something like an online pharmacy.

This morning, the link was a surprise…  check it out:

Author : Jaelyn (IP: 173.230.129.176 , li169-176.members.linode.com)
E-mail : www.droman827@misterpaws.net
URL    : http://www.bing.com/
Whois  : http://whois.arin.net/rest/ip/173.230.129.176
Comment: I'm not easily irpemssed but you've done it with that posting.

You’ll notice the usual typo in the comment, which is supposed to help it get past the filters (and did, in this case).  More surprisingly, the IP actually traces back to someone’s web page – just a random blog. My guess is that the computer hosting that blog has a virus that is pumping out the spam.

The most unusual thing about this is the combination of two facts:

  1. the web site it’s promoting is bing.com, and
  2. if it is a virus doing the promoting, it’s almost guaranteed to be running on a Microsoft computer.

If I were into conspiracy theories, I’d wonder if Microsoft has now taken to paying virus creators to promote its web site using viruses that target Microsoft computers.

Yeeeesh.  Even Microsoft couldn’t sink that low…  but really, I would like to know who is behind this campaign.  Promoting Bing through spam comments is already pretty despicable – but not something I’d put beyond Microsoft.

Womanspace – last lap.

I wrote a comment on Ed Rybicki’s blog, which is still awaiting moderation.  I’m not going to repeat what I said there, but I realized I had more to say than what I’d already written.  Specifically, I have much more to say about a comment he wrote on this article:

PS: “why publish something that you don’t believe in is another story” – no, it’s just that science fiction allows one to explore EVERYTHING, including what you don’t believe in.”

Ed makes a great point – science fiction is exactly the right vehicle for exploring things that you don’t believe in.  Indeed, it’s been used exactly that way since the genre was invented.  You could say that Gulliver’s Travels was a fantastic use of early science fiction, exploring a universe that mocked all sorts of contemporary idiocy that the author (Swift) disagreed with.

So, yes, I see Ed’s point – and he has a good one.  However, I’m going to have to disagree with Ed on the broader picture.  Science fiction is perfect for exploring issues that you don’t believe in precisely because you can transplant them into similar or parallel situations where their flaws become apparent.

For instance, if you want to write about how terrible apartheid is, you don’t set a science fiction novel in South Africa in the 1990s; you set it on another planet where two civilizations clash – and you can explore the themes away from the flashpoint issues that are rife in the real-world conflict. (Orson Scott Card explores a lot of issues of this type in his novels.)

The issue with Ed’s article – and there are plenty of them to choose from – is that he chose to engage with the lowest form of science fiction: inclusion of some “vaguely science-like device” that offers no great insight into anything.  Science fiction, as a vehicle, is all about where you take it.

The premise would be equally offensive if he had picked a race (“Filipinos only get by because they have access to another dimension to compensate for their height”), a religion (“Christians use another dimension to hide from criticism leveled at their holy book”), or an age (“anyone who can hold a job after the age of 65 is clearly doing so because they’re able to access another dimension”).

Ed could have made much better use of the vehicle he chose to drive.  He could have invented an alien species in which only one gender has access to the dimension; he could have used the alternate dimension to enable women to do things men can’t (and no, I don’t buy that men can’t shop efficiently); or he could have used his device to pick apart the injustices that women face in competing with men.

Instead of using his idea to explore the societal consequences of the plot device, he uses it to reinforce a stereotype.

That, to me, is not a good use of science fiction.  And the blame doesn’t just go to the author – it goes to the editors.  As a long-time reader of science fiction, I can tell when a story doesn’t work and when it fails to achieve its desired effect.  This story neither works nor causes anyone to question their own values.  (It does, however, make me wonder about the editor’s judgment in choosing to print it, as well as the author’s judgment in allowing it to be printed in a high-profile forum.)

So, let me be clear – I despise the use of the stereotypes about women that Ed chose to explore. That he believes exploring gender issues this way is any less sensitive than exploring race, religion or age would be is ridiculous – and shows a measure of bad judgement.

Having come up with a great tool (alternate dimensions) for making a comment on society (women and men aren’t treated equally), he completely missed the opportunity to use the venue (science fiction) to set the story in a world where he could have explored the issue and shown us something new.  In essence, he threw away a golden opportunity to cause his audience to ask deep questions and take another look at the issue from a fresh perspective – exactly what science fiction is all about.

Ed’s not a villain – but he’s not a great science fiction writer either.

Where’s the collaboration?

I had another topic queued up this morning, but an email from my sister-in-law reminded me of a more pressing beef: lack of collaboration in the sciences. And, of course, I have no statistics to back this up, so I’m going to put this out there and see if anyone has anything to say on the topic.

My contention is that the current methods for funding scientists, mixed with a healthy dose of zero-sum-game thinking, are the culprit driving less efficient science.

First, my biggest pet peeve is that scientists – and bioinformaticians in particular – spend a lot of time reinventing the wheel.  How many SNP callers are currently available?  How many ChIP-Seq packages?  How many aligners?  And, more importantly, how can you tell one from the other?  (How many of the hundreds of SNP callers have you actually used?)

It’s a pretty annoying aspect of bioinformatics that people seem to feel the need to start from scratch on a new project every time they say “I could tweak a parameter in this alignment algorithm…” – and then off they go, writing aligner #23,483,337 from scratch instead of modifying the existing one.  At some point, we’ll have more aligners than genomes!  (OK, that’s shameless hyperbole.)

But the point stands.  Bioinformaticians create a plethora of software packages that solve problems that are not entirely new.  While I’m not saying that bioinformaticians are working on solved problems, I am asserting that writing a novel software package is an inefficient way to tackle a problem that someone else has already invested time and money into building software for. But I’ll come back to that in a minute.

But why is the default behavior to write your own package instead of building on top of an existing one?  Well, that’s clear: publications.  In science, the measure of your progress is how many journal publications you have, skewed by some “impact factor” for how impressive the name of the journal is.  The problem is that this is a terrible metric for judging progress and contribution.  Solving a difficult problem in an existing piece of software doesn’t merit a publication, but wasting four months rewriting a piece of software DOES.

The science community in general, and the funding community more specifically, will reward you for doing wasteful work instead of focusing your energies where they’re needed. This tends to squash software collaborations before they can take off, simply by encouraging a proliferation of useless software that is rewarded because it’s novel.

There are examples of bioinformatics packages where collaboration is a bit more encouraged – and those provide models for more efficient ways of doing research.  For instance, in the molecular dynamics community, Charmm and Amber are the two software frameworks around which most people have gathered. Grad students don’t start their degree by being told to rewrite one or the other package; they are instead told to learn one and then add modules to it.  Eventually the modules are released along with a publication describing the model.  (Or left to rot on a dingy hard drive somewhere if they’re not useful.)  Publications come from the work done and from explaining the algorithmic modifications.  That, to me, seems like a better model – and it means everyone doesn’t start from scratch.

If you’re wondering where I’m going with this, it’s not towards the Microsoft model, where everyone does bioinformatics in Excel using Microsoft-generated code.

Instead, I’d like to propose a coordinated bioinformatics code base.  Not a single package, but a unified set of hooks.  Imagine one code base, where you could write a module and add it to a big git hub of bioinformatics code – and re-use a common (well-debugged) core set of functions that handle many of the common pieces.  You could swap out aligner implementations and have modular, common output formats.  You could build a ChIP-Seq engine and use modular functions for FDR calculations, replacing them as needed.  Imagine you could collaborate on code design with someone else – and when you’re done, you get a proper paper on the algorithm, not an application note announcing yet another package.

(We have done better in the past couple of years with tool sets like SAMTools, but that deals with a single common file format.  Imagine if it also allowed for much bigger projects, like providing core functions for RNA-Seq or CNV analysis…  but I digress.)
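To make the idea a little more concrete, here is a rough sketch in Python of what a common set of hooks might look like.  Everything in it is made up for illustration – the registry, the hook names, the toy aligners – since no such framework exists yet; the point is only that modules register themselves under a named hook, downstream code pulls whichever implementation it wants, and swapping one aligner for another becomes a one-line change rather than a rewrite.

# Rough sketch only: a hypothetical hook registry, not an existing framework.
# All names here (register, get_module, the hook and module names) are invented.

from typing import Callable, Dict

# hook name -> {implementation name -> callable}
_REGISTRY: Dict[str, Dict[str, Callable]] = {}

def register(hook: str, name: str):
    """Decorator that files an implementation under a named hook."""
    def wrap(func: Callable) -> Callable:
        _REGISTRY.setdefault(hook, {})[name] = func
        return func
    return wrap

def get_module(hook: str, name: str) -> Callable:
    """Fetch one implementation; swapping modules is a one-line change."""
    return _REGISTRY[hook][name]

# Two toy "aligners" sharing one common output convention
# (a list of {"read": ..., "pos": ...} dicts).
@register("aligner", "toy-exact")
def exact_align(reads, reference):
    return [{"read": r, "pos": reference.find(r)} for r in reads if r in reference]

@register("aligner", "toy-all")
def all_align(reads, reference):
    return [{"read": r, "pos": reference.find(r)} for r in reads]

# A downstream module (a crude filter standing in for, say, an FDR step)
# that only ever sees the common output format.
@register("filter", "positive-positions")
def keep_mapped(alignments):
    return [a for a in alignments if a["pos"] >= 0]

if __name__ == "__main__":
    reference = "ACGTACGTGGAACGT"
    reads = ["ACGT", "GGAA", "TTTT"]
    align = get_module("aligner", "toy-all")      # change the name here to swap aligners
    flt = get_module("filter", "positive-positions")
    print(flt(align(reads, reference)))

The specific code doesn’t matter; what matters is that the registry, the output conventions and the core functions are shared, so every new module plugs into the same scaffolding instead of rebuilding it.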

Even better, if we all developed around a single set of common hooks, you can imagine that, at the end of the day (once you’ve submitted your repository to the main trunk), someone like the Galaxy team would simply vacuum up your modules and instantly make your code available to every bioinformatician and biologist out there.  Instant usability!

While this model of bioinformatics development would take a small team of core maintainers for the common core and hooks – much the same way Linux has Linus Torvalds working on the kernel – it would also cut down severely on code duplication, on bugs in bioinformatics code, and on the plethora of software packages that never get used.

I don’t think this is an unachievable goal, either for the DIY bioinformatics community, the Open Source bioinformatics community or the academic bioinformatics community.  Indeed, if all three of those decided to work together, it could be a very powerful movement.  Moreover, corporate bioinformatics could be a strong player in it, providing support and development for users, much the way corporate Linux players have done for the past two decades.

What is needed, however, is buy-in from some influential people, and some influential labs.  Putting aside their own home grown software and investing in a common core is probably a challenging concept, but it could be done – and the rewards would be dramatic.

Finally, coming back to the funding issue: agencies funding bioinformatics work would also save a lot of money by investing in this type of framework.  It would ensure that more time is spent on useful coding, that publications do more to describe algorithms, and that higher quality code is produced at the end of the day.  The big difference is that they’d have to start accepting that bioinformatics papers shouldn’t be about announcing “new software”, but about “new statistics”, “new algorithms” and “new methods” – which may require a paradigm change in the way we evaluate bioinformatics funding.

Anyhow, I can always dream.

Notes: Yes, there are software frameworks out there that could be used to get the ball rolling.  I know Galaxy does have some fantastic tools, but (if I’m not mistaken) it doesn’t provide a common framework for coding – only for interacting with the software.  I’m also aware that Charmm and Amber have problems – mainly because they were developed by competing labs that failed to become entirely inclusive of the community, or to invest substantially in maintaining the infrastructure in a clean way.  Finally, yes, the licensing of this code would determine the extent of corporate participation, but the GPL provides at least one successful example of this working.