This past week, I submitted the final draft of an application note on some software I’d written (and am still writing, for that matter), and had it rejected twice because I’d included a URL as a reference. (The first time, I failed to notice that I’d cited PostgreSQL 8.4 with a URL, in addition to Picard.) As both a biochemist and a bioinformatician, I can see both sides of the story as to why that would be the case, but it still irked me enough that I thought it worth writing about.
If you look back 30 years, there really wasn’t an Internet, so this wasn’t even an issue on the horizon. You cited non-peer-reviewed material the same way you cited anything else: the author’s name, the date of publication and the publisher – books were books, regardless of who paid to publish them. Journal articles were all copyrighted by some journal, and scientists would read them in the library. Access to scientific information was restricted to those who had access to universities.
20 years ago, the Internet was a wild frontier, mostly made up of an ever-changing network of modems. What was on one computer might not be there the next time you connected. Hard drives failed, computers disconnected – and no one put anything of great value on bulletin boards.
15 years ago, web pages began to pop up, URLs entered the public consciousness, and editors had to face the question of what to do about self-published, transient information. Ban it – that was the response, as far as I can tell. And why not? A page might not be there two days later, let alone by the time articles went to print. A perfectly reasonable first reaction to something that failed to meet any of the criteria for a reference.
Just over 10 years ago, we got Google. Suddenly, all of the information on the web was indexed and you could find just about anything you needed. You could before that too, but getting from place to place was a mess. (Does anyone remember the Internet Yellow Pages, which listed URLs for companies?) Still, information then had a short shelf life: even the Wayback Machine archive was young, and pages disappeared quickly. Still unsuitable for referencing, really. You could count on companies being there, but we were still in the days when URLs could change hands for a fortune.
5 years ago, social media invaded – now you had to be online to keep up with your friends. But there was also a major shift behind that: bioinformatics went from being just a series of Perl scripts to being composed of major projects, and major projects went from small team efforts to massive collections of software. We also saw the adoption of web tools, many of which weren’t published and probably never will be. We went from dial-up to broadband, from miscellaneous computers to data centers, from hobbyist software projects to SourceForge. In short, the Internet matured, and the data it held went from being a transient thing to being a repository of far more knowledge than any collection of books.
It didn’t, however, become peer reviewed. Many people no longer consider the Internet to be transient, but with major influences like Wikipedia, which is unreliable as a reference at best, we don’t often think of URLs as good references. But how is that any different from books?
Unfortunately, somewhere along the line, I think journal editors confused their initial reason for rejecting URLs (their transient nature) with something else: the lack of peer review. No editor would bat an eye at a citation of a published book, even if that information was never peer reviewed, but citing Wikipedia seems like such a terrible idea that perhaps the slippery-slope fallacy has reared its ugly head.
For bioinformatics, many of our common tools aren’t built by scientists any more – or if they are, they’re open source: the collaborative work of many people, which means they’re not going to be published. Many of them are useful toolkits that don’t even make sense to publish, but they are available on the web at a fixed address that doesn’t expire. Unlike commercial products, open source projects may die, but when they’re hosted at the likes of SourceForge they never disappear – which means they’re no longer transient.
While common sense and many colleagues tell me to get over it and just “put the URL in the text”, I fail to see why this is necessary. Can’t editors see that the Internet is no longer a collection of random articles?
Hey editors, there’s far more to the Internet than just Wikipedia and Facebook!
(NOTE: Ironically, as I write this, SourceForge is doing upgrades on its website, and some of the projects it hosts have “disappeared” temporarily… but don’t worry, they’ve promised me that they’ll be back shortly.)