Replacing science publications in the 21st century

Yasset Perez-Riverol asked me to take a look at a post he wrote: a commentary on an article titled Beyond the Paper.  In fact, I suggest reading the original paper, as well as taking a look at Yasset’s wonderful summary image that’s being passed around.  There’s merit in both: they elucidate where the field is going, and they catalogue the different forms of communication and the tools available to capture each of them.

My first thought after reading both articles was “Wow… I’m not doing enough to engage in social media.”  And while that may be true, I’m not sure how many people have the time to do all of those things and still accomplish any real research.

Fortunately, as a bioinformatician, there are moments when you’ve sent all your jobs off and can take a blogging break.  (Come on statistics… find something good in this data set for me!)  And it doesn’t hurt when Lex Nederbragt asks your opinion, either.

However, I think there’s more to my initial reaction than just a glib feeling of under-accomplishment.  We really do need to consider streamlining the publication process, particularly for fast-moving fields.  Whereas the blog and the paper above show how the current process can make use of social media, I’d rather take the opposite tack: how can social media replace the current process?  Instead of a slow, grinding peer-review process, a more technologically oriented one might replace a lot of the tools we currently have built ourselves around.  Let me take you on a little thought experiment – I’m going to use my own field as an example, but I can see how it would apply to others as well. Imagine a multi-layered peer review process that goes like this:

  1. Alice has been working with a large data set that needs analysis.  Her first step is to put the raw data into an embargoed data repository.  She will have access to the data, perhaps even through the cloud, but now she has a backup copy, and one that can be released when she’s ready to share her data.  (A smart repository would release the data after 10 years, published or not, so that it can be used by others.)
  2. After a few months, she has a bunch of scripts that have cleaned up the data (normalization, trimming, whatever), yielding a nice clean data set.  These scripts end up in a source code repository, for instance github.
  3. Alice then creates a tool that allows her to find the best “hits” in her data set.  Not surprisingly, this goes to github as well.
  4. However, there’s also a metadata set – all of the commands she has run through steps two and three.  This could become her electronic notebook, and if Alice is diligent, she could use it as her methods section: a clear, concise list of the commands needed to take her raw data to her best hits.
  5. Alice takes her best hits to her supervisor Bob to check over them.  Bob thinks this is worthy of dissemination – and decides they should draft a blog post, with links to the data (as an attached file, along with the file’s hash), the github code and the electronic notebook.
  6. When Bob and Alice are happy with their draft, they publish it – and announce their blog post to a “publisher”, who lists their post as an “unreviewed” publication on their web page.  The data in the embargoed repository is now released to the public so that they can see and process it as well.
  7. Chris, Diane and Elaine notice the post on the “unreviewed” list, probably via an RSS feed or by visiting the “publisher’s” page and see that it is of interest to them.  They take the time to read and comment on the post, making a few suggestions to the authors.
  8. The authors make note of the comments and take the time to refine their scripts, which shows up on github, and add a few paragraphs to their blog post – perhaps citing a few missed blogs elsewhere.
  9. Alice and Bob find the feedback they’ve gotten helpful, and they inform the publisher, who takes a few minutes to check that comments were made and addressed, and consequently moves the post from the “unreviewed” list to the “reviewed” list.  Of course, checks such as ensuring that no data is supplied in the dreaded PDF format are performed!
  10. The publisher also keeps a copy of the text/links/figures of the blog post, so that a snapshot of the post exists. If future disputes over the reviewed status of the paper occur, or if the authors’ blog disappears, the publisher can repost it. (If the publisher were smart, they’d have hosted the blog post right from the start, instead of having to duplicate someone else’s blog.)
  11. The publisher then sends out tweets with hashtags appropriate to the subject matter (perhaps even the key words attached to the article), and Alice’s and Bob’s peers are notified of the “reviewed” status of their blog post.  Chris, Diane and Elaine are given credit for having made contributions towards the review of the paper.
  12. Alice and Bob interact with the other reviewers via comments and tweets, and links to these (trackbacks and pingbacks) are kept from the article.  Authors from other fields can point out errors or other papers of interest in the comments below.
  13. Google notes all of this interaction, and updates the scholar page for Alice and Bob, noting the interactions, and number of tweets in which the blog post is mentioned.   This is held up next to some nice stats about the number of posts that Alice and Bob have authored, and the impact of their blogging – and of course – the number of posts that achieve the “peer reviewed” status.
  14. Reviews or longer comments can be done on other blog pages, which are then collected by the publisher and indexed on the “reviews” list, cross-linked from the original post.
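The data-integrity piece of step 5 – publishing a hash alongside the data file so that readers can verify they have the exact version the post describes – is easy to sketch. Here’s a minimal Python example; the function name is my own invention for illustration, not part of any existing repository’s tooling:

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading in 1 MB chunks
    so that arbitrarily large data sets fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Anyone who downloads the data set recomputes the digest and compares it to the one quoted in the post; a mismatch means the file was corrupted or silently changed after review.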

Look – science just left the hands of the vested interests, and jumped back into the hands of the scientists!

Frankly, I don’t see it as being entirely far-fetched.  The biggest issue is going to be harmonizing a publisher’s blog with a personal blog – which means that personal blogs will most likely shrink pretty rapidly, or move towards consortia of “publishing” groups.

To be clear, the publisher in this case doesn’t have to be related whatsoever to the current publishers – they’ll make their money off targeted ads and subscriptions to premium services (advance notice of papers? better searches for relevant posts?), and their reputation will encourage others to join.  Better blogging tools and integration will be the grounds on which the services compete, and more engagement in social media will benefit everyone.  Finally, because the bar for new publishers to enter the field will be relatively low, new players simply have to out-compete the old publishers to establish a good, profitable foothold.

In any case – this appears to be just a fantasy, but I can see it playing out successfully for those who have the time/vision/skills to grow a blogging network into something much more professional.  Anyone feel like doing this?

Feel free to comment below – although, alas, I don’t think your comments will ever make this publication count as “peer reviewed”, no matter how many of my peers review it. :(

~20% of American Scientists consider leaving the U.S.

Ah, if only Canada wasn’t currently governed by the science-hating conservatives. This would be a great time to reverse the brain drain and bring a lot of top Canadian talent home, or even to bring in a lot of American and international researchers.  Alas, the odds of the conservatives increasing science funding are pretty slim.  Mostly, they’ve been busy trying to silence the scientists that are already here.

Anyhow, this is a sobering look at what’s happening in academia in the states:

Nearly 20 Percent Of Scientists Contemplate Moving Overseas Due In Part To Sequestration

Is Encode bunk?

Ok, I’m sick, so this is a very short post.  I just stumbled onto this article in the Guardian.  Not being a Brit, I have no idea whether it’s even a remotely reputable publication, or why this piece is so sensationalist.  So… scientists see evidence and are working to understand whether much of the genome serves a purpose, and they disagree on the interpretation.  Neither side has conclusive evidence, but the Encode project certainly has evidence that makes its claims seem valid.

In contrast, a bunch of biologists seem to have jumped on it and insist that most of the DNA in the human genome is still “junk” and does nothing.

While I don’t support the side that seems to be calling “BS” on the Encode project, the character of the article seems unnecessarily vitriolic.  Does the UK have republicans?

Edit: Finally feeling well enough to get back on my computer and look for the source of this argument: Here.  And after reading a few pages…Wow! I can’t believe that was published as is.  The abstract alone sounds like someone got up on the wrong side of the bed, and then ate nettles for breakfast.

Asimov’s Corollary

Note: I’m going to risk a bit of copyright infringement today, which is generally unlike me.  However, I am not gaining anything from this – and in fact, hopefully I’m giving some publicity to the estate of Isaac Asimov, rather than diminishing the value of the works.  If the publishers would like to make another run of this particular book, I’d be more than happy to take it down from the Internet.

Isaac Asimov has always been one of my favorite authors.  I’ve never been certain how serious he was about doing research science as a career, but I’ve always been inspired by the idea of a scientist becoming an author to publicize the value of science.  There’s something noble about it.  Like me, Asimov may have had a passion for science, but he struggled in the lab.  As he phrased it,  “Whatever gifts I have, none of them includes deftness in experimental work.” (The Magic Isle, 1977)  I’ve always just said that glassware and I don’t get along, but I’m pretty sure we’re on the same page.

In any case, Asimov worked in an age before computers, so my solution of moving into bioinformatics wasn’t an avenue open to him – and unfortunately, neither was blogging. Asimov, however, was a prolific writer in the age before the internet and his style lends itself beautifully to the blogging format.  Although his pieces are slightly longer than the average blog, they have a format that has inspired me – and which I have always tried to emulate.  No surprise, though, Asimov has been a hero of mine almost since I was able to read.

Frankly, if he had lived during the age of the Internet, I think he’d not only have a blog, but he’d be one of the most prolific bloggers around. In a tribute to him, I’d like to republish one of his essays – it’s eerie how well it transcends the generations.  It’s one of his most timeless pieces, in my humble opinion. Asimov, like me, always starts his pieces with some preamble or a short story.  The one used here is a bit dated, but please stick with it; the rest of the piece makes up for it, charting out a manifesto for skeptics, for good science and for rational thought. Without further ado, I’m going to republish the full piece here.  Please keep in mind that he wrote this 35 years ago – before I was born, before atheism was acceptable, and while skepticism was just taking off.

Isaac Asimov, you are my hero.

The following piece is reprinted without permission from Chapter seventeen of Isaac Asimov’s book Quasar, Quasar, Burning Bright, originally printed in the February 1977 issue of The Magazine of Fantasy and Science Fiction.  (I believe neither the book nor the magazine is in print anywhere – and if they are, I highly recommend you pick up a copy just to have this essay on hand.)  Mistakes in the text below are undoubtedly mine.

I have just come back from Rensselaerville, New York, where, for the fifth year, I have led a four-day seminar on some futuristic topics. (This time it was on the colonization of space.)  Some seventy to eighty people attended, almost all of them interested in science fiction and all of them eager to apply their imaginations to the posing of problems and the suggesting of solutions.

The seminar only runs from a Sunday to a Thursday, but by Thursday there is a mass heartbreak at the thought of leaving and vast promises (usually kept) to return the next year.

This year we managed to persuade Ben Bova (editor of Analog) and his charming wife, Barbara, to attend.  They threw themselves into the sessions with a will and were beloved by all.

Finally came the end, at Thursday noon, and, as is customary on these occasions, I was given a fancy pseudo-plaque testifying to my good nature and to my suave approach towards members of the opposite sex. [footnote: See my book The Sensuous Dirty Old Man, (Walker 1971).]

A charming young woman, not quite five feet tall, made the presentation and in simple gratitude, I  placed my arm about her waist.  Owing to her unusually short height, however, I didn’t manage to get low enough and the result brought laughter from the audience.

Trying to dismiss this embarrassing faux pas (though I must admit that neither of us budged), I said “I’m sorry, folks.  That’s just the Asimov grip.”

And from the audience Ben Bova (who, it seems appropriate to say in this particular connection, is my bosom buddy) called out “Is that anything like the swine flu?”

I was wiped out, and what does one do when one has been wiped out by a beloved pal? Why, one turns about and proceeds to try to wipe out some other beloved pal – in this case, my English colleague Arthur C. Clarke.

In Arthur’s book Profiles of the Future (Harper & Row, 1962) he advances what he himself calls “Clarke’s Law.”  It goes as follows:

“When a distinguished but elderly scientist states that something is possible, he is almost certainly right.  When he states that something is impossible, he is very probably wrong.”

Arthur goes on to explain what he means by “elderly.” He says: “In physics, mathematics, and astronautics it means over thirty;  in other disciplines, senile decay is sometimes postponed to the forties.”

Arthur goes on to give examples of “distinguished but elderly scientists” who have pished and tut-tutted all sorts of things that have come to pass almost immediately.  The distinguished Briton Ernest Rutherford pooh-poohed the possibility of nuclear power, the distinguished American Vannevar Bush bah-humbugged intercontinental ballistic missiles, and so on.

But naturally when I read a paragraph like that, knowing Arthur as I do, I begin to wonder if, among all the others, he is thinking of me.

After all, I’m a scientist.  I am not exactly a “distinguished” one but nonscientists have gotten the notion somewhere that I am, and I am far too polite a person to subject them to the pain of disillusionment, so I don’t deny it.  And then, finally, I am a little over thirty and have been a little over thirty for a long time, so I qualify as elderly by Arthur’s definition. (So does he, by the way, for he is – ha, ha – three years older than I am.)

Well, then, as a distinguished but elderly scientist, have I been going around stating that something is impossible or, in any case, that that something bears no relationship to reality?  Heavens, yes! In fact, I am rarely content to say something is “wrong” and let it go at that.  I make free use of terms and phrases like “nonsense,” “claptrap,” “stupid folly,” “sheer idiocy,” and many other bits of gentle and loving language.

Among currently popular aberrations, I have belabored without stint Velikovskianism, astrology, flying saucers and so on.

While I haven’t yet had occasion to treat these matters in detail, I also consider the views of the Swiss Erich von Däniken on “ancient astronauts” to be utter hogwash; I take a similar attitude to the widely held conviction (reported, but not to my knowledge subscribed to, by Charles Berlitz in The Bermuda Triangle) that the “Bermuda Triangle” is the hunting ground of some alien intelligence.

Doesn’t Clarke’s Law make me uneasy, then?  Don’t I feel as though I am sure to be quoted extensively, and with derision, in some book written a century hence by some successor to Arthur?

No, I don’t.  Although I accept Clarke’s Law and think Arthur is right in his suspicion that the forward-looking pioneers of today are the backward-yearning conservatives of tomorrow, [Footnote: Heck, Einstein himself found he could not accept the uncertainty principle and, in consequence, spent the last thirty years of his life as a living monument and nothing more.  Physics went on without him.] I have no worries about myself.  I am very selective about the scientific heresies I denounce, for I am guided by what I call Asimov’s Corollary to Clarke’s Law.  Here is Asimov’s Corollary:

When, however, the lay public rallies round an idea that is denounced by distinguished but elderly scientists and supports that idea with great fervor and emotion – the distinguished but elderly scientists are then, after all, probably right.

But why should this be?  Why should I, who am not an elitist, but an old-fashioned liberal and an egalitarian (see “Thinking About Thinking” in The Planet That Wasn’t, Doubleday, 1976), thus proclaim the infallibility of the majority, holding it to be infallibly wrong?

The answer is that human beings have the habit (a bad one, perhaps, but an unavoidable one) of being human; which is to say that they believe in that which comforts them.

For instance, there are a great many inconveniences and disadvantages to the Universe as it exists.  As examples: you cannot live forever, you can’t get something for nothing, you can’t play with knives without cutting yourself, you can’t win every time, and so on and so on (see “Knock Plastic,” in Science, Numbers and I, Doubleday 1968).

Naturally, then, anything which promises to remove these inconveniences and disadvantages will be eagerly believed.  The inconveniences and disadvantages remain, of course, but what of that?

To take the greatest, most universal, and most unavoidable inconvenience, consider death.  Tell people that death does not exist and they will believe you and sob with gratitude at the good news.  Take a census and find out how many human beings believe in life after death, in heaven, in the doctrines of spiritualism, in the transmigration of souls. I am quite confident you will find a healthy majority, even an overwhelming one, in favor of side-stepping death by believing in its nonexistence through one strategy or another.

Yet as far as I know, there is not one piece of evidence ever advanced that would offer any hope that death is anything other than the permanent dissolution of the personality and that beyond it, as far as individual consciousness is concerned, there is nothing.

If you want to argue the point, present the evidence.  I must warn you, though, that there are some arguments I won’t accept.

I won’t accept any argument from authority. (“The Bible says so.”)

I won’t accept any argument from internal conviction. (“I have faith it’s so.”)

I won’t accept any argument from personal abuse. (“What are you, an atheist?”)

I won’t accept any argument from irrelevance. (“Do you think you have been put on this Earth just to exist for a moment of time?”)

I won’t accept any argument from anecdote. (“My cousin has a friend who went to a medium and talked to her dead husband.”)

And when all that (and other varieties of nonevidence) are eliminated, there turns out to be nothing. [Footnote: Lately, there have been detailed reports about what people are supposed to have seen during “clinical death.” – I don’t believe a word of it.]

Then why do people believe? Because they want to.  Because the mass desire to believe creates a social pressure that is difficult (and, in most times and places, dangerous) to face down.  Because few people have had the chance of being educated into the understanding of what is meant by evidence or into the techniques of arguing rationally.

But mostly because they want to.  And that is why a manufacturer of toothpaste finds it insufficient to tell you that it will clean your teeth almost as well as the bare brush will.  Instead he makes it clear to you, more or less by indirection, that his particular brand will get you a very desirable sex partner.  People, wanting sex somewhat more intensely than they want clean teeth, will be the readier to believe.

Then, too, people generally love to believe the dramatic, and incredibility is no bar to the belief but is, rather, a positive help.

Surely we all know this in an age when whole nations can be made to believe in any particular bit of foolishness that suits their rulers and can be made willing to die for it, too. (This age differs from previous ages in this, however, only in that the improvement of communications makes it possible to spread folly with much greater speed and efficiency.)

Considering their love of the dramatic, is it any surprise that millions are willing to believe, on mere say-so and nothing more, that alien spaceships are buzzing around the Earth and that there is a vast conspiracy of silence on the part of the government and scientists to hide that fact? No one has ever explained what government and scientists hope to gain by such a conspiracy or how it can be maintained, when every other secret is exposed at once in all its details – but what of that? People are always willing to believe in any conspiracy on any subject.

People are also willing and eager to believe in such dramatic matters as the supposed ability to carry on intelligent conversations with plants, the supposed mysterious force that is gobbling up ships and planes in a particular part of the ocean, the supposed penchant of Earth and Mars to play Ping-Pong with Venus and the supposed accurate description of the result in the Book of Exodus, the supposed excitement of visits from extraterrestrial astronauts in prehistoric times and their donation to us of our arts, techniques and even some of our genes.

To make matters still more exciting, people like to feel themselves to be rebels against some powerful repressive force – as long as they are sure it is quite safe.  To rebel against a powerful political, economic, religious, or social establishment is very dangerous and very few people dare do it, except, sometimes, as an anonymous part of a mob.  To rebel against the “scientific establishment,” however, is the easiest thing in the world, and anyone can do it and feel enormously brave, without risking as much as a hangnail.

[Footnote: A reader once wrote me to say that the scientific establishment could keep you from getting grants, promotions, and prestige, could destroy your career, and so on.  That’s true enough.  Of course, that’s not as bad as burning you at the stake or throwing you in a concentration camp, which is what a real establishment could and would do, but even depriving you of an appointment is rotten. However, that works only if you are a scientist.  If you are a nonscientist, the scientific establishment can do nothing more than make faces at you.]

Thus, the vast majority, who believe in astrology and think that the planets have nothing better to do than form a code that will tell them whether tomorrow is a good day to close a business deal or not, become all the more excited and enthusiastic about the bilge when a group of astronomers denounce it.

Again, when a few astronomers denounced the Russian-born American Immanuel Velikovsky, they lent the man (and, by reflection, his followers) an aura of the martyr, which he (and they) assiduously cultivate, though no martyr in the world has ever been harmed so little or helped so much by the denunciations.

I used to think, indeed, that it was entirely the scientific denunciations that had put Velikovsky over the top and that had the American astronomer Harlow Shapley only had the sang-froid to ignore the Velikovskian folly, it would quickly have died a natural death.

I no longer think so.  I now have greater faith in the bottomless bag of credulity that human beings carry on their back.  After all, consider Von Däniken and his ancient astronauts.  Von Däniken’s books are even less sensible than Velikovsky’s and are written far more poorly, [Footnote: Velikovsky, to do him justice, is a fascinating writer and has an aura of scholarliness that Von Däniken utterly lacks.] and yet he does well.  What’s more, no scientist, as far as I know, has deigned to take notice of Von Däniken. Perhaps they felt such notice would do him too much honor and would but do for him what it had done for Velikovsky.

So Von Däniken has been ignored – and, despite that, is even more successful than Velikovsky is, attracts more interest, and makes more money.

You see, then, how I choose my “impossibles.” I decide that certain heresies are ridiculous and unworthy of any credit not so much because the world of science says, “It is not so!” but because the world of nonscience says, “It is,” so enthusiastically.  It is not so much that I have confidence in scientists being right as that I have so much confidence in nonscientists being wrong.


I admit, by the way, that my confidence in scientists being right is somewhat weak.  Scientists have been wrong, even egregiously wrong, many times. There have been heretics who have flouted the scientific establishment and have been persecuted therefor (as far as the scientific establishment is able to persecute), and, in the end, it has been the heretic who has proved right.  This has happened not only once, I repeat, but many times.

Yet that doesn’t shake the confidence with which I denounce those heresies I do denounce, for in the cases in which heretics have won out, the public has, almost always, not been involved.

When something new in science is introduced, when it shakes the structure, when it must in the end be accepted, it is usually something that excites scientists, sure enough, but does not excite the general public – except perhaps to get them to yell for the blood of the heretic.

Consider Galileo, to begin with, since he is the patron saint (poor man!) of all self-pitying crackpots. To be sure, he was not persecuted primarily by scientists for his scientific “errors,” but by theologians for his very real heresies (and they were real enough by seventeenth-century standards).

Well, do you suppose the general public supported Galileo? Of course not. There was no outcry in his favor. There was no great movement in favor of the Earth going round the Sun.  There were no “sun-is-center” movements denouncing the authorities and accusing them of a conspiracy to hide the truth.  If Galileo had been burned at the stake, as Giordano Bruno had been a generation earlier, the action would probably have proved popular with those parts of the public that took the pains to notice it in the first place.

Or consider the most astonishing case of scientific heresy since Galileo – the matter of the English naturalist Charles Robert Darwin.  Darwin collected the evidence in favor of the evolution of species by natural selection and did it carefully and painstakingly over the decades, then published a meticulously reasoned book that established the fact of evolution to the point where no rational biologist can deny it [Footnote: Please don’t write me to tell me that there are creationists who call themselves biologists. Anyone can call himself a biologist.] even though there are arguments over the details of the mechanism.

Well, then, do you suppose the general public came to the support of Darwin and his dramatic theory? They certainly knew about it.  His theory made as much of a splash in his day as Velikovsky did a century later.  It was certainly dramatic – imagine species developing by sheer random mutation and selection, and human beings developing from apelike creatures! Nothing any science fiction writer ever dreamed up was as shatteringly astonishing as that to the people who from earliest childhood had taken it for established and absolute truth that God had created all the species ready-made in the space of a few days and that man in particular was created in the divine image.

Do you suppose the general public supported Darwin and waxed enthusiastic about him and made him rich and renowned and denounced the scientific establishment for persecuting him? You know they didn’t.  What support Darwin did get was from scientists. (The support any rational scientist gets is from scientists, though usually from only a minority of them at first.)

In fact, not only was the general public against Darwin then, they are against Darwin now.  It is my suspicion that if a vote were taken in the United States right now on the question of whether man was created all at once out of the dirt or through the subtle mechanisms of mutation and natural selection over millions of years, there would be a large majority who would vote for dirt.

There are other cases, less famous, where the general public didn’t join the persecutors only because they never heard there was an argument.

In the 1830s the greatest chemist alive was the Swede Jöns Jakob Berzelius.  Berzelius had a theory of the structure of organic compounds which was based on the evidence available at that time.  The French chemist Auguste Laurent collected additional evidence that showed that Berzelius’ theory was inadequate.  He suggested an alternative theory of his own which was more nearly correct and which, in its essentials, is still in force now.

Berzelius, who was in his old age and very conservative, was unable to accept the new theory.  He retaliated furiously and none of the established chemists of the day had the nerve to stand up against the great Swede.

Laurent stuck to his guns and continued to accumulate evidence.  For this he was rewarded by being barred from the more famous laboratories and being forced to remain in the provinces.  He is supposed to have contracted tuberculosis as a result of working in poorly heated laboratories and he died in 1853 at the age of forty-six.

With both Laurent and Berzelius dead, Laurent’s new theory began to gain ground.  In fact, one French chemist who had originally supported Laurent but had backed away in the face of Berzelius’ displeasure now accepted it again and actually tried to make it appear that it was his theory. (Scientists are human, too.)

That’s not even a record for sadness.  The German physicist Julius Robert Mayer, for his championship of the law of conservation of energy in the 1840s, was driven to madness. Ludwig Boltzmann, the Austrian physicist, for his work on the kinetic theory of gases in the late nineteenth century, was driven to suicide.  The work of both is now accepted and praised beyond measure.

But what did the public have to do with all these cases? Why, nothing. They never heard of them.  They never cared. It didn’t touch any of their great concerns.  In fact, if I wanted to be completely cynical, I would say that the heretics were in this case right and that the public, somehow sensing this, yawned.

This sort of thing goes on in the twentieth century, too. In 1912 a German geologist, Alfred Lothar Wegener, presented to the world his views on continental drift.  He thought the continents all formed a single lump of land to begin with and that this lump, which he called “Pangaea,” had split up and that the various portions had drifted apart.  He suggested that the land floated on the soft, semi-solid underlying rock and that the continental pieces drifted apart as they floated.

Unfortunately, the evidence seemed to suggest that the underlying rock was far too stiff for continents to drift through and Wegener’s notions were dismissed and even hooted at. For half a century the few people who supported Wegener’s notions had difficulty in getting academic appointments.

Then, after World War II, new techniques of exploration of the sea bottom uncovered the global rift, the phenomenon of sea-floor spreading, the existence of crustal plates, and it became obvious that the Earth’s crust was a group of large pieces that were constantly on the move and that the continents were carried with the pieces. Continental drift, or “plate tectonics,” as it is more properly called, became the cornerstone of geology.

I personally witnessed this turnabout.  In the first two editions of my Guide to Science (Basic Books, 1960, 1965), I mentioned continental drift but dismissed it haughtily in a paragraph.  In the third edition (1972) I devoted several pages to it and admitted having been wrong to dismiss it so readily. (This is no disgrace, actually. If you follow the evidence you must change as additional evidence arrives and invalidates earlier conclusions. It is those who support ideas for emotional reasons only who can’t change.  Additional evidence has no effect on emotion.)

If Wegener had not been a true scientist, he could have made himself famous and wealthy.  All he had to do was take the concept of continental drift and bring it down to earth by having it explain the miracles of the Bible. The splitting of Pangaea might have been the cause, or the result, of Noah’s Flood.  The formation of the Great African Rift might have drowned Sodom.  The Israelites crossed the Red Sea because it was only a half mile wide in those days.  If he had said all that, the book would have been eaten up and he could have retired on his royalties.

In fact, if any reader wants to do this now, he can still get rich. Anyone pointing out this article as the inspirer of the book will be disregarded by the mass of “true believers,” I assure you.

So here’s a new version of Asimov’s Corollary, which you can use as your guide in deciding what to believe and what to dismiss:

If a scientific heresy is ignored or denounced by the general public, there is a chance it may be right. If a scientific heresy is emotionally supported by the general public, it is almost certainly wrong.

You’ll notice that in my two versions of Asimov’s Corollary I was careful to hedge a bit.  In the first I say that scientists are “probably right.” In the second I say that the public is “almost certainly wrong.” I am not absolute. I hint at exceptions.

Alas, not only are people human; not only are scientists human; but I’m human too.  I want the Universe to be as I want it to be and that means completely logical. I want silly, emotional judgments to be always wrong.

Unfortunately, I can’t have the Universe the way I want it, and one of the things that makes me a rational being is that I know this.

Somewhere in history there are bound to be cases in which science said “No” and the general public, for utterly emotional reasons, said “Yes” and in which it was the general public that was right. I thought about it and came up with an example in half a minute.

In 1798 the English physician Edward Jenner, guided by old wives’ tales based on the kind of anecdotal evidence I despise, tested to see whether the mild disease of cow-pox did indeed confer immunity upon humans from the deadly and dreaded disease of smallpox. (He wasn’t content with the anecdotal evidence, you understand; he experimented.) Jenner found the old wives were correct and he established the technique of vaccination.

The medical establishment of the day reacted to the new technique with the greatest suspicion. Had it been left to them, the technique might well have been buried.

However, popular acceptance of vaccination was immediate and overwhelming.  The technique spread to all parts of Europe. The British royal family was vaccinated; the British Parliament voted Jenner ten thousand pounds. In fact, Jenner was given semidivine status.

There’s no problem in seeing why. Smallpox was an unbelievably frightening disease, for when it did not kill, it permanently disfigured.  The general public therefore was almost hysterical with desire for the truth of the suggestion that the disease could be avoided by the mere prick of a needle.

And in this case, the public was right! The Universe was as they wanted it to be. In eighteen months after the introduction of vaccination, for instance, the number of deaths from smallpox in England was reduced to one third of what it had been.

So there are indeed exceptions. The popular fancy is sometimes right.

But not often, and I must warn you that I lose no sleep over the possibility that any of the popular enthusiasms of today are liable to turn out to be scientifically correct. Not an hour of sleep do I lose; not a minute.

BlueSEQ Knowledgebase

Remember BlueSEQ?  The company I gave a hard time after their presentation at Copenhagenomics?  Turns out they have some cool stuff up on the web.  Here’s a comparison of sequencing technologies that they’ve posted.  Looks like they’ve put together quite a decent set of resources.  I haven’t finished exploring it yet, but it looks quite useful.

Via CLC bio blog – Post: Goldmine of unbiased expert knowledge on next generation sequencing.

Nature Comment : The case for locus-specific databases

There’s an interesting comment available in Nature today (EDIT: it came out last month, though I only found it today). Unfortunately, it’s by subscription only, but let me save you the hassle of downloading it, if you don’t already have a subscription.  It’s not what I thought it was.

The entire piece fails to make the case for locus-specific databases, but instead conflates locus-specific with “high-resolution”, and then proceeds to tell us why we need high resolution data.  The argument can roughly be summarized as:

  • OMIM and databases like it are great, but don’t list all known variations
  • Next-gen sequencing gives us the ability to see the genome in high resolution
  • You can only get high-resolution data by managing data in a locus-specific manner
  • Therefore, we should support locus-specific databases

Unfortunately, point number three is actually wrong.  It’s just that our public databases haven’t yet transitioned to the high-resolution format.  (i.e., we have an internal database that stores data in a genome-wide manner at high resolution…  the data is, alas, not public.)

Since that premise fails, I don’t think we should be supporting locus-specific databases specifically – indeed, I would say that the support they need is to become amalgamated into a single genome-wide database at high resolution.
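To make that concrete, here’s a minimal sketch – entirely hypothetical, with made-up function names and coordinates, and no relation to the internal database mentioned above – of why a genome-wide store loses nothing relative to a locus-specific one: a “locus-specific database” is just a range query over the genome-wide one.

```python
# Hypothetical sketch: a genome-wide variant store keyed by
# (chromosome, position).  A locus-specific view is just a range
# filter, so high resolution doesn't require locus-specific storage.

variants = {}  # (chrom, pos) -> list of observed alleles

def record_variant(chrom, pos, allele):
    """Record one observed variant anywhere in the genome."""
    variants.setdefault((chrom, pos), []).append(allele)

def locus_view(chrom, start, end):
    """Recover a locus-specific view by filtering the genome-wide store."""
    return {key: alleles for key, alleles in variants.items()
            if key[0] == chrom and start <= key[1] <= end}

# Illustrative (made-up) coordinates:
record_variant("chr7", 55242464, "T")
record_variant("chr7", 55242470, "A")
record_variant("chr1", 1234567, "G")

print(len(locus_view("chr7", 55242000, 55243000)))  # 2 variants at this locus
```

A real implementation would index by position for speed, but the point stands: the genome-wide representation subsumes every locus-specific slice.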

You wouldn’t expect major gains in understanding of car mechanics if you, by analogy, insisted that all parts should be studied independently at high resolution.  Sure you might improve your understanding of each part, and how it works alone, but the real gains come from understanding the whole system.  You might not actually need certain parts, and sometimes you need to understand how two parts work together.  It’s only by studying the whole system that you begin to see the big picture.

IMHO, locus-specific databases are blinders that we adopt in the name of needing higher resolution, which is more of a comment on the current state of biology.  In fact, the argument can really be made that we don’t need locus-specific databases – we need better bioinformatics!

Why should one become an academic?

You know what?  No one ever bothers to sell the academic path.  In all the time I’ve been in school, and even during my time in industry, no one has ever tried to tell me why I should want to become an academic.

There are a hell of a lot of blogs saying why one should abandon the path to academia, but not a single one that I could find saying “hey everyone, this is why I think academia is great”.  It’s as though everyone is born wanting to be an academic, and you only have to hear the other side to be convinced away from the natural academic leanings.

Of course, there’s a huge amount of competition for academic positions, so it isn’t exactly like people want to encourage incoming students to go down that path.  All that I see in searching the web is the balanced approach about weighing the two options – and that even assumes that all academia is the same, and all industry is the same.  (A blatant lie, if I ever heard one!)

Anyhow, the best I could do in putting together my list of why one should go into academia is in the following set of links.

If there are any academics out there who want to sell the academic path, this would be a great topic for future posts.  I’d love to read it.

As best as I can glean, the only reasons for it are “better working hours, once you become tenured” and “you can be your own boss”.   Seriously, there must be more to it than that!  Anyone?

Is blogging revolutionizing science communication?

There’s been a lot of talk about blogging changing the nature of science communication recently that I think is completely missing the mark.  And, given that I see this really often, I thought I’d comment on it quickly.   (aka, this is a short, and not particularly well researched post… but deal with it.  I’m on “vacation” this week.)

Two of the articles/posts that are still on my desktop (that discuss this topic, albeit in the context of changing the presentation of science, not really in science communication) are:

But I’ve come across a ton of them, and they all say (emphatically) that blogging has changed the way we communicate in science.  Well, yes and no.

Yes, it has changed the way scientists communicate between themselves.  I don’t run to the journal stacks anymore when I want to know what’s going on in someone’s lab, I run to the lab blog.  Or I check the twitter feed… or I’ll look for someone else blogging about the research.  You learn a lot that way, and it is actually representative of what’s going on in the world – and the researcher’s opinions on a much broader set of topics.  That is to say, it’s not a static picture of what small set of experiments worked in the lab in 1997.

On the other hand, I don’t think that there are nearly enough bloggers making science accessible for lay people.  We haven’t made science more easily understood by those outside of our fields – we’ve just made it easier for scientists inside our own field to find and compare information.

I know there are a few good blogs out there trying to make research easier to understand, but they are few and far between.  I, personally, haven’t written an article trying to explain what I do for a non-scientist in well over a year.

So, yes, blogging has changed science communication, but as far as I can tell, we’ve only changed it for the scientists.

BlueSEQ revisited

On the first day of the Copenhagenomics 2011 conference, I took notes on a presentation made by Peter Jabbour of BlueSEQ in which I interlaced some comments of my own. I was particularly disappointed in the presentation, which completely failed, in my opinion, to demonstrate the value of the company.  This prompted BlueSEQ marketer Shawn Baker to post a reply that addresses some of my points, but failed to get to the heart of the matter.  However, I had the opportunity to speak to BlueSEQ CEO Michael Heltzen on Friday morning, setting me straight on several facts.  Given what I’d learned, I thought it was important to take the time to revisit what I had said about BlueSEQ.

I understand some people thought my criticism of BlueSEQ was targeted.  Let me set the record straight: Of all of the companies that presented or attended at Copenhagenomics 2011, the only one I have any relationship at all with is CLC bio, and that is – to this point – entirely informal.  Any criticisms I have made about BlueSEQ, or any other company, are simply my own opinion based on the information presented – and for the record, I do have a little experience with business models.

In this case, the presentation led me to believe there were a lot of holes in the BlueSEQ business model.  Fortunately, CEO Michael Heltzen was kind enough to patiently answer my questions and explain the business model to me, which has prompted me to change my opinion.

In case you haven’t heard of BlueSEQ, they’re an organization that serves to match users that have unmet sequencing needs (“users”) with groups that have surplus sequencing capacity (“providers”). This is a simplified version of what they do, at least – and was the focus of their presentation at Copenhagenomics 2011.

Initially, BlueSEQ set themselves up during the presentation as a young company that just “went live” recently.  While there’s nothing wrong with that, I have spent time as an entrepreneur and am aware that young companies have a tendency to be a little overly optimistic about their markets and potential for finding customers.  Although BlueSEQ did boast of about a hundred users signing up for their services, I listened carefully but didn’t hear anything about providers having signed up as well.  That set off flags for me.  BlueSEQ CEO Michael Heltzen patiently explained to me that they do, in fact, have 25 providers already signed up – a very impressive number for just over a month of operations.

Having paying clients, or providers in this case, is 90% of the battle for any match-making company, and knowing that there are groups paying for BlueSEQ’s services should be music to the ears of any potential investors.  That, on its own, provided some significant validation of the company’s business model for me.  If people are currently paying for it, then clearly there is value.

And speaking of paying, the presentation did not explain what it was that providers were paying for.  A 10% service fee – charged to providers – was mentioned during the presentation, which seems a little high for nothing more than a service linking buyers with sellers.  I heard the same comment from other people who saw the presentation and voiced their concern (albeit more quietly than I did) that it was a bit disproportionate.  However, again, BlueSEQ’s Michael Heltzen provided the explanation:  BlueSEQ doesn’t just match sequencing providers with users – they provide a complete front-office service, not only promoting the sequencing centre’s business by matching them with the users, but also by handling the initial steps of any inquiries and working with the user to sort out the wet and dry lab requirements of any potential sequencing project.  Suddenly, the value of BlueSEQ’s services should be apparent.

Many groups with excess sequencing capacity may find themselves in a position where they have the ability to provide sequencing services, but not the facilities to handle customer requests or promote themselves to find the users who could take advantage of the sequencing services.  Enter BlueSEQ.

This explanation, diametrically opposite to the “web portal” model described during the business presentation, suddenly shows where an entrepreneurial group can build a concrete business.   The analogy used during the BlueSEQ presentation – a web portal where people can buy airline tickets by comparing prices on-line – was a poor choice, completely diminishing the value that BlueSEQ provides by interpreting, analyzing and, in part, educating the sequencing users.  What a service that could be!

With good experimental design being one of the most difficult parts of science, BlueSEQ is in fact sitting in the wonderful position of being the early entry into a completely new business model.  They are able to transform the disjointed requests of novice users into complete experimental plans and then match those experiments with labs that have the experience and capacity to perform those experiments well.  The user gains by getting competitive quotes and help in setting up the product they want, while the provider gains by being able to focus on the service they provide without the complexities of dealing with customers who may not know what they want or need.

Pure genius.

Of course, there are still pitfalls ahead with this type of business model.  There really is no barrier to entry for other competitors, other than the experience of the current group. (I’m sure it’s extensive, but there are others out there who could do the same.)  There is also no real guarantee that what they are doing will be cost effective in the long run.  As sequencing becomes cheaper and cheaper, it might actually come to a point where it will be more cost efficient to turn to a professional sequencing company like Complete Genomics that does provide a full service than to a portal and matchmaking service like BlueSEQ.  Of course, those are concerns that I’m sure BlueSEQ has put more thought into than I have – and will be up to them to solve.

As I said last time, and I meant it quite sincerely: Good luck to the business.  I’ll be looking forward to hearing their presentations in the future – and I hope they have only good things to report.

Dueling Databases of Human Variation

When I got in to work this morning, I was greeted by an email from 23andMe’s PR company, saying they have “built one of the world’s largest databases of individual genetic information.”   Normally, I wouldn’t even bat an eye at a claim like that.  I’m pretty sure it is a big database of variation…  but I thought I should throw down the gauntlet and give 23andMe a run for their money.  (-:

The timing couldn’t be better for me.  My own database actually ran out of auto-increment IDs this week, as we surpassed 2^31 SNPs entered into the database and had to upgrade the key field from int to bigint. (Some variant calls have been deleted and replaced as variant callers have improved, so we actually have only 1.2 billion variations recorded against the hg18 version of the human genome.  A few hundred million more than that for hg19.)  So, I thought I might have a bit of a claim to having one of the largest databases of human variation as well.  Of course, comparing databases really is dependent on the metric being used, but hey, there’s some academic value in trying anyhow.
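As a quick sanity check on why the key field overflowed – assuming a MySQL-style signed 32-bit INT, since the post doesn’t name the database engine – the arithmetic works out like this:

```python
# A signed 32-bit auto-increment key tops out at 2**31 - 1 rows;
# surpassing 2**31 entries exhausts it, hence the move to a 64-bit bigint.
INT_MAX = 2**31 - 1       # 2,147,483,647 for a signed 32-bit int
BIGINT_MAX = 2**63 - 1    # a signed 64-bit bigint: ample headroom

rows_entered = 2**31
print(rows_entered > INT_MAX)      # True: the int key space is exhausted
print(rows_entered <= BIGINT_MAX)  # True: bigint comfortably suffices
```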

In the first corner, my database stores information from 2200+ samples (cancer and non-cancer tissue), genome-wide (or transcriptome-wide, depending on the source of the information), giving us a wide sampling of data, including variations unique to individuals, as well as common polymorphisms.  In the other corner, 23andMe has sampled a much greater number of individuals (100,000) using a SNP chip, meaning that they’re only able to sample a small amount of the variation in an individual – about a thirtieth of a percent of the total amount of DNA in each individual.

(According to this page, they look at only 1 million possible SNPs, instead of the 3 billion bases at which single nucleotide variations can be found – although arguments can be made about the importance of that specific fraction of a percent.)

The nature of the data being stored is pretty important, however.  For many studies, the number of people sampled has a greater impact on the statistics than the number of sites studied and, since those are mainly the studies 23andMe is doing, clearly their database is more useful in that regard.  In contrast, my database stores data from both cancer and non-cancer samples, which allows us to make sense of variations observed in specific types of cancers – and because cancer-derived variations are less predictable (i.e., not in the same 1M SNPs each time) than run-of-the-mill standard-human-variation-type SNPs, the same technology 23andMe used would have been entirely inappropriate for the cancer research we do.

Unfortunately, that means comparing the two databases is completely impossible – they have different purposes, different data and probably different designs.  They have a database of 100k individuals, covering 1 million sites, whereas my database has 2k individuals, covering closer to 3 billion base pairs.  So yeah, apples and oranges.

(In practice, however, we don’t see variations at all 3 billion base pairs, so that metric is somewhat skewed itself.  The number is closer to 100 million bp – a fraction of the genome nearly 100 times larger than what 23andMe is actually sampling.)

But, I’d still be interested in knowing the absolute number of variations they’ve observed…  a great prize upon which we could hold this epic battle of “largest database of human variations.”  At best, 23andMe’s database holds 10^11 variations, (1×10^6 SNPs x 1×10^5 people), if every single variant was found in every single person – a rather unlikely case.  With my database currently  at 1.2×10^9 variations, I think we’ve got some pretty even odds here.
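For what it’s worth, that back-of-envelope bound can be checked in a few lines, using only the round figures quoted in this post:

```python
# Upper bound on 23andMe's variant count: every chip site variant in
# every person -- the admittedly unlikely best case described above.
chip_sites = 1_000_000        # ~1M SNPs assayed per individual
individuals = 100_000         # ~100k people genotyped
upper_bound = chip_sites * individuals

my_db = 1_200_000_000         # 1.2 billion variants against hg18

print(upper_bound)            # 100000000000, i.e. 10**11
print(upper_bound // my_db)   # 83: the bound is ~83x my database
```

Since most chip sites are homozygous reference in most people, the real count is far below that 10^11 ceiling – which is why the contest is closer than the bound suggests.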

Really, despite the joking about comparing database sizes, the real deal would be the fantastic opportunity to learn something interesting by merging the two databases, which could teach us something both about cancer and about the frequencies of variations in the human population.

Alas, that is pretty much certain to never happen.  I doubt 23andMe will make their database public – and our organization never will either.  Beyond the ethical issues of making that type of information public, there are pretty good reasons why this data can only be shared with collaborators – and in measured doses at that.  That’s another topic for another day, which I won’t go into here.

For now, 23andMe and I will just have to settle for both having “one of the world’s largest databases of individual genetic information.”  The battle royale for the title will have to wait for another day… and who knows what other behemoths are lurking in other research labs around the world.

On the other hand, the irony of a graduate student challenging 23andMe for the title of largest database of human variation really does make my day. (=

[Note: I should mention that when I say that I have a database of human variation, the database was my creation but the data belongs to the Genome Sciences Centre – and credit should be given to all of those who did the biology and bench work, performed the sequencing, ran the bioinformatics pipelines and assisted in populating the database.]