SNP Database v0.2

My SNP database is now up and running, with the first imports of data working well. That’s a huge improvement over v0.1, where the data had to be entered under pretty tightly controlled circumstances. The API now uses locks and better indexes, and I’ve even tuned the database a little. (I also cheated a little and boosted the P4 running it to 1 GB of RAM.)

So, what’s most interesting to me? Some of the early stats:

11,545,499 SNPs in total, made from:

  • 870,549 SNP calls from the 1000 Genomes Project
  • 11,361,676 SNPs from dbSNP

So, some quick math:
11,361,676 + 870,549 – 11,545,499 = 686,726 SNPs that overlap between the 1000 Genomes Project calls (34 data sets) and the dbSNP calls.

That means a whopping 1.6% of the SNPs in my database (870,549 – 686,726 = 183,823 of them) were not previously annotated in dbSNP.
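
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. The variable names are made up for illustration, and the counts are simply copied from the stats above rather than queried from the database.

```python
# Counts copied from the stats in this post (not pulled from the database).
total_snps = 11_545_499        # distinct SNPs after merging both sources
thousand_genomes = 870_549     # SNP calls from the 1000 Genomes Project
dbsnp = 11_361_676             # SNPs imported from dbSNP

# Inclusion-exclusion: anything counted twice across the two sources
# must be present in both of them.
overlap = dbsnp + thousand_genomes - total_snps

# 1000 Genomes calls that were not already in dbSNP.
novel = thousand_genomes - overlap

# Share of the whole database made up of those new SNPs.
novel_pct = 100 * novel / total_snps

print(f"overlap:     {overlap:,}")
print(f"novel:       {novel:,}")
print(f"novel share: {novel_pct:.1f}%")
```

The only assumption baked in is that the total counts each distinct SNP exactly once, which is what makes the inclusion-exclusion step valid.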

I suppose that’s not a bad thing, since those samples were all “normals”, and it’s good to get some sense as to how big dbSNP really is.

Anyhow, now the fun with the database begins. A bit of documentation, a few scripts to start extracting data, and then time to put in all of the cancer datasets….

This is starting to become fun.
