>Science Online 2009 London.

>I just saw the most awesome conference in a tweet from Daniel MacArthur: Science Online 2009 London. Unfortunately, a) it’s in London, UK, which is a little too far for me to walk, and b) they’re already filled up with registrants. Fortunately, they will be streaming the whole conference on the web, which I’m highly tempted to buy into. (It costs 10 GBP… that’s ~$20 CDN, which is vastly more reasonable than flying to London.)

The program has awesome events, including Blogging for impact (Speakers: Dave Munger, Daniel MacArthur), Author identity – Creating a new kind of reputation online (Speakers: Duncan Hull, Geoffrey Bilder, Michael Habib, Reynold Guida – I have to admit I don’t know any of them… but I’ll go look them up later), and Legal and Ethical Aspects of Science Blogging (Speakers: Petra Boynton, David Allen Green).

In fact, pretty much every session sounds like it will be worth the 10 pounds… if only I were in London.

>I hate Facebook – part 2

>I wasn’t going to elaborate on yesterday’s rant about hating Facebook, but several people made comments, which got me thinking even more.

My main point yesterday was that I hate Facebook because its protocols aren’t open, which makes it a “walled garden” approach to social networking. (Here’s another great rant on the subject.) That’s not to say that you can’t work with it – there are plugins for Pidgin that let you chat over the Facebook protocol, and there are clients (as was pointed out to me) that will integrate your IMs with Facebook chat on Windows. But that wasn’t my point anyway.

My point is that Facebook keeps creating its own separate protocols, each independent of the ones that came before it. In contrast to a service like Twitter, where the underlying format is simple XML that’s easily manipulated, using Facebook requires you to work within their universe of standards. (I’m not the first person to come up with this – Google will find you lots of examples of other people blogging the same thing.)

On the whole, that’s not necessarily a bad thing, but common, reusable standards are what drive progress.

For instance, without a common HTML standard, the web would not have flourished – we’d have many independent webs. If AOL had their way, they’d still have you dialing up into their own proprietary Internet.

Without common electrical standards, we’d have to pick appliances to match our homes’ particular plugs – buying a hair dryer would be infinitely more painful than it needs to be.

Without a common word processing format, we’d suffer every time we tried to send a document to someone who isn’t using the same word processor we do. (Oh wait, that’s actually Microsoft’s game – they refuse to properly support the one common document format everyone else uses.)

So, when it comes to Facebook, my hate is this – if they used a simple RSS feed for the wall, I could have used that instead of Twitter on my site. If they used a simple Jabber format for their chat, I could have merged that with my Google chat account. And then there’s their private message system… well, that’s just email, but not accessible by IMAP or POP.
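Just to show how low the bar is: if the wall were exposed as RSS, pulling the latest post into any web page would take a couple of lines of PHP – much like the Twitter script I posted earlier. The feed URL below is hypothetical (no such feed exists – that’s exactly my complaint), and this assumes allow_url_fopen is enabled on your host.

<?php
# A MINIMAL SKETCH, ASSUMING FACEBOOK PUBLISHED THE WALL AS AN RSS FEED.
# THE URL IS HYPOTHETICAL - NO SUCH FEED ACTUALLY EXISTS.
$feed = simplexml_load_file("http://facebook.com/wall/..####.rss");

# THE LATEST WALL POST WOULD THEN BE AN ORDINARY RSS <item>.
$latest = $feed->channel->item[0];
echo "Latest wall post: {$latest->title}";
?>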

What they’ve done is try to resurrect a business model that the web-unsavvy keep trying. In the short term, it’s pure money: you drive people into it because everyone else is using it. The innovative concept makes its adoption rapid and ubiquitous – but then you fall into the trap. The second generation of sites uses open standards, and that allows FAR more cool things to be accomplished.

Examples of companies trying the walled garden approach on the net:

AOL and their independent internet, accessible only to AOL subscribers. Current status: laughable.

Microsoft’s Hotmail, where Hotmail users can’t export their email to migrate away. Current status: Gmail fodder.

Yahoo’s communities. Current status: irrelevant.

Wall Street Journal’s new site. Current status: ridiculed by people younger than 45.

Apple’s i(phone/pod/tunes/etc). Current status: frequently hacked, and forced to accept the de facto .mp3 format. (No Ogg yet…)

Ok, that’s enough examples. All I have to say is that when Google (or anyone else) gets around to building a social networking site that’s open and easy to play with, it won’t be long before Facebook collapses.

The moral of the story? Don’t invest too much in your Facebook profile – it’ll be obsolete in a few years.

>I hate Facebook

>I have a short rant to end the day, brought on by the ever-increasing tie-in between the web and my desktop (now KDE 4.3):

I hate Facebook.

It’s not that I hate it the way I hate Myspace, which I hate because it’s so easy to make horribly annoying web pages. It’s not even that I hate it the way I hate Microsoft, which I hate because their business engages in unethical practices.

I hate it because it’s a walled garden. Not that I have a problem with walled gardens in principle, but this one is just so inaccessible – which is exactly what the Facebook owners want. If you can only get at Facebook through the Facebook interface, you have to see their ads, which makes them money if you ever get sucked into clicking them. (You now have to manually opt out of having your picture used in ads shown to your friends… it’s a new option in your profile’s security settings, if you don’t believe me.)

Seriously, the whole Facebook wall can be recreated with Twitter, the photo albums with Flickr, the private messages with Gmail… and all of it can be tied together in one place. Frankly, I suspect that’s what Google’s “Wave” will be.

If I could integrate my Twitter account with my wall on Facebook, that would be seriously useful – but why should I invest the energy to update my status twice? Why should I have to maintain my own web page AND a profile on Facebook?

Yes, it’s a minor rant, but I just wanted to put that out there. Facebook is a great idea and a leader of its genre, but in the end, it’s going to die if its community starts drifting towards equivalent services that are more easily integrated into the desktop. I can now update Twitter using an applet on my desktop – but Facebook still requires a login so that I can see their ads.

Anyhow, if you don’t believe me about where this is all going, wait and see what Google Wave and Chrome do for you. I’m willing to bet desktop publishing will take on a whole new meaning, and on-line communities will be a part of your computer experience even before you open your browser window.

For a taste of what’s now on my desktop, check out the OpenDesktop, Remember the Milk and microblog (or even Choqok) plasmoids.

>Aligner tests

>You know what I’d kill for? A simple set of tests for each aligner available. I have no idea why we didn’t do this ages ago. I’m sick of off-by-one errors caused by all the slightly different formats in use – and I can’t write unit tests without a good, simple demonstration file for each aligner type.

I know the SAM format should help with this – assuming everyone adopts it – but even for SAM I don’t have a good control file.

I’ve asked someone here to set up this test using a known sequence – and if it works, I’ll bundle the results into the Vancouver Package so everyone can use it.

Here’s the 50-mer I picked for the test. For those of you with some knowledge of cancer, it comes from TP53, and it appears to BLAST uniquely to this one location.

>forward - chr17:7,519,148-7,519,197
CATGTGCTGTGACTGCTTGTAGATGGCCATGGCGCGGACGCGGGTGCCGG

>reverse - chr17:7,519,148-7,519,197
ccggcacccgcgtccgcgccatggccatctacaagcagtcacagcacatg
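As a sanity check on those two strands, the reverse sequence is just the reverse complement of the forward one – easy to verify (or to regenerate for other test sequences) with a few lines of PHP. This is only a helper sketch for double-checking test inputs, not part of any aligner.

<?php
# VERIFY THAT THE REVERSE TEST SEQUENCE IS THE REVERSE COMPLEMENT OF
# THE FORWARD ONE - HANDY FOR GENERATING OTHER ALIGNER TEST PAIRS.
$forward = "CATGTGCTGTGACTGCTTGTAGATGGCCATGGCGCGGACGCGGGTGCCGG";

function reverse_complement($seq)
{
    # COMPLEMENT EACH BASE (A<->T, C<->G), THEN REVERSE THE STRING.
    return strrev(strtr(strtoupper($seq), "ACGT", "TGCA"));
}

echo ">forward\n" . $forward . "\n";
echo ">reverse\n" . strtolower(reverse_complement($forward)) . "\n";
?>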

>From a report

>I was trying to sum up some of the development work done on FindPeaks in April-June this year for a quarterly report and ended up writing the following text. Maybe someone will be inspired by it to give the package a shot. (=

FindPeaks now includes Control and Compare modes, which identify features that differ in statistically significant ways between a sample and a control, or between two samples. In Control mode, only those locations with significantly greater enrichment in the sample are preserved, whereas Compare mode identifies areas of differing enrichment in both the sample and the control. These modes use peak pairing and linear regression methods that are symmetrical, producing identical peak pairing and statistics regardless of the order in which the data sets are presented. They can be applied in a wide variety of situations, including ChIP-Seq, RNA-Seq and even copy number variation analysis for whole-genome comparisons.
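For anyone wondering what “symmetrical” means in practice: an ordinary least-squares regression of sample against control gives a different answer when you swap the two axes, whereas an orthogonal (total least squares) fit gives the same line either way. The toy PHP sketch below illustrates only that property – it is not the FindPeaks implementation, and the data points are made up.

<?php
# TOY DEMONSTRATION OF A SYMMETRIC (ORTHOGONAL / TOTAL LEAST SQUARES)
# REGRESSION: SWAPPING THE TWO DATA SETS YIELDS THE SAME FITTED LINE.
# NOT THE FINDPEAKS CODE - THE DATA BELOW ARE MADE UP.
function orthogonal_slope($x, $y)
{
    $n = count($x);
    $mx = array_sum($x) / $n;
    $my = array_sum($y) / $n;
    $sxx = $syy = $sxy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $sxx += ($x[$i] - $mx) * ($x[$i] - $mx);
        $syy += ($y[$i] - $my) * ($y[$i] - $my);
        $sxy += ($x[$i] - $mx) * ($y[$i] - $my);
    }
    # THE SLOPE THAT MINIMIZES PERPENDICULAR (NOT VERTICAL) DISTANCES.
    return ($syy - $sxx + sqrt(pow($syy - $sxx, 2) + 4 * $sxy * $sxy)) / (2 * $sxy);
}

$sample  = array(2, 4, 5, 8, 10);
$control = array(1, 3, 6, 7, 11);

# THE TWO SLOPES ARE EXACT RECIPROCALS: SAME LINE, AXES SWAPPED.
echo orthogonal_slope($sample, $control), "\n";
echo 1 / orthogonal_slope($control, $sample), "\n";
?>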

FindFeatures is a new application in the FindPeaks/Vancouver Short Read Analysis Package that allows peaks identified by the FindPeaks application to be mapped to annotated features on the genome of interest, as contained in the Ensembl database. This tool set takes the peaks files produced by the FindPeaks application, converts the relevant locations to a generic, BED-like format, and then identifies any genes (introns or exons) to which they map. It may also be used to identify areas upstream of genes, or in close proximity to other features of interest.
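Under the hood, “mapping a peak to a feature” is a plain interval-overlap test: a peak hits a gene (or exon, or upstream window) when their coordinate ranges intersect on the same chromosome. Here is a minimal sketch of that test with made-up coordinates – not the FindFeatures code itself.

<?php
# MINIMAL SKETCH OF A PEAK-TO-FEATURE TEST. TWO INCLUSIVE INTERVALS
# OVERLAP IF AND ONLY IF EACH STARTS AT OR BEFORE THE OTHER ENDS.
# COORDINATES ARE MADE UP FOR ILLUSTRATION - NOT THE FINDFEATURES CODE.
function overlaps($startA, $endA, $startB, $endB)
{
    return $startA <= $endB && $startB <= $endA;
}

$peak = array('chr' => 'chr17', 'start' => 7519000, 'end' => 7519300);
$gene = array('chr' => 'chr17', 'start' => 7510000, 'end' => 7532000);

if ($peak['chr'] == $gene['chr']
        && overlaps($peak['start'], $peak['end'], $gene['start'], $gene['end'])) {
    echo "Peak overlaps the feature.\n";
}
?>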

>Dates and misleading messages.

>Here’s an entertaining debugging challenge for people.

I was trying to get the history of code changes between April and June, so that I could write up a quick report for a working group at the GSC. I used the following command:

svn log -r {2009-06-31}:{2009-04-01}

and got the following error:

svn: Syntax error in revision argument '{2009-06-31}:{2009-04-01}'

After scratching my head for a while, and going through a ton of different threads on-line trying to figure out what the correct format should be, I finally figured out the error…

Are you ready for it?

June doesn’t have 31 days. Replacing it with the correct date range solved the error. Oops!
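For the record, the fixed command was simply:

svn log -r {2009-06-30}:{2009-04-01}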

>how recently was your sample sequenced?

>One more blog for the day. I was postponing writing this one because it’s been driving me nuts, and I thought I might be able to work around it… but clearly I can’t.

With all the work I’ve put into the controls and compares in FindPeaks, I thought I was finally clear of the bugs and pains of working on the software itself – and I think I am. Unfortunately, what I didn’t count on was that the data sets themselves may not be amenable to this analysis.

My control finally came off the sequencer a couple of weeks ago, and I’ve been working with it for various analyses (SNPs and the like – it’s a WTSS data set)… and I finally plugged it into my FindPeaks/FindFeatures pipeline. Unfortunately, while the analysis is good, the sample itself is looking pretty bad. In looking at the data sets, the only thing I can figure is that a year and a half of sequencing chemistry changes has made a big impact on the number of aligning reads and the quality of the reads obtained. I no longer get a linear correlation between the two libraries – it looks partly sigmoidal.

Unfortunately, there’s nothing to do except re-sequence the sample. But really, I guess that makes sense: if you’re doing a comparison between two data sets, you need them to have as few differences as possible.

I just never realized that the time between samples also needed to be controlled. Now I have a new question when I review papers: how much time elapsed between the sequencing of your sample and its control?

>Picard code contribution

>Update 2: I should point out that the subject of this post has been resolved. I’ll chalk it up to a misunderstanding. The patches I submitted were initially rejected, but were accepted several days later, once the purpose of the patch was clarified with the developers. I’ll leave the rest of the post here for posterity’s sake, and because I think there is some merit to the points I made, even if they were misguided in their target.

Today is going to be a very blog-ful day. I just seem to have a lot to rant about. I’ll be blaming it on the spider and a lack of sleep.

One of the things that thrills me about open source software is the ability for anyone to make contributions (above and beyond the ability to share and understand the source code) – and I was ecstatic when I discovered the Java-based Picard project, an open source set of libraries for working with SAM/BAM files. I’ve been slowly reading through the code, as I’d like to use it in my own project for reading/writing the SAM format, which nearly all of the available aligners are moving towards.

One of the wonderful tools I use for my own development is called Enerjy. It’s an Eclipse plug-in designed to help you write better Java code by making suggestions about things that can be improved. A lot of its suggestions are simple: re-order imports to make them alphabetical (and more readable), fill in missing javadoc tags, and so on. They’re not key pieces, but they are important for maintaining your code’s good health. It also points the way to things that are likely to cause bugs (such as doing string comparisons with the “==” operator).

While reading through the Picard libraries and code, Enerjy threw more than 1600 warnings. The code isn’t in bad shape, but it has a lot of little “problems” that could easily be fixed: mainly missing javadoc, un-cast generic types, arrays being passed between classes, and the like. As part of my effort to read through and understand the code, which I want to do before using it, I figured I’d fix these details. I wanted to start small while still making a contribution, before ramping up to the more complex warnings. Open source at its best, right?

The sad part of the tale is that open source only works when the community’s contributions are welcome. Apparently, with Picard, code cleaning and maintenance aren’t. My first set of patches (dealing mainly with the trivial warnings) was rejected, and with that reception, I’m not going to waste my time submitting the second set of changes I made. That’s kind of sad, in my opinion. I expressly told them that these patches were just a small start and that I’d begin making larger code contributions as my familiarity with the code improved – and at this rate, my familiarity with the code is definitely not going to mature quickly, since I have much less motivation to clean up their warnings if they themselves aren’t interested in fixing them.

At any rate, perhaps I should have known. Open source in science usually means people have agendas about what they’d like to accomplish with the software – and including contributions may mean including someone on a publication downstream if and when it does become published. I don’t know if that was the case here: it was well within the project leader’s rights to reject my patches on any grounds they like, but I can’t say it makes me happy. I still don’t enjoy staring at 1600+ warnings every time I open Eclipse.

The only lesson I take away from this is that next time I see “open source” software, I’ll remember that just because it’s open source, it doesn’t mean all contributions are welcome – I should have confirmed with the developers, before touching the code, that they were open to small changes, and not just bug fixes. In the future, I suppose I’ll be tempering my excitement for open source science software projects.

Update: A friend of mine pointed me to a highly related link. Anyone with an open source project (or interested in getting started in one) should check out this blog post titled Teaching people to fish.

>Giant Spider…

>Ok, way big diversion from my usual set of topics.

I came downstairs for a snack in the evening, slapped some cheese and tomatoes on a slice of bread, and then looked down at the floor when some movement caught my eye – and then ran for a glass. I’m not terrified of spiders, but this bugger was BIG.

After catching the spider, I looked online – I’m not used to finding spiders this size in Canada, and figured it might be something nasty. Indeed, my best classification for it is probably a hobo spider, which is actually venomous. (So much for naively thinking there are no venomous spiders in Canada!) It lacks the banded pattern on the legs – something I carefully checked in the pictures I took before figuring out how to handle it.

At any rate, the spider was “ejected” from the house, and I spent some time making sure it hadn’t invited any friends over for the party. And, I’m happy to report, there were no bites at the end of the exercise.

>PHP script for latest Twitter tweet in HTML

>One of my (many) projects this weekend was to sign up for Twitter and then use it as a means of making micro-updates to my web page. Obviously, it shouldn’t be hard, but I had a lot of details to work out, including several tickets asking my hosting service to upgrade to PHP 5 and install the cURL library (both of which were necessary for this hack to work).

Since it’s all working now, I thought I’d share the source. This can obviously be modified, but for now, here’s the script that’s doing the job. Yes, bits of it were pulled from all over the web, and some of it was cobbled together by me. Obviously, you’ll need to put in the correct source for the feed, which is marked below as “http://twitter.com/..####.rss”.

Enjoy!



<?php
# INSTANTIATE CURL.
$curl = curl_init();

# CURL SETTINGS.
curl_setopt($curl, CURLOPT_URL, "http://twitter.com/..####.rss");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 0);

# GRAB THE XML FILE AND RELEASE THE HANDLE.
$xmlTwitter = curl_exec($curl);
curl_close($curl);

# SET UP XML OBJECT.
$xmlObjTwitter = simplexml_load_string($xmlTwitter);
$item = $xmlObjTwitter->channel->item;
# STRIP THE LEADING USERNAME PREFIX (FIRST 8 CHARACTERS) FROM THE TITLE.
$title = substr_replace($item->title, '', 0, 8);
$url = $xmlObjTwitter->channel->link;
echo "Anthony tweets: {$title}";
?>