>Link Roundup – November 30th, 2009

>Here are my picks from the last two weeks, while I’ve been busy with other things. I’ve skimmed all of the articles below, and will probably have to return to read a few more closely, but each of them seemed interesting.

I’ve filtered out more than usual, simply because of the sheer number of links that collected in my in-box. (194 tweets, to be exact.) However, that also gave me an interesting opportunity to gather some statistics. For each link, I’ve given the twitter name of the person from whom I first saw the content. Since most of the tweets pop up in my box 2-3 times from different people, the list below is pretty much the list of people generating original content, or who are fastest at getting the content into twitter. Thus, it’s kinda neat to see who’s shaping the twitter conversations I follow.

Personal Genomics:

  • Reply to why Personal Genomics are worthwhile – Link (via @dgmacarthur)
  • DNA testing is changing fatherhood – Link (via @genomicslawyer)
  • Why paternity testing should be mandatory – Link (via @dgmacarthur)
  • Another DTC scam? Athleticode – Link (via @dgmacarthur)

Bioinformatics & Sequencing:

  • Benchmarking the cloud for genomics – Link (via @dgmacarthur)
  • The cost/benefit figures calculated for cloud genomics – Link (via @lukejostins)
  • Hacking Admixed 23andMe Ancestry Paintings – Link (via @dgmacarthur)
  • Amazon’s usage plans for web service research grant – Link (via @BioInfo)
  • Great chart comparing 2nd gen sequencing platforms – Link (via @CLCbio)
  • Data processing for GWAS data (abstract) – Link (via @KatherineMejia)
  • 2nd Gen sequence analysis tools in BioLinux? – Link (via @BioInfo)
  • 2nd Gen command line tools – Link (via @BioInfo)
  • Luke Jostins comments on bad (possibly wrong) media coverage of GWAS – Link (via @lukejostins)
  • Bioinformatics are too good to be true? – Link (via @KatherineMejia)
  • Sign up for new de novo assembler coming soon from CLC Bio – Link (via @CLCbio – kinda spammish?)

Computers & Math:

  • Chrome OS starts to appear for testing – Link (via @BioInfo)
  • Give everyone the chance to use science derived data – Link (via @BioInfo)
  • How heavy is the internet – Link (via @lukejostins)
  • Arthur Benjamin on teaching the right math – Link (via @apfejes)

Odds & Ends:

  • Growing meat without animals – Link (via @KatherineMejia)
  • Seeding science in the developing world – Link (via @ritajlg)

>Math…

>I haven’t opened twitter in a week and a half – I’ve just been too busy, but I will get back to it, starting tomorrow. So yes, I will continue the link roundups, but I just couldn’t do it when I had my committee meeting, its aftermath and a talk (which I just gave an hour ago).

Anyhow, in the meantime, I have a short video for you – and don’t worry, it’s short.

>What I’ve learned about PhD committees

>[Update: thanks to some excellent feedback, I thought I'd revisit this article and clean it up. I've tried to be clear where the revisions are, and only made minor clarifications to the body of the text where warranted.]

This has been a really bad week for me. It started with a botched committee meeting, a death in the family, and then a series of technical errors that have annoyed me to no end. All that has made me walk away from my computer in frustration several times, only to return and find something else that upsets me. Unfortunately, the technical issues are mostly just that: technical. They’re not something that other people will learn anything from, with the possible exception of this:

I understand dbSNP 130 has now begun to include cancer-causing mutations and pretty much everything else in its annotated SNPs. And, of course, there doesn’t seem to be any mention of this on the web. Knowing this, you obviously shouldn’t use it for filtering out “neutral” changes. It won’t work. (If you’re working on genetic variations from RNA-seq like I am, this warning might save you a few hours of pain – or better, prevent severe embarrassment if you start talking about filtering in front of an audience, as a fellow grad student and friend of mine did recently.)

Anyhow, the greater part of the lessons I learned this week were about Grad School, and what I learned about committee meetings can be summed up in a few quick points: [Note, this is advice on meeting with the committee as a body, not meeting with individual members.]

  • Your relationship with your committee is not [necessarily] a friendly one.

They may be friendly with you, but they’re not there to give you friendly advice and guide you through your PhD. Instead, the committee (as a body) is really engaged in an adversarial relationship in which they are the gate keepers that will decide when you can leave this pit of doom, and they are the ones that will open the door at the end when they believe you’re ready to depart. Yes, they do have the roadmap to letting you out, but they would much rather you figure it out yourself instead of asking them to help plan it. [To be clear, it is the job of your committee members individually to give you advice and help you out - and the job of your advisor to help you find that road map. The committee exists to make sure that you've satisfied the requirements. Every project is different, so you'll have to chart your own path - and the committee as a body knows where you'll end up, but not how you'll get there.]

  • Your committee is not interested in your progress – they’re interested in your results.

The difference may be subtle, but it changes how I view my committee meetings. No more will I go in there with a “progress report” style presentation. Instead, I’m going to go in to present results, the same way I would if I were in a journal club presenting a paper. They don’t care if I’ve learned new coding languages, solved 12 cold cases and rescued a baby from a burning house. They only want to know what my results look like – because those will go into my thesis, and that’s all that matters. (Don’t bother asking what they think your thesis should include… you should decide that, and then they’ll tell you afterwards if you’re wrong – see “roadmap” in the first point.)

  • Your committee is not expecting great things from you – they want you to know what they know.

Actually, they expect you to memorize useless details, be able to regurgitate the names of people in your field blindly, and know which journal has the highest impact factor in your field. What they’re really after is that you should be able to point to the people they know in the field and explain how they solved the problems you’re working on. If you know who they know, you’ll know whose papers they read. [This probably sounds much more harsh than I meant it to be. Getting a PhD means you're an expert in your field, and thus know all the details - when you know all of them, that's when you're ready to leave. The purpose of the committee as a body is to ensure you're an expert, not that you're destined for a Nobel prize. The only criterion they have to judge you on is what they know about your field. So knowing what they know about your field is the way to show that you're an expert - after all, those are the questions they'll ask you to determine if you're right about what you know. And yes, as undergrads, we all learned that the right answer to a question is what the professor gave in his notes, not what you think the right answer is.]

  • When your committee asks you an opinion question, they aren’t asking your opinion – they’re asking their opinion.

This should be obvious to any 1st year undergrad student, but as a grad student, we tend to forget it. Professors may ask a question that starts with “what do you think about/is….” The correct answer is not what you think it is – it’s what THEY think it is. (Remember this subtle point – it will probably be needed in your defense as well.) [Again, somewhat harsh, but it's like I've said above - they're asking you questions to test if you know the answers - and the correct answers, in their mind, will be what they know and what they believe. One on one, you can discuss and debate these issues with your committee members as individuals, but my committee meetings rarely seem to be discussions.]

  • Your committee won’t know why your results are important unless you explicitly explain it to them.

Again, this is something you learn as an undergrad, but may have faded with time. A committee member looked at my presentation and said at the end (paraphrased) “You’re just turning a crank and out pops Venn diagrams”. Obviously, I didn’t do a good job of explaining the 26,000 lines of code I’ve written and the novel algorithms that went into it.

  • Your committee will change their minds – and not know it.

Don’t expect that your committee will remember what they told you last time… they don’t. [It's probably been a year or more since the last time you met. We all forget.] At my last committee meeting, I was told (explicitly) that I should not include my ChIP-Seq work in my thesis. This time, they told me I’d be crazy to leave it out. (They may have used a different word… I was somewhat in awe at this point in the conversation.) [Clarification: Details that seem important to you are probably insignificant in their lives. Don't expect them to remember for you, and a year is a long time - things may have changed.]

  • Don’t expect sympathy from professors.

Once you’ve irritated all your committee members by doing nothing but turning cranks, remember that your job is just to keep producing results – that’s all that matters. When your committee has just discussed that you’re not turning the crank fast enough, your advisor isn’t going to come for a friendly chat to find out why – they’ll just send you notes that antagonize you. They assume that nothing is going on in your life and that your results are just not there because you have become lazy. They were in grad school once, so they know that your inability to conquer impossible problems is just because you’re off playing ping pong or getting coffee. (Be prepared for this – it’s the inevitable result of a bad committee meeting, if you haven’t taken my advice above.) [Again, harsh, but yeah, I was upset. Still, this was just an example. I do play ping pong, but I don't drink coffee, and yes, I did get a sarcastic email from my supervisor - probably deserved after such a poor presentation to my committee. As they say, Your Mileage May Vary - if you're lucky enough to have a supervisor that is holding your hand through the process, that's great - but don't expect it. PhDs are all about preparing the student for the real world - and the real world is harsh.]

  • Professors are very good at juggling tasks, and the only way to learn is trial by fire

Since I’ve told my advisor several times that I’m doing too many things, and that’s how it’s been all year, maybe it shouldn’t be a surprise I can’t focus on turning out papers. It seems to me that the people who make the grade to become profs are the ones that are able to write grants while juggling 4 projects, and are able to make progress in all of them. That clearly makes sense – lousy PhDs don’t make good profs. However, those people who make it to professorship are (in my opinion) often the ones that are naturally good at managing their own tasks. For those of us trying to manage too many tasks, don’t expect them to help manage your priorities – they do it instinctively for themselves, and they expect you to do it instinctively too – even when your priorities are 180 degrees opposite from what you thought they were. [One of the major lessons I've learned is that in grad school, priorities are what you make them. Your committee exists to make sure they don't slip too far from what they think you need to accomplish - as much as we may all want hand holding, professors are busy people, and your priorities are exactly that: Yours to set and to juggle.]

So, there you have it – it’s been an educational week. I’ve learned:

  1. What a PhD committee is for.
  2. How to talk to and answer questions from committee members.
  3. What to expect from my committee and from doing research.
  4. That I need to completely re-organize the way I manage my tasks.

While I’m helpless to do anything about the botched committee meeting, I have been able to work on that last point. I’ve changed how I manage my software, how I interact with my colleagues, what projects get my time, and I’m making a point of saying No to things that won’t get me out of here. With luck, that will put me back on track – which is what my committee wanted in the first place, right?

>Link Roundup – November 13-16

>Genomics:

Ethics:

  • Open Office Hours with Hank Greely (Bioscience and ethics) – Link

Blogging/Popular Culture/Funny:

  • Blogger and Biologist by day, prostitute by night – Link
  • Genomics assisted dating? – Link
  • New (to me) site: sciencehumor.org – Link
  • A LOT of videos of molecular events – Link
  • 35 Amazing Science Fair Projects – Link

Stuff I haven’t read/categorized:

  • Genetic links to Parkinsons – Link and Link

>Juggling

>Anyone who knows me knows that I juggle – not incessantly, but just as a way to pass the time or as a device for concentrating. If you sit near me at work, there’s a good chance I’ve even got you started on juggling as well.

Although not even close to being on-topic for my blog, I figured I had to share this video. It’s one of the better 3-ball juggling videos I’ve seen. He more or less goes through everything I can do… and then still has another 3 and a half minutes of stuff I can’t do. If you’ve ever been curious about what you can do with three or four balls (without the usual behind-the-back or hacky sack tricks), this is more or less a catalog. It’s here to remind me next time I run into a problem at work and need to learn something new to get past it.

>News Roundup – Nov 11-13, 2009

>Take two at an article Roundup. Doing this project doesn’t take a lot of work, but the tools I’m using for it are somewhat inconvenient, which makes it an interesting (challenging?) project. Categorization isn’t difficult, but I can see how it’ll be time consuming in the end. I think the only way to really do this is to use a micro-database with a small GUI to do the work and generate the HTML at the end of the day…. but hey, I’m a programmer, so you’d think I could write one if I wanted. (And yes, I could try to use Go to do it…) Unfortunately, it’ll have to wait a few more days. (Yes, I have a committee meeting on Tuesday, and I’m still blogging – what’s wrong with this picture?)

You’ll notice some overlap in the days covered, mainly because I’m posting midway through the day. I could even go a step further and automate this, but somehow, I think I should take it one step at a time.

What I really wonder is if I’m casting my net wide enough, as in “what fraction of the cool genomics/bioinformatics/personalized medicine articles am I actually reading?” At least, based on the theory that I’m subscribed to the best of the cool twitterers who do a lot of the pre-filtering for me, if you believe in community filtering, maybe this really is the best of the best. (-:

Well, without any more delay, here’s the best of the past 48 hours.

——–

Microfluidics:

  • Cheap DIY Microfluidics using Shrinking Plastic – Link

Popular Health:

  • To immunize or not immunize – Link
  • Imagining Personalized Medicine – Link
  • Larry David finds out he’s not who he thinks he is – Video

Sequencing Technology:

  • Nanopores with single base resolution – Link

Computing:

  • Amazon offers various forms of computing resource, enumerated here – Link

Biotechnology:

  • Opening up the courts to allow anyone to challenge a patent – Link

>Go from Google…

>Just a short post, since I’m actually (although you probably can’t tell) rather busy today. However, I’m absolutely fascinated by Google’s new language, Go. It’s taken the best from just about every existing language out there, and appears so clean!

I’m currently watching Google’s talk on it, while I write… I’m only a few minutes in, but it seems pretty good. Watching this seriously makes me want to start a new bio-go project… so nifty!

>Article Roundup – November 9-11, 2009

>I’ve decided to start a new resource for myself, and for anyone who might find this useful. Each week, an incredible number of neat articles and posts do the rounds on twitter – but I often find myself skimming them and then forgetting where I saw it. Going back through twitter feeds is such a hassle that I figured I should start collecting them into a single resource.

I’m not quite sure what the right format is, but I’m sure it’ll fall into place quickly. The only criteria I have for this list are that each item must be 1. insightful, 2. science related, preferably genomics or genetics related, and 3. well written. I’ll probably end up settling for any two of the above, but hey, for the first post, I’ll try to keep my standards high.

Originally, when I started this, I had assumed I could do this one week at a time, although it’s rapidly becoming difficult to process the significant number of links. Instead, I’ll start off with 2-3 days, which should make it manageable. Additionally, having looked at this over the past couple days, I can see that some serious categorization will be necessary in the future. So, for a first try, here are the links I’ve collected since Monday.

——————————-
Articles & Blogs

  • Commentary on Next Gen technology and the gap to sequence analysis – Link
  • On why breakthroughs really aren’t… (the curse of multiple testing and statistics) – Link
  • Differences between DTC genotyping – Link
  • Commentary on differences between DTC genotyping risks – Link
  • The difference between Genetic and Genealogical family trees – Link
  • Helicos no longer selling itself – Link
  • ResearchMatch: an NIH resource to link volunteers with studies – Link
  • Update on Bilski (Patent-ability of non-machine inventions) and biotech – Link
  • Why sequencing non-Homo sapiens genomes is important (Horse genome) – Link
  • How related are you to your relations? – Link
  • Career Development for Life Scientists: An Ongoing and Disturbing Trend – Link

Off Topic posts:

  • DRM and Apple – a business model – Link

Unusual Tweets:

  • “RT @lindaavey : Looks like my husband’s 4th cousin is my dad’s 10th cousin (my 11th cousin?).” – From dgmacarthur, Nov 11th, 2009

>ChIP-Seq normalization.

>I’ve spent a lot of time working on ChIP-Seq controls recently, and wanted to raise an interesting point that I haven’t seen addressed much: How to normalize well. (I don’t claim to have read ALL of the ChIP-Seq literature, and someone may have already beaten me to the punch… but I’m not aware of anything published on this yet.)

The question of normalization occurs as soon as you raise the issue of controls or comparing any two samples. You have to take it into account when doing any type of comparison, really, so it’s somewhat important as the backbone to any good second-gen work.

The most common thing I’ve heard to date is to simply normalize by the number of tags in each data set. As far as I’m concerned, that really will only work when your data sets come from the same library, or two highly correlated samples – when nearly all of your tags come from the same locations.

However, this method fails as soon as you move into doing a null control.

Imagine you have two samples. One is your null control, with the “background” sequences in it. When you sequence it, you get ~6M tags, all of which represent noise. The other is ChIP-Seq, so some background plus an enriched signal. When you sequence it, hopefully you get about 90% signal and 10% background, for ~8M tags – of which ~0.8M are noise. When you do a comparison, the number of tags isn’t quite doing justice to the relationship between the two samples: the total tag counts differ by a factor of 8M/6M, but the background that the control is supposed to model differs by a factor of only 0.8M/6M.
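To put rough numbers on that, here is a small sketch of the arithmetic using the figures from the example above. The function and variable names are mine, made up for illustration, and the 10% noise fraction is just the hypothetical value from the example, not something you would know in advance for real data.

```python
def compare_scalings(control_tags, chip_tags, chip_noise_fraction):
    """Compare tag-count scaling with background-only scaling.

    All inputs are hypothetical illustration values; in a real experiment
    the noise fraction of the ChIP sample is unknown.
    """
    # Number of ChIP tags that are really just background.
    chip_noise_tags = chip_tags * chip_noise_fraction

    # Naive approach: scale the control by the ratio of total tag counts.
    naive_scale = chip_tags / control_tags

    # What we would want: scale the control by the ratio of background tags.
    background_scale = chip_noise_tags / control_tags

    # How badly the naive approach overestimates the background.
    overestimate = naive_scale / background_scale
    return naive_scale, background_scale, overestimate


naive, background, factor = compare_scalings(
    control_tags=6_000_000,
    chip_tags=8_000_000,
    chip_noise_fraction=0.10,
)
print(f"scale by total tags:      {naive:.3f}")
print(f"scale by background only: {background:.3f}")
print(f"overestimate factor:      {factor:.1f}")
```

With these made-up numbers, scaling by total tags inflates the expected background in the ChIP sample roughly tenfold compared with scaling by the background alone.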

So what’s the real answer? Actually, I’m not sure – but I’ve come up with two different methods of doing controls in FindPeaks: One where you normalize by identifying a (symmetrical) linear regression through points that are found in both samples, the other by identifying the points that appear in both samples and summing up their peak heights. Oddly enough, they both work well, but in different scenarios. And clearly, both appear (so far) to work better than just assuming the number of tags is a good normalization ratio.
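For the curious, here is a rough sketch of what those two ideas could look like. This is not the FindPeaks code. The peak representation (a dict of position to height) and the function names are assumptions for illustration, and for the "symmetrical" regression I've swapped in one simple possibility: a line through the origin whose slope is the geometric mean of the two ordinary least-squares slopes, so that swapping the samples just inverts the ratio.

```python
import math


def shared_peaks(sample, control):
    """Return (sample_height, control_height) pairs for positions in both.

    `sample` and `control` are hypothetical dicts mapping a peak position
    to its height.
    """
    common = sorted(set(sample) & set(control))
    return [(sample[pos], control[pos]) for pos in common]


def ratio_by_summed_heights(sample, control):
    """Normalization ratio from summed heights of peaks found in both samples."""
    pairs = shared_peaks(sample, control)
    total_control = sum(c for _, c in pairs)
    if not pairs or total_control == 0:
        raise ValueError("no usable shared peaks")
    return sum(s for s, _ in pairs) / total_control


def ratio_by_symmetric_regression(sample, control):
    """Normalization ratio from a symmetric fit through the origin.

    Assumption: the slope is the geometric mean of the slope of sample on
    control and the inverted slope of control on sample, which simplifies
    to sqrt(sum(s^2) / sum(c^2)).
    """
    pairs = shared_peaks(sample, control)
    sum_sq_control = sum(c * c for _, c in pairs)
    if not pairs or sum_sq_control == 0:
        raise ValueError("no usable shared peaks")
    return math.sqrt(sum(s * s for s, _ in pairs) / sum_sq_control)


# Toy data: three positions are shared, one is unique to each sample.
chip = {100: 20.0, 250: 8.0, 400: 4.0, 900: 50.0}
control = {100: 10.0, 250: 4.0, 400: 2.0, 700: 3.0}

print(ratio_by_summed_heights(chip, control))
print(ratio_by_symmetric_regression(chip, control))
```

On this toy data the shared peaks are exactly twice as tall in the ChIP sample, so both functions return 2.0. On real data the two estimates will generally differ, since the regression version weights tall peaks more heavily than the summed-height version does.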

More interesting, yet, is that the normalization seems to change dramatically between chromosomes (as does the number of mapping reads), which leads you to ask why that might be. Unfortunately, I’m really not sure why it is. Why should one chromosome be over-represented in an “input dna” control?

Either way, I don’t think any of us are getting to the bottom of the rabbit hole of doing comparisons or good controls yet. On the bright side, however, we’ve come a LONG way from just assuming peak heights should fall into a nice Poisson distribution!

>New ChIP-seq control

>Ok, so I’ve finally implemented and debugged a second type of control in FindPeaks… It’s different, and it seems to be more sensitive, requiring fewer assumptions to be made about the data set itself.

What it needs, now, is some testing. Is anyone out there willing to try a novel form of control on a dataset that they have? (I won’t promise it’s flawless, but hey, it’s open source, and I’m willing to bug fix anything people find.)

If you do, let me know, and I’ll tell you how to activate it. Let the testing begin!