>CSCBC 2009

>Someone raised the good point that I had forgotten to mention the origin of the talks I made notes on last week, which matters for several reasons. Although the conference is over, it was a neat little conference which deserves a little publicity. Additionally, it’s now in planning for its fifth year, so it’s worth mentioning just in case people are interested but weren’t aware of it.

The full title of the conference is the Canadian Student Conference on Biomedical Computing, although I believe next year’s title will be expanded to include Biomedical Computing and Engineering explicitly (CSCBCE 2010). This year’s program can be found at http://www.cscbc2009.org/, and my notes for it can all be found under the tag of the same name.

As for why I think it was a neat conference, I suppose I have several reasons. It doesn’t hurt that one of the organizers sits in the cubicle next to mine at the office, and that many of this year’s organizers are friends through the bioinformatics program at UBC/SFU. But just as important (to me, anyhow), I was invited to be an industry panelist for the Saturday morning session and to help judge the bioinformatics poster session. Both of those were a lot of fun. (Oddly enough, another member of the industry panel was one of my committee members, and he suggested I would probably graduate in the coming year in front of a room of witnesses…)

Anyhow, back to the point, CSCBCE 2010 is now officially in the planning, and the torch has formally been passed along to the new organizers. I understand next year’s conference is going to be held in May 2010 at my alma mater, the University of Waterloo, which is a beautiful campus in the spring. (I strongly concur with their decision to host it in May instead of March, by the way. Waterloo is typically a rainy, grey and bleak place in March.) And, for those of you who have never been, Waterloo now has its own airport. I’m not sure if I’ll be going next year (especially if I’ve completed my degree by then), but if this year’s attendance was any indication of where the conference is heading, it’ll probably be worth checking out.

>Dr. Michael Hallett, McGill University – Towards a systems approach to understanding the tumour microenvironment in breast cancer

>Most of this talk is from 2-3 years ago. Breast cancer is now more deadly for women than lung cancer. Lifetime risk is 1 in 9 women. The two most significant risk factors: being a woman, and aging.

Treatment protocols include surgery, irradiation, hormonal therapy, chemotherapy, directed antibody therapy. Several clinical and molecular markers are now available to decide the treatment course. These also predict recurrence/survival well… but…

Many caveats: only 50% of Her2+ tumours respond to trastuzumab (Herceptin). There is no regimen for (Her2-, ER-, PR-) “triple negative” patients other than chemo/radiation. Many ER+ patients do not benefit from tamoxifen. 25% of lymph-node-negative patients (a less aggressive cancer) will develop micrometastatic disease and possibly recurrence (an example of under-treatment). There are many other examples of under-treatment.

Microarray data brought a whole new perspective to breast cancer treatment. It created a taxonomy of breast cancer – breast cancer is at least 5 different diseases. (Luminal Subtype A, Subtype B, ERBB2+, Basal Subtype, Normal Breast-like. Left to right: best prognosis to worst.)

[background into cellular origin of each type of cell. Classification, too.]

There are now gene expression biomarker panels for breast cancer. Most of them do very well in clinical trials. Point made that we almost never find biomarkers that are single gene. Most of the time you need to look at many many genes to figure out what’s going on. (“Good sign for bioinformatics”)

Microenvironment: samples used on arrays, as above, include the environment when run on arrays. We end up looking at an average over the tumour. (The contribution of the microenvironment is lost.) The epithelial gene expression signature “swamps out” signatures from other cell types. However, tumour cells interact successfully with their surrounding tissues.

Most therapies target epithelial cells. Genetic instability in epithelial cells leads to therapeutic resistance. Stromal cells (endothelial cells in particular) are genetically stable (i.e., non-cancerous).

Therefore, if you target the stable microenvironment cells, they won’t become resistant.

Method: invasive tumours, patient selection, laser capture microdissection, RNA isolation and amplification (two rounds) -> microarray.

BIAS: Bioinformatics Integrative Application Software (a tool they’ve built).

LCM + Linear T7 amplification leads to 3′ Bias. Nearly 48% of probes are “bad”. Very hard to pick out the quality data.

Looking at just the tumour epithelial profiles (the tumours themselves) confirmed that subtypes cluster as before. (Not new data: the breast cancer profiles we already have are basically epithelial-driven.) When you look just at the stroma (the microenvironment), you find 6 different categories, each with its own distinct traits. There is almost no agreement between endothelial and epithelial cell categorization; they are orthogonal.

Using both of these categorizations together gives even more accurate outcome predictions. The stroma is better at predicting outcome than the tumour type itself.

Found a “bad outcome cluster”, and then investigated each of the 163 genes that were differentially expressed between that cluster and the rest. This can be used to create a predictor. The subtypes are more difficult to work with, and become confounding effects. Used genes ordered by p-value from logistic regression, applied a simple naive Bayes classifier, and cross-validated using subsets. Identified 26 (of the 163) as the optimal classifier set.
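For the curious, the recipe described here (rank genes by p-value, take a subset, train a naive Bayes classifier, evaluate by cross-validation) can be sketched in a few lines. This is my own illustration on made-up data, not the speaker’s code; a Gaussian naive Bayes with leave-one-out validation stands in for whatever they actually used.

```python
import math

def gaussian_logpdf(x, mean, var):
    # log of the normal density, used for per-feature likelihoods
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def train_nb(rows, labels):
    """Gaussian naive Bayes: per-class, per-feature mean and variance."""
    model = {}
    for c in set(labels):
        class_rows = [r for r, l in zip(rows, labels) if l == c]
        stats = []
        for col in zip(*class_rows):
            m = sum(col) / len(col)
            v = sum((x - m) ** 2 for x in col) / len(col) + 1e-6
            stats.append((m, v))
        model[c] = stats
    return model

def predict_nb(model, x):
    # class with the highest summed log-likelihood wins
    return max(model, key=lambda c: sum(
        gaussian_logpdf(xi, m, v) for xi, (m, v) in zip(x, model[c])))

def loo_accuracy(rows, labels, features):
    """Leave-one-out cross-validation on a chosen feature subset."""
    sub = [[r[j] for j in features] for r in rows]
    hits = 0
    for i in range(len(sub)):
        model = train_nb(sub[:i] + sub[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict_nb(model, sub[i]) == labels[i]
    return hits / len(sub)

# Toy data: feature 0 separates outcomes cleanly; feature 1 is noise.
rows = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [2.1, 2.0], [2.2, 5.5], [2.4, 0.5]]
labels = ["good", "good", "good", "bad", "bad", "bad"]
print(loo_accuracy(rows, labels, [0]))  # the informative gene alone classifies well
```

In the talk, the feature subset would be the 26 genes selected from the 163, ranked by logistic-regression p-value.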

“If you can’t explain it to a clinician, it won’t work.”

The stroma classifier is stroma-specific: it didn’t work on epithelial cells. But it performs as well as or better than other predictors (new, valuable information that wasn’t previously available).

Cross-validation of stromal targets against other data sets: it worked on 8 datasets based on bulk tumour. That was surprising, since bulk tumour profiles are usually dominated by the epithelial signal. You can also replicate this with blood vessels from a tumour.

Returning to the biology, the genes represent: angiogenesis, hypoxic areas, immunosuppression.

[Skipping a few slides that say “on the verge of submission.”] Point: Linear Orderings are more informative than clustering! Things are not binary – it’s a real continuum with transitions between classic clusters. (Crosstalk between activated pathways?)

In a survey (2007, Breast Cancer Research 9:R61?), almost everything breast cancer clinicians would like research done on is bioinformatics-driven classification/organization, etc.:


  • define all relevant breast cancer signatures
  • analysis of signatures
  • focus on transcriptional signatures
  • improve quality of signatures
  • aims for better statistics/computation with signatures.

There are too many papers coming out with new signatures. Understanding breast cancer data in the literature involves a lot of grouping and teasing out information – and avoiding noise. Signatures are heavily dependent on tissue type, etc.

Traditional pathway analysis always needs an experiment and a control, and requires rankings. If that’s just two patients, that’s fine; if it’s a broad panel of patients, you won’t know what’s going on – you’re now in an unsupervised setting.

There are more than 8000 patients who have had array data collected. Even outcome is difficult to interpret.

Instead, using “BreSAT” to do linear ranking instead of clustering, and try to tease out signatures.

Each signature has an activity level – clinicians have always been ordering patients, so an ordering is what they want.

What is the optimal ordering that matches with the ordering….[sorry, missed that.] Many more trends show up when you do this than with hierarchical clustering (Wnt, hypoxia). You can even order by two things at once (e.g. BRCA and interferon) and see tremendously strong signals. You start to see dependencies between signatures.
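I don’t know BreSAT’s actual objective function, but the basic move – score each patient by a signature’s activity and sort, rather than cluster – could look like the naive baseline below. Every name and number here is my own invention for illustration.

```python
def signature_activity(expression, signature):
    """Naive activity score: mean expression of the signature's genes
    in one sample. (BreSAT presumably optimizes the ordering rather
    than using a fixed score like this.)"""
    values = [expression[g] for g in signature if g in expression]
    return sum(values) / len(values)

def order_patients(expr_by_patient, signature):
    """Linear ordering of patients by signature activity, low to high."""
    return sorted(expr_by_patient,
                  key=lambda p: signature_activity(expr_by_patient[p], signature))

# Toy cohort: three patients, a two-gene "hypoxia" signature.
cohort = {
    "patient_a": {"HIF1A": 2.0, "VEGFA": 1.5, "GAPDH": 1.0},
    "patient_b": {"HIF1A": 0.2, "VEGFA": 0.1, "GAPDH": 1.1},
    "patient_c": {"HIF1A": 1.0, "VEGFA": 0.9, "GAPDH": 0.9},
}
print(order_patients(cohort, ["HIF1A", "VEGFA"]))  # -> b, c, a (low to high)
```

Ordering by two signatures at once, as in the BRCA/interferon example, would just mean sorting the same cohort along two such axes and looking at the joint trend.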

Working on several major technologies (chip-chip, microarray, smallRNA) and more precise view of microenvironment.

>Anamaria Crisan and Jing Xiang, UBC – Comparison of Hidden Markov Models and Sparse Bayesian Learning for Detection of Copy Number Alterations.

>The point was to implement a C algorithm in Matlab (Pique-Regi et al., 2008). It uses sparse Bayesian learning (SBL) and backward elimination. (They used microarray data for this experiment.)

Identifying gain, loss, or neutral. (In this case they looked at specific genes rather than regions.) [Probably because they were using array data, not 2nd-gen sequencing.]

Novelty of algorithm: piece-wise constant (pwc) representation of breakpoints.

Assume a normal distribution on the weights, formulate a maximum a posteriori estimate, and apply SBL, with a hierarchical prior over the weights and hyperparameters….

[some stats in here] The last step is to optimize using the expectation maximization (EM) algorithm.

Done in Matlab “because you can do fancy tricks with the code”, and it’s easily readable. It’s fast, and diagonals of matrices can be calculated quickly and easily.

Seems to take 30 seconds per chromosome.

Have to filter out noise, which may indicate false breakpoints. Hence the backward elimination algorithm – it measures the significance of each copy number variation found and removes insignificant points. [AH! This algorithm is very similar to sub-peak optimization in FindPeaks… basically you drop points until you’ve removed everything below threshold.]
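As I understood it, the backward elimination step looks roughly like the sketch below: score every candidate breakpoint, repeatedly drop the weakest one, and stop when every survivor clears a significance threshold. The scoring function (a t-statistic-like mean-shift score) is my own guess, not necessarily what Pique-Regi et al. use.

```python
def backward_eliminate(signal, breakpoints, threshold):
    """Iteratively drop the least-significant breakpoint until all
    remaining breakpoints exceed the significance threshold."""
    bps = sorted(breakpoints)
    while bps:
        scores = []
        for k, bp in enumerate(bps):
            # segments on either side of this breakpoint, bounded by neighbours
            left_start = bps[k - 1] if k > 0 else 0
            right_end = bps[k + 1] if k + 1 < len(bps) else len(signal)
            left = signal[left_start:bp]
            right = signal[bp:right_end]
            mean_l = sum(left) / len(left)
            mean_r = sum(right) / len(right)
            # mean shift scaled by effective segment size (t-like score)
            n = len(left) * len(right) / (len(left) + len(right))
            scores.append(abs(mean_l - mean_r) * n ** 0.5)
        worst = min(range(len(bps)), key=lambda k: scores[k])
        if scores[worst] >= threshold:
            break  # every surviving breakpoint is significant
        bps.pop(worst)
    return bps

signal = [0.0] * 20 + [3.0] * 20          # one true copy-number step at probe 20
print(backward_eliminate(signal, [5, 20, 30], threshold=2.0))  # -> [20]
```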

It’s slower, but more readable than C.

Compared against CNAHMMer by Sohrab Shah (2006), an HMM with a Gaussian mixture model to assign CNA type (L, G, N). On the same data set, results were not comparable.

SBL was not much faster than CNAHMMer. (The code did not always follow vectorized style, however, so some improvements are possible.)

Now planning to move this to Next-Gen sequencing.

Heh.. they were working from template code with Spanish comments! Yikes!

[My comments: this is pretty cool! What else do I need to say. Spanish comments sound evil, though… geez. Ok, so I should say that all their slagging on C probably isn’t that warranted…. but hey, to each their own. ]

>Aria Shahingohar, UWO – Parameter estimation of Bergman’s minimal model of insulin sensitivity using Genetic Algorithm.

>Abnormal insulin production can lead to serious problems. Goal is to enhance the estimation of insulin sensitivity. Glucose is injected into blood at time zero, insulin is injected shortly after. Bergman has a model that describes the curves produced in this experiment.

Equations given for:
Change in plasma glucose over time = ……
Rate of insulin removal….

There are 8 parameters in this model which vary from person to person. The model is a closed-loop system and requires partitioning of the subsystems [?], as well as a good signal-to-noise ratio.
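The equations themselves went by too fast to copy down, but for reference, the textbook form of Bergman’s minimal model (glucose kinetics, remote insulin action, and insulin kinetics) is usually written as follows – the speaker’s exact formulation and parameterization may well differ:

```latex
\begin{aligned}
\frac{dG}{dt} &= -\left(p_1 + X(t)\right) G(t) + p_1 G_b, \\
\frac{dX}{dt} &= -p_2\, X(t) + p_3 \left(I(t) - I_b\right), \\
\frac{dI}{dt} &= -n \left(I(t) - I_b\right) + \gamma\, t \left[G(t) - h\right]^{+},
\end{aligned}
```

where \(G\) is plasma glucose, \(I\) plasma insulin, \(X\) the remote insulin action, and \(G_b\), \(I_b\) the basal levels.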

Use a genetic algorithm to optimize the 8 parameters.

Tested different methods: Genetic algorithms and Simplex method. Also tested various methods of optimization using subsets of information.

Used a maximum of 1000 generations in the genetic algorithm. Population size 20-40, depending on the experiment. Each method was tested 50 times (it’s stochastic) to measure the error for each parameter separately.

Results: the GA was always better, and partitioning the subsystem works better than trying to estimate all parameters at once.

Conclusion: Genetic algorithm significantly lowers error, and parameters can be estimated with only glucose and insulin measurements.
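As a sketch of the approach (not the presenter’s code, and with a trivial stand-in loss function instead of the Bergman-model fit), a minimal real-valued genetic algorithm might look like this:

```python
import random

def genetic_minimize(loss, bounds, pop_size=30, generations=200,
                     mutation_rate=0.2, seed=1):
    """Minimal real-valued GA: keep the better half as elites, breed
    children by uniform crossover, mutate with clipped Gaussian noise."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=loss)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # uniform crossover: each gene comes from one parent at random
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[j] = min(hi, max(lo, child[j] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        pop = elite + children
    return min(pop, key=loss)

# Stand-in loss with a known optimum at (1.5, 0.5); the real loss would
# compare simulated Bergman-model curves against measured glucose/insulin.
loss = lambda p: (p[0] - 1.5) ** 2 + (p[1] - 0.5) ** 2
best = genetic_minimize(loss, bounds=[(0.0, 3.0), (0.0, 1.0)])
print(best)
```

For the actual experiment, the individual would be the 8 model parameters and the loss would be the error between simulated and measured curves.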

[My Comments: This was an interesting project which clearly has real world impacts, although much of it wasn’t particularly well explained, leaving the audience to pick out the meaning. Very nice presentation, and cool concept. It would be nice to see more information on other algorithms…. ]

An audience member asked about saturation – another interesting topic that wasn’t covered.

>Harmonie Eleveld and Emilie Lalonde, Queen’s University – A computational approach for the discovery of Thi1 and Thi5 regulated (thiamine repressible) genes

>[Interesting – two presenters! This is their undergraduate project]

A bioinformatics search for genes activated by thiamine, using transcription factor binding motifs. [Some biological background.] Thi1 and Thi5 binding sites are being detected.

Thiamine uptake causes repression of Thi1 and Thi5.

Used upstream sequences from genes of interest. Used motif detection tools to generate a dataset of potential sites.

Looking at zinc-finger TFs, so bipartite, palindromic sites. Used BioProspector, from Stanford; it best did what they wanted.

Implemented a pattern recognition network (feed-forward), using training sets from BioProspector plus negative (random) controls. Ran lots of gene sets, many trials, and tested many different parameters.

Used 3 different gene sets: (nmt1 and nmt2 gene sets from different species), (a gene set from S. pombe only, 6 genes), (all gene sets, all species).

Preliminary results: used a motif length of 21, trained on S. pombe and S. japonicus, tested on S. octosporus.
Results seem very good for a first attempt. Evaluation with a “confusion matrix” also looks very good (accuracy appears to be in the range of 86-95%).
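The confusion-matrix evaluation mentioned above reduces to four counts. A minimal helper (my own illustration, with hypothetical labels) for a binary site/non-site classification:

```python
def confusion_matrix(y_true, y_pred):
    """Counts for a binary classifier: (TP, FP, FN, TN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

# 1 = binding site, 0 = background (made-up predictions for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(accuracy(tp, fp, fn, tn))  # 0.8 for this toy example
```

The 86-95% accuracy quoted in the talk would be this same calculation over their held-out S. octosporus sequences.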

Final testing with the neural network: Significant findings will be verified biologically, and knockout strains may be tested with microarrays.

>Denny Chen Dai, SFU – An Incremental Redundancy Estimation for the Sequence Neighbourhood Boundary

>Background: RNA primary and secondary structure. Working on the RNA design problem (Inverse RNA folding.) [Ah, the memories…]

Divide into sequence space and structure space. Structure space is smaller than sequence space. (Many to one relationship.)

Biology application: how does sequence mutation change the structure space?

Neighbourhood Ball : Sequences that are closely related, but fold differently. As you get closer to the edge of the ball, you find… [something?]


  • Sample n sequences with unique mapping structure.
  • For each sample: search for neutral sequences within the inner layers – redundancy hit?
  • Compute the redundancy rate p.
  • Redundancy rate distribution over Hamming layers: p will approach 1 (all structures are redundant).

The question is at what point do you saturate? Where do you find this boundary? Somewhere around 50% of sequence space. [I think??]


  • An efficient boundary estimation – confirmed the existence of the neighbourhood ball.
  • The ball radius is much smaller than the sequence length.

Where is this useful?

  • Reduce computational effort for RNA design.
  • For naturally occurring RNA molecules, a faster redundancy growth rate suggests mutational robustness.
[My Comment: I really don’t see where this is coming from.  Seems to be kind of silly, doesn’t reference any of the other work in the field that I’m aware of.  (Some of the audience questions seem to agree.)  Overall, I just don’t see what he’s trying to do – I’m not even sure I agree with his results.  I’ll have to check out his poster later to see if I can make sense of it.  Sorry for the poor notes.  ]

>Connor Douglas, UBC – How User Configuration in Bioinformatics Can Facilitate “Translational Science” – A Social Science Perspective

>Background is in sociology of science – currently based in centre for applied ethics.

What is civic translational science? Why is it important?

Studying pathogenomics of innate immunity in a large project, including Hancock lab, Brinkman lab, etc. GE(3)LS: Genomics, Ethics, Economics, Environment, Legal and Social issues. What are the ramifications of the knowledge? Trying to hold a mirror up to scientific practices.

Basically, studying bioinformaticians from a social science perspective!

[talking a lot about what he won’t talk a lot about…. (-: ]

“Pathogenomics of Innate Immunity” (PI2). This project was required to have a GE(3)LS component, and that is what his research is.

What role does user configuration play in fostering civic translational science? What is it?

It is “iterative movements between the bench to markets to bedside”. Moving knowledge out from a lab into the wider research community.

Studying the development of the “InnateDB” tool. It’s an open-access, open-source database & suite of tools – not just for in-house use.

Looking at what forces help move tools out into the wider community:

  • Increased “Verstehen” within the research team. (Taking into account the needs of the wider community – understanding what the user wants.)
  • limited release strategies – the more dissemination the better
  • peer-review publication process: review not just the argument but the tool as well.
  • A continued blurring of divisions between producers and users.

And out of time….

>Medical Imaging and Computer-Assisted Interventions – Dr Terry Peters, Robarts Institute, London Ontario


This talk was given as the keynote at the 2009 CSCBC (the Fourth Canadian Student Conference on Biomedical Computing).

In the beginning, there were X-rays. They were the mainstay of medical imaging until the 70s; although ultrasound started in the 50s, it didn’t take off for a while. MRI appeared in the 80s; tomography in 1973.

Of course, all of this required computers. [A bit of the history of computing.]

Computed Tomography. The fundamentals go back to 1917 and the Radon transform, the mathematical underpinning of CT.

Ronald Bracewell made contributions in 1956: as a radio astronomer, he used this approach to reconstruct radio sources. He recognized the Fourier-transform relationship between signals and reconstruction, and developed math very similar to what’s now used for CT reconstruction – while working on a calculator (3 instructions/min)!

Sir Godfrey Hounsfield won the Nobel prize in 1979. He was an engineer for EMI (the record label!). Surprisingly, it was the profits from the Beatles’ albums that funded this research.

Dr Peters himself began working on CT in the late 1960s. “Figure out a way of measuring bone density in the forearm using ultrasound….” (in the lab of Richard Bates, 1929-1990). That approach was a total disaster, so he turned to X-rays. Everything in Dr. Bates’ lab started with Fourier transforms, so his research interests gave him a natural connection to Bracewell at Stanford, and the same math Bracewell was working on made the jump to CT.

The first “object” they imaged was sheep bones – in New Zealand, what else?

The first reconstruction required 20 radiographs, a densitometer scan, a manual digitization, and 20 minutes on an IBM 360. “Pretty pictures, but they will never replace radiographs” – NZ radiologist, 1972.

In the following months, Hounsfield reported on the invention of the EMI scanner – scooping Dr. Peters’ PhD project. However, there were still lots of things to work on. “If you find you’re scooped, don’t give up; there are plenty of problems to be solved…” “Don’t be afraid to look outside your field.”

How does CT work? The central slice theorem: take an X-ray projection and Fourier transform it, so that instead of inverting a matrix, you can do the whole thing in Fourier space.

Filtered Back Projection: FT -> | rho | -> Inv FT.
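That “FT -> |ρ| -> Inv FT” step is the ramp filter at the heart of filtered back projection. Here’s a toy, O(n²) pure-Python sketch of filtering a single projection (real implementations use FFTs and apodized filters; this is just to show the shape of the operation):

```python
import cmath

def dft(x):
    # discrete Fourier transform, written out directly (slow but clear)
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n)
                for t in range(n)) for f in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[f] * cmath.exp(2j * cmath.pi * f * t / n)
                for f in range(n)).real / n for t in range(n)]

def ramp_filter(projection):
    """Weight each Fourier coefficient by |rho|, the magnitude of its
    signed frequency, then transform back."""
    n = len(projection)
    X = dft(projection)
    filtered = [X[f] * (f if f <= n // 2 else n - f) for f in range(n)]
    return idft(filtered)

# The ramp filter zeroes the DC term: a flat projection filters to (near) zero.
print(ramp_filter([1.0] * 8))
```

Back projection then smears each filtered projection across the image plane and sums over angles to recover the slice.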

This all led to the clinical acceptance of CT. He shows us the first CT image ever. His radiology colleagues were less than enthusiastic. However, Dr. James Ambrose in London saw the benefits of the EMI scanner. Of course, EMI thought there would only ever be a need for 6 CT machines.

First CT was just for the head. It took about 80 seconds of scanning, and about the same to recreate the image.

His first job was to “build a CT scanner”, with a budget of $20,000, in 1975-78.

1974 vs. 2009:

  • image matrix: 80×80 -> 1024×1024
  • pixel size: 3 mm -> less than 0.5 mm
  • slice thickness: 13 mm -> less than 1 mm

What good is CT scanning? It measures density: great for bones, good for high contrast, not so good in the brain (poor contrast between white and grey matter). High spatial resolution, but the tradeoff is the high cost of the radiation dose to the patient.
Used for image guidance, for modeling, and for pre-operative patients. Not used during surgery, however.

CT angiography is one example of the power of the technique. You can use contrast dyes, collect images to observe many details, and reconstruct vessels. You can even look for occlusions in the blood vessels of the heart.

Where is this going? Now working on robotically assisted CABG. Stereo visualization systems.

Currently working to optimize the robot tools + CT combination, to address improper thoracic port placement, and to optimize patient selection.

Pre-operative imaging can be used to measure distances and optimize locations of cuts. This allows the doctor to work without opening the rib cage. They can now use a laser to locate and identify where the cuts should be made, in a computer controlled manner.


MRI has its roots in physics and chemistry labs. NMR imaging was built on mathematical foundations similar to CT’s. Some “nifty tricks” can be used to make images from it. The “N” was dropped because “nuclear” wasn’t politically correct.

In 1975, Paul Lauterbur presented “zeugmatography”. Magnets, water, tubes… confusing everyone! It seemed very far removed from CT scanning, and most people thought he was WAY out there. He ended up sharing a Nobel Prize.

Sir Peter Mansfield in 1980 produced an MRI of a human using this method, although it didn’t look much better than the first CT.

[Explanation of how NMR works – and how Fourier transforms and gradients are applied.]

MRI combines more scientific disciplines than anything else he can think of.

We are now at 35 years of MRI. It was originally said that MRI would never catch on; we now generate high-resolution 7 Tesla images. [Very impressive pictures]

Discussion of quenching the magnets… yes, boiling off the liquid helium is bad. Shows an image of how a modern MRI works.

What good is MRI? The best signals come from water (protons), looking at T1 and T2 relaxation times. Good soft-tissue contrast – including between white and grey matter in the brain. High spatial resolution, high temporal resolution, no radiation dose; great for image guidance.
(As far as we can tell, the human body does not react negatively to the magnetic fields we generate.)

Can also be used for intra-operative techniques; however, everything used must be non-magnetic. Several neat MRI scanners exist for this purpose, including robots that can do MRI using just the “fringe fields” of a nearby MRI machine.

Can be used for:

  • MRA – Angiography (vascular system)
  • MRS – Spectroscopy (images of brain and muscle metabolism)
  • fMRI – Functional magnetic resonance imaging (images of brain function)
  • PW MRI – Perfusion-weighted imaging (blood flow in ischemia and stroke)
  • DW MRI – Diffusion-weighted imaging (water flow along nerve pathways – images of nerve bundles)

fMRI: looks at regions that demand more oxygen. It can differentiate 1% changes, and can correlate signal intensity with some task (recognition, or functional). Can be used to avoid critical areas during surgery.

Diffusion tensor imaging looks at the diffusion of water, resulting in the technique of “tractography”, which can identify the nerve pathways so that they can be avoided during surgery.

There are applications for helping to avoid the symptoms of Parkinson’s. Hundreds of patients were mapped to find the best location, and this information can now be used to tell surgeons exactly where to place the electrodes in new patients.

[Showing an image in which they use X windows for their computer imaging – go *nix.]

Two minutes of ultrasound: [How it works.] Essentially sonar: transmit, then reconstruct “reflections along the line of sight.” Each ultrasound machine now uses various focal lengths, several transducers, etc. – all done electronically.

The beam has an interesting shape – not conical, as I had always thought.

The original ultrasound used an oscilloscope with long persistence, and they’d use a Polaroid camera to take pictures of it. The ultrasound head used joints to track where it was, in order to graph points on the oscilloscope. (Long before computers were available.)

Advantages: images interfaces between tissues; inexpensive; portable; realtime 2D/3D; does not pass through air or bone. Can measure changes in reflected frequency, and thus blood flow direction and speed. Can be used for image guidance – much more useful when combined with MRI, etc.
Disadvantage: difficult to interpret.

In the last year, 3D dynamic ultrasound has become available. You can put in a probe and watch the heart valves.

For intra-cardiac intervention: create a model from pre-op imaging, register the model to the patient, use trans-esophageal ultrasound for real-time image guidance, introduce instruments through the chest/heart wall, magnetically track ultrasound and instruments, and display in a VR environment.

[Very cool demonstrations of the technology.] [Now showing another VR environment using windows XP. Bleh.]

Other modalities: PET (positron emission tomography), SPECT.

One important tool, now, is the fusion of several of these techniques: MRI-PET, CT-MRI, US-MRI.

Conclusion: CT and MRI provide high-resolution 3D/4D data, but can’t be used well in the operating room. US is inexpensive 2D/3D imaging, but it’s really hard to get context.

Future: image-guided procedures, deformable models with US synchronization. Challenges: tracking intra-op imaging devices, real-time registration, and deformation of pre-op models to intra-op anatomy.