Cancer Research UK Cambridge Institute




Statistics and computational biology

Our work has continued to focus on three main areas: statistical methods for the analysis of next-generation sequencing data, evolutionary approaches to cancer, and methods for the analysis of genomics data.

We are continuing our collaboration with the International Cancer Genome Consortium (ICGC) projects on oesophageal adenocarcinoma, led by Professor Rebecca Fitzgerald from the MRC Cancer Unit, and prostate cancer, co-led by Professor David Neal (CRUK CI). These projects, funded by Cancer Research UK, are sequencing many tumour-normal samples, and should provide interesting and medically relevant information about the aberrations that occur in the genomes of these cancers. In both projects we are investigating the integration of multiple data types, the heterogeneity of tumours and their evolutionary history, and we are developing analysis methods to support these aims. A dedicated analysis group focusing on the ICGC projects is now in place. The Bioinformatics core facility, led by Dr Matt Eldridge in the CRUK CI, and members of the Wellcome Trust Sanger Institute collaborate with us in this work.

We have continued our collaboration with the Winton laboratory on characterising the stochastic processes that govern how cells carrying a mutation can gain a clonal advantage in intestinal tumours. Biases in these processes were elucidated in an article published this year in Science (Figure 1).

Figure 1. Stem cell dynamics in tumour initiation illustrated by quantifying the clonal benefit of KrasG12D (from Vermeulen et al., Science 2013; 342: 995). (A) Intestinal stem cells are equipotent and continuously replace each other in a stochastic fashion. (B) Confocal images of SI crypt bottoms of AhCreER/tdTom–/fl mice (WT) and AhCreER/tdTom–/fl/Kras-G12Dfl (KrasG12D) at the indicated time points after clone induction. Clone sizes are indicated as fractions (in eighths) of the crypt circumference. Blue, nuclear stain (DAPI); red, tdTom expression; scale bars represent 30 μm. (C and D) The distribution of clone sizes, and the changes in that distribution caused by the activation of Kras, can be captured using stochastic models. The models describe the competition between the stem cells and summarise the fitness of the mutant stem cell via P_R, the probability that the mutant will replace its neighbour. A value of P_R = 0.5 means that the mutant stem cell is as fit as the WT cells, and a value of 1 means that the mutant cell systematically outcompetes the WT cells. Bayesian inference allows us to formally match the model to the clone size distribution data and infer the fitness of the mutant cells. When applied to WT data (C), the model predicts a fitness of 0.5, whereas for KrasG12D (D) there is a fitness advantage. The fact that the value is lower than 1 indicates that there is still ongoing competition.
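The role of the replacement probability described in the caption can be illustrated with a small simulation. The sketch below is a deliberately simplified, discrete-step version of neutral and biased stem cell competition, not the published model: it assumes a fixed number of stem cells per crypt (`n_stem`, an illustrative value), tracks only the size of a single mutant clone, and at each step lets the clone grow by one cell with probability `p_r` or shrink by one cell otherwise. All function and parameter names are hypothetical. The closed-form expression is the standard gambler's-ruin result for a biased random walk between two absorbing boundaries.

```python
import random


def fixation_probability(p_r: float, n_stem: int) -> float:
    """Closed-form chance that one mutant cell takes over all n_stem positions.

    Standard gambler's-ruin result for a random walk that steps up with
    probability p_r and down with probability 1 - p_r.
    """
    if not 0.0 < p_r <= 1.0:
        raise ValueError("p_r must lie in (0, 1]")
    if abs(p_r - 0.5) < 1e-12:
        return 1.0 / n_stem  # neutral competition
    r = (1.0 - p_r) / p_r
    return (1.0 - r) / (1.0 - r ** n_stem)


def simulate_clone(p_r: float, n_stem: int, rng: random.Random) -> bool:
    """Follow one mutant clone until it is lost (size 0) or fixed (size n_stem)."""
    size = 1
    while 0 < size < n_stem:
        size += 1 if rng.random() < p_r else -1
    return size == n_stem


def estimate_fixation(p_r: float, n_stem: int,
                      n_runs: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fixation probability."""
    rng = random.Random(seed)
    wins = sum(simulate_clone(p_r, n_stem, rng) for _ in range(n_runs))
    return wins / n_runs


if __name__ == "__main__":
    for p_r in (0.5, 0.7, 1.0):
        print(p_r, fixation_probability(p_r, 8), estimate_fixation(p_r, 8))
```

With `p_r = 0.5` the mutant clone fixes with probability `1 / n_stem`, as expected for equipotent cells, and the probability rises towards 1 as `p_r` approaches 1. The real analysis works with full clone size distributions over time rather than fixation alone, so this is only a toy view of how a biased replacement probability translates into a clonal advantage.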

Illumina technologies (both sequencing and BeadArray) are essential tools in cancer studies. In collaboration with Mark Dunning (Bioinformatics core) and Matt Ritchie (WEHI, Australia), we continue to update and support the beadarray Bioconductor package to facilitate transparent and flexible statistical analyses of full bead-level data. We have developed over 20 software packages, and our group is committed to providing open source computational tools for the analysis of sequencing data. We ran the European Bioconductor Developers’ Meeting in December. We have a number of other ongoing collaborations within the CRUK CI, in particular with the Narita and Rosenfeld labs. We continue to study intra-tumour heterogeneity in glioblastoma with Dr Colin Watts’ lab in Clinical Neurosciences. This has led, inter alia, to the development of novel statistical methods for the analysis of transposable data, particularly in the “p >> n” setting typified by expression data sampled from multiple sites in the tumour.

We have continued our research in the area of evolutionary methods in cancer biology, focussing in particular on: (i) spatial stochastic models for the evolution of tumours, which allow us to study cancer stem cells by comparing the dynamics of particular molecular markers; (ii) approximate Bayesian computation (ABC) for inference, particularly in settings where observations from the underlying model cannot be simulated sufficiently quickly; and (iii) methods for estimating the complexity of sequencing libraries. We have also worked on the evolution of post-stationary phase mutants in E. coli.
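For readers unfamiliar with ABC, the basic rejection form of the idea can be sketched in a few lines. The example below is a generic illustration, not one of the group's methods, and it does not address the slow-simulation setting mentioned above. It assumes a toy model (a number of successes out of a fixed number of trials), a uniform prior on the success probability, and an absolute-difference distance with a fixed tolerance. All names and default values are hypothetical.

```python
import random


def simulate_successes(p: float, n_trials: int, rng: random.Random) -> int:
    """Toy simulator: number of successes in n_trials with success probability p."""
    return sum(rng.random() < p for _ in range(n_trials))


def abc_rejection(observed: int, n_trials: int, n_draws: int = 20000,
                  tolerance: int = 2, seed: int = 1) -> list:
    """Basic ABC rejection sampler.

    Draw p from a uniform prior, simulate data under p, and keep p only if
    the simulated count is within `tolerance` of the observed count.
    The kept values approximate the posterior distribution of p.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from the uniform prior on [0, 1)
        simulated = simulate_successes(p, n_trials, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)
    return accepted


if __name__ == "__main__":
    posterior = abc_rejection(observed=70, n_trials=100)
    mean = sum(posterior) / len(posterior)
    print(f"accepted {len(posterior)} draws, posterior mean about {mean:.3f}")
```

Every proposal requires a fresh simulation from the model, and most proposals are rejected, which is why the cost of simulation matters so much for ABC in practice.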

The lab has several new recruits this year. Achilleos Achilleas, Lawrence Bower and Haleh Yasrebi joined the Oesophageal ICGC bioinformatics team. Henry Farmery completed the MPhil in Computational Biology and joined the lab as a PhD student. Alexey Larionov joined the lab as a joint postdoc, supported by an ERC grant to Dr Marc Tischkowitz in the Department of Medical Genetics. Dr Von Bing Yap from the National University of Singapore visited the lab, and Dr Alexandra Jauhiainen made another visit from the Karolinska Institute in Stockholm.

Charlotte Anderson, one of the ICGC bioinformaticians, returned to Australia. Mike Smith completed his PhD and continued as a postdoc with support from the EU FP7, and Jonathan Cairns submitted his PhD in September. Nick Shannon became a medical student at the National University of Singapore.