Principal Scientist, Computational Biology

Research Group Home Page: 

For the past two decades, I have worked on challenging computational problems in artificial intelligence and computational biology, particularly in cancer research. My research interests are at the intersection of cancer, functional genomics, and artificial intelligence, facilitated by the development of freely available software tools.

As the leader of a group of bioinformatics analysts at Cancer Research UK’s Cambridge Institute (CRUK-CI), at the University of Cambridge, I have been involved in hundreds of research projects covering a range of cancers and experimental modalities. I am focused on collaborations with biologists studying genome-wide transcriptional regulation – and dysregulation – in a variety of cancers. My goal is to understand how changes in transcriptional regulation impact gene expression, enabling the characterization of specific cancer subtypes in order to predict prognosis and identify drug targets and potential therapeutic agents.

I am interested in the integrated analysis of a wide range of genomics data. While some genomic features, such as changes in copy number, can be predictive of transcription, we are far from being able to accurately predict transcript expression levels from examining genomes in isolation. This is especially true in cancer, where different gene expression patterns have been shown to be strongly associated with prognosis and therapeutic response. As a result, I am drawn to experiments that more directly measure active transcription in different cell types, using both bulk and single-cell RNA-sequencing, and to integrating these with data from experiments targeting the factors that control transcription levels (notably DNA enrichment assays such as ChIP-seq). Central to this is the analysis of epigenomic data, including various aspects of chromatin state, such as open chromatin (ATAC-seq), histone modifications (methylation and acetylation), and chromatin conformation (Hi-C), as well as certain features of DNA (copy number alterations and methylation/hydroxymethylation of cytosines).

The great challenge is to integrate these data modalities in order to model complex regulatory dynamics. I have obtained good results from integrative analysis in my collaborations with research groups studying a variety of cancers. The emphasis of my research moving forward is on developing new approaches to the integrative analysis of cancer functional genomics data as the quantity, quality, and diversity of regulatory data expand.

More recently, I have been applying machine learning methods to my studies of functional dysregulation in cancer. It is only now that cancer datasets are beginning to reach the scale required to realize the full power of deep learning techniques.

I am interested in establishing new collaborations with biologists exploring transcriptional regulation in cancer while continuing to develop new methods and tools for analysing complex, heterogeneous functional genomics data. I envision the large-scale deployment of unsupervised deep-learning techniques on repositories of transcriptomic and epigenomic data to discover novel biological features useful for categorizing cancer subtypes, predicting prognosis, identifying drug targets, and optimizing the search for therapeutic agents.

Work address: 
Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE