Single-cell and computational biology

Cell identity and function can be characterised at the molecular level by unique transcriptomic signatures. At the organismal level, different tissues possess distinct gene expression profiles, and individual cells in early-stage embryos display highly divergent transcriptomic landscapes. Consequently, mutations that alter these expression profiles have been associated with adverse phenotypes ranging from delayed immune responses to diseases such as cancer.

Our Research

Until recently, molecular fingerprints were generated by profiling gene expression levels from bulk populations of millions of input cells. With these ensemble-based approaches, the expression value for each gene was an average of its expression across the population of input cells. However, there are many biological questions for which bulk measures of gene expression are insufficient. For instance, during early development there are only a small number of cells, each of which can have a distinct function and role. Moreover, many cancers are polyclonal, with each clone having a distinct expression profile; these clones are typically difficult to dissect experimentally. Finally, ensemble measures do not provide insights into the stochastic nature of gene expression. In these and other settings, assaying gene expression at the single-cell level represents a powerful tool for biological discovery.
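As a toy illustration of why an ensemble average can hide heterogeneity, the sketch below compares two hypothetical cell populations. All expression values, array names and the helper `bulk_average` are invented for illustration and do not come from any real dataset.

```python
import numpy as np


def bulk_average(cells):
    """Mean expression per gene across cells, i.e. what an ensemble assay reports."""
    return np.asarray(cells, dtype=float).mean(axis=0)


# Hypothetical expression values (rows = cells, columns = two genes).
# Population A: every cell expresses both genes at the same moderate level.
homogeneous = np.array([[5.0, 5.0]] * 4)
# Population B: two subpopulations, each expressing only one of the two genes.
mixed = np.array([[10.0, 0.0], [10.0, 0.0], [0.0, 10.0], [0.0, 10.0]])

bulk_a = bulk_average(homogeneous)
bulk_b = bulk_average(mixed)

# Per-gene variance across cells, which is only available from single-cell data.
var_a = homogeneous.var(axis=0)
var_b = mixed.var(axis=0)

print("bulk profiles identical:", np.allclose(bulk_a, bulk_b))
print("cell-to-cell variance A:", var_a)
print("cell-to-cell variance B:", var_b)
```

The two populations give the same bulk profile, yet only the single-cell measurements reveal that the second one consists of two distinct subpopulations.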

Critically, recent experimental advances, in particular single-cell RNA-sequencing (scRNA-seq), have greatly improved the high-throughput generation of cDNA libraries from the poly-adenylated fraction of mRNA molecules within a single cell. scRNA-seq can be applied to assay the individual transcriptomes of large numbers of cells isolated via microfluidic or microwell-plate-based techniques. The combination of a large number of cells and high-throughput profiling of gene expression (and other omics measurements) at the single-cell level is crucial for answering many biologically relevant questions and provides an opportunity for new discoveries in important areas of biology (Figure 1).

To this end, in 2013 we co-established, and currently co-ordinate, the Sanger-EBI Single-Cell Genomics Centre (SCGC), which is a world leader in developing technological and computational approaches for single-cell biology. Since joining the Cancer Research UK Cambridge Institute (CI), we have begun working with colleagues to establish the experimental infrastructure necessary to apply these methods within the CI.

Within the context of single-cell biology, my group has focused particularly on developing computational approaches for analysing scRNA-seq data. For example, we have shown how external spike-in molecules can be used to model and account for technical noise (Brennecke et al., Nat Methods, 2013; Vallejos et al., PLoS Comput Biol, 2015) and proposed a strategy for removing confounding structure due to the cell cycle (Figure 2) (Buettner et al., Nat Biotechnol, 2015; Scialdone et al., Methods, 2015). Both are crucial challenges in single-cell transcriptomics (Stegle, Teichmann & Marioni, Nat Rev Genet, 2015). In addition, we have developed approaches for modelling single-cell gene expression within a spatial context (Pettit et al., PLoS Comput Biol, 2014; Achim et al., Nat Biotechnol, 2015). Furthermore, we have applied these computational tools, in conjunction with outstanding experimental collaborators, to model several interesting biological systems, in particular relating to cell fate decisions during early development. For example, we have helped establish how biased heterogeneity in gene expression levels contributes to the first cell-fate decisions in pre-implantation embryos (Goolam et al., Cell, in press), and we are one of the key co-ordinators of a Wellcome Trust Strategic Award that is studying cell fate decisions during gastrulation.
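The spike-in idea mentioned above can be sketched in a few lines. Because spike-in molecules are added in the same amount to every cell, their cell-to-cell variability reflects technical rather than biological noise. The sketch assumes, purely for illustration, that the technical squared coefficient of variation follows CV^2 = a1 / mean + a0 and fits it by ordinary least squares; this is a simplification and not the procedure of the cited papers. The function names and all numbers are hypothetical.

```python
import numpy as np


def fit_technical_noise(spike_means, spike_cv2):
    """Least-squares fit of CV^2 = a1 / mean + a0 to spike-in measurements."""
    mu = np.asarray(spike_means, dtype=float)
    cv2 = np.asarray(spike_cv2, dtype=float)
    # Design matrix with columns [1 / mean, 1], so the coefficients are (a1, a0).
    design = np.column_stack([1.0 / mu, np.ones_like(mu)])
    (a1, a0), *_ = np.linalg.lstsq(design, cv2, rcond=None)
    return a1, a0


def excess_cv2(gene_mean, gene_cv2, a1, a0):
    """Observed CV^2 of a gene minus the technical CV^2 expected at its mean."""
    return gene_cv2 - (a1 / gene_mean + a0)


# Synthetic, noise-free spike-in data generated from a1 = 0.5 and a0 = 0.1.
spike_means = np.array([1.0, 10.0, 100.0, 1000.0])
spike_cv2 = 0.5 / spike_means + 0.1

a1, a0 = fit_technical_noise(spike_means, spike_cv2)

# A hypothetical endogenous gene with mean 10 and observed CV^2 of 0.65.
excess = excess_cv2(10.0, 0.65, a1, a0)
print(f"a1={a1:.3f}, a0={a0:.3f}, excess CV^2={excess:.3f}")
```

A gene whose observed variability clearly exceeds the fitted technical trend is a candidate for genuine biological heterogeneity; a real analysis would also need a statistical test rather than a simple difference.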

Additionally, we have made significant contributions to understanding the regulation of gene expression using bulk-based genomics data. Through collaborations with the lab of Duncan Odom, we have exploited intercrosses between closely related strains of inbred mice to study the relative contribution of cis and trans regulation of gene expression (Goncalves et al., Genome Res, 2012; Stefflova et al., Cell, 2013). This work has recently been extended to explore how differences in transcription factor binding are related to the mechanisms by which each gene’s expression is regulated. We have also studied the relationship between tRNA gene expression and anticodon abundance throughout mouse development, concluding that codon bias does not significantly contribute to the regulation of gene expression in mammals (Schmitt et al., Genome Res, 2014).
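The logic behind using intercrosses to separate cis from trans effects can be illustrated with a minimal sketch. It assumes the commonly used framework in which the two alleles in an F1 hybrid share the same trans environment, so the allelic ratio in the F1 reflects cis differences, while the remainder of the difference between the parental strains is attributed to trans. The function name and the expression values are hypothetical, and this is not necessarily the statistical model used in the cited studies.

```python
import math


def cis_trans_log2(parent_a, parent_b, f1_allele_a, f1_allele_b):
    """Split the parental log2 expression ratio into cis and trans components.

    parent_a, parent_b: expression of the gene in each parental strain.
    f1_allele_a, f1_allele_b: allele-specific expression in the F1 hybrid.
    """
    total = math.log2(parent_a / parent_b)      # overall divergence between strains
    cis = math.log2(f1_allele_a / f1_allele_b)  # persists within a shared trans environment
    trans = total - cis                         # remainder attributed to trans factors
    return cis, trans


# Hypothetical gene 1: the parental difference is fully preserved between F1 alleles.
cis_1, trans_1 = cis_trans_log2(400.0, 100.0, 200.0, 50.0)
# Hypothetical gene 2: only half of the log2 difference persists in the F1.
cis_2, trans_2 = cis_trans_log2(400.0, 100.0, 200.0, 100.0)

print(f"gene 1: cis={cis_1:.1f}, trans={trans_1:.1f}")
print(f"gene 2: cis={cis_2:.1f}, trans={trans_2:.1f}")
```

In this toy example the first gene would be interpreted as regulated in cis only, whereas the second shows equal cis and trans contributions.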