Single-cell and computational biology

The goal of the Marioni group is to use computational models to understand the molecular mechanisms that underlie cell fate decisions. Since cell fate decisions are made at the single-cell level, they can only be properly understood by profiling molecular features at single-cell resolution. The Marioni group has been a driver of the single-cell genomics revolution, pioneering the development of robust and widely used statistical approaches for all facets of data analysis, from normalization through to data integration and interpretation. Moreover, we have demonstrated how single-cell genomics can be used to create molecular atlases and, critically, how these can facilitate novel and unexpected discoveries about how cells function in a variety of contexts including early development, ageing, immunology and cancer.

Previously, molecular fingerprints were generated by profiling gene expression levels from bulk populations of millions of input cells. These ensemble-based approaches meant that the expression value for each gene was an average of its expression across a population of input cells. However, there are many biological questions where bulk measures of gene expression are insufficient. During early development, for example, there are only a small number of cells, each of which can have a distinct function and role.
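The averaging effect described above can be illustrated with a small sketch. The expression values below are made up for illustration (they are not real data), and the function name `bulk_measure` is hypothetical; the point is only that two very different cell populations can give the same bulk readout.

```python
from statistics import mean, pstdev

# Hypothetical expression values for one gene in six cells (arbitrary units).
# Population A: two distinct subpopulations, gene "on" in half the cells, "off" in the rest.
mixed = [10.0, 10.0, 10.0, 0.0, 0.0, 0.0]
# Population B: every cell expresses the gene at the same moderate level.
uniform = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]


def bulk_measure(cells):
    """Ensemble-style readout: the average expression across all input cells."""
    return mean(cells)


bulk_mixed = bulk_measure(mixed)
bulk_uniform = bulk_measure(uniform)

# Cell-to-cell spread, which is only visible when cells are profiled individually.
spread_mixed = pstdev(mixed)
spread_uniform = pstdev(uniform)

print("bulk readout:", bulk_mixed, bulk_uniform)
print("per-cell spread:", spread_mixed, spread_uniform)
```

Both populations give the same bulk value, whereas the per-cell spread separates them. This is the kind of heterogeneity that single-cell profiling is designed to reveal.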

Recent experimental advances, in particular in single-cell RNA sequencing (scRNAseq), have greatly improved the high-throughput generation of cDNA libraries from the poly-adenylated fraction of mRNA molecules within a single cell. scRNAseq can be applied to assay the individual transcriptomes of large numbers of cells. The combination of large numbers of cells with high-throughput profiling of gene expression (and other ‘omics’ measurements) at the single-cell level allows us to answer many new biological questions.

Overview of the immune compartment of the mouse mammary gland. (a) UMAP of all immune cell types seen in gestation and tumourigenesis. (b) Expression of marker genes for the various immune cell types, from Bach, Pensa et al. (submitted)

From a methodological perspective, the group ensures that these new data can be fully exploited by developing the sophisticated and rigorously tested statistical models they require. We ensure that our code is well documented and accessible to the wider scientific community, thereby facilitating use of these tools as well as establishing building blocks for further methodological development. Once these tools are established, we apply them, together with outstanding experimental collaborators, to fundamental biological questions, with a particular focus on early mammalian development, immunology and cancer.

UMAP to show gross cellular changes during tumourigenesis in the mouse mammary gland, from Bach, Pensa et al. (submitted)

Moving forward, the group will increasingly focus on modelling cell fate decisions in space and in real time. Almost all single-cell genomics techniques to date, whilst powerful, require cells to be dissociated; spatial context is lost, and with it the ability to fully comprehend a cell’s ecosystem. While new technologies are emerging to resolve this problem, there is a lack of robust and appropriate computational tools for making sense of the resulting data. In close collaboration with experimental colleagues, we will develop such a toolkit using statistical models motivated by both classical and machine learning strategies. This will enable construction of a comprehensive map of how a cell’s spatial position, its movement and its molecular profile impact its ultimate fate. We will pioneer these approaches using early mammalian development before moving on to apply them in the context of disease, especially cancer.

John Marioni is jointly employed by the EMBL European Bioinformatics Institute at the Wellcome Genome Campus, Hinxton, so the group is based across both sites.