J-B Pettit, R Tomer, K Achim, S Richardson, L Azizi, J Marioni
PLoS Comput Biol
Complex tissues, such as the brain, are composed of multiple different cell types, each of which have distinct and important roles, for example in neural function. Moreover, it has recently been appreciated that the cells that make up these sub-cell types themselves harbour significant cell-to-cell heterogeneity, in particular at the level of gene expression. The ability to study this heterogeneity has been revolutionised by advances in experimental technology, such as Wholemount in Situ Hybridizations (WiSH) and single-cell RNA-sequencing. Consequently, it is now possible to study gene expression levels in thousands of cells from the same tissue type. After generating such data one of the key goals is to cluster the cells into groups that correspond to both known and putatively novel cell types. Whilst many clustering algorithms exist, they are typically unable to incorporate information about the spatial dependence between cells within the tissue under study. When such information exists it provides important insights that should be directly included in the clustering scheme. To this end we have developed a clustering method that uses a Hidden Markov Random Field (HMRF) model to exploit both quantitative measures of expression and spatial information. To accurately reflect the underlying biology, we extend current HMRF approaches by allowing the degree of spatial coherency to differ between clusters. We demonstrate the utility of our method using simulated data before applying it to cluster single cell gene expression data generated by applying WiSH to study expression patterns in the brain of the marine annelid Platynereis dumereilii. Our approach allows known cell types to be identified as well as revealing new, previously unexplored cell types within the brain of this important model system.