The Bioinformatics Core facility offers advice on experimental design and statistics and provides training and support for data processing and analysis, working collaboratively with colleagues in CRUK CI research groups and other core facilities.


An important role of the Bioinformatics Core is to provide training to CRUK CI scientists. Working in partnership with the University’s Bioinformatics Training Facility, we offer a number of classroom-based training courses with an emphasis on hands-on, practical learning.

Bioinformatic and statistical data analysis

The team has considerable experience in analysing datasets generated by high-throughput technologies and supports research projects that employ a range of experimental approaches:

  • RNA-seq for differential gene expression analysis between sample groups, e.g. treated vs. untreated cell lines, and single-cell RNA-seq for identifying and characterising sub-populations of cells

  • ChIP-seq to investigate how transcriptional regulation is altered in cancers by locating binding sites of regulatory proteins and determining which of these are differentially bound under different conditions

  • Whole genome, exome and amplicon sequencing to explore variation in cancer genomes, identifying single nucleotide variants, small insertions/deletions, copy number aberrations and genomic rearrangements

  • Quantitative proteomics including the analysis of tandem mass tag (TMT) mass spectrometry data to look at changes in protein abundance following treatment or a perturbation

  • Statistical analysis of a wide range of data types including application of mixed models for tumour growth curve data and survival analysis comparing treatment groups or disease subtypes

  • Interpretation of CRUK CI datasets and research findings in the context of publicly available data, including genomic feature annotations, sequence motifs, pathways and survival data


Figure 1. Temporal profiling of the Estrogen Receptor alpha interactome following treatment of MCF7 breast cancer cells with 4-hydroxytamoxifen (Papachristou, Kishore et al., Nature Communications 2018). The Bioinformatics Core is developing analysis techniques to support emerging experimental technologies, in this case quantitative multiplexed rapid immunoprecipitation mass spectrometry of endogenous proteins (qPLEX-RIME).



The Bioinformatics Core has developed a number of software packages and tools for analysing and visualising the datasets we work with. These include R and Bioconductor packages, analysis pipelines developed using Nextflow and interactive web applications written using the R Shiny framework.

Data processing and analysis infrastructure

The Bioinformatics Core processes increasingly large volumes of genomics data and develops analysis pipelines that run efficiently on the CRUK CI high-performance compute cluster.

We work closely with the Genomics Core and support its Illumina sequencing operation with automated data processing and QC, deployment and extension of the laboratory information management system (LIMS) for bespoke laboratory workflows, and delivery of sequence data to 12 partner institutes and University departments.

Supporting other core facilities

We provide support to several other core facilities at CRUK CI:

  • Genomics – managing and delivering the data produced by the Illumina sequencers, from the Genome Analyzer I in early 2008 to the current NovaSeq 6000

  • Proteomics – mass spectrometry results database and query interface

  • Pre-Clinical Genome Editing – CRISPR guide design and clone selection tools

  • Flow Cytometry – mass cytometry image extraction and reconstruction