The Bioinformatics Core facility provides computational and statistical support for research at the CRUK CI through statistical data analysis, data management, and training and consultation services.

The group has expertise in the analysis of datasets generated by high-throughput genomics technologies, including next-generation sequencing and microarrays, and supports research projects that employ a range of experimental approaches:

  • ChIPseq to investigate how transcriptional regulation is altered in cancers by locating binding sites of regulatory proteins and analysing how these are differentially bound under different conditions or treatments.
  • Whole genome, exome and amplicon sequencing to explore variation in cancer genomes, identifying single nucleotide variants, small insertions/deletions, copy number aberrations and genomic rearrangements.
  • mRNAseq and microarray transcriptomic analysis for differential gene expression, SNP and copy number profiling, microRNA analysis and allele-specific expression.
  • Analysis of proteomics data, including differential protein expression analysis for tandem mass tag (TMT) experiments.
  • Interpretation of CI datasets and research findings in the context of publicly available data, including genomic feature annotations, sequence motifs, pathways and survival data.

The Bioinformatics Core is actively involved in processing increasingly large volumes of genomics data and develops analysis pipelines that run efficiently on the CRUK CI high-performance compute cluster.

Advice and help with experimental design and statistics are provided through design meetings and clinics; researchers undertaking projects that require bioinformatics support are encouraged to engage with members of the Bioinformatics Core at an early stage. Training CI researchers is an important aspect of the service, and the group offers a number of classroom-based courses that combine lectures with hands-on practical sessions. These include courses on statistics, experimental design, analysis of high-throughput sequencing and microarray data, and programming in R and Python to solve biological problems.

Figure 1: High-throughput sequencing has greater sensitivity than microarrays to detect changes in gene expression levels.

The facility is responsible for developing and maintaining the high-throughput sequencing LIMS that supports the Illumina sequencing instruments in the Genomics Core, and for automating the transfer, processing and delivery of data from these sequencers. The group also provides access to commercial databases and maintains a local Galaxy server, enabling researchers to carry out bioinformatics analyses.