Illumina sequencing available at The CRUK CI Genomics Core Facility

Our sequencing service is available across the University of Cambridge. It grew out of a collaboration between several partners and departments that coordinated efforts to develop the infrastructure. We always try to accommodate any request from within the University of Cambridge, but ultimately this depends on our current capacity.

Run types by sequencer: MiSeq runs allow a maximum number of cycles that depends on the reagent cartridge, but because you own the whole flowcell you can request any read length, single- or paired-end, within that limit. Most of our sequencing is carried out on Illumina’s NovaSeq, where we can still perform almost any run configuration, including custom primers for your specific libraries. The most commonly requested runs remain paired-end (PE) 150 bp and single-read (SR) 50 bp, but as more single-cell and CRISPR screen experiments are sequenced we see a trend towards asymmetric PE reads that incorporate UMIs and more creative barcoding strategies.

So, if in doubt just ask. Even for a single lane request with interesting parameters we can often find a suitable partner to enable a flowcell to be run.
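
As a rough illustration of what a run configuration amounts to, the sketch below adds up the cycles for a symmetric and an asymmetric paired-end layout and checks them against a cycle limit. Everything in it is an assumption made for the example: the `RunConfig` class, the 28/91 read split, the 8 bp index lengths and the 200-cycle limit are hypothetical and are not facility settings. The real limit for a given cartridge should come from the kit documentation or from us.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Cycle counts for one sequencing run (hypothetical helper, not a facility tool)."""
    read1: int
    read2: int = 0   # 0 means a single-read run
    index1: int = 0
    index2: int = 0

    @property
    def total_cycles(self) -> int:
        # Every sequenced base, in a read or an index, costs one cycle.
        return self.read1 + self.read2 + self.index1 + self.index2

    @property
    def is_paired_end(self) -> bool:
        return self.read2 > 0

    def fits(self, max_cycles: int) -> bool:
        """True if the run needs no more cycles than the supplied limit."""
        return self.total_cycles <= max_cycles


# Symmetric PE150 with two 8 bp index reads (index lengths are assumed).
pe150 = RunConfig(read1=150, read2=150, index1=8, index2=8)

# Asymmetric layout: a short read 1 for barcode and UMI, a longer read 2
# for the insert. The 28/91 split is an example only.
asymmetric = RunConfig(read1=28, read2=91, index1=8)

# 200 is a made-up limit for illustration, not a real cartridge specification.
for name, cfg in [("PE150", pe150), ("asymmetric", asymmetric)]:
    print(name, cfg.total_cycles, cfg.fits(max_cycles=200))
```

Writing the layout down this way before submitting makes it easier to spot a request that cannot fit on the chosen cartridge.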

Background to the service

Currently we have the following Illumina sequencers available in the core: 1x NovaSeq and 2x MiSeq (the fleet of HiSeqs is now fully retired and enjoying a very well deserved rest). The larger sequencers and sequencing infrastructure have been purchased collaboratively over the years with the following Institutes and Departments in Cambridge, to whom we are forever grateful!

  • Gurdon Institute
  • MRC Laboratory of Molecular Biology
  • MRC Cancer Unit
  • MRC single-cell infrastructure grant
  • CRUK Cambridge Institute
  • Department of Obstetrics & Gynaecology
  • Department of Haematology
  • Institute of Metabolic Sciences
  • The Stem Cell Institute
  • Centre for Cancer Epidemiology (Strangeways)
  • AstraZeneca
  • Department of Clinical Neurosciences

We started Illumina sequencing in 2007, when we purchased an Illumina GA1 and began the collaboration with the Gurdon Institute and Department of Plant Sciences, and have since become a centre of excellence for the use of this technology. Since then we have had almost every Illumina instrument in the lab; we have just never needed the MiniSeq or X10 or, as yet, the iSeq. The SBS chemistry continues to be stretched to generate phenomenal volumes of data from a flowcell: we’ve gone from ~1 Gb per flowcell on the GA1 to ~1 Tb on the HiSeq 4000 and now upwards of 3 Tb on the larger NovaSeq flowcells!

NGS methods and applications:

Uses of NGS: The most common applications today are sequencing of whole genomes, exomes, amplicons, transcriptomes (commonly referred to as ‘RNA-seq’, but be careful, there are very many different ‘RNA-seq’ methods depending on the RNA species you are looking for…) and DNA:protein interactions (ChIP-seq). NGS reads can be generated as either single-end, where only one end of a fragment is sequenced, or paired-end, where both ends are sequenced. The format of a sequencing run is generally chosen before library preparation and there are a few commonly accepted defaults, e.g. single-end 36 bp for ChIP-seq and paired-end 100 bp for cancer genomes. The structure of a sequencing data set is determined by the question being asked. ChIP-seq studies require high numbers of short reads, so these are generated on short, fast runs. WGS requires as much genome coverage as possible, so long reads are used. RNA-seq and Structural-Variation-seq can use varying read lengths, but both use paired-end data for many studies.

Whole genome sequencing: Tens of thousands of human genomes have now been sequenced, with the majority of these completed on the Illumina platform. The cost of sequencing a genome is decreasing rapidly; it is now well under $2,000 and can be as little as $1,000 in some labs. Analysis methods are continually improving but it still takes many hundreds of hours of computation to complete primary and secondary analysis. If you want to run more than 3 or 4 whole genomes then I’d recommend using Illumina’s X10 platform and we can help find you a service provider for this.

  • 30-50x genome coverage using long paired-end reads, generally 450M PE125bp reads per sample
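
To see how a read count translates into depth, here is a minimal sketch of the usual back-of-envelope calculation: total sequenced bases divided by genome size. It assumes that "450M PE125bp reads" means 450 million read pairs and uses an approximate human genome size of 3.1 Gb. Both are our assumptions for the example, and the function name is illustrative. Duplicates, unmapped reads and uneven coverage are ignored, so real usable depth will be lower.

```python
def mean_coverage(read_pairs: float, read_length_bp: int,
                  genome_size_bp: float, paired: bool = True) -> float:
    """Expected mean depth = total sequenced bases / genome size."""
    reads_per_fragment = 2 if paired else 1
    total_bases = read_pairs * reads_per_fragment * read_length_bp
    return total_bases / genome_size_bp


HUMAN_GENOME_BP = 3.1e9  # assumed approximate size, not an exact figure

# 450M read pairs at PE125 against the assumed genome size.
depth = mean_coverage(450e6, 125, HUMAN_GENOME_BP)
print(f"{depth:.1f}x")  # roughly 36x, inside the 30-50x range quoted above
```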

 

Exome-seq: It is possible to sequence just the exons which reduces the time and cost of experiments and allows a significant increase in sample numbers. Data analysis is potentially easier as well. Most exome sequencing is performed using in-solution capture. In this method biotinylated-oligonucleotide baits are mixed with sequencing libraries to pull-down only the exon regions for sequencing.

  • 50x exome coverage using long paired-end reads, generally 30-50M PE75bp reads per sample
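
For capture experiments the calculation is better run in reverse, because only a fraction of reads land on the baited regions. The sketch below estimates the read pairs needed for a target depth. The 50 Mb target size and 60% on-target fraction are placeholder assumptions (both vary by capture kit and library), and the function name is illustrative.

```python
import math


def read_pairs_needed(target_coverage: float, target_size_bp: float,
                      read_length_bp: int, on_target_fraction: float) -> int:
    """Read pairs required to reach a mean depth over a capture target."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    # Off-target reads are wasted, so scale up the bases we must sequence.
    bases_needed = target_coverage * target_size_bp / on_target_fraction
    return math.ceil(bases_needed / (2 * read_length_bp))


# 50x over an assumed 50 Mb target, PE75, assumed 60% of bases on target.
pairs = read_pairs_needed(50, 50e6, 75, 0.6)
print(f"{pairs / 1e6:.1f}M read pairs")  # just under 28M with these assumptions
```

With these placeholder values the estimate sits just below the 30-50M range quoted above; a lower on-target fraction or a larger target pushes it up.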

 

Amplicon-seq: For many research questions, simply sequencing one or two exons in hundreds of samples is faster, uses little DNA and keeps data analysis simple. PCR amplification is well understood and has high specificity and sensitivity. Most users can design their own assays and it is theoretically possible to generate sequence data within one week, covering primer design and ordering, PCR, and sequencing on MiSeq or PGM. As potentially hundreds of samples can be multiplexed into a single NGS run, the cost per sample or per amplicon can be very low, less than $1 each.

  • 50-2000x amplicon coverage using long paired-end reads, generally 1-5M PE150bp reads per sample
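
The multiplexing arithmetic behind those figures can be sketched as follows. The run cost, sample count and amplicon counts are made-up numbers for illustration, not our prices, and the function names are hypothetical. The depth estimate assumes each read pair covers one amplicon and that all amplicons amplify evenly, which is rarely quite true in practice.

```python
def depth_per_amplicon(read_pairs_per_sample: float,
                       amplicons_per_sample: int) -> float:
    """Mean depth per amplicon, assuming reads are spread evenly."""
    return read_pairs_per_sample / amplicons_per_sample


def cost_per_amplicon(run_cost: float, samples: int,
                      amplicons_per_sample: int) -> float:
    """Sequencing cost of one amplicon in one sample (library prep excluded)."""
    return run_cost / (samples * amplicons_per_sample)


# 1M read pairs shared across a hypothetical 500-amplicon panel.
print(depth_per_amplicon(1_000_000, 500))

# Hypothetical $1,500 run shared by 384 samples with 10 amplicons each.
print(round(cost_per_amplicon(1500, 384, 10), 2))
```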

ChIP-seq: The analysis of protein:DNA interactions allows researchers to unravel genome regulation and its influence on gene expression and methylation. Its most common application is the analysis of transcription factor binding: native DNA is cross-linked to bound proteins and fragmented; fragments bound to proteins are enriched by immunoprecipitation with an antibody to the protein of interest; this DNA is then used in a library preparation ready for sequencing. CLIP-seq is a very similar method used for protein:RNA interaction analysis.

  • Generally 10-50M SE50bp reads per sample; more reads, or PE reads, are not usually helpful. Use more than 2 replicates.

 

RNA-seq: Several very different methods are commonly referred to as RNA-seq: differential gene expression by mRNA-seq with a small number of reads (10-20M SE50bp), small RNA-seq with even fewer reads (1-2M SE50bp), and whole transcriptome analysis from mRNA-enriched or ribosomal RNA-depleted total RNA for the analysis of all RNAs, splicing and allele-specific expression. Most start with RNA that is converted to cDNA for library preparation and, finally, sequencing. These are among the most popular applications we run on our Illumina sequencers and they have supplanted microarrays for several reasons.

  • DGE: Generally 10-20M SE50bp reads per sample; more reads, or PE reads, are not usually helpful. Use more than 4 replicates.