Illumina sequencing available in the Genomics Core
Our sequencing service is made available through a University of Cambridge collaboration. If your department is a member of this collaboration you can request access to the service; if not, you’ll have to make an ad hoc request.
Run types by sequencer: Most of our sequencing is carried out on Illumina’s HiSeq 4000. The most commonly requested runs for HiSeq 4000 are single-end 50bp and paired-end 150bp. If you need a different run configuration, we recommend you submit 8 lanes for a full flowcell. We can still run HiSeq 2500, but only as a whole flowcell, so wait times are difficult to predict. MiSeq and NextSeq runs have a maximum number of cycles, but within that limit you can request any read length and single- or paired-end as required.
Background to the service
We have one Illumina NovaSeq, two HiSeq 4000, two MiSeq, and one NextSeq sequencers. The HiSeq hardware and sequencing infrastructure have been purchased collaboratively with the following Institutes and Departments in Cambridge:
- CRUK Cambridge Institute
- Gurdon Institute
- MRC Laboratory of Molecular Biology
- MRC Cancer Unit
- MRC single-cell infrastructure
- Department of Obstetrics & Gynaecology
- Department of Haematology
- Institute of Metabolic Sciences
- Stem Cell Institute
- Centre for Cancer Epidemiology
- AstraZeneca
We started Illumina sequencing in 2007 with the purchase of an Illumina GA1, which began the collaboration with the Gurdon Institute and Department of Plant Sciences, and we have since become a centre of excellence for this technology. Since then we have had every Illumina instrument in the lab except the MiniSeq and X10. The SBS chemistry continues to be stretched to generate phenomenal volumes of data from a flowcell, and we’ve gone from 1Gb per run on the GA1 to almost 1Tb on the HiSeq 4000.
NGS methods and applications:
Uses of NGS: The most common applications today are sequencing of whole genomes, exomes, amplicons, transcriptomes (RNA-seq), and DNA:protein interactions (ChIP-seq). NGS sequence reads can be generated as either single-end, where only one end of a fragment molecule is sequenced, or paired-end, where both ends are sequenced. The format of a sequencing run is generally chosen before library preparation and there are a few commonly accepted defaults, e.g. single-end 36bp for ChIP-seq and paired-end 100bp for cancer genomes. The structure of a sequencing data set is determined by the question being asked. ChIP-seq studies require high numbers of short reads, so these are generated on short, fast runs. WGS requires as much actual genome coverage as possible, so long reads are used. RNA-seq and Structural-Variation-seq can use varying read lengths, but both use paired-end data for many studies.
Whole genome sequencing: Tens of thousands of human genomes have now been sequenced, with the majority of these completed on the Illumina platform. The cost of sequencing a genome is decreasing rapidly and is now well under $2,000; it can be done for as little as $1,000 in some labs. Analysis methods are continually improving but it still takes many hundreds of hours of computation to complete primary and secondary analysis. If you want to run more than 3 or 4 whole genomes then we’d recommend using Illumina’s X10 platform, and we can help find you a service provider for this.
- 30-50x genome coverage using long paired-end reads, generally 450M PE125bp reads per sample
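As a rough check on figures like the one above, mean coverage can be estimated as total sequenced bases divided by target size. The sketch below is illustrative only: the function name is ours, the human genome size of about 3.1 Gb is an assumed round figure, and "450M PE125bp reads" is read as 450M read pairs (two 125bp reads per fragment). It ignores duplicates, unmapped reads and uneven coverage, so real usable coverage will be lower.

```python
def mean_coverage(fragments, read_length_bp, target_size_bp, paired=True):
    """Estimate mean depth: total sequenced bases / target size.

    fragments: number of sequenced fragments (read pairs if paired-end).
    """
    reads_per_fragment = 2 if paired else 1
    total_bases = fragments * reads_per_fragment * read_length_bp
    return total_bases / target_size_bp


# Assumption: approximate human genome size in bases (not from this page).
HUMAN_GENOME_BP = 3.1e9

# 450M PE125bp read pairs against a human genome
wgs = mean_coverage(450e6, 125, HUMAN_GENOME_BP)
print(f"WGS: ~{wgs:.0f}x")  # prints "WGS: ~36x"
```

Under these assumptions the result sits inside the 30-50x range quoted above. The same function can be applied to an exome or amplicon panel by passing the size of the targeted region instead of the whole genome.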
Exome-seq: It is possible to sequence just the exons, which reduces the time and cost of experiments and allows a significant increase in sample numbers. Data analysis is potentially easier as well. Most exome sequencing is performed using in-solution capture. In this method biotinylated oligonucleotide baits are mixed with sequencing libraries to pull down only the exon regions for sequencing.
- 50x exome coverage using long paired-end reads, generally 30-50M PE75bp reads per sample
Amplicon-seq: For many research questions, simply sequencing one or two exons in hundreds of samples is faster, uses little DNA and keeps data analysis simple. PCR amplification is well understood and has high specificity and sensitivity. Most users can design their own assays, and it is theoretically possible to generate sequence data in one week, from primer design and ordering through PCR to sequencing on MiSeq or PGM. As potentially hundreds of samples can be multiplexed into a single NGS run, the cost per sample or per amplicon can be very low, less than $1 each.
- 50-2000x amplicon coverage using long paired-end reads, generally 1-5M PE150bp reads per sample
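The multiplexing arithmetic behind "hundreds of samples per run" can be sketched as below. This is a minimal illustration, not a planning tool: the function name is ours and the run yield of 400M read pairs is a hypothetical placeholder, not a specification for any instrument. Only the 1-5M reads per sample range comes from the text above.

```python
def samples_per_run(run_read_pairs, read_pairs_per_sample):
    """Number of samples that fit in one run at the requested depth."""
    if read_pairs_per_sample <= 0:
        raise ValueError("read_pairs_per_sample must be positive")
    return run_read_pairs // read_pairs_per_sample


# Hypothetical run yield, for illustration only.
RUN_YIELD = 400_000_000

for depth in (1_000_000, 5_000_000):
    n = samples_per_run(RUN_YIELD, depth)
    print(f"{depth:,} read pairs per sample -> {n} samples per run")
```

In practice some reads are lost to index misassignment and uneven pooling, so it is sensible to plan for fewer samples than this upper bound.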
ChIP-seq: The analysis of protein:DNA interactions allows researchers to unravel genome regulation and its influence on gene expression and methylation. The most common application is the analysis of transcription factor binding: native DNA is cross-linked to bound proteins and fragmented; fragments bound to proteins are enriched by immunoprecipitation with an antibody to the protein of interest; this DNA is then used in a library preparation ready for sequencing. CLIP-Seq is a very similar method used for protein:RNA interaction analysis.
- Generally 10-50M SE50bp reads per sample; more reads, or PE reads, are not usually helpful. >2 replicates.
RNA-seq: Several very different methods are commonly referred to as RNA-seq: differential gene expression by mRNA-seq with a small number of reads (10-20M SE50bp), small RNA-seq with even fewer reads (1-2M SE50bp), and whole transcriptome analysis from mRNA-enriched or ribosomal RNA-depleted total RNA for the analysis of all RNAs, splicing and allele-specific expression. Most start with RNA that is converted to cDNA for library preparation and finally sequencing. These are among the most popular applications we run on our Illumina sequencers and they have supplanted microarrays for several reasons.
- DGE: Generally 10-20M SE50bp reads per sample; more reads, or PE reads, are not usually helpful. >4 replicates.