MGA is a contamination screening and quality control tool for high-throughput sequencing data.

MGA alignment summary

MGA screens for contaminants by aligning sequence reads in FASTQ format against a series of reference genomes using Bowtie and against a set of adapter sequences using Exonerate.

To reduce the computational run time, MGA samples reads, taking a subset of 100,000 reads per sample or lane by default, and trims these to a specified length, typically 36 bases. Trimming keeps the mapping and error rates comparable across runs with differing read lengths. Exonerate alignment against adapters uses the full-length, untrimmed sequences.
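The sampling and trimming step can be sketched as follows. This is a minimal illustration, not MGA's own code: the text above does not say how MGA chooses its subset, so the sketch assumes uniform random sampling using reservoir sampling (Algorithm R), which needs only a single pass over the FASTQ file. The function names `read_fastq`, `sample_reads` and `trim` are hypothetical. The trimmed copies would feed the genome alignment, while adapter matching would use the untrimmed records.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record as (header, sequence, quality); the '+' separator line is dropped.
Record = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 4-line FASTQ records from an iterable of lines (assumes well-formed input)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines, e.g. a trailing newline
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def sample_reads(records: Iterable[Record], k: int = 100_000, seed: int = 1) -> List[Record]:
    """Uniformly sample up to k records in one pass (reservoir sampling, Algorithm R).

    The uniform sampling strategy is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # inclusive: keeps record i with probability k/(i+1)
            if j < k:
                reservoir[j] = rec
    return reservoir


def trim(rec: Record, length: int = 36) -> Record:
    """Keep only the first `length` bases (and matching quality values) of a read."""
    header, seq, qual = rec
    return header, seq[:length], qual[:length]
```

For example, `[trim(r) for r in sample_reads(read_fastq(open("lane1.fastq")))]` would yield up to 100,000 reads of at most 36 bases each (the file name is illustrative). A fixed seed makes the subset reproducible between reruns of the same lane.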

MGA was developed by Matt Eldridge with James Hadfield from the Genomics Core and is run for all sequencing runs carried out at CRUK-CI as part of the automated primary data processing and QC pipeline. It provides an alignment-based QC report soon after sequencing has finished, with a single clear plot that shows, for each lane:

  • Yield in terms of numbers of reads

  • Proportion of reads mapping to the expected species/genome

  • Quality of sequencing in terms of error rates (reflected by boldness or opacity of the coloured bar)

  • Possible contamination from other species, including bacteria, viruses and fungi

  • Adapter content
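The per-lane quantities listed above could be derived from the sampled alignments roughly as in the sketch below. It is illustrative only: this page does not describe how MGA assigns reads to genomes or defines its error rate, so the `ReadResult` and `LaneSummary` types and the `summarise_lane` function are hypothetical. The sketch assumes a read counts towards every genome it aligns to (so mapped fractions can sum to more than 1), that the error rate is mismatches per aligned base, and that every trimmed read is exactly `trim_length` bases long.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReadResult:
    """Hypothetical outcome for one sampled, trimmed read."""
    # Mismatch count of the best alignment to each genome the read aligned to.
    mismatches: Dict[str, int] = field(default_factory=dict)
    # Whether an adapter match was found (on the untrimmed read).
    has_adapter: bool = False


@dataclass
class LaneSummary:
    yield_reads: int                   # total reads in the lane, not just the sample
    mapped_fraction: Dict[str, float]  # genome -> fraction of sampled reads aligned
    error_rate: Dict[str, float]       # genome -> mismatches per aligned base
    adapter_fraction: float            # fraction of sampled reads with adapter content


def summarise_lane(total_reads: int, results: List[ReadResult], trim_length: int = 36) -> LaneSummary:
    n = len(results)
    aligned: Dict[str, int] = defaultdict(int)
    mismatches: Dict[str, int] = defaultdict(int)
    for r in results:
        for genome, mm in r.mismatches.items():
            aligned[genome] += 1
            mismatches[genome] += mm
    mapped_fraction = {g: aligned[g] / n for g in aligned} if n else {}
    # Assumes each aligned read contributes exactly trim_length bases.
    error_rate = {g: mismatches[g] / (aligned[g] * trim_length) for g in aligned}
    adapter_fraction = sum(r.has_adapter for r in results) / n if n else 0.0
    return LaneSummary(total_reads, mapped_fraction, error_rate, adapter_fraction)
```

In a plot like MGA's, `yield_reads` would set the overall bar size, the expected genome's entry in `mapped_fraction` its main segment, the other genomes the possible contaminants, and `error_rate` the opacity of each coloured segment.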

The source code and details about installing and deploying MGA are available on GitHub.

A MultiQC plugin for MGA, developed by Richard Bowers, is also available.