Ampliconseq is a variant calling pipeline for amplicon sequencing data.

Ampliconseq is an analysis pipeline for calling single nucleotide variants (SNVs) and indels in targeted amplicon sequencing data. Variants are called using either GATK HaplotypeCaller, which is preferred for germline or clonal somatic mutations, especially in FFPE samples, or VarDict, which can identify low allele fraction SNVs in circulating tumour DNA from plasma samples. In addition to applying caller-specific filters, the pipeline models the background substitution noise at each amplicon position to identify and filter SNV calls whose very low allele fractions are not distinguishable from noise. Alignment and target coverage metrics are computed and compiled into a QC report, and variants are annotated using Ensembl Variant Effect Predictor (VEP).
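The idea of filtering SNV calls that cannot be distinguished from background noise can be illustrated with a minimal sketch. It assumes a simple binomial model: the substitution rate at one amplicon position is pooled from other (background) samples with a pseudocount, and a call is kept only if its alt read count would be very unlikely under that rate. This binomial test is a stand-in chosen for illustration and is not necessarily the distribution or threshold ampliconseq fits; the function names and the `alpha` cutoff are hypothetical.

```python
import math
from typing import Sequence


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(X = k) for X ~ Binomial(n, p), computed via lgamma."""
    return (
        math.lgamma(n + 1)
        - math.lgamma(k + 1)
        - math.lgamma(n - k + 1)
        + k * math.log(p)
        + (n - k) * math.log1p(-p)
    )


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    mode = math.floor((n + 1) * p)
    if k > mode:
        # Beyond the mode the pmf decreases monotonically, so sum upwards
        # and stop once further terms are negligible relative to the total.
        total = 0.0
        for i in range(k, n + 1):
            term = math.exp(log_binom_pmf(i, n, p))
            total += term
            if term <= total * 1e-17:
                break
        return min(total, 1.0)
    # At or below the mode the tail is large, so 1 - P(X < k) is safe.
    lower = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k))
    return max(0.0, 1.0 - lower)


def background_rate(bg_alt_counts: Sequence[int], bg_depths: Sequence[int]) -> float:
    """Pooled substitution rate at one position across background samples.

    The +1/+2 pseudocount keeps the rate strictly between 0 and 1.
    """
    return (sum(bg_alt_counts) + 1) / (sum(bg_depths) + 2)


def passes_noise_filter(
    alt_count: int,
    depth: int,
    bg_alt_counts: Sequence[int],
    bg_depths: Sequence[int],
    alpha: float = 1e-6,  # hypothetical significance cutoff
) -> bool:
    """True if alt_count is unlikely to arise from background noise alone."""
    rate = background_rate(bg_alt_counts, bg_depths)
    return binom_upper_tail(alt_count, depth, rate) < alpha
```

For example, with a pooled background of 23 alt reads in 30,000 (a rate of about 0.08% after the pseudocount), a call with 50 alt reads at a depth of 10,000 would be kept, whereas one with 9 alt reads would be filtered as indistinguishable from noise.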

The ampliconseq pipeline is executed using the Nextflow scientific workflow system, and all dependencies and tools are packaged in a Docker container that can be run using either Docker or Singularity. The inputs to the pipeline are BAM files containing sequence reads aligned to the reference genome.

The ampliconseq pipeline was developed by Matt Eldridge in collaboration with James Brenton’s research group. The package, along with a guide to installing and running the pipeline, is available on GitHub.

Features

  • Choice of variant callers: GATK HaplotypeCaller and VarDict

  • Alignment and coverage QC report using metrics calculated by Picard CollectAlignmentSummaryMetrics and CollectTargetedPcrMetrics

  • Annotation of variants using Ensembl Variant Effect Predictor (VEP)

  • Support for overlapping amplicon targets by partitioning reads prior to variant calling

  • Support for calling and filtering low allele fraction SNVs (down to 0.1%, e.g. in circulating tumour DNA from plasma samples) by fitting probability distributions that model background noise

  • Specific calling of known mutations

  • Assignment of confidence level based on whether a variant is called or filtered in each of a set of replicate libraries (usually duplicate libraries)

  • Minimal barrier to installation: the only requirements are a Java runtime, Nextflow, and either Docker or Singularity to run the container in which all other dependencies and tools are packaged

  • Scales easily from a multi-core workstation to a high-performance compute cluster or the cloud with only a simple configuration change

  • Accompanying visualization tool for viewing and assessing SNV calls
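Support for overlapping amplicon targets can be sketched as follows, assuming the goal of partitioning is to split the amplicons into groups in which no two members overlap, so that the reads for each group can be called separately. The `Amplicon` type and `partition_amplicons` function are hypothetical illustrations rather than ampliconseq code, and the step that assigns each read to its amplicon is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Amplicon:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def partition_amplicons(amplicons: List[Amplicon]) -> List[List[Amplicon]]:
    """Split amplicons into groups with no overlapping members.

    Greedy first-fit over amplicons sorted by (chrom, start): each amplicon
    joins the first group whose last amplicon on that chromosome ends before
    it starts; otherwise a new group is opened.
    """
    groups: List[List[Amplicon]] = []
    last_end: List[Dict[str, int]] = []  # per group: chrom -> end of last member
    for amp in sorted(amplicons, key=lambda a: (a.chrom, a.start, a.end)):
        for group, ends in zip(groups, last_end):
            if ends.get(amp.chrom, 0) < amp.start:
                group.append(amp)
                ends[amp.chrom] = amp.end
                break
        else:
            # Overlaps the last member of every existing group.
            groups.append([amp])
            last_end.append({amp.chrom: amp.end})
    return groups
```

Because amplicons are visited in start order, a new group is opened only when the current amplicon overlaps the last member of every existing group, so the number of groups equals the largest number of amplicons covering any single position.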