R package and Shiny web application for estimating ploidy and cellularity from tumour copy number profiles.

Several research groups at CRUK CI use shallow whole-genome sequencing as a relatively inexpensive way to obtain copy number profiles for tumour samples, particularly as several libraries can be multiplexed in a single sequencing lane. We principally use the QDNAseq R package from the Bioconductor project to count the reads that align within genomic windows (bins), typically 30 kb in size, and to correct those counts for GC content and mappability. The resulting values are relative to the average copy number within the sample, given the GC content and mappability of each bin. These relative copy numbers are then smoothed and segmented, and provide useful insight into genomic abnormalities in cancers.

For some research projects it is desirable to obtain absolute copy numbers. Normally this would require deeper whole-genome sequencing, from which allele fractions of germline SNPs can help determine the clonal architecture of a tumour sample. In the absence of such information, and given the significantly higher cost of deeper sequencing, we can instead attempt to fit the relative copy number profiles to absolute copy numbers by evaluating a range of ploidy and cellularity estimates.
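The fitting idea can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not rascal's R API: the function names, the grid ranges and the distance measure are all assumptions. It assumes the usual single-clone mixture model, in which normal cells are diploid and a relative copy number of 1 corresponds to the tumour ploidy. Each candidate ploidy and cellularity pair is scored by how close the converted segment values fall to whole numbers.

```python
def absolute_to_relative(absolute, ploidy, cellularity):
    """Relative copy number expected for a tumour segment with the given
    absolute copy number, assuming diploid normal cells (hypothetical helper)."""
    segment_signal = cellularity * absolute + 2.0 * (1.0 - cellularity)
    average_signal = cellularity * ploidy + 2.0 * (1.0 - cellularity)
    return segment_signal / average_signal


def relative_to_absolute(relative, ploidy, cellularity):
    """Inverse of absolute_to_relative under the same assumptions."""
    return ploidy + (relative - 1.0) * (ploidy + 2.0 / cellularity - 2.0)


def fit_distance(segments, ploidy, cellularity):
    """Weighted mean distance of the converted segment values from the
    nearest whole number. segments is a list of (relative, weight) pairs;
    this distance measure is an assumption chosen for illustration."""
    total = 0.0
    weight_sum = 0.0
    for relative, weight in segments:
        absolute = relative_to_absolute(relative, ploidy, cellularity)
        total += weight * abs(absolute - round(absolute))
        weight_sum += weight
    return total / weight_sum


def grid_search(segments, ploidies, cellularities):
    """Return (distance, ploidy, cellularity) for the best-scoring pair."""
    return min(
        (fit_distance(segments, p, c), p, c)
        for p in ploidies
        for c in cellularities
    )


# Simulated segments for a tumour with ploidy 3.0 and cellularity 0.6:
# (absolute copy number, weight) pairs converted to relative copy numbers.
simulated = [(2, 40), (3, 100), (4, 50), (5, 10)]
segments = [(absolute_to_relative(a, 3.0, 0.6), w) for a, w in simulated]

ploidies = [i / 10 for i in range(15, 51)]         # 1.5 to 5.0
cellularities = [i / 100 for i in range(20, 101)]  # 0.20 to 1.00
best = grid_search(segments, ploidies, cellularities)
```

Several ploidy and cellularity pairs can fit the same profile equally well under this model, so the best-scoring pair from such a grid search should be treated as one candidate solution rather than a unique answer.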

The rascal R package and Shiny app were developed by Matt Eldridge in collaboration with the Brenton group and are based on concepts introduced in the ACE package created by Bauke Ylstra’s group at Amsterdam UMC (who are also the authors of QDNAseq). The mathematics underpinning this approach assumes a single dominant clone, so estimating ploidy and cellularity for very heterogeneous tumour samples may prove difficult with this method.

The source code and releases of the R package are available on GitHub.


ACE: absolute copy number estimation from low-coverage whole-genome sequencing data. Poell et al., Bioinformatics 35:2847-2849 (2019).