SigCheck is an R package available in Bioconductor for checking a gene signature’s prognostic performance against random signatures, known signatures and permuted data or metadata. It was written by Justin Norden and Rory Stark.

A common task in the analysis of genomic data is the derivation of gene expression signatures that distinguish between phenotypes (disease outcomes, molecular subtypes, etc.). However, in their paper “Most random gene expression signatures are significantly associated with breast cancer outcome”, Venet, Dumont, and Detours point out that while a gene signature may distinguish between two classes of phenotype, its ultimate uniqueness and utility may be limited. They show that while a specialized feature selection process may appear to determine a unique set of predictor genes, the resultant signature may not perform better than one made up of random genes, or of genes selected at random from all differentially expressed genes. This suggests that the genes in the derived signature may not be particularly informative as to underlying biological mechanisms. They further show that gene sets that comprise published signatures for a wide variety of phenotypic classes may perform just as well at predicting arbitrary phenotypes; famously, they show that a gene signature that distinguishes postprandial laughter performs as well at predicting the outcome of breast cancers as a widely cited signature.

The SigCheck package was developed to make it easy to check a gene signature against random and known signatures, and to assess how uniquely that signature distinguishes phenotypic classes. It also provides the ability to check a signature’s performance against permuted data, as a reality check that it is detecting a genuine signal in the original data.
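
The idea behind both checks can be illustrated with a small, language-neutral sketch. The Python code below is not SigCheck's API and does not reproduce its implementation; every function name in it is hypothetical, and the performance measure (the absolute difference in mean signature expression between two classes) is an assumed stand-in for whatever classifier or survival statistic a real analysis would use. It only shows how an empirical p-value can be obtained by comparing a signature's score against scores from random gene sets of the same size, and against scores computed on permuted class labels.

```python
import random
import statistics


def signature_score(expr, labels, genes):
    """Toy performance measure (an assumption, not SigCheck's):
    absolute difference between the two classes in the mean
    expression of the signature genes."""
    n_samples = len(labels)
    per_sample = [
        statistics.fmean([expr[g][i] for g in genes]) for i in range(n_samples)
    ]
    class1 = [s for s, y in zip(per_sample, labels) if y == 1]
    class0 = [s for s, y in zip(per_sample, labels) if y == 0]
    return abs(statistics.fmean(class1) - statistics.fmean(class0))


def empirical_p(observed, null_scores):
    """Fraction of null scores at least as large as the observed one,
    with the usual +1 correction so the p-value is never exactly zero."""
    hits = sum(1 for s in null_scores if s >= observed)
    return (1 + hits) / (1 + len(null_scores))


def check_against_random(expr, labels, signature, n_iter=200, seed=0):
    """Compare the signature with random gene sets of the same size."""
    rng = random.Random(seed)
    all_genes = list(expr)
    observed = signature_score(expr, labels, signature)
    null = [
        signature_score(expr, labels, rng.sample(all_genes, len(signature)))
        for _ in range(n_iter)
    ]
    return observed, empirical_p(observed, null)


def check_against_permuted(expr, labels, signature, n_iter=200, seed=0):
    """Compare the signature with its own score on shuffled class labels."""
    rng = random.Random(seed)
    observed = signature_score(expr, labels, signature)
    null = []
    for _ in range(n_iter):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null.append(signature_score(expr, shuffled, signature))
    return observed, empirical_p(observed, null)


def make_toy_data(seed=1, n_per_class=10, n_noise=50, n_signal=5):
    """Synthetic data: a few genes track the class label, the rest are noise."""
    rng = random.Random(seed)
    labels = [0] * n_per_class + [1] * n_per_class
    expr = {
        f"noise{j}": [rng.gauss(0.0, 1.0) for _ in labels] for j in range(n_noise)
    }
    signature = [f"signal{j}" for j in range(n_signal)]
    for g in signature:
        expr[g] = [10.0 * y for y in labels]
    return expr, labels, signature


expr, labels, signature = make_toy_data()
print(check_against_random(expr, labels, signature))
print(check_against_permuted(expr, labels, signature))
```

In this sketch, a small p-value from the random-signature check suggests that the signature does better than arbitrary gene sets of the same size, while a small p-value from the permutation check suggests that its performance depends on the real association between expression and phenotype rather than on chance structure in the data.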