AG Lynch, S-F Chin, MJ Dunning, C Caldas, S Tavaré, C Curtis
Sample tracking errors have always been, and always will be, part of the practical implementation of large experiments. It has recently been proposed that expression quantitative trait loci (eQTLs) and their associated effects could be used to identify sample mix-ups, and this approach has been applied to a number of large population genomics studies to illustrate the prevalence of the problem. We adopted a similar approach, termed 'BADGER', in the METABRIC project. METABRIC is a large breast cancer study that may have been the first in which eQTL-based detection of mismatches was used during the study, rather than after the event, to aid quality assurance. We report here on the particular issues that arise in large cancer studies performed using historical samples, which complicate the interpretation of such approaches. In particular, we identify complications arising from the use of tumour samples, from variation in cellularity and RNA quality, from the presence of distinct subgroups within the study population (including family structures), and from the choice of eQTLs to use. We also present results on experimental design that take these issues into account. The eQTL-based approach to identifying sample tracking errors is shown to be valuable in these studies, but it requires care in its implementation.