Functional follow-up of GWAS
Our research focuses on how polygenic variation contributes to susceptibility to breast and lung cancer. We analyse mechanisms both at single loci and at the level of gene regulatory networks. Our aims are 1) to improve recognition of high-risk groups within the population and 2) ultimately, to devise strategies for prevention based on mechanisms of risk.
Uncommon, strongly predisposing mutations in cancer susceptibility genes, such as BRCA1 and BRCA2 in breast and ovarian cancer, are important because of the substantial risks they confer on individuals. But BRCA mutations account for fewer than 5% of breast cancers, and for only about 20% of the estimated total inherited susceptibility. The remaining 80% of susceptibility is thought to be ‘polygenic’: that is, the result of the combined effects of many hundreds or thousands of common and rare genetic variants, each making a small contribution. Our group published the first cancer GWAS in 2007, identifying five loci for predisposition to breast cancer. Since then, world-wide efforts have identified another 90 breast cancer loci, which collectively account for about 15% of the estimated susceptibility. In lung cancer, only a small number of loci, accounting for less than 5% of susceptibility, have been identified.
We face two related problems: 1) identification of the loci which account for the as yet unexplained ‘missing heritability’ and 2) elucidation of the mechanism by which the genetic variants have their effect.
To date, these problems have been tackled one locus at a time. As outlined below, we and our colleagues have continued this approach. However, given that there are probably hundreds of loci of diminishing effect, it seems that this analysis can never be anywhere near complete. The effects of the weaker loci will be too small to detect in even the largest imaginable GWAS, and the labour of functional dissection of each individual locus will be too great. Moreover, in reality the genetic variants have their effects not one by one but in combination, interacting with each other and with the environment. This suggests that some measurement of the combined effect might provide an integrated readout of the risk conferred by all the variants, and their interaction with environmental exposures, in a given individual, and perhaps an indication of the mechanisms involved.
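To make the polygenic picture concrete, the following is a minimal sketch of a log-additive model, in which each inherited risk allele adds the same small amount to the log odds. The number of variants, the allele frequency and the per-allele effect are illustrative assumptions, not estimates from our data, and the sketch ignores interactions between variants and with the environment.

```python
import random
import statistics


def simulate_scores(n_people=1000, n_variants=500, risk_allele_freq=0.2,
                    log_odds_per_allele=0.05, seed=1):
    """Simulate log-additive polygenic scores.

    Each person carries 0, 1 or 2 copies of the risk allele at every variant,
    and every copy adds the same small amount to the log odds.
    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        # Two independent draws per variant, one per parental chromosome.
        n_risk_alleles = sum(
            (rng.random() < risk_allele_freq) + (rng.random() < risk_allele_freq)
            for _ in range(n_variants)
        )
        scores.append(n_risk_alleles * log_odds_per_allele)
    return scores


scores = simulate_scores()
mean_score = statistics.mean(scores)
sd_score = statistics.stdev(scores)
ranked = sorted(scores)
top_5_percent_cutoff = ranked[int(0.95 * len(ranked))]

print(f"mean log-odds score: {mean_score:.2f}, SD: {sd_score:.2f}")
print(f"score at the 95th percentile: {top_5_percent_cutoff:.2f}")
```

Even though every simulated variant has a small effect, the summed scores spread into a distribution, so some individuals sit well above the population mean. Many different combinations of variants give the same score, which is why a single integrated readout is attractive.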
Breast Cancer: (1) 11q13 and FGFR2 risk loci
The 11q13 and FGFR2 risk loci were among the first breast cancer risk loci identified. To gain insight into their mechanism of action, genetic fine-mapping was carried out for both loci in collaboration with the iCOGS consortium, and combined with functional studies using DNase hypersensitivity, transcriptional assays, chromatin immunoprecipitation and 3C (chromosome conformation capture). We identified likely functional variants and the target genes they regulate. Interestingly, at both loci the initial risk association was driven by multiple independent risk signals. Allele-specific binding by the transcription factors ELK4 and GATA3 was shown to underlie changes in transcriptional activity of the target gene CCND1, and allele-specific binding by FOXA1 and E2F1 to underlie changes in transcriptional activity of FGFR2 (French et al., Am J Hum Genet. 2013; 92: 489; Meyer et al., Am J Hum Genet. 2013; 93: 1046). Our findings highlight that regulatory mechanisms at risk loci are highly complex.
(2) Gene regulatory analysis of FGFR2 signalling
The FGFR2 (fibroblast growth factor receptor 2) locus has consistently given the strongest association signal in breast cancer GWAS. A SNP in intron 2 of the gene is associated with increased risk of ER+ breast cancer, probably acting through reduced gene expression and hence reduced antagonism by FGFR2 of oestrogen receptor signalling. Downstream signalling from FGFR2 is through GRB2, SOS and RAS, but this does not provide insight into the mechanisms of increased susceptibility. We took a network-based approach to understand better the effects of altered FGFR2 signalling. We constructed a breast cancer transcription-factor-based network using the ARACNE algorithm (in collaboration with the Califano lab, Columbia University) on published gene expression datasets. We then used three different experimental systems to compare gene expression levels in MCF‑7 cells before and after induction of FGFR2 signalling. The sets of differentially expressed genes were superimposed on the networks (Figure 1). Five transcription factors, with their regulons, were consistently enriched for FGFR2-regulated genes across all the experiments. Three of them, ESR1, FOXA1 and GATA3, are already implicated in breast cancer. The other two were SPDEF, also reported to show somatic alterations in breast cancer, and PTTG1, a known driver of proliferation (Fletcher et al., Nat Commun. 2013; 4: 2464). This result shows that a major component of the FGFR2 effect on susceptibility is mediated through oestrogen receptor signalling networks. This is perhaps not unexpected, but it demonstrates that the network approach can provide insights that conventional pathway analysis may not.
Figure 1. Master regulators (purple squares) and regulons (dots) enriched for FGFR2 signatures, within a partial view of a filtered transcriptional network for breast cancer. The FGFR2-regulated genes are coloured purple, the intensity reflecting whether they were found in 1, 2 or 3 of the experimental perturbations of FGFR2 signalling.
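The idea of superimposing a set of differentially expressed genes on regulons can be illustrated with a simple overlap test. The sketch below uses a one-sided hypergeometric test, which is one common way to score such overlaps and is not necessarily the statistic used in the cited paper. The gene names, the regulon labels TF_A and TF_B, and all set sizes are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(overlap, regulon_size, signature_size, universe_size):
    """P(X >= overlap) when signature_size genes are drawn without replacement
    from universe_size genes, of which regulon_size belong to the regulon."""
    max_overlap = min(regulon_size, signature_size)
    total = comb(universe_size, signature_size)
    tail = sum(
        comb(regulon_size, k)
        * comb(universe_size - regulon_size, signature_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy data: 20 genes measured, 5 of them respond to the stimulus.
universe = {f"gene{i}" for i in range(1, 21)}
signature = {"gene1", "gene2", "gene3", "gene4", "gene5"}

# Hypothetical regulons: the target genes assigned to each transcription factor.
regulons = {
    "TF_A": {"gene1", "gene2", "gene3", "gene4", "gene10", "gene11"},
    "TF_B": {"gene5", "gene12", "gene13", "gene14", "gene15", "gene16"},
}

p_values = {}
for name, targets in regulons.items():
    overlap = len(targets & signature)
    p_values[name] = hypergeom_upper_tail(
        overlap, len(targets), len(signature), len(universe)
    )
    print(f"{name}: overlap={overlap}, p={p_values[name]:.4f}")
```

In this toy example TF_A shares four of its six targets with the signature and scores as enriched, while TF_B shares only one and does not. A real analysis would also correct for testing many regulons and would require consistency across the independent experiments.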
(3) Extension of the gene regulatory network analysis: heterogeneity of mechanism
The most important practical consequence of the polygenic model is that it implies a distribution of risk in the population. The analogy is to think of genetic variants as a hand of cards dealt out at conception. A woman might be at high risk of breast cancer because she has inherited, say, 100 out of 500 or more possible higher-risk variants. The question is, does the large number of different possible combinations of variants imply a similarly large number of different mechanisms, which would be difficult to unravel and difficult to target for prevention, or do these combinations all converge in the end on a small number of common mechanisms? If the latter were true, the problem would be far more tractable. To address this, we have taken two approaches:
- An extension of variant set enrichment analysis to show that the set of FGFR2-regulated genes is itself enriched among the eQTLs at the top 68 breast cancer GWAS loci: that is, among the genes whose expression is altered by the SNPs that lie in the same haplotype block as the ‘tagging SNP’ detected by GWAS. This result implies a degree of clustering of mechanism around the FGFR2 pathway among the top breast GWAS hits.
- A wider look at the distribution of the eQTLs related to the top GWAS loci across the regulons in the entire breast cancer regulatory network. Our preliminary results (Figure 2; unpublished) suggest a markedly non-random distribution, which implies that mechanisms of polygenic susceptibility may be less heterogeneous than we had feared.
Figure 2. Distribution of the eQTLs associated with the top 68 breast GWAS loci across the unfiltered breast cancer network. Circles represent regulons, named by their master regulator. Intensity of red colour indicates degree of enrichment of each regulon for GWAS eQTLs. The highly enriched regulons are concentrated around the ESR1/FOXA1/GATA3 group shown in Figure 1.
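The question behind both approaches, whether the genes linked to GWAS loci fall into a given gene set more often than chance would predict, can be sketched as a permutation test. This is a deliberately simplified illustration: it resamples genes uniformly from a background, whereas variant set enrichment as usually described draws matched random variant sets. All gene names, set sizes and the function name are hypothetical.

```python
import random


def permutation_enrichment(eqtl_genes, gene_set, universe,
                           n_permutations=2000, seed=0):
    """Estimate how often a random gene sample of the same size overlaps
    gene_set at least as much as the observed eQTL genes do."""
    rng = random.Random(seed)
    universe_list = sorted(universe)  # fixed order keeps sampling reproducible
    observed = len(eqtl_genes & gene_set)
    hits = 0
    for _ in range(n_permutations):
        null_sample = set(rng.sample(universe_list, len(eqtl_genes)))
        if len(null_sample & gene_set) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 200 background genes, 20 of them in the gene set.
universe = {f"gene{i}" for i in range(200)}
gene_set = {f"gene{i}" for i in range(20)}

# Hypothetical eQTL genes at GWAS loci: 6 of the 10 fall in the gene set.
eqtl_genes = {f"gene{i}" for i in range(6)} | {f"gene{i}" for i in range(100, 104)}

observed, p_value = permutation_enrichment(eqtl_genes, gene_set, universe)
print(f"observed overlap: {observed}, permutation p-value: {p_value:.4f}")
```

Running the same function once per regulon, with each regulon's targets as `gene_set`, would give one enrichment value per regulon, which is the kind of quantity a map such as Figure 2 could display.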
In the future, we will extend these results, and apply the same approaches to lung cancer. Here our underlying hypothesis is that smokers may be at differing risk of lung cancer, depending on their genetically influenced airway responses to cigarette smoke injury. We will use network-based comparison of gene expression patterns in smokers with and without lung cancer to search for such differences.