Regulatory systems biology
The ultimate aim of classical genetics and modern genomics – to understand the molecular details of how the genome is deployed transcriptionally to create a diversity of tissues and species – remains elusive. Such an understanding will have profound importance for cancer research, as a major hallmark of tumour progression is the occurrence of new genetic mutations and their resulting perturbation of gene expression programmes. Using liver and liver cancer as model systems, we research the regulatory evolution of transcription as well as how the molecular evolution of cancers depends on the starting regulatory landscape of parent cells.
The control and evolution of tissue-specific gene expression
Transcription factors, the proteins that control gene expression by binding DNA, act in a combinatorial manner in yeast and bacteria, and our earliest work showed that this combinatorial binding occurs in mammalian tissues as well. Master regulators in primary human hepatocytes form a highly interconnected core circuitry and frequently bind promoter regions in clusters, particularly at highly regulated and transcribed genes (Odom et al., Mol Syst Biol 2006; 2: 2006.0017). We have found that transcriptional regulation diverges very rapidly in mammals (Schmidt et al., Science 2010; 328: 1036; Odom et al., Nat Genet 2007; 39: 730). Despite this rapid divergence, we found specific genetic architectures that appear to preserve a small handful of transcription factor binding events across large evolutionary timescales (>300 million years) (Schmidt et al., Science 2010; 328: 1036). Even at short evolutionary distances, such as between closely related inbred strains of mice, transcription factor binding diverges surprisingly faster than the underlying genetic sequence (Figure 1) (Stefflova et al., Cell 2013; 154: 530).
Figure 1. Tissue-specific transcription factor binding evolves rapidly, even in closely related species.
In asking why most transcription factor binding events vary so rapidly, we realised that several factors could contribute. These include variation in genetic sequence, the types and number of marks on the histone proteins that package DNA (commonly thought of as an epigenetic code), and even dietary or environmental differences between species. To isolate a single one of these variables, we used a previously created mouse model of Down’s syndrome that carries a virtually complete copy of a human chromosome (O’Doherty et al., Science 2005; 309: 2033). By exploiting this aneuploid mouse strain, a unique and powerful genetic tool designed for an entirely different purpose, we determined that genetic sequence dominates other factors in directing transcription (Wilson et al., Science 2008; 322: 434). More recently, we have used this mouse to show that human-specific repetitive elements contain latent regulatory potential that is unmasked in a heterologous mouse environment (Ward et al., Mol Cell 2013; 49: 262).
The origin, regulation, and evolution of RNA transcription
We have been using similar comparative functional genomics approaches to look at the regions of the genome that are transcribed but do not code for proteins. The transcripts from these regions are known as non-coding RNAs, and range from well-characterised species like tRNAs and rRNAs to newer categories of regulatory nucleic acids like microRNAs, piRNAs, and endogenously expressed RNAi. We published results describing previously unseen functional conservation in tRNA gene transcription driven by RNA polymerase III, which only becomes apparent after analysis of data from multiple mammalian species (Kutter et al., Nat Genet 2011; 43: 948). We have also reported that the rapid evolution of long non-coding RNAs between closely related mammals can influence nearby gene expression (Kutter et al., PLoS Genet 2012; 8: e1002841). Finally, by using closely related, but still interbreeding, species of mice, we were able to dissect the relative cis and trans contributions to gene expression, discovering that in mammals, compensatory cis and trans effects appear to be the rule during evolution (Goncalves et al., Genome Res 2012; 22: 2376). This was an intriguing finding because studies in other systems to date, such as flies and yeast, have found much stronger trans contributions at close evolutionary distances.
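The cis and trans contributions described above can be illustrated with a small worked example. The sketch below uses a formulation common in F1 hybrid studies, not necessarily the exact method of the cited paper: the allelic expression ratio within the hybrid is taken as the cis effect, and the part of the parental expression difference it does not explain is taken as the trans effect. The function names, the example expression values, and the 0.25 log2 tolerance are all hypothetical choices made for illustration.

```python
import math


def cis_trans(parent_a, parent_b, f1_allele_a, f1_allele_b):
    """Split a parental expression difference into cis and trans parts.

    Assumed formulation (illustrative only):
      total = log2(parent_a / parent_b)        difference between parents
      cis   = log2(f1_allele_a / f1_allele_b)  allelic ratio in the F1 hybrid
      trans = total - cis                      remainder not explained by cis
    All inputs are positive expression values for a single gene.
    """
    total = math.log2(parent_a / parent_b)
    cis = math.log2(f1_allele_a / f1_allele_b)
    trans = total - cis
    return cis, trans


def classify(cis, trans, tol=0.25):
    """Label a gene by which effects exceed a hypothetical tolerance."""
    has_cis = abs(cis) > tol
    has_trans = abs(trans) > tol
    if not has_cis and not has_trans:
        return "conserved"
    if has_cis and not has_trans:
        return "cis only"
    if has_trans and not has_cis:
        return "trans only"
    # Both effects present: opposite signs cancel each other out.
    return "compensatory" if cis * trans < 0 else "reinforcing"


# Parents express the gene equally, but the hybrid shows a two-fold
# allelic imbalance: a cis effect masked by an opposing trans effect.
cis, trans = cis_trans(100, 100, 150, 75)
label = classify(cis, trans)
```

In this toy case the two parents look identical at the level of total expression, yet the hybrid reveals underlying divergence, which is the pattern the term "compensatory" refers to.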
The complex interplay of CTCF, cohesin, and repetitive sequences in the genome
The CTCF protein is a genomic anchor that appears to have roles in regulating mitosis and meiosis, and in insulating chromatin and gene expression across the genome (Merkenschlager and Odom, Cell 2013; 152: 1285). Many of its functions are mediated by the cohesin complex in mammalian cells. We have discovered how the cohesin complex can co-regulate gene expression with tissue-specific transcription factors in the absence of its canonical partner CTCF (Schmidt et al., Genome Res 2010; 20: 578). By creating large, high-resolution maps of cohesin and tissue-specific transcription factor binding in mouse liver cells, we revealed that cohesin appears to stabilise large complexes of proteins, thus reducing the motif quality required for transcription factor binding (Faure et al., Genome Res 2012; 22: 2163). We have also shown that most lineage-specific CTCF binding is not ‘born’ in the same way as binding by other tissue-specific transcription factors, but appears in the genome via carriage within repetitive elements that are active in a species-specific manner in mammals (Schmidt et al., Cell 2012; 148: 335) (Figure 2). Collectively, we found that these newborn CTCF binding events are as functionally active as ancient ones found in six or more mammals, and that these ancient binding events show fossilised remains of the prior repeat expansions that gave birth to them.
Figure 2. CTCF binding evolution across mammals reveals new mechanisms of genome evolution, driven by repetitive elements.