A Stark, MF Lin, P Kheradpour, JS Pedersen, L Parts, JW Carlson, MA Crosby, MD Rasmussen, S Roy, AN Deoras, JG Ruby, J Brennecke, Harvard FlyBase curators, Berkeley Drosophila Genome Project, E Hodges, AS Hinrichs, A Caspi, B Paten, S-W Park, MV Han, ML Maeder, BJ Polansky, BE Robson, S Aerts, J van Helden, B Hassan, DG Gilbert, DA Eastman, M Rice, M Weir, MW Hahn, Y Park, CN Dewey, L Pachter, WJ Kent, D Haussler, EC Lai, DP Bartel, GJ Hannon, TC Kaufman, MB Eisen, AG Clark, D Smith, SE Celniker, WM Gelbart, M Kellis
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
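As a minimal illustration of what an evolutionary signature can look like in practice, the sketch below scores one simple property of a pairwise alignment: whether insertions and deletions come in lengths that are multiples of three, which would leave a protein reading frame intact. This is an assumption chosen for illustration and not a description of the scoring used in this work; the function names `gap_runs` and `frame_preserving_fraction` and the toy sequences are hypothetical.

```python
import re


def gap_runs(aligned_seq):
    """Return the lengths of all runs of '-' in one aligned sequence."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_seq)]


def frame_preserving_fraction(aligned_ref, aligned_other):
    """Fraction of gap runs (in either sequence) whose length is a multiple of 3.

    Returns None when the alignment contains no gaps, since the
    fraction is undefined in that case.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    lengths = gap_runs(aligned_ref) + gap_runs(aligned_other)
    if not lengths:
        return None
    return sum(1 for n in lengths if n % 3 == 0) / len(lengths)


# Toy alignments (made-up sequences, not real data).
coding_like = ("ATGGCC---AAATTTGGG", "ATGGCCTTTAAA---GGG")   # gaps of length 3 and 3
noncoding_like = ("ATGGCC-AAATTTGGGC", "ATGG--TAAATTTGGGC")  # gaps of length 1 and 2

print(frame_preserving_fraction(*coding_like))
print(frame_preserving_fraction(*noncoding_like))
```

A region under protein-coding constraint would be expected to score near 1 on such a measure, whereas a region without that constraint has no reason to favour gap lengths divisible by three. A practical classifier would combine several such signals across many species rather than rely on a single pairwise statistic.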