S Tavaré, DC Adams, O Fedrigo, GJ Naylor
Journal name: 
Pac Symp Biocomput
We investigated whether evolutionary change in DNA sequence data was homogeneous across different classes of base pairs. DNA sequences for eight protein-coding mitochondrial genes were obtained for 38 vertebrate taxa from GenBank. Each nucleotide site in the alignment was classified according to a number of covariates, including its codon position, genetic code degeneracy, and hydrophobicity. The evolutionary transition matrix for each base was estimated by tracing implied character changes under parsimony on a known phylogenetic tree. Canonical variates analyses of the inferred transition matrices were performed for each gene to determine whether different classes of bases behaved similarly. We found five distinct clusters of transition matrices that could be roughly defined by combinations of codon position and degeneracy. This pattern was consistent among all genes. A stochastic model of rate variation based on the interaction of the covariates was developed to assess the statistical significance of the clusters. The five-group classification was found to explain significantly more sequence variation than did a codon-only classification, a codon degeneracy classification, or a codon and degeneracy classification. The same five-group classification was found for all genes tested, suggesting a common process underlying the molecular evolution of the mitochondrial genome. These results confirm that there are classes of base pairs that evolve differently, and suggest that incorporating covariate information may be useful in developing nucleotide substitution models that more accurately reflect evolutionary history.
Research group: 
Tavaré Group
E-pub date: 
01 Aug 2001