Authors:
RC Griffiths, S Tavaré
Journal name: 
Math Biosci
Citation info: 
127(1):77-98
Abstract: 
The infinitely-many-sites process is often used to model the sequence variability observed in samples of DNA sequences. Despite its popularity, the sampling theory of the process is rather poorly understood. We describe the tree structure underlying the model and show how this may be used to compute the probability of a sample of sequences. We show how to produce the unrooted genealogy from a set of sites in which the ancestral labeling is unknown and, from this, the corresponding rooted genealogies. We derive recursions for the probability of the configuration of sequences (equivalently, of trees) in both the rooted and unrooted cases. We give a computational method based on Monte Carlo recursion that provides approximations to sampling probabilities for samples of any size. Among several applications, this algorithm may be used to find maximum likelihood estimators of the substitution rate, both when the ancestral labeling of sites is known and when it is unknown.
DOI: 
http://doi.org/10.1016/0025-5564(94)00044-z
Research group: 
Tavaré Group
E-pub date: 
01 May 1995
Users with this publication listed: 
Simon Tavaré