Estimation of Reversible Substitution Matrices and Evolutionary Distance

William J. Bruno and Lars Arvestad

Abstract

We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences in the collection. Our algorithms are designed for fast execution, even on large data sets. In a test case on a primate pseudogene, the matrix we obtain resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than that obtained with a less general model.
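To make the term "reversible substitution matrix" concrete, here is a minimal sketch of what reversibility means for a 4x4 DNA rate matrix. It is not the estimation algorithm of the paper and not code from the distance program. The function names and all numerical values are assumptions chosen for illustration. The sketch uses the standard parameterization of a general reversible model, Q[i][j] = s[i][j] * pi[j] with symmetric exchangeabilities s and equilibrium frequencies pi, and checks the detailed-balance condition pi[i] * Q[i][j] = pi[j] * Q[j][i].

```python
# Illustrative sketch only: names and numbers are assumptions, not from the paper.

def reversible_rate_matrix(freqs, exch):
    """Build Q with Q[i][j] = exch[i][j] * freqs[j] for i != j.

    exch must be symmetric; each diagonal entry is set so its row sums to zero.
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums the off-diagonals
    return q

def is_reversible(q, freqs, tol=1e-12):
    """Check detailed balance: freqs[i] * Q[i][j] == freqs[j] * Q[j][i] for all i, j."""
    n = len(freqs)
    return all(
        abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )

def substitution_rate(q, freqs):
    """Expected number of substitutions per site per unit time at equilibrium."""
    return -sum(freqs[i] * q[i][i] for i in range(len(freqs)))

# Hypothetical values, bases ordered A, C, G, T: transitions (A<->G, C<->T)
# are given a higher exchangeability than transversions.
freqs = [0.3, 0.2, 0.2, 0.3]
exch = [
    [0.0, 1.0, 4.0, 1.0],
    [1.0, 0.0, 1.0, 4.0],
    [4.0, 1.0, 0.0, 1.0],
    [1.0, 4.0, 1.0, 0.0],
]
q = reversible_rate_matrix(freqs, exch)
print(is_reversible(q, freqs))  # True
```

For DNA, such a model has six exchangeabilities and three free base frequencies. If Q is rescaled so that substitution_rate(q, freqs) equals 1, elapsed time is measured directly in expected substitutions per site, which is the unit in which evolutionary distances are usually reported.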

The paper has been submitted (BibTeX citation). Contact the authors if you would like to be added to our mailing list.

Source code

The method is implemented in C as a program called distance. Both source code and binaries are available.

Manual

Instructions on how to use distance.

Test data

The psi-eta-globin pseudogenes used in the test case, together with their alignments.

Illustrations

Graphs and diagrams from the article as well as some related images.

Links

Related links.