Scientific Supercomputing at the NIH

PAML

Description

PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is developed and maintained by Ziheng Yang at University College London. (PAML website)

PAML is intended to be used interactively on Helix. To run a PAML job, log on to helix.nih.gov using ssh, and type the full pathname of any PAML program. The PAML programs are installed in /usr/local/paml/bin. If you plan to use PAML frequently, it may be convenient to add this directory to your default path, as follows:

setenv PATH /usr/local/paml/bin:$PATH  (csh or tcsh)
PATH=/usr/local/paml/bin:$PATH; export PATH (bash)
To add this directory to your path at login time, add the appropriate line above to your ~/.cshrc or ~/.bashrc file.
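
If the line is placed in ~/.bashrc, it runs again in every nested shell, so the directory can end up in PATH several times. A minimal sketch for bash users that avoids this is given below; the helper name prepend_path is hypothetical and not part of PAML or Helix, and csh/tcsh users would need a different construct.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH unchanged
        *) PATH="$1:$PATH" ;;       # otherwise put the directory first
    esac
}

prepend_path /usr/local/paml/bin
export PATH
```

Calling prepend_path a second time with the same directory leaves PATH unchanged.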

PAML programs are:

  • baseml: ML analysis of nucleotide sequences: estimation of tree topology, branch lengths, and substitution parameters under a variety of nucleotide substitution models (JC69, K80, F81, F84, HKY85, TN93, REV); constant or gamma rates for sites; molecular clock (rate constancy among lineages) or no clock, among-gene and within-gene variation of substitution rates; models for combined analyses of multiple sequence data sets; calculation of substitution rates at sites; reconstruction of ancestral nucleotides.
  • basemlg: ML analysis of nucleotide sequences under the model of gamma rates among sites. The (continuous) gamma model is used with one of the following substitution models: JC69, K80, F81, F84, HKY85, TN93, and REV.
  • codonml (codeml with seqtype = 1): ML analysis of protein-coding DNA sequences using codon substitution models (e.g., Goldman and Yang 1994); calculation of the codon-usage table; estimation of synonymous and nonsynonymous substitution rates; likelihood ratio test of positive selection or relaxed selective constraints along lineages based on the dN/dS rate ratios; identification of amino acid sites or evolutionary lineages potentially under positive selection; reconstruction of ancestral codon sequences.
  • aaml (codeml with seqtype = 2): ML analysis of amino acid sequences under a number of amino acid substitution models (Poisson, Proportional, empirical models such as those of Dayhoff et al., Jones et al., mtREV24, and mtmam, and REV); constant or gamma-distributed rates among sites; molecular clock (rate constancy among lineages) or no clock, among-gene and within-gene variation of substitution rates; models for combined analyses of multiple gene data; calculation of substitution rates at sites; reconstruction of ancestral amino acid sequences.
  • pamp: Parsimony-based analyses for a given tree topology, estimation of the substitution pattern by the method of Yang and Kumar (1996); estimation of the gamma parameter for variable rates among sites by the method of moments, the method of Sullivan et al. (1995), and the method of Yang and Kumar (1996); reconstruction of ancestral character states using the algorithm of Hartigan (1973) and an unpublished "improved parsimony" method.
  • mcmctree: Bayesian estimation of phylogenies using DNA sequence data (Rannala and Yang, 1996; Yang and Rannala, 1997). Markov chain Monte Carlo calculation of posterior probabilities of trees. The algorithm is too slow to be usable.
  • evolver: This program used to be named listtree and does miscellaneous things, such as listing all rooted and unrooted trees for a given number of species, generating random trees with branch lengths from a birth-death process with species sampling, and calculating tree bipartition distances. It now also simulates nucleotide, codon, or amino acid sequence data sets. Parameters for the simulation are specified in the files MCbase.dat, MCcodon.dat, and MCaa.dat. You can run the program to see the main menu, and then consult one of those files to see the details. This program can easily fill your hard disk.
  • yn00: This program implements the method of Yang and Nielsen (2000) for estimating synonymous and nonsynonymous substitution rates in pairwise comparison of protein-coding DNA sequences. The method of Nei and Gojobori (1986) is also included in the program. Run yn00 and have a look at the control file yn00.ctl and the default result file yn. No further documentation is included for this program.

PAML is not good for tree making. There are a few options for heuristic tree search, but they do not work well except for small data sets of only a few species. If you hope to use PAML to compare trees from relatively large data sets, one possibility is to get a collection of candidate trees and then compare them using more sophisticated models implemented in PAML. You can get candidate trees by using other programs/methods implemented in PAUP*, PHYLIP, MOLPHY etc.
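
The comparison of candidate trees described above might be organized as in the following sketch. It assumes that you have already run baseml or codeml once per candidate tree on the same data and model, that you saved the screen output of each run to its own log file (the names tree1.log, tree2.log are hypothetical), and that each log contains lines of the form "lnL = value" as in the sample session on this page. The exact output format may differ between PAML versions, so check a log by eye first.

```shell
# Print "lnL<TAB>logfile" for each log, largest (best) log-likelihood first.
rank_by_lnl() {
    for log in "$@"; do
        # Keep the value from the last "lnL = <value>" line in this log.
        lnl=$(awk '$1 == "lnL" && $2 == "=" { v = $3 }
                   END { if (v != "") print v }' "$log")
        if [ -n "$lnl" ]; then
            printf '%s\t%s\n' "$lnl" "$log"
        fi
    done | sort -rn
}
```

A typical call would be rank_by_lnl tree1.log tree2.log; logs without an lnL line are skipped silently.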

PAML may be useful if you are interested in the process of sequence evolution. The two main programs, baseml and codeml, implement a number of sophisticated models, which you can use to construct likelihood ratio tests of evolutionary hypotheses. Several of these options and models do not currently seem to be available in other packages.
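
For two nested models fitted to the same data, the likelihood ratio statistic is twice the difference between the log-likelihoods, 2 * (lnL_alternative - lnL_null), and is usually compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters (the 5% critical value for 1 degree of freedom is 3.84). The sketch below does only the arithmetic. The function name lrt_stat is hypothetical, the null value is made up for illustration, and the alternative value is simply the lnL printed in the sample session on this page.

```shell
# Likelihood ratio statistic: 2 * (lnL_alternative - lnL_null).
# Usage: lrt_stat <lnL_null> <lnL_alternative>
lrt_stat() {
    awk -v l0="$1" -v l1="$2" 'BEGIN { printf "%.6f\n", 2 * (l1 - l0) }'
}

# Illustrative values only: the null lnL is invented for this example.
lrt_stat -2625.000000 -2621.455491
```

PAML itself does not need to be installed for this step; the two lnL values are read from the output of two separate runs.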

Version

Type /usr/local/paml/bin/baseml on the command line. The program prints its version when it starts.

Sample session

% setenv PATH /usr/local/paml/bin:$PATH
% cp /usr/local/paml/baseml.ctl .
[...edit the baseml.ctl file to use your own files and set the desired parameters....]
% baseml
BASEML in paml 3.15, November 2005

Reading options from /usr/local/paml/baseml.ctl..
   6 verbose      | verbose       0.00
   7 runmode      | runmode       0.00
  15 model        | model         4.00
  11 Mgene        | Mgene         0.00
   9 clock        | clock         0.00
  16 fix_kappa    | fix_kappa     0.00
  17 kappa        | kappa         5.00
  18 fix_alpha    | fix_alpha     0.00
  19 alpha        | alpha         0.50
  20 Malpha       | Malpha        0.00
  21 ncatG        | ncatG         5.00
  24 nparK        | nparK         0.00
  12 nhomo        | nhomo         0.00
  13 getSE        | getSE         0.00
  14 RateAncestor | RateAncestor  1.00
  27 Small_Diff   | Small_Diff    0.00
   5 cleandata    | cleandata     1.00
   8 method       | method        0.00

Ambiguity character definition table:
[...]
ns = 5  ls = 895
Reading sequences, sequential format..
Reading seq # 1: Human
Reading seq # 2: Chimpanzee
Reading seq # 3: Gorilla
Reading seq # 4: Orangutan
Reading seq # 5: Gibbon
Sequences read..
Counting site patterns.. 0:00
85 site patterns at 895 sites, 0:00
Counting frequencies..
120 bytes for distance
0 bytes for conP0
10880 bytes for conP1
3400 bytes for fhK
8000000 bytes for space
[...]
Out...
lnL = -2621.455491
convergence?
Estimated rates for sites go into file rates
lnL = -2621.455491
Reconstructed ancestral states go into file rst.
Rates are variable among sites, marginal reconstructions only.
Marginal reconstruction.
Node 6: lnL = -2621.455491
Node 7: lnL = -2621.455491
Node 8: lnL = -2621.455491
Time used: 0:00
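
As an alternative to copying and editing the sample control file by hand, the editing step could be scripted roughly as follows. This is a sketch under stated assumptions, not the site's recommended procedure: the option values are the ones echoed in the session above (nparK and Small_Diff are left out, so the program's defaults would apply), the file names mydata.nuc, mydata.trees and mlb are placeholders, and it assumes that baseml reads baseml.ctl from the current directory when started without arguments. Note that it overwrites any existing baseml.ctl in the current directory.

```shell
# Write a minimal baseml.ctl in the current directory.
# File names are placeholders; replace them with your own files.
cat > baseml.ctl <<'EOF'
      seqfile = mydata.nuc     * sequence data file (placeholder name)
     treefile = mydata.trees   * tree file (placeholder name)
      outfile = mlb            * main result file (placeholder name)

      verbose = 0
      runmode = 0
        model = 4
        Mgene = 0
        clock = 0
    fix_kappa = 0
        kappa = 5
    fix_alpha = 0
        alpha = 0.5
       Malpha = 0
        ncatG = 5
        nhomo = 0
        getSE = 0
 RateAncestor = 1
    cleandata = 1
       method = 0
EOF

# Run baseml only if it is on the PATH, keeping a copy of the screen output.
if command -v baseml >/dev/null 2>&1; then
    baseml | tee baseml.log
fi
```

Saving the screen output to baseml.log keeps the lnL lines available for later comparison between runs.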

Documentation

A set of sample control files is available in /usr/local/paml
More examples in /usr/local/paml/examples
PAML website