HIV sequence database



More information about PhyML settings

Maximum file sizes by number of bootstraps:

Bootstraps   Max. file size
0            2M
100          20K
1000         2K

From the PhyML manual:

  • Substitution model
    A nucleotide or amino-acid substitution model. For DNA sequences, the default choice is HKY85 (Hasegawa et al., 1985). This model is analogous to K80 (Kimura, 1980), but allows for different base frequencies. The other models are JC69 (Jukes and Cantor, 1969), K80 (Kimura, 1980), F81 (Felsenstein, 1981), F84 (Felsenstein, 1989), TN93 (Tamura and Nei, 1993) and GTR (e.g., Lanave et al. 1984, Tavaré 1986, Rodriguez et al. 1990). The rate matrices of these models are given in Swofford et al. (1996). For amino-acid sequences, the default choice is JTT (Jones, Taylor and Thornton, 1992). The other models are Dayhoff (Dayhoff et al., 1978), mtREV (as implemented in Yang's PAML), WAG (Whelan and Goldman, 2001), DCMut (Kosiol and Goldman, 2005), RtREV (Dimmic et al.), CpREV (Adachi et al., 2000), VT (Muller and Vingron, 2000), Blosum62 (Henikoff and Henikoff, 1992) and MtMam (Cao, 1998).
  • Transition / transversion ratio
    With DNA sequences, the transition/transversion ratio can be set to a fixed value (except under the JC69 and F81 models) or estimated by maximizing the likelihood of the phylogeny. The latter makes the program slower. The default value is 4.0. The definition of the transition/transversion ratio is the same as in PAML (Yang, 1994). In PHYLIP, the "transition/transversion rate ratio" is used instead. A value of 4.0 in PHYML roughly corresponds to 2.0 in PHYLIP.
  • Proportion of invariable sites
    By default, the data set is assumed to contain no invariable sites (proportion 0.0). However, this proportion can be set to any value in the 0.0-1.0 range. This parameter can also be estimated by maximizing the likelihood of the phylogeny. The latter makes the program slower.
  • Number of substitution rate categories
    By default, all sites evolve at the same rate, so there is a single substitution rate category. A discrete gamma distribution can be used to account for variable substitution rates among sites, in which case the user supplies the number of categories that defines this distribution. The higher this number, the better the discrete distribution fits the continuous one. When a gamma distribution is used, the default is four categories. In that case the likelihood of the phylogeny at one site is averaged over four conditional likelihoods corresponding to four rates, and computing the likelihood is four times slower than with a single rate. Fewer than four or more than eight categories are not recommended. In the first case, the discrete distribution is a poor approximation of the continuous one. In the second case, the computational burden becomes high, and a higher number of categories is unlikely to improve the accuracy of phylogeny estimation.
  • Gamma distribution parameter
    The shape of the gamma distribution is defined by this numerical parameter. The higher its value, the lower the variation of substitution rates among sites (this option is used only with more than one substitution rate category). The default value is 1.0, which corresponds to moderate variation. Values below about 0.7 correspond to high variation, values between 0.7 and 1.5 to moderate variation, and higher values to low variation. This value can be fixed by the user or estimated by maximizing the likelihood of the phylogeny.
  • Starting tree(s)
    Used as the starting tree(s) to be refined by the maximum likelihood algorithm. The default is to use a BIONJ distance-based tree. It is also possible to supply one or several trees in a file, one per line, written in the standard parenthesis representation (NEWICK format). Branch lengths must be given, and the tree(s) must be unrooted. Labels on branches (such as bootstrap proportions) are supported. For example, a tree with four taxa named A, B, C, and D, with a bootstrap value of 90 on its internal branch, should look like this:
    (A:0.02,B:0.004,(C:0.1,D:0.04)90:0.05);
    If you supply several trees and analyse several data sets, the number of trees must match the number of data sets.
  • Optimise starting tree(s) options
    You can optimise the starting tree(s) in three ways:
    o You can optimise the topology, the branch lengths and rate parameters (transition/transversion ratio, proportion of invariant sites, gamma distribution parameter),
    o You can keep the topology and optimise the branch lengths and rate parameters (it is not possible to optimise the tree topology and keep the branch lengths and rate parameters),
    o You can ask for no optimisation, in which case PHYML just computes the likelihood of the starting tree(s).
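
The correspondence between a transition/transversion ratio of 4.0 and a value of 2.0 in the other convention, mentioned under "Transition / transversion ratio", can be illustrated with a small sketch. It assumes that the first number is the rate parameter (often written kappa) of the K80/HKY85 models and the second is the expected ratio of transition to transversion events. The function name and the frequency ordering are hypothetical and not part of either program.

```python
def expected_ts_tv_ratio(kappa, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Expected transitions/transversions under HKY85 (illustrative sketch).

    kappa -- transition/transversion rate parameter (assumed to be the
             quantity the settings page calls the ratio)
    freqs -- base frequencies in the order (A, C, G, T); this ordering is
             an assumption of the sketch
    """
    pi_a, pi_c, pi_g, pi_t = freqs
    purines = pi_a + pi_g        # A and G
    pyrimidines = pi_c + pi_t    # C and T
    # Transitions occur within purines (A<->G) or within pyrimidines (C<->T);
    # transversions occur between the two groups.
    return kappa * (pi_a * pi_g + pi_c * pi_t) / (purines * pyrimidines)


# With equal base frequencies (the K80 case) the expected ratio is kappa / 2.
ratio = expected_ts_tv_ratio(4.0)
```

With equal base frequencies the result is half of kappa, which matches the 4.0 versus 2.0 correspondence. With unequal frequencies the factor differs, which may be why the correspondence is described as rough.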
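
To make the interplay between the number of rate categories and the gamma shape parameter more concrete, here is a minimal sketch that computes relative rates for a discrete gamma distribution. It assumes equal-probability categories, each represented by its mean rate, with the overall mean rate fixed at 1. This is a common convention and may differ from what the program does internally. All function names are hypothetical, and the incomplete gamma function is evaluated with a simple series so that only the standard library is needed.

```python
import math


def reg_lower_gamma(a, x):
    """Regularised lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(p, alpha):
    """Quantile of a gamma distribution with shape alpha and mean 1 (bisection)."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean relative rate of each of k equal-probability categories."""
    cuts = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)] + [math.inf]
    rates = []
    for lower, upper in zip(cuts[:-1], cuts[1:]):
        # The partial mean of a gamma(alpha) variable over an interval is
        # expressed through the incomplete gamma function with shape alpha + 1.
        upper_mass = 1.0 if math.isinf(upper) else reg_lower_gamma(alpha + 1.0, alpha * upper)
        lower_mass = reg_lower_gamma(alpha + 1.0, alpha * lower)
        rates.append(k * (upper_mass - lower_mass))
    return rates


# Default-like settings: shape 1.0 and four categories.
rates = discrete_gamma_rates(1.0, 4)
```

For a shape of 1.0 and four categories, the rates come out at roughly 0.14, 0.48, 1.00 and 2.39. Lowering the shape spreads the rates further apart, and raising it pulls them towards 1, in line with the description of high and low variation above.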
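
The requirements on user-supplied starting trees (NEWICK format, branch lengths given, unrooted) can be checked before submitting a file. The sketch below is a minimal, hypothetical checker, not part of PHYML. It does not handle quoted labels or comments, and it assumes the usual convention that an unrooted tree is written with at least three subtrees at the top level.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts (minimal sketch, no quoted labels)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("tree must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        label = text[start:pos]  # taxon name, or a branch label such as 90
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"label": label, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def check_starting_tree(text):
    """Return a list of problems found in one Newick tree (empty if none)."""
    root = parse_newick(text)
    problems = []
    if len(root["children"]) < 3:
        problems.append("fewer than three top-level subtrees, so the tree looks rooted")
    stack = list(root["children"])
    while stack:
        node = stack.pop()
        if node["length"] is None:
            problems.append(f"missing branch length near '{node['label']}'")
        stack.extend(node["children"])
    return problems


# The example tree from the "Starting tree(s)" description passes both checks.
example_problems = check_starting_tree("(A:0.02,B:0.004,(C:0.1,D:0.04)90:0.05);")
```

For a file with one tree per line, the same function could be applied to each non-empty line, and the number of lines compared with the number of data sets.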
    Last modified: Fri May 9 10:41 2008


    Questions or comments? Contact us at seq-info@lanl.gov.

     