Input data
- Analysis type
- Multiple sequence alignment
- Evolutionary model
- Starting tree(s)
- Topological constraint
- State encoding & order
RAxML-NG supports several types of analysis, which can be selected by specifying a corresponding command:
Command | RAxML 8.x equivalent | Meaning |
---|---|---|
--search | -f d | Run topology search to find the best-scoring ML tree (default) |
--evaluate | -f e | Optimize model parameters and/or branch lengths on a fixed tree topology |
--loglh | N/A | Compute log-likelihood of a given tree without any optimization |
--bootstrap | -b | Run non-parametric bootstrap analysis (equivalent to 'slow' bootstrapping in RAxML). Number of bootstrap replicates and other parameters can be changed with respective options. |
--all | -f a * | Combined tree search and bootstrapping analysis; bootstrap support values will be plotted onto the best-scoring ML tree. |
--support | -f b | Compute bipartition support for a given reference tree (e.g., best ML tree) using an existing set of replicate trees (e.g., bootstrap trees obtained with --bootstrap option above). Usage: raxml-ng --support --tree bestML.tree --bs-trees bootstraps.tree |
--bsconverge | -I | A posteriori bootstrap convergence test. Usage: raxml-ng --bsconverge --bs-trees bootstraps.tree --bs-cutoff 0.03 |
--check | -f c | Check alignment file and remove any columns consisting entirely of gaps |
--parse | N/A | Parse alignment, compress patterns and create binary MSA file |
--start | -y | Generate parsimony/random starting trees and exit |
--terrace | N/A | Check whether a tree lies on a phylogenetic terrace. Usage: raxml-ng --terrace --tree best.tre --msa ali.fa --model partition.txt |
* Unlike in RAxML 8.x, this command performs the 'slow' bootstrapping procedure.
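As a rough illustration of how the commands in the table combine with the options described on this page, the sketch below assembles three command lines from Python. The helper `build_command` and the file names (`ali.fa`, `bestML.tree`, `bootstraps.tree`) are placeholders invented for this example; only flags that appear on this page are used, and nothing is assumed about the names of the files RAxML-NG itself writes.

```python
# Analysis commands listed in the table above.
COMMANDS = {
    "search", "evaluate", "loglh", "bootstrap", "all", "support",
    "bsconverge", "check", "parse", "start", "terrace",
}


def build_command(command, **options):
    """Return an argument list such as ['raxml-ng', '--search', '--msa', 'ali.fa']."""
    if command not in COMMANDS:
        raise ValueError(f"unknown analysis command: {command}")
    args = ["raxml-ng", f"--{command}"]
    for name, value in options.items():
        # A Python keyword like bs_trees becomes the flag --bs-trees.
        args += [f"--{name.replace('_', '-')}", str(value)]
    return args


# Placeholder inputs: ali.fa, bestML.tree and bootstraps.tree are hypothetical.
search = build_command("search", msa="ali.fa", model="GTR+G")
bootstrap = build_command("bootstrap", msa="ali.fa", model="GTR+G")
support = build_command("support", tree="bestML.tree", bs_trees="bootstraps.tree")

for cmd in (search, bootstrap, support):
    print(" ".join(cmd))
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where `raxml-ng` is installed.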
Option: --msa FILE (mandatory)
RAxML-NG supports alignments in FASTA, non-interleaved PHYLIP and CATG formats.
By default, RAxML-NG will try to automatically detect the alignment format based on the file contents. Usually this works just fine, but you can also specify the alignment format explicitly with the --msa-format option.
Option: --model STRING | FILE (mandatory)
The evolutionary model can be specified globally (i.e., for the whole alignment), or different models can be assigned to different subsets of alignment columns (a so-called partitioned analysis).
A global, per-alignment evolutionary model can be given as a string on the command line.
A model specification always starts with a substitution matrix name, e.g., GTR for DNA data or LG for protein data.
Several optional modifiers can be added, separated by + and in arbitrary order. This notation is inspired by, and mostly compatible with, the model specification in the IQ-Tree program (Nguyen et al. 2015).
NOTE: all per-state values (e.g. base frequencies) must be given in the order listed in the state encoding & order table at the end of this page.
All substitution matrices and modifiers are summarized in the following table:
Modifier | Possible values |
---|---|
Substitution matrix | DNA data: JC, K80, F81, HKY, TN93ef, TN93, K81, K81uf, TPM2, TPM2uf, TPM3, TPM3uf, TIM1, TIM1uf, TIM2, TIM2uf, TIM3, TIM3uf, TVMef, TVM, SYM, GTR; Protein data*: Dayhoff, LG, DCMut, JTT, mtREV, WAG, RtREV, CpREV, VT, Blosum62, MtMam, MtArt, MtZoa, PMB, HIVb, HIVw, JTT-DCMut, FLU, StmtREV, LG4M (implies +G4), LG4X (implies +R4), PROTGTR; Binary data (0/1): BIN; Morphological/multistate: MULTIx_MK, MULTIx_GTR (where x = number of states, e.g., MULTI8_MK for an 8-state model with equal rates; see state encoding & order); Unphased diploid genotypes (10 states): GTJC, GTHKY4, GTGTR4, GTGTR; Fixed user-defined rates: e.g., HKY{1.0/2.5} or GTR{0.5/2.0/1.0/1.2/0.1/1.0} |
Stationary frequencies | +F or +FC (empirical); +FO (ML estimate); +FE (equal); +FU{f1/f2/../fn} (user-defined: f1 f2 ... fn) |
Proportion of invariant sites | +I or +IO (ML estimate); +IC (empirical); +IU{p} (user-defined: p) |
Among-site rate heterogeneity model | +G (discrete GAMMA with 4 categories, mean category rates, ML estimate of alpha); +GA (as above, but with median category rates); +Gn (discrete GAMMA with n categories, ML estimate of alpha); +Gn{a} (discrete GAMMA with n categories and user-defined alpha a); +Rn (FreeRate with n categories, ML estimate of rates and weights); +Rn{r1/r2/../rn}{w1/w2/../wn} (FreeRate with n categories, user-defined rates r1 r2 ... rn and weights w1 w2 ... wn) |
Ascertainment bias correction | +ASC_LEWIS (Lewis' method); +ASC_FELS{w} (Felsenstein's method with total number of invariable sites w); +ASC_STAM{w1/w2/../wn} (Stamatakis' method with per-state invariable site numbers w1 w2 ... wn) |
* see libpll wiki for details & references
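To make the structure of a model string concrete, here is a minimal sketch that splits a specification into its substitution matrix and its list of modifiers. The function `split_model` is a hypothetical helper written for this page, not part of RAxML-NG, and it does not check names against the table above; it only separates on `+` signs that lie outside curly brackets.

```python
def split_model(spec):
    """Split 'GTR+G4{0.5}+FO' into ('GTR', ['G4{0.5}', 'FO'])."""
    parts, current, depth = [], "", 0
    for ch in spec:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "+" and depth == 0:
            # A '+' outside curly brackets separates two components.
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    matrix, modifiers = parts[0], parts[1:]
    if not matrix:
        raise ValueError("a model must start with a substitution matrix name")
    return matrix, modifiers


print(split_model("GTR+G4{0.5}+FO"))
print(split_model("HKY{1.0/2.5}+I"))
print(split_model("LG"))
```

Because modifiers may appear in arbitrary order, code that compares two specifications would likely want to compare the modifier lists as sets rather than as sequences.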
Multiple models can be defined in a RAxML-style partition file. Example:
JC+G, p1 = 1-100, 252-400
HKY+F, p2 = 101-180, 251
GTR+I, p3 = 181-250
Here, each line defines a partition and consists of three elements:
- model specification (see above)
- partition name
- range of alignment columns
NOTE: In RAxML, certain model modifiers were global (e.g., GAMMA model of rate heterogeneity), and thus they were specified on the command line and not in the partition file. In RAxML-NG, this limitation was lifted, i.e. it is now possible to combine partitions with and without GAMMA, proportion of invariant sites etc. (as in the example above).
However, this means that RAxML partition files might need to be adjusted for RAxML-NG (e.g., by adding +G for the partitions where the GAMMA model of rate heterogeneity should be used).
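The three elements of a partition line can be pulled apart with a few string operations. The sketch below parses the example partition file shown above and checks that its column ranges cover columns 1 to 400 exactly once. The helper `parse_partition_line` is hypothetical, treats ranges as 1-based and inclusive, and handles only the plain `start-end` and single-column forms used in the example.

```python
def parse_partition_line(line):
    """Parse 'JC+G, p1 = 1-100, 252-400' into (model, name, [(start, end), ...])."""
    model, rest = line.split(",", 1)        # model comes before the first comma
    name, columns = rest.split("=", 1)      # partition name comes before '='
    ranges = []
    for chunk in columns.split(","):
        start, _, end = chunk.strip().partition("-")
        # A single column such as '251' becomes the range (251, 251).
        ranges.append((int(start), int(end or start)))
    return model.strip(), name.strip(), ranges


PARTITION_FILE = """\
JC+G, p1 = 1-100, 252-400
HKY+F, p2 = 101-180, 251
GTR+I, p3 = 181-250
"""

partitions = [parse_partition_line(line)
              for line in PARTITION_FILE.splitlines() if line.strip()]
for model, name, ranges in partitions:
    print(name, model, ranges)

# Every alignment column should belong to exactly one partition.
covered = sorted(col for _, _, ranges in partitions
                 for start, end in ranges
                 for col in range(start, end + 1))
print("columns covered once:", covered == list(range(1, 401)))
```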
In case of partitioned analysis, three branch length estimation modes are available:
Command | Meaning |
---|---|
--brlen linked | Branch lengths are identical for all partitions (default) |
--brlen scaled | Joint branch length estimation with individual per-partition scalers (i.e., branch lengths are proportional) |
--brlen unlinked | Branch lengths are estimated independently for each partition (cf. RAxML -M option) |
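One way to read the three modes is sketched below. The notation is introduced here for illustration only and is not taken from the RAxML-NG documentation: $b_i$ is the jointly estimated length of branch $i$, $b_i^{(p)}$ is the length of that branch as seen by partition $p$, and $s_p$ is the scaler of partition $p$.

```latex
\begin{aligned}
\text{linked:}   \quad & b_i^{(p)} = b_i \\
\text{scaled:}   \quad & b_i^{(p)} = s_p \, b_i \\
\text{unlinked:} \quad & b_i^{(p)} \ \text{is a free parameter for every branch } i \text{ and partition } p
\end{aligned}
```

Under this reading, the scaled mode adds one parameter per partition on top of the shared branch lengths, whereas the unlinked mode estimates a full set of branch lengths for each partition.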
Option: --tree rand{N} | pars{N} | FILE
RAxML-NG supports three types of starting trees:
- rand(om): start from a random topology
- pars(imony): start from a tree generated by the parsimony-based randomized stepwise addition algorithm
- user-defined: load a custom starting tree from the NEWICK file
For random and parsimony, you can specify the number of trees to generate in curly brackets (e.g., pars{10} or rand{20}). In this case, RAxML-NG will perform multiple tree searches (one per starting tree) and pick the best-scoring topology as the final ML tree. You can also combine both parsimony and random starting trees in one run, e.g. --tree pars{10},rand{10}.
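The value passed to --tree can be read as a comma-separated list of items. The following sketch parses such a value into (type, count) pairs. The function `parse_tree_spec` is a hypothetical helper, not RAxML-NG code; it treats anything that is not `rand` or `pars` as a file name, and it leaves the count as `None` when no `{N}` is given rather than guessing a default.

```python
import re

# Matches 'rand', 'pars', 'rand{20}', 'pars{10}', ...
_TREE_ITEM = re.compile(r"^(rand|pars)(?:\{(\d+)\})?$")


def parse_tree_spec(spec):
    """Parse 'pars{10},rand{10}' into [('pars', 10), ('rand', 10)]."""
    items = []
    for item in spec.split(","):
        item = item.strip()
        match = _TREE_ITEM.match(item)
        if match:
            kind, count = match.group(1), match.group(2)
            # No {N} given: keep None and leave the default to RAxML-NG.
            items.append((kind, int(count) if count else None))
        else:
            # Anything else is assumed to be the path of a NEWICK file.
            items.append(("file", item))
    return items


spec = parse_tree_spec("pars{10},rand{10}")
print(spec)
print("tree searches:", sum(count for _, count in spec))
```

For `pars{10},rand{10}` the sum of the counts is 20, which matches the one-search-per-starting-tree behaviour described above.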
Default number of starting trees depends on RAxML-NG version and command:
RAxML-NG v0.7.0b
Command | Default starting trees |
---|---|
--search | 1 random |
--all | 10 random + 10 parsimony |
RAxML-NG v0.7.0git >= 13.11.2018
Command | Default starting trees |
---|---|
--search | 10 random + 10 parsimony |
--search1 | 1 random |
--all | 10 random + 10 parsimony |
Option: --constraint-tree FILE
You can specify a constraint tree to e.g. enforce monophyly of certain groups (equivalent to the -g option in RAxML 8.x). If the constraint tree is comprehensive (i.e., it includes all taxa found in the MSA), then RAxML-NG will simply resolve polytomies in the way that maximizes the likelihood. Conversely, if some taxa are missing from the constraint, they will be placed freely in the resulting ML tree.
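Whether a constraint tree is comprehensive can be checked before running the analysis by comparing its leaf labels with the taxon names in the alignment. The sketch below does this for simple NEWICK strings. Both helpers are hypothetical, the taxon names are made up, and quoted labels or NEWICK comments are not handled.

```python
import re


def newick_leaves(newick):
    """Return the set of leaf labels in a simple (unquoted) NEWICK string."""
    # A leaf label directly follows '(' or ','; labels after ')' belong to inner nodes.
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def constraint_coverage(newick, msa_taxa):
    """Compare the taxa of a constraint tree with the taxa of an alignment."""
    leaves = newick_leaves(newick)
    msa_taxa = set(msa_taxa)
    free = msa_taxa - leaves       # missing from the constraint: placed freely
    unknown = leaves - msa_taxa    # in the constraint but absent from the MSA
    return {"comprehensive": not free, "free": free, "unknown": unknown}


# Hypothetical example: the constraint groups (A,B,C) and leaves out taxon F.
constraint = "((A,B,C),D,E);"
report = constraint_coverage(constraint, ["A", "B", "C", "D", "E", "F"])
print(report)
```

In this made-up example the constraint is not comprehensive, so taxon `F` would be placed freely, while the polytomy `(A,B,C)` would be resolved during the search.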
Data type | Order |
---|---|
DNA | A C G T |
PROTEIN | A R N D C Q E G H I L K M F P S T W Y V |
MULTISTATE | 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ! " # $ % & ' ( ) * + , / : ; < = > @ [ \ ] ^ _ { | } ~ |
GENOTYPE (diploid unphased) | A C G T M R W S Y K (Meaning: A/A C/C G/G T/T A/C A/G A/T C/G C/T G/T) |
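The state order matters whenever per-state values are written into a model string, for example with +FU. The sketch below builds such a string from a mapping of states to frequencies, emitting the values in the order given in the table. The helper `user_frequencies` is hypothetical, and the check that the frequencies sum to 1 is a sanity check of this sketch, not a documented RAxML-NG requirement.

```python
# State orders copied from the table above.
STATE_ORDER = {
    "DNA": "A C G T".split(),
    "PROTEIN": "A R N D C Q E G H I L K M F P S T W Y V".split(),
}


def user_frequencies(matrix, data_type, freqs):
    """Build e.g. 'GTR+FU{0.3/0.2/0.2/0.3}' from a {state: frequency} mapping."""
    order = STATE_ORDER[data_type]
    if set(freqs) != set(order):
        raise ValueError(f"expected exactly these states: {' '.join(order)}")
    if abs(sum(freqs.values()) - 1.0) > 1e-6:
        raise ValueError("frequencies should sum to 1")
    # Emit the values in table order, regardless of the order in the mapping.
    values = "/".join(str(freqs[state]) for state in order)
    return f"{matrix}+FU{{{values}}}"


model = user_frequencies("GTR", "DNA", {"T": 0.3, "G": 0.2, "C": 0.2, "A": 0.3})
print(model)
```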