Program list
Here is a list of the current programs in phyx. If you have suggestions for new programs, please let us know by submitting an issue.
By default, all programs are compiled with `make`. An individual program can be compiled with the command `make <name>`.
As with typical Unix programs, help is displayed with the `-h` flag, which prints a menu of the options and their types. Program version information can be obtained with the `-V` flag (note the uppercase; lowercase `-v` is typically reserved for more verbose output).
A few notes on default behaviour. First, as with standard Unix programs, output files are overwritten without warning, so choose output names with care. Second, most programs that produce output write it in a default format: newick for trees and fasta for sequences. However, this output can always be piped to a subsequent program to change the format (see below).
To process a large number of files at once, a simple command-line `for` loop will do. For example, if you have a few thousand fasta files from which you would like to remove all ambiguous data, you can run a line such as this:
for x in *.fa; do ./pxclsq -s "$x" -o "$x-cln" -p 1.0; done
The output of one program can also be piped directly into the input of another. For example, you can use an amino acid alignment to guide the alignment of the corresponding nucleotide sequences, then clean the resulting file, then build a quick neighbor-joining tree:
./pxaa2cdn -a amino_acid_alignment -n nucleotide_sequences | ./pxclsq -p 1.0 | ./pxnj -o output_tree_file
- pxbdfit: diversification model inference
Fits a diversification model to an ultrametric tree. The model is controlled by the `-m` flag, which takes the options `bd` (default), `yule`, or `best` (the optimal model as determined by AIC). Returns the model parameters (b, d, r, e), the likelihood, the AIC, and tree statistics.
pxbdfit -t bd.tre -m yule
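Choosing the `best` model by AIC comes down to comparing AIC = 2k - 2 ln L across the fitted models, where k is the number of free parameters. The Python sketch below shows only that comparison; the parameter counts (one rate for Yule, two for birth-death) and the log-likelihood values are illustrative assumptions, not pxbdfit output.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion; lower values indicate a better model."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: dict[str, tuple[float, int]]) -> str:
    """Return the model name with the lowest AIC.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: Yule has one free rate (b); birth-death has two (b, d).
fits = {"yule": (-120.4, 1), "bd": (-120.1, 2)}
for model, (lnl, k) in fits.items():
    print(model, aic(lnl, k))
print("best:", best_model(fits))
```

Here the extra death-rate parameter does not improve the likelihood enough to offset its penalty, so the simpler model wins.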
- pxbdsim: a birth death simulator
This program is a birth-death simulator. The user can either specify the number of extant species (`-e`) or run the simulation for a given amount of time (`-t`). The simulator can also include the taxa that have gone extinct (via the `-s` flag).
pxbdsim -e 100 -s -b 1 -d 0.5 -o output_tree_file
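To illustrate the process being simulated, here is a minimal Python sketch of a constant-rate birth-death process run forward in time with the Gillespie algorithm. It is an assumption-laden simplification, not pxbdsim's algorithm: it tracks only lineage counts (no topology or branch lengths) and simply stops the first time the requested number of extant lineages is reached; pxbdsim may condition its simulations differently.

```python
import random


def simulate_bd_counts(birth: float, death: float, n_extant: int, seed: int = 1):
    """Forward-time constant-rate birth-death process (Gillespie algorithm).

    Starts from one lineage and runs until `n_extant` lineages are alive or
    every lineage has gone extinct. Returns (elapsed_time, alive, extinct).
    """
    rng = random.Random(seed)
    t, alive, extinct = 0.0, 1, 0
    while 0 < alive < n_extant:
        # Each living lineage speciates at rate `birth` and dies at rate `death`.
        t += rng.expovariate(alive * (birth + death))
        if rng.random() < birth / (birth + death):
            alive += 1      # speciation event
        else:
            alive -= 1      # extinction event
            extinct += 1
    return t, alive, extinct


print(simulate_bd_counts(birth=1.0, death=0.5, n_extant=100))
```

A run that returns zero living lineages went extinct before reaching the target; a real simulator would typically discard such runs and try again.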
- pxbp: prints out bipartitions that make up the tree
This program takes in a tree file and prints out all the bipartitions that compose the tree.
pxbp -t Tree.tre -o bp_output
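Each internal branch of a tree splits the tips into two sets, and that split is a bipartition. The Python sketch below shows the idea on a tree written as nested tuples (an assumed toy representation rather than a parsed newick file). It skips trivial splits (one tip versus the rest) and stores each split as an unordered pair of tip sets; pxbp's own output format and handling of trivial splits may differ.

```python
from typing import Union

Tree = Union[str, tuple]  # a tip name, or a tuple of child subtrees


def tips(tree: Tree) -> frozenset:
    """All tip names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def bipartitions(tree: Tree) -> set:
    """Non-trivial splits of the tree, each stored as {side_A, side_B}."""
    all_tips = tips(tree)
    splits = set()

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        side = tips(node)
        other = all_tips - side
        if len(side) > 1 and len(other) > 1:
            splits.add(frozenset([side, other]))  # A|B and B|A are the same split
        for child in node:
            walk(child)

    walk(tree)
    return splits


tree = ((("A", "B"), "C"), ("D", "E"))
for split in bipartitions(tree):
    a, b = sorted(sorted(side) for side in split)
    print("".join(a), "|", "".join(b))
```

Both sides of the root in this example, {A,B,C} and {D,E}, describe the same split, so it is stored only once.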
- pxbpsq: prints out bipartitions from a sequence file
- pxboot: sequence alignment resampling (bootstrap or jackknife)
This program resamples an alignment with either a bootstrap or a jackknife. A bootstrap needs no amount to be specified, but for a jackknife the jackknife fraction must be specified with `-f` (e.g. `-f 0.50`).
Example jackknife:
pxboot -s Alignment -x 112233 -f 0.50 -o output_of_50_jackknife
Example bootstrap:
pxboot -s Alignment -p parts -o output_of_bootstrap
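The two resampling schemes differ in how alignment columns are drawn. The Python sketch below is illustrative only and makes several assumptions: the alignment is a dict of equal-length strings, the jackknife fraction is read as the proportion of columns kept, and the seed is arbitrary. It ignores partitions, which pxboot can take with `-p`.

```python
import random


def resample_columns(alignment, jackknife_frac=None, seed=1):
    """Resample alignment columns.

    Bootstrap (jackknife_frac is None): draw as many columns as the original,
    with replacement. Jackknife: keep a random `jackknife_frac` of the
    columns, without replacement, in their original order.
    """
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    if jackknife_frac is None:
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    else:
        k = round(n_sites * jackknife_frac)
        cols = sorted(rng.sample(range(n_sites), k))
    # The same column indices are applied to every taxon so columns stay intact.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


aln = {"t1": "ACGTACGTAC", "t2": "ACGTTCGTAA"}
print(resample_columns(aln))
print(resample_columns(aln, jackknife_frac=0.5))
```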
- pxconsq: a consensus sequence constructor for an alignment
pxconsq -s Alignment
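One common way to build a consensus sequence is to take the most frequent character in each column. The Python sketch below implements that simple majority rule under stated assumptions: gaps are ignored when counting, ties go to the character seen first, and all-gap columns stay gaps. pxconsq's actual rules (for example, whether it emits IUPAC ambiguity codes) are not described here.

```python
from collections import Counter


def majority_consensus(seqs: list[str], gap: str = "-") -> str:
    """Most frequent non-gap character in each column of an alignment."""
    consensus = []
    for column in zip(*seqs):
        counts = Counter(c for c in column if c != gap)
        # most_common breaks ties by first occurrence; all-gap columns stay gaps.
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


print(majority_consensus(["ACGT", "ACGA", "TCG-"]))
```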
- pxcontrates: a Brownian motion and Ornstein-Uhlenbeck (OU) estimator
pxcontrates -c contrates_file.txt -t contrates_tree.tre -a 1
- pxfqfilt: a fastq filter given a mean quality
pxfqfilt -s fqfilt_test.fastq -m 10
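Filtering on mean quality means decoding each FASTQ quality string into Phred scores and averaging them. The Python sketch below assumes the Phred+33 (Sanger / Illumina 1.8+) encoding and keeps reads whose mean is at least the threshold. Which encoding pxfqfilt assumes, and whether its cutoff is inclusive, are not stated above.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores)


def filter_fastq(lines: list[str], min_mean: float) -> list[str]:
    """Keep 4-line FASTQ records whose mean quality is >= min_mean."""
    kept = []
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]           # header, sequence, '+', quality
        if mean_phred(record[3]) >= min_mean:
            kept.extend(record)
    return kept


records = ["@r1", "ACGT", "+", "IIII",    # 'I' decodes to Q40
           "@r2", "ACGT", "+", "####"]    # '#' decodes to Q2
print(filter_fastq(records, 10))
```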
- pxlog: an MCMC log manipulator/concatenator
Resamples parameter or tree MCMC samples using a given burnin and thinning across an arbitrary number of log files. NOTE: resampling parameters are in terms of number of samples, not number of generations. To determine the attributes of the log files, you can first use the `-i` (`--info`) flag:
./pxlog -t tree_files -i
and then sample accordingly:
./pxlog -t tree_files -b some_burnin -n some_thinning
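Because burnin and thinning count samples rather than generations, they reduce to list slicing. The Python sketch below applies the same burnin and thinning to each log and then concatenates the results. It assumes that the first post-burnin sample is kept; pxlog's exact convention may differ.

```python
def resample_log(samples: list, burnin: int, thin: int) -> list:
    """Drop the first `burnin` samples, then keep every `thin`-th one.

    Both values count samples (rows in the log), not MCMC generations.
    """
    return samples[burnin::thin]


def combine_logs(logs: list[list], burnin: int, thin: int) -> list:
    """Apply the same burnin and thinning to each log, then concatenate."""
    combined = []
    for log in logs:
        combined.extend(resample_log(log, burnin, thin))
    return combined


run1 = list(range(10))        # stand-ins for ten sampled states
run2 = list(range(100, 110))
print(combine_logs([run1, run2], burnin=2, thin=3))
```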
- pxlstr: information about trees in a file (like ls but for a tree file)
Prints summary information about trees, including whether each tree is rooted/binary/ultrametric, the number of terminals, tree length, etc.
pxlstr -t Tree.tre
- pxlssq: information about seqs in a file (like ls but for a seq file)
Prints summary information about a sequence alignment, including number of taxa/characters, character frequencies, etc.
pxlssq -s Alignment
- pxmrca: information about an mrca
This program reports information about the most recent common ancestor (MRCA): the number of tips in the tree and the number of tips in each specified clade. Each clade analyzed is the smallest clade containing all of the specified tips.
pxmrca -t mrca_test.tre -m mrca.txt
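The rule "smallest clade containing the specified tips" can be found by descending from the root while a single child still contains every target tip. The Python sketch below shows this on a nested-tuple tree, which is an assumed toy representation rather than the newick or MRCA file formats pxmrca reads.

```python
def tips(tree):
    """All tip names below a node (a tip is a str, a clade a tuple)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def mrca(tree, targets: set):
    """Smallest clade whose tips include all `targets` (None if absent)."""
    if isinstance(tree, str):
        return tree if targets <= {tree} else None
    for child in tree:
        if targets <= tips(child):
            return mrca(child, targets)   # a smaller clade still contains them all
    return tree if targets <= tips(tree) else None


tree = ((("A", "B"), "C"), ("D", "E"))
clade = mrca(tree, {"A", "C"})
print(clade, len(tips(clade)), "tips in clade,", len(tips(tree)), "in tree")
```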
- pxmrcacut: a mrca cutter
pxmrcacut -t tree -m mrca_file
- pxmrcaname: a mrca label maker
pxmrcaname -t tree -m mrca_file
- pxnni: a nearest neighbor interchange (NNI) tool (currently being troubleshot)
This is a basic nearest neighbor interchange program. Takes in a newick or nexus file with one or more trees and performs a nearest neighbor interchange.
./pxnni -t tree_file -o output_tree_file
- pxnw: simple needleman-wunsch
This is a simple alignment program that computes pairwise alignments between all sequences in the file using the Needleman-Wunsch algorithm. It has options to output the scores (`-o`) and the alignments (`-a`). The user can supply a scoring matrix; otherwise EDNA is used for DNA and BLOSUM62 for amino acids by default.
pxnw -s Alignment.aln
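Needleman-Wunsch finds the best global alignment score by dynamic programming: each cell holds the best score for aligning two prefixes. The Python sketch below uses a simple match/mismatch score and a linear gap penalty. These values are assumptions standing in for the EDNA or BLOSUM62 matrices and whatever gap scheme pxnw uses.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> int:
    """Global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap            # prefix of a aligned entirely to gaps
    for j in range(1, cols):
        score[0][j] = j * gap            # prefix of b aligned entirely to gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]


print(needleman_wunsch("ACGT", "AGT"))
```

Recovering the alignment itself (as `-a` does) would add a traceback from the bottom-right cell.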
- pxsw: simple smith waterman
This is a simple alignment program that computes pairwise alignments between all sequences in the file using the Smith-Waterman algorithm. It has options to output the scores (`-o`) and the alignments (`-a`). The user can supply a scoring matrix; otherwise EDNA is used for DNA and BLOSUM62 for amino acids by default.
pxsw -s Alignment.fa
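Smith-Waterman differs from Needleman-Wunsch in finding the best local alignment: cell scores are floored at zero, and the answer is the best cell anywhere in the matrix. The Python sketch below uses assumed match/mismatch/gap values in place of pxsw's scoring matrices.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a new local alignment start at any cell.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


print(smith_waterman("TTACGTT", "GGACGGG"))   # the shared "ACG" scores 3 * 2
```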
- pxrevcomp: a reverse complementor
pxrevcomp -s Nucleotide.fa
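A reverse complement swaps each base for its partner and reverses the sequence. The Python sketch below also maps IUPAC ambiguity codes (R<->Y, K<->M, B<->V, D<->H; N, S and W map to themselves) and preserves case. How pxrevcomp treats ambiguity codes and lowercase is an assumption here.

```python
_FROM = "ACGTRYKMBVDHNSW"
_TO = "TGCAYRMKVBHDNSW"
# Translation table covering upper- and lowercase IUPAC nucleotide codes;
# characters not listed (e.g. '-' gaps) pass through unchanged.
COMPLEMENT = str.maketrans(_FROM + _FROM.lower(), _TO + _TO.lower())


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACG"))   # -> CGTT
```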
- pxstrec: a state reconstructor
This program performs ancestral state reconstruction and stochastic mapping of categorical characters. It has a number of options and requires a control file. The control file can be as simple as `ancstates = _all_`, which requests ancestral states for every node. These can then be written onto a tree in the file given by the `-o FILE` option. If you only want to look at particular nodes, define them in the control file with `mrca = MRCANAME tipid1 tipid2`, and then request them with `ancstates = MRCANAME`. For stochastic mapping of the time spent in each state, use the same format but replace `ancstates` with `stochtime`. For the stochastic number of events, use `stochnumber` or `stochnumber_any`. Stochastic mapping requires one or more designated MRCAs (not `_all_`); multiple MRCAs can be separated by commas or spaces. These results can be written to files with `-n` for the number of events, `-a` for the total number of events, and `-m` for the duration.
pxstrec -d test.data.narrow -t test.tre -c config_stmap
- pxs2fa: a seq file converter to fasta (force to uppercase with the `-u` argument)
pxs2fa -s Alignment
- pxs2phy: a seq file converter to phylip (force to uppercase with the `-u` argument)
pxs2phy -s Alignment
- pxs2nex: a seq file converter to nexus (force to uppercase with the `-u` argument)
pxs2nex -s Alignment
- pxt2new: a tree file converter to newick
pxt2new -t Tree.nex
- pxrecode: a sequence alignment recoder. Currently only to RY-coding, but more coming.
Program is designed to recode nucleotide files to RY-coding.
pxrecode -s Nucleotide.fa
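RY-coding collapses nucleotides into purines (A, G -> R) and pyrimidines (C, T/U -> Y). The Python sketch below shows that mapping. Uppercasing the input and leaving all other characters (gaps, N, other ambiguity codes) unchanged are assumptions, not documented pxrecode behaviour.

```python
# Purines A/G become R; pyrimidines C/T/U become Y.
RY_TABLE = str.maketrans("AGCTU", "RRYYY")


def ry_recode(seq: str) -> str:
    """RY-code a nucleotide sequence; other characters pass through."""
    return seq.upper().translate(RY_TABLE)


print(ry_recode("ACGT-N"))
```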
- pxvcf2fa: convert vcf file to fasta alignment (force to uppercase with the `-u` argument)
pxvcf2fa -s vcf_file
- pxrr: rerooting and unrooting trees
This program re-roots trees on one or more given outgroups (`-g`), or unroots a tree (`-u`). Outgroups are specified as a comma-separated list. If not all of the outgroups are found in the tree, the program prints an error. However, with the silent option (`-s`), the program will instead re-root the tree on whichever outgroups are present. Alternatively, if the outgroups are ranked in order of preference but not all are necessarily present in a given tree, the program can root on the first outgroup present by using the `-r` option. This makes it a useful tool for re-rooting thousands of trees, which can then be used to analyze gene discordance across phylogenies.
pxrr -t rr_test.tre -g s1,s2
- pxtscale: Tree rescaling.
Rescales a tree given either a scaling factor (`-s`) or a root height (`-r`), but not both; the latter requires an ultrametric tree.
pxtscale -t ultra.tre -s 2.0
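Both rescaling modes multiply every branch length by a constant. For a target root height, that constant is target height / current height, which is why an ultrametric tree (where every root-to-tip path has the same length) is required. The Python sketch below uses a small hypothetical `Node` class in place of a parsed newick tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float = 0.0                        # branch length above this node
    children: list["Node"] = field(default_factory=list)
    name: str = ""


def height(node: Node) -> float:
    """Root-to-tip distance; following any one path is valid only if ultrametric."""
    if not node.children:
        return 0.0
    first = node.children[0]
    return first.length + height(first)


def scale(node: Node, factor: float) -> None:
    """Multiply every branch length in the subtree by `factor`, in place."""
    node.length *= factor
    for child in node.children:
        scale(child, factor)


def rescale_to_height(root: Node, target: float) -> None:
    """Root-height rescaling is scaling by target / current height."""
    scale(root, target / height(root))


tree = Node(children=[Node(1.0, name="A"),
                      Node(0.5, [Node(0.5, name="B"), Node(0.5, name="C")])])
rescale_to_height(tree, 3.0)
print(height(tree))
```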
- pxrmt: pruning trees (like rm but for trees)
This program takes in a tree and prunes unwanted tips from it. Tips can be given either as a comma-separated list or in a file.
pxrmt -t rmt_test.tre -n s1
- pxcat: an alignment concatenator
This is a concatenation program designed to rapidly concatenate thousands of files (~5 seconds on 2500 files using one processor). For example, if you have a folder with a mixture of files ending in .fas, .fa, and .phy, you can run the following line:
pxcat -s *.fas *.fa *.phy -p Parts -o Supermatrix
This will create a file called Supermatrix with the sequences concatenated in the order the files would appear if you typed `ls` on the command line. It will also create a partition file called Parts that can be used with RAxML.
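Concatenation joins each taxon's sequences across genes, pads taxa missing from a gene, and records the column range each gene occupies. The Python sketch below does this for in-memory alignments. The gap padding character and the `DNA, name = start-end` partition line (the classic RAxML style) are assumptions; pxcat's exact padding and partition file format may differ.

```python
def concatenate(alignments: dict[str, dict[str, str]], missing: str = "-"):
    """Concatenate alignments given as {file name: {taxon: sequence}}.

    Returns the supermatrix {taxon: sequence} and RAxML-style partition lines.
    """
    taxa = sorted({t for aln in alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions, start = [], 1
    for fname, aln in alignments.items():
        length = len(next(iter(aln.values())))
        for t in taxa:
            # Taxa absent from this gene get a run of missing characters.
            pieces[t].append(aln.get(t, missing * length))
        end = start + length - 1
        partitions.append(f"DNA, {fname} = {start}-{end}")
        start = end + 1
    return {t: "".join(p) for t, p in pieces.items()}, partitions


alns = {"g1.fa": {"a": "AC", "b": "AG"}, "g2.phy": {"a": "TTT", "c": "GGG"}}
supermatrix, parts = concatenate(alns)
print(supermatrix)
print("\n".join(parts))
```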
- pxclsq: cleanseqs (clean sites based on missing or ambiguous data)
This is a sequence alignment cleaning program, designed to help you rapidly remove sites containing ambiguous and missing data from an alignment file. The program requires you to specify the file you want cleaned and the proportion of data in each column that must be non-ambiguous. For example, if you want no missing data at all, you could run it as follows:
pxclsq -s Alignment -p 1.0
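The `-p` threshold is a per-column test: a column is kept if the fraction of sequences with unambiguous data in it is at least the threshold. The Python sketch below assumes nucleotide data and treats gaps, `?` and IUPAC ambiguity codes as missing. For amino acids the set would differ (N is asparagine there), and pxclsq's exact character set is not given above.

```python
# Assumed missing/ambiguous characters for nucleotide data.
AMBIGUOUS = set("-?NRYKMSWBDHV")


def clean_columns(seqs: dict[str, str], min_prop: float) -> dict[str, str]:
    """Keep columns where at least `min_prop` of sequences are unambiguous."""
    names = list(seqs)
    columns = list(zip(*(seqs[n] for n in names)))
    keep = [i for i, col in enumerate(columns)
            if sum(c.upper() not in AMBIGUOUS for c in col) / len(col) >= min_prop]
    return {n: "".join(seqs[n][i] for i in keep) for n in names}


seqs = {"a": "AC-T", "b": "ANGT", "c": "ACGT"}
print(clean_columns(seqs, 1.0))   # only columns with no missing data survive
```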
- pxaa2cdn: converts AA alignment and unaligned nucleotide to codon alignment
This is a program that lets you give an amino acid alignment and an unaligned nucleotide sequence. The nucleotide sequence will then be turned into a codon alignment based on the amino acid alignment. The sequences in the amino acid alignment do not need to be in the same order as the nucleotide file.
pxaa2cdn -a AA_Alignment.fa -n Unaligned_Nucleotide.fa -o CDN_aln.fa
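The codon alignment is built by threading each nucleotide sequence through its aligned protein: every residue consumes the next codon and every alignment gap becomes a codon-sized gap. Because sequences are matched by name, their order in the two files does not matter. The Python sketch below shows the idea. It assumes in-frame sequences without stop-codon or length checks, which pxaa2cdn may handle more carefully.

```python
def aa_to_codon_alignment(aa_aln: dict[str, str], nuc: dict[str, str]) -> dict[str, str]:
    """Thread unaligned nucleotide sequences onto an amino acid alignment."""
    out = {}
    for name, aa_seq in aa_aln.items():
        nt = nuc[name]                     # matched by name, not by file order
        codons, pos = [], 0
        for residue in aa_seq:
            if residue == "-":
                codons.append("---")       # one alignment gap = one gapped codon
            else:
                codons.append(nt[pos:pos + 3])
                pos += 3
        out[name] = "".join(codons)
    return out


aa = {"s1": "M-K", "s2": "MRK"}
nuc = {"s2": "ATGCGTAAA", "s1": "ATGAAA"}
print(aa_to_codon_alignment(aa, nuc))
```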
- pxupgma: builds a basic UPGMA tree
This is a basic UPGMA tree builder. It is definitely not for making publishable trees; it is included because few implementations are readily available, and it is useful when teaching a phylogenetics class that needs an example of one of the earliest ways trees were built. It also prints the distance matrix to the screen.
./pxupgma -s drosophila.aln
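For teaching purposes, the UPGMA loop is short: repeatedly merge the two closest clusters, place their parent at half their distance, and update the distances as size-weighted averages. The Python sketch below starts from a distance matrix (pxupgma computes one from the alignment first, and how it does so is not shown here), and the small matrix is made up for illustration.

```python
def upgma(names, dist):
    """UPGMA clustering of a symmetric distance matrix into a rooted Newick tree."""
    # cluster id -> (newick string, number of tips, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])  # closest pair
        del d[(i, j)]
        newick_i, size_i, height_i = clusters.pop(i)
        newick_j, size_j, height_j = clusters.pop(j)
        height = dij / 2                       # new node sits halfway between them
        merged = f"({newick_i}:{height - height_i:g},{newick_j}:{height - height_j:g})"
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            # Average distance, weighted by the number of tips in each cluster.
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    return clusters.popitem()[1][0] + ";"


names = ["A", "B", "C", "D"]
dist = [[0, 2, 4, 6],
        [2, 0, 4, 6],
        [4, 4, 0, 6],
        [6, 6, 6, 0]]
print(upgma(names, dist))   # -> (D:3,(C:2,(A:1,B:1):1):1);
```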
- pxtlate: Translate nucleotide sequences into amino acids
This program translates nucleotide sequences into their corresponding amino acid sequences. By default it uses the standard translation table, but this can be changed with the `-t` argument (use `-h` to see which tables are currently available).
pxtlate -s Sequence.fa
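Translation with the standard table (NCBI table 1) maps each codon to a residue. The Python sketch below builds that table from the well-known 64-character string in TCAG order. Unknown or ambiguous codons become `X`, U is treated as T, and a trailing partial codon is dropped; these behaviours are assumptions and may not match pxtlate.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate a nucleotide sequence codon by codon ('*' marks stops)."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


print(translate("ATGGCCTAA"))   # -> MA*
```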
- pxrms: pruning seqs (like rm but for seqs)
This program deletes sequences from a file, given a second file that lists the sequences to delete, one per line.
pxrms -s Nucleotide.fa -f List.txt
- pxrlt: Taxon relabelling for trees
Takes two ordered lists of taxon labels, `-c` (current) and `-n` (new), with one label per line. Replaces the former with the latter in trees passed in by `-t` (or stdin). This is convenient for switching between lab codes for analysis and taxon names for figure preparation.
pxrlt -t kingdoms.tre -c kingdoms.oldnames.txt -n kingdoms.newnames.txt
- pxrls: Taxon relabelling for sequences
Takes two ordered lists of taxon labels, `-c` (current) and `-n` (new), with one label per line. Replaces the former with the latter in an alignment passed in by `-s` (or stdin). This is convenient for switching between lab codes for analysis and taxon names for data archiving.
pxrls -s SeqFile -c CurrentNames -n NewNames
- pxnj: Basic neighbor joining program
This is a basic neighbor joining program that produces trees with canonical branch lengths, in number of substitutions rather than substitutions per site. It allows for parallel processing and is designed to be very fast, giving a rough idea of the final tree before an ML or Bayesian analysis is run.
pxnj -s Alignment.aln
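The core of neighbor joining (Saitou and Nei, 1987) is the Q criterion for choosing which pair to join, followed by a distance update. The Python sketch below starts from a precomputed distance matrix (how pxnj derives distances from the alignment is not covered here). It breaks ties by lowest index and resolves the last three nodes as a star, so the result is an unrooted Newick string.

```python
def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric distance matrix (three or more taxa)."""
    nodes = list(names)                  # Newick strings of the active nodes
    d = [list(row) for row in dist]      # working copy of the matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]      # total distance from each node
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        _, i, j = min(((n - 2) * d[i][j] - r[i] - r[j], i, j)
                      for i in range(n) for j in range(i + 1, n))
        len_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        joined = f"({nodes[i]}:{len_i:g},{nodes[j]}:{len_j:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the last three nodes as a star around one central node.
    x, y, z = nodes
    len_x = (d[0][1] + d[0][2] - d[1][2]) / 2
    return f"({x}:{len_x:g},{y}:{d[0][1] - len_x:g},{z}:{d[0][2] - len_x:g});"


names = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
print(neighbor_joining(names, dist))   # -> (d:2,e:1,(c:4,(a:2,b:3):3):2);
```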
- pxseqgen: Sequence simulation program
This is a sequence simulator: given a tree and a model of evolution, it generates sequences for that tree under the model. One feature is that the model of evolution can change at nodes along the tree, using the `-m` option. The program also allows the user to specify rate variation through the shape parameter of the gamma distribution (`-g`) and the proportion of invariable sites to include (`-i`). Other options can be found in the help menu by running the program with `-h`.
For multimodel simulations, it is easiest to first print out the node labels on your tree using the `-p` option.
Once you know the nodes at which you would like the model to change, you can specify them in the `-m` argument. For example, suppose you want two models of evolution on your tree: one for the tree as a whole and one that takes over at node two.
If the model you want for the tree is (.33,.33,.33,.33,.33,.33), where the values correspond to (A<->C,A<->G,A<->T,C<->G,C<->T,G<->T),
and the model you want to change to at node two is (.30,.30,.20,.50,.40,.20), with the values in the same order,
then the general format of the `-m` argument is `A<->C,A<->G,A<->T,C<->G,C<->T,G<->T,Node#,A<->C,A<->G,A<->T,C<->G,C<->T,G<->T`, and the command would be as follows:
./pxseqgen -t tree_file -o output_alignment -m .33,.33,.33,.33,.33,.33,2,.3,.3,.2,.5,.4,.2
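The six comma-separated values per model look like the exchangeability rates of a general time-reversible (GTR) model. As a hedged sketch of how such rates are commonly turned into an instantaneous rate matrix, the Python below computes Q[i][j] = r_ij * pi_j, sets each diagonal so rows sum to zero, and scales the matrix to a mean rate of one. The equal base frequencies and the normalization are assumptions; how pxseqgen handles frequencies and scaling is not described above.

```python
PAIRS = ["AC", "AG", "AT", "CG", "CT", "GT"]   # same order as the -m values
BASES = "ACGT"


def gtr_q(rates, freqs=(0.25, 0.25, 0.25, 0.25)):
    """GTR rate matrix from six exchangeabilities and base frequencies.

    Off-diagonals are r_ij * pi_j, each row sums to zero, and the matrix is
    scaled so the expected substitution rate sum_i pi_i * -Q[i][i] equals 1.
    """
    r = dict(zip(PAIRS, rates))
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                key = x + y if x + y in r else y + x   # rates are symmetric
                q[i][j] = r[key] * freqs[j]
        q[i][i] = -sum(q[i])                           # diagonal is still 0 here
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[v / mean_rate for v in row] for row in q]


for row in gtr_q([.3, .3, .2, .5, .4, .2]):
    print(["%.3f" % v for v in row])
```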
- pxnprs: calculate a time calibrated tree with NPRS
- pxtrsq: tree-seq; remove from tree and alignment taxa not found in both (JWB)
- pxgbdb: a basic genbank database creator
- pxcomp: a composition homogeneity test (JWB)
- pxtdist: tree distance calculator (JWB)
- pxcoal: gene trees in species trees; simulation and probabilities. (JWB)
- pxau: calculate the "approximately unbiased" test tree probabilities (Shimodaira, 2002). (JWB)