Subcommand: edgepca
Perform Edge PCA for a set of samples.
Usage: `gappa analyze edgepca [options]`
Input | |
---|---|
`--jplace-path` | Required. `TEXT:PATH(existing)=[] ...` List of jplace files or directories to process. For directories, only files with the extension `.jplace` are processed. |
Settings | |
`--kappa` | `FLOAT=1` Exponent for scaling between weighted and unweighted splitification. |
`--epsilon` | `FLOAT=1e-05` Epsilon used to determine whether a column of the split matrix is constant, for filtering. Set to a negative value to deactivate constant column filtering. |
`--components` | `UINT=5` Number of principal coordinates to calculate. Use 0 to calculate all possible coordinates. |
`--point-mass` | Treat every pquery as a point mass concentrated on the highest-weight placement. |
`--ignore-multiplicities` | Set the multiplicity of each pquery to 1. |
Color | |
`--color-list` | `TEXT=spectral` List of colors to use for the palette. Can either be the name of a color list, a file containing one color per line, or an actual list of colors. |
`--reverse-color-list` | If set, the `--color-list` is reversed. |
`--mask-color` | `TEXT=#dfdfdf` Color used to indicate masked values. |
Output | |
`--out-dir` | `TEXT=.` Directory to write files to. |
`--file-prefix` | `TEXT=edgepca_` File prefix for output files. |
Tree Output | |
`--write-newick-tree` | If set, the tree is written to a Newick file. |
`--write-nexus-tree` | If set, the tree is written to a Nexus file. |
`--write-phyloxml-tree` | If set, the tree is written to a PhyloXML file. |
`--write-svg-tree` | If set, the tree is written to an SVG file. |
Svg Tree Output | |
`--svg-tree-shape` | `TEXT:{circular,rectangular}=circular` Shape of the tree. |
`--svg-tree-type` | `TEXT:{cladogram,phylogram}=cladogram` Type of the tree. |
`--svg-tree-stroke-width` | `FLOAT=5` SVG stroke width for the branches of the tree. |
`--svg-tree-ladderize` | If set, the tree is ladderized. |
Global Options | |
`--allow-file-overwriting` | Allow overwriting existing output files instead of aborting the command. |
`--verbose` | Produce more verbose output. |
`--threads` | `UINT` Number of threads to use for calculations. |
`--log-file` | `TEXT` Write all output to a log file, in addition to standard output to the terminal. |
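The `--kappa` and `--epsilon` settings can be illustrated with a minimal Python sketch. It assumes that kappa is applied to each imbalance value as `sign(x) * |x|^kappa` (so a kappa of 1 keeps the weighted values and a kappa of 0 reduces them to their signs), and that a column counts as constant when its range (max minus min) across samples is at most epsilon. The function names are hypothetical and not part of gappa, and gappa's exact criteria may differ.

```python
import math


def scale_kappa(x: float, kappa: float) -> float:
    """Map x to sign(x) * |x|**kappa (assumed form of the kappa scaling)."""
    if x == 0.0:
        return 0.0  # keep a zero imbalance at zero, also for kappa = 0
    return math.copysign(abs(x) ** kappa, x)


def filter_constant_columns(matrix, epsilon):
    """Drop columns whose range across rows is <= epsilon.

    Returns the filtered matrix and the indices of the kept columns, so that
    the remaining columns can still be mapped back to tree edges.
    A negative epsilon disables the filter (assumed behavior).
    """
    if not matrix:
        return [], []
    ncols = len(matrix[0])
    if epsilon < 0:
        return [list(row) for row in matrix], list(range(ncols))
    keep = []
    for j in range(ncols):
        column = [row[j] for row in matrix]
        if max(column) - min(column) > epsilon:
            keep.append(j)
    return [[row[j] for j in keep] for row in matrix], keep


# Usage: two samples (rows), three edges (columns).
rows = [[0.25, -0.04, 0.3], [0.25, 0.09, 0.3]]
scaled = [[scale_kappa(x, 0.5) for x in row] for row in rows]
filtered, kept = filter_constant_columns(scaled, 1e-5)
print(kept)  # [1]
```

Returning the kept column indices matters because every later step (eigenvectors, tree coloring) refers to edges by column.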
Performs Edge PCA. The command is a re-implementation of `guppy epca`; see there for more details.
Edge PCA is an analysis method for phylogenetic placement data that reveals consistent differences between samples (jplace files). It uses the imbalance of placements across the edges of the tree, which makes it possible to find differences between placements even when they are close to each other in the tree.
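To make the edge imbalance concrete, the following minimal Python sketch computes it on a toy rooted tree. It assumes a hypothetical encoding: nodes are numbered so that every parent precedes its children, each edge is identified by its child node, and each pquery is a list of `(edge, like_weight_ratio)` pairs plus a multiplicity. The imbalance of an edge is taken as the normalized mass on its distal side minus the mass on its proximal side, with mass sitting on the edge itself counted on neither side; both the sign convention and that handling are assumptions of this sketch. The `point_mass` and `ignore_multiplicities` flags mimic `--point-mass` and `--ignore-multiplicities`.

```python
from dataclasses import dataclass


@dataclass
class Pquery:
    # Hypothetical encoding: (edge id, like_weight_ratio) pairs, where the
    # edge id is the index of the edge's child node.
    placements: list
    multiplicity: float = 1.0


def edge_masses(pqueries, num_nodes, point_mass=False, ignore_multiplicities=False):
    """Normalized placement mass per edge (index = child node, index 0 unused)."""
    masses = [0.0] * num_nodes
    for pq in pqueries:
        mult = 1.0 if ignore_multiplicities else pq.multiplicity
        if point_mass:
            # Put the whole mass on the highest-weight placement.
            edge, _ = max(pq.placements, key=lambda p: p[1])
            masses[edge] += mult
        else:
            # Spread the mass according to the re-normalized weights.
            lwr_sum = sum(w for _, w in pq.placements)
            for edge, w in pq.placements:
                masses[edge] += mult * w / lwr_sum
    total = sum(masses)
    return [m / total for m in masses]


def edge_imbalances(parent, masses):
    """Distal minus proximal mass for every edge; assumes parent[v] < v."""
    n = len(parent)
    below = [0.0] * n  # mass on all edges strictly below each node
    for v in range(n - 1, 0, -1):
        below[parent[v]] += masses[v] + below[v]
    total = sum(masses)
    imbalance = [0.0] * n  # index 0 is the root, which has no edge
    for v in range(1, n):
        proximal = total - below[v] - masses[v]
        imbalance[v] = below[v] - proximal
    return imbalance


# Toy tree: root 0 with children 1 and 2; node 1 has children 3 and 4.
parent = [-1, 0, 0, 1, 1]
sample = [Pquery([(3, 1.0)]), Pquery([(4, 0.75), (2, 0.25)], multiplicity=2)]
print(edge_imbalances(parent, edge_masses(sample, len(parent))))
# roughly [0.0, 0.667, -0.833, -0.667, -0.5]
```

One such vector per sample, stacked as rows, forms the matrix on which the principal components are computed.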
The command produces two tables that contain the result of the analysis. The `projection` table contains the jplace samples projected into principal coordinate space, and the `transformation` table lists the top eigenvalues (first column) and their corresponding eigenvectors (remaining columns).
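How the two tables relate can be sketched with NumPy as a generic PCA: an eigendecomposition of the covariance matrix of the samples-by-edges imbalance matrix gives the eigenvalues and eigenvectors (laid out like the transformation table), and projecting the mean-centered samples onto the top eigenvectors gives the projection. This is a sketch under stated assumptions, not gappa's implementation; in particular, centering the data before projecting is an assumption here, and the sign of each eigenvector is arbitrary.

```python
import numpy as np


def edge_pca(imbalance, components=5):
    """PCA of a samples x edges imbalance matrix.

    Returns (projection, transformation):
      projection:     samples x k, the samples in principal coordinate space
      transformation: k x (1 + edges), each eigenvalue followed by its eigenvector
    components = 0 keeps all eigenpairs (mirroring `--components 0`).
    """
    data = np.asarray(imbalance, dtype=float)
    centered = data - data.mean(axis=0)
    # Covariance between edges (columns); atleast_2d handles a single edge.
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if components > 0:
        eigvals, eigvecs = eigvals[:components], eigvecs[:, :components]
    projection = centered @ eigvecs
    transformation = np.column_stack([eigvals, eigvecs.T])
    return projection, transformation
```

A useful sanity check: the sample variance of each projection column equals the corresponding eigenvalue in the first column of the transformation.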
The principal components projection of the samples can be plotted and, for example, colored according to some per-sample metadata feature, in order to reveal dependencies between the placements of a sample and its metadata.
Furthermore, if any of the `--write-...-tree` options are used, the principal components are visualized on the tree.
These trees help to interpret how the principal components separate the samples; that is, they show which edges contribute most to distinguishing samples from each other.
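Reading off which edges drive a component amounts to ranking the entries of its eigenvector (one row of the transformation table, without the leading eigenvalue) by absolute value. The minimal sketch assumes the eigenvector entries are given in the same order as a list of edge labels; the labels and the function name are hypothetical. If constant columns were filtered out, the labels must correspond to the kept columns only.

```python
def top_contributing_edges(eigenvector, edge_labels, n=5):
    """Return the n (label, loading) pairs with the largest |loading|."""
    if len(eigenvector) != len(edge_labels):
        raise ValueError("need one edge label per eigenvector entry")
    # sorted() is stable, so ties keep their original edge order.
    order = sorted(range(len(eigenvector)), key=lambda i: -abs(eigenvector[i]))
    return [(edge_labels[i], eigenvector[i]) for i in order[:n]]


loadings = [0.1, -0.8, 0.5, 0.3]
print(top_contributing_edges(loadings, ["e1", "e2", "e3", "e4"], n=2))
# [('e2', -0.8), ('e3', 0.5)]
```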
When using this method, please do not forget to cite:
Frederick Matsen, Steven Evans. Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison. PLOS ONE, 2013. doi:10.1371/journal.pone.0056859