
Subcommand: edgepca

Lucas Czech edited this page Jun 19, 2019 · 15 revisions

Perform Edge PCA for a set of samples.

Usage: gappa analyze edgepca [options]

Options

Input
--jplace-path Required. TEXT:PATH(existing)=[] ...
List of jplace files or directories to process. For directories, only files with the extension .jplace are processed.
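The directory rule for --jplace-path can be illustrated with a short Python sketch. The function name collect_jplace_files is hypothetical. Two of its behaviors are assumptions that this page does not state: directories are scanned non-recursively, and explicitly listed files are accepted whatever their extension.

```python
from pathlib import Path


def collect_jplace_files(paths):
    """Expand --jplace-path style arguments into a list of input files.

    Explicitly listed files are kept as given. For directories, only regular
    files ending in ".jplace" are collected (non-recursively, by assumption).
    """
    result = []
    for path in map(Path, paths):
        if path.is_dir():
            # Sort for a deterministic sample order.
            result.extend(sorted(
                f for f in path.iterdir() if f.is_file() and f.suffix == ".jplace"
            ))
        elif path.is_file():
            result.append(path)
        else:
            # Mirrors the PATH(existing) constraint: missing paths are an error.
            raise FileNotFoundError(f"input path does not exist: {path}")
    return result
```

For example, collect_jplace_files(["samples/", "extra/run_1.jplace"]) would return every .jplace file in samples/ followed by the explicitly named file. Both paths here are hypothetical.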
Settings
--kappa FLOAT=1
Exponent for scaling between weighted and unweighted splitification.
--epsilon FLOAT=1e-05
Epsilon to use to determine if a split matrix’s column is constant for filtering. Set to a negative value to deactivate constant column filtering.
--components UINT=5
Number of principal coordinates to calculate. Use 0 to calculate all possible coordinates.
--point-mass Treat every pquery as a point mass concentrated on the highest-weight placement.
--ignore-multiplicities Set the multiplicity of each pquery to 1.
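The Settings options can be illustrated with three small, hypothetical Python helpers: pquery_masses, kappa_transform and filter_constant_columns. None of them is taken from the gappa source, and each one encodes an assumption:

- A pquery's like-weight ratios already sum to 1, and its multiplicity scales its mass.
- --kappa is read as the power transform sign(x) * |x|^kappa applied to every imbalance value. Under this reading, kappa = 1 keeps the weighted values and kappa = 0 keeps only their signs (unweighted).
- A column counts as constant when its maximum and minimum across samples differ by no more than epsilon.

```python
import math


def pquery_masses(placements, multiplicity=1.0, point_mass=False,
                  ignore_multiplicities=False):
    """Return {edge_num: mass} for one pquery (hypothetical helper).

    placements: list of (edge_num, like_weight_ratio) pairs whose ratios are
    assumed to already sum to 1.
    """
    # --ignore-multiplicities: every pquery counts once.
    weight = 1.0 if ignore_multiplicities else multiplicity
    if point_mass:
        # --point-mass: all mass goes to the highest-weight placement.
        best_edge, _ = max(placements, key=lambda p: p[1])
        return {best_edge: weight}
    masses = {}
    for edge_num, lwr in placements:
        # Several placements of one pquery may share an edge; sum them.
        masses[edge_num] = masses.get(edge_num, 0.0) + weight * lwr
    return masses


def kappa_transform(values, kappa=1.0):
    """Assumed --kappa scaling: sign(x) * |x| ** kappa for each value."""
    out = []
    for x in values:
        if x == 0:
            out.append(0.0)  # keep exact zeros; avoids 0 ** 0 == 1
        else:
            out.append(math.copysign(abs(x) ** kappa, x))
    return out


def filter_constant_columns(matrix, epsilon=1e-5):
    """Drop columns whose values vary by at most epsilon across all rows.

    Returns the filtered matrix and the indices of the kept columns, so that
    results can be mapped back to tree edges later. A negative epsilon
    disables filtering.
    """
    n_cols = len(matrix[0]) if matrix else 0
    if epsilon < 0:
        return [list(row) for row in matrix], list(range(n_cols))
    kept = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        if max(column) - min(column) > epsilon:
            kept.append(j)
    return [[row[j] for j in kept] for row in matrix], kept
```

The strict comparison in filter_constant_columns is a choice made for this sketch. gappa may treat the boundary case differently.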
Color
--color-list TEXT=spectral
List of colors to use for the palette. Can either be the name of a color list, a file containing one color per line, or an actual list of colors.
--reverse-color-list If set, the --color-list is reversed.
--mask-color TEXT=#dfdfdf
Color used to indicate masked values.
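The three color options can be sketched as a lookup from a normalized value to a color. The built-in spectral list is not reproduced here, so the sketch takes explicit hex colors instead. Three parts of it are assumptions made for illustration: the linear RGB interpolation, the use of None to mark a masked value, and the function name map_to_palette.

```python
def hex_to_rgb(color):
    """Convert "#rrggbb" to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints to "#rrggbb"."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def map_to_palette(value, colors, reverse=False, mask_color="#dfdfdf"):
    """Map a value in [0, 1] to a color by linear interpolation over colors.

    None (a masked value) is mapped to mask_color; reverse flips the list,
    analogous to --reverse-color-list.
    """
    if value is None:
        return mask_color
    palette = list(reversed(colors)) if reverse else list(colors)
    if len(palette) == 1:
        return palette[0]
    value = min(max(value, 0.0), 1.0)
    pos = value * (len(palette) - 1)
    i = min(int(pos), len(palette) - 2)  # index of the lower color stop
    t = pos - i                          # fraction between the two stops
    a, b = hex_to_rgb(palette[i]), hex_to_rgb(palette[i + 1])
    return rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))
```

With a two-color list ["#ff0000", "#0000ff"], for example, a value of 0.5 maps to the purple midpoint between the two colors.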
Output
--out-dir TEXT=.
Directory to write files to
--file-prefix TEXT=edgepca_
File prefix for output files
Tree Output
--write-newick-tree If set, the tree is written to a Newick file.
--write-nexus-tree If set, the tree is written to a Nexus file.
--write-phyloxml-tree If set, the tree is written to a Phyloxml file.
--write-svg-tree If set, the tree is written to a Svg file.
Svg Tree Output
--svg-tree-shape TEXT:{circular,rectangular}=circular
Shape of the tree.
--svg-tree-type TEXT:{cladogram,phylogram}=cladogram
Type of the tree.
--svg-tree-stroke-width FLOAT=5
Svg stroke width for the branches of the tree.
--svg-tree-ladderize If set, the tree is ladderized.
Global Options
--allow-file-overwriting Allow overwriting existing output files instead of aborting the command.
--verbose Produce more verbose output.
--threads UINT
Number of threads to use for calculations.
--log-file TEXT
Write all output to a log file, in addition to standard output to the terminal.

Description

Performs Edge PCA. The command is a re-implementation of guppy epca; see its documentation for more details.

Details

Edge PCA is an analysis method for phylogenetic placement data that reveals consistent differences between samples (jplace files). It uses the imbalance of placement mass across the edges of the tree. This makes it possible to find differences between placements even when those placements are close to each other in the tree.
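The imbalance idea can be sketched for a single sample on a small rooted tree. The function edge_imbalances is hypothetical and relies on several assumptions:

- Nodes are numbered so that every child has a larger index than its parent.
- Placement mass sitting on an edge counts toward the side below that edge, and its position along the edge is ignored.
- Each imbalance is divided by the total mass so that samples of different sizes are comparable.
- The sign convention is mass below the edge minus mass on the other side.

```python
def edge_imbalances(parent, edge_masses):
    """Normalized imbalance of every edge of a rooted tree for one sample.

    parent[i] is the parent of node i (None for the root), and the edge above
    node i is identified by i. Nodes must be numbered so that every child has
    a larger index than its parent (e.g. preorder). edge_masses maps an edge
    to the placement mass on it.
    """
    total = sum(edge_masses.values())
    if total <= 0:
        raise ValueError("sample has no placement mass")
    # Mass in the subtree below each edge, including the edge itself,
    # accumulated bottom-up thanks to the child-after-parent numbering.
    below = [edge_masses.get(i, 0.0) for i in range(len(parent))]
    for i in range(len(parent) - 1, -1, -1):
        if parent[i] is not None:
            below[parent[i]] += below[i]
    # Imbalance = (mass below the edge) - (mass on the other side), over total.
    return {
        i: (below[i] - (total - below[i])) / total
        for i in range(len(parent))
        if parent[i] is not None
    }
```

Stacking one such vector per sample, in a fixed edge order, gives a samples x edges matrix on which the principal component analysis is then performed.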

The command produces two tables containing the results of the analysis. The projection table contains the jplace samples projected into principal coordinate space. The transformation table lists the top eigenvalues (first column) and their corresponding eigenvectors (remaining columns).
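The step from an imbalance matrix to the two output tables can be sketched in plain Python, without NumPy, using power iteration with deflation on the covariance matrix. This is a textbook technique chosen for a small example, not gappa's implementation, and edge_pca and transformation_rows are hypothetical names. The sketch centers each column before projecting and divides the covariance by n - 1. This page does not say whether gappa does the same. Following --components, passing 0 computes all components, capped here at the number of edge columns.

```python
import math
import random


def edge_pca(matrix, components=5, iterations=500, seed=1):
    """Sketch of PCA on a samples x edges matrix of (transformed) imbalances.

    Returns (eigenvalues, eigenvectors, projection), largest eigenvalue first.
    components=0 requests all components (capped at the number of columns).
    """
    n, m = len(matrix), len(matrix[0])
    if n < 2:
        raise ValueError("need at least two samples")
    # Center each edge column (an assumption of this sketch).
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Sample covariance matrix of the edge columns (m x m).
    cov = [[sum(x[r][i] * x[r][j] for r in range(n)) / (n - 1)
            for j in range(m)] for i in range(m)]

    k = m if components == 0 else min(components, m)
    rng = random.Random(seed)
    values, vectors = [], []
    for _ in range(k):
        # Power iteration from a random unit start vector.
        v = [rng.uniform(0.1, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                break  # remaining covariance is zero
            v = [a / norm for a in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(m) for j in range(m))
        values.append(lam)
        vectors.append(v)
        # Deflation: subtract the found component so the next one dominates.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)]
               for i in range(m)]

    # Project each centered sample onto the eigenvectors.
    projection = [[sum(row[j] * vec[j] for j in range(m)) for vec in vectors]
                  for row in x]
    return values, vectors, projection


def transformation_rows(values, vectors):
    """One row per component: eigenvalue first, then the eigenvector entries."""
    return [[lam] + list(vec) for lam, vec in zip(values, vectors)]
```

Eigenvector signs are arbitrary, so a projection computed this way may come out mirrored along some axes relative to another tool's output.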

The projection of the samples onto the principal components can be plotted and, for example, colored according to some per-sample metadata feature, in order to reveal dependencies between the placements of a sample and its metadata:

First two Edge PCA components projected.

Furthermore, if the --write-...-tree options are used, the principal components are visualized on the tree:

First two Edge PCA component trees.

These trees help to interpret how the plot above separates the samples; that is, they show which edges contribute most to distinguishing samples from each other.
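Reading a component tree amounts to mapping each eigenvector entry back onto its edge and then looking at the edges with the largest absolute values. The following is a minimal sketch built on two assumptions: eigenvector entry k belongs to the k-th edge that survived constant-column filtering (--epsilon), and edges removed by that filter are shown as masked. The names loadings_per_edge and top_edges are hypothetical.

```python
def loadings_per_edge(eigenvector, kept_edges, edge_count):
    """Spread one eigenvector back onto all edges of the tree.

    eigenvector[k] belongs to edge kept_edges[k]; edges that were filtered
    out get None, e.g. to be drawn in the mask color.
    """
    values = [None] * edge_count
    for k, edge in enumerate(kept_edges):
        values[edge] = eigenvector[k]
    return values


def top_edges(values, n=3):
    """Return the n edges with the largest absolute loading, strongest first."""
    scored = [(abs(v), e) for e, v in enumerate(values) if v is not None]
    scored.sort(reverse=True)
    return [e for _, e in scored[:n]]
```

Edges near the top of this ranking are the ones that, under these assumptions, drive the separation of samples along that component.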

Citation

When using this method, please do not forget to cite:

Frederick Matsen, Steven Evans. Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison. PLOS ONE, 2013. doi:10.1371/journal.pone.0056859
