Donovan Parks edited this page Jun 16, 2014 · 19 revisions

Lineage-specific Workflow

The recommended workflow for assessing the completeness and contamination of genome bins is to use lineage-specific marker sets. This workflow consists of 4 mandatory (M) steps and 1 recommended (R) step:

(M) > checkm tree <bin folder> <output folder>
(R) > checkm tree_qa <output folder>
(M) > checkm lineage_set <output folder> <marker file>
(M) > checkm analyze <marker file> <bin folder> <output folder>
(M) > checkm qa <marker file> <output folder>

The tree command places genome bins into a reference genome tree. All genomes to be analyzed must reside in a single bins directory. CheckM assumes genome bins are in FASTA format with the extension fna, though this can be changed with the -x flag. The tree command can optionally be followed by the tree_qa command, which reports the number of phylogenetically informative marker genes found in each genome bin along with a taxonomic string indicating its approximate placement in the tree. If desired, genome bins with few phylogenetically informative marker genes may be removed to reduce the computational requirements of the following commands. Alternatively, if only genomes from a particular taxonomic group are of interest, these can be moved to a separate directory and analyzed separately.

The lineage_set command creates a marker file indicating lineage-specific marker sets suitable for evaluating each genome. This marker file is passed to the analyze command in order to identify marker genes and estimate the completeness and contamination of each genome bin. Finally, the qa command can be used to produce different tables summarizing the quality of each genome bin.
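As a rough illustration of how the same output folder and marker file are threaded through the five steps, the sketch below prints the commands for a given bins directory and output directory. The function name lineage_steps, the example paths, and the marker file name lineage.ms are assumptions made for this example; they are not CheckM defaults. Only the positional arguments listed above are used.

```shell
#!/usr/bin/env bash
# Sketch only: prints the five lineage-specific workflow commands in order.
# Assumes paths contain no spaces; the marker file name is a placeholder.

lineage_steps() {
    local bin_dir="$1"
    local out_dir="$2"
    local marker_file="${out_dir}/lineage.ms"   # hypothetical file name

    # (M) place the bins into the reference genome tree
    echo "checkm tree ${bin_dir} ${out_dir}"
    # (R) report informative marker genes and approximate placement per bin
    echo "checkm tree_qa ${out_dir}"
    # (M) write the lineage-specific marker sets to the marker file
    echo "checkm lineage_set ${out_dir} ${marker_file}"
    # (M) identify marker genes in each bin using that marker file
    echo "checkm analyze ${marker_file} ${bin_dir} ${out_dir}"
    # (M) summarize completeness and contamination
    echo "checkm qa ${marker_file} ${out_dir}"
}
```

Calling `lineage_steps ./bins ./checkm_out` prints the commands for review. Once they look right, they can be pasted into a terminal one at a time, which leaves room to inspect the tree_qa output and remove uninformative bins before continuing.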

For convenience, the above workflow can be executed in a single step:

> checkm lineage_wf <bin folder> <output folder>

Taxonomic-specific Workflow

Using Custom Marker Genes
