7 FAQ: Frequently Asked Questions
Q: Does Vclust handle circularly permuted bacteriophage genomes?
A: Yes. Vclust is robust to sequence rearrangements such as translocations and circular permutations. It calculates ANI and alignment fraction (coverage) across all local alignments between two genomes, even when homologous segments appear in a different order. In tests with circularly permuted genomes, the ANI and coverage values reported by Vclust differed only slightly from those for the non-permuted genomes, with a mean absolute error of 0.04%. These small discrepancies are caused by short alignment discontinuities at the breakpoint positions of the circular genomes.
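Pooling all local alignments can be illustrated with a minimal Python sketch. The `LocalAlignment` record and the `ani_and_af` function are hypothetical names, not part of Vclust's API. The sketch assumes ANI is total identical positions divided by total aligned length, and AF is total aligned length divided by query length. Because these sums ignore where each alignment sits along the genomes, splitting one alignment into two pieces at a permutation breakpoint leaves both values unchanged.

```python
from dataclasses import dataclass


@dataclass
class LocalAlignment:
    """Hypothetical record for one local alignment between a query and a reference."""
    length: int   # aligned positions on the query
    matches: int  # identical positions within this alignment


def ani_and_af(alignments, query_length):
    """Pool all local alignments, ignoring their order along either genome.

    Assumes ANI = total matches / total aligned length and
    AF = total aligned length / query length.
    """
    aligned = sum(a.length for a in alignments)
    matches = sum(a.matches for a in alignments)
    ani = matches / aligned if aligned else 0.0
    af = aligned / query_length
    return ani, af


# A 10 kb genome aligned as one block, versus the same genome opened at a
# different position (circular permutation), which splits the alignment in two.
whole = ani_and_af([LocalAlignment(10_000, 9_950)], 10_000)
permuted = ani_and_af([LocalAlignment(4_000, 3_980), LocalAlignment(6_000, 5_970)], 10_000)
print(whole, permuted)  # both (0.995, 1.0)
```

In real alignments, a few bases around the breakpoint may fail to align, which slightly lowers the pooled values; this toy example splits the alignment cleanly.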
Q: How does Vclust's sensitivity compare to BLASTn and MegaBLAST?
A: Vclust is designed to match the sensitivity of BLASTn, which is considered highly reliable for estimating ANI. Like BLASTn, Vclust uses an anchor length of 11 nucleotides, so even short exact matches between divergent sequences can start an alignment. MegaBLAST, in comparison, uses a larger word size of 28 nucleotides, which makes it faster but less sensitive.
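Why the anchor (word) length matters can be shown with a simplified, gap-free Python sketch: an alignment can only start where the two sequences share an exact match at least as long as the word size. The toy sequences below are hypothetical, with a substitution at every 20th position (95% identity), so they contain runs of at most 19 identical bases. An 11-nt word is therefore found, but a 28-nt word is not. Real BLAST seeding also involves extension, gaps, and scoring, and the helper names are assumptions; this only isolates the effect of the word size.

```python
def longest_identical_run(a: str, b: str) -> int:
    """Longest stretch of identical positions in two gap-free, equal-length sequences."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def can_seed(a: str, b: str, word_size: int) -> bool:
    """True if an exact word of `word_size` nt is shared at the same offset."""
    return longest_identical_run(a, b) >= word_size


# Toy 200-nt reference and a query with a substitution at every 20th base.
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}
ref = "ACGT" * 50
query = "".join(SUBSTITUTE[b] if (i + 1) % 20 == 0 else b for i, b in enumerate(ref))

print(longest_identical_run(ref, query))  # 19
print(can_seed(ref, query, 11))           # True  (BLASTn-like word size)
print(can_seed(ref, query, 28))           # False (MegaBLAST-like word size)
```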
Q: Can I increase the minimum sequence identity (default: 0.7) in prefilter if I aim for a higher ANI threshold (e.g., 0.95)?
A: Yes, you can safely increase the default minimum sequence identity (`--min-ident`) in the prefilter step to target a higher ANI threshold. We designed the sequence identity calculated by the `prefilter` command to be higher than the ANI derived from the subsequent `align` step. Specifically, while the sequence identity is calculated similarly to ANI in Mash, Vclust's calculation is based on the shorter sequence of each pair, so it tends to overestimate rather than underestimate the alignment-based ANI. As a result, the default `--min-ident` of 0.7 can be raised to values closer to the final alignment-based ANI threshold.
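The effect of basing the calculation on the shorter sequence can be sketched in Python. For illustration only, we assume the Mash formula for identity, 1 - D with D = -(1/k) * ln(2j / (1 + j)), is evaluated once with the Jaccard index j and once with the fraction of the shorter sequence's k-mers found in the other sequence. Vclust's actual prefilter implementation may differ in its details, and the helper names and k = 21 (Mash's default) are assumptions. For a fragment fully contained in a longer genome, the shorter-sequence estimate stays at 1.0, while the Jaccard-based estimate drops only because of the length difference.

```python
import math
import random


def kmers(seq: str, k: int) -> set:
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_identity(fraction: float, k: int) -> float:
    """Mash-style identity 1 - D, with D = -(1/k) * ln(2f / (1 + f)), clamped at 0."""
    if fraction <= 0:
        return 0.0
    return max(0.0, 1 + math.log(2 * fraction / (1 + fraction)) / k)


def prefilter_identities(seq_a: str, seq_b: str, k: int = 21):
    """Return (Jaccard-based identity, shorter-sequence-based identity)."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    shared = len(a & b)
    jaccard = shared / len(a | b)               # shared / union of both k-mer sets
    shorter = shared / min(len(a), len(b))      # shared / k-mers of the shorter sequence
    return mash_identity(jaccard, k), mash_identity(shorter, k)


# Hypothetical example: a 1 kb fragment taken from a random 2 kb genome.
rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(2_000))
fragment = genome[:1_000]

jaccard_id, shorter_id = prefilter_identities(genome, fragment)
print(round(jaccard_id, 3), shorter_id)
```

Because the shared k-mer count divided by the shorter sequence's k-mers is never smaller than the Jaccard index, and the formula increases with its input, the shorter-sequence estimate is never lower than the Jaccard-based one. This is consistent with a prefilter that errs on the side of keeping pairs.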
In our tests for vOTU clustering (ANI ≥ 95% and AF ≥ 85%), even increasing `--min-ident` to 0.95 during prefiltering did not exclude any genome pairs with an alignment-based ANI of ≥ 95%. Additionally, raising `--min-ident` from the default of 0.7 to 0.95 significantly reduces the number of genome pairs requiring alignment, thereby speeding up the alignment step.
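The check described above, namely whether a stricter prefilter drops any pair that would pass the ANI threshold, can be expressed as a short Python sketch. The genome names and identity values are invented toy numbers, not results from our tests, and `prefilter` and `missed_pairs` are hypothetical helpers rather than Vclust commands. As long as every pair's prefilter identity is at least its alignment-based ANI, a threshold of 0.95 removes pairs from the alignment step without losing any pair with ANI ≥ 0.95.

```python
# Toy values (hypothetical): prefilter identity and alignment-based ANI per genome pair.
prefilter_ident = {
    ("g1", "g2"): 0.99,
    ("g1", "g3"): 0.96,
    ("g1", "g4"): 0.82,
    ("g2", "g4"): 0.74,
    ("g3", "g4"): 0.71,
}
ani = {
    ("g1", "g2"): 0.985,
    ("g1", "g3"): 0.951,
    ("g1", "g4"): 0.80,
    ("g2", "g4"): 0.72,
    ("g3", "g4"): 0.70,
}


def prefilter(pairs: dict, min_ident: float) -> set:
    """Pairs whose prefilter identity reaches min_ident (these go on to alignment)."""
    return {pair for pair, ident in pairs.items() if ident >= min_ident}


def missed_pairs(kept: set, ani_values: dict, min_ani: float = 0.95) -> set:
    """Pairs that meet the ANI threshold but were not passed to alignment."""
    return {pair for pair, value in ani_values.items() if value >= min_ani and pair not in kept}


for min_ident in (0.7, 0.95):
    kept = prefilter(prefilter_ident, min_ident)
    print(min_ident, len(kept), missed_pairs(kept, ani))
```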