7 FAQ: Frequently Asked Questions

Andrzej Zielezinski edited this page Oct 12, 2024 · 3 revisions

Q: Does Vclust handle circularly permuted bacteriophage genomes?

A: Yes. Vclust is robust to sequence rearrangements such as translocations and circular permutations. It calculates ANI and alignment fraction (coverage) across all local alignments between two genomes, so the result does not depend on the order of the homologous segments. In tests with circularly permuted genomes, ANI and coverage deviated only minimally from the values obtained for non-permuted genomes, with a mean absolute error of 0.04%. These small discrepancies arise from short alignment discontinuities at the breakpoint positions in circular genomes.
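
To make the point about reordered segments concrete, here is a minimal Python sketch. It is not Vclust's code: the `LocalAlignment` layout, the `ani_and_af` helper, the formulas (ANI as identical bases over aligned bases, AF as aligned bases over query length), and all coordinates and match counts are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalAlignment:
    """One local alignment; 0-based, end-exclusive coordinates (hypothetical layout)."""
    query_start: int
    query_end: int
    ref_start: int
    ref_end: int
    matches: int  # identical bases within the aligned region


def ani_and_af(alignments, query_length):
    """Return (ANI, AF) for the query, summed over all local alignments.

    Assumed definitions for this sketch:
      ANI = identical bases / aligned query bases
      AF  = aligned query bases / query genome length
    """
    aligned = sum(a.query_end - a.query_start for a in alignments)
    identical = sum(a.matches for a in alignments)
    ani = identical / aligned if aligned else 0.0
    af = aligned / query_length
    return ani, af


# Made-up toy data: collinear genomes, one alignment spanning the 10,000-bp query.
collinear = [LocalAlignment(0, 10_000, 0, 10_000, 9_800)]

# Same toy genomes with the reference rotated by 4,000 bp: two alignments,
# separated by a 20-bp discontinuity at the breakpoint.
permuted = [
    LocalAlignment(0, 5_990, 4_000, 9_990, 5_870),
    LocalAlignment(6_010, 10_000, 10, 4_000, 3_910),
]

ani_c, af_c = ani_and_af(collinear, 10_000)
ani_p, af_p = ani_and_af(permuted, 10_000)
print(f"collinear: ANI={ani_c:.4f} AF={af_c:.4f}")
print(f"permuted:  ANI={ani_p:.4f} AF={af_p:.4f}")
```

Because the sums run over alignments in any order, the rotation itself changes nothing. Only the few bases lost at the breakpoint shift the totals slightly.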

Q: How does Vclust's sensitivity compare to BLASTn and MegaBLAST?

A: Vclust is designed to match the sensitivity of BLASTn, which is considered highly reliable for estimating ANI. Like BLASTn, Vclust uses an anchor length of 11 nucleotides to align sequences with high precision. MegaBLAST, by comparison, uses a larger default word size of 28 nucleotides, which makes it less sensitive to divergent sequences.
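
The effect of seed length on sensitivity can be illustrated with a toy Python sketch. It only counts exact shared k-mers between a random sequence and a copy with evenly spaced substitutions. It does not reproduce how BLASTn, MegaBLAST, or Vclust find seeds, and the sequence length, mutation spacing, and helper names are assumptions chosen for illustration.

```python
import random


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mutate_every(seq, step):
    """Substitute one base at every position divisible by `step`."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[b] if i % step == 0 else b for i, b in enumerate(seq))


random.seed(0)  # fixed seed so the toy sequence is reproducible
genome = "".join(random.choice("ACGT") for _ in range(1_000))
variant = mutate_every(genome, 20)  # one substitution per 20 bases (95% identity)

# Exact seed matches available at each seed length.
shared_11 = len(kmers(genome, 11) & kmers(variant, 11))
shared_28 = len(kmers(genome, 28) & kmers(variant, 28))
print("shared 11-mers:", shared_11)
print("shared 28-mers:", shared_28)
```

With substitutions spaced 20 bases apart, every 28-base window contains at least one mismatch, so exact 28-mer seeds are not expected to match. 11-mer seeds still fit between neighboring substitutions.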

Q: Can I increase the minimum sequence identity (default: 0.7) in the prefilter step if I am aiming for a higher ANI threshold (0.95)?

A: Yes, you can safely increase the minimum sequence identity (--min-ident) in the prefilter step to target a higher ANI threshold. We designed the prefilter command so that its sequence identity is higher than the ANI obtained in the subsequent align step. Specifically, although the sequence identity is calculated similarly to ANI in Mash, Vclust bases its calculation on the shorter sequence. As a result, the default --min-ident of 0.7 can be raised to values closer to the final alignment-based ANI threshold.

In our tests for vOTU clustering (ANI ≥ 95% and AF ≥ 85%), even increasing --min-ident to 0.95 during prefiltering did not exclude any genome pairs with an alignment-based ANI of ≥ 95%. Additionally, raising the default --min-ident from 0.7 to 0.95 significantly reduces the number of genome pairs requiring alignment, thereby speeding up the alignment step.
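
The following Python sketch gives some intuition for why a measure based on the shorter sequence tends to be higher. It does not reproduce the prefilter's actual formula. It compares the Jaccard index of two k-mer sets (the quantity Mash estimates) with the fraction of shared k-mers relative to the smaller set. The k-mer length of 21, the toy sequences, and the helper names are all assumptions.

```python
import random


def kmer_set(seq, k=21):
    """Return the set of all k-mers in seq (k=21 is an arbitrary choice here)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared k-mers divided by the union of both k-mer sets."""
    return len(a & b) / len(a | b)


def shared_fraction_of_shorter(a, b):
    """Shared k-mers divided by the size of the smaller k-mer set."""
    return len(a & b) / min(len(a), len(b))


random.seed(1)  # fixed seed so the toy sequence is reproducible
long_genome = "".join(random.choice("ACGT") for _ in range(5_000))
short_genome = long_genome[1_000:3_000]  # fragment fully contained in long_genome

a, b = kmer_set(long_genome), kmer_set(short_genome)
j = jaccard(a, b)
c = shared_fraction_of_shorter(a, b)
print(f"Jaccard index:               {j:.3f}")
print(f"shared fraction (shorter):   {c:.3f}")
```

Since the smaller set can never be larger than the union, the second value is always at least as large as the Jaccard index. The gap grows as the two genomes differ more in length.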