MashMap v3.0.1
MashMap3 Changelog
- Instead of indexing the locations of minimizers, we index the windows for which a k-mer is one of the s lowest hashes in the window, where s is the sketch size. These k-mers are termed "minmers."
- The first-pass filtering stage computes the number of shared minmers for each candidate mapping in linear time. Regions with a significantly high count of shared minmers are passed on to the second stage.
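The minmer idea in the first two items can be illustrated with a brute-force sketch. It assumes that the k-mer hashes are already computed and distinct, and that a window spans w consecutive k-mers. The names MinmerInterval, minmerIntervals, and sharedMinmerCounts are hypothetical and are not taken from MashMap3's source. For each k-mer, the sketch records the contiguous runs of windows (its minmer intervals) in which the k-mer is among the s lowest hashes. It then counts, for every window, how many query-sketch hashes are minmers there, using a single difference-array sweep over those intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical record: k-mer `pos` is among the s lowest hashes in every
// window whose start lies in [firstWindow, lastWindow].
struct MinmerInterval {
    std::size_t pos;
    std::size_t firstWindow;
    std::size_t lastWindow;
};

// Brute-force minmer intervals. hashes[i] is the (assumed distinct) hash of
// the i-th k-mer, w is the number of k-mers per window, s is the sketch size.
std::vector<MinmerInterval> minmerIntervals(const std::vector<std::uint64_t>& hashes,
                                            std::size_t w, std::size_t s) {
    std::vector<MinmerInterval> out;
    if (w == 0 || s == 0 || hashes.size() < w) return out;
    const std::size_t numWindows = hashes.size() - w + 1;
    // open[i] is the index in `out` of k-mer i's most recent interval, or SIZE_MAX.
    std::vector<std::size_t> open(hashes.size(), SIZE_MAX);
    const std::size_t keep = std::min(s, w);
    for (std::size_t start = 0; start < numWindows; ++start) {
        // Positions in this window, partially sorted so the s lowest hashes come first.
        std::vector<std::size_t> idx(w);
        for (std::size_t j = 0; j < w; ++j) idx[j] = start + j;
        std::partial_sort(idx.begin(), idx.begin() + keep, idx.end(),
                          [&](std::size_t a, std::size_t b) { return hashes[a] < hashes[b]; });
        for (std::size_t j = 0; j < keep; ++j) {
            const std::size_t p = idx[j];
            if (open[p] != SIZE_MAX && out[open[p]].lastWindow + 1 == start) {
                out[open[p]].lastWindow = start;  // still a minmer: extend the interval
            } else {
                open[p] = out.size();             // (re)entered the s lowest: new interval
                out.push_back({p, start, start});
            }
        }
    }
    return out;
}

// Hypothetical first-pass count: for each window start, the number of query
// sketch hashes that are minmers of that window (+1/-1 sweep over intervals).
std::vector<int> sharedMinmerCounts(const std::vector<std::uint64_t>& refHashes,
                                    const std::vector<MinmerInterval>& intervals,
                                    const std::unordered_set<std::uint64_t>& querySketch,
                                    std::size_t numWindows) {
    std::vector<int> delta(numWindows + 1, 0);
    for (const auto& iv : intervals) {
        assert(iv.lastWindow < numWindows);  // intervals must match this reference
        if (querySketch.count(refHashes[iv.pos]) != 0) {
            ++delta[iv.firstWindow];
            --delta[iv.lastWindow + 1];
        }
    }
    std::vector<int> counts(numWindows);
    int running = 0;
    for (std::size_t i = 0; i < numWindows; ++i) counts[i] = (running += delta[i]);
    return counts;
}
```

A k-mer can have more than one interval, because smaller hashes can leave the window on the left before others enter on the right. This is also why interval counts and occurrence counts can disagree for frequent k-mers. The per-window partial sort keeps the sketch short; a practical index would instead update the sketch incrementally as the window slides.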
- The second stage of filtering, which calculates the minhash score of each mapping in the candidate region, uses a std::vector to track the rolling minhash score instead of the std::map used in MashMap2. See slidingMap.hpp for details.
- While the mapping stage is faster, particularly for lower ANI cutoffs (90% and below), the indexing stage takes somewhat longer than before. To avoid recomputing the index, users can save it with --saveIndex PREFIX and reuse it in a later run with --loadIndex PREFIX.
- The default sketch size depends on the minimum ANI threshold (pi) and the segment length (L). Decreasing the sketch size decreases runtime linearly, at the cost of increased variance in the ANI estimation error.
- Frequent seeds are filtered out based on how many minmer intervals they have rather than how many times the k-mer actually occurs in the reference. This adds some noise to frequent-k-mer filtering, as it's possible for a less frequent k-mer to have more intervals than a more frequent one.
- The binomial model, instead of the Poisson model, is used to estimate ANI from the Jaccard similarity.
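To show how the two models differ, here is a minimal sketch. It assumes the textbook setup in which each k-mer of a mapped segment is conserved independently with probability q = ANI^k. For two equal-sized k-mer sets, the expected Jaccard similarity is then J = q / (2 - q), so q = 2J / (1 + J). The binomial estimate inverts q = ANI^k exactly. The Poisson (Mash distance) estimate uses ANI = 1 + ln(q) / k instead. The function names are hypothetical, and this is not claimed to be MashMap3's exact code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Binomial model (assumed): q = ANI^k and q = 2J / (1 + J), so ANI = q^(1/k).
double aniFromJaccardBinomial(double jaccard, int k) {
    assert(k > 0);
    if (jaccard <= 0.0) return 0.0;
    const double q = 2.0 * jaccard / (1.0 + jaccard);
    return std::pow(q, 1.0 / k);  // exact inverse of q = ANI^k
}

// Poisson model (Mash distance) for comparison: ANI = 1 - D with D = -ln(q) / k.
double aniFromJaccardPoisson(double jaccard, int k) {
    assert(k > 0);
    if (jaccard <= 0.0) return 0.0;
    const double q = 2.0 * jaccard / (1.0 + jaccard);
    return std::max(0.0, 1.0 + std::log(q) / k);  // clamp negative estimates to 0
}
```

Because 1 + ln(x) <= x for x > 0, the Poisson estimate never exceeds the binomial one. With k = 19 and a Jaccard value generated at exactly 90% identity, the binomial formula recovers 0.9, while the Poisson formula returns about 0.895.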
- The k-mer size is no longer limited to 16 or fewer, as hash values are now 64 bits instead of 32 bits. The default k-mer size is now 19.
- Numerous interface updates were copied over from wfmash, including a progress meter and use of the samtools .fai index.
- The output of MashMap3 is now in PAF format, with id and jc tags that represent the estimated ANI and the estimated Jaccard similarity, respectively. The jc tag is only present for mappings where chaining is disabled.
- There is now an option for significantly denser sketching, --dense