
MashMap v3.0.1

@bkille bkille released this 12 Apr 16:18
· 88 commits to master since this release

MashMap3 Changelog

  • Instead of indexing the locations of minimizers, we index the windows in which a k-mer is one of the s lowest hashes, where s is the sketch size. These k-mers are termed "minmers."

  • The first-pass filtering stage computes the number of shared minmers for each candidate mapping in linear time. Regions with significantly high counts of shared minmers are passed on to stage 2.

  • The second stage of filtering, where the minhash score of each mapping in the candidate region is calculated, uses a std::vector to keep track of the rolling minhash score as opposed to the std::map used in MashMap2. The details can be seen in slidingMap.hpp.

  • While the mapping stage is faster, particularly for lower ANI cutoffs (90% and below), the indexing stage does require a bit more time than before. To avoid spending time recomputing the index, users can save the index via --saveIndex PREFIX, and then reuse it in a later run with --loadIndex PREFIX.

  • The default parameter for the sketch size depends on the value of the minimum ANI threshold (pi) and the segment length (L). Decreasing the sketch size will decrease runtime in a linear fashion at the cost of increasing the variance in the ANI estimation error.

  • Frequent seeds are filtered out based on how many minmer intervals they have, as opposed to how many times the k-mer actually occurs in the reference. This adds some noise to frequent k-mer filtering, as it's possible for a less frequent k-mer to have more intervals than a more frequent k-mer.

  • The binomial model is used to estimate ANI from Jaccard instead of the Poisson model.

  • The k-mer size is no longer limited to <=16, as the hash values are now 64 bits instead of 32 bits. The default k-mer size is now 19.

  • Numerous interface updates were copied over from wfmash, including a progress meter and usage of the samtools .fai index.

  • The output of MashMap3 is now in PAF format, with id and jc tags which represent the estimated ANI and the estimated Jaccard similarity, respectively. The jc tag is only present for mappings where chaining is disabled.

  • There is now an option for significantly denser sketching, --dense.
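
To make the minmer definition above concrete, here is a minimal Python sketch that records, for each k-mer hash, the runs of consecutive windows in which it is among the s lowest distinct hashes. The function name minmer_intervals, the window length w, and the naive per-window sort are assumptions made for illustration; this is not the MashMap3 implementation, which computes the intervals far more efficiently.

```python
def minmer_intervals(hashes, w, s):
    """Map each hash to the half-open window ranges where it is a minmer.

    hashes: hash values of consecutive k-mers (hypothetical input)
    w:      number of k-mers per window (assumed parameter name)
    s:      sketch size, i.e. how many lowest hashes count per window
    """
    intervals = {}
    n_windows = len(hashes) - w + 1
    for win in range(n_windows):
        # The s lowest distinct hash values in this window (naive sort).
        lowest = sorted(set(hashes[win:win + w]))[:s]
        for h in lowest:
            runs = intervals.setdefault(h, [])
            if runs and runs[-1][1] == win:
                # Still a minmer in the next window: extend the open run.
                runs[-1] = (runs[-1][0], win + 1)
            else:
                # Start a new run of windows for this hash.
                runs.append((win, win + 1))
    return intervals


if __name__ == "__main__":
    # Five k-mer hashes, windows of 3 k-mers, sketch size 2.
    print(minmer_intervals([5, 1, 4, 2, 3], w=3, s=2))
```

Under this toy model, a k-mer that occurs only once can still own several separate intervals, which is the source of the noise in frequent-seed filtering described above.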
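
The switch from the Poisson to the binomial model can be illustrated with the textbook forms of both estimators. The sketch below assumes the Jaccard value is first converted to a containment-style fraction 2J/(1+J), that the binomial estimate is that fraction raised to the power 1/k, and that the Poisson estimate is one minus the Mash distance. The function names are hypothetical, and the exact expressions used inside MashMap3 may differ.

```python
import math


def ani_binomial(jaccard, k):
    """Binomial-model ANI estimate (assumed form): (2J / (1 + J)) ** (1 / k)."""
    if jaccard <= 0:
        return 0.0
    shared_fraction = 2 * jaccard / (1 + jaccard)
    return shared_fraction ** (1 / k)


def ani_poisson(jaccard, k):
    """Poisson-model ANI estimate (assumed form): 1 + ln(2J / (1 + J)) / k."""
    if jaccard <= 0:
        return 0.0
    distance = -math.log(2 * jaccard / (1 + jaccard)) / k
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    # Compare the two estimates at the default k-mer size of 19.
    for j in (0.9, 0.5, 0.1):
        print(j, ani_binomial(j, 19), ani_poisson(j, 19))
```

Under these assumed forms the two estimates agree closely at high Jaccard values and drift apart as similarity drops, with the binomial estimate always the larger of the two.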