Skip to content

Python implementation of the Damerau–Levenshtein diff algorithm

License

Notifications You must be signed in to change notification settings

nickeldan/dam_lev

Repository files navigation

dam_lev

Author: Daniel Walker
Version: 0.1.2
Date: 2023-01-05

Overview

The dam_lev package implements the Damerau–Levenshtein diff algorithm. That is, it will take two sequences and determine the minimum number of transpositions, substitutions, insertions, and deletions needed to transform the first sequence into the second.

Usage

The package exposes a single function, dam_lev.get_changes. It takes two sequences (i.e., they must implement the __len__ and __getitem__ methods) and returns a list of dam_lev.Mutation objects. There are four subclasses of dam_lev.Mutation corresponding to the four types of transformations. For example,

diffs = dam_lev.get_changes('abcdef', 'bcedxy')
print(diffs) # [Deletion(at=0), Transposition(at=3), Substitution(at=5, at2=4), Insertion(at=6, at2=5)]

We see that the sequence of transformations is:

  • Delete the item at index 0 ('a')
  • Transpose the item at index 3 ('d') with its successor
  • Substitute the item at index 5 ('f') with the item from the second sequence at index 4 ('x')
  • Insert at index 6 the item from the second sequence at index 5 ('y')

Note the index for the transposition. Even though, after the deletion, the 'd' is at index 2, it's at index 3 in the original version of the sequence. Likewise for the successive mutations.

Key function

You can also pass a callable as the key keyword argument to dam_lev.get_changes. Similar to list.sort, this callable will be used to compare the elements of the sequences. For example,

diffs = dam_lev.get_changes('aBc', 'AbC', key=str.upper)
print(diffs) # []

About

Python implementation of the Damerau–Levenshtein diff algorithm

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages