feat: Configure splitting on graphemes #314

afnanenayet · 2022-03-21T02:58:06Z

Allow configuring whether the input tree is split into graphemes. This
exposes a performance-precision trade-off. Splitting on graphemes gives a
more granular diff, but it requires allocating a metadata struct for
every Unicode grapheme in the document. Skipping this step avoids that
cost, but each node's text is then compared as a whole.
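
To make the trade-off concrete, here is a minimal sketch of what such a switch could look like. It is not the code in this PR: `Entry`, `ProcessingConfig`, `split_graphemes`, and `process_leaf` are hypothetical names chosen for illustration. The standard library has no grapheme segmentation, so the sketch splits on `char`s as a stand-in; real grapheme splitting would need something like the `unicode-segmentation` crate.

```rust
/// Hypothetical per-unit metadata: the text covered and where it starts.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry<'a> {
    /// The slice of the leaf node's text that this entry covers.
    pub text: &'a str,
    /// Byte offset of `text` within the leaf node's text.
    pub start_byte: usize,
}

/// Hypothetical config flag controlling the granularity of the entries.
#[derive(Debug, Clone, Copy)]
pub struct ProcessingConfig {
    pub split_graphemes: bool,
}

/// Turn the text of one leaf node into the entries that will be diffed.
pub fn process_leaf<'a>(node_text: &'a str, config: ProcessingConfig) -> Vec<Entry<'a>> {
    if config.split_graphemes {
        // Fine-grained: one allocation-tracked entry per unit of text.
        // ASSUMPTION: `char` boundaries stand in for grapheme boundaries here.
        node_text
            .char_indices()
            .map(|(i, c)| Entry {
                text: &node_text[i..i + c.len_utf8()],
                start_byte: i,
            })
            .collect()
    } else {
        // Coarse: a single entry, so the node's text is compared as a whole.
        vec![Entry {
            text: node_text,
            start_byte: 0,
        }]
    }
}
```

With splitting enabled, a three-character leaf yields three entries; with it disabled, the same leaf yields one entry, which is cheaper to build but can only be reported as changed in full.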

This also does some organizational refactoring to clean up the former
`ast` module. The input processing method and its supporting functions
have been cleaned up. The method that computes the diff between two entry
vectors has been moved into the `diff` module, which seems more
appropriate.
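
For readers unfamiliar with what "computing the diff between two entry vectors" involves, the sketch below shows one way it can be done. It is an assumption-laden illustration, not the function that was moved: `Hunk` and `diff_entries` are hypothetical names, entries are reduced to plain string slices, and the algorithm is a textbook longest-common-subsequence table rather than whatever the `diff` module actually uses.

```rust
/// One step of a hypothetical edit script over two entry vectors.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Hunk<'a> {
    Removed(&'a str),
    Added(&'a str),
}

/// Diff two entry vectors using a longest-common-subsequence table.
/// ASSUMPTION: illustrative O(n * m) approach, chosen for brevity.
pub fn diff_entries<'a>(old: &[&'a str], new: &[&'a str]) -> Vec<Hunk<'a>> {
    let (n, m) = (old.len(), new.len());

    // lcs[i][j] is the LCS length of old[i..] and new[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if old[i] == new[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    // Walk the table from the start, emitting only the entries that differ.
    let (mut i, mut j) = (0, 0);
    let mut hunks = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            hunks.push(Hunk::Removed(old[i]));
            i += 1;
        } else {
            hunks.push(Hunk::Added(new[j]));
            j += 1;
        }
    }

    // Whatever is left on either side has no counterpart.
    hunks.extend(old[i..].iter().map(|&s| Hunk::Removed(s)));
    hunks.extend(new[j..].iter().map(|&s| Hunk::Added(s)));
    hunks
}
```

Diffing `["f", "o", "o"]` against `["f", "o", "x"]` reports only the last entry as changed, while diffing `["foo"]` against `["fox"]` reports the whole node, which is the precision difference the grapheme option controls.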

afnanenayet merged commit b89d2fa into main Mar 21, 2022

afnanenayet deleted the afnan/input-processing-cfg-refactor branch March 21, 2022 03:07
