feat: Configure splitting on graphemes #314
Merged
Allow configuring whether the input tree is split into graphemes. This
enables a trade-off between performance and precision. Splitting on
graphemes produces more granular diffs, but it requires allocating a
metadata struct for every Unicode grapheme in the document and splitting
each one out. Skipping this step is cheaper, but then the diff compares
the entire text of each node as a single unit.
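A minimal Rust sketch of the trade-off, under stated assumptions. The `InputConfig` struct, the `split_graphemes` flag, `Entry`, and `node_entries` are hypothetical names, not the PR's actual API. To stay dependency-free, the fine-grained path splits on Unicode scalar values (`char`s). This is a simplification: true grapheme clusters, such as an emoji with a skin-tone modifier, need proper segmentation, for example the `unicode-segmentation` crate's `graphemes(true)`.

```rust
/// Hypothetical input-processing option; the real option name may differ.
#[derive(Clone, Copy, Debug)]
pub struct InputConfig {
    pub split_graphemes: bool,
}

/// One comparable unit of text plus its byte offset in the source file.
/// Hypothetical stand-in for the per-grapheme metadata struct.
#[derive(Debug, PartialEq)]
pub struct Entry<'a> {
    pub text: &'a str,
    pub start_byte: usize,
}

/// Turn the text of one node into diffable entries.
/// `node_start` is the node's byte offset within the whole document.
pub fn node_entries<'a>(node_text: &'a str, node_start: usize, cfg: InputConfig) -> Vec<Entry<'a>> {
    if cfg.split_graphemes {
        // Fine-grained path: one entry (and one allocation slot) per unit.
        // Simplification: splits on `char`s, not full grapheme clusters.
        node_text
            .char_indices()
            .map(|(i, c)| Entry {
                text: &node_text[i..i + c.len_utf8()],
                start_byte: node_start + i,
            })
            .collect()
    } else {
        // Coarse path: the whole node is compared as a single entry.
        vec![Entry { text: node_text, start_byte: node_start }]
    }
}
```

With the flag on, `node_entries("let x = 1;", 0, cfg)` yields ten one-character entries; with it off, it yields a single entry covering the whole node. The coarse mode is cheaper for the same reason it is less precise.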
This also does some organizational refactoring to clean up the former
`ast` module. The input processing method and its supporting functions
have been cleaned up. The method that actually computes the diff
between two entry vectors has been moved into the `diff` module, which
seems more appropriate.
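To illustrate what computing the diff between two entry vectors involves, here is a sketch that uses a textbook longest-common-subsequence (LCS) table. The PR does not say which algorithm the `diff` module uses, so treat both the technique and the names `Edit` and `diff_entries` as assumptions. Entries are reduced to plain `&str` for brevity.

```rust
/// One step of an edit script (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Edit<'a> {
    Keep(&'a str),
    Delete(&'a str),
    Insert(&'a str),
}

/// Diff two entry vectors with an LCS dynamic-programming table.
pub fn diff_entries<'a>(old: &[&'a str], new: &[&'a str]) -> Vec<Edit<'a>> {
    let (n, m) = (old.len(), new.len());
    // lcs[i][j] = length of the LCS of old[i..] and new[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if old[i] == new[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    // Walk the table from the front, emitting the edit script.
    let (mut i, mut j) = (0, 0);
    let mut edits = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            edits.push(Edit::Keep(old[i]));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            edits.push(Edit::Delete(old[i]));
            i += 1;
        } else {
            edits.push(Edit::Insert(new[j]));
            j += 1;
        }
    }
    // Whatever remains on either side is a pure deletion or insertion.
    edits.extend(old[i..].iter().map(|&s| Edit::Delete(s)));
    edits.extend(new[j..].iter().map(|&s| Edit::Insert(s)));
    edits
}
```

The table costs O(n*m) time and memory, so diff tools usually prefer something like Myers' algorithm for long inputs. The LCS version is shown only because it is short and easy to check.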