
Add unicode_word_indices #91

Merged

basile-henry
Contributor

The PR adds a new iterator: UnicodeWordIndices (and the function unicode_word_indices). It is similar to UnicodeWords but also provides byte offsets for each word.
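The sketch below shows the shape of such an iterator using only the standard library. It is not the crate's implementation. The hypothetical `SimpleWordIndices` type treats any maximal run of alphanumeric characters as a word, rather than applying the UAX #29 word-boundary rules that `unicode-segmentation` implements. It does yield the same kind of item: a `(usize, &str)` pair whose first element is the word's byte offset into the original string.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

/// Hypothetical, simplified stand-in for an iterator like `UnicodeWordIndices`.
/// A "word" here is a maximal run of alphanumeric chars (NOT the UAX #29 rules).
pub struct SimpleWordIndices<'a> {
    source: &'a str,
    chars: Peekable<CharIndices<'a>>,
}

impl<'a> SimpleWordIndices<'a> {
    pub fn new(source: &'a str) -> Self {
        SimpleWordIndices { source, chars: source.char_indices().peekable() }
    }
}

impl<'a> Iterator for SimpleWordIndices<'a> {
    // (byte offset of the word's first char, the word itself)
    type Item = (usize, &'a str);

    fn next(&mut self) -> Option<Self::Item> {
        // Skip non-word characters until a word starts (or the input ends).
        let start = loop {
            let (i, c) = self.chars.next()?;
            if c.is_alphanumeric() {
                break i;
            }
        };
        // Consume the rest of the word; `end` is the byte index just past it.
        let mut end = self.source.len();
        while let Some(&(i, c)) = self.chars.peek() {
            if !c.is_alphanumeric() {
                end = i;
                break;
            }
            self.chars.next();
        }
        Some((start, &self.source[start..end]))
    }
}

fn main() {
    let text = "Hello, wörld 42!";
    for (offset, word) in SimpleWordIndices::new(text) {
        // Each offset is a byte index, so slicing from it gives the word back,
        // even after the two-byte 'ö'.
        assert_eq!(&text[offset..offset + word.len()], word);
        println!("{offset}: {word}");
    }
}
```

Byte offsets, rather than char counts, are what make the result usable for slicing the original `&str` directly. In the real crate the method is reached through the `UnicodeSegmentation` trait, and its word boundaries differ from this simplified rule.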

The motivation for this PR came from nushell/reedline#5, where I used split_word_bound_indices and then filtered the result with logic that is internal to unicode_words. I believe that PR would have been trivial with unicode_word_indices, and hopefully it will be useful to others too.
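The workaround described above has two steps: take every boundary segment along with its offset, then keep only the segments that look like words. The std-only sketch below illustrates this under stated assumptions. `simple_bound_indices` is a hypothetical and much-simplified stand-in for `split_word_bound_indices` that does not implement UAX #29. The filter keeps segments containing at least one alphanumeric character, which is roughly the check `unicode_words` applies internally.

```rust
/// Hypothetical, simplified stand-in for `split_word_bound_indices`:
/// yields every segment with its byte offset, where a segment is either a
/// maximal run of alphanumeric chars or a single other char (NOT UAX #29).
fn simple_bound_indices(s: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut run_start: Option<usize> = None;
    for (i, c) in s.char_indices() {
        if c.is_alphanumeric() {
            // Open a new alphanumeric run if we are not already inside one.
            if run_start.is_none() {
                run_start = Some(i);
            }
        } else {
            // Close any open run, then emit this char as its own segment.
            if let Some(start) = run_start.take() {
                out.push((start, &s[start..i]));
            }
            out.push((i, &s[i..i + c.len_utf8()]));
        }
    }
    // A run that reaches the end of the input is still a segment.
    if let Some(start) = run_start {
        out.push((start, &s[start..]));
    }
    out
}

/// The filtering workaround: keep only segments that contain at least one
/// alphanumeric char, so spaces and punctuation are dropped.
fn word_indices_by_filtering(s: &str) -> Vec<(usize, &str)> {
    simple_bound_indices(s)
        .into_iter()
        .filter(|(_, seg)| seg.chars().any(char::is_alphanumeric))
        .collect()
}

fn main() {
    let text = "The quick (\"brown\") fox";
    println!("{:?}", word_indices_by_filtering(text));
}
```

With unicode_word_indices, the second step is part of the library, so callers no longer need to reproduce the filter themselves.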

Should I add more tests for unicode_word_indices? Or are the existing tests for unicode_words and the doc test for unicode_word_indices sufficient?

The iterator UnicodeWordIndices is similar to UnicodeWords but also provides byte offsets for each word.
Manishearth closed this Mar 7, 2021
Manishearth reopened this Mar 7, 2021
Manishearth
Member

Retriggering GitHub Actions (GHA)

Manishearth merged commit cea3ce6 into unicode-rs:master Mar 9, 2021
basile-henry deleted the basile/unicode-word-indices branch March 9, 2021 06:54
2 participants