Database#createFTS5Tokenizer API #944

Open
wants to merge 1 commit into
base: master

indutny-signal

FTS5 doesn't handle CJK text well, or non-Latin locales in general. The easiest way to add support is to use the Intl global object available in V8 to segment the UTF-8 string into words with ICU. This pull request adds an API that exposes Intl.Segmenter to FTS5 as a custom tokenizer; alternatively, you can use it to implement your own tokenizer from scratch.
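To illustrate the segmentation step only, here is a minimal sketch of splitting text into word tokens with Intl.Segmenter. The helper name `segmentWords` and the shape of the returned objects are hypothetical and chosen for illustration. The sketch does not call the API added by this pull request, whose signature is not shown here. Offsets are counted in UTF-8 bytes on the assumption that a tokenizer must report positions in the encoded text.

```javascript
// Hypothetical helper, not part of this PR's API: splits text into
// word-like tokens and records each token's UTF-8 byte range.
function segmentWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const tokens = [];
  let byteOffset = 0;

  // Segments cover the whole input in order, so byte lengths can be
  // accumulated to obtain each segment's starting offset.
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    const byteLength = Buffer.byteLength(segment, 'utf8');
    if (isWordLike) {
      // Skip whitespace and punctuation; lowercase for case-insensitive matching.
      tokens.push({
        token: segment.toLowerCase(),
        start: byteOffset,
        end: byteOffset + byteLength,
      });
    }
    byteOffset += byteLength;
  }
  return tokens;
}

// Example call; for CJK input the word boundaries depend on the ICU
// dictionary data shipped with the runtime.
console.log(segmentWords('Hello, world!', 'en'));
```

Lowercasing and dropping non-word segments are design choices of this sketch; a real tokenizer might instead normalize diacritics or apply a stemmer at the same point.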

valstu commented Sep 4, 2024

This would be a great addition. With this, one could easily implement something like a Snowball stemmer for FTS5 👍
