Tokenizer 1.19.0
New features
- Add BPE dropout (Provilkov et al. 2019)
- [Python] Introduce the "Token API": a set of methods that manipulate `Token` objects instead of serialized strings
- [Python] Add `unicode_ranges` argument to the `detokenize_with_ranges` method to return ranges over Unicode characters instead of bytes
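
To illustrate why ranges over Unicode characters differ from ranges over bytes, here is a minimal standalone sketch in plain Python. It does not call the Tokenizer library; the helper name `char_range_to_byte_range` is hypothetical, and the inclusive `(start, end)` convention is an assumption made for this example only.

```python
def char_range_to_byte_range(text, start, end):
    """Map an inclusive character range to an inclusive UTF-8 byte range.

    Hypothetical helper for illustration; the inclusive (start, end)
    convention is an assumption, not taken from the library.
    """
    # Number of bytes preceding the first character of the range.
    byte_start = len(text[:start].encode("utf-8"))
    # Number of bytes up to and including the last character, minus one.
    byte_end = len(text[: end + 1].encode("utf-8")) - 1
    return byte_start, byte_end


text = "café noir"
# "café" spans characters 0..3, but "é" takes 2 bytes in UTF-8,
# so the same token spans one more byte than it has characters.
cafe_bytes = char_range_to_byte_range(text, 0, 3)
noir_bytes = char_range_to_byte_range(text, 5, 8)
```

Character ranges can be used directly to slice a Python `str`, while byte ranges apply to the UTF-8 encoded `bytes` object.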
Fixes and improvements
- Include "Half-width kana" in Katakana script detection
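
As a rough illustration of what "half-width kana" refers to, the sketch below uses only Python's standard `unicodedata` module to tell half-width katakana apart from regular katakana. The function name `is_halfwidth_katakana` is hypothetical, and this is not how the Tokenizer implements script detection.

```python
import unicodedata


def is_halfwidth_katakana(ch):
    """Return True if the Unicode name marks `ch` as half-width katakana.

    Hypothetical helper for illustration only.
    """
    # Characters without a name fall back to the empty string.
    return "HALFWIDTH KATAKANA" in unicodedata.name(ch, "")


halfwidth_ka = "\uff76"  # half-width katakana KA
fullwidth_ka = "\u30ab"  # regular katakana KA

# NFKC normalization folds the half-width form into the regular one.
normalized = unicodedata.normalize("NFKC", halfwidth_ka)
```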