Tokenizer 1.19.0

@guillaumekln released this 02 Sep 09:17

New features

  • Add BPE dropout (Provilkov et al. 2019)
  • [Python] Introduce the "Token API": a set of methods that manipulate Token objects instead of serialized strings
  • [Python] Add unicode_ranges argument to the detokenize_with_ranges method to return ranges over Unicode characters instead of bytes
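
To illustrate the difference between ranges over bytes and ranges over Unicode characters mentioned in the last item, here is a minimal sketch in plain Python. It does not call the Tokenizer library: `byte_range_to_char_range` is a hypothetical helper, and the half-open `(start, end)` convention is an assumption made for this example, not necessarily the convention used by `detokenize_with_ranges`.

```python
def byte_range_to_char_range(text, start, end):
    """Convert a half-open UTF-8 byte range into a Unicode character range.

    Hypothetical helper for illustration only. Both offsets are assumed to
    fall on character boundaries; otherwise decoding raises UnicodeDecodeError.
    """
    data = text.encode("utf-8")
    # The number of characters before a byte offset is the length of the
    # decoded prefix that ends at that offset.
    char_start = len(data[:start].decode("utf-8"))
    char_end = len(data[:end].decode("utf-8"))
    return char_start, char_end


text = "日本語 text"

# Each of the three CJK characters takes 3 bytes in UTF-8.
print(byte_range_to_char_range(text, 0, 9))    # (0, 3)
print(byte_range_to_char_range(text, 10, 14))  # (4, 8)

# Character ranges can be used directly to slice a Python str.
start, end = byte_range_to_char_range(text, 10, 14)
print(text[start:end])  # text
```

For ASCII-only text the two kinds of range coincide. They diverge as soon as the text contains multi-byte characters, which is where character ranges become convenient for slicing Python strings.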

Fixes and improvements

  • Include "Half-width kana" in Katakana script detection
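
As a rough illustration of what including half-width kana can mean, the sketch below checks code points against Unicode block ranges. It is not the library's detection logic: `is_katakana` is a hypothetical function, and using block ranges as a stand-in for script detection is a simplification (a few code points in these blocks, such as the prolonged sound mark, are shared punctuation rather than Katakana proper).

```python
# Unicode ranges used as an approximation of "Katakana" in this sketch.
KATAKANA_RANGES = (
    (0x30A0, 0x30FF),  # Katakana block
    (0x31F0, 0x31FF),  # Katakana Phonetic Extensions
    (0xFF65, 0xFF9F),  # Half-width Katakana variants
)


def is_katakana(char):
    """Return True if the character falls in one of the ranges above."""
    code_point = ord(char)
    return any(low <= code_point <= high for low, high in KATAKANA_RANGES)


print(is_katakana("\u30ab"))  # full-width KA: True
print(is_katakana("\uff76"))  # half-width KA: True
print(is_katakana("\u304b"))  # Hiragana KA: False
```

Without the third range, half-width characters such as U+FF76 would not be recognized as Katakana by a check of this kind.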