Tokenizer 1.25.0

@guillaumekln guillaumekln released this 15 Mar 09:20

New features

  • Add training flag in tokenization methods to disable subword regularization during inference
  • [Python] Implement __len__ method in the Token class
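
As a rough illustration of the new training flag, the sketch below wraps a tokenizer so that inference code always turns subword regularization off. The helper name tokenize_for_inference is hypothetical, and the only assumption about the tokenizer is that its tokenize method accepts a training keyword argument, as this release describes.

```python
def tokenize_for_inference(tokenizer, text):
    """Tokenize `text` with subword regularization disabled.

    `tokenizer` is assumed to be a pyonmttok.Tokenizer, or any object whose
    `tokenize` method accepts a `training` keyword argument.
    """
    # training=False requests inference-time tokenization, so sampling-based
    # subword regularization is not applied even if it is configured.
    return tokenizer.tokenize(text, training=False)


# Assumed usage (requires pyonmttok and a SentencePiece model; the model
# path and sampling values here are hypothetical):
#
#   import pyonmttok
#   tokenizer = pyonmttok.Tokenizer(
#       "none", sp_model_path="sp.model", sp_nbest_size=64, sp_alpha=0.1)
#   result = tokenize_for_inference(tokenizer, "Hello world!")
```

Keeping the flag in one helper means training pipelines can continue to call tokenize with its default behavior while serving code gets deterministic output.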

Fixes and improvements

  • Raise an error when enabling case_markup with incompatible tokenization modes "space" and "none"
  • [Python] Improve parallelization when Tokenizer.tokenize is called from multiple Python threads (the Python GIL is now released)
  • [Python] Clean up some manual Python <-> C++ type conversions
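
Because the GIL is now released during Tokenizer.tokenize, a plain thread pool may be enough to tokenize many lines in parallel. The following is a minimal sketch under that assumption; tokenize_in_threads is a hypothetical helper, and it accepts any callable so that it does not depend on a particular pyonmttok version.

```python
from concurrent.futures import ThreadPoolExecutor


def tokenize_in_threads(tokenize, lines, num_threads=4):
    """Apply `tokenize` to each line using a pool of Python threads.

    `tokenize` is any callable taking one string, for example the bound
    method `tokenizer.tokenize` of a pyonmttok.Tokenizer (an assumption;
    it is not imported here).
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the same order as the input lines.
        return list(pool.map(tokenize, lines))


# Assumed usage with pyonmttok installed:
#
#   import pyonmttok
#   tokenizer = pyonmttok.Tokenizer("conservative", joiner_annotate=True)
#   results = tokenize_in_threads(tokenizer.tokenize, ["Hello world!", "Bye."])
```

Threads only help here when the work inside the callable releases the GIL; with a pure-Python callable the calls would still run one at a time.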