English | 中文
DeepSeek Tokenizer is an efficient and lightweight tokenization libraries which doesn't require heavy dependencies like the transformers
library, DeepSeek Tokenizer solely relies on the tokenizers
library, making it a more streamlined and efficient choice for tokenization tasks.
To install DeepSeek Tokenizer, use the following command:
pip install deepseek_tokenizer
Below is a simple example demonstrating how to use DeepSeek Tokenizer to encode text:
from deepseek_tokenizer import ds_token
# Sample text
text = "Hello! 毕老师!1 + 1 = 2 ĠÑĤвÑĬÑĢ"
# Encode text
result = ds_token.encode(text)
# Print result
print(result)
[19923, 3, 223, 5464, 5008, 1175, 19, 940, 223, 19, 438, 223, 20, 6113, 257, 76589, 131, 100, 76032, 1628, 76589, 131, 108, 76589, 131, 98]
This project is licensed under the MIT License.