May I ask which tool you used as the token calculator?
> May I ask which tool you used as the token calculator?

Via `tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)`. The result matches the web-based token calculator at https://dashscope.console.aliyun.com/tokenizer:
Model Series
Qwen2.5
What are the models used?
Qwen/Qwen2.5-3B-Instruct
What is the scenario where the problem happened?
transformers
Is this bad case known and can it be solved using available techniques?
Information about environment
OS: Ubuntu 22.04
Python: Python 3.10
GPUs: 8 x NVIDIA A100
NVIDIA driver: 535 (from nvidia-smi)
CUDA compiler: 12.1 (from nvcc -V)
PyTorch: 2.2.1+cu121 (from python -c "import torch; print(torch.__version__)")
Description
Steps to reproduce
import torch
from transformers import AutoTokenizer

# Load the tokenizer
model_name_or_path = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# Example text
text = " 那明天呢?"

# Inspect each token
tokens = tokenizer.tokenize(text)
print("Tokenized Text:", tokens)

# Convert to token IDs
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print("Token IDs:", token_ids)

# Decode back to text
decoded_text = tokenizer.decode(token_ids)
print("Decoded Text:", decoded_text)
Encoded Text: " 那明天呢?"
Tokenized Text: ['Ġé', 'Ĥ', '£', 'æĺİ天', 'åij¢', '?']
Token IDs: [18137, 224, 96, 104807, 101036, 30]
Decoded Text: 那明天呢?
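The odd-looking pieces 'Ġé', 'Ĥ', '£' are an artifact of byte-level BPE display: the tokenizer first maps every UTF-8 byte to a printable Unicode character (GPT-2 style), so the leading space plus 那 (bytes 0x20 0xE9 0x82 0xA3) surface as Ġ, é, Ĥ, £ before merges are applied. A minimal sketch of that mapping (a pure-Python reimplementation of the standard GPT-2 byte-to-unicode table, not Qwen's actual tokenizer code):

```python
def bytes_to_unicode():
    """GPT-2 style map from each byte (0-255) to a printable Unicode character."""
    # Bytes that already render as printable characters map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\xa1"), ord("\xac") + 1))
          + list(range(ord("\xae"), ord("\xff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Shift non-printable bytes (space, control chars, ...) up past 255.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

byte2char = bytes_to_unicode()
char2byte = {c: b for b, c in byte2char.items()}

text = " 那"
mapped = "".join(byte2char[b] for b in text.encode("utf-8"))
print(mapped)  # ĠéĤ£ — the characters seen in the tokenized output above

# Decoding reverses the byte mapping, recovering the text, leading space included.
restored = bytes(char2byte[c] for c in mapped).decode("utf-8")
print(restored == text)  # True
```

So 那 being split across 'Ġé', 'Ĥ', '£' only means the learned BPE merges did not fuse those three bytes into a single token in this context; decode still round-trips the underlying bytes exactly.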