[Hackathon 7th No.43] Improve TokenizerFast feature support, part 1 #9407
Thanks for your contribution!
451d07d to 6d95920
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9407      +/-   ##
===========================================
- Coverage    53.08%   52.96%   -0.12%
===========================================
  Files          687      689       +2
  Lines       109472   109412      -60
===========================================
- Hits         58114    57952     -162
- Misses       51358    51460     +102
☔ View full report in Codecov by Sentry.
5f1403e to 7161b60
cc: @DrownFish19 Could you please take another look at this PR? Thanks.
@@ -0,0 +1,131 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
Please also add the HuggingFace copyright notice here.
("bloom", "BloomTokenizer"),
(
"bloom",
("BloomTokenizer", "BloomTokenizerFast" if is_tokenizers_available() else None),
I suggest breaking this across lines here so the formatting stays consistent.
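For reference, a minimal sketch of the pattern in this hunk, laid out with one element per line: the fast class name is registered only when the optional `tokenizers` package can be imported. The body of `is_tokenizers_available`, the name `TOKENIZER_MAPPING_NAMES`, and the helper `pick_tokenizer_name` are assumptions made for illustration and are not taken from the PR.

```python
import importlib.util
from collections import OrderedDict


def is_tokenizers_available() -> bool:
    # Assumed stand-in for the helper used in the diff:
    # True when the optional `tokenizers` package is importable.
    return importlib.util.find_spec("tokenizers") is not None


# Hypothetical mapping name; each value is (slow class name, fast class name or None).
TOKENIZER_MAPPING_NAMES = OrderedDict(
    [
        (
            "bloom",
            (
                "BloomTokenizer",
                "BloomTokenizerFast" if is_tokenizers_available() else None,
            ),
        ),
    ]
)


def pick_tokenizer_name(model_type: str, use_fast: bool = True) -> str:
    """Return the fast class name when requested and registered, else the slow one."""
    slow_name, fast_name = TOKENIZER_MAPPING_NAMES[model_type]
    if use_fast and fast_name is not None:
        return fast_name
    return slow_name
```

With this layout, a caller that asks for the fast tokenizer silently falls back to the slow class name when `tokenizers` is not installed.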
"LlamaTokenizer": LlamaConverter,
"BertTokenizer": BertConverter,
}
SLOW_TO_FAST_CONVERTERS = {"LlamaTokenizer": LlamaConverter, "BertTokenizer": BertConverter}
Can the converters here be used generically? This can be verified in a follow-up.
No Bloom converter should have been added here: as far as I can see, on HF Bloom only has a fast tokenizer, so there is no conversion flow for it.
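To make the lookup under discussion concrete, here is a rough sketch of a converter registry keyed by slow-tokenizer class name. The tokenizer and converter classes are empty stand-ins and `convert_slow_tokenizer` is a hypothetical helper, not PaddleNLP's actual code. It only illustrates that a class with no registry entry (Bloom in this sketch) has to be handled explicitly.

```python
class LlamaTokenizer:
    """Empty stand-in for a slow tokenizer class."""


class BertTokenizer:
    """Empty stand-in for a slow tokenizer class."""


class BloomTokenizer:
    """Empty stand-in; deliberately has no converter registered below."""


class _StubConverter:
    """Illustrative converter: a real one would build a fast tokenizer backend."""

    def __init__(self, slow_tokenizer):
        self.slow_tokenizer = slow_tokenizer

    def converted(self) -> str:
        return f"fast backend built from {type(self.slow_tokenizer).__name__}"


class LlamaConverter(_StubConverter):
    pass


class BertConverter(_StubConverter):
    pass


SLOW_TO_FAST_CONVERTERS = {"LlamaTokenizer": LlamaConverter, "BertTokenizer": BertConverter}


def convert_slow_tokenizer(slow_tokenizer) -> str:
    """Hypothetical helper: look up the converter by the slow tokenizer's class name."""
    name = type(slow_tokenizer).__name__
    converter_cls = SLOW_TO_FAST_CONVERTERS.get(name)
    if converter_cls is None:
        # No conversion flow registered for this class.
        raise ValueError(f"No slow-to-fast converter registered for {name}")
    return converter_cls(slow_tokenizer).converted()
```

In this sketch each converter is tied to one class name, so whether a converter is reusable for another tokenizer would still have to be checked case by case.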
LGTM
PR types
Others
PR changes
Models
Description
Adds fast tokenizer support for Bloom. One question: do I need to add a fast-tokenizer test for every def in the tests? Thanks 🙏
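One possible way to cover both implementations without duplicating every test method is to loop over them with `subTest`. This is only a sketch: `SlowStub` and `FastStub` are toy stand-ins for the slow and fast Bloom tokenizers, and the thread does not say whether the maintainers expect this layout.

```python
import unittest


class SlowStub:
    """Toy stand-in for the slow tokenizer (assumption, not the real class)."""

    def tokenize(self, text):
        return text.split()


class FastStub:
    """Toy stand-in for the fast tokenizer (assumption, not the real class)."""

    def tokenize(self, text):
        return text.split()


class TokenizerParityTest(unittest.TestCase):
    # Every test method iterates over both implementations.
    tokenizer_classes = (SlowStub, FastStub)

    def test_tokenize(self):
        for cls in self.tokenizer_classes:
            # subTest reports a failure per implementation instead of stopping at the first.
            with self.subTest(tokenizer=cls.__name__):
                self.assertEqual(cls().tokenize("hello world"), ["hello", "world"])
```

Each existing `def` would then gain a loop rather than a second copy of the test.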