[LLM INFER] Fix some bugs and chatglm_v2 support block_attn #9271
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #9271      +/-   ##
===========================================
+ Coverage    52.80%    52.89%   +0.08%
===========================================
  Files          660       660
  Lines       106869    106929      +60
===========================================
+ Hits         56434     56561     +127
+ Misses       50435     50368      -67

☔ View full report in Codecov by Sentry.
Force-pushed from f3b2d99 to 7551730 (Compare)
else:
    return 8192  # Maximum sequence length.
total_max_length: int = field(
    default=4096, metadata={"help": "Super parameter. Maximum sequence length(encoder+decoder)."}
Has this been confirmed with the NPU colleagues?
Confirmed, no issues.
@@ -520,7 +520,7 @@ def _preprocess(self, source):
     alibi_slopes = llm_utils.get_alibi_slopes(self.model_config.n_head)
     inputs["position_ids"] = paddle.to_tensor(alibi_slopes, dtype="float32")
     arange_tensor_encoder = paddle.arange(self.config.total_max_length, dtype=self.config.dtype)
-    alibi = alibi_slopes[None, :, None, None] * arange_tensor_encoder
+    alibi = (alibi_slopes[None, :, None, None] * arange_tensor_encoder).astype(self.config.dtype)
Hmm, is relying on config dtype safe here? Users can change that value. How about using the dtype of one of the tensors inside instead?
This dtype does indeed need to stay consistent with config.dtype.
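To see why the product needs an explicit cast, here is a minimal NumPy sketch of the broadcast in `_preprocess`. The shapes and values are hypothetical, and NumPy stands in for Paddle tensors; the real code uses `llm_utils.get_alibi_slopes` and `paddle.arange`:

```python
import numpy as np

# Hypothetical stand-ins for the model config values.
n_head = 8
total_max_length = 16
config_dtype = "float16"  # the configured inference dtype

# ALiBi slopes come back in float32 (one slope per attention head).
alibi_slopes = 1.0 / (2.0 ** np.arange(1, n_head + 1, dtype="float32"))
arange_tensor_encoder = np.arange(total_max_length, dtype=config_dtype)

# Broadcasting float32 slopes against the float16 range promotes the
# product to float32, so it is cast back to the configured dtype,
# mirroring the change in the diff above.
alibi = (alibi_slopes[None, :, None, None] * arange_tensor_encoder).astype(config_dtype)
```

Without the final `astype`, the bias tensor would silently stay float32 even when the model runs in float16, which is the mismatch the cast in the diff avoids.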
model = Model.from_pretrained(
predictor_args.total_max_length = config.seq_length
if predictor_args.block_attn:
Hmm, I'd suggest putting block_attn into the config as an attribute and letting ChatGLMv2InferenceModel control it itself. If we change it here, too many models will need this kind of modification later.
But strictly speaking this doesn't belong in each model's config. If we added it to something like LlamaConfig, every model's config would need it too. Let's keep it this way for now; we'll look for a better approach in a later refactor.
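The behavior under discussion can be sketched as follows. `InferenceConfig` and `resolve_total_max_length` are hypothetical names for illustration, not actual PaddleNLP APIs; the logic mirrors the PR's change of capping total_max_length at the model's sequence length when block attention is enabled, and falling back to the 8192 default otherwise:

```python
from dataclasses import dataclass


@dataclass
class InferenceConfig:
    # Hypothetical config carrying the fields the predictor reads.
    seq_length: int = 4096
    block_attn: bool = False


def resolve_total_max_length(config: InferenceConfig) -> int:
    # With block attention, cap at the model's own sequence length;
    # otherwise fall back to the 8192 maximum from the earlier diff.
    if config.block_attn:
        return config.seq_length
    return 8192
```

Keeping `block_attn` on a shared config, as the reviewer suggests, would let each InferenceModel read it directly instead of the predictor special-casing every model; the author's counterpoint is that the flag would then have to be added to every per-model config class.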
PR types
New features
PR changes
Models
Description