
[LLM INFER] Fix some bugs and chatglm_v2 support block_attn #9271

Merged
merged 6 commits into PaddlePaddle:develop on Oct 25, 2024

Conversation

@yuanlehome (Collaborator) commented Oct 15, 2024

PR types

New features

PR changes

Models

Description

  • chatglm_v2 now supports block_attn mode, though its accuracy still needs to be aligned
  • Fix a number of previously disabled unit tests
  • Slightly clean up the network-building code
  • Add the USE_FASTER_TOP_P_SAMPLING environment variable to switch to the better-performing top_p_sampling operator (see the sketch below)
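
Regarding the last bullet, here is a minimal sketch of what such an environment-variable switch amounts to. Only the USE_FASTER_TOP_P_SAMPLING name comes from this PR; the function name, the reference fallback, and the elided fused-operator branch are illustrative assumptions, not PaddleNLP's actual implementation.

```python
import os

import paddle

def sample_top_p(probs: paddle.Tensor, top_p: float) -> paddle.Tensor:
    """Sample one token id per row of `probs` under a top-p (nucleus) cutoff."""
    if os.getenv("USE_FASTER_TOP_P_SAMPLING", "False") == "True":
        # The PR routes this case to a fused top_p_sampling operator; its
        # exact call signature is not shown in this thread, so it is elided.
        raise NotImplementedError("fused top_p_sampling operator path")
    # Reference fallback: sort, truncate to the nucleus, renormalize, sample.
    sorted_probs = paddle.sort(probs, axis=-1, descending=True)
    sorted_idx = paddle.argsort(probs, axis=-1, descending=True)
    cumulative = paddle.cumsum(sorted_probs, axis=-1)
    # Drop a token once the mass *before* it already exceeds top_p, which
    # always keeps at least the single most likely token.
    drop = (cumulative - sorted_probs) > top_p
    kept = paddle.where(drop, paddle.zeros_like(sorted_probs), sorted_probs)
    kept = kept / kept.sum(axis=-1, keepdim=True)
    choice = paddle.multinomial(kept, num_samples=1)
    return paddle.take_along_axis(sorted_idx, choice, axis=-1)

# Usage: probs has shape [batch, vocab]; the result has shape [batch, 1].
probs = paddle.nn.functional.softmax(paddle.randn([2, 32000]), axis=-1)
token_ids = sample_top_p(probs, top_p=0.9)
```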

paddle-bot (bot) commented Oct 15, 2024

Thanks for your contribution!

@yuanlehome marked this pull request as draft October 15, 2024 07:39
codecov (bot) commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 104 lines in your changes missing coverage. Please review.

Project coverage is 52.89%. Comparing base (7551730) to head (d19ed92).
Report is 263 commits behind head on develop.

Files with missing lines                               | Patch % | Lines
...p/experimental/transformers/chatglm_v2/modeling.py  | 0.00%   | 84 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py  | 0.00%   | 15 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py  | 0.00%   | 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9271      +/-   ##
===========================================
+ Coverage    52.80%   52.89%   +0.08%     
===========================================
  Files          660      660              
  Lines       106869   106929      +60     
===========================================
+ Hits         56434    56561     +127     
+ Misses       50435    50368      -67     

☔ View full report in Codecov by Sentry.

@yuanlehome reopened this Oct 24, 2024
@yuanlehome marked this pull request as ready for review October 24, 2024 06:55
@yuanlehome changed the title [LLM INFER] chatglm_v2 support block_attn → [LLM INFER] Fix some bugs and chatglm_v2 support block_attn Oct 24, 2024
    else:
        return 8192  # Maximum sequence length.

    total_max_length: int = field(
        default=4096, metadata={"help": "Super parameter. Maximum sequence length(encoder+decoder)."}
Collaborator

Has this been confirmed with the colleagues working on the NPU side?

Collaborator Author

Confirmed, no problem.
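
As context for the hunk above: 8192 acts as a hard fallback for the maximum sequence length, while total_max_length defaults to 4096. A sketch of the shape such a resolver usually takes, assuming hypothetical attribute names; only the 8192 fallback comes from the diff.

```python
def resolve_max_seq_len(model_config) -> int:
    # Hypothetical resolver: prefer an explicit length recorded in the model
    # config under any of its common names, else fall back to a hard ceiling.
    for attr in ("max_position_embeddings", "seq_length", "max_sequence_length"):
        value = getattr(model_config, attr, None)
        if value is not None:
            return value
    return 8192  # Maximum sequence length (the diff's else branch).
```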

@@ -520,7 +520,7 @@ def _preprocess(self, source):
     alibi_slopes = llm_utils.get_alibi_slopes(self.model_config.n_head)
     inputs["position_ids"] = paddle.to_tensor(alibi_slopes, dtype="float32")
     arange_tensor_encoder = paddle.arange(self.config.total_max_length, dtype=self.config.dtype)
-    alibi = alibi_slopes[None, :, None, None] * arange_tensor_encoder
+    alibi = (alibi_slopes[None, :, None, None] * arange_tensor_encoder).astype(self.config.dtype)

Collaborator

Hmm, is relying on the config dtype safe here? Users can change that value. How about taking the dtype from one of the tensors involved instead?

Collaborator Author

This dtype does in fact need to stay consistent with config.dtype.
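
To illustrate the one-line fix above: the ALiBi slopes are produced in float32 while the model may run in half precision, so the bias tensor is cast once to config.dtype before use. A standalone sketch with simplified stand-in values (the slope values and dtype below are illustrative, not from the PR):

```python
import paddle

# Stand-ins: ALiBi slopes are built in float32; "float16" plays the role of
# self.config.dtype.
alibi_slopes = paddle.to_tensor([0.5, 0.25, 0.125, 0.0625], dtype="float32")
model_dtype = "float16"

arange_tensor_encoder = paddle.arange(16, dtype="float32")

# Compute the bias in float32 for precision, then cast once to the model
# dtype so it matches the attention inputs downstream -- the fix above.
alibi = (alibi_slopes[None, :, None, None] * arange_tensor_encoder).astype(model_dtype)
assert alibi.dtype == paddle.float16  # shape [1, 4, 1, 16]
```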


model = Model.from_pretrained(
predictor_args.total_max_length = config.seq_length
if predictor_args.block_attn:
Collaborator

Hmm, I suggest making block_attn an attribute of the config and letting ChatGLMv2InferenceModel handle it internally. If we change it here, too many models will need this kind of modification later on.

Collaborator Author

Strictly speaking, though, this does not belong in each model's Config; if it were added to, say, LlamaConfig, every model's Config would need it too. Let's keep it as is for now; we'll look for a better approach during a later refactor.
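
For reference, the reviewer's alternative would look roughly like this; apart from block_attn itself, the class names and fields are hypothetical stand-ins, not PaddleNLP's actual API:

```python
from dataclasses import dataclass

@dataclass
class ChatGLMv2InferenceConfig:  # hypothetical stand-in for the model config
    seq_length: int = 4096
    block_attn: bool = False  # the attribute the reviewer proposes adding

class ChatGLMv2InferenceModel:  # hypothetical stand-in
    def __init__(self, config: ChatGLMv2InferenceConfig):
        self.config = config
        # With block_attn living on the config, the model derives its own
        # total_max_length instead of the predictor patching it per model.
        self.total_max_length = config.seq_length if config.block_attn else 4096
```

The author's counterpoint is the trade-off this implies: every model's config class would have to grow the extra field.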

@qingqing01 qingqing01 merged commit 2e8b220 into PaddlePaddle:develop Oct 25, 2024
9 of 12 checks passed