
support quant ckpt limit strategy #9494

Merged
merged 5 commits into PaddlePaddle:develop on Nov 29, 2024

Conversation

@wtmlon wtmlon (Collaborator) commented Nov 26, 2024

PR types

PR changes

Description

Support limiting the number of times a compressed (quantized) checkpoint is re-compressed when training resumes.
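At a high level the change has two parts (a minimal sketch below, not the PR's exact code): the resume path compares how many times a quantized checkpoint has already been re-quantized against MAX_QUANTIZATION_TIMES and falls back to stage "O0" once the limit is exceeded, recording a "quant_reach_limit" flag in infohub; the save paths then consult that flag before writing a quantized optimizer state. The identifiers MAX_QUANTIZATION_TIMES, quant_ckpt_resume_times, infohub, ckpt_quant_stage, and "quant_reach_limit" come from the diff hunks further down; the limit value and the flag bookkeeping shown here are assumptions.

```python
# Minimal sketch of the limit strategy, based on the identifiers in the diff
# below. The default limit and the way the flag is recorded are assumptions.
MAX_QUANTIZATION_TIMES = 1  # assumed value for illustration only

infohub = {}  # stand-in for PaddleNLP's global infohub registry


def resolve_quant_stage_on_resume(requested_stage: str, quant_ckpt_resume_times: int) -> str:
    """Decide the effective checkpoint-quantization stage when resuming."""
    if quant_ckpt_resume_times > MAX_QUANTIZATION_TIMES:
        # Quantization count exceeds the limit: turn the strategy off and
        # remember that fact so the save paths also skip quantization.
        infohub["quant_reach_limit"] = True
        return "O0"
    return requested_stage


def stage_for_save(requested_stage: str) -> str:
    """Save paths fall back to "O0" once the limit flag has been recorded."""
    return requested_stage if "quant_reach_limit" not in infohub else "O0"
```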

paddle-bot bot commented Nov 26, 2024

Thanks for your contribution!

codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 11.42857% with 31 lines in your changes missing coverage. Please review.

Project coverage is 53.10%. Comparing base (8fd33a9) to head (12d84c6).
Report is 17 commits behind head on develop.

Files with missing lines Patch % Lines
...p/trainer/unified_checkpoint/unified_checkpoint.py 5.00% 19 Missing ⚠️
...lp/quantization/unified_checkpoint_quantization.py 15.38% 11 Missing ⚠️
paddlenlp/transformers/model_utils.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9494      +/-   ##
===========================================
+ Coverage    52.91%   53.10%   +0.19%     
===========================================
  Files          688      694       +6     
  Lines       109331   110989    +1658     
===========================================
+ Hits         57848    58940    +1092     
- Misses       51483    52049     +566     



+ # Quantization times exceeds the limit. Turn off the quantization strategy.
+ if quant_ckpt_resume_times > MAX_QUANTIZATION_TIMES:
+     ckpt_quant_stage = "O0"
@DesmonDay DesmonDay (Contributor) commented Nov 26, 2024
Putting the change to this switch here doesn't feel right. MAX_QUANTIZATION_TIMES is mainly meant to limit how many times you save a compressed checkpoint, so the change to ckpt_quant_stage should be synced into the save logic instead; changing it here in the load path has no effect, does it?

Contributor commented:
One remaining question here: this is the optimizer loading logic, so ckpt_quant_stage = "O0" should not be changed from the outside at this point; it should be read directly from the index saved with the checkpoint.
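For illustration of the reviewer's suggestion only, a sketch of reading the quantization stage from the index stored with the checkpoint rather than overriding it at load time. The index file name used here is an assumption; only the "ckpt_quant_stage" key is taken from the sharded_optim_index entry in the diff below.

```python
import json
import os


def read_saved_quant_stage(ckpt_dir: str) -> str:
    """Sketch: read ckpt_quant_stage from the optimizer index saved in the
    checkpoint directory. The file name below is an assumption; the
    "ckpt_quant_stage" key mirrors the sharded_optim_index entry in the diff."""
    index_path = os.path.join(ckpt_dir, "optimizer.safetensors.index.json")  # assumed name
    if not os.path.exists(index_path):
        return "O0"  # no index, so no quantization metadata to honor
    with open(index_path, "r", encoding="utf-8") as f:
        index = json.load(f)
    return index.get("ckpt_quant_stage", "O0")
```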

      # save opt index json if checkpoint quantization is on.
-     if self.args.ckpt_quant_stage != "O0":
+     if self.args.ckpt_quant_stage != "O0" and "quant_reach_limit" not in infohub:
          sharded_optim_index = {"ckpt_quant_stage": self.args.ckpt_quant_stage}

This comment was marked as resolved.

@@ -257,7 +267,7 @@ def save_non_merge_optimizer(self, model, optim_state_dict, master_weights, outp
          signal_path=signal_dir,
          is_sync=is_sync_save,
          state_dict_type="optimizer_weight",
-         ckpt_quant_stage=self.args.ckpt_quant_stage,
+         ckpt_quant_stage=self.args.ckpt_quant_stage if "quant_reach_limit" not in infohub else "O0",

This comment was marked as resolved.

@@ -379,7 +389,7 @@ def save_unified_optimizer(self, model, optimizer, output_dir, signal_dir):
          signal_path=signal_dir,
          is_sync=is_sync_save,
          state_dict_type="optimizer_weight",
-         ckpt_quant_stage=self.args.ckpt_quant_stage,
+         ckpt_quant_stage=self.args.ckpt_quant_stage if "quant_reach_limit" not in infohub else "O0",

This comment was marked as resolved.
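Taken together, the two save-path hunks only request quantized serialization while the "quant_reach_limit" flag is absent from infohub. How quant_ckpt_resume_times itself is tracked is not shown in the excerpts above; one plausible scheme, offered purely as an assumption, is to carry the counter inside the saved optimizer index and bump it on each resume:

```python
def bump_resume_counter(sharded_optim_index: dict) -> int:
    """Hypothetical bookkeeping for quant_ckpt_resume_times. Carrying the
    counter in the saved optimizer index is an assumption about the
    implementation; only the key names mirror identifiers from the diff."""
    times = sharded_optim_index.get("quant_ckpt_resume_times", 0)
    if sharded_optim_index.get("ckpt_quant_stage", "O0") != "O0":
        times += 1  # this resume consumed one more quantization round
    sharded_optim_index["quant_ckpt_resume_times"] = times
    return times
```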

@DesmonDay DesmonDay (Contributor) left a comment
LGTM

@wawltor wawltor merged commit 2985f90 into PaddlePaddle:develop Nov 29, 2024
9 of 12 checks passed
wtmlon added a commit to wtmlon/PaddleNLP that referenced this pull request Nov 29, 2024
* support quant ckpt limit strategy

* bug fix

* bug fix

* fix bug

* add log, fix bug
Conflicts:
	paddlenlp/utils/env.py