support quant ckpt limit strategy #9494
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #9494      +/-   ##
===========================================
+ Coverage    52.91%   53.10%   +0.19%
===========================================
  Files          688      694       +6
  Lines       109331   110989    +1658
===========================================
+ Hits         57848    58940    +1092
- Misses       51483    52049     +566

☔ View full report in Codecov by Sentry.
# Quantization count exceeds the limit; turn off the quantization strategy.
if quant_ckpt_resume_times > MAX_QUANTIZATION_TIMES:
    ckpt_quant_stage = "O0"
Writing the change to this switch here doesn't feel right. MAX_QUANTIZATION_TIMES mainly limits how many times you save a compressed checkpoint, so the change to ckpt_quant_stage should be synced into the save logic; changing it here in the load path has no effect, does it?
One remaining question here: this is the optimizer loading logic, so ckpt_quant_stage = "O0" should not be changed from the outside; it should be read directly from the index saved with the checkpoint.
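A hedged sketch of the reviewer's suggestion, reading the stage straight from the checkpoint's own saved index rather than from mutated runtime state; the function name and index path argument here are hypothetical:

import json

def load_ckpt_quant_stage_from_index(index_path):
    # Trust the checkpoint's own metadata instead of external state.
    with open(index_path, "r", encoding="utf-8") as f:
        sharded_optim_index = json.load(f)
    # Default to "O0" (no quantization) when the checkpoint was saved without the field.
    return sharded_optim_index.get("ckpt_quant_stage", "O0")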
 # save opt index json if checkpoint quantization is on.
-if self.args.ckpt_quant_stage != "O0":
+if self.args.ckpt_quant_stage != "O0" and "quant_reach_limit" not in infohub:
     sharded_optim_index = {"ckpt_quant_stage": self.args.ckpt_quant_stage}
@@ -257,7 +267,7 @@ def save_non_merge_optimizer(self, model, optim_state_dict, master_weights, outp
     signal_path=signal_dir,
     is_sync=is_sync_save,
     state_dict_type="optimizer_weight",
-    ckpt_quant_stage=self.args.ckpt_quant_stage,
+    ckpt_quant_stage=self.args.ckpt_quant_stage if "quant_reach_limit" not in infohub else "O0",
@@ -379,7 +389,7 @@ def save_unified_optimizer(self, model, optimizer, output_dir, signal_dir):
     signal_path=signal_dir,
     is_sync=is_sync_save,
     state_dict_type="optimizer_weight",
-    ckpt_quant_stage=self.args.ckpt_quant_stage,
+    ckpt_quant_stage=self.args.ckpt_quant_stage if "quant_reach_limit" not in infohub else "O0",
LGTM
* support quant ckpt limit strategy
* bug fix
* bug fix
* fix bug
* add log, fix bug

Conflicts:
    paddlenlp/utils/env.py
PR types
PR changes
Description
Support limiting the number of times a compressed (quantized) checkpoint can be resumed.