[Trainer] fix save_model #9286
Thanks for your contribution!
if isinstance(self.model, LoRAModel) and (self.model.quantized or self.args.pipeline_parallel_degree > 1):
    self.save_model(output_dir, False, signal_dir)
elif isinstance(self.model, LoRAModel) or isinstance(self.model, PrefixModelForCausalLM):
    self.save_model(output_dir, True, signal_dir)
signal_dir = os.path.join(signal_dir, os.path.split(output_dir)[-1])
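To make the suggested line concrete: it appends the last path component of output_dir to signal_dir. The following is a minimal sketch only; the helper name `per_checkpoint_signal_dir` and the example paths are hypothetical and do not come from the PR.

```python
import os


def per_checkpoint_signal_dir(signal_dir: str, output_dir: str) -> str:
    """Append the last path component of output_dir to signal_dir.

    Hypothetical helper wrapping the one-line suggestion from the review.
    """
    return os.path.join(signal_dir, os.path.split(output_dir)[-1])


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(per_checkpoint_signal_dir("/tmp/signals", "/data/output/checkpoint-500"))
```

One caveat worth checking when applying the suggestion: if output_dir ends with a path separator, `os.path.split` returns an empty last component, so the result is just signal_dir with a trailing separator.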
5b20bd3 to 6ebe5b6
LGTM
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #9286      +/-   ##
===========================================
- Coverage    53.27%   53.09%   -0.19%
===========================================
  Files          657      657
  Lines       107194   106533     -661
===========================================
- Hits         57104    56559     -545
+ Misses       50090    49974     -116
693d6fb to 2eafad3
LGTM
* bug fix
* bug fix
* [Unified Checkpoint] Support expert parallel (#9055)
* update code
* [Unified Checkpoint] Fix generation config save (#9223)
* [Unified Checkpoint] update async_save_info in develop (#9173)
* [Unified Checkpoint] update async save logic (#9274)
* update async save signal
* fix async save hang
* bug fix
* bug fix
* [Trainer] fix save_model (#9286)
* bug fix
* bug fix
---------
Co-authored-by: Weiguo Zhu <DrownFish19@gmail.com>
PR types
Others
PR changes
Others
Description
Modify the save_model call to enhance compatibility.
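For readers who want to see which branch of the reviewed snippet fires for a given model, here is a rough sketch of the branch selection in isolation. It assumes stand-in classes: `LoRAModel`, `PrefixModelForCausalLM`, and `Args` below are simplified placeholders rather than the real PaddleNLP classes, and `choose_second_arg` is a hypothetical helper. It returns the value passed as the second positional argument of save_model, which appears to be a tensor-parallel merge flag, although the snippet itself does not name it.

```python
from dataclasses import dataclass


class LoRAModel:
    """Stand-in for the real LoRA wrapper; only the attribute used here."""

    def __init__(self, quantized: bool = False):
        self.quantized = quantized


class PrefixModelForCausalLM:
    """Stand-in for the real prefix-tuning wrapper."""


@dataclass
class Args:
    """Stand-in for the training arguments; only the field used here."""

    pipeline_parallel_degree: int = 1


def choose_second_arg(model, args):
    """Mirror the two branches of the reviewed snippet.

    Returns the second positional argument that would be passed to
    save_model, or None when neither shown branch applies (the snippet
    does not show what happens in that case).
    """
    if isinstance(model, LoRAModel) and (model.quantized or args.pipeline_parallel_degree > 1):
        return False
    elif isinstance(model, (LoRAModel, PrefixModelForCausalLM)):
        return True
    return None


if __name__ == "__main__":
    print(choose_second_arg(LoRAModel(quantized=True), Args()))
    print(choose_second_arg(LoRAModel(), Args(pipeline_parallel_degree=2)))
    print(choose_second_arg(LoRAModel(), Args()))
    print(choose_second_arg(PrefixModelForCausalLM(), Args()))
```

The order of the checks matters: a quantized or pipeline-parallel LoRA model must be matched by the first branch, because the second branch would otherwise also accept it.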