Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[inf] Add config var to enable keeping module on host #6846

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

oelayan7
Copy link
Contributor

Using keep_module_on_host config var will let us control if the loaded checkpoints to model parameters will be moved to the device or stay on host

Using keep_module_on_host config var will let us control if the loaded
checkpoints to model parameters will be moved to the device or stay on
host
@@ -17,9 +17,11 @@
from deepspeed.module_inject.tp_shard import get_shard_size, get_shard_size_list


def move(tensor, device):
def move(tensor, device, keep_module_on_host=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it be simpler to modify callers to pass device='cpu' when keep_module_on_host=True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@oelayan7 oelayan7 requested a review from hwchen2017 as a code owner December 24, 2024 06:49
@oelayan7
Copy link
Contributor Author

@tjruwase Can you please rereview and retrigger the CI?

@@ -554,6 +554,7 @@ def test(self, model_w_task, injection_policy, query, inf_kwargs, assert_fn, dty


@pytest.mark.seq_inference
@pytest.mark.parametrize('keep_module_on_host', [True, False])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful to validate that tensors are on cpu when keep_module_on_host=True?

Copy link
Contributor Author

@oelayan7 oelayan7 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjruwase
You're right.
A check was added.

@oelayan7 oelayan7 requested a review from tjruwase January 7, 2025 13:51
@loadams loadams removed the request for review from awan-10 January 7, 2025 16:08
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants