Limit the use of PreTrainedModel.device #16935

sgugger · 2022-04-25T17:40:45Z

What does this PR do?

I'm currently working on solutions for model parallelism and for offloading weights to the CPU or the hard drive, and I've encountered some bugs linked to the way we use PreTrainedModel.device: it grabs the first parameter of the model to infer a device for the whole model. This doesn't work when the model is:

* split across several devices, and the first parameter grabbed happens to be on the wrong one
* not materialized, because its parameters are offloaded to the CPU or the hard drive.

So whenever possible, it would be great to rely on something else, for instance the device the inputs are on. This PR does this for every use of the device attribute in modeling_utils and generation_utils, except for some code where no inputs are passed, so we generate them ourselves and have to pick some device to put them on.
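To make the two failure modes and the proposed alternative concrete, here is a minimal, framework-free sketch. `FakeTensor`, `ToyModel`, and `device_from_inputs` are hypothetical stand-ins, not names from transformers, and devices are plain strings (with `"meta"` assumed as a placeholder for an unmaterialized parameter) so that it runs without PyTorch or a GPU.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor: only its device is tracked."""

    device: str


class ToyModel:
    """Hypothetical model that infers its device from its first parameter."""

    def __init__(self, parameters: List[FakeTensor]):
        self._parameters = parameters

    @property
    def device(self) -> str:
        # The first parameter decides for the whole model.
        return self._parameters[0].device


def device_from_inputs(inputs: List[FakeTensor], fallback: Optional[str] = None) -> str:
    """Prefer the device of the inputs; use the fallback only when there are none."""
    if inputs:
        return inputs[0].device
    if fallback is None:
        raise ValueError("no inputs and no fallback device given")
    return fallback


# Model split across two devices: the inferred device ignores the second shard.
sharded = ToyModel([FakeTensor("cuda:0"), FakeTensor("cuda:1")])
# Offloaded model: the first parameter is only a placeholder (assumed "meta").
offloaded = ToyModel([FakeTensor("meta"), FakeTensor("meta")])

input_ids = FakeTensor("cuda:1")
print(sharded.device)                                   # cuda:0
print(offloaded.device)                                 # meta
print(device_from_inputs([input_ids], sharded.device))  # cuda:1
print(device_from_inputs([], sharded.device))           # cuda:0
```

The last call corresponds to the exception mentioned above: with no inputs available, something still has to supply a device, so the model-level attribute remains the fallback.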

If all works well, I plan to apply the same change to all modeling files that make use of that attribute (inside dummy_inputs I'll leave self.device, but elsewhere I'll grab the device of whatever inputs we have).

sgugger · 2022-04-25T17:42:44Z

src/transformers/generation_utils.py

+            if device is None:
+                device = self.device


Default to self.device here for a 100% backward compatible change.
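The pattern in the two added lines can be sketched in isolation as follows. `ToyGenerator` and `make_start_tokens` are hypothetical names chosen for illustration, not the actual method touched in generation_utils.py, and devices are plain strings so the example runs without PyTorch.

```python
from typing import List, Optional, Tuple


class ToyGenerator:
    """Hypothetical stand-in for a model that exposes a `device` attribute."""

    def __init__(self, device: str):
        self.device = device

    def make_start_tokens(
        self, batch_size: int, start_token_id: int, device: Optional[str] = None
    ) -> Tuple[List[List[int]], str]:
        # Callers that know where the inputs live can pass the device explicitly.
        if device is None:
            # Existing call sites pass nothing and keep the previous behaviour.
            device = self.device
        tokens = [[start_token_id] for _ in range(batch_size)]
        return tokens, device


model = ToyGenerator(device="cuda:0")
print(model.make_start_tokens(2, 0))                   # ([[0], [0]], 'cuda:0')
print(model.make_start_tokens(2, 0, device="cuda:1"))  # ([[0], [0]], 'cuda:1')
```

Because the new argument is optional and defaults to the old value, code that never passes `device` sees no change.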

Great thanks!

HuggingFaceDocBuilderDev · 2022-04-25T17:55:04Z

The documentation is not available anymore as the PR was closed or merged.

LysandreJik

LGTM, thanks @sgugger!

* Limit the use of PreTrainedModel.device * Fix

Limit the use of PreTrainedModel.device

0aab405

sgugger requested review from patrickvonplaten and LysandreJik April 25, 2022 17:40

sgugger commented Apr 25, 2022

Fix

bed2b6b

LysandreJik approved these changes Apr 25, 2022

patrickvonplaten approved these changes Apr 25, 2022

sgugger merged commit 344b9fb into main Apr 26, 2022

sgugger deleted the avoid_self_device branch April 26, 2022 00:58

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022

Limit the use of PreTrainedModel.device (huggingface#16935)

e8c8a80

* Limit the use of PreTrainedModel.device * Fix
