enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU #11671
base: main
Conversation
```diff
@@ -193,7 +193,7 @@ def __init__(
     def enable_xformers_memory_efficient_attention(self, attention_op: Optional[Callable] = None):
         self.decoder_pipe.enable_xformers_memory_efficient_attention(attention_op)

-    def enable_sequential_cpu_offload(self, gpu_id: Optional[int] = None, device: Union[torch.device, str] = "cuda"):
+    def enable_sequential_cpu_offload(self, gpu_id: Optional[int] = None, device: Union[torch.device, str] = None):
```
Per the discussion in PR #11288, we change the default to None so that CPU offloading can work on other accelerators like XPU without application code changes.
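For illustration, here is a minimal sketch of how a `None` default can be resolved at call time; the helper name `_resolve_offload_device` is hypothetical, not part of diffusers:

```python
from typing import Optional, Union

import torch


def _resolve_offload_device(device: Optional[Union[torch.device, str]]) -> torch.device:
    # Hypothetical helper: when the caller passes device=None (the new default),
    # pick whichever accelerator is available instead of hard-coding "cuda".
    if device is not None:
        return torch.device(device)
    if torch.cuda.is_available():
        return torch.device("cuda")
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")
```

With a resolution step like this, `pipe.enable_sequential_cpu_offload()` called with no arguments can target XPU on Intel GPUs the same way it targets CUDA on NVIDIA ones.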
With these changes, cases like tests/pipelines/wuerstchen/test_wuerstchen_combined.py::WuerstchenCombinedPipelineFastTests::test_cpu_offload_forward_pass_twice and tests/pipelines/kandinsky2_2/test_kandinsky_combined.py::KandinskyV22PipelineImg2ImgCombinedFastTests::test_cpu_offload_forward_pass_twice pass on XPU.
@a-r-r-o-w @DN6, please help review, thanks very much.
@bot /style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
-        device_mod = getattr(torch, self.device.type, None)
-        if hasattr(device_mod, "empty_cache") and device_mod.is_available():
-            device_mod.empty_cache()  # otherwise we don't see the memory savings (but they probably exist)
+        empty_device_cache(orig_device_type)
```
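The actual `empty_device_cache` comes from diffusers' utils; as a rough sketch of what such a device-agnostic helper can look like (an assumed reimplementation, not the library's code):

```python
import torch


def empty_device_cache(device_type: str) -> None:
    # Look up the backend module (torch.cuda, torch.xpu, ...) by device type
    # and clear its cache; device types without a cache (e.g. "cpu") are a no-op.
    device_mod = getattr(torch, device_type, None)
    if device_mod is not None and hasattr(device_mod, "empty_cache"):
        device_mod.empty_cache()
```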
@DN6, the original code has a bug: empty_cache is never executed.
The logic is: the module's device type is checked via self.device.type. At that point it is "cuda" or "xpu", so execution enters the if scope and the module is moved to CPU with to(). After that, self.device.type is "cpu", so device_mod resolves to torch.cpu, which has no empty_cache. The hasattr check therefore fails, empty_cache is never called, and the accelerator cache is never cleared.
I changed the code so that it calls empty_cache on the original device; please review in case that is not what you intended in the design.
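To make the ordering bug concrete, here is a self-contained toy (not the diffusers code) contrasting a lookup of the device type after and before the move to CPU:

```python
import torch


def offload_buggy(module: torch.nn.Module) -> None:
    module.to("cpu")
    # Bug: after .to("cpu") the device type is "cpu", so device_mod resolves to
    # torch.cpu, which has no empty_cache -> the accelerator cache is never cleared.
    device_mod = getattr(torch, next(module.parameters()).device.type, None)
    if hasattr(device_mod, "empty_cache") and device_mod.is_available():
        device_mod.empty_cache()


def offload_fixed(module: torch.nn.Module) -> None:
    # Fix: remember the accelerator's device type *before* moving to CPU.
    orig_device_type = next(module.parameters()).device.type
    module.to("cpu")
    device_mod = getattr(torch, orig_device_type, None)
    if device_mod is not None and hasattr(device_mod, "empty_cache"):
        device_mod.empty_cache()  # now actually frees cached accelerator memory
```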
PS: for reference, the PR that changed the code to its current form is #4191.
Signed-off-by: YAO Matrix <matrix.yao@intel.com>