
Device agnostic testing #5612
Merged · 12 commits · Dec 5, 2023

Conversation

@arsalanu (Contributor) commented Nov 1, 2023

What does this PR do?

Adds new features to testing_utils.py and import_utils.py to make testing with non-default PyTorch backends (beyond just cuda, cpu and mps) possible. This should not affect any existing tests in the repo or their behaviour on the devices they currently run on.

This is heavily based on similar work we have done for Transformers, see: Transformers PR #25870

Adds device-agnostic helper functions which dispatch to backend-specific implementations. This mainly applies to functions that are device-specific (e.g. torch.cuda.manual_seed). Users can register a new backend, and the backend functions it should dispatch to, by creating a device specification file and pointing the test suite to it with the environment variable DIFFUSERS_TEST_DEVICE_SPEC, and can select a new PyTorch device with DIFFUSERS_TEST_DEVICE.

Example of a device specification to run the tests with an alternative accelerator:

import torch
import torch_npu
# User can add additional imports here

# Specify the device name (e.g. 'cuda', 'cpu')
DEVICE_NAME = 'npu'

# Specify device-specific backends to dispatch to.
# If not specified (i.e. `None`), the test suite falls back to the defaults in `testing_utils.py`.
MANUAL_SEED_FN = torch.npu.manual_seed
EMPTY_CACHE_FN = None
DEVICE_COUNT_FN = torch.npu.device_count
SUPPORTS_TRAINING = True
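To illustrate how the test suite might consume such a spec (a minimal sketch; the helper and table names below, such as backend_manual_seed and BACKEND_MANUAL_SEED, are assumptions rather than the exact code added in testing_utils.py):

import importlib.util
import os

import torch

# Pick the test device from the environment, defaulting to cuda/cpu.
torch_device = os.environ.get("DIFFUSERS_TEST_DEVICE", "cuda" if torch.cuda.is_available() else "cpu")

# Per-device implementations; devices without an entry fall back to "default".
BACKEND_MANUAL_SEED = {"cuda": torch.cuda.manual_seed, "default": torch.manual_seed}

spec_path = os.environ.get("DIFFUSERS_TEST_DEVICE_SPEC")
if spec_path:
    # Load the user-provided spec module and register its backend functions.
    spec = importlib.util.spec_from_file_location("device_spec", spec_path)
    device_spec = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(device_spec)
    torch_device = device_spec.DEVICE_NAME
    if device_spec.MANUAL_SEED_FN is not None:
        BACKEND_MANUAL_SEED[torch_device] = device_spec.MANUAL_SEED_FN


def backend_manual_seed(device: str, seed: int):
    # Dispatch to the registered seeding function for `device`, or the default.
    BACKEND_MANUAL_SEED.get(device, BACKEND_MANUAL_SEED["default"])(seed)

A test would then call backend_manual_seed(torch_device, 0) instead of torch.cuda.manual_seed(0), and the same pattern applies to empty_cache and device_count.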

Implementation details are fully outlined in issue #5562.

I have only modified the UNet2D condition model tests (and test_modeling_common, since it is used by the UNet2D tests) rather than all the tests, as this PR is focused on implementing the features required for device-agnostic testing.


@yiyixuxu (Collaborator) commented Nov 3, 2023

@patrickvonplaten
can you take a look here?

@patrickvonplaten (Contributor) left a comment

The changes look very reasonable to me! Thanks a lot for making everything device-agnostic.

Just a bit worried about is_torch_fp16_available, because we're essentially just saying that if the matmul doesn't work, fp16 is not available. But the matmul might also fail for other reasons (badly installed CUDA, OOM, ...).

In PyTorch there is actually an is_bf16_available(): https://github.com/pytorch/pytorch/blob/d64bc8f0f81bd9b514eb1a5ee6f5b03094e4e6e9/torch/cuda/__init__.py#L141

That function seems to check some device properties, which is probably less brittle. I guess it's hard to do the same for fp16 here, but can we maybe make sure that we don't accidentally misinterpret other errors as fp16 not being available?
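For context, the check under discussion amounts to probing whether a small fp16 op succeeds on the target device, roughly like this (an illustrative sketch, not the exact code in this PR):

import torch

def is_torch_fp16_available(device: str) -> bool:
    # Probe fp16 support by running a tiny half-precision matmul on the device.
    # As noted above, any failure is read as "fp16 unavailable", even if the real
    # cause is unrelated (a broken CUDA install, OOM, ...).
    try:
        x = torch.zeros(2, 2, dtype=torch.float16, device=device)
        _ = x @ x
        return True
    except Exception:
        return False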

@arsalanu (Contributor, Author) commented Nov 15, 2023

Thanks, @patrickvonplaten. I agree that just catching any exception might not be the best way to do this, but I'm not sure if there is a specific exception that would be agnostic to any accelerator or device. On CPU and XLA I believe you get a RuntimeError when trying to perform operations with FP16, but this is vague in itself and may overlap with a different issue.

One suggestion is to log the exception and print the error when the tests are run, to make it explicit why FP16 is not working. This would make it clear to the user whether it is unsupported behaviour or an issue with their setup.

Looking at PyTorch's is_bf16_available(), it runs different checks for different hardware, which makes it less brittle as you said, but that function is also not device-agnostic and would only work for ROCm and CUDA backends.

Another suggestion I have is to add a CUDA-specific check and skip the FP16 matmul check in is_torch_fp16_available() if a GPU is being used (I could do this for MPS and CPU too, which would be set to False by default).

This would still be in line with the changes, as these backends already have defaults specified for them in the custom function dispatch. The matmul check would then only be used when a non-default device is in use. We could do this and also log the error to make it explicit to the user.

Let me know if that makes sense to you and I will add those changes, or any other suggestions you have. Thanks!

@patrickvonplaten (Contributor)

Could we maybe do something like this: https://github.com/huggingface/diffusers/pull/5612/files#r1399038284, just to add an extra safety mechanism so that a user doesn't misinterpret the function in case CUDA is badly set up?

Also can we make the function private for now, e.g. add an underscore so that it's _is_torch_fp16_available()?

@arsalanu (Contributor, Author)

I've added the changes, restructured slightly so that the FP16 op-check happens by default for all accelerators, and the CUDA error is raised only if the device type is cuda.
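Roughly, the restructured check could look like this (an illustrative sketch; the actual private helper in testing_utils.py may differ in naming and error messages):

import logging

import torch

logger = logging.getLogger(__name__)


def _is_torch_fp16_available(device: str) -> bool:
    # Run the fp16 op-check by default for every accelerator.
    device = torch.device(device)
    try:
        x = torch.zeros(2, 2, dtype=torch.float16, device=device)
        _ = x @ x
        return True
    except Exception as e:
        # On CUDA, a failure here more likely means a broken setup than missing fp16
        # support, so surface the error instead of silently returning False.
        if device.type == "cuda":
            raise ValueError(f"fp16 ops failed on a cuda device; your CUDA setup may be broken: {e}") from e
        logger.warning(f"fp16 appears to be unsupported on {device}: {e}")
        return False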

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@patrickvonplaten (Contributor) left a comment

Cool thanks for iterating here. This PR LGTM - should we merge it now or do you want to add tests for other classes directly here?



# Guard for when Torch is not available
if is_torch_available():
A collaborator asked:

Is this meant to run when torch isn't available or if DIFFUSERS_TEST_DEVICE_SPEC is set?

@arsalanu (Contributor, Author) replied:

The guard is there because the function dispatch should only run if torch is available; it doesn't strictly matter whether DIFFUSERS_TEST_DEVICE_SPEC is set. For example, for a GPU, CPU or MPS device a spec doesn't need to be set, but torch must still be available in order to dispatch to the default torch device functions.
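Concretely, the guard just wraps the dispatch-table setup, which references torch.* functions and therefore cannot run without torch (a sketch; the table names are assumptions, not the exact code in the PR):

from diffusers.utils import is_torch_available

if is_torch_available():
    import torch

    # Built only when torch is importable; whether DIFFUSERS_TEST_DEVICE_SPEC is set does not matter here.
    BACKEND_EMPTY_CACHE = {"cuda": torch.cuda.empty_cache, "cpu": None, "default": None}
    BACKEND_DEVICE_COUNT = {"cuda": torch.cuda.device_count, "default": lambda: 0}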

@arsalanu (Contributor, Author)

> should we merge it now or do you want to add tests for other classes directly here?

If that's okay I'll add some more before merging 😄 I had a few other tests ready but removed them from this PR to keep it minimal.

@arsalanu (Contributor, Author)

I've added more test coverage. The latest commit has the changes for most of the model classes (unet, vae, vq, unet2d and some common files) and one pipeline test (SD2). Further tests could be added in future PRs.

@patrickvonplaten (Contributor) left a comment

Cool! The changes look good to me - @DN6 wdyt? Feel free to merge once you're happy with it

@DN6 (Collaborator) left a comment

LGTM 👍🏽 Nice work @arsalanu!

@DN6 merged commit f427345 into huggingface:main on Dec 5, 2023
donhardman pushed a commit to donhardman/diffusers that referenced this pull request Dec 18, 2023
* utils and test modifications to enable device agnostic testing

* device for manual seed in unet1d

* fix generator condition in vae test

* consistency changes to testing

* make style

* add device agnostic testing changes to source and one model test

* make dtype check fns private, log cuda fp16 case

* remove dtype checks from import utils, move to testing_utils

* adding tests for most model classes and one pipeline

* fix vae import
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023 (same commits as above)
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024 (same commits as above)