Skip to content

[CI] add a big GPU marker to run memory-intensive tests separately on CI #9691

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 38 commits into from
Oct 31, 2024

Conversation

sayakpaul
Copy link
Member

What does this PR do?

I have only touched a handful of tests with the marker being introduced. I think we may need to change the slices based on the CI machine and infra. @a-r-r-o-w should consider marking the Cog tests similarly as well?

@DN6 would love to get your thoughts on the design.

@sayakpaul sayakpaul requested a review from DN6 October 16, 2024 07:45
@@ -2,6 +2,7 @@ name: Nightly and release tests on main/release branch

on:
workflow_dispatch:
pull_request:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary.

@@ -18,6 +19,7 @@ env:

jobs:
setup_torch_cuda_pipeline_matrix:
if: github.event_name == 'schedule'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w
Copy link
Member

should consider marking the Cog tests similarly as well?

With model cpu offload and vae tiling, it should be < 16 GB, and I think we documented it here. Are we seeing Cog test failures due to memory? I see that they are passing here

@sayakpaul
Copy link
Member Author

Ah okay then. No issues.

Copy link
Collaborator

@DN6 DN6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 👍🏽

@sayakpaul
Copy link
Member Author

@DN6 okay if I modified the failing tests to account for the machine change?

@sayakpaul sayakpaul requested a review from DN6 October 17, 2024 10:53
@sayakpaul
Copy link
Member Author

@DN6 can you give this a look? I think the test failures should go away once the CI Bot has access to Flux.

Once approved I will revert the changes which I have denoted as temporary (like this).


@slow
@require_torch_gpu
class FluxControlNetImg2ImgPipelineSlowTests(unittest.TestCase):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test was correctly done as it doesn't pass the controlnet module to the pipeline and it also uses very dummy inputs which I think should be avoided for an integration test. LMK if you think otherwise.

@sayakpaul
Copy link
Member Author

@DN6 regarding https://github.com/huggingface/diffusers/actions/runs/11398910357/job/31716739483?pr=9691#step:7:67, my hunch is that there's some kind of leakage happening which is causing the worker to crash. When I SSH'd into the runner and manually ran the test, it passed.

@sayakpaul sayakpaul merged commit ff182ad into main Oct 31, 2024
18 checks passed
@sayakpaul sayakpaul deleted the big-model-marker branch October 31, 2024 13:14
@sayakpaul
Copy link
Member Author

In a follow-up I will introduce the quantization tests.

a-r-r-o-w added a commit that referenced this pull request Nov 1, 2024
… CI (#9691)

* add a marker for big gpu tests

* update

* trigger on PRs temporarily.

* onnx

* fix

* total memory

* fixes

* reduce memory threshold.

* bigger gpu

* empty

* g6e

* Apply suggestions from code review

* address comments.

* fix

* fix

* fix

* fix

* fix

* okay

* further reduce.

* updates

* remove

* updates

* updates

* updates

* updates

* fixes

* fixes

* updates.

* fix

* workflow fixes.

---------

Co-authored-by: Aryan <aryan@huggingface.co>
sayakpaul added a commit that referenced this pull request Dec 23, 2024
… CI (#9691)

* add a marker for big gpu tests

* update

* trigger on PRs temporarily.

* onnx

* fix

* total memory

* fixes

* reduce memory threshold.

* bigger gpu

* empty

* g6e

* Apply suggestions from code review

* address comments.

* fix

* fix

* fix

* fix

* fix

* okay

* further reduce.

* updates

* remove

* updates

* updates

* updates

* updates

* fixes

* fixes

* updates.

* fix

* workflow fixes.

---------

Co-authored-by: Aryan <aryan@huggingface.co>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants