
Since commit 5588725e8e, FluxPipeline inference yields ERROR: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #9895

Closed
paparico opened this issue Nov 8, 2024 · 6 comments
Labels
bug Something isn't working

Comments


paparico commented Nov 8, 2024

FluxPipeline inference fails with
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
when enable_sequential_cpu_offload() is used.

I cannot test other memory-management settings because my 3090 won't allow the pipeline to run without offloading.

It fails at commit 5588725e8e7be497839432e5328c596169385f16.
It works fine at ded3db164bb3c090871647f30ff9988c9c17fd83 (the parent commit).

Below is my venv:

accelerate==0.34.2
albucore==0.0.17
albumentations==1.4.16
annotated-types==0.7.0
asarPy==1.0.1
bsrgan==0.1.5
certifi==2024.8.30
charset-normalizer==3.3.2
cmake==3.30.4
compel==2.0.3
contourpy==1.3.0
cycler==0.12.1
Cython==3.0.11
-e git+https://github.com/huggingface/diffusers.git@main#egg=diffusers
easydict==1.13
eval_type_backport==0.2.0
filelock==3.16.1
fonttools==4.54.1
fsspec==2024.9.0
huggingface-hub==0.25.1
idna==3.10
imageio==2.35.1
importlib_metadata==8.5.0
insightface==0.7.3
Jinja2==3.1.4
joblib==1.4.2
kiwisolver==1.4.7
lazy_loader==0.4
MarkupSafe==2.1.5
matplotlib==3.9.2
mpmath==1.3.0
networkx==3.3
numpy==2.1.1
nvidia-cublas-cu11==11.11.3.6
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.8.89
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.8.89
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==9.4.0.58
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.68
nvidia-nvtx-cu12==12.1.105
onnx==1.16.2
opencv-python==4.10.0.84
opencv-python-headless==4.10.0.84
packaging==24.1
peft==0.12.0
pillow==10.4.0
prettytable==3.11.0
protobuf==5.28.2
psutil==6.0.0
pydantic==2.9.2
pydantic_core==2.23.4
pyparsing==3.1.4
python-dateutil==2.9.0.post0
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scikit-image==0.24.0
scikit-learn==1.5.2
scipy==1.14.1
sentencepiece==0.2.0
six==1.16.0
style==1.1.0
sympy==1.13.3
threadpoolctl==3.5.0
tifffile==2024.9.20
timm==1.0.11
tokenizers==0.19.1
torch==2.4.1
torchaudio==2.4.1
torchsde==0.2.6
torchvision==0.19.1
tqdm==4.66.5
trampoline==0.1.2
transformers==4.44.2
triton==3.0.0
typing_extensions==4.12.2
update==0.0.1
urllib3==2.2.3
wcwidth==0.2.13
websockets==13.1
zipp==3.20.2

Reproduction

import torch
import diffusers

p = diffusers.FluxPipeline.from_pretrained('/home/rico/yastade/models/Flux/FLUX.1-schnell', torch_dtype=torch.bfloat16, use_safetensors=True)
p.enable_sequential_cpu_offload()
p(prompt="whatever", num_inference_steps=1)
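
For context: as I understand it, enable_sequential_cpu_offload() installs accelerate's offload hooks, which keep each submodule's weights on the CPU and stream them to the GPU only for that submodule's forward call. A rough sketch of the mechanism, using accelerate's cpu_offload helper on a toy module (illustrative only; requires a CUDA device):

import torch
from accelerate import cpu_offload

model = torch.nn.Linear(8, 8)
cpu_offload(model, execution_device=torch.device("cuda:0"))

# The hook streams the weights to cuda:0 for each forward call and also
# moves that call's inputs there. Tensors the pipeline builds outside a
# module's forward (like latent_image_ids in the traceback below) get no
# such treatment, which is how a cuda:0/cpu mismatch can slip in.
out = model(torch.randn(1, 8))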

Logs

>>> p=diffusers.FluxPipeline.from_pretrained('/home/rico/yastade/models/Flux/FLUX.1-schnell',torch_dtype=torch.bfloat16,use_safetensors=True)
Loading pipeline components...:  43%|███████████████████████████████████▏                                              | 3/7 [00:00<00:00,  6.14it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 11.71it/s]
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00,  7.28it/s]
>>> p.enable_sequential_cpu_offload()
>>> p(prompt="whatever",num_inference_steps=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rico/yastade/.yastade/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rico/yastade/.yastade/src/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 684, in __call__
    latents, latent_image_ids = self.prepare_latents(
                                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rico/yastade/.yastade/src/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 522, in prepare_latents
    latent_image_ids = self._prepare_latent_image_ids(batch_size, height // 2, width // 2, device, dtype)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rico/yastade/.yastade/src/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 431, in _prepare_latent_image_ids
    latent_image_ids[..., 1] = latent_image_ids[..., 1] + torch.arange(height)[:, None]
                               ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
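
The failing line builds its range tensor with torch.arange(height) and no explicit device, so it lands on the CPU while latent_image_ids sits on cuda:0. A minimal standalone reproduction of the same mismatch, plus one possible fix (illustrative only; the actual resolution in diffusers was the revert in #9896):

import torch

ids = torch.zeros(4, 4, 3, device="cuda:0")

# Fails: torch.arange defaults to the CPU default device.
# ids[..., 1] = ids[..., 1] + torch.arange(4)[:, None]

# Works: create the range on the same device as the target tensor.
ids[..., 1] = ids[..., 1] + torch.arange(4, device=ids.device)[:, None]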

System Info


  • 🤗 Diffusers version: 0.32.0.dev0
  • Platform: Linux-6.10.11-1-default-x86_64-with-glibc2.40
  • Running on Google Colab?: No
  • Python version: 3.11.10
  • PyTorch version (GPU?): 2.4.1+cu121 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.25.1
  • Transformers version: 4.44.2
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 3090, 24576 MiB
  • Using GPU in script?: yes, via
  • Using distributed or parallel set-up in script?: no

Who can help?

@sayakpaul

because the regression bisects to this commit:


commit 5588725e8e7be497839432e5328c596169385f16
Author: Sayak Paul <spsayakpaul@gmail.com>
Date:   Thu Nov 7 03:33:39 2024 +0100

    [Flux] reduce explicit device transfers and typecasting in flux. (#9817)
    
    reduce explicit device transfers and typecasting in flux.
paparico added the bug label Nov 8, 2024

squewel commented Nov 8, 2024

+1

Trying to run this pipeline: https://gist.github.com/AmericanPresidentJimmyCarter/873985638e1f3541ba8b00137e7dacd9

Checking out an earlier commit of diffusers works:
pip install git+https://github.com/huggingface/diffusers@76b7d86a9a5c0c2186efa09c4a67b5f5666ac9e3
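
The same workaround can live in a requirements.txt if the project installs diffusers from Git; the pinned hash below is the known-good commit from the command above (assuming, as the bisection above suggests, that commits prior to 5588725 are unaffected):

# requirements.txt entry pinning diffusers to a known-good commit
git+https://github.com/huggingface/diffusers@76b7d86a9a5c0c2186efa09c4a67b5f5666ac9e3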

sayakpaul (Member) commented

Thanks for reporting the bug.
#9896 should cut it for now.


paparico commented Nov 9, 2024

Not sure about the etiquette... should I close the issue? (My requirements.txt can point to main again, so as far as I'm concerned, I'm good with the revert.)

sayakpaul (Member) commented

Yeah, closing is fine.


paparico commented Nov 9, 2024

OK, and thanks for your quick response :)

paparico closed this as completed Nov 9, 2024

paparico commented Nov 9, 2024

The Flux pipeline works fine with sequential offload again.
