HunyuanVideoPipeline produces NaN values #10314
Comments
Transformer needs to be in bfloat16. Could you try with that?
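In code, that suggestion looks roughly like this (a minimal sketch, assuming the hunyuanvideo-community/HunyuanVideo checkpoint used later in this thread):

```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel

model_id = "hunyuanvideo-community/HunyuanVideo"
# Load only the transformer in bfloat16; the rest of the pipeline can stay in float16.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
).to("cuda")
```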
Same result @a-r-r-o-w
On CUDA we've seen the same issue when not using the latest PyTorch, from
Thanks for the suggestion @hlky, I'll try some more combinations.
Same, I also get NaN values.
@tanshuai0219 Is this on a CUDA GPU or MPS/ROCm? I'm unable to replicate when using the transformer in `bfloat16`.
Yes, it's on a CUDA GPU, CUDA version 12.4. Then I run:

```python
import torch

model_id = "hunyuanvideo-community/HunyuanVideo"
# (pipeline setup truncated in the original comment)
output = pipe(
    # (call arguments truncated in the original comment)
)
import numpy as np

print(np.array(output[0]))
export_to_video(output, "output.mp4", fps=15)
```

`np.array(output[0])` is all zero.

output.mp4
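A slightly more explicit check along those lines (a sketch, assuming `output` holds the decoded frames from the call above):

```python
import numpy as np

frames = np.asarray(output[0], dtype=np.float32)  # first video's frames
print("contains NaN:", bool(np.isnan(frames).any()))
print("all zero:", not frames.any())
```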
Can you share the output of
output.mp4
here is mine:
If I upgrade transformers from 4.46.3 to 4.48.0.dev0, I get an error like:
I would recommend trying to replicate in a clean environment if you are currently in a broken state. At least 5 people have confirmed so far that upgrading torch to 2.5.1 no longer leads to black videos. We are still unsure why it doesn't work on 2.4 or below.
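A trivial sanity check before re-testing, since stale environments came up above:

```python
import torch

# Reports in this thread point at torch >= 2.5.1 avoiding the black/NaN videos.
print(torch.__version__)
print(torch.cuda.is_available())
```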
I was not able to get a usable output with PyTorch 2.5.1 either.
Hardware: AMD Instinct MI300X
pip freeze

```bash
absl-py==2.1.0
accelerate==1.2.1
aiohappyeyeballs==2.4.4
aiohttp==3.11.9
aiosignal==1.3.1
amdsmi @ file:///opt/rocm-6.3.0/share/amd_smi
apex @ file:///var/lib/jenkins/apex
asgiref==3.8.1
astunparse==1.6.3
async-timeout==5.0.1
attrs==24.2.0
audioread==3.0.1
autocommand==2.2.2
backports.tarfile==1.2.0
boto3==1.19.12
botocore==1.22.12
cachetools==5.5.0
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
colorama==0.4.6
coremltools==5.0b5
cryptography==44.0.0
Cython==3.0.11
decorator==5.1.1
Deprecated==1.2.15
-e git+https://github.com/huggingface/diffusers.git@1826a1e#egg=diffusers
dill==0.3.7
Django==5.1.4
exceptiongroup==1.2.2
execnet==2.1.1
expecttest==0.2.1
fbscribelogger==0.1.6
filelock==3.16.1
flatbuffers==2.0
frozenlist==1.5.0
fsspec==2024.10.0
future==1.0.0
geojson==2.5.0
ghstack==0.8.0
google-auth==2.36.0
google-auth-oauthlib==1.0.0
grpcio==1.68.1
huggingface-hub==0.27.1
hypothesis==5.35.1
idna==3.10
image==1.5.33
imageio==2.36.1
imageio-ffmpeg==0.5.1
importlib_metadata==8.0.0
importlib_resources==6.4.0
inflect==7.3.1
iniconfig==2.0.0
jaraco.collections==5.1.0
jaraco.context==5.3.0
jaraco.functools==4.0.1
jaraco.text==3.12.1
Jinja2==3.1.4
jmespath==0.10.0
joblib==1.4.2
junitparser==2.1.1
lark==0.12.0
lazy_loader==0.4
librosa==0.10.2.post1
lintrunner==0.12.5
llvmlite==0.38.1
lxml==5.0.0
Markdown==3.7
MarkupSafe==3.0.2
ml_dtypes==0.5.0
more-itertools==10.3.0
mpmath==1.3.0
msgpack==1.1.0
multidict==6.1.0
mypy==1.10.0
mypy-extensions==1.0.0
networkx==2.8.8
numba==0.55.2
numpy==1.21.2
oauthlib==3.2.2
onnx==1.16.1
onnxscript==0.1.0.dev20240817
opencv-python==4.10.0.84
opt-einsum==3.3.0
optionloop==1.0.7
optree==0.12.1
packaging==24.2
pillow==10.3.0
platformdirs==4.3.6
pluggy==1.5.0
ply==3.11
pooch==1.8.2
propcache==0.2.1
protobuf==3.20.2
psutil==6.1.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pycparser==2.22
PyGithub==2.3.0
Pygments==2.15.0
PyJWT==2.10.1
PyNaCl==1.5.0
pytest==7.3.2
pytest-cpp==2.3.0
pytest-flakefinder==1.1.0
pytest-rerunfailures==14.0
pytest-xdist==3.3.1
python-dateutil==2.9.0.post0
PyWavelets==1.4.1
PyYAML @ file:///croot/pyyaml_1728657952215/work
redis==5.2.0
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
rockset==1.0.3
rsa==4.9
s3transfer==0.5.2
safetensors==0.5.0
scikit-image==0.22.0
scikit-learn==1.5.2
scipy==1.10.1
sentencepiece==0.2.0
six @ file:///tmp/build/80754af9/six_1644875935023/work
sortedcontainers==2.4.0
soundfile==0.12.1
soxr==0.5.0.post1
sqlparse==0.5.2
sympy==1.13.1
tb-nightly==2.13.0a20230426
tensorboard==2.13.0
tensorboard-data-server==0.7.2
threadpoolctl==3.5.0
thriftpy2==0.5.2
tifffile==2024.9.20
tlparse==0.3.7
tokenizers==0.21.0
tomli==2.2.1
torch @ file:///var/lib/jenkins/pytorch/dist/torch-2.5.1%2Bgitabbfe77-cp310-cp310-linux_x86_64.whl#sha256=b5fecdb1e666ea7de99d5ca164c7dbe22f341f4bd07a288beeeddca65f2232be
torchvision==0.20.0a0+afc54f7
tqdm==4.67.1
transformers==4.47.1
# Editable install with no version control (triton==3.1.0)
-e /var/lib/jenkins/triton/python
typeguard==4.3.0
typing_extensions==4.12.2
unittest-xml-reporting==3.2.0
urllib3==1.26.20
Werkzeug==3.1.3
wrapt==1.17.0
xdoctest==1.1.0
yarl==1.18.3
z3-solver==4.12.2.0
zipp==3.19.2
```
Update to
tested with
@smedegaard Could you test with these changes?

```diff
diff --git a/src/diffusers/models/transformers/transformer_hunyuan_video.py b/src/diffusers/models/transformers/transformer_hunyuan_video.py
index 6cb97af9..84610471 100644
--- a/src/diffusers/models/transformers/transformer_hunyuan_video.py
+++ b/src/diffusers/models/transformers/transformer_hunyuan_video.py
@@ -713,15 +713,15 @@ class HunyuanVideoTransformer3DModel(ModelMixin, ConfigMixin, PeftAdapterMixin,
         condition_sequence_length = encoder_hidden_states.shape[1]
         sequence_length = latent_sequence_length + condition_sequence_length
         attention_mask = torch.zeros(
-            batch_size, sequence_length, sequence_length, device=hidden_states.device, dtype=torch.bool
-        )  # [B, N, N]
+            batch_size, sequence_length, device=hidden_states.device, dtype=torch.bool
+        )  # [B, N]
         effective_condition_sequence_length = encoder_attention_mask.sum(dim=1, dtype=torch.int)  # [B,]
         effective_sequence_length = latent_sequence_length + effective_condition_sequence_length
         for i in range(batch_size):
-            attention_mask[i, : effective_sequence_length[i], : effective_sequence_length[i]] = True
-        attention_mask = attention_mask.unsqueeze(1)  # [B, 1, N, N], for broadcasting across attention heads
+            attention_mask[i, : effective_sequence_length[i]] = True
+        attention_mask = attention_mask.unsqueeze(1)  # [B, 1, N], for broadcasting across attention heads

         # 4. Transformer blocks
         if torch.is_grad_enabled() and self.gradient_checkpointing:
```

I was able to generate successfully on CUDA with PyTorch 2.4.1, which is also known to produce NaN.

output.mp4

cc @a-r-r-o-w There's also a small performance gain.

Code:

```python
import torch

from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16).to("cuda")
pipe.vae.enable_tiling()
output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
```
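For intuition on why the original mask produced NaN: in the old `[B, N, N]` layout, query rows past the effective sequence length are entirely `False`, and softmax over a fully masked row yields NaN; a key-padding-style mask keeps every query row attending to the valid keys. A standalone sketch with toy shapes (not the diffusers internals):

```python
import torch
import torch.nn.functional as F

B, H, N, D = 2, 4, 8, 16
q = k = v = torch.randn(B, H, N, D)

# Old-style square mask: rows beyond the effective length are all False,
# so those rows softmax over nothing but -inf and come out as NaN.
full_mask = torch.zeros(B, 1, N, N, dtype=torch.bool)
full_mask[:, :, :5, :5] = True  # only the first 5 positions are valid
out = F.scaled_dot_product_attention(q, k, v, attn_mask=full_mask)
print(torch.isnan(out).any())  # tensor(True): rows 5..7 are NaN

# Key-padding-style mask: every query row still sees the 5 valid keys.
key_mask = torch.zeros(B, 1, 1, N, dtype=torch.bool)
key_mask[..., :5] = True
out = F.scaled_dot_product_attention(q, k, v, attn_mask=key_mask)
print(torch.isnan(out).any())  # tensor(False)
```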
@hlky Thanks for the tip. I'm afraid it didn't fix the problem for me. I added your suggested changes to `transformer_hunyuan_video.py`.
For clarity, here are my changes to `numpy_to_pil()`:

```python
# Module-level imports this snippet relies on (add any that image_processor.py
# doesn't already have):
from typing import List

import numpy as np
import PIL.Image
from PIL import Image


@staticmethod
def numpy_to_pil(images: np.ndarray) -> List[PIL.Image.Image]:
    """
    Convert numpy image array(s) to PIL images with validation.

    Args:
        images (np.ndarray): Image array in range [0, 1] with shape (N, H, W, C) or (H, W, C)

    Returns:
        List[PIL.Image.Image]: List of PIL images

    Raises:
        ValueError: If images contain invalid values
        TypeError: If input is not a numpy array or has invalid shape/type
    """
    if not isinstance(images, np.ndarray):
        raise TypeError(f"Expected numpy array, got {type(images)}")

    # Handle single image case
    if images.ndim == 3:
        images = images[None, ...]
    elif images.ndim != 4:
        raise ValueError(f"Expected 3D or 4D array, got {images.ndim}D")

    # Check for NaN/inf before any operations
    if np.any(np.isnan(images)):
        raise ValueError("Image array contains NaN values")
    if np.any(np.isinf(images)):
        raise ValueError("Image array contains infinite values")

    # Check value range
    min_val = np.min(images)
    max_val = np.max(images)
    if min_val < 0 or max_val > 1:
        raise ValueError(
            f"Image values must be in range [0, 1], got range [{min_val}, {max_val}]"
        )

    try:
        # Convert to uint8
        images_uint8 = (images * 255).round().astype("uint8")
        # Defensive re-check (uint8 cannot actually hold NaN, so this should never trigger)
        if np.any(np.isnan(images_uint8)):
            raise ValueError("Conversion to uint8 produced NaN values")
    except Exception as e:
        raise ValueError(f"Failed to convert to uint8: {str(e)}")

    try:
        # Convert to PIL images (grayscale mode for single-channel arrays)
        if images.shape[-1] == 1:
            pil_images = [Image.fromarray(image.squeeze(), mode="L") for image in images_uint8]
        else:
            pil_images = [Image.fromarray(image) for image in images_uint8]
        return pil_images
    except Exception as e:
        raise ValueError(f"Failed to create PIL images: {str(e)}")
```
Could you double-check with the PR #10482? I was able to generate the following on AMD Instinct MI300X using the PR branch.

output.10.mp4
output.9.mp4
Thanks @hlky and @a-r-r-o-w, we have confirmed on our side that it produces proper videos after the recent patch.
Describe the bug

Running `diffusers.utils.export_to_video()` on the output of `HunyuanVideoPipeline` results in an unusable (black) video. After adding some checks to `numpy_to_pil()` in `image_processor.py` I have confirmed that the output contains `NaN` values.

Reproduction
Logs
No response
System Info
GPU: AMD MI300X
Who can help?
@DN6 @a-r-r-o-w