
Fix HunyuanVideo produces NaN on PyTorch<2.5 #10482

Merged: 2 commits merged into huggingface:main on Jan 7, 2025

Conversation

hlky (Member) commented Jan 7, 2025

What does this PR do?

The NaN was tracked to:

hidden_states = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
)
hidden_states = hidden_states.transpose(1, 2).flatten(2, 3)
hidden_states = hidden_states.to(query.dtype)

# 6. Output projection
if encoder_hidden_states is not None:
    hidden_states, encoder_hidden_states = (
        hidden_states[:, : -encoder_hidden_states.shape[1]],
        hidden_states[:, -encoder_hidden_states.shape[1] :],
    )

    if getattr(attn, "to_out", None) is not None:
        hidden_states = attn.to_out[0](hidden_states)
        hidden_states = attn.to_out[1](hidden_states)

    if getattr(attn, "to_add_out", None) is not None:
        encoder_hidden_states = attn.to_add_out(encoder_hidden_states)

Specifically, the NaN values appear in some elements of encoder_hidden_states.
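As a plain-Python sketch of the output-projection split in the excerpt above: along the sequence dimension, the attention output packs the video tokens first and the text (encoder) tokens last, so negative indexing with the text length recovers both parts. The token values and counts here are made up for illustration.

```python
# 4 "video tokens" followed by 2 "text tokens" along the sequence axis
tokens = ["v0", "v1", "v2", "v3", "t0", "t1"]
num_text_tokens = 2  # stands in for encoder_hidden_states.shape[1]

# Same slicing pattern as the processor code above
video_part = tokens[: -num_text_tokens]
text_part = tokens[-num_text_tokens:]

print(video_part)  # ['v0', 'v1', 'v2', 'v3']
print(text_part)   # ['t0', 't1']
```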

The dimensions of query, key, value, and the attention mask are large, which suggests that PyTorch versions < 2.5 used 32-bit indexing. This tracks with #10314: if ROCm builds are still using 32-bit indexing, this may also close that issue; awaiting confirmation from the user.
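A back-of-the-envelope check of the 32-bit indexing hypothesis: the attention score matrix holds roughly heads × seq_len² elements, which easily exceeds the 32-bit range for this pipeline's sequence lengths. The shape numbers below are illustrative, not measured from the PR.

```python
INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit index can address

heads = 24        # hypothetical attention head count
seq_len = 30_000  # hypothetical video + text token count

# Elements in the (heads, seq_len, seq_len) score matrix
score_elements = heads * seq_len * seq_len
print(score_elements > INT32_MAX)  # True: 32-bit offsets would overflow
```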

Tested on CUDA with PyTorch 2.4.1.

Attachment: output.mp4

Code:
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
  model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16).to("cuda")
pipe.vae.enable_tiling()

output = pipe(
  prompt="A cat walks on the grass, realistic",
  height=320,
  width=512,
  num_frames=61,
  num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
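A workaround like this one typically only needs to apply on the affected PyTorch versions. Below is a hedged sketch of gating on the running version string; the < 2.5 boundary comes from the PR title, and the parser is a stand-in for illustration, not diffusers' actual version check.

```python
def needs_pre_2_5_workaround(torch_version: str) -> bool:
    """Return True if the given PyTorch version string is below 2.5."""
    # Strip any local build suffix like "+cu121", then compare (major, minor)
    major, minor = torch_version.split("+")[0].split(".")[:2]
    return (int(major), int(minor)) < (2, 5)

print(needs_pre_2_5_workaround("2.4.1"))        # True
print(needs_pre_2_5_workaround("2.5.1+cu124"))  # False
```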

There's also a small performance increase:

2.4.1 with fix: 30/30 [01:56<00:00, 3.88s/it]
2.5.1:          30/30 [02:04<00:00, 4.16s/it]
2.5.1 with fix: 30/30 [01:56<00:00, 3.89s/it]
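Working out the per-step improvement from the timings above (the arithmetic is mine, the s/it figures are the PR author's):

```python
# Per-step times (seconds/iteration) on PyTorch 2.5.1, from the table above
stock, with_fix = 4.16, 3.89

speedup_pct = (stock - with_fix) / stock * 100
print(f"{speedup_pct:.1f}% faster per step")  # 6.5% faster per step
```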

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul @a-r-r-o-w


a-r-r-o-w (Member) commented Jan 7, 2025

Oh wow, this is very cool 🤯 It may be saving some extra memory now too.

Just to confirm, the results before and after are numerically the same, no? I can take a look too if they don't match yet 🤗

So, just passing a big attention mask is unsupported/buggy on < 2.5.1?

@yiyixuxu yiyixuxu merged commit 01bd796 into huggingface:main Jan 7, 2025
12 checks passed
Nerogar (Contributor) commented Jan 12, 2025

This change broke batching again. It was previously fixed in #10454

DN6 pushed a commit that referenced this pull request Jan 15, 2025
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>