
Motion Model / Adapter versatility #8301

Merged
3 commits merged into huggingface:main from motionmodel_improvements on Jun 27, 2024

Conversation

@Arlaz (Contributor) commented May 28, 2024

In the MotionModel and MotionAdapter for the AnimateDiff pipeline:

  • allow a different number of layers per block
  • allow a different number of transformer blocks per layer, per block
  • allow a different number of motion attention heads per block
  • allow passing the dropout argument

These modifications are needed for some custom-trained models that follow the SDXL architecture more closely (fewer down blocks, more transformer blocks per layer).

The new arguments, which accept a tuple instead of an int, can now be used as in the following example:

adapter = MotionAdapter(
    block_out_channels=(320, 640, 1280),  # to be checked with the base model unet for compatibility
    motion_layers_per_block=2,  # to be checked with the base model unet for compatibility
    motion_transformer_per_layers=(1, 2, 6),  # free to choose
    motion_mid_block_layers_per_block=1,
    motion_transformer_per_mid_layers=10,
    motion_num_attention_heads=(5, 8, 10), # free to choose
    motion_norm_num_groups=32,
    motion_max_seq_length=32,
    use_motion_mid_block=True,
    conv_in_channels=None,
)
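
For reference, here is a minimal sketch (not part of this PR) of how such an adapter could be paired with a base UNet; "path/to/base-unet" is a placeholder for a checkpoint whose block_out_channels match the adapter, such as the SDXL-like custom models mentioned above:

from diffusers import UNet2DConditionModel, UNetMotionModel

# Hypothetical base checkpoint; its block_out_channels must match the adapter above.
unet2d = UNet2DConditionModel.from_pretrained("path/to/base-unet", subfolder="unet")

# Inflate the 2D UNet into a motion-aware UNet using the adapter defined above.
motion_unet = UNetMotionModel.from_unet2d(unet2d, motion_adapter=adapter)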

Important: this PR does not break any existing code.

Before submitting

Who can review?

@sayakpaul @yiyixuxu @DN6

@DN6 self-requested a review May 28, 2024 16:14
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

- allow a different number of layers per block
- allow a different number of transformer blocks per layer, per block
- allow a different number of motion attention heads per block
- use the dropout argument in get_down/up_block in the 3D blocks
@Arlaz force-pushed the motionmodel_improvements branch from c3525f9 to 4678f9c on June 3, 2024 08:43
@Arlaz (Contributor, Author) commented Jun 5, 2024

The suggested changes have been made, thanks

@DN6 (Collaborator) commented Jun 10, 2024

Could we add a fast test for the model here:
https://github.com/huggingface/diffusers/blob/main/tests/models/unets/test_models_unet_motion.py

I think we should test creating an asymmetric UNetMotionModel. I'd just like to confirm that the updated parameters aren't breaking anything.

@Arlaz (Contributor, Author) commented Jun 10, 2024

Of course, a forward test for the most asymmetric UNetMotionModel possible has been added 😃
It passes on my side.
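
For illustration, a minimal sketch of such a forward test (not the exact test merged in this PR); the tiny config values below are hypothetical, and the adapter arguments follow the example in the PR description:

import torch
from diffusers import MotionAdapter, UNet2DConditionModel, UNetMotionModel

# Tiny, deliberately asymmetric base UNet: two down blocks with different widths.
unet2d = UNet2DConditionModel(
    sample_size=8,
    in_channels=4,
    out_channels=4,
    down_block_types=("CrossAttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "CrossAttnUpBlock2D"),
    block_out_channels=(32, 64),
    layers_per_block=1,
    norm_num_groups=8,
    cross_attention_dim=32,
    attention_head_dim=4,
)

# Matching adapter with a per-block (asymmetric) number of motion attention heads.
adapter = MotionAdapter(
    block_out_channels=(32, 64),
    motion_layers_per_block=1,
    motion_num_attention_heads=(2, 4),
    motion_norm_num_groups=8,
    motion_max_seq_length=16,
    use_motion_mid_block=True,
)

model = UNetMotionModel.from_unet2d(unet2d, motion_adapter=adapter)
model.eval()

batch, frames = 1, 4
sample = torch.randn(batch, 4, frames, 8, 8)        # (B, C, F, H, W) video latent
timestep = torch.tensor([1])
encoder_hidden_states = torch.randn(batch, 4, 32)   # (B, seq_len, cross_attention_dim)

with torch.no_grad():
    out = model(sample, timestep, encoder_hidden_states=encoder_hidden_states).sample

assert out.shape == sample.shape  # the forward pass preserves the video latent shape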

@Arlaz (Contributor, Author) commented Jun 17, 2024

Are some of the tests failing because of my commits?
After inspecting the logs I don't think so, but if that's the case, could you please give me some hints on how to fix it?

@DN6 (Collaborator) commented Jun 27, 2024

Failing tests are unrelated @Arlaz. Merging this.

@DN6 DN6 merged commit 3e0d128 into huggingface:main Jun 27, 2024
13 of 15 checks passed
@Arlaz deleted the motionmodel_improvements branch June 27, 2024 13:32
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* Motion Model / Adapter versatility

- allow a different number of layers per block
- allow a different number of transformer blocks per layer, per block
- allow a different number of motion attention heads per block
- use the dropout argument in get_down/up_block in the 3D blocks

* Motion Model added arguments renamed & refactoring

* Add test for asymmetric UNetMotionModel