
Can the coefficients of CogVideoX-5B be used in CogVideoX1.5-5B? #24

Closed
zishen-ucap opened this issue Jan 14, 2025 · 5 comments

@zishen-ucap
Contributor

Hello,

Thank you for your amazing work on the CogVideo series!

I noticed the coefficients for CogVideoX-5B (as shown in the attached image) and wanted to ask if they can be directly applied to CogVideoX1.5-5B, or if any adjustments are needed?
[Attached image: coefficients for CogVideoX-5B]

Looking forward to your response. Thanks again!

@LiewFeng
Collaborator

Hi @zishen-ucap, thank you for your interest in our work. I'm not sure about that. You can try it and share some results here. If it doesn't work well, you can follow issue #20 to obtain new coefficients.

@zishen-ucap
Contributor Author

Thanks for your suggestion! I tried it out, and with a negligible subjective quality drop, the sampling time decreased significantly, from 475 s to 260 s. Here are the results:

cogvideo15_teacache.-._20250114_15475606.mp4
final_output13_20250114_15485175.mp4

I tested five different prompt sets and noticed that the residual replacement always occurs at the same inference_steps (e.g., [2, 14, 19, ...]). I’m curious, in your experiments with other models, did you observe a similar pattern, or is this behavior unique to CogVideoX?

If residual replacement tends to happen at fixed inference_steps, would it be feasible to treat these steps as priors to further accelerate video generation?
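
For concreteness, here is a rough sketch of what I mean by treating fixed steps as a prior; the step indices, class, and function names below are hypothetical illustrations, not the actual TeaCache code:

```python
import torch

# Hypothetical prior: step indices whose transformer output would be replaced
# by the cached residual (values here are illustrative only).
CACHED_STEPS = {2, 14, 19}

class FixedScheduleCache:
    """Reuse the previous residual on a fixed set of steps instead of
    recomputing the indicator-based decision at every step."""

    def __init__(self, cached_steps):
        self.cached_steps = set(cached_steps)
        self.previous_residual = None

    def forward(self, step_idx: int, hidden_states: torch.Tensor, run_transformer_blocks):
        # On a cached step, skip the transformer blocks and re-apply the
        # stored residual; otherwise compute normally and refresh the cache.
        if step_idx in self.cached_steps and self.previous_residual is not None:
            return hidden_states + self.previous_residual
        output = run_transformer_blocks(hidden_states)
        self.previous_residual = output - hidden_states
        return output
```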

Looking forward to your insights!

@LiewFeng
Collaborator

For CogVideoX, we find that the timestep embedding shows a stronger correlation with the model output, so we use the timestep embedding to decide which steps to cache. Since the timestep embedding is the same for all prompts, the same timesteps will be cached for a given threshold; different thresholds will select different timesteps to cache. For most other models, we find that the timestep-embedding-modulated noisy input shows a stronger correlation with the model output, and we use that to decide which timesteps to cache.
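
As a rough illustration of the decision rule (a simplified sketch under stated assumptions, not the exact TeaCache implementation; the rescaling function and threshold value are placeholders):

```python
import torch

class CachingDecider:
    """Simplified sketch: accumulate the relative L1 change of an indicator
    (the timestep embedding for CogVideoX, or the timestep-embedding-modulated
    noisy input for most models) and skip the transformer blocks while the
    accumulated change stays below a threshold."""

    def __init__(self, threshold: float, rescale_fn=None):
        self.threshold = threshold
        # Optionally rescale the raw distance (e.g. with fitted polynomial
        # coefficients); identity by default, which is an assumption here.
        self.rescale_fn = rescale_fn or (lambda x: x)
        self.prev_indicator = None
        self.accumulated = 0.0

    def should_compute(self, indicator: torch.Tensor) -> bool:
        if self.prev_indicator is None:
            self.prev_indicator = indicator
            return True  # always compute the first step
        rel_l1 = ((indicator - self.prev_indicator).abs().mean()
                  / self.prev_indicator.abs().mean()).item()
        self.accumulated += self.rescale_fn(rel_l1)
        self.prev_indicator = indicator
        if self.accumulated < self.threshold:
            return False  # reuse the cached residual on this step
        self.accumulated = 0.0  # reset after a full computation
        return True
```

Because the indicator for CogVideoX does not depend on the prompt, this rule selects the same steps for every prompt at a fixed threshold.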

@zishen-ucap
Contributor Author

I see! Thank you for the detailed answer and for sharing.

@LiewFeng
Collaborator

Feel free to open a PR to support CogVideoX1.5-5B if it's convenient.
