
Implement zero terminal SNR noise schedule option #14145

Merged
merged 13 commits into from
Jan 1, 2024

Conversation

drhead
Contributor

@drhead drhead commented Nov 29, 2023

Description

Draft progress

  • Currently the implementation works roughly as expected. I am sure it works correctly with K-diffusion samplers
  • I can't verify that it works with DDIM or UniPC, because the current dynamic thresholding and CFG rescale extensions do not work with those samplers, so I have no proper way of testing them. PLMS doesn't work with v-prediction at all to begin with, so it is effectively written off.
  • The relevant options, when active, are saved to PNG metadata, but I don't know how to get them to apply setting overrides from the PNG info tab. This is one of the reasons I am making this a draft for the moment. (Fixed, thanks to catboxanon.)
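For context, the schedule change this PR implements follows the rescaling procedure from Lin et al. (2023), "Common Diffusion Noise Schedules and Sample Steps Are Flawed". Below is a minimal sketch (not the PR's exact code) that rescales a beta schedule so the final timestep has exactly zero SNR:

```python
import torch

def rescale_zero_terminal_snr(betas):
    # Work in sqrt(alphas_cumprod) space, as in the Lin et al. paper.
    alphas = 1.0 - betas
    alphas_bar = alphas.cumprod(0)
    alphas_bar_sqrt = alphas_bar.sqrt()

    # Remember the original first and last values.
    alphas_bar_sqrt_0 = alphas_bar_sqrt[0].clone()
    alphas_bar_sqrt_T = alphas_bar_sqrt[-1].clone()

    # Shift so the last timestep has exactly zero SNR,
    # then scale so the first timestep keeps its original value.
    alphas_bar_sqrt -= alphas_bar_sqrt_T
    alphas_bar_sqrt *= alphas_bar_sqrt_0 / (alphas_bar_sqrt_0 - alphas_bar_sqrt_T)

    # Convert back to betas.
    alphas_bar = alphas_bar_sqrt ** 2
    alphas = alphas_bar[1:] / alphas_bar[:-1]
    alphas = torch.cat([alphas_bar[0:1], alphas])
    return 1.0 - alphas
```

With this rescaling, the final `alphas_cumprod` entry becomes exactly zero, which is what forces the last timestep to be pure noise.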

Checklist:

@drhead drhead marked this pull request as ready for review November 29, 2023 23:13
@AUTOMATIC1111
Owner

From the code it looks like the feature only works for half precision: `alphas_cumprod_original` is only added for half precision, and nothing is done if `alphas_cumprod_original` does not exist.

This looks like a good change overall, but since it alters generation by default, I don't want to be hasty with merging it. If anyone wants to discuss this, please do so here. Some images for comparison would also help.

@drhead
Contributor Author

drhead commented Dec 2, 2023

I will fix its implementation on the other code paths. Here are a few grids comparing the zero SNR noise schedule, generated using ZeroDiffusion 0.9. The first two are on Euler A with the default noise schedule. Note how they are overexposed and that there is a black border on some samples in the first grid. Also note that these images were generated without any CFG rescale or dynamic thresholding applied.

grid-17721
grid-17725

With the zero SNR noise schedule, these failure modes are less likely to occur. Note also that there is no longer a glowing outline around the subject in the second grid.

grid-17722
grid-17724

This is one sample of the compatibility mode:

grid-17726

This should be compared with the second grid posted above. The changes are extremely minor: I only notice a few patches of the first image and one patch of the third image showing any differences; the others appear to be identical.

@drhead
Contributor Author

drhead commented Dec 6, 2023

With further testing I am getting this error:

```
/home/drhead/miniconda3/envs/automatic/lib/python3.10/site-packages/torchsde/_brownian/brownian_interval.py:601: UserWarning: Should have ta>=t0 but got ta=0.0291675366461277 and t0=0.029168.
  warnings.warn(f"Should have ta>=t0 but got ta={ta} and t0={self._start}.")
```

Give me some time to investigate this before merging, since it seems obvious that something is being cast inappropriately (line numbers will be wrong because of my debug prints).

edit: It appears that this is not a casting issue; it is an issue with torchsde's Brownian interval rounding the start and stop values. I don't know if this has a significant impact, but rounding the sigma schedule to 6 decimal places would match it. It might make more sense for this to be fixed upstream in the k-diffusion repo.
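The rounding workaround mentioned above could be sketched as follows (hypothetical, not part of this PR; the function name is illustrative):

```python
import torch

# Hypothetical workaround: round the sigma schedule to 6 decimal places so it
# matches torchsde's internally rounded interval endpoints.
def round_sigmas(sigmas, decimals=6):
    scale = 10.0 ** decimals
    return torch.round(sigmas * scale) / scale
```

Applied to the sigma from the warning above, `0.0291675366461277` rounds to `0.029168`, matching torchsde's `t0`.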

@drhead
Contributor Author

drhead commented Dec 6, 2023

The current implementation also seems to break Heun, but not Restart, for some odd reason. I need to investigate.

@drhead
Contributor Author

drhead commented Dec 9, 2023

I have identified the root cause of some samplers breaking. It appears that Heun, DDIM, and UniPC use other values derived from `alphas_cumprod` at various points in sampling, and I have to either update those values to match or update parts of the code to consistently re-derive them (which would theoretically have a performance impact, albeit an insignificant one). I will stick to updating those values for now and expect to have a commit implementing these changes later today.
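To illustrate the kind of precomputed buffers involved (the names below are generic, not the repository's exact attributes), samplers derive quantities like these from `alphas_cumprod`, and each one must be refreshed or consistently re-derived whenever the schedule is swapped:

```python
import torch

# Illustrative only: generic names for buffers that DDIM/UniPC-style code
# precomputes from alphas_cumprod. If alphas_cumprod is replaced (e.g. with a
# zero-terminal-SNR version), these must be recomputed to stay consistent.
def rederive_from_alphas_cumprod(alphas_cumprod):
    return {
        "sqrt_alphas_cumprod": torch.sqrt(alphas_cumprod),
        "sqrt_one_minus_alphas_cumprod": torch.sqrt(1.0 - alphas_cumprod),
    }
```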

This is the only place these values are ever referenced outside of training code, so this change is very justifiable and more consistent.

@drhead
Contributor Author

drhead commented Dec 9, 2023

I went forward with re-deriving the values in the one place that used them, and the DDIM sampler now works more or less as expected.

I have further investigated the problems with Heun and UniPC and have concluded that both of these issues are present in the original ComfyUI implementation of zero terminal SNR sampling that I based this on. Model outputs for Heun start out in a ridiculous range and never seem to reach the expected range of -1 to 1, probably in part because it starts at sigma ~4500 and then jumps to a sigma around 60. Under Restart, model outputs still start out in a rather high range but end up leveling out, since the second sigma is around 3300. Outputs look normal under Heun if I use CFG rescale with phi=1.0, so I think the issue is the lack of said rescale, and I would consider that out of scope for this pull request.
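For reference, the CFG rescale mentioned above is described in the same Lin et al. (2023) paper. A hedged sketch (not this repository's implementation); `phi=1.0` fully rescales the guided output to the standard deviation of the conditional prediction:

```python
import torch

# Sketch of CFG rescale from Lin et al. (2023): blend the guided prediction
# with a version rescaled to the per-sample std of the conditional prediction.
def rescale_noise_cfg(noise_cfg, noise_pred_text, phi=0.7):
    reduce_dims = list(range(1, noise_pred_text.ndim))
    std_text = noise_pred_text.std(dim=reduce_dims, keepdim=True)
    std_cfg = noise_cfg.std(dim=reduce_dims, keepdim=True)
    rescaled = noise_cfg * (std_text / std_cfg)
    return phi * rescaled + (1.0 - phi) * noise_cfg
```

With `phi=1.0`, the returned tensor has exactly the conditional prediction's per-sample standard deviation, which counteracts the overexposure that high CFG scales cause on zero-terminal-SNR models.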

As far as I am concerned, this pull request is complete and ready. There are a number of other things that could be altered to better support zero terminal SNR (for instance, I would want to look at the discrepancy in Restart's sigma schedule now that I know how to get reasonable outputs from Heun), but those can be investigated separately later.

@AUTOMATIC1111
Owner

Here's an example of the differences, for whoever is reading:

Without this PR applied:
00299-1508704561

With it:
00298-1508704561

@BurnZeZ

BurnZeZ commented Dec 22, 2023

I’ve been seeing much closer prompt adherence, especially with regard to fine details, after applying this patch.

I haven’t used the new noise scheduler yet, so is this caused by the alphas_cumprod fix?

@drhead
Contributor Author

drhead commented Dec 30, 2023

> I’ve been seeing much closer prompt adherence, especially with regard to fine details, after applying this patch.
>
> I haven’t used the new noise scheduler yet, so is this caused by the alphas_cumprod fix?

The changes shouldn't really be substantial enough to have a noticeable impact on prompt adherence; if they do, that may indicate something is wrong. Could you provide a side-by-side comparison with and without the patch, on the same seed and settings, to demonstrate?

@AUTOMATIC1111 AUTOMATIC1111 merged commit 267fd5d into AUTOMATIC1111:dev Jan 1, 2024
3 checks passed
@AUTOMATIC1111
Owner

@drhead

I decided to add the system I was considering previously, which was also recently suggested to me on Discord, for automatically enabling backwards-compatibility options; it does not completely solve the problem, but it alleviates it a bit. With that, the PR is accepted.

One thing I'd like to ask you to do is to make another PR where you remove the unneeded `if hasattr(p.sd_model, 'alphas_cumprod') and hasattr(p.sd_model, 'alphas_cumprod_original'):` check, and move your code inside `process_images_inner` into two functions outside of it (because it is already a very crowded function), preferably between `program_version` and `create_infotext`, leaving just a single call to your function inside `process_images_inner`. I could do it myself, but then git would show me as the author of those lines.
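The requested structure might look roughly like this (all names below are illustrative, not taken from the repository):

```python
# Hypothetical sketch of the requested refactor: the schedule-override logic
# lives in its own free function, and process_images_inner keeps only a call.
def apply_alpha_schedule_override(sd_model, p):
    """Hypothetical helper: apply zero-terminal-SNR / downcast overrides
    to sd_model before sampling, based on p's settings."""
    # ...schedule-swapping logic moved out of process_images_inner...

def process_images_inner(p):  # illustrative stand-in for the real function
    apply_alpha_schedule_override(p.sd_model, p)
    # ...the rest of the (already very crowded) function...
```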

@ibrainventures
Contributor

ibrainventures commented Jan 9, 2024

@drhead

I updated to the dev version today, and in fine details / photorealistic generations the improvement from your PR is clearly visible.

grafik

The only difference between the generations was (a hint for users of the API's override_setting params):

Img 1: `'use_downcasted_alpha_bar' => true`

Img 2: `'use_downcasted_alpha_bar' => false` (or no param; the "new" calculation)

Thanks for this PR.
