Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1 #9280
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
thanks for the fix!
but this didn't cause different results, no?
So the effect can be subtle. For:

```python
import torch

prompt = ["a cat", "a dog"]
height = 1024
width = 768
images = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_inference_steps=20,
    num_images_per_prompt=2,
    generator=torch.Generator("cpu").manual_seed(1),
    height=height,
    width=width,
).images
```

the CLIP prompt embeds alternate between cat/dog embeddings, but they should be (2 cat embeddings, 2 dog embeddings).
Since the T5 embeddings are in the right order (cat, cat, dog, dog), what happens in this case is that some images are generated with a pooled CLIP embedding that does not match their T5 embedding.
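A minimal sketch of the ordering mismatch, using toy tensors as stand-ins for the real embeddings (values and shapes are illustrative, not taken from the pipeline):

```python
import torch

# Toy pooled CLIP embeds, shape (batch, dim): row 0 ~ "a cat", row 1 ~ "a dog".
pooled = torch.tensor([[1.0], [2.0]])
num_images_per_prompt = 2

# Tiling the whole batch alternates the prompts: cat, dog, cat, dog.
wrong = pooled.repeat(num_images_per_prompt, 1)

# Repeating each prompt's embedding in place keeps them grouped: cat, cat,
# dog, dog, matching the order of the T5 embeddings.
right = pooled.repeat_interleave(num_images_per_prompt, dim=0)

print(wrong.squeeze(-1).tolist())  # [1.0, 2.0, 1.0, 2.0]
print(right.squeeze(-1).tolist())  # [1.0, 1.0, 2.0, 2.0]
```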
@DN6 got it!
What does this PR do?
Flux uses the pooled CLIP prompt embeds, so it produces a 2D tensor (batch_size, dim) rather than a 3D tensor (batch_size, seq_len, dim). The current way of repeating the prompt embeds for num_images_per_prompt is incorrect. This PR corrects the behaviour.
Fixes #9215
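A hedged sketch of why the 3D-style repeat pattern misbehaves on a 2D pooled tensor (toy values; the exact call sites in the pipeline may differ):

```python
import torch

batch_size, dim, n = 2, 3, 2  # two prompts, n images per prompt
# Toy pooled CLIP embeds: row 0 is prompt 0, row 1 is prompt 1.
pooled = torch.arange(batch_size * dim, dtype=torch.float32).view(batch_size, dim)

# A 3D-style repeat on a 2D tensor: torch prepends a dim, so this tiles the
# whole batch along dim 1 -> after the view, rows are ordered 0, 1, 0, 1.
wrong = pooled.repeat(1, n, 1).view(batch_size * n, -1)

# 2D-aware repeat: duplicate along the feature dim, then reshape -> rows are
# ordered 0, 0, 1, 1, matching the order of the T5 embeddings.
right = pooled.repeat(1, n).view(batch_size * n, -1)

print(wrong[:, 0].tolist())  # [0.0, 3.0, 0.0, 3.0]
print(right[:, 0].tolist())  # [0.0, 0.0, 3.0, 3.0]
```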
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.