Creative-comfyUI/Understanding-how-the-image-is-built.-zeroConditioning-clipVision-seed-and-prompt-injection-

Understanding how the image is built. zeroConditioning, clipVision, seed and prompt injection

Are we sure we understand how the image is built and what reference the prompt image is based on?

Since @cubiq created the prompt injection node, I have discovered that image creation in ComfyUI probably does not work the way I imagined.

Before I begin my demonstration, I invite you to read the first part of Prompt Injection (https://github.com/Creative-comfyUI/prompt_injection). You can also read what I have written about seeds (https://github.com/Creative-comfyUI/seed_eng). All this information is necessary to understand what I am about to demonstrate.

In my previous repository about prompt injection, I explained that even when the prompt is muted with zeroConditioning, the KSampler still generates an image. This image looks like a reference. You have to understand that leaving the CLIP Text Encode node empty does not mean that the prompt is muted. zeroConditioning ensures that no text information from CLIP (https://openai.com/index/clip/) is used.
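
The difference between an empty prompt and a zeroed conditioning can be sketched with a toy encoder. This is only an illustration under assumptions, not ComfyUI's or CLIP's code: `toy_tokenize`, `toy_encode` and `zero_out` are hypothetical names, the fake embedding merely stands in for the real text encoder, and padding with the end token is an assumption. The point is that an empty string still becomes start and end tokens, which still produce non-zero numbers, whereas zeroing replaces every value with 0.

```python
START, END = 49406, 49407   # start/end token ids used by the CLIP tokenizer
MAX_TOKENS = 77             # CLIP context length

def toy_tokenize(text):
    """Hypothetical tokenizer: one fake id per word, wrapped in start/end tokens."""
    ids = [START] + [1000 + len(word) for word in text.split()] + [END]
    return ids + [END] * (MAX_TOKENS - len(ids))   # pad to a fixed length (assumption)

def toy_encode(text, dim=4):
    """Hypothetical encoder: maps every token id to a small non-zero vector."""
    return [[((tok * (k + 1)) % 97 + 1) / 100.0 for k in range(dim)]
            for tok in toy_tokenize(text)]

def zero_out(conditioning):
    """Replace every value with 0.0, which is the idea behind zero conditioning."""
    return [[0.0 for _ in vector] for vector in conditioning]

empty_prompt = toy_encode("")      # empty text box
muted = zero_out(empty_prompt)     # muted prompt

has_signal = any(v != 0.0 for vec in empty_prompt for v in vec)
muted_has_signal = any(v != 0.0 for vec in muted for v in vec)

print(len(empty_prompt))   # 77 vectors even though no text was typed
print(has_signal)          # True: the empty prompt still carries values
print(muted_has_signal)    # False: the zeroed conditioning carries none
```

In this toy, only `muted` is guaranteed to carry no text information at all, which is why the two cases should not be treated as the same thing.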

The image generated with zeroConditioning appears to be a reference image, but how it is used is the question. It is probably the noise that is used. What we are sure of is that for each seed the model produces an image, even without a prompt. For the same seed, the rendered images are often similar from one model to another (for example a man, a room, a garden...), but not always. What surprised me was that the models do not give the same finish to the reference image: some images are blurred, some are poorly defined and others are a perfect photo. Let us look at a few examples with the same seeds and zeroConditioning.
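
The link between the seed and the starting point can be sketched as follows. This is a minimal illustration in plain Python, not ComfyUI's own code: `initial_noise` is a hypothetical helper, and the standard-library generator stands in for the tensor library a real sampler uses. It only shows that the starting noise is a pure function of the seed, so with no prompt information the model always starts from the same place for a given seed.

```python
import random

def initial_noise(seed, count=8):
    """Return `count` Gaussian samples from a generator seeded with `seed`.

    Hypothetical stand-in for the latent noise a sampler starts from.
    """
    rng = random.Random(seed)   # private generator, no global state involved
    return [rng.gauss(0.0, 1.0) for _ in range(count)]

a = initial_noise(42)
b = initial_noise(42)
c = initial_noise(43)

print(a == b)   # same seed: identical starting noise
print(a == c)   # different seed: different starting noise
```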

Model_Euler

The most intriguing part is that the sampler can give a different image. Many users think that the sampler only influences the finish of the image. In fact, the sampler can influence the image much more. Let us look at some examples, always with zeroConditioning and the same seed.

Sampler

There is another surprise: the format can change the reference image too.

Size
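
One plausible reason the format matters, given here as an assumption and not as something I measured: Stable Diffusion models sample in a latent that is usually 8 times smaller than the image on each side, with 4 channels. A different width and height therefore gives a different grid, and the same stream of random numbers lands on different positions. The sketch below is plain Python with hypothetical helpers (`latent_shape`, `noise_for`); the row-by-row fill order is also an assumption.

```python
import random

def latent_shape(width, height, channels=4, downscale=8):
    """Shape (channels, rows, cols) of the latent for a given image size."""
    return (channels, height // downscale, width // downscale)

def noise_for(seed, width, height):
    """Fill the latent grid row by row from one seeded generator."""
    c, h, w = latent_shape(width, height)
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(w)]
             for _ in range(h)]
            for _ in range(c)]

square = noise_for(7, 1024, 1024)
portrait = noise_for(7, 832, 1216)

print(latent_shape(1024, 1024))   # (4, 128, 128)
print(latent_shape(832, 1216))    # (4, 152, 104)

# Same seed, same stream of numbers: the 105th value sits at the end of row 0
# in the square latent but at the start of row 1 in the portrait latent.
print(square[0][0][104] == portrait[0][1][0])   # True
```

So even with an identical seed, the noise "picture" the model starts from is arranged differently once the size changes.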

Now let us use prompt injection to see what happens. As I mentioned in my previous repository, input 8 is the key to changing the image. It does not change the first image, but it changes the second and, even more, the third. Input 8 with an empty prompt affects the result. The more defined the reference image is, the more significant the change. For example, with SDXL 1 the change is slight. Let us look at some examples.

This example uses the sdxl_lightning_8step model.

Screenshot 2024-06-16 at 2 38 21 PM

If we change the size to 832 * 1216, the third image is completely different, which is not the case when using Euler. Input 8 changes the image.

Screenshot 2024-06-16 at 2 45 07 PM

Before continuing, I write a prompt and mute the negative prompt to see what the result is for each model without prompt injection.

Prompt : Paris champ de mars, an elegant woman colorfull, in 1950 style

Model_Prompt

Now let us check whether, as before, the sampler changes the image. I am going to use the same model and the same samplers I used before and see the results.

Sampler_Prompt

The results obtained with different samplers show that the image usually follows the style of the reference image (realistic, drawing, sketch...), but this is not always the case.

All these images are in 1024 * 1024 format. With another format the result will be different.

w 832 * h 1216

Screenshot 2024-06-16 at 7 18 29 PM

w 1536 * 640

Screenshot 2024-06-16 at 7 43 30 PM

From time to time, with bigger images, we can see a signature or some text on the image, as with this one, without being able to read it.

What is common to all these images?

The Eiffel Tower, because it is associated with the Champ de Mars in Paris, cars, an elegant lady in a hat and, finally, nice colours.

Before continuing the explanation, we can conclude that the Euler sampler always gives the same images. Other samplers may give different images or styles. With samplers other than Euler, the images that follow the first image can be different from the others. The first picture is a kind of reference, and the seed is linked to this image. Changing the size of the image changes the point of view and can change the image. It looks as if the image size is related to the depth of field of the camera: 1536 * 640 can be considered as f/22.
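
One possible explanation for the sampler differences, offered as an assumption and not something I verified: some samplers (the ancestral and SDE families) add fresh noise at every step, while Euler only follows the model's prediction from the starting noise. The toy below uses a single number as the "image" and a made-up denoiser, with step formulas written in the style of the usual Euler and Euler ancestral updates. All names are hypothetical and none of this is ComfyUI code.

```python
import math
import random

SIGMAS = [10.0, 5.0, 2.0, 1.0, 0.5, 0.0]   # toy noise schedule, high to zero

def denoise(x, sigma):
    """Made-up model: shrinks x toward 0, more strongly at high noise levels."""
    return x / (1.0 + sigma ** 2)

def euler(x, sigmas):
    """Deterministic: the result depends only on the starting value."""
    for s, s_next in zip(sigmas, sigmas[1:]):
        d = (x - denoise(x, s)) / s          # direction toward the prediction
        x = x + d * (s_next - s)
    return x

def euler_ancestral(x, sigmas, step_seed):
    """Stochastic: extra noise is injected after every step but the last."""
    rng = random.Random(step_seed)
    for s, s_next in zip(sigmas, sigmas[1:]):
        s_up = min(s_next, math.sqrt(s_next ** 2 * (s ** 2 - s_next ** 2) / s ** 2))
        s_down = math.sqrt(s_next ** 2 - s_up ** 2)
        d = (x - denoise(x, s)) / s
        x = x + d * (s_down - s)             # step down a bit further...
        if s_next > 0:
            x = x + rng.gauss(0.0, 1.0) * s_up   # ...then add fresh noise back
    return x

start = 3.0   # same "starting noise" for every run
print(euler(start, SIGMAS))
print(euler_ancestral(start, SIGMAS, step_seed=1))
print(euler_ancestral(start, SIGMAS, step_seed=2))
```

In this toy, Euler always lands on the same value for the same start, while the ancestral version lands somewhere else depending on the noise injected along the way. If real samplers behave similarly, that would fit the observation that the reference image survives with Euler but can change with other samplers.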

Now we are going to use prompt injection as it can change many things in the result of the image.

Adding an empty prompt to input 8 adds variation to images 2 and 3. Image 3 has much more variation. Image 1 does not change.

Screenshot 2024-06-16 at 8 30 33 PM

Now I keep the main prompt muted and write the prompt in the node that is linked to input 8.

Screenshot 2024-06-16 at 10 58 17 PM

This means that the prompt applied to input 8 does not create an image but gives a variation of the reference image, because images 2 and 3 are different. Input 8 will create a variation no matter what you write. So if you have a good reference image, you can make variations of it using input 8 and batch size 3. Depending on the model and the latent size, the variation may be smaller or larger, and more or less interesting. Outputs 0 and 1 help to add detail to these variations. Let us take a look at the previous examples using the DreamshaperXL model and the LCM sampler. I use zeroConditioning for the prompt, and the input 8 prompt is blank. Here is the result:

LCM input_8

Now I am going to add details to images 2 and 3 using different inputs and outputs.

Input 5 and input 8 with a blank prompt change the facial expression of the third image.

visage

Input 7 and input 8 with a blank prompt change the facial expression of the third image much more.

visage2

Now we are going to use prompts with input 0 and input 1.

input 0 : woman smiling and dancing with hands up

input 1 : Red jacket, blue trouser, yellow hat,

Screenshot 2024-06-17 at 8 23 33 PM

Now I will link the "all" input of prompt injection to a prompt. The prompt is the one we used before: Paris champ de mars, an elegant woman colourful, in 1950 style. This prompt changes the second image, but the prompt itself is not applied. If we disconnect input 8 and keep outputs 0 and 1, then not only the third image but also the second one is affected. This time the prompts on outputs 0 and 1 are correctly applied, but the result does not match the prompt we expected.

Screenshot 2024-06-17 at 8 42 13 PM Screenshot 2024-06-17 at 10 09 27 PM

If we now replace the LCM sampler with dpmpp_2m, keeping input 8, the result is different. The second image is a metamorphosis of the reference image and the third image is the final transformation.

Screenshot 2024-06-17 at 9 25 10 PM

Now we apply the current prompt with prompt injection "all", output 0 and output 1.

Screenshot 2024-06-17 at 10 23 57 PM

Without injection "all":

Screenshot 2024-06-17 at 9 57 39 PM

Without injection "all", and with the dpmpp_sde sampler instead of LCM:

Screenshot 2024-06-17 at 10 42 38 PM

Prompt injection brings precision and detail to the image. Adding input 8 blocks part of the original prompt (third image).

Screenshot 2024-06-17 at 11 03 25 PM

Two points: using an image for latent_image instead of an empty latent changes the posture of the character, and using batch size 1 can give a different result from batch size 3.

The question is: is there a relationship between the reference image and the final image, or is it a glitch? Is the reference image an AI image, or an existing image that was used to train the AI?

There is certainly a relationship between the reference image and the seed. That means there is also a relationship with the images we create, especially with the first image.

Now let us use CLIP vision to confirm this relationship or not. CLIP vision lets us use an image as a prompt. This technique can help us mix two images and create a new one. Let us try and see what happens. It is important to understand that the model we use seems to affect the result. Some models seem to be more accurate than others. Why is another question.

Let us look at some examples.

Model DreamShaperXL Turbo

Regular image with prompt

RenduDreamShaper

Image with muted prompt (zeroConditioning)

Dream_muted

Image using CLIP vision and zeroConditioning

Strength 0

ClipVisionDreamShapter

Strength 1

Clipvision1_dreamShaper

For strength 1, I wonder where this picture came from. It looks like a variation of the first image. Changing the strength does not give a progression between strength 0 and strength 1; the results look like variations of the strength 1 image.

This happened because we chose reference image 1. If we choose the third image of our batch, everything is different with strength 1.

Screenshot 2024-06-18 at 5 01 47 PM

What happens if we vary the seed?

The new seed gives a new reference image for strength 0, meaning the image does not contain the reference image. The reference is linked to the seed.

Now we are going to use another model, which seems to behave differently and may be more accurate. I invite you to read the previous repository on prompt injection.

With the sdxl_lightning_8step model, strength 0 gives the reference image and strength 1 gives the third image that we had with input 8 and a zeroConditioning prompt.

As you can see in the previous repository, this model seems to act differently. Using batch size 1 gives image 3 of batch size 3. For this model there are two different references. Why? That is the question.

For the same seed and the same model, the reference image will always be the same. Using CLIP vision with this model and an image made with another model gives the same reference image, but with variation. The sampler can influence the reference image: Euler, LCM, dpmpp_... or dpmpp_sde can have different references.

Prompt injection can help us add detail more precisely with input 8. In some cases, as seen in my previous repository on prompt injection, input 8 can give variations of the reference image and produce interesting results. In this case the choice of model is very important. Not all models react in the same way.

How is this reference used, and how does it affect the result? Why is the effect important for some models and not for others? If, instead of a model, the KSampler could take an image that acts differently from the one fed to the latent input, it might be a good idea to see what happens.

Prompt injection can be a good tool if we can influence the results more precisely and decide what we want. At the same time, I haven't tested the negative prompt, which can have an impact on the results described in this text.

This analysis was done to better understand what happens with the image, and maybe it can give developers ideas for new models or new nodes that give more control over the final image.
