An implementation of the paper MagicMix: Semantic Mixing with Diffusion Models.
The aim of the method is to mix two different concepts semantically, synthesizing a new concept while preserving the spatial layout and geometry.
The method takes an image, which provides the layout semantics, and a prompt, which provides the content semantics for the mixing process.
There are 3 parameters for the method (see the sketch after this list for how they steer the denoising loop):

- v: the interpolation constant used in the layout generation phase. The greater the value of v, the greater the influence of the prompt on the layout generation process.
- kmax and kmin: these determine the range of the layout and content generation phases. A higher value of kmax discards more information about the layout of the original image, and a higher value of kmin allots more steps to the content generation phase.
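To make the roles of these parameters concrete, here is a minimal, illustrative sketch of the two-phase schedule. It is not the actual implementation: magic_mix_schedule is a hypothetical helper, and the denoise / add_noise calls in the comments stand in for the real diffusion model operations.

# Toy sketch of MagicMix's two-phase schedule (not the real implementation).
def magic_mix_schedule(steps, kmin, kmax, v):
    """Yield (timestep, phase, interpolation weight) for each denoising step."""
    t_max = int(kmax * steps)  # denoising starts here: beginning of layout phase
    t_min = int(kmin * steps)  # layout phase ends / content phase begins here
    for t in range(t_max, 0, -1):
        if t > t_min:
            # Layout phase: blend the prompt-conditioned denoised latent with a
            # freshly noised latent of the layout image, conceptually:
            #   z_t = v * denoise(z, prompt) + (1 - v) * add_noise(z_layout, t)
            yield t, "layout", v
        else:
            # Content phase: standard prompt-conditioned denoising only.
            yield t, "content", 1.0

# With these values, steps 30..16 mix layout and content (weight v = 0.5),
# and steps 15..1 refine the content alone.
for t, phase, weight in magic_mix_schedule(steps=50, kmin=0.3, kmax=0.6, v=0.5):
    print(t, phase, weight)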
from PIL import Image
from magic_mix import magic_mix

# Image that supplies the layout semantics
img = Image.open('phone.jpg')

# The prompt ('bed') supplies the content semantics
out_img = magic_mix(img, 'bed', kmax=0.5)
out_img.save("mix.jpg")
The script can also be run from the command line; steps, seed, and guidance_scale set the number of denoising steps, the random seed, and the classifier-free guidance scale, respectively:

python3 magic_mix.py \
    "phone.jpg" \
    "bed" \
    "mix.jpg" \
    --kmin 0.3 \
    --kmax 0.6 \
    --v 0.5 \
    --steps 50 \
    --seed 42 \
    --guidance_scale 7.5
Also, check out the demo notebook for example usage of the implementation, reproducing examples from the paper.
You can also use the community pipeline in the diffusers library:
from diffusers import DiffusionPipeline, DDIMScheduler
from PIL import Image

# Load Stable Diffusion with the "magic_mix" community pipeline and a DDIM scheduler
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="magic_mix",
    scheduler=DDIMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler"),
).to('cuda')

img = Image.open('phone.jpg')  # layout image
mix_img = pipe(
    img,
    prompt='bed',     # content prompt
    kmin=0.3,
    kmax=0.5,
    mix_factor=0.5,   # the interpolation constant v
)
mix_img.save('mix.jpg')
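The community pipeline should also accept steps, seed, and guidance_scale arguments analogous to the command-line flags above; check the pipeline's documentation for its exact signature.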
I'm not the author of the paper, and this is not an official implementation.