Hey! I'm really excited to have discovered this project.
Have you considered implementing motion vector map import, for use with either txt2vid or vid2vid, instead of generating the flow on the fly with RAFT? The map could come from 3D software or from a different video.
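To make the request concrete, here's a rough Python sketch of the kind of hook I'm imagining. None of these names come from this project; `raft_fn`, the `.npy` file and the `(H, W, 2)` layout are just assumptions for illustration:

```python
import numpy as np

def load_flow_map(path):
    """Load a precomputed optical-flow map.

    Assumption: the file is a float32 array of shape (H, W, 2),
    holding per-pixel (dx, dy) displacements in pixels, i.e. the
    same layout RAFT would produce for a frame pair.
    """
    flow = np.load(path)
    assert flow.ndim == 3 and flow.shape[-1] == 2, "expected (H, W, 2) flow"
    return flow.astype(np.float32)

def get_flow(prev_frame, cur_frame, flow_path=None, raft_fn=None):
    """Return flow for a frame pair.

    If the user supplies a precomputed map (from a 3D renderer or
    another video), skip flow estimation entirely; otherwise fall
    back to the existing on-the-fly RAFT path (raft_fn is a
    stand-in for whatever this project already calls).
    """
    if flow_path is not None:
        return load_flow_map(flow_path)
    return raft_fn(prev_frame, cur_frame)
```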
For background:
I've been researching and developing a workflow for my short animated film (for my MA degree) and have been looking to use visual generative AI with as much control as possible. It's been 2-3 years since I started looking into it. I was about to settle on (now legacy) pix2pix GAN techniques, but ControlNets (duh) were a revelation, and now the final missing piece is some control over temporal consistency. I've been following Alexander's progress on his "warpfusion" notebook, and he's doing all the right things, but I cannot go back to notebooks after having used webui and software implementations such as dream-textures.
To me this is a no-brainer: ControlNets are perfect modular pieces that allow artistic control, and in my eyes control over optical flow should be modular as well. If you work in 3D, depth maps, normal maps, and poses already plug into ControlNets naturally, as demonstrated here. The mentioned dream-textures addon literally plugs these in within Blender itself, using the objects in your scene, and even lets you assign objects to segmentation maps.
Optical flow motion vectors (usually generated with RAFT from video) can also come directly from your animated 3D scenes. I know you can have RAFT estimate them from 3D renders, but that's an unnecessary step, and it requires the renders to have enough texture for RAFT to pick up on.
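For example, Blender's Vector pass can be turned into the same kind of flow map with just a few lines. This is only a sketch based on my own setup and the classic OpenEXR Python bindings: the channel names depend on your view layer name and Blender version, and the X/Y-vs-Z/W split and sign conventions are assumptions you'd want to verify against your own EXRs:

```python
# pip install OpenEXR numpy
import OpenEXR
import Imath
import numpy as np

def blender_vector_pass_to_flow(exr_path, layer="ViewLayer"):
    """Read Blender's motion Vector pass from a multilayer EXR and
    return a RAFT-style (H, W, 2) flow array in pixels.

    Assumptions: the render used Cycles with the Vector pass enabled,
    the view layer is named `layer`, and the pass is stored as the
    four channels Vector.X/Y/Z/W, with X/Y pointing toward the
    previous frame and Z/W toward the next. Check your own file's
    channel names via exr.header()['channels'].
    """
    exr = OpenEXR.InputFile(exr_path)
    dw = exr.header()["dataWindow"]
    w = dw.max.x - dw.min.x + 1
    h = dw.max.y - dw.min.y + 1
    pt = Imath.PixelType(Imath.PixelType.FLOAT)

    def read(ch):
        raw = exr.channel(f"{layer}.Vector.{ch}", pt)
        return np.frombuffer(raw, dtype=np.float32).reshape(h, w)

    # Forward motion (current -> next frame); flip signs if the
    # consumer expects flow toward the previous frame instead.
    return np.stack([read("Z"), read("W")], axis=-1)
```

Depending on how the consumer defines image coordinates, the Y component may also need flipping, so that's worth checking too.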
Here's a simple test I did in Blender with only one ControlNet (depth): LINK
There's plenty of flicker, some of which could be fixed with tighter prompting and multi-ControlNet, but I wish I could run something like that with this project's features and control the amount of consistency. Some flicker and "hallucination" is nice, and it's actually the reason I'm choosing such tools in the first place.
Thanks for reading, and sorry if it's wordy; I'm just quite hyped and thankful for the devs pushing the open-source way ❤️