feat(multimodal): Video understanding #2318

mudler · 2024-05-13T20:40:22Z

It should be possible now to expand the vision support to understand videos, there are projects like
https://github.com/Efficient-Large-Model/VILA
https://github.com/LLaVA-VL/LLaVA-NeXT
https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct?s=09

which make this possible nowadays. Since OpenAI has announced GPT4o, makes sense start looking into open solutions that we can plug into the API with specific backends.

llama.cpp: ggml-org/llama.cpp#9165
vLLM: #3670

Closes: #2318 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(vllm): add support for image-to-text Related to #3670 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(vllm): add support for video-to-text Closes: #2318 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(vllm): support CPU installations Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(vllm): add bnb Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: add docs reference Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Apply suggestions from code review Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

) * feat(vllm): add support for image-to-text Related to mudler#3670 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(vllm): add support for video-to-text Closes: mudler#2318 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(vllm): support CPU installations Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat(vllm): add bnb Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: add docs reference Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Apply suggestions from code review Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

mudler added the enhancement New feature or request label May 13, 2024

mudler mentioned this issue May 13, 2024

[EPIC] Model support dashboard (v2) #1126

Open

91 tasks

mudler added roadmap up for grabs Tickets that no-one is currently working on labels May 13, 2024

mudler mentioned this issue Sep 19, 2024

feat(api): allow to pass videos to backends #3601

Merged

mudler mentioned this issue Sep 26, 2024

Support multimodals models with vLLM #3670

Closed

mudler mentioned this issue Oct 4, 2024

feat(multimodal): allow to template placeholders #3728

Merged

1 task

mudler added a commit that referenced this issue Oct 4, 2024

feat(vllm): add support for video-to-text

f3f9d1d

Closes: #2318 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler mentioned this issue Oct 4, 2024

feat(vllm): add support for image-to-text and video-to-text #3729

Merged

1 task

mudler closed this as completed in #3729 Oct 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(multimodal): Video understanding #2318

feat(multimodal): Video understanding #2318

mudler commented May 13, 2024 •

edited

Loading

feat(multimodal): Video understanding #2318

feat(multimodal): Video understanding #2318

Comments

mudler commented May 13, 2024 • edited Loading

mudler commented May 13, 2024 •

edited

Loading