planning: Supporting vision model (Llava and Llama3.2) #1493

Open · 6 tasks
namchuai opened this issue Oct 16, 2024 · 6 comments

@namchuai (Contributor) commented Oct 16, 2024

Problem Statement

To support Vision models on Cortex, we need the following:

  1. Download model: download both the .gguf and the mmproj file.
  2. v1/models/start takes in model_path (.gguf) and mmproj parameters.
  3. /chat/completions takes in messages whose content includes image_url.
  4. image_url has to be encoded in base64 (via Jan, or a link to a tool, e.g. https://base64.guru/converter/encode/image); see the sketch after this list.
  5. Model support (side note: Jan currently supports BakLlava 1, llava 7B, and llava 13B).
  6. Pull correct NGL settings from the chat model. Ref: bug: Missing NGL Setting for Vision Models #1763.

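For items 3 and 4, here's a minimal sketch of what a vision request could look like, assuming Cortex mirrors the OpenAI-style message format; the endpoint, port, model name, and file name are illustrative, not confirmed:

```python
import base64
import requests  # third-party: pip install requests

# Step 4: read a local image and encode it as a base64 data URL.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Step 3: send an OpenAI-style message whose content includes an
# image_url part. Host/port and model name are placeholders.
resp = requests.post(
    "http://127.0.0.1:39281/v1/chat/completions",
    json={
        "model": "llava-7b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in the image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    },
)
print(resp.json())
```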
1. Downloading the model .gguf and mmproj file:

To be fully compatible with Jan, Cortex should be able to pull the mmproj file along with the GGUF file.

Let's take the image below as an example.

[Screenshot 2024-10-16 at 08:35:06: repository file listing]

Scenario steps:

  1. The user wants to download a llava model and expects it to support vision, so they input either:
  • a direct URL to the GGUF file (e.g. llava-v1.6-mistral-7b.Q3_K_M.gguf), or
  • a URL to the repository (we list the files, filtered to .gguf, for the user to select). Since the mmproj file also ends with .gguf, it appears in that selection too.
  2. Cortex then pulls only the selected GGUF file, ignoring that:
  • mmproj.gguf alone won't work, and
  • a traditional GGUF file alone (e.g. llava-v1.6-mistral-7b.Q3_K_M.gguf) won't have the vision feature.

So, we need a way for Cortex to know when to download the mmproj file along with the traditional GGUF file.

cc @dan-homebrew , @louis-jan , @nguyenhoangthuan99, @vansangpfiev

Feature Idea

A couple of thoughts:

  1. File-name based (see the sketch below):
    1.1. For CLI: hide file names containing "mmproj" from the selection list, and download the mmproj file along with the selected traditional GGUF file.
    1.2. For API: always scan the directory at the same level as the provided URL; if a file name contains "mmproj", Cortex adds it to the download list.
  • Edge case: if the user provides a direct URL to an mmproj file, return an error with a clear message.
  2. Thinking / You tell me.
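To make idea 1 concrete, here's a minimal sketch of the file-name-based pairing, assuming a plain list of repo file names is available; the function names and the "mmproj" substring match are illustrative:

```python
from typing import Optional

def split_model_files(repo_files: list[str]) -> tuple[list[str], Optional[str]]:
    """Idea 1.1 (CLI): hide mmproj files from the selection list, but
    remember the mmproj file so it can be downloaded alongside the
    selected chat model."""
    ggufs = [f for f in repo_files if f.endswith(".gguf")]
    mmproj = next((f for f in ggufs if "mmproj" in f.lower()), None)
    selectable = [f for f in ggufs if "mmproj" not in f.lower()]
    return selectable, mmproj

def resolve_downloads(url: str, sibling_files: list[str]) -> list[str]:
    """Idea 1.2 (API): given a direct .gguf URL, scan the sibling files
    at the same level and add any mmproj file to the download list."""
    name = url.rsplit("/", 1)[-1]
    if "mmproj" in name.lower():
        # Edge case: an mmproj file alone won't work.
        raise ValueError("Direct mmproj URLs are not runnable; "
                         "please select a chat model .gguf instead")
    selectable, mmproj = split_model_files(sibling_files)
    return [name] + ([mmproj] if mmproj else [])
```

For example, resolve_downloads on a direct URL ending in llava-v1.6-mistral-7b.Q3_K_M.gguf would return both the chat GGUF and the repo's mmproj file.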
@github-project-automation bot moved this to Investigating in Menlo Oct 16, 2024
@namchuai changed the title idea: [DESCRIPTION] Supporting vision model (llava) → idea: [Vision model] Supporting vision model (llava) Oct 16, 2024
@gabrielle-ong (Contributor) commented Nov 7, 2024

Updates:

  1. CLI cortex pull presents .gguf and mmproj files.
  [image]
  2. `mmproj` param is added to /v1/models/start parameters in #1537.
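For reference, a minimal sketch of the start call with the new parameter, assuming a local Cortex server; the port, file paths, and exact field names are illustrative (see #1537 for the actual schema):

```python
import requests  # third-party: pip install requests

# Start a vision model by passing both the chat GGUF and the projector
# file. Paths, port, and model id are placeholders.
requests.post(
    "http://127.0.0.1:39281/v1/models/start",
    json={
        "model": "llava-v1.6-mistral-7b",
        "model_path": "/models/llava-v1.6-mistral-7b.Q3_K_M.gguf",
        "mmproj": "/models/mmproj-model-f16.gguf",
    },
)
```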

@dan-menlo changed the title idea: [Vision model] Supporting vision model (llava) → epic: Supporting vision model (Llava and Llama3.2) Nov 7, 2024
@dan-menlo (Contributor) commented

We should ensure that model.yaml supports this type of abstraction, cc @hahuyhoang411

@gabrielle-ong (Contributor) commented

@vansangpfiev and @hahuyhoang411 - can I get your thoughts to add to this list from my naive understanding?

To support Vision models on Cortex, we need the following:

  1. Download model - downloads .gguf and mmproj file -> What is the model pull UX?
  2. v1/models/start takes in model_path (.gguf) and mmproj parameters ✅
  3. /chat/completions to take in messages content image_url ✅
  4. image_url has to be encoded in base64 (via Jan, or link to tool eg https://base64.guru/converter/encode/image)
  5. model support - (side note: Jan currently supports BakLlava 1, llava 7B, Llava 13B)
    ..

@gabrielle-ong changed the title epic: Supporting vision model (Llava and Llama3.2) → planning: Supporting vision model (Llava and Llama3.2) Nov 7, 2024
@gabrielle-ong moved this from Investigating to Planning in Menlo Nov 7, 2024
@vansangpfiev (Contributor) commented Nov 7, 2024

> @vansangpfiev and @hahuyhoang411 - can I get your thoughts to add to this list from my naive understanding?
>
> To support Vision models on Cortex, we need the following:
>
> 1. Download model - downloads .gguf and mmproj file -> What is the model pull UX?
> 2. v1/models/start takes in model_path (.gguf) and mmproj parameters ✅
> 3. /chat/completions to take in messages content image_url ✅
> 4. image_url has to be encoded in base64 (via Jan, or link to tool eg https://base64.guru/converter/encode/image)
> 5. model support - (side note: Jan currently supports BakLlava 1, llava 7B, Llava 13B)
>   ..
  1. I'm not sure about this yet, since one folder can contain multiple chat model files alongside a single mmproj file.
  2. Yes.
  3. I'm not sure this is a good UX.
  4. image_url can also be a local path to an image; the llama-cpp engine supports encoding the image to base64 and passing it to the model.
  5. The llama-cpp engine supports BakLlava 1, llava 7B, and llava 13B.
    llama.cpp upstream already supports MiniCPM-V 2.6, which we can integrate into llama-cpp.
    llama.cpp upstream does not support Llama 3.2 vision yet.

We probably need to consider changing the UX for inference with vision models, for example:

`cortex run llava-7b --image xx.jpg -p "What is in the image?"`
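If the engine can take a local path directly (point 4 above), the CLI could simply forward it as image_url; a hedged sketch of that mapping, with the endpoint, port, and field names as assumptions:

```python
import requests  # third-party: pip install requests

def cortex_run_with_image(model: str, image_path: str, prompt: str) -> dict:
    """Roughly what `cortex run <model> --image <path> -p <prompt>` could do:
    forward the local path as image_url and let the llama-cpp engine
    base64-encode the image itself. Endpoint and port are placeholders."""
    return requests.post(
        "http://127.0.0.1:39281/v1/chat/completions",
        json={
            "model": model,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_path}},
                ],
            }],
        },
    ).json()

cortex_run_with_image("llava-7b", "xx.jpg", "What is in the image?")
```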

@gabrielle-ong (Contributor) commented Nov 7, 2024

Thank you @vansangpfiev and @hahuyhoang411! Quick notes from call:

  • Upstream llama.cpp → cortex.llama-cpp needs to expose vision parameters to cortex.cpp.
  • Order of model support by ease: Llava first, then MiniCPM.
  • Llama 3.2 vision (not yet supported upstream).

@louis-menlo (Contributor) commented

Added an action item: model management should pull metadata from the chat model file instead of the projector file (just to make sure we track this).

@LeVinhGithub added the "type: epic - A major feature or initiative" label Feb 21, 2025
@dan-menlo removed the "type: epic - A major feature or initiative" label Mar 13, 2025
@vansangpfiev moved this from Scheduled to Investigating in Menlo Mar 24, 2025