
[Usage]: Does vLLM support the embedding API of multimodal LLMs? #8483

Closed
1 task done
sfyumi opened this issue Sep 14, 2024 · 11 comments
Labels
usage How to use vllm

Comments

@sfyumi

sfyumi commented Sep 14, 2024

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

e.g., get embeddings from MiniCPM-V 2.6

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
sfyumi added the usage label on Sep 14, 2024
@DarkLight1337
Member

No, this is not supported yet.

@DarkLight1337
Member

DarkLight1337 commented Sep 14, 2024

In fact, this isn't even available for most language-only models. The only one supported right now is Mistral. See also #7915

@noooop
Contributor

noooop commented Sep 14, 2024

I am working on it. #8453 #8452 @DarkLight1337

@noooop
Contributor

noooop commented Sep 20, 2024

According to my understanding, MiniCPM-V 2.6 is a generative model, not a retrieval model specifically designed to produce embeddings. (You may instead want a multimodal retrieval model such as BAAI/bge-visualized-m3: https://huggingface.co/BAAI/bge-visualized.)

Can you share some sample code showing how you want to use MiniCPM-V 2.6 to generate embeddings? @sfyumi

@sfyumi
Author

sfyumi commented Sep 20, 2024

@noooop
@noooop
We take the last hidden states from the language model and pass them through a multi-layer linear projection to reduce the dimensionality, using the result as the embedding.
We use both MiniCPM-V 2.6 and Qwen2 as base models to get embeddings.

sample code

import torch.nn as nn

# MiniCPMV is the base multimodal model class (import path depends on your setup).
class MiniCpmlWithProjectionModel(MiniCPMV):
    def __init__(self, config):
        super().__init__(config)
        embedding_dim = config.hidden_size
        projection_output_dim = getattr(config, "projection_output_dim", 128)
        # Multi-layer projection head that reduces the hidden size
        # down to the final embedding dimensionality.
        self.projection_layer = nn.Sequential(
            nn.Linear(embedding_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, projection_output_dim),
        )

    def forward(self, data, **kwargs):
        # Assumes the base forward returns hidden states
        # (e.g. it was called with output_hidden_states=True).
        outputs = super().forward(data, **kwargs)
        hidden_states = outputs.hidden_states[-1]
        return self.projection_layer(hidden_states)
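For illustration, here is a framework-free sketch of that projection step, with hypothetical toy sizes and placeholder weights rather than the trained projection head; a mean-pooling step is added here because a single vector per input is usually what a retrieval pipeline wants:

```python
# Framework-free sketch of pooling + linear projection.
# hidden_states has shape (seq_len, hidden_size); the projection
# maps hidden_size -> projection_output_dim. All weights are dummies.

def mean_pool(hidden_states):
    """Average the per-token vectors into one sentence vector."""
    seq_len = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / seq_len for d in range(dim)]

def linear(vec, weight, bias):
    """y = W @ x + b, with weight shaped (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

# Toy sizes: hidden_size=4, projection_output_dim=2.
hidden_states = [[1.0, 2.0, 3.0, 4.0],
                 [3.0, 2.0, 1.0, 0.0]]
pooled = mean_pool(hidden_states)        # [2.0, 2.0, 2.0, 2.0]
weight = [[0.5, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.5]]
bias = [0.0, 1.0]
embedding = linear(pooled, weight, bias)  # [1.0, 2.0]
print(embedding)
```

In practice the pooled vector would come from the model's last hidden states and the weights from the trained projection layers.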

@noooop
Contributor

noooop commented Sep 20, 2024

Understood.

One small detail here: vLLM routes a model name to its corresponding implementation via the `architectures` field in `config.json`.

You must think of a new name to avoid being routed to the original model.
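For concreteness, the `architectures` field in the custom model's `config.json` might look like this (the class name is taken from the sample code above; the other field values are illustrative placeholders, not real MiniCPM-V 2.6 values):

```json
{
  "model_type": "minicpmv",
  "architectures": ["MiniCpmlWithProjectionModel"],
  "hidden_size": 3584,
  "projection_output_dim": 128
}
```

vLLM matches the string in `architectures` against its model registry, so reusing the original name (e.g. `MiniCPMV`) would load the stock implementation instead of the custom one.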

@noooop
Contributor

noooop commented Sep 20, 2024

There are also issues proposing to output the last hidden states (#853), but I think this is very costly; the best way is to implement a model yourself (adding_model).

@noooop
Contributor

noooop commented Sep 20, 2024

Simple but inefficient method:

Output the last hidden states (#853). A hacky workaround is described at https://github.com/WuNein/vllm4mteb/tree/main.
(Maybe vLLM can add an option to output the last hidden states in the future.)
But you would then need to run the MLP projection in a separate process.

More efficient implementation:

Implement a model yourself (adding_model).
One small detail here: vLLM routes a model name to its corresponding implementation via the `architectures` field in `config.json`, so you must think of a new name to avoid being routed to the original model.

@DarkLight1337
Member

You can now modify any existing model to support embeddings; please see #9314 (comment).

@jianglan89

> According to my understanding, MiniCPM-V 2.6 is a generative model, not a retrieval model specifically designed to produce embeddings. (Maybe you need multimodal retrieval models such as BAAI/bge-visualized.)

Does vLLM support BAAI/bge-visualized now?

@DarkLight1337
Member

> Does vLLM support BAAI/bge-visualized now?

It doesn't look like the HF repo is compatible with transformers, so we cannot easily load the model.
