Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Integrate vllm for multimodal data #1098

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from
Draft

Integrate vllm for multimodal data #1098

wants to merge 1 commit into from

Conversation

plaguss
Copy link
Contributor

@plaguss plaguss commented Jan 15, 2025

Description

Integrates vision language models on vLLM:

loader = LoadDataFromDicts(
    data=[
        {
            "instruction": "What’s in this image?",
            "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        }
    ],
)

llm = vLLM(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
)

vision = TextGenerationWithImage(name="vision_gen", llm=llm, image_type="url")

@plaguss plaguss added the enhancement New feature or request label Jan 15, 2025
@plaguss plaguss requested a review from gabrielmbmb January 15, 2025 16:09
@plaguss plaguss self-assigned this Jan 15, 2025
Copy link

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-1098/

Copy link

codspeed-hq bot commented Jan 15, 2025

CodSpeed Performance Report

Merging #1098 will not alter performance

Comparing vllm-image (70c8758) with develop (5257600)

Summary

✅ 1 untouched benchmarks

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant