Integrate vllm for multimodal data #1098

plaguss · 2025-01-15T16:09:03Z

Description

Integrates vision language models on vLLM:

loader = LoadDataFromDicts(
    data=[
        {
            "instruction": "What’s in this image?",
            "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
        }
    ],
)

llm = vLLM(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
)

vision = TextGenerationWithImage(name="vision_gen", llm=llm, image_type="url")

github-actions · 2025-01-15T16:10:28Z

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-1098/

codspeed-hq · 2025-01-15T16:14:15Z

CodSpeed Performance Report

Merging #1098 will not alter performance

_{Comparing vllm-image (70c8758) with develop (5257600)}

Summary

✅ 1 untouched benchmarks

Integrate vllm for multimodal data

70c8758

plaguss added the enhancement New feature or request label Jan 15, 2025

plaguss requested a review from gabrielmbmb January 15, 2025 16:09

plaguss self-assigned this Jan 15, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate vllm for multimodal data #1098

Integrate vllm for multimodal data #1098

plaguss commented Jan 15, 2025

github-actions bot commented Jan 15, 2025

codspeed-hq bot commented Jan 15, 2025

Integrate vllm for multimodal data #1098

Are you sure you want to change the base?

Integrate vllm for multimodal data #1098

Conversation

plaguss commented Jan 15, 2025

Description

github-actions bot commented Jan 15, 2025

codspeed-hq bot commented Jan 15, 2025

CodSpeed Performance Report

Merging #1098 will not alter performance

Summary