Can tools ingest images attached to image_url? #875

Open
@polaon

Hello,

I'd like to create a tool that can ingest images passed via the {"type": "input_image", "image_url": "..."} schema, adjust them, and then return either the adjusted image (which I think is not currently possible, see Issue 341) or an HTTPS URL.

Example:

import base64
import io
from pathlib import Path
from typing import Literal

from agents import Agent, function_tool, Runner


def to_base64_encoded_str(file_path: Path) -> str:
    with open(file_path, mode="rb") as image_file:
        return base64.b64encode(image_file.read()).decode(encoding="utf-8")


@function_tool(docstring_style="google")
def change_brightness(image_url: str, change: Literal["increase", "decrease"]) -> str:
    """Changes brightness of the input image.

    Args:
        image_url: Data URI of the image as defined by [IETF RFC 2397 document](https://datatracker.ietf.org/doc/html/rfc2397). The URI is of the form: `data:[<mediatype>][;base64],<data>`. The `<mediatype>` is an Internet media type specification. The appearance of `;base64` means that the data is encoded as base64. The image data is expected to be encoded as base64.
        change: Increase or decrease the image brightness.

    Returns:
        Download URL to the adjusted image.
    """
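    # Only the part after the first "," is base64; strip the "data:<mediatype>;base64," prefix.
    _, _, payload = image_url.partition(",")
    with io.BytesIO(base64.b64decode(payload, validate=True)) as image_io: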
        ...


agent = Agent(name="My Agent", tools=[change_brightness])

result = await Runner.run(
    starting_agent=agent,
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "I'd like you to increase brightness of the following image.",
                },
                {
                    "type": "input_image",
                    "image_url": "data:image/png;base64," + to_base64_encoded_str(Path("./image.png")),
                },
            ],
        }
    ],
)
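The docstring above describes the RFC 2397 form `data:[<mediatype>][;base64],<data>`. As a minimal sketch of how a tool body might split such a URI before decoding it, assuming only base64-encoded payloads need to be supported (the helper name `decode_data_uri` is hypothetical and not part of the SDK):

```python
import base64
import binascii


def decode_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split an RFC 2397 data URI into (media type, decoded bytes).

    Hypothetical helper for illustration; it handles only the ";base64" form.
    """
    if not data_uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first "," is the header, everything after is the payload.
    header, sep, payload = data_uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no ',' separator")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URIs are supported")
    # RFC 2397 defaults the media type when it is omitted.
    media_type = header[: -len(";base64")] or "text/plain;charset=US-ASCII"
    try:
        return media_type, base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
```

The decoded bytes can then be wrapped in `io.BytesIO`, as the tool body above does.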

Is it possible for tools to ingest images passed via the {"type": "input_image", "image_url": "..."} schema? If so, how? I couldn't find any mention of this in the documentation, and when I tried it, the agent never passed the expected image_url to the function tool; it hallucinated a random URL instead.

Thank you very much for any advice and have a nice day.
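One possible workaround for the behavior described above, offered as an unverified sketch and not as something the SDK documents: keep the image bytes outside the conversation and give the model only a short identifier to pass to the tool. Every name below (`_IMAGES`, `register_image`, `load_image`) is hypothetical, and the store is a plain in-memory dictionary.

```python
import uuid

# Hypothetical in-memory store mapping short IDs to raw image bytes.
_IMAGES: dict[str, bytes] = {}


def register_image(data: bytes) -> str:
    """Store image bytes and return a short ID that fits easily in a prompt."""
    image_id = uuid.uuid4().hex[:8]
    _IMAGES[image_id] = data
    return image_id


def load_image(image_id: str) -> bytes:
    """Look up previously registered image bytes by ID."""
    try:
        return _IMAGES[image_id]
    except KeyError:
        raise ValueError(f"unknown image id: {image_id!r}") from None
```

Under this assumption, the tool would accept an `image_id: str` argument in place of `image_url`, and the user message would mention that ID in its `input_text` part. Whether this stops the model from inventing arguments has not been tested here.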

Labels: question (Question about using the SDK)
