Open
Description
Hello,
I'd like to create a tool, which can ingest images passed via {"type": "input_image", "image_url": "..."}
schema, adjust them and then return either the adjusted image (which I think is not currently possible: Issue 341), or a HTTPS URL.
Example:
import io
from pathlib import Path
from agents import Agent, function_tool, Runner
def to_base64_encoded_str(file_path: Path) -> str:
with open(file_path, mode="rb") as image_file:
return base64.b64encode(image_file.read()).decode(encoding="utf-8")
@function_tool(docstring_style="google")
def change_brightness(image_url: str, change: Literal["increase", "decrease"]) -> str:
"""Changes brightness of the input image.
Args:
image_url: Data URI of the image as defined by [IETF RFC 2397 document](https://datatracker.ietf.org/doc/html/rfc2397). The URI is of the form: `data:[<mediatype>][;base64],<data>`. The `<mediatype>` is an Internet media type specification. The appearance of `;base64` means that the data is encoded as base64. The image data is expected to be encoded as base64.
change: Increase or decrease the image brightness.
Returns:
Download URL to the adjusted image.
"""
with io.BytesIO(base64.b64decode(image_url, validate=True)) as image_io:
...
agent = Agent(name="My Agent", tools=[change_brightness])
result = await Runner.run(
starting_agent=agent,
input=[
{
"role": "user",
"content": [
{
"type": "input_text",
"text": "I'd like you to increase brightness of the following image.",
},
{
"type": "input_image",
"image_url": "data:image/png;base64," + to_base64_encoded_str(Path("./image.png")),
},
],
}
],
)
Please is it possible for the tools to ingest images passed via {"type": "input_image", "image_url": "..."}
schema? If yes how? I couldn't find any such mention in the documentation and when I tried this, it seems that the agent never passes the expected image_url
to the function tool, but it hallucinates some random URL instead.
Thank you very much for any advice and have a nice day.