Add support for Image, Video, and Audio input into Forge and AutoGPT #7152

ntindle · 2024-05-14T14:40:12Z

Duplicates

I have searched the existing issues

Summary 💡

Adding support for Image, Video, and Audio inputs to the AutoGPT system takes more than supporting them at the FastAPI server level. It also includes passing them through the MultiProvider for LLMs and checking which LLMs support which features as part of their configs.
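
As a rough illustration of the "which LLMs support which features" check, here is a minimal sketch. Every name in it (`Modality`, `ModelCapabilities`, `unsupported_inputs`, and the example model names) is hypothetical and is not taken from the existing forge config schema. It only shows one way a per-model config entry could declare its accepted input types.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(str, Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


@dataclass(frozen=True)
class ModelCapabilities:
    """Hypothetical per-model config entry; not the actual forge schema."""

    name: str
    # Models are assumed to be text-only unless the config says otherwise.
    input_modalities: frozenset = frozenset({Modality.TEXT})


def unsupported_inputs(model: ModelCapabilities, requested: set) -> set:
    """Return the requested modalities that the model's config does not list."""
    return set(requested) - model.input_modalities


# Placeholder model names and capability sets, for illustration only.
text_only = ModelCapabilities("example-text-model")
vision = ModelCapabilities(
    "example-vision-model", frozenset({Modality.TEXT, Modality.IMAGE})
)
```

A caller could run `unsupported_inputs` before dispatching a request and reject or downgrade the input when the result is non-empty.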

Examples 🌈

No response

Motivation 🔦

The future of Agents is multimodal


ntindle · 2024-06-21T15:52:23Z

From boosterbot.ai, before we set up the integration with GitHub to auto-comment the responses:

forge/forge/llm/providers/multi.py - Defines the MultiProvider class, which serves as a unified interface to access multiple chat model providers. Consider updating the MultiProvider class in forge/forge/llm/providers/multi.py to include methods for processing Image, Video, and Audio inputs. Ensure that these methods route the inputs to the appropriate LLMs based on their configurations.
autogpt/autogpt/app/configurator.py - Handles configuration settings and overrides within the AutoGPT application. Consider updating the apply_overrides_to_config function in autogpt/autogpt/app/configurator.py to include new fields for Image, Video, and Audio support in the LLM configurations.
autogpt/autogpt/app/main.py - Serves as the entry point for the application, managing setup, configuration, and execution. Consider updating the main application logic in autogpt/autogpt/app/main.py to handle Image, Video, and Audio inputs and route them to the appropriate processing components.
autogpt/autogpt/app/agent_protocol_server.py - Manages tasks, steps, artifacts, and interactions with agents in the AutoGPT system. Consider updating the AgentProtocolServer class in autogpt/autogpt/app/agent_protocol_server.py to handle Image, Video, and Audio inputs and ensure they are processed correctly.
forge/forge/components/image_gen/image_gen.py - Provides commands to generate images from text prompts using various providers. Consider using the ImageGeneratorComponent in forge/forge/components/image_gen/image_gen.py as a reference for handling Image inputs and extend similar logic for Video and Audio inputs.
forge/forge/llm/providers/schema.py - Defines models and structures related to language and embedding models used in the system. Consider updating the models and structures in forge/forge/llm/providers/schema.py to include support for Image, Video, and Audio inputs.
forge/forge/llm/providers/openai.py - Manages interactions with OpenAI's API for chat completions and embeddings. Consider updating the OpenAIProvider class in forge/forge/llm/providers/openai.py to handle Image, Video, and Audio inputs if OpenAI supports these features.
forge/forge/llm/providers/groq.py - Manages interactions with Groq's API for chat completions and embeddings. Consider updating the GroqProvider class in forge/forge/llm/providers/groq.py to handle Image, Video, and Audio inputs if Groq supports these features.
forge/forge/llm/providers/__init__.py - Initializes various provider modules within the LLM system. Consider updating the __init__.py file in forge/forge/llm/providers to include the new input types for Image, Video, and Audio.
autogpt/autogpt/app/input.py - Handles user input within the AutoGPT application. Consider updating the clean_input function in autogpt/autogpt/app/input.py to handle new input types for Image, Video, and Audio.
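
The routing idea in the suggestions above (send each input to an LLM whose configuration supports it) could look roughly like the sketch below. This is an assumption-laden illustration: `ContentPart`, `ProviderStub`, and `pick_provider` are hypothetical names, and none of this reflects the real `MultiProvider` interface in forge/forge/llm/providers/multi.py.

```python
from dataclasses import dataclass
from typing import Literal, Sequence

Kind = Literal["text", "image", "video", "audio"]


@dataclass(frozen=True)
class ContentPart:
    """One piece of a multimodal message (hypothetical structure)."""

    kind: Kind
    data: str  # the text itself, or a path/URL pointing at the media


@dataclass(frozen=True)
class ProviderStub:
    """Stand-in for a provider and the input kinds its model accepts."""

    name: str
    accepts: frozenset


def pick_provider(
    parts: Sequence[ContentPart], providers: Sequence[ProviderStub]
) -> ProviderStub:
    """Return the first provider that accepts every input kind in the message."""
    needed = {part.kind for part in parts}
    for provider in providers:
        if needed <= provider.accepts:
            return provider
    raise ValueError(f"no provider accepts inputs of kind(s): {sorted(needed)}")
```

Picking the first match keeps the sketch short. A real implementation would probably also weigh cost, context length, and user-configured model preferences.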

ntindle added the fridge label (Items that can't be processed right now but can be of use or inspiration later) May 14, 2024

ntindle added this to AutoGPT Roadmap May 14, 2024

ntindle closed this as completed Nov 13, 2024

github-project-automation bot moved this to Done in AutoGPT Roadmap Nov 13, 2024
