Feature Request - Multimodal I/O: Support multimodal input and output #43
Comments
pranav-kural added a commit that referenced this issue on Jul 18, 2024
pranav-kural added a commit that referenced this issue on Jul 18, 2024
pranav-kural added a commit that referenced this issue on Jul 18, 2024
pranav-kural added a commit that referenced this issue on Jul 28, 2024
pranav-kural added a commit that referenced this issue on Jul 29, 2024
pranav-kural added a commit that referenced this issue on Jul 29, 2024
pranav-kural added a commit that referenced this issue on Jul 29, 2024
On testing, observed that when defining an open-ended chat endpoint, the query never gets sent to the LLM due to an issue in the prompt: the prompt template being used didn't have the … Also noticed warnings regarding prompts with certain names (e.g., …).
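The comment above is truncated, so the exact missing piece of the template is unknown. Purely as a hypothetical illustration of the failure mode: in a Handlebars-style prompt template (the format used by Genkit's dotprompt), the user's query only reaches the LLM if the template contains an input placeholder. The `{{query}}` name below is an assumption, not taken from the issue:

```ts
// Hypothetical illustration only: a prompt template that forwards the
// user's query via a Handlebars-style placeholder. If the `{{query}}`
// placeholder (an assumed name) is missing, the LLM receives only the
// static instructions and never sees the user's input.
const openEndedChatPrompt = `
You are a helpful assistant engaging in an open-ended conversation.

User query:
{{query}}
`;
```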
pranav-kural added a commit that referenced this issue on Jul 29, 2024
pranav-kural added a commit that referenced this issue on Jul 29, 2024
Fix for the above issue added in pre-release: #59
pranav-kural added a commit that referenced this issue on Jul 30, 2024
pranav-kural added a commit that referenced this issue on Jul 30, 2024
Models patch - fixes for multimodal support + implementation of `configureAndRunServer` method #43 #54 (#63)

* fix for LLM model configurations type inference (Gemini, OpenAI, DALL-E) #43
* Bumped pre-release package version + removed dependency on genkit-langchain and genkit-chromadb
* added the `configureAndRunServer` method #54 + moved exports for server and genkit to root level
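The `configureAndRunServer` method referenced above bundles endpoint registration and server startup into one call. The sketch below is only a guess at its general shape under that assumption; the option names, default port, and use of Node's built-in `http` module are illustrative, not QvikChat's actual API:

```ts
import http from "node:http";

// Hypothetical types: one handler per named route. These are
// assumptions for illustration, not QvikChat's actual types.
interface EndpointConfig {
  endpoint: string; // route name, e.g. "chat"
  handler: (query: string) => Promise<string>;
}

// Sketch of a combined "configure endpoints, then start the server"
// helper. Option names and the default port are assumptions.
async function configureAndRunServer(opts: {
  endpointConfigs: EndpointConfig[];
  port?: number;
}): Promise<void> {
  const routes = new Map<string, EndpointConfig["handler"]>();
  for (const c of opts.endpointConfigs) {
    routes.set(`/${c.endpoint}`, c.handler);
  }
  const server = http.createServer(async (req, res) => {
    const handler = routes.get(req.url ?? "");
    if (!handler) {
      res.writeHead(404).end();
      return;
    }
    // For brevity, the query is read from a header rather than the body.
    const query = String(req.headers["x-query"] ?? "");
    res.end(await handler(query));
  });
  server.listen(opts.port ?? 3400);
}
```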
pranav-kural added a commit that referenced this issue on Aug 1, 2024
* updated cache store to support multimodal inputs #43
* updates to cache store for multimodality #43
* updated define chat endpoint logic to support multimodal input output and verbose mode
* fixed endpoint output schema to support verbose response
* updated tests to support multimodal input output (tests only use text I/O for now) #43
* formatting change
* added changes to setup `alpha` branch for alpha release
* Added Changes to prompts, chat agent, endpoints to support Multimodal I/O (#57)
* added DALL-E 3 to supported models
* Updated prompts with partials and custom output schema #43
* updated prompts for multimodality #43
* Updated chat agent class for multimodal I/O #43
* Updated defineChatEndpoint to support multimodal I/O #43
* bumped NPM package for alpha release
* Fixed issues with basic system prompt + added default system prompts #43 (#59)
* Multimodal fix - multiple changes to chat agent module to support multimodal I/O (#60)
* Fixed issues with basic system prompt + added default system prompts #43
* multiple changes to chat agent module to support multimodal I/O
* bumped package version for alpha release
* Models patch - fixes for multimodal support + implementation of `configureAndRunServer` method #43 #54 (#63)
* fix for LLM model configurations type inference (Gemini, OpenAI, DALL-E) #43
* Bumped pre-release package version + removed dependency on genkit-langchain and genkit-chromadb
* added the `configureAndRunServer` method #54 + moved exports for server and genkit to root level
* Final merge for v2 pre-release branch (#71)
* removed dependency on @genkit-ai/firebase and @genkit-ai/firebase #67
* Refactored codebase to implement type-only imports and export #68
* Implemented logic to reset cache record on expiry #69 + fixed #66
* updated implementation of method for cache store #69
* re-factored and re-organized code to export `defineChatEndpoint` from root #70
* updated `langchain` to v0.2.12 + added badges to README #66
* Bumped pre-release version - final for v2
* Remove alpha NPM package workflow + changed package version to 2.0.0
* fixed workflows for pre-deploy build + code scanning
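One of the changes above extends the cache store to multimodal inputs. A cache keyed on plain query strings no longer works once a query can mix text and media; one hedged way to handle this (the `ContentPart` shape below is an assumption for illustration, not the project's actual type) is to hash the ordered content parts into a stable key:

```ts
import { createHash } from "node:crypto";

// A content part that is either text or inline media. This shape is an
// assumption for illustration; it is not QvikChat's actual type.
type ContentPart =
  | { text: string }
  | { media: { url: string; contentType?: string } };

// Derive a stable cache key from a multimodal query by hashing the
// ordered parts, so identical text+media queries hit the same entry.
function cacheKeyFor(parts: ContentPart[]): string {
  const hash = createHash("sha256");
  for (const part of parts) {
    hash.update(JSON.stringify(part));
  }
  return hash.digest("hex");
}
```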
Description
Add support for Multimodal I/O: accept multimodal input and produce multimodal output.
Impact (Why is this feature important?)
Will allow users to include media (images, videos, etc.) in the input and receive output that contains media, not just text responses.
Select Components this Feature will Impact
Proposal (Optional)
Will require several changes to multiple components:
* `ChatAgent` class: should have a method that can handle multimodal input and generate multimodal output (see the sketch after this list).
* `defineChatEndpoint`: update to use the new method created in the `ChatAgent` class to support multimodality.
* Could roll out support for multimodal input-output only for chat endpoints not using chat history and RAG, or simply note in the documentation that non-text information will not work with chat history and RAG.
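To make the proposed `ChatAgent` change concrete, here is a minimal sketch of what a multimodal-aware method could look like. The `ContentPart` type and the `generateMultimodalResponse` name are assumptions for illustration, not the actual QvikChat API:

```ts
// Hypothetical types and method for illustration only; the real
// QvikChat API may differ. A "part" is either text or inline media.
type ContentPart =
  | { text: string }
  | { media: { url: string; contentType?: string } };

interface MultimodalChatAgent {
  // Proposed method: takes multimodal input parts and returns
  // multimodal output parts (e.g., text plus a generated image).
  generateMultimodalResponse(
    query: ContentPart[]
  ): Promise<ContentPart[]>;
}

// Example call shape: text plus an image in a single query.
declare const agent: MultimodalChatAgent;
const answer = agent.generateMultimodalResponse([
  { text: "What is shown in this picture?" },
  { media: { url: "https://example.com/cat.png", contentType: "image/png" } },
]);
```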
Alternatives (Optional)
Can still use chat endpoints to generate multimedia content; the output will likely contain a URL to the generated content (see the sketch below).
Cannot currently provide multimedia content to models that support multimodal input.
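To make the first alternative concrete: if an endpoint can only return text, generated media would arrive as a link embedded in the response string, which the caller then has to extract. A minimal sketch (the plain-string response shape is an assumption):

```ts
// Extract media URLs from a plain-text model response, as a stopgap
// when endpoints can only return text. The regex is deliberately simple.
function extractMediaUrls(response: string): string[] {
  return response.match(/https?:\/\/\S+/g) ?? [];
}

// e.g. extractMediaUrls("Here is your image: https://example.com/img.png")
// -> ["https://example.com/img.png"]
```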
Resources (Optional)
Include any resources, references, or links that might be helpful in understanding or implementing this feature.