
Feature Request - Multimodal I/O: Support multimodal input and output #43

Closed
10 of 15 tasks
pranav-kural opened this issue Jul 15, 2024 · 2 comments

@pranav-kural
Collaborator

Checklist

  • I have searched the existing issues and this feature has not been requested before.
  • I have checked the QvikChat Feature Release project and this feature is not listed there.
  • Optional: I have read the QvikChat documentation and there is no alternative to this feature.
  • Optional: I am willing to implement this feature and submit a pull request.

Description

Add support for multimodal I/O: accept multimodal input and produce multimodal output.

Impact (Why is this feature important?)

Will allow users to include media (images, videos, etc.) in the input and receive output that contains media, rather than text-only responses.

Select Components this Feature will Impact

  • API Key Authentication
  • Chat Agents
  • Chat Endpoints
  • Chat History
  • Embedding Models
  • Endpoint Deployment
  • LLM Models
  • RAG
  • Response Caching
  • Vector Stores
  • Other

Proposal (Optional)

Will require changes across multiple components.

  • Modify the ChatAgent class: add a method that can handle multimodal input and generate multimodal output.
  • Modify defineChatEndpoint to use the new ChatAgent method to support multimodality.
  • Add a new flag `enableMultimodality` to allow creation of endpoints that support multimodal input and output.
  • Chat history: update handling of chat history
    • how chat history will store non-text information present in previous chat messages.
    • how this non-text information can be correctly retrieved and re-used when a conversation is continued.
  • Vector store and RAG: add support for storing and retrieving non-text information.

Support for multimodal input-output could initially be rolled out only for chat endpoints that do not use chat history or RAG, or the documentation could simply state that non-text information will not work with chat history and RAG.
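A rough sketch of how the proposed `enableMultimodality` flag might gate endpoint input. Note this is an assumption for illustration only: the `MediaPart`/`TextPart` shapes, the `ChatEndpointConfig` type, and the `validateInput` helper are all hypothetical, not QvikChat's actual API; only the `defineChatEndpoint` name and the flag come from this issue.

```typescript
// Hypothetical message-part shapes (assumed, not QvikChat's real types).
type TextPart = { text: string };
type MediaPart = { media: { url: string; contentType?: string } };
type MessagePart = TextPart | MediaPart;

// Hypothetical slice of an endpoint config carrying the proposed flag.
interface ChatEndpointConfig {
  endpoint: string;
  enableMultimodality?: boolean; // proposed flag from this issue
}

// Stand-in for the check defineChatEndpoint might perform: when the flag
// is off, any media part in the input is rejected.
function validateInput(config: ChatEndpointConfig, parts: MessagePart[]): boolean {
  if (config.enableMultimodality) return true;
  return parts.every((p) => "text" in p);
}

const config: ChatEndpointConfig = { endpoint: "chat", enableMultimodality: true };
console.log(
  validateInput(config, [
    { text: "describe this image" },
    { media: { url: "data:image/png;base64,..." } },
  ])
); // → true, since the flag opts the endpoint into multimodal input
```

The key design point is that multimodality is opt-in per endpoint, so existing text-only endpoints keep their current validation behavior.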

Alternatives (Optional)

Chat endpoints can still be used to generate multimedia content; the output will likely contain a URL to the generated content.

However, multimedia content cannot currently be provided as input to models that support multimodal input.

Resources (Optional)

Include any resources, references, or links that might be helpful in understanding or implementing this feature.

@pranav-kural pranav-kural added the enhancement New feature or request label Jul 15, 2024
@pranav-kural pranav-kural added this to the Target Release 1.2.x milestone Jul 15, 2024
@pranav-kural pranav-kural moved this to Backlog in QvikChat Project Jul 15, 2024
@pranav-kural pranav-kural self-assigned this Jul 18, 2024
@pranav-kural pranav-kural moved this from Backlog to In progress in QvikChat Project Jul 18, 2024
pranav-kural added a commit that referenced this issue Jul 29, 2024
… I/O (#57)

* added DALL-E 3 to supported models

* Updated prompts with partials and custom output schema #43

* updated prompts for multimodality #43

* Updated chat agent class for multimodal I/O #43

* Updated defineChatEndpoint to support multimodal I/O #43

* bumped NPM package for alpha release
@pranav-kural
Collaborator Author

On testing, observed that when defining an open-ended chat endpoint, the query never gets sent to the LLM due to an issue in the prompt: the prompt template being used didn't include the `{{query}}` construct.

Also noticed warnings about prompts with certain names (e.g., `openEndedSystemPrompt`) being overwritten.
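The bug above can be illustrated with a minimal Handlebars-style substitution (this `renderPrompt` helper is a toy stand-in, not QvikChat's actual template engine): if the template has no `{{query}}` placeholder, the user's query is silently dropped before the prompt reaches the LLM.

```typescript
// Toy Handlebars-style renderer: replaces {{name}} with the matching variable.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name) => vars[name] ?? "");
}

// Broken template: no {{query}} construct, so the query never appears.
const broken = "You are a helpful assistant.";
// Fixed template: {{query}} carries the user input through to the LLM.
const fixed = "You are a helpful assistant.\n\nUser query: {{query}}";

console.log(renderPrompt(broken, { query: "What is RAG?" }));
// → "You are a helpful assistant."  (query silently dropped)
console.log(renderPrompt(fixed, { query: "What is RAG?" }));
// → "You are a helpful assistant.\n\nUser query: What is RAG?"
```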

@pranav-kural
Collaborator Author

Fix for the above issue added in pre-release: #59

pranav-kural added a commit that referenced this issue Jul 30, 2024
…timodal I/O (#60)

* Fixed issues with basic system prompt + added default system prompts #43

* multiple changes to chat agent module to support multimodal I/O

* bumped package version for alpha release
pranav-kural added a commit that referenced this issue Jul 30, 2024
…igureAndRunServer` method #43 #54 (#63)

* fix for LLM model configurations type inference (Gemini, OpenAI, DALL-E) #43

* Bumped pre-release package version + removed dependency on genkit-langchain and genkit-chromadb

* added the `configureAndRunServer` method #54 + moved exports for server and genkit to root level
@github-project-automation github-project-automation bot moved this from In progress to Done in QvikChat Project Jul 30, 2024
pranav-kural added a commit that referenced this issue Aug 1, 2024
* updated cache store to support multimodal inputs #43

* updates to cache store for multimodality #43

* updated define chat endpoint logic to support multimodal input output and verbose mode

* fixed endpoint output schema to support verbose response

* updated tests to support multimodal input output (tests only use text I/O for now) #43

* formatting change

* added changes to setup `alpha` branch for alpha release

* Added Changes to prompts, chat agent, endpoints to support Multimodal I/O (#57)

* added DALL-E 3 to supported models

* Updated prompts with partials and custom output schema #43

* updated prompts for multimodality #43

* Updated chat agent class for multimodal I/O #43

* Updated defineChatEndpoint to support multimodal I/O #43

* bumped NPM package for alpha release

* Fixed issues with basic system prompt + added default system prompts #43 (#59)

* Multimodal fix - multiple changes to chat agent module to support multimodal I/O (#60)

* Fixed issues with basic system prompt + added default system prompts #43

* multiple changes to chat agent module to support multimodal I/O

* bumped package version for alpha release

* Models patch - fixes for multimodal support + implementation of `configureAndRunServer` method #43 #54 (#63)

* fix for LLM model configurations type inference (Gemini, OpenAI, DALL-E) #43

* Bumped pre-release package version + removed dependency on genkit-langchain and genkit-chromadb

* added the `configureAndRunServer` method #54 + moved exports for server and genkit to root level

* Final merge for v2 pre-release branch (#71)

* removed dependency on @genkit-ai/firebase and @genkit-ai/firebase #67

* Refactored codebase to implement type-only imports and export #68

* Implemented logic to reset cache record on expiry #69 + fixed #66

* updated implementation of  method for cache store #69

* re-factored and re-organized code to export `defineChatEndpoint` from root #70

* updated `langchain` to v0.2.12 + added badges to README #66

* Bumped pre-release version - final for v2

* Remove alpha NPM package workflow + changed package version to 2.0.0

* fixed workflows for pre-deploy build + code scanning