-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[Vertex AI] Add responseModalities
to GenerationConfig
#14658
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
Generated by 🚫 Danger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the responseModalities
parameter to the GenerationConfig
, enabling support for multimodal responses (text and images) from the Gemini 2.0 Flash model. The changes include adding the ResponseModality
type, updating the GenerationConfig
struct and initializer, and adding a test case for image generation. Overall, the code is well-structured and addresses the intended functionality. However, there are a few areas that could be improved for clarity and maintainability.
Summary of Findings
- Documentation for Public Preview Features: The documentation and comments clearly indicate the Public Preview status of the image generation feature. This is important for users to understand the limitations and potential changes associated with experimental features.
- Error Handling in Example Usage: The example usage includes
fatalError
calls for unexpected scenarios. While this is acceptable for example code, consider using more robust error handling mechanisms in production code. - Naming Consistency: Consider renaming
rawValue
inResponseModality
toprotoEnumValue
to align with the naming convention used in otherDecodableProtoEnum
implementations.
Merge Readiness
The pull request is almost ready for merging. The added functionality is well-implemented and the code is generally clear. However, addressing the naming consistency issue and considering more robust error handling in the example usage would further improve the quality of the code. I am unable to directly approve this pull request, and recommend that others review and approve this code before merging. I would recommend that the naming consistency issue be addressed before merging.
FirebaseVertexAI/Tests/TestApp/Tests/Integration/GenerateContentIntegrationTests.swift
Outdated
Show resolved
Hide resolved
FirebaseVertexAI/Tests/TestApp/Tests/Integration/GenerateContentIntegrationTests.swift
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the responseModalities
parameter to the GenerationConfig
, enabling support for multimodal responses (text and images) using the Gemini 2.0 Flash model. The changes include updates to the GenerationConfig
struct, the addition of a ResponseModality
enum, and a new integration test to verify image generation. Overall, the code is well-structured and addresses the intended functionality.
Summary of Findings
- Copyright Year: The copyright year in
ResponseModality.swift
is set to 2025. Please confirm if this is intentional or if it should be updated to the current year. - Test Coverage: The new integration test
generateImage
includes checks for image dimensions. Consider adding more comprehensive tests to cover different scenarios and edge cases, such as handling errors or invalid image data. - Documentation Completeness: The documentation for
ResponseModality
is good, but consider adding a note about the potential cost implications of requesting image generation, as it might consume more resources than text-only responses.
Merge Readiness
The pull request is well-structured and introduces a valuable feature. However, before merging, please address the copyright year discrepancy and consider enhancing the test coverage as suggested. I am unable to directly approve this pull request, and other reviewers should review and approve this code before merging.
Added support for setting response modalities (text or images) in
GenerationConfig
to enable image generation (public experimental) using Gemini 2.0 Flash (gemini-2.0-flash-exp
only).Usage Example