[Vertex AI] Add `responseModalities` to `GenerationConfig` #14658

andrewheard · 2025-04-04T22:17:09Z

Added support for setting response modalities (text or images) in `GenerationConfig`. This enables image generation (public experimental) with Gemini 2.0 Flash; only the `gemini-2.0-flash-exp` model is supported.

Usage Example

import FirebaseVertexAI
import UIKit // Needed for UIImage below

let vertexAI = VertexAI.vertexAI()
let model = vertexAI.generativeModel(
  modelName: "gemini-2.0-flash-exp",
  generationConfig: GenerationConfig(
    responseModalities: [.text, .image]
  )
)
let prompt = "Generate an image of a cute cartoon cat."

let response = try await model.generateContent(prompt)
for part in response.candidates.first!.content.parts {
  switch part {
  case let inlineDataPart as InlineDataPart:
    guard inlineDataPart.mimeType == "image/png" else {
      fatalError("Expected a PNG image, but got \(inlineDataPart.mimeType)")
    }
    // Handle Image
    guard let image = UIImage(data: inlineDataPart.data) else {
      fatalError("Failed to extract image from data.")
    }
    print("Received Image: \(image)")
  case let textPart as TextPart:
    // Handle Text
    print("Received Text: \(textPart.text)")
  default:
    fatalError("Received Unexpected Part Type: \(part)")
  }
}

google-oss-bot · 2025-04-04T22:17:57Z

1 Warning
⚠️ New public headers were added, did you remember to add them to the umbrella header?

Generated by 🚫 Danger

gemini-code-assist

Code Review

This pull request introduces the responseModalities parameter to the GenerationConfig, enabling support for multimodal responses (text and images) from the Gemini 2.0 Flash model. The changes include adding the ResponseModality type, updating the GenerationConfig struct and initializer, and adding a test case for image generation. Overall, the code is well-structured and addresses the intended functionality. However, there are a few areas that could be improved for clarity and maintainability.

Summary of Findings

Documentation for Public Preview Features: The documentation and comments clearly indicate the Public Preview status of the image generation feature. This is important for users to understand the limitations and potential changes associated with experimental features.
Error Handling in Example Usage: The example usage includes fatalError calls for unexpected scenarios. While this is acceptable for example code, consider using more robust error handling mechanisms in production code.
Naming Consistency: Consider renaming rawValue in ResponseModality to protoEnumValue to align with the naming convention used in other DecodableProtoEnum implementations.
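
One way to follow the error-handling suggestion above is to throw typed errors from the part loop instead of calling `fatalError`. The sketch below is illustrative only: `Part`, `TextPart`, and `InlineDataPart` are local stand-ins that mimic the shape of the SDK types used in the usage example, and `PartHandlingError`, `ParsedParts`, and `parse(parts:)` are hypothetical names, not part of FirebaseVertexAI.

```swift
import Foundation

// Local stand-ins (assumption): they only mirror the shape of the SDK types
// from the usage example so that this sketch compiles on its own.
protocol Part {}
struct TextPart: Part { let text: String }
struct InlineDataPart: Part {
  let data: Data
  let mimeType: String
}

// Hypothetical error type describing what can go wrong while reading parts.
enum PartHandlingError: Error, Equatable {
  case noCandidates
  case unexpectedMIMEType(String)
  case unsupportedPart
}

// Hypothetical container for the text and image payloads that were found.
struct ParsedParts {
  var texts: [String] = []
  var images: [Data] = []
}

// Collects text and PNG data from the parts, throwing instead of crashing.
// Passing nil models a response that contains no candidates.
func parse(parts: [any Part]?) throws -> ParsedParts {
  guard let parts = parts else { throw PartHandlingError.noCandidates }
  var result = ParsedParts()
  for part in parts {
    switch part {
    case let inlineDataPart as InlineDataPart:
      guard inlineDataPart.mimeType == "image/png" else {
        throw PartHandlingError.unexpectedMIMEType(inlineDataPart.mimeType)
      }
      result.images.append(inlineDataPart.data)
    case let textPart as TextPart:
      result.texts.append(textPart.text)
    default:
      throw PartHandlingError.unsupportedPart
    }
  }
  return result
}
```

A caller can then catch `PartHandlingError` and show a message or retry, rather than terminating the app.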

Merge Readiness

The pull request is almost ready for merging. The added functionality is well-implemented and the code is generally clear. I recommend addressing the naming consistency issue before merging; more robust error handling in the example usage would further improve the code. I am unable to directly approve this pull request, so other reviewers should review and approve it before merging.

FirebaseVertexAI/Sources/Types/Public/ResponseModality.swift

FirebaseVertexAI/Sources/GenerationConfig.swift

FirebaseVertexAI/Tests/TestApp/Tests/Integration/GenerateContentIntegrationTests.swift

gemini-code-assist

Code Review

This pull request introduces the responseModalities parameter to the GenerationConfig, enabling support for multimodal responses (text and images) using the Gemini 2.0 Flash model. The changes include updates to the GenerationConfig struct, the addition of a ResponseModality enum, and a new integration test to verify image generation. Overall, the code is well-structured and addresses the intended functionality.

Summary of Findings

Copyright Year: The copyright year in ResponseModality.swift is set to 2025. Please confirm if this is intentional or if it should be updated to the current year.
Test Coverage: The new integration test generateImage includes checks for image dimensions. Consider adding more comprehensive tests to cover different scenarios and edge cases, such as handling errors or invalid image data.
Documentation Completeness: The documentation for ResponseModality is good, but consider adding a note about the potential cost implications of requesting image generation, as it might consume more resources than text-only responses.

Merge Readiness

The pull request is well-structured and introduces a valuable feature. However, before merging, please address the copyright year discrepancy and consider enhancing the test coverage as suggested. I am unable to directly approve this pull request, and other reviewers should review and approve this code before merging.

FirebaseVertexAI/Sources/Types/Public/ResponseModality.swift

FirebaseVertexAI/CHANGELOG.md

FirebaseVertexAI/Sources/GenerationConfig.swift

FirebaseVertexAI/Tests/TestApp/Tests/Integration/GenerateContentIntegrationTests.swift

[Vertex AI] Add responseModalities to GenerationConfig

12a7bef

andrewheard added the api: vertexai label Apr 4, 2025

Add developer API integration testing and update docs

0ca33d4

gemini-code-assist bot reviewed Apr 7, 2025

andrewheard added 2 commits April 7, 2025 17:59

Remove "no text in response" and add comment for image size

8326319

Update GenerationConfig unit test to include responseModalities

ec5df60

andrewheard marked this pull request as ready for review April 7, 2025 22:13

andrewheard requested a review from paulb777 April 7, 2025 22:14

andrewheard marked this pull request as draft April 7, 2025 23:51

andrewheard added 2 commits April 8, 2025 16:17

Handle 503 errors in integration test using withKnownIssue

d959d33

Merge branch 'main' into ah/vertex-multimodal-output

d8db0e3
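
The commit "Handle 503 errors in integration test using withKnownIssue" relies on Swift Testing's `withKnownIssue`. A minimal sketch of that pattern follows, assuming a hypothetical `flakyImageRequest()` helper and `ServiceUnavailableError` type in place of the real model call and its 503 failure; the actual test in this PR may differ.

```swift
import Testing

// Hypothetical stand-in for the backend's intermittent 503 response.
struct ServiceUnavailableError: Error {}

// Hypothetical stand-in for the image generation call. It always fails here
// to illustrate the failure path.
func flakyImageRequest() async throws -> [UInt8] {
  throw ServiceUnavailableError()
}

@Test func generateImage() async {
  // isIntermittent: true lets the test pass whether or not the body fails,
  // which suits a failure that only happens some of the time.
  await withKnownIssue(isIntermittent: true) {
    let imageBytes = try await flakyImageRequest()
    #expect(!imageBytes.isEmpty)
  }
}
```

This simple form treats any failure inside the closure as known. `withKnownIssue` also accepts a `matching:` closure that can narrow the known issue to a specific error.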

gemini-code-assist bot reviewed Apr 8, 2025

Reword CHANGELOG slightly

9a986d5

andrewheard marked this pull request as ready for review April 8, 2025 21:24

paulb777 approved these changes Apr 8, 2025

andrewheard merged commit 7a86f19 into main Apr 9, 2025
36 checks passed

andrewheard deleted the ah/vertex-multimodal-output branch April 9, 2025 00:40

github-actions bot mentioned this pull request Apr 22, 2025

Update to Firebase 11.12.0 afresh-technologies/firebase-ios-sdk-xcframeworks#92

Merged
