Image to image with gemini-2.0-flash-preview-image-generation #248
Conversation
I'm thinking I need to move to content with attachments so the image gets sent properly on the next call.
I know it's a draft and you mentioned it in a comment, but you shouldn't add an `images` attribute to the `Message` object, since we have the `Content` object for a reason.
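As a rough sketch of the design being argued for here (stand-in classes, not RubyLLM's real internals): a generated image travels as an attachment on the `Content` object, so the next request serializes it along with the rest of the message, and no separate `images` attribute on `Message` is needed.

```ruby
# Stand-in shapes for illustration only; RubyLLM's actual classes differ.
Content = Struct.new(:text, :attachments)
Message = Struct.new(:role, :content)

# The generated image rides along inside Content, so the next API call
# sends it with the rest of the message without any special casing.
generated_image = { mime_type: "image/png", data: "…base64…" }
reply = Message.new(:assistant, Content.new("Here is your ring.", [generated_image]))

puts reply.content.attachments.length # => 1
```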
I realize this is a very different approach than the one in #152. This has similar value as #152, but there is a bit of a clash between the two. I also am not sure exactly how/where to document this in the guides. @crmne Looking forward to your feedback/thoughts.
This document describes the two approaches pretty well, I think. I could see an implementation of Imagen in RubyLLM that looks more like the #152 approach. It looks like OpenAI supports conversational image generation through the Responses API and a built-in tool called "image_generation" - see here.
I like how OpenAI allows you to reference the previous images via IDs. We really need to get support for these built-in tools via the Responses API into RubyLLM. We are already doing it in a fork to get `web_search_preview` (see diff here), but it's pretty messy.
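For reference, a hedged sketch of what such a Responses API request body looks like; the `tools: [{ type: "image_generation" }]` shape is from OpenAI's documentation as I recall it, and the model name is only an example — double-check both before relying on this.

```ruby
require "json"

# Request body enabling the built-in image_generation tool on the
# Responses API (field names assumed from OpenAI's docs).
body = {
  model: "gpt-4.1-mini",            # example model, not prescriptive
  input: "put this in a ring",
  tools: [{ type: "image_generation" }]
}

puts JSON.generate(body)
```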
- Modified `to_llm` to accept an optional context parameter
- Updated `with_context` to pass context to `to_llm`
- Added tests to verify custom contexts work without global configuration
- Users can now use custom contexts even when the global RubyLLM config is missing
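The pattern described in this commit can be sketched with stand-in classes (names mirror the commit message; the real RubyLLM signatures may differ): the chat remembers the context it was given and hands it to `to_llm`, so a per-call configuration works even when no global config exists.

```ruby
# Stand-ins for illustration; not RubyLLM's actual implementation.
class Context
  attr_reader :config

  def initialize(config)
    @config = config
  end
end

class Chat
  def with_context(context)
    @context = context
    self
  end

  # Falls back to the stored context; raises if neither a context
  # nor a global configuration is available.
  def to_llm(context: @context)
    config = context&.config || raise("no global config")
    "chat using #{config[:api_key]}"
  end
end

puts Chat.new.with_context(Context.new(api_key: "sk-test")).to_llm
```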
…ON) (crmne#302)

## What this does

When migrating from [ruby-openai](https://github.com/alexrudall/ruby-openai), I had some issues getting the same responses in my Anthropic test suite. After some digging, I observed that the Anthropic requests send the `system` context as serialized JSON instead of a plain string as described in the [API reference](https://docs.anthropic.com/en/api/messages#body-system):

```ruby
{
  :system => "{type:\n \"text\", text: \"You must include the exact phrase \\\"XKCD7392\\\" somewhere\n in your response.\"}",
  [...]
}
```

instead of:

```ruby
{
  :system => "You must include the exact phrase \"XKCD7392\" somewhere in your response.",
  [...]
}
```

It works quite well (the model still understands it), but it uses more tokens than needed. It could also mislead the model when interpreting the system prompt. This PR fixes it.

I also took the initiative to make the temperature an optional parameter ([just like with OpenAI](https://github.com/crmne/ruby_llm/blob/main/lib/ruby_llm/providers/openai/chat.rb#L21-L22)). I hope it's not too much for a single PR, but since I was already re-recording the cassettes, I figured it would be easier. I'm sorry, but I don't have any API key for Bedrock/OpenRouter; I only recorded the main Anthropic cassettes.
## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [ ] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->

---------

Co-authored-by: Carmine Paolino <carmine@paolino.me>
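The serialization fix that commit describes can be sketched as a small helper (the `system_param` name is hypothetical): join the system messages' text into the plain string Anthropic's `system` parameter expects, rather than JSON-encoding the whole content hash.

```ruby
# Hypothetical helper illustrating the fix; Anthropic's `system` body
# parameter accepts a plain string, so extract the text instead of
# serializing the content hash to JSON.
def system_param(system_messages)
  system_messages.map { |m| m[:text] }.join("\n\n")
end

msgs = [{ type: "text", text: 'You must include the exact phrase "XKCD7392" somewhere in your response.' }]
puts system_param(msgs)
```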
## What this does

<!-- Clear description of what this PR does and why -->

Give callers access to the Faraday response on a property of the Message called `raw`.

## Type of change

- [x] New feature

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [x] New public methods/classes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->

Resolves crmne#301

---------

Co-authored-by: Mike Robbins <mrobbins@alum.mit.edu>
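Illustrating that idea with a stand-in `Message` (the real class differs): the transport-level response is carried on `raw`, so callers can inspect status and headers without the library re-exposing each field.

```ruby
# Stand-in for illustration; the real Message carries a Faraday::Response.
Message = Struct.new(:content, :raw, keyword_init: true)

faraday_like_response = { status: 200, headers: { "x-request-id" => "abc123" } }
msg = Message.new(content: "hi", raw: faraday_like_response)

puts msg.raw[:status] # => 200
```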
## What this does

This PR adds a new callback hook to `Chat` that sends information when a tool call is initiated by the model. This is useful when building a coding agent to show the user the progress of interactions inline with streaming responses.

## Type of change

- [ ] Bug fix
- [x] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case - this is beneficial to all users who want to show tool call indications to the user

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [x] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

N/A

---------

Co-authored-by: Carmine Paolino <carmine@paolino.me>
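The callback-hook pattern can be sketched like this (stand-in `Chat` with a hypothetical `handle_tool_call` trigger; not the library's actual implementation): the chat stores a block and invokes it whenever the model initiates a tool call.

```ruby
# Stand-in for illustration of the on_tool_call hook pattern.
class Chat
  def on_tool_call(&block)
    @on_tool_call = block
    self
  end

  # Hypothetical internal trigger: fired when the model requests a tool.
  def handle_tool_call(tool_call)
    @on_tool_call&.call(tool_call)
  end
end

calls = []
chat = Chat.new
chat.on_tool_call { |tc| calls << tc[:name] }
chat.handle_tool_call(name: "weather", arguments: { city: "Berlin" })

puts calls.first # => weather
```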
…y V1 and V2 (crmne#273)

## What this does

When used within our app, streaming error responses were throwing an error and not being properly handled:

```
worker | D, [2025-07-03T18:49:52.221013 #81269] DEBUG -- RubyLLM: Received chunk: event: error
worker | data: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"} }
worker |
worker | 2025-07-03 18:49:52.233610 E [81269:sidekiq.default/processor chat_agent.rb:42] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Error in ChatAgent#send_with_streaming: NoMethodError - undefined method `merge' for nil:NilClass
worker |
worker | error_response = env.merge(body: JSON.parse(error_data), status: status)
worker |                  ^^^^^^
worker | 2025-07-03 18:49:52.233852 E [81269:sidekiq.default/processor chat_agent.rb:43] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Backtrace: /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:91:in `handle_error_chunk'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:62:in `process_stream_chunk'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:70:in `block in legacy_stream_processor'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:113:in `block in perform_request'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:535:in `call_block'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:526:in `<<'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb
```

It looks like the [introduction of support for Faraday V1](crmne#173) introduced this error, as the error handling relies on an `env` that is no longer passed. This should provide a fix for both V1 and V2.

One thing to note: I had to manually construct the VCR cassettes; I'm not sure of a better way to test an intermittent error response. I have also only written the tests against `anthropic/claude-3-5-haiku-20241022` - it's possible other models with a different error format may still not be properly handled, but even in that case it won't error for the reasons fixed here.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

---------

Co-authored-by: Carmine Paolino <carmine@paolino.me>
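The crash in that log comes from merging into a nil `env`. A nil-safe version of the failing line, sketched as a standalone helper (hypothetical name, not the library's actual method), falls back to a bare hash when no Faraday env is available:

```ruby
require "json"

# Hypothetical helper illustrating the fix: build the error response
# from the env when present, otherwise from a plain hash, so a missing
# env no longer raises NoMethodError on nil.
def error_response(env, error_data, status)
  parsed = { body: JSON.parse(error_data), status: status }
  env ? env.merge(parsed) : parsed
end

chunk = '{"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}'
resp = error_response(nil, chunk, 529)

puts resp[:body]["error"]["message"] # => Overloaded
```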
## Summary

- Added documentation for handling ActionCable message ordering issues
- Includes a Stimulus controller solution for client-side reordering
- Mentions the async stack and AnyCable as alternatives

## Context

This PR addresses the message ordering issues discussed in crmne#282. The documentation includes:

1. A Stimulus controller that reorders messages based on timestamps
2. An explanation of ActionCable's ordering limitations
3. Alternative approaches (async stack, AnyCable)

## Request for Review

@ioquatix @palkan - I'd appreciate your review on the technical accuracy of this documentation, particularly:

- Is my description of ActionCable's ordering behavior accurate?
- Are the suggested solutions appropriate?
- Any other approaches you'd recommend documenting?

## Test Plan

- [x] Documentation builds correctly
- [x] Code examples are syntactically correct
- [ ] Technical accuracy verified by domain experts
Corrects ActionCable to Action Cable throughout the documentation to match Rails naming conventions.
- Add structured output with JSON schemas example
- Include async support and model registry features
- Expand document analysis to include CSV, JSON, XML, Markdown, and code files
- Add smart configuration and automatic retry features
- Show proper RubyLLM::Schema subclassing pattern for structured output
- Ensure feature parity between README.md and docs/index.md
Promoted the available models documentation from guides subfolder to top-level navigation after ecosystem section for better visibility.
Implements Mistral AI as an OpenAI-compatible provider with minimal customizations:

- Extends OpenAI provider for core functionality
- Custom Chat module to handle system role mapping (uses 'system' instead of 'developer')
- Custom render_payload to remove the unsupported stream_options parameter
- Custom Embeddings module that ignores the dimensions parameter (not supported by Mistral)
- Implements capabilities detection for vision (pixtral), embeddings, and chat models
- Adds ministral-3b-latest for chat tests (cheapest option)
- Adds pixtral-12b-latest for vision tests
- Adds mistral-embed for embedding tests
- Fetches and includes 63 Mistral models in models.json
- Adds appropriate test skips for known model limitations
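The two chat customizations can be sketched as a module (method names here are assumptions based on the commit message, not the provider's actual code): map the system role to plain "system" rather than OpenAI's "developer", and strip `stream_options`, which Mistral's API does not accept.

```ruby
# Illustrative module; method names assumed, not the real provider code.
module MistralChat
  # Mistral expects the plain "system" role where newer OpenAI payloads
  # would use "developer".
  def format_role(role)
    role == :system ? "system" : role.to_s
  end

  # Mistral rejects stream_options, so drop it from the rendered payload.
  def render_payload(payload)
    payload.reject { |k, _| k == :stream_options }
  end
end

class Provider
  include MistralChat
end

provider = Provider.new
payload = provider.render_payload(model: "ministral-3b-latest", stream_options: { include_usage: true })

puts provider.format_role(:system) # => system
puts payload.key?(:stream_options) # => false
```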
…ersations and custom dimensions
- Fix mistral models capabilities format (Hash -> Array)
- Fix imagen models output modality (text -> image)
- Add models_schema.json for validation

Fixes crmne#315
…#318)

## What this does

An `on_tool_call` callback was added in `1.4.0` via crmne#299, but it doesn't work with a model using the Rails integration via `acts_as_chat`. This PR wires up the missing method so it works with the integration.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [ ] ~I ran `overcommit --install` and all hooks pass~ - When I tried to commit, the hooks generated a bunch of changes to `models.json` and `aliases.json` and broke a bunch of the specs, so I removed the hooks and ran specs and rubocop manually
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed - No need
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->
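The missing wiring can be sketched with stand-in classes (hypothetical names, not the gem's actual code): the record-backed chat forwards `on_tool_call` to the underlying LLM chat it wraps, the same way the integration forwards its other callbacks.

```ruby
# Stand-in for the underlying RubyLLM chat object.
class LLMChat
  attr_reader :tool_call_handler

  def on_tool_call(&block)
    @tool_call_handler = block
    self
  end
end

# Stand-in for the ActiveRecord model that acts_as_chat decorates.
class ChatRecord
  def initialize
    @chat = LLMChat.new
  end

  def to_llm
    @chat
  end

  # The fix: forward the callback to the wrapped chat instead of
  # letting it be silently ignored.
  def on_tool_call(&block)
    to_llm.on_tool_call(&block)
    self
  end
end

record = ChatRecord.new
record.on_tool_call { |tc| puts "tool: #{tc}" }

puts record.to_llm.tool_call_handler ? "wired" : "missing" # => wired
```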
Force-pushed from a6ced7a to 9704fe1
What this does
Enable image-to-image generation with gemini-2.0-flash-preview-image-generation
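A hedged usage sketch of the flow this PR enables; the `ask(..., with:)` shape follows RubyLLM's chat API, but the exact signature and return value here are assumptions, and a stub stands in for the real Gemini call so the snippet runs offline.

```ruby
# Stub chat standing in for a real call to
# gemini-2.0-flash-preview-image-generation; return shape is illustrative.
class StubChat
  def ask(prompt, with: nil)
    # Real code would send the `with:` attachment (and, on follow-up
    # turns, the previously generated image) to the model.
    { text: "done", images: ["generated.png"] }
  end
end

chat = StubChat.new
first = chat.ask("put this in a ring", with: "ring.png")
second = chat.ask("change the background to blue") # prior image is resent

puts second[:images].length # => 1
```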
Type of change
Scope check
Quality check
- I ran `overcommit --install` and all hooks pass
- I didn't modify auto-generated files manually (`models.json`, `aliases.json`)
API changes
Related issues
Screenshots
Here's what the test did.
Input

put this in a ring
Output

Second input
change the background to blue
Second output
