
Conversation

@tpaulshippy tpaulshippy commented Jun 14, 2025

What this does

Enable image-to-image generation with gemini-2.0-flash-preview-image-generation
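
For context, a minimal sketch of the intended usage, assuming the standard `with:` attachment API; the file name is illustrative:

```ruby
# Sketch: generate an image in a chat, then refine it in a follow-up turn.
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')
chat.ask('put this in a ring', with: 'ruby.png')  # image-to-image generation
chat.ask('change the background to blue')         # edits the previous result
```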

Type of change

  • New feature

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • New public methods/classes

Related issues

Screenshots

Here's what the test did.

Input
"put this in a ring" (with an attached image: ruby)

Output
[generated image]

Second input
"change the background to blue"

Second output
[generated image]

@tpaulshippy
Contributor Author

Thinking I need to move to `Content` with attachments so the image gets sent properly on the next call.

Owner

@crmne crmne left a comment

I know it's a draft and you mentioned it in a comment, but you shouldn't add an `images` attribute to the `Message` object since we have the `Content` object for a reason.
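
For illustration, a rough sketch of the suggested direction (class names and signatures are hypothetical): route the generated image through `Content` rather than a new attribute on `Message`.

```ruby
# Hypothetical sketch: wrap the generated image in the existing Content
# object instead of adding an images attribute to Message.
image_data = '<base64 image data from the provider>' # placeholder
content = RubyLLM::Content.new('Here is the generated image')
content.attach(RubyLLM::ImageAttachment.new(data: image_data, mime_type: 'image/png'))
RubyLLM::Message.new(role: :assistant, content: content)
```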

@crmne crmne added the enhancement New feature or request label Jul 16, 2025
@tpaulshippy tpaulshippy marked this pull request as ready for review July 20, 2025 04:57
@tpaulshippy
Contributor Author

I realize this is a very different approach from the `RubyLLM.paint` method, as it involves generating images within a chat. I do think it has some value, however, as it allows for multimodal conversations.

This has similar value to #152, but there is a bit of a clash: this introduces an `ImageAttachment` that is provider-agnostic (although only used in Gemini so far), while that PR has an `ImageAttachments` class that is OpenAI-specific.

I'm also not sure exactly how/where to document this in the guides.

@crmne Looking forward to your feedback/thoughts.

@tpaulshippy
Contributor Author

This document describes the two approaches pretty well, I think. I could see an implementation of Imagen in RubyLLM that looks more like the #152 approach.

It looks like OpenAI supports conversational image generation through the Responses API and a built-in tool called `image_generation` (see here).

@tpaulshippy
Contributor Author

tpaulshippy commented Jul 20, 2025

I like how OpenAI allows you to reference the previous images via IDs. We really need to get support for these built-in tools via the Responses API into RubyLLM. We're already doing it in a fork to get `web_search_preview` (see the diff here), but it's pretty messy.
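
For reference, a sketch of the request shape for that tool (payload only, not RubyLLM code; model and IDs illustrative):

```ruby
# First call to the Responses API with the built-in image_generation tool.
first_request = {
  model: 'gpt-4.1-mini',
  input: 'put this in a ring',
  tools: [{ type: 'image_generation' }]
}

# Follow-up: previous_response_id lets the model reference the earlier
# image by ID instead of the client re-sending it.
followup_request = {
  model: 'gpt-4.1-mini',
  previous_response_id: 'resp_abc123', # returned by the first call
  input: 'change the background to blue',
  tools: [{ type: 'image_generation' }]
}
```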

@tpaulshippy tpaulshippy requested a review from crmne July 29, 2025 01:32
crmne and others added 13 commits August 2, 2025 22:01
- Modified to_llm to accept optional context parameter
- Updated with_context to pass context to to_llm
- Added tests to verify custom contexts work without global configuration
- Users can now use custom contexts even when global RubyLLM config is missing
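
A usage sketch of what this enables (model name illustrative):

```ruby
# Build an isolated context; no global RubyLLM.configure is needed.
context = RubyLLM.context do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
end

chat = Chat.create!(model_id: 'gpt-4.1-nano') # an acts_as_chat record
chat.with_context(context).ask('Hello!')
```
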
…ON) (crmne#302)

## What this does

When migrating from
[ruby-openai](https://github.com/alexrudall/ruby-openai), I had some
issues getting the same responses in my Anthropic test suite.

After some digging, I observed that the Anthropic requests send the
`system` context as serialized JSON instead of the plain string
described in the [API
reference](https://docs.anthropic.com/en/api/messages#body-system):

```ruby
{
  :system => "{type:\n        \"text\", text: \"You must include the exact phrase \\\"XKCD7392\\\" somewhere\n        in your response.\"}",
  [...]
}
```

instead of:

```ruby
{
  :system => "You must include the exact phrase \"XKCD7392\" somewhere in your response.",
  [...]
}
```

It works quite well (the model still understands it), but it uses more
tokens than needed. It could also mislead the model when interpreting
the system prompt.

This PR fixes it. I also took the initiative to make the temperature an
optional parameter ([just like with
OpenAI](https://github.com/crmne/ruby_llm/blob/main/lib/ruby_llm/providers/openai/chat.rb#L21-L22)).
I hope it's not too much for a single PR, but since I was already
re-recording the cassettes, I figured it would be easier.

I'm sorry but I don't have any API key for Bedrock/OpenRouter. I only
recorded the main Anthropic cassettes.
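
The shape of the fix, as a sketch (not the actual patch): build the `:system` field from the message text itself rather than serializing the content hash.

```ruby
# Sketch: produce a plain string for Anthropic's :system field.
def render_system_prompt(system_messages)
  system_messages.map { |message| message.content.to_s }.join("\n\n")
end
```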

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [ ] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->

---------

Co-authored-by: Carmine Paolino <carmine@paolino.me>
## What this does

<!-- Clear description of what this PR does and why -->
Give callers access to the Faraday response on a property of the Message
called "raw"

## Type of change

- [x] New feature

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [x] New public methods/classes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->
Resolves crmne#301

---------

Co-authored-by: Mike Robbins <mrobbins@alum.mit.edu>
## What this does

This PR adds a new callback hook to `Chat` that sends information when a
tool call is initiated by the model. This is useful when building a
coding agent to show the user progress of interactions inline with
streaming responses.
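
A usage sketch (the `Weather` tool is a hypothetical `RubyLLM::Tool` subclass):

```ruby
chat = RubyLLM.chat(model: 'gpt-4.1-mini')
chat.with_tool(Weather)
chat.on_tool_call do |tool_call|
  puts "Calling #{tool_call.name} with #{tool_call.arguments}..."
end
chat.ask('What is the weather in Berlin?')
```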

## Type of change

- [ ] Bug fix
- [x] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case
- this is beneficial to all users who want to show tool call indications
to the user

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [x] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

N/A

---------

Co-authored-by: Carmine Paolino <carmine@paolino.me>
…y V1 and V2 (crmne#273)

## What this does

When used within our app, streaming error responses were throwing an
error and not being properly handled

```
worker      | D, [2025-07-03T18:49:52.221013 #81269] DEBUG -- RubyLLM: Received chunk: event: error
worker      | data: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"}               }
worker      | 
worker      | 
worker      | 2025-07-03 18:49:52.233610 E [81269:sidekiq.default/processor chat_agent.rb:42] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Error in ChatAgent#send_with_streaming: NoMethodError - undefined method `merge' for nil:NilClass
worker      | 
worker      |       error_response = env.merge(body: JSON.parse(error_data), status: status)
worker      |                           ^^^^^^
worker      | 2025-07-03 18:49:52.233852 E [81269:sidekiq.default/processor chat_agent.rb:43] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Backtrace: /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:91:in `handle_error_chunk'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:62:in `process_stream_chunk'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:70:in `block in legacy_stream_processor'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:113:in `block in perform_request'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:535:in `call_block'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:526:in `<<'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb
```

It looks like the [introduction of support for Faraday V1](crmne#173)
introduced this error, as the error handling relies on an `env` that is
no longer passed. This should provide a fix for both V1 and V2.

One thing to note: I had to manually construct the VCR cassettes, as
I'm not sure of a better way to test an intermittent error response.

I have also only written the tests against
`anthropic/claude-3-5-haiku-20241022`; it's possible that other models
with a different error format may still not be handled properly, but
even in that case it won't error for the reasons fixed here.
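
A sketch of the shape of the fix (helper name hypothetical, not the actual patch): tolerate a missing `env` before building the error response.

```ruby
require 'json'

# Sketch: handle error chunks whether or not Faraday supplied an env
# (the V1 adapter does not pass one).
def handle_error_chunk(chunk, env, status)
  error_payload = JSON.parse(chunk[/data: (.+)/, 1])
  response = if env
               env.merge(body: error_payload, status: status)
             else
               Struct.new(:body, :status).new(error_payload, status)
             end
  raise_streaming_error(response) # hypothetical helper
end
```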

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

---------

Co-authored-by: Carmine Paolino <carmine@paolino.me>
## Summary
- Added documentation for handling ActionCable message ordering issues
- Includes a Stimulus controller solution for client-side reordering
- Mentions async stack and AnyCable as alternatives

## Context
This PR addresses the message ordering issues discussed in crmne#282. The
documentation includes:

1. A Stimulus controller that reorders messages based on timestamps
2. Explanation of ActionCable's ordering limitations
3. Alternative approaches (async stack, AnyCable)

## Request for Review
@ioquatix @palkan - I'd appreciate your review on the technical accuracy
of this documentation, particularly:

- Is my description of ActionCable's ordering behavior accurate?
- Are the suggested solutions appropriate?
- Any other approaches you'd recommend documenting?

## Test Plan
- [x] Documentation builds correctly
- [x] Code examples are syntactically correct
- [ ] Technical accuracy verified by domain experts
Corrects ActionCable to Action Cable throughout the documentation to match Rails naming conventions.
- Add structured output with JSON schemas example
- Include async support and model registry features
- Expand document analysis to include CSV, JSON, XML, Markdown, and code files
- Add smart configuration and automatic retry features
- Show proper RubyLLM::Schema subclassing pattern for structured output (see the sketch after this list)
- Ensure feature parity between README.md and docs/index.md
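
For reference, a sketch of that subclassing pattern with the `RubyLLM::Schema` DSL (field names illustrative):

```ruby
class PersonSchema < RubyLLM::Schema
  string :name
  integer :age, description: 'Age in years'
end

chat = RubyLLM.chat
response = chat.with_schema(PersonSchema).ask('Generate a person')
response.content # structured output matching the schema
```
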
crmne and others added 23 commits August 2, 2025 22:01
Promoted the available models documentation from guides subfolder to
top-level navigation after ecosystem section for better visibility.
Implements Mistral AI as an OpenAI-compatible provider with minimal customizations (usage sketched below):

- Extends OpenAI provider for core functionality
- Custom Chat module to handle system role mapping (uses 'system' instead of 'developer')
- Custom render_payload to remove unsupported stream_options parameter
- Custom Embeddings module that ignores dimensions parameter (not supported by Mistral)
- Implements capabilities detection for vision (pixtral), embeddings, and chat models
- Adds ministral-3b-latest for chat tests (cheapest option)
- Adds pixtral-12b-latest for vision tests
- Adds mistral-embed for embedding tests
- Fetches and includes 63 Mistral models in models.json
- Adds appropriate test skips for known model limitations
- Fix mistral models capabilities format (Hash -> Array)
- Fix imagen models output modality (text -> image)
- Add models_schema.json for validation

Fixes crmne#315
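
A usage sketch once the provider is configured (the `mistral_api_key` setting name follows the existing provider-key pattern and is an assumption here):

```ruby
RubyLLM.configure do |config|
  config.mistral_api_key = ENV['MISTRAL_API_KEY'] # assumed setting name
end

chat = RubyLLM.chat(model: 'ministral-3b-latest')
chat.ask('Bonjour !')

RubyLLM.embed('hello world', model: 'mistral-embed')
```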
…#318)

## What this does

An `on_tool_call` callback was added in `1.4.0` via
crmne#299, but it doesn't work with a
model using the Rails integration via `acts_as_chat`.

This PR wires up the missing method so it works with the integration.
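
A usage sketch with the Rails integration (tool and column names illustrative):

```ruby
chat = Chat.create!(model_id: 'gpt-4.1-mini')
chat.with_tool(Weather) # hypothetical RubyLLM::Tool subclass
chat.on_tool_call { |tool_call| Rails.logger.info("Tool call: #{tool_call.name}") }
chat.ask('What is the weather in Berlin?')
```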

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [ ] ~I ran `overcommit --install` and all hooks pass~
- When I tried to commit, the hooks generated a bunch of changes to
`models.json` and `aliases.json` and broke a bunch of the specs, so I
removed the hooks and ran the specs and RuboCop manually
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
  - No need
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->