Prompt Caching #234

tpaulshippy · 2025-06-09T07:16:13Z

What this does

Adds prompt caching configuration to the Anthropic and Bedrock providers for Claude models that support it, and reports cached token counts for OpenAI and Gemini, which cache automatically.

Caching system prompts:

chat = RubyLLM.chat
chat.with_instructions("You are a helpful assistant.")
chat.cache_prompts(system: true)

Caching user prompts:

chat = RubyLLM.chat
chat.with_instructions("You are a helpful assistant.")
chat.cache_prompts(system: true, user: true)
chat.ask("What is the capital of France?")

Caching tool definitions:

chat = RubyLLM.chat
chat.with_instructions("You are a helpful assistant.")
chat.with_tool(MyTool)
chat.cache_prompts(tools: true)
chat.ask("What is the capital of France?")
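For readers unfamiliar with Anthropic's request format, here is a minimal sketch (not the PR's implementation) of what the three flags could translate to in a Messages API payload. It assumes the payload is a plain Ruby hash with `system`, `tools`, and `messages` keys in Anthropic's shape, with content given as arrays of blocks. Because Anthropic caches the whole prefix up to a marked block, only the last block of each section is marked. `apply_cache_flags` and `mark_last` are hypothetical names.

```ruby
# Hypothetical helper, not RubyLLM's API: adds Anthropic-style cache
# breakpoints to a Messages API payload represented as a Ruby hash.
CACHE_CONTROL = { type: 'ephemeral' }.freeze

# Returns a copy of an array of blocks with cache_control on the last block only.
# Assumes array-of-blocks content; anything else is returned unchanged.
def mark_last(blocks)
  return blocks unless blocks.is_a?(Array) && !blocks.empty?

  blocks[0...-1] + [blocks.last.merge(cache_control: CACHE_CONTROL)]
end

def apply_cache_flags(payload, system: false, user: false, tools: false)
  result = payload.dup
  result[:system] = mark_last(payload[:system]) if system && payload.key?(:system)
  result[:tools] = mark_last(payload[:tools]) if tools && payload.key?(:tools)
  return result unless user

  # Mark only the most recent user message.
  messages = (payload[:messages] || []).dup
  idx = messages.rindex { |m| m[:role] == 'user' }
  messages[idx] = messages[idx].merge(content: mark_last(messages[idx][:content])) if idx
  result.merge(messages: messages)
end

payload = {
  system: [{ type: 'text', text: 'You are a helpful assistant.' }],
  tools: [{ name: 'my_tool', description: 'Example tool', input_schema: { type: 'object' } }],
  messages: [{ role: 'user', content: [{ type: 'text', text: 'What is the capital of France?' }] }]
}

cached = apply_cache_flags(payload, system: true, user: true, tools: true)
```

The original payload is left untouched; each flag only affects its own section.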

Type of change

New feature

Scope check

I read the Contributing Guide
This aligns with RubyLLM's focus on LLM communication
This isn't application-specific logic that belongs in user code
This benefits most users, not just my specific use case

Quality check

I ran overcommit --install and all hooks pass
I tested my changes thoroughly
I updated documentation if needed
I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

New public methods/classes

Related issues

Resolves #13

tpaulshippy · 2025-06-09T21:45:24Z

@crmne As I don't have an Anthropic key, I'll need you to generate the VCR cassettes for that provider. Hoping everything just works, but let me know if not.

crmne · 2025-06-11T07:55:06Z

@tpaulshippy this would be great to have! Would you be willing to enable it on all providers?

I'll do a proper review when I can.

tpaulshippy · 2025-06-11T14:00:21Z

My five minutes of research indicates that at least OpenAI and Gemini cache automatically based on the size and structure of your request. So I think the only support we'd really need for those two is populating the cached token counts on the response messages, unless we want to support explicit caching on the Gemini API, which looks complex and less commonly needed.

Do you know of other providers that require payload changes for prompt caching?

tpaulshippy · 2025-06-11T14:06:54Z

lib/ruby_llm/providers/anthropic/media.rb

+        def with_cache_control(hash, cache: false)
+          return hash unless cache
+
+          hash.merge(cache_control: { type: 'ephemeral' })


I'm realizing this might cause errors on older models that don't support caching. If it does, we could raise here or just let the API validation handle it. I'm torn on whether the added complexity of a capabilities check is worth it, since these models are probably rarely used.
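If a capabilities check were wanted, one low-complexity option is to raise before the request is built. This is a sketch under assumptions: the `supports_caching:` and `model_id:` keywords and the `UnsupportedCachingError` class are hypothetical and not part of RubyLLM; in practice the flag would have to come from model metadata.

```ruby
# Hypothetical error for caching requested on a model without support.
class UnsupportedCachingError < StandardError; end

# Variant of the helper from the diff, with an optional capability guard.
def with_cache_control(hash, cache: false, supports_caching: true, model_id: nil)
  return hash unless cache

  unless supports_caching
    raise UnsupportedCachingError, "#{model_id || 'this model'} does not support prompt caching"
  end

  hash.merge(cache_control: { type: 'ephemeral' })
end

block = { type: 'text', text: 'You are a helpful assistant.' }
marked = with_cache_control(block, cache: true)
```

Letting the API reject the request instead avoids maintaining the capability flag, at the cost of a later, provider-specific error message.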

tpaulshippy · 2025-06-12T18:08:46Z

> @crmne As I don't have an Anthropic key, I'll need you to generate the VCR cassettes for that provider. Hoping everything just works, but let me know if not.

Scratch that. I decided to stop being a cheapskate and just pay Anthropic their $5.

tpaulshippy · 2025-07-16T15:17:21Z

I'm looking to implement this in our project, and now I'm wondering whether it should be opt-out rather than opt-in. If you use unique prompts every time, caching them adds some cost, but my guess is that in most applications prompts get repeated, especially system prompts.
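To put rough numbers on the opt-in versus opt-out question, assume Anthropic's multipliers for the 5-minute cache: cache writes cost 1.25x the base input price and cache reads cost 0.1x. If the same cacheable prefix is sent n times within the cache lifetime, its cost relative to no caching is (1.25 + 0.1(n - 1)) / n. The sketch below computes that with exact rationals; `relative_prefix_cost` is a hypothetical name.

```ruby
# Assumed multipliers relative to the base input-token price (5-minute cache).
CACHE_WRITE = Rational(5, 4)  # 1.25x for the first request, which writes the cache
CACHE_READ = Rational(1, 10)  # 0.1x for each later request that reads it

# Cost of sending the same prefix n times with caching,
# divided by the cost of sending it n times without caching.
def relative_prefix_cost(n)
  raise ArgumentError, 'n must be positive' unless n.positive?

  (CACHE_WRITE + CACHE_READ * (n - 1)) / n
end

[1, 2, 10].each do |n|
  puts format('%2d requests: %.1f%% of uncached cost', n, relative_prefix_cost(n) * 100)
end
```

Under these assumptions, a prefix that is never reused costs 25% more, while any reuse within the cache lifetime already makes caching cheaper, which is the trade-off behind the opt-out suggestion.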

crmne

Thank you for this feature, @tpaulshippy. However, there are several improvements I'd like you to make before we merge this.

On top of the ones in the inline comments, here is the most important one: I'd like to have prompt caching implemented in all providers.

Plus, I have not fully checked the logic in providers/anthropic, but at first glance the patch seems a bit heavy-handed given the number of changes. Were all the changes necessary, or could it be done in a simpler manner?

docs/guides/prompt-caching.md

lib/ruby_llm/chat.rb

lib/ruby_llm/completion_params.rb

lib/ruby_llm/provider.rb

spec/fixtures/large_prompt.txt

tpaulshippy · 2025-07-16T17:56:16Z

> I'd like to have prompt caching implemented in all providers.

Did you see this? Is the request to populate the cached token counts on the response messages for OpenAI and Gemini?

crmne · 2025-07-16T18:18:29Z

> Did you see this? Is the request to populate the cached token counts on the response messages for OpenAI and Gemini?

Thank you for pointing that out, I had missed it. I think it would certainly be a nice addition to RubyLLM to have all providers have almost the same level of support of caching.

tpaulshippy · 2025-07-16T18:32:58Z

> > Did you see this? Is the request to populate the cached token counts on the response messages for OpenAI and Gemini?
>
> Thank you for pointing that out, I had missed it. I think it would certainly be a nice addition to RubyLLM to have all providers have almost the same level of support of caching.

OK, we have a bit of a naming issue. Here are the property names we get from each provider:

Anthropic
cache_creation_input_tokens
cache_read_input_tokens

OpenAI
cached_tokens

Gemini
cached_content_token_count

My reading of the docs indicates that the OpenAI and Gemini values correspond closely to Anthropic's cache_read_input_tokens.

What should we call these properties in the Message?

crmne · 2025-07-16T18:48:44Z

For the naming, let's go with:

cached_tokens - maps to the cache read values from all providers (the main property developers will use)
cache_creation_tokens - Anthropic-specific cache creation cost (nil for other providers)

This keeps it consistent with our existing input_tokens/output_tokens pattern while handling the provider differences cleanly.

Can you update the Message properties to use these names? Thanks Paul!
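A minimal sketch of that mapping, assuming each provider's usage data has already been extracted into a flat hash keyed by the names listed above. The real responses nest these differently (for example, OpenAI's Chat Completions API reports `cached_tokens` inside `prompt_tokens_details`), so the extraction step is left out. `normalize_cache_usage` is a hypothetical name.

```ruby
# Hypothetical normalizer: maps provider-specific cache counters onto the
# two Message properties agreed above.
def normalize_cache_usage(provider, usage)
  case provider
  when :anthropic
    { cached_tokens: usage[:cache_read_input_tokens],
      cache_creation_tokens: usage[:cache_creation_input_tokens] }
  when :openai
    # Cache creation is not reported separately, so it stays nil.
    { cached_tokens: usage[:cached_tokens], cache_creation_tokens: nil }
  when :gemini
    { cached_tokens: usage[:cached_content_token_count], cache_creation_tokens: nil }
  else
    raise ArgumentError, "unknown provider: #{provider}"
  end
end

openai_usage = normalize_cache_usage(:openai, { cached_tokens: 1024 })
```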


tpaulshippy · 2025-08-03T04:48:04Z

Had to play with the length/complexity of the prompts. Finally got them to cache. Should be ready to go.

jordanyeo · 2025-08-27T18:43:10Z

This will be huge for me. Is it possible to get a review and merge? Thanks!

tpaulshippy · 2025-08-29T21:25:31Z

I rolled this out 2 weeks ago. So far the numbers show that I am paying 55% less than I would have without this.

crmne

Paul, this PR is 90% VCR cassettes, but it's still a monster and I feel like the actual code is overengineered. Passing cache_control through 7 layers? No thanks.

However, how about embracing the fact that there may be moments we want to augment each message?

# Just add message-level params
chat.add_message(
  role: :user,
  content: "huge doc",
  params: { cache_control: { type: 'ephemeral' } }
)

That's it. Add params to Message.

We can then even add:

chat.with_message_params(cache_control: { type: 'ephemeral' })

to apply it to every message. That's it.

Then adding a provider-agnostic implementation of cache control becomes trivial.
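A minimal sketch of the per-message params idea, using hypothetical `SketchChat` and `SketchMessage` classes rather than RubyLLM's real ones: chat-level defaults from `with_message_params` are merged under each message's own params, and rendering merges the result into the message hash a provider would send.

```ruby
# Hypothetical message type carrying provider-specific extras in params.
SketchMessage = Struct.new(:role, :content, :params, keyword_init: true)

# Hypothetical chat illustrating add_message(params:) and with_message_params.
class SketchChat
  def initialize
    @messages = []
    @default_params = {}
  end

  # Params applied to every message added afterwards.
  def with_message_params(**params)
    @default_params = params
    self
  end

  # Per-message params override the chat-level defaults.
  def add_message(role:, content:, params: {})
    @messages << SketchMessage.new(role: role, content: content, params: @default_params.merge(params))
    self
  end

  # What a provider might do: render each message, then merge its params on top.
  def render
    @messages.map { |m| { role: m.role.to_s, content: m.content }.merge(m.params) }
  end
end

chat = SketchChat.new
chat.add_message(role: :user, content: 'huge doc', params: { cache_control: { type: 'ephemeral' } })
chat.add_message(role: :user, content: 'question')
```

This renders `cache_control` at the message level; as the replies point out, Anthropic expects it on content blocks, so a real provider would need to place it there.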

tpaulshippy · 2025-08-30T15:46:58Z

That could work for system and user messages. How about tools?

I still think there will be some complexity in the anthropic provider that may be unavoidable.

tpaulshippy · 2025-08-30T16:40:23Z

One complication is that, for Anthropic, these params have to be nested under the content array.

So it would actually need to be something like this:

chat.add_message(
  role: :user,
  content: "huge doc",
  params: { content: [{ cache_control: { type: 'ephemeral' } }] }
)

Right?

tpaulshippy · 2025-08-30T16:47:45Z

I don't think adding the params to all messages with chat.with_message_params will generally work as you suggest.

A couple of reasons:

Anthropic limits you to 4 cache breakpoints per request.
You generally want your cache breakpoints on the last system or the last user message to get the maximum use of the cache.
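Because of that limit, a provider could validate the payload before sending it. A sketch, assuming an Anthropic-shaped payload hash in which breakpoints appear as `cache_control` keys on system blocks, tool definitions, and message content blocks; `count_breakpoints`, `validate_breakpoints!`, and `MAX_BREAKPOINTS` are hypothetical names, with the limit of 4 taken from the comment above.

```ruby
MAX_BREAKPOINTS = 4 # per-request limit stated above

# Counts cache_control markers across system blocks, tools, and message content.
# String content (no blocks) is skipped.
def count_breakpoints(payload)
  blocks = Array(payload[:system]) + Array(payload[:tools])
  Array(payload[:messages]).each do |message|
    content = message[:content]
    blocks.concat(content) if content.is_a?(Array)
  end
  blocks.count { |block| block.is_a?(Hash) && block.key?(:cache_control) }
end

def validate_breakpoints!(payload)
  count = count_breakpoints(payload)
  return payload if count <= MAX_BREAKPOINTS

  raise ArgumentError, "#{count} cache breakpoints; Anthropic allows at most #{MAX_BREAKPOINTS}"
end
```

Marking only the last system block and the last user message, as suggested, keeps a chat at two breakpoints no matter how long it grows, which is why applying params to every message runs into this check.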

tpaulshippy · 2025-08-30T16:51:40Z

I do like the suggestion of adding params to messages, because that could give more fine-grained control over exactly where you want the breakpoints. It would also enable passing optional keys to other providers, like this one from OpenAI:

EDIT: Not seeing this option on the OpenAI Responses API...

tpaulshippy

I'd like to explore making this a per-message option, and maybe adding something like a with_tools_params option, to make this more provider-agnostic.

Thanks for the feedback @crmne

tpaulshippy · 2025-08-30T17:07:10Z

docs/_core_features/chat.md

+chat = RubyLLM.chat(model: 'claude-3-5-haiku-20241022')
+
+# Enable caching for different types of content
+chat.cache_prompts(


The more I think about it, the odder it seems to have a provider-specific feature enabled like this. I wish Anthropic were more like the others in this area.

tpaulshippy · 2025-08-31T04:12:28Z

> One complication is that these params have to be nested under the content array for anthropic.
>
> So it would actually need to be something like this:
> chat.add_message(
>   role: :user,
>   content: "huge doc",
>   params: { content: [{ cache_control: { type: 'ephemeral' } }] }
> )

It looks like, to make this work, deep_merge will have to be enhanced to support arrays. Currently it replaces the whole array rather than merging the hashes within it. It will need something like this:

What do you think, @crmne?
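As an illustration only, and not the code from the PR, an array-aware deep merge could pair array elements by index and merge each pair recursively. The method name `deep_merge_with_arrays` is hypothetical, and index-based pairing is an assumption: it handles the single-block case above but needs placeholder entries to target a later block.

```ruby
# Hypothetical deep merge that also merges arrays element-wise.
# Hashes merge recursively; arrays pair elements by index and merge each
# pair recursively; for any other values, `override` wins.
def deep_merge_with_arrays(base, override)
  if base.is_a?(Hash) && override.is_a?(Hash)
    base.merge(override) { |_key, old_val, new_val| deep_merge_with_arrays(old_val, new_val) }
  elsif base.is_a?(Array) && override.is_a?(Array)
    Array.new([base.length, override.length].max) do |i|
      if i >= override.length
        base[i]
      elsif i >= base.length
        override[i]
      else
        deep_merge_with_arrays(base[i], override[i])
      end
    end
  else
    override
  end
end

rendered = { role: 'user', content: [{ type: 'text', text: 'huge doc' }] }
params = { content: [{ cache_control: { type: 'ephemeral' } }] }

shallow = rendered.merge(params) # content array replaced; the text block is lost
merged = deep_merge_with_arrays(rendered, params) # text block kept, cache_control added
```

To reach a later block, params would pass empty hashes as placeholders, e.g. `[{}, { cache_control: { type: 'ephemeral' } }]`; an empty hash leaves the block at that index unchanged.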

tpaulshippy added 7 commits June 8, 2025 22:57

13: Failing specs

2e84006

13: Get caching specs passing for Bedrock

be61e48

13: Remove comments in specs

edec138

13: Add unused param on other providers

971f176

13: Rubocop -A

557a5ee

13: Add cassettes for bedrock cache specs

9673b13

13: Resolve Rubocop aside from Metrics/ParameterLists

c47d270

tpaulshippy changed the title from "Prompt caching" to "Prompt caching for Claude" Jun 9, 2025

tpaulshippy added 4 commits June 9, 2025 12:08

13: Use large enough prompt to hit cache meaningfully

eaf0876

13: Ensure cache tokens are being used

160d9ab

13: Refactor completion parameters

d1698bf

16: Add guide for prompt caching

344729f

tpaulshippy marked this pull request as ready for review June 9, 2025 21:44

tpaulshippy commented Jun 11, 2025


tpaulshippy added 2 commits June 12, 2025 11:02

Add real anthropic cassettes ($0.03)

7b98277

Merge branch 'main' into prompt-caching

fd30f14

crmne requested changes Jul 16, 2025


crmne added the enhancement New feature or request label Jul 16, 2025

tpaulshippy added 3 commits July 18, 2025 21:28

Switch from large_prompt.txt to 10,000 of the letter a

a91d07e

Make that 2048 * 4 (2048 tokens for Haiku)

f40f37d

Rename properties on message class

109bb51

tpaulshippy added 8 commits July 28, 2025 22:45

Set cache control on last message only

24cdb63

Take advantage of https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#how-automatic-prefix-checking-works

Merge branch 'main' into prompt-caching

97bde47

Merge branch 'main' into prompt-caching

8aff99a

Fix some merge issues

7c5d792

Get openai prompt cache reporting to work

2d49d5f

Fix gemini prompt caching reporting

013b527

Add comment about why gemini is special

9dbdd12

Resolve rubocop offenses

5f6b9b3

tpaulshippy added 6 commits August 6, 2025 21:02

Merge branch 'main' into prompt-caching

f591ab1

Merge branch 'main' into prompt-caching

dd7abc9

Merge branch 'main' into prompt-caching

ace160c

Merge branch 'main' into prompt-caching

74846b2

Clean up the aaaaaaaaaaaa prompts in VCRs

91032de

Reduce line length

05cc1d9

tpaulshippy mentioned this pull request Aug 14, 2025

Prompt caching tpaulshippy/ruby_llm_community#1

Closed

tpaulshippy added 2 commits August 15, 2025 14:12

Support caching in rails model

f861b63

Merge branch 'main' into prompt-caching

f923385

Merge branch 'main' into prompt-caching

970deba

Merge branch 'main' into prompt-caching

010f889

crmne requested changes Aug 30, 2025


tpaulshippy commented Aug 30, 2025


Merge branch 'main' into prompt-caching

5c31698
