HTTP Client configuration for models and vector stores #512

ThomasVitale · 2024-03-25T20:32:24Z

Enhancement Description

Each model integration is composed of two aspects: an *Api class calling the model provider over HTTP, and a *Client class encapsulating the LLM specific aspects.

Each *Client class is highly customizable through nice interfaces, making it possible to override many different options. It would be nice to provide similar flexibility for each *Api class as well. In particular, it would be useful to be able to configure options related to the HTTP client.

Examples of aspects that would need to be configured:

enable logging of requests/responses, very useful for general troubleshooting but also for refining prompts during development and testing;
define connection and read timeout settings;
configure an SslBundle to connect with on-prem model providers using custom CA certificates;
configure connections through a corporate proxy, very common in production deployments.
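As a rough illustration of the timeout, TLS, and proxy aspects listed above, here is a minimal sketch using the JDK's own `java.net.http.HttpClient` rather than the `RestClient`/`WebClient` that the *Api classes use. The class name `ModelHttpClients` and its methods are hypothetical and not part of Spring AI. Request/response logging is left out because the JDK client has no interceptor hook.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;
import javax.net.ssl.SSLContext;

// Hypothetical helper, for illustration only; not a Spring AI class.
final class ModelHttpClients {

    private ModelHttpClients() {}

    /** Builds a client with a connect timeout, an optional proxy, and an optional custom TLS context. */
    static HttpClient build(Duration connectTimeout, InetSocketAddress proxy, SSLContext sslContext) {
        HttpClient.Builder builder = HttpClient.newBuilder().connectTimeout(connectTimeout);
        if (proxy != null) {
            // Route every request through the given (corporate) proxy.
            builder.proxy(ProxySelector.of(proxy));
        }
        if (sslContext != null) {
            // An SSLContext built from a custom CA would be passed in here.
            builder.sslContext(sslContext);
        }
        return builder.build();
    }

    /** The JDK client has no client-wide read timeout; a per-request timeout is the closest equivalent. */
    static HttpRequest chatRequest(URI endpoint, String jsonBody, Duration responseTimeout) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(responseTimeout)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

A caller could, for example, pass `InetSocketAddress.createUnresolved("proxy.example.internal", 3128)` (a made-up host) as the proxy and `null` for the TLS context to keep the JDK defaults.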

Furthermore, there might be additional needs for configuring resilience patterns:

configure retry strategy in case of failures;
define a fallback logic in case of failures.
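To make the two resilience needs above concrete, here is a small sketch in plain Java of a retry loop with a fallback. It is independent of any HTTP client or resilience library, and the `Resilience` class and `callWithRetry` method are hypothetical names, not an existing Spring AI or Resilience4j API.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only.
final class Resilience {

    private Resilience() {}

    /**
     * Calls action up to maxAttempts times, sleeping for backoff between attempts,
     * and returns fallback.get() if every attempt throws a RuntimeException.
     */
    static <T> T callWithRetry(Supplier<T> action, int maxAttempts, Duration backoff, Supplier<T> fallback) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException ex) {
                if (attempt == maxAttempts) {
                    break; // attempts exhausted, fall through to the fallback
                }
                sleep(backoff);
            }
        }
        return fallback.get();
    }

    private static void sleep(Duration duration) {
        try {
            Thread.sleep(duration.toMillis());
        } catch (InterruptedException ie) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }
}
```

A real implementation would likely retry only on transient failures (timeouts, HTTP 429/5xx) and use exponential backoff; this sketch retries on any `RuntimeException` with a fixed delay.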

More settings that are currently part of the model connection configuration (and that still relate to the HTTP interaction) would also need to be customisable for enterprise use cases in production (e.g. multi-user or even multi-tenant applications). For example, when using OpenAI, the following might need to change per request/session:

API Key
Organization
User
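One way to picture per-request values is a small holder that is resolved for each call (for instance per tenant) and turned into HTTP headers. The sketch below assumes the OpenAI convention of sending the key as a bearer token and the organization in the `OpenAI-Organization` header; the user identifier travels in the JSON request body as `user`, so it is carried here but not emitted as a header. `RequestCredentials` is a hypothetical type, not part of Spring AI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-request credentials holder, for illustration only.
record RequestCredentials(String apiKey, String organization, String user) {

    RequestCredentials {
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalArgumentException("apiKey is required");
        }
    }

    /** HTTP headers for one request. The organization header is only added when a value is set. */
    Map<String, String> headers() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", "Bearer " + apiKey);
        if (organization != null && !organization.isBlank()) {
            headers.put("OpenAI-Organization", organization);
        }
        return headers;
    }
}
```

For example, `new RequestCredentials("sk-test", "org-123", "user-42").headers()` yields both headers, while passing `null` for the organization yields only `Authorization`.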

All the above is focused on the HTTP interactions with model providers, but the same would be useful for vector stores.

Possible Solutions

Drawing from the nice abstractions designed to customize the model integrations and ultimately implementing the ModelOptions interface, it could be an idea to define a dedicated abstraction to pass HTTP client customizations to an *Api class (something like HttpClientConfig), which might also be exposed via configuration properties (under spring.ai.<model>.client.*).
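To give the `HttpClientConfig` idea a concrete shape, here is a minimal sketch of what such a holder could look like, together with a binding from flat properties under a prefix such as `spring.ai.openai.client.`. Everything in it is an assumption made for illustration: the field set, the defaults, and the property names (`connect-timeout`, `read-timeout`, `log-requests`, `ssl-bundle`) do not exist in Spring AI.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical settings holder named after the HttpClientConfig idea; not an existing Spring AI type.
record HttpClientConfig(Duration connectTimeout, Duration readTimeout, boolean logRequests, String sslBundle) {

    // Assumed defaults, chosen only for the example.
    static final HttpClientConfig DEFAULTS =
            new HttpClientConfig(Duration.ofSeconds(10), Duration.ofSeconds(60), false, null);

    /** Reads keys under the given prefix (timeouts in whole seconds), falling back to DEFAULTS. */
    static HttpClientConfig fromProperties(Map<String, String> props, String prefix) {
        return new HttpClientConfig(
                seconds(props.get(prefix + "connect-timeout"), DEFAULTS.connectTimeout()),
                seconds(props.get(prefix + "read-timeout"), DEFAULTS.readTimeout()),
                Boolean.parseBoolean(props.getOrDefault(prefix + "log-requests", "false")),
                props.get(prefix + "ssl-bundle"));
    }

    private static Duration seconds(String value, Duration fallback) {
        return value == null ? fallback : Duration.ofSeconds(Long.parseLong(value.trim()));
    }
}
```

Because the holder is independent of `RestClient` or `WebClient`, the same values could in principle be applied to both the blocking and the streaming client, which is the duplication concern raised for the builder-based alternative.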

For the more specific resilience configurations (like retries and fallbacks), an annotation-driven approach might be more suitable. Resilience4j might provide a way to achieve this, since I don't think Spring supports the MicroProfile Fault Tolerance spec.

A partial alternative solution would be for developers to define a custom RestClient.Builder or WebClient.Builder and pass that to each *Api class, but it would result in a lot of extra configuration and reduce the convenience of the autoconfiguration. Also, it would tie a generic configuration like "enable logs" or "use a custom CA" to the specific client used, resulting in duplication when both blocking and streaming interactions are used in the same application.

I'm available to contribute and help solve this issue.

Related Issues


thingersoft · 2024-03-29T20:19:42Z

Hello,
that's more or less the same strategy I thought of using for a generic approach to the timeout problem.
I think it's a crucial aspect to take care of while moving towards a 1.0 release, since we are talking about common requirements for non-streaming consumers.
Also, when a read timeout occurs you lose the response forever, and for larger commercial models that means money.

I'm available to contribute too, but so far I've had little luck getting feedback from the project owners.

markpollack · 2024-06-20T15:14:19Z

There is a lot to unpack here, so let's start small and work our way to more features.

At the lowest level, we are using our own hand-written client to talk with a model; OpenAiApi is a perfect example. If a user is operating at this level, there are a few things that can be done.

We can add some trace or debug level logging that can be enabled in the typical spring boot manner.
A user can also create a RestClientCustomizer as shown in this example.

For other models, for example Azure OpenAI or Google Vertex AI, we are using client libraries provided by Microsoft and Google, and we can't use the approach above.

We can, however, do something at a higher level, the ChatClient level. I first thought we could introduce a logging advisor to the code base, but the advisor doesn't yet have access to the final prompt, only the parts that go into making it. So instead we should update ChatModel implementations to do the logging at the appropriate places in that class. This issue discusses that.

Potentially we can still have a logging advisor, but it would serve a different purpose, and is likely still a useful addition.

On the separate topic of retry: this could potentially move out of the *Api classes and into an advisor. The issue there is that retry would only kick in when using ChatClient and not the *Api classes. I suspect that the right strategy is to put retry in at the lowest level when we can, and also provide a retry advisor to be used when we don't control the underlying library that communicates with the AI model.

Thoughts?

piotrooo · 2024-06-21T05:51:11Z

I like the idea of creating advisors for logging purposes 👍

However, when thinking about retry logic...

Currently, we handle two ways of calling models:

Using an HTTP client *Api - RestClient or WebClient
Using an SDK - such as Azure OpenAIClient or Google GenerativeModel

I imagine the retry logic should be the same across all models. Tying it to the *Api classes doesn't allow us to reuse it in the SDK scenarios. Additionally, we should consider models that don't use a ChatClient, such as transcription or speech models.

Therefore, I suggest introducing a new retry layer, or even more broadly, a resilience layer (starting with retry support but with the potential to add new features in the future).

There could also be several other layers for customizing the HTTP client and so on, as @ThomasVitale mentioned.

ThomasVitale · 2024-06-23T16:05:52Z

@markpollack @piotrooo thank you both for sharing your thoughts!

I see two types of logs that can be useful in an application using Spring AI. My original intent with this issue was to cover the first type.

HTTP Requests/Responses. Logging of headers and/or body of the HTTP interactions with an LLM provider. For example, this is useful for checking the underlying format of a request/response and for spotting JSON conversion errors or incompatibilities with updated APIs from the provider.

For all the *Api classes provided by Spring AI, I think there should be a way to customise the underlying RestClient or WebClient with a logging interceptor (and similarly also timeouts and SslBundles). The workaround shown here is good enough for experiments, but it cannot really be used in a real-world application because the RestClientCustomizer/WebClientCustomizer would be shared across the application.
For all the integrations where third-party libraries are used (such as Vertex AI or Azure OpenAI), I expect those libraries to provide options for logging requests/responses (as well as timeouts and TLS). That's not something Spring AI can solve (unless perhaps surfacing some auto configuration properties, should that capability exist in those libraries).

Prompt/Completion. Logging of the content of a prompt or a completion. For example, this is very important when it comes to prompt design/evaluation or observability. I would not recommend implementing such functionality via explicit log messages in the ChatClient API (or the underlying ChatModel). Instead, I would recommend framing this feature in the broader context of introducing observability for Spring AI. Using the Micrometer Observation API, it's possible to instrument the ChatModel classes once and configure logs, metrics, and traces through the Micrometer machinery. It's critical to include prompt/completion content in the observability solution because it's necessary for any evaluation/prompt design integration. I have a draft solution that I'll share soon; I just need to polish a few things. I wouldn't introduce a LoggingAdvisor at the moment. I think we need first the observability foundation at the ChatModel level before addressing further observability needs at the ChatClient level (using Advisors to offer observability for these higher-level workflows/chains, which typically consist of multiple LLM requests and function calls).

What do you think?

piotrooo · 2024-06-24T17:26:12Z

> That's not something Spring AI can solve (unless perhaps surfacing some auto configuration properties, should that capability exist in those libraries).

I thought about some customizers for SDK clients, but I'm not really convinced by this approach. However, I think this is probably how I want to customize e.g., Azure OpenAIClient (and others).

> I think we need first the observability foundation at the ChatModel level before addressing further observability needs at the ChatClient level (using Advisors to offer observability for these higher-level workflows/chains, which typically consist of multiple LLM requests and function calls).

Right now, ChatClient is going to be a Swiss army knife with observability, retries, ahh, and of course, sending requests to the model 😬.
But for now, I don't have a better idea.

fmunch mentioned this issue May 20, 2024

Add support for custom WebClientBuilder #739

Closed

ThomasVitale mentioned this issue Jun 18, 2024

Easy way to log requests and responses to LLM's #883

Closed

This was referenced Jun 23, 2024

Create an AbstractChatModel class that adds logging of request and response. #909

Closed

Add request/response logging in OpenAiApi #908

Closed

ThomasVitale mentioned this issue Jun 25, 2024

[Observability] Initial Observability for Models #953

Closed

ThomasVitale mentioned this issue Jul 8, 2024

OpenAiImageModel fails when Spring Boot uses the OkHttpClient implementation of RestClient #1016

Closed

markpollack mentioned this issue Jul 22, 2024

Provide the ability to configure client timeouts #354

Closed

markpollack added model client to discuss design labels Jul 22, 2024

markpollack added this to the 1.0.0-RC1 milestone Jul 22, 2024

markpollack mentioned this issue Jul 22, 2024

'JsonEOFException: Unexpected end-of-input: expected close marker for Object' when making synchronous openAiChatClient.call #372

Open

ThomasVitale mentioned this issue Aug 20, 2024

Caused by: java.net.SocketTimeoutException: Read timed out #1250

Closed

andreas-eberle mentioned this issue Aug 28, 2024

Add support to configure custom headers for Azure OpenAI like it is supported for OpenAI already #1284

Closed

muhrifqii mentioned this issue Sep 4, 2024

Chat Memory and MessageStore Usecase muhrifqii/LLM-Ollama-Java-Spring-AI#2

Merged

ThomasVitale mentioned this issue Oct 30, 2024

asking for longer prompts triggers - Request processing failed: org.springframework.web.client.ResourceAccessException: I/O error on POST request for "http://localhost:11434/api/chat": timeout #1634

Closed

asaikali added the configuration label Nov 10, 2024

asaikali mentioned this issue Nov 11, 2024

Explore WebClient configuration based on RestClient configuration #1714

Open

markpollack modified the milestones: 1.0.0-RC1-triage, 1.0.0-RC1 Apr 21, 2025

markpollack modified the milestones: 1.0.0-RC1, 1.0.x May 9, 2025
