Commit 1e1ad41

ThomasVitale authored and markpollack committed
Introduce DocumentPostProcessor API for Modular RAG

The DocumentPostProcessor API has been introduced to implement post-retrieval components in a Modular RAG architecture, superseding the DocumentCompressor, DocumentRanker, DocumentSelector APIs that are now deprecated.

Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>

1 parent 5397108 commit 1e1ad41

7 files changed: +88 -23 lines changed

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/retrieval-augmented-generation.adoc

+3 -23
@@ -361,31 +361,11 @@ List<Document> documents = documentJoiner.join(documentsForQuery);

 Post-Retrieval modules are responsible for processing the retrieved documents to achieve the best possible generation results.

-==== Document Ranking
+==== Document Post-Processing

-A component for ordering and ranking documents based on their relevance to a query to bring the most relevant documents
-to the top of the list, addressing challenges such as _lost-in-the-middle_.
+A component for post-processing retrieved documents based on a query, addressing challenges such as _lost-in-the-middle_, context length restrictions from the model, and the need to reduce noise and redundancy in the retrieved information.

-Unlike `DocumentSelector`, this component does not remove entire documents from the list, but rather changes
-the order/score of the documents in the list. Unlike `DocumentCompressor`, this component does not alter the content
-of the documents.
-
-==== Document Selection
-
-A component for removing irrelevant or redundant documents from a list of retrieved documents, addressing challenges
-such as _lost-in-the-middle_ and context length restrictions from the model.
-
-Unlike `DocumentRanker`, this component does not change the order/score of the documents in the list, but rather
-removes irrelevant or redundant documents. Unlike `DocumentCompressor`, this component does not alter the content
-of the documents, but rather removes entire documents.
-
-==== Document Compression
-
-A component for compressing the content of each document to reduce noise and redundancy in the retrieved information,
-addressing challenges such as _lost-in-the-middle_ and context length restrictions from the model.
-
-Unlike `DocumentSelector`, this component does not remove entire documents from the list, but rather alters the content
-of the documents. Unlike `DocumentRanker`, this component does not change the order/score of the documents in the list.
+For example, it could rank documents based on their relevance to the query, remove irrelevant or redundant documents, or compress the content of each document to reduce noise and redundancy.

 === Generation

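
The rewritten documentation section above names three things a post-processor could do: rank, remove, or compress. The following is a minimal sketch of one processor of each kind, not code from this commit. `Query` and `Document` here are hypothetical stand-in records (the real Spring AI types have different shapes), the interface is a local copy of the contract added in this commit, and `PostProcessors` with its factory methods is invented for illustration.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical stand-ins for the Spring AI types, so the sketch compiles on its own.
record Query(String text) {}
record Document(String text, double score) {}

// Local copy of the contract introduced by this commit.
interface DocumentPostProcessor extends BiFunction<Query, List<Document>, List<Document>> {
    List<Document> process(Query query, List<Document> documents);

    default List<Document> apply(Query query, List<Document> documents) {
        return process(query, documents);
    }
}

// Hypothetical factory class: one example per post-processing style.
final class PostProcessors {

    // Ranking: reorder so that the highest-scored document comes first.
    static DocumentPostProcessor byScoreDescending() {
        return (query, docs) -> docs.stream()
            .sorted(Comparator.comparingDouble(Document::score).reversed())
            .toList();
    }

    // Selection: drop documents whose text was already seen, keeping the first occurrence.
    static DocumentPostProcessor dropDuplicateText() {
        return (query, docs) -> {
            Set<String> seen = new HashSet<>();
            return docs.stream().filter(d -> seen.add(d.text())).toList();
        };
    }

    // Compression: truncate each document's text to at most maxChars characters.
    static DocumentPostProcessor truncateTo(int maxChars) {
        return (query, docs) -> docs.stream()
            .map(d -> d.text().length() <= maxChars
                ? d
                : new Document(d.text().substring(0, maxChars), d.score()))
            .toList();
    }
}
```

Because the interface has a single abstract method, each processor can be written as a lambda. None of these examples uses the query; a relevance-based ranker would.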

spring-ai-docs/src/main/antora/modules/ROOT/pages/upgrade-notes.adoc

+6
@@ -137,6 +137,7 @@ This approach offers more control when you need to conditionally change parts of

 * The `PromptTemplate` API has been redesigned to support a more flexible and extensible way of templating prompts, relying on a new `TemplateRenderer` API. As part of this change, the `getInputVariables()` and `validate()` methods have been deprecated and will throw an `UnsupportedOperationException` if called. Any logic specific to a template engine should be available through the `TemplateRenderer` API.

+
 === Observability

 * Changes to the `spring.ai.client` observation:
@@ -146,6 +147,11 @@ This approach offers more control when you need to conditionally change parts of
 * Changes to the `spring.ai.advisor` observation:
 ** The `spring.ai.advisor.type` attribute has been deprecated. In previous releases, the Advisor API was categorized based on the type of advisor (`before`, `after`, `around`). That distinction doesn't apply anymore meaning that all Advisors are now of the same type (`around`).

+=== Retrieval Augmented Generation
+
+* The `DocumentPostProcessor` API has been introduced to implement post-retrieval components in a Modular RAG architecture, superseding the `DocumentCompressor`, `DocumentRanker`, `DocumentSelector` APIs that are now deprecated.
+
+
 [[upgrading-to-1-0-0-m7]]
 == Upgrading to 1.0.0-M7

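
The upgrade note above does not spell out a migration path. One possible approach, sketched here under assumptions, relies on the fact visible in this diff that the three deprecated interfaces and `DocumentPostProcessor` all extend the same `BiFunction<Query, List<Document>, List<Document>>`. `LegacyAdapters.adapt` is a hypothetical helper, and `Query` and `Document` are stand-in records rather than the real Spring AI types.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-ins for the Spring AI types, so the sketch compiles on its own.
record Query(String text) {}
record Document(String text) {}

// Local copy of the contract introduced by this commit.
interface DocumentPostProcessor extends BiFunction<Query, List<Document>, List<Document>> {
    List<Document> process(Query query, List<Document> documents);

    default List<Document> apply(Query query, List<Document> documents) {
        return process(query, documents);
    }
}

// Hypothetical helper, not part of the commit.
final class LegacyAdapters {

    // Wraps any component with the shared BiFunction shape (for example an existing
    // ranker, selector, or compressor) so it can be used where a DocumentPostProcessor
    // is expected. The wrapper only forwards the call to the legacy apply method.
    static DocumentPostProcessor adapt(BiFunction<Query, List<Document>, List<Document>> legacy) {
        return legacy::apply;
    }
}
```

An adapter like this would let existing implementations keep working while call sites move to the new type; rewriting each implementation against `DocumentPostProcessor` directly is the alternative.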

spring-ai-rag/src/main/java/org/springframework/ai/rag/postretrieval/compression/DocumentCompressor.java

+4
@@ -21,6 +21,7 @@

 import org.springframework.ai.document.Document;
 import org.springframework.ai.rag.Query;
+import org.springframework.ai.rag.postretrieval.document.DocumentPostProcessor;
 import org.springframework.ai.rag.postretrieval.ranking.DocumentRanker;
 import org.springframework.ai.rag.postretrieval.selection.DocumentSelector;

@@ -33,7 +34,10 @@
  * the list, but rather alters the content of the documents. Unlike
  * {@link DocumentRanker}, this component does not change the order/score of the documents
  * in the list.
+ *
+ * @deprecated in favour of {@link DocumentPostProcessor}.
  */
+@Deprecated
 public interface DocumentCompressor extends BiFunction<Query, List<Document>, List<Document>> {

 	/**
@@ -0,0 +1,45 @@
+/*
+ * Copyright 2023-2025 the original author or authors.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.springframework.ai.rag.postretrieval.document;
+
+import org.springframework.ai.document.Document;
+import org.springframework.ai.rag.Query;
+
+import java.util.List;
+import java.util.function.BiFunction;
+
+/**
+ * A component for post-processing retrieved documents based on a query, addressing
+ * challenges such as "lost-in-the-middle", context length restrictions from the model,
+ * and the need to reduce noise and redundancy in the retrieved information.
+ * <p>
+ * For example, it could rank documents based on their relevance to the query, remove
+ * irrelevant or redundant documents, or compress the content of each document to reduce
+ * noise and redundancy.
+ *
+ * @author Thomas Vitale
+ * @since 1.0.0
+ */
+public interface DocumentPostProcessor extends BiFunction<Query, List<Document>, List<Document>> {
+
+	List<Document> process(Query query, List<Document> documents);
+
+	default List<Document> apply(Query query, List<Document> documents) {
+		return process(query, documents);
+	}
+
+}
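
The interface above inherits `andThen` from `BiFunction`, but that method only accepts a `Function` over the result, so a chained step would not receive the query. A small composite is one way to run several post-processors in sequence while passing the same query to each. The sketch below is illustrative only: `PostProcessorChain` is a hypothetical class, and `Query` and `Document` are stand-in records rather than the real Spring AI types.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-ins for the Spring AI types, so the sketch compiles on its own.
record Query(String text) {}
record Document(String text) {}

// Local copy of the contract shown in the diff above.
interface DocumentPostProcessor extends BiFunction<Query, List<Document>, List<Document>> {
    List<Document> process(Query query, List<Document> documents);

    default List<Document> apply(Query query, List<Document> documents) {
        return process(query, documents);
    }
}

// Hypothetical composite: feeds the output of each step into the next one.
final class PostProcessorChain implements DocumentPostProcessor {

    private final List<DocumentPostProcessor> steps;

    PostProcessorChain(List<DocumentPostProcessor> steps) {
        this.steps = List.copyOf(steps); // defensive, immutable copy
    }

    @Override
    public List<Document> process(Query query, List<Document> documents) {
        List<Document> current = documents;
        for (DocumentPostProcessor step : steps) {
            current = step.process(query, current);
        }
        return current;
    }
}
```

With an empty list of steps the chain returns its input unchanged, which makes it a safe default.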
@@ -0,0 +1,22 @@
+/*
+ * Copyright 2023-2025 the original author or authors.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *      https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+@NonNullApi
+@NonNullFields
+package org.springframework.ai.rag.postretrieval.document;
+
+import org.springframework.lang.NonNullApi;
+import org.springframework.lang.NonNullFields;

spring-ai-rag/src/main/java/org/springframework/ai/rag/postretrieval/ranking/DocumentRanker.java

+4
@@ -22,6 +22,7 @@
 import org.springframework.ai.document.Document;
 import org.springframework.ai.rag.Query;
 import org.springframework.ai.rag.postretrieval.compression.DocumentCompressor;
+import org.springframework.ai.rag.postretrieval.document.DocumentPostProcessor;
 import org.springframework.ai.rag.postretrieval.selection.DocumentSelector;

 /**
@@ -32,7 +33,10 @@
  * Unlike {@link DocumentSelector}, this component does not remove entire documents from
  * the list, but rather changes the order/score of the documents in the list. Unlike
  * {@link DocumentCompressor}, this component does not alter the content of the documents.
+ *
+ * @deprecated in favour of {@link DocumentPostProcessor}.
  */
+@Deprecated
 public interface DocumentRanker extends BiFunction<Query, List<Document>, List<Document>> {

 	/**

spring-ai-rag/src/main/java/org/springframework/ai/rag/postretrieval/selection/DocumentSelector.java

+4
@@ -22,6 +22,7 @@
 import org.springframework.ai.document.Document;
 import org.springframework.ai.rag.Query;
 import org.springframework.ai.rag.postretrieval.compression.DocumentCompressor;
+import org.springframework.ai.rag.postretrieval.document.DocumentPostProcessor;
 import org.springframework.ai.rag.postretrieval.ranking.DocumentRanker;

 /**
@@ -33,7 +34,10 @@
  * documents in the list, but rather removes irrelevant or redundant documents. Unlike
  * {@link DocumentCompressor}, this component does not alter the content of the documents,
  * but rather removes entire documents.
+ *
+ * @deprecated in favour of {@link DocumentPostProcessor}.
  */
+@Deprecated
 public interface DocumentSelector extends BiFunction<Query, List<Document>, List<Document>> {

 	/**
