community: Add configurable text key for indexing and the retriever in Pinecone Hybrid Search (langchain-ai#29697)

**issue**

In LangChain, a document's original content is generally stored under the
`text` key. However, `PineconeHybridSearchRetriever` always reads that
content from the `context` field in the metadata, and this key cannot be
changed. To address this, I have modified the code so that the key can be
set to something other than `context`.

In my opinion, the `text` key would be more consistent with LangChain's
conventions than `context`. However, since I wasn't sure of the original
author's intent, I have left the default value as `context`.
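To make the round trip concrete, here is a minimal, self-contained sketch in plain Python, assuming lists of dicts stand in for Pinecone's upsert metadata and query matches. `build_metadata` and `read_matches` are hypothetical helpers, not LangChain functions; they only mirror the metadata comprehension in `create_index` and the `pop` in `_get_relevant_documents`, with the key passed in instead of hard-coded. Unlike the real retriever, the sketch copies each match's metadata rather than mutating it.

```python
from typing import List, Optional, Tuple


def build_metadata(
    contexts: List[str],
    metadatas: Optional[List[dict]] = None,
    text_key: str = "context",
) -> List[dict]:
    """Hypothetical helper: store each passage under text_key next to its metadata."""
    metadatas = metadatas or [{} for _ in contexts]
    return [
        {text_key: context, **metadata}
        for context, metadata in zip(contexts, metadatas)
    ]


def read_matches(
    matches: List[dict], text_key: str = "context"
) -> List[Tuple[str, dict]]:
    """Hypothetical helper: pop each passage back out of a match's metadata."""
    results = []
    for res in matches:
        metadata = dict(res["metadata"])  # copy so the caller's dict is untouched
        context = metadata.pop(text_key)  # KeyError if indexed under another key
        if "score" not in metadata and "score" in res:
            metadata["score"] = res["score"]
        results.append((context, metadata))
    return results


meta = build_metadata(["example passage"], [{"source": "docs"}], text_key="text")
matches = [{"metadata": m, "score": 0.9} for m in meta]
print(read_matches(matches, text_key="text"))
# [('example passage', {'source': 'docs', 'score': 0.9})]
```

Calling `read_matches(matches)` with the default `context` key on these records raises `KeyError`, which is how the hard-coded key failed for data stored under `text`.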
e7217 authored and bluearrow98 committed Feb 13, 2025
1 parent b6377a8 commit 0d6db85
Showing 1 changed file with 5 additions and 3 deletions.
@@ -31,6 +31,7 @@ def create_index(
     ids: Optional[List[str]] = None,
     metadatas: Optional[List[dict]] = None,
     namespace: Optional[str] = None,
+    text_key: str = "context",
 ) -> None:
     """Create an index from a list of contexts.
@@ -69,7 +70,7 @@ def create_index(
         )
         # add context passages as metadata
         meta = [
-            {"context": context, **metadata}
+            {text_key: context, **metadata}
             for context, metadata in zip(context_batch, metadata_batch)
         ]
 
@@ -114,7 +115,7 @@ class PineconeHybridSearchRetriever(BaseRetriever):
     """Alpha value for hybrid search."""
     namespace: Optional[str] = None
     """Namespace value for index partition."""
-
+    text_key: str = "context"
     model_config = ConfigDict(
         arbitrary_types_allowed=True,
         extra="forbid",
@@ -135,6 +136,7 @@ def add_texts(
             ids=ids,
             metadatas=metadatas,
             namespace=namespace,
+            text_key=self.text_key,
         )
 
     @pre_init
@@ -174,7 +176,7 @@ def _get_relevant_documents(
         )
         final_result = []
         for res in result["matches"]:
-            context = res["metadata"].pop("context")
+            context = res["metadata"].pop(self.text_key)
             metadata = res["metadata"]
             if "score" not in metadata and "score" in res:
                 metadata["score"] = res["score"]
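The `add_texts` hunk forwards `text_key=self.text_key` to `create_index`, so the retriever writes and reads passages with the same key. A sketch of that design under stated assumptions: `FakeIndex` and `KeyedRetriever` are hypothetical stand-ins (an in-memory dict instead of a Pinecone index, with no embeddings, sparse encoding, namespaces, or ranking), not LangChain or Pinecone classes.

```python
from typing import Dict, List, Optional


class FakeIndex:
    """Hypothetical in-memory stand-in for a Pinecone index (id -> metadata)."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def upsert(self, ids: List[str], metadatas: List[dict]) -> None:
        for record_id, metadata in zip(ids, metadatas):
            self.records[record_id] = dict(metadata)

    def query(self) -> List[dict]:
        # Returns every record as a match; a real index would rank by similarity.
        return [{"id": i, "metadata": dict(m)} for i, m in self.records.items()]


class KeyedRetriever:
    """Hypothetical retriever that owns a single text_key for writes and reads."""

    def __init__(self, index: FakeIndex, text_key: str = "context") -> None:
        self.index = index
        self.text_key = text_key

    def add_texts(
        self,
        texts: List[str],
        ids: List[str],
        metadatas: Optional[List[dict]] = None,
    ) -> None:
        metadatas = metadatas or [{} for _ in texts]
        # Write each passage under self.text_key ...
        meta = [{self.text_key: t, **m} for t, m in zip(texts, metadatas)]
        self.index.upsert(ids, meta)

    def get_texts(self) -> List[str]:
        # ... and read it back from the same key, so the two cannot drift apart.
        return [res["metadata"].pop(self.text_key) for res in self.index.query()]


index = FakeIndex()
retriever = KeyedRetriever(index, text_key="text")
retriever.add_texts(["hybrid search"], ids=["doc-1"], metadatas=[{"lang": "en"}])
print(index.records)          # {'doc-1': {'text': 'hybrid search', 'lang': 'en'}}
print(retriever.get_texts())  # ['hybrid search']
```

Keeping the key as a single field on the retriever, rather than as separate arguments to indexing and retrieval, means a caller sets it once and both paths follow.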
