Bytes Vectors from r.hget vs Bytes string returned from r.ft().search(query="*") #2772

Closed
ghost opened this issue May 22, 2023 · 3 comments · Fixed by #3309

Comments


ghost commented May 22, 2023

Redis Python Lib Version: version 4.5.5

Redis Stack Version: version 7.0.0

Platform: Python 3.10.6 and Ubuntu 22.04

Description:

After storing a number of NumPy vectors as bytes in Redis hashes (HSET) and creating a search index (FT.CREATE), I am trying to retrieve all of the embeddings using FT.SEARCH with a "*" query. However, the vector comes back as a string that differs from the bytes I get when using HGET. Here are a few lines of code as an example:

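import json
import os

import numpy as np
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

# os.getenv returns a string; this assumes the variable holds a JSON object of connection kwargs
_redis_match_config = json.loads(os.getenv("NQAI_REDIS_MATCH_CONFIG"))
fake_vec = np.array([0.1, 0.2, 0.3, 0.4])
r = redis.Redis(**_redis_match_config)
# Store the vector as raw float32 bytes (4 values x 4 bytes = 16 bytes)
expert_hash = {"person_id": 1, "vector_emb": fake_vec.astype(np.float32).tobytes()}
r.hset("person:1", mapping=expert_hash)
index_name = "person"
person_prefix = f"{index_name}:"
vector_search_attributes = {"TYPE": "FLOAT32", "DIM": 4, "DISTANCE_METRIC": "COSINE"}
# The indexed field name must match the hash field written above ("vector_emb")
schema = (
    TagField("person_id"),
    VectorField("vector_emb", algorithm="HNSW", attributes=vector_search_attributes),
)

r.ft(index_name).create_index(fields=schema, definition=IndexDefinition(prefix=[person_prefix], index_type=IndexType.HASH))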

bytes_person_1 = r.hget("person:1", "vector_emb")
print(bytes_person_1)
print(np.frombuffer(bytes_person_1, dtype=np.float32))
> output : b"\xcd\xcc\xcc=\xcd\xccL>\x9a\x99\x99>\xcd\xcc\xcc>"
> output : array([0.1, 0.2, 0.3, 0.4], dtype=float32)
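
For reference, the 16 bytes printed above are just four IEEE-754 float32 values laid end to end. A minimal standard-library sketch, assuming a little-endian host (which is what `tobytes()` would produce on x86 or most ARM machines), reproduces them without NumPy or Redis:

```python
import struct

values = [0.1, 0.2, 0.3, 0.4]

# Pack four float32 values in little-endian order: 4 values x 4 bytes = 16 bytes.
raw = struct.pack("<4f", *values)
print(raw)

# Unpack the same bytes; the results are the float32 roundings of the inputs.
restored = struct.unpack("<4f", raw)
print(restored)
```

`np.frombuffer(raw, dtype=np.float32)` reads the same layout, so any code path that hands back these 16 bytes unchanged lets the vector be recovered.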

However, when I do:

query = (
    Query("*")
    .return_fields("id", "vector_emb")
)
all_of = r.ft(index_name).search(query=query, query_params={}).docs
print(all_of[0]["vector_emb"])
print(all_of[0]["vector_emb"].encode("utf-32"))
print(np.frombuffer(bytes(all_of[0]["vector_emb"].encode("utf-32")), dtype=np.float32))
> output : "=L>>>"
> output: b'\xff\xfe\x00\x00=\x00\x00\x00L\x00\x00\x00>\x00\x00\x00>\x00\x00\x00>\x00\x00\x00'
> output : array([9.1475e-41 8.5479e-44 1.0650e-43 8.6881e-44 8.6881e-44 8.6881e-44], dtype=float32)

I have tried different combinations of .encode("utf-xx") and dtype=np.floatxx to no avail! Please help. Thanks.
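
The "=L>>>" output is consistent with the raw bytes having been decoded as UTF-8 with invalid sequences silently dropped. That is an assumption inferred from the output, not something stated in this thread, but the sketch below shows that such a decode yields exactly that string from the same 16 bytes, and why no later `.encode(...)` can bring the vector back:

```python
import struct

# The same 16 bytes that HGET returns for the example vector.
raw = struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)

# Assumption: the search result was decoded roughly like this on the way back.
text = raw.decode("utf-8", errors="ignore")
print(repr(text), len(raw), len(text))

# Re-encoding cannot restore the dropped bytes, whatever the codec.
as_utf8 = text.encode("utf-8")
as_utf32 = text.encode("utf-32")
print(len(as_utf8), len(as_utf32))
```

Only 5 of the 16 bytes survive the decode. Re-encoding as UTF-32 produces a 4-byte BOM plus five 4-byte code units (24 bytes), which is why `np.frombuffer` printed six meaningless float32 values instead of four.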

@trish11953

Yes, I hit the same bug when retrieving a vector of floats from
results = self.client.ft(self.index).search(query_expression, query_params).docs
It came back in a strange encoding that could not be decoded back into the original vector.

@AdamAdLightning

I'm having a similar issue. I need to read these vectors back and do some processing on them, but I'm unable to decode them when I read them from a hash using hget.
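
One possible workaround, sketched here under assumptions, is to use the search only to find keys and then read each vector straight from its hash with HGET. `fetch_raw_vectors` is a hypothetical helper, not part of redis-py, and `InMemoryClient` is only a stand-in so the sketch runs without a server. With a real `redis.Redis` client, this assumes the client was created without `decode_responses=True`, since otherwise HGET would also try to decode the value as text:

```python
import struct


def fetch_raw_vectors(client, keys, field="vector_emb"):
    """Hypothetical helper: read float32 vectors from hashes as raw bytes."""
    vectors = {}
    for key in keys:
        raw = client.hget(key, field)  # expected to be bytes, not str
        if raw is None:
            continue  # key or field does not exist
        if len(raw) % 4 != 0:
            raise ValueError(f"{key}: {len(raw)} bytes is not a whole number of float32 values")
        count = len(raw) // 4
        # Assumes little-endian float32, as written by tobytes() on common hosts.
        vectors[key] = struct.unpack(f"<{count}f", raw)
    return vectors


class InMemoryClient:
    """Stand-in for redis.Redis that supports only hset/hget."""

    def __init__(self):
        self._hashes = {}

    def hset(self, key, mapping):
        self._hashes.setdefault(key, {}).update(mapping)

    def hget(self, key, field):
        return self._hashes.get(key, {}).get(field)


client = InMemoryClient()
client.hset("person:1", mapping={"vector_emb": struct.pack("<4f", 0.1, 0.2, 0.3, 0.4)})

vectors = fetch_raw_vectors(client, ["person:1", "person:2"])
print(vectors)
```

With a real client, the keys could come from the `id` attribute of each document in the search result, at the cost of one extra round trip per document (or one pipeline).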

uglide added a commit to uglide/redis-py that referenced this issue Jul 9, 2024
@gerzse gerzse closed this as completed in 1bb8eab Jul 10, 2024
gerzse pushed a commit to gerzse/redis-py that referenced this issue Jul 11, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: redis#2772, redis#2275
gerzse pushed a commit that referenced this issue Jul 11, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: #2772, #2275
agnesnatasya pushed a commit to agnesnatasya/redis-py that referenced this issue Jul 20, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: redis#2772, redis#2275
@marisancans

Is this fixed? I'm having the same problem. I just want my vector back :/

vladvildanov pushed a commit that referenced this issue Sep 27, 2024
Make it possible to configure at field level how search
results are decoded.

Fixes: #2772, #2275