VectorQuery + highlight() fails #266

Open
cancerberoSgx opened this issue Jan 28, 2025 · 6 comments

@cancerberoSgx

I need to run a VectorQuery with a text filter plus highlight(), but it fails with: redisvl.exceptions.RedisSearchError: Error while searching: Property vector_distance is not in schema

The same query works fine with FilterQuery.

Following is a minimal example that reproduces the error:

import numpy as np
from redis import Redis
from redisvl.index import SearchIndex
from redisvl.query.filter import Text
from redisvl.query import VectorQuery, FilterQuery

def vector_highlight_issue():
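    # Connect to a local Redis Stack instance (adjust the URL for your setup).
    r = Redis.from_url("redis://localhost:6379")
    r.flushall()  # WARNING: deletes every key in the database; use a scratch instance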
    schema = {
        "index": {
            "name": "user_simple",
            "prefix": "user_simple_docs",
        },
        "fields": [
            {"name": "user", "type": "text"},
            {"name": "credit_score", "type": "tag"},
            {"name": "job", "type": "text"},
            {"name": "age", "type": "numeric"},
            # {"name": "vector_distance", "type": "numeric"},
            {
                "name": "user_embedding",
                "type": "vector",
                "attrs": {
                    "dims": 3,
                    "distance_metric": "cosine",
                    "algorithm": "flat",
                    "datatype": "float32"
                }
            },
        ]
    }
    data = [    
        {
            'user': 'Sebastian Gurin',
            'age': 1,
            'job': 'engineer',
            'credit_score': 'high',
            'user_embedding': np.array([0.4, 0.3, 0.5], dtype=np.float32).tobytes()
        },
        {
            'user': 'Sebastian Martinez',
            'age': 2,
            'job': 'doctor',
            'credit_score': 'low',
            'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes()
        },
        {
            'user': 'Maria Cristina Miños',
            'age': 3,
            'job': 'dentist',
            'credit_score': 'medium',
            'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes()
        }
    ]
    
    index = SearchIndex.from_dict(schema)
    index.set_client(r)
    index.create(overwrite=True)
    
    keys = index.load(data)
    
    filter_expression = Text("user") % "Sebas*"
    return_fields = ["user", "age", "job", "credit_score"]
    
    query = VectorQuery(
        vector=[0.1, 0.1, 0.5],
        vector_field_name="user_embedding",
        return_fields=return_fields,
        num_results=3,
        filter_expression=filter_expression,
    )
    query.highlight(fields=['user'])
    
    # FilterQuery + highlight works fine: 
    # query = FilterQuery(
    #     return_fields=return_fields,
    #     num_results=3
    # )
    # query.set_filter(filter_expression)
    # query.highlight(fields=['user'])
    
    results = index.query(query)
    
    print('LEN', len(results))
    for doc in results:
        print(doc)
        
    
if __name__ == "__main__":
    vector_highlight_issue()
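For context when reading the raw commands later in this thread: the `vector` parameter they pass (`\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?`) is simply the query vector `[0.1, 0.1, 0.5]` packed as float32, matching the index's `"datatype": "float32"`. A minimal stdlib-only sketch, assuming a little-endian layout (what `np.array(..., dtype=np.float32).tobytes()` produces on x86 and most ARM machines); `to_float32_bytes` is a hypothetical helper, not part of redisvl:

```python
import struct

def to_float32_bytes(vector):
    """Hypothetical helper: pack floats as little-endian float32,
    the same byte layout numpy's float32 .tobytes() gives on little-endian hosts."""
    return struct.pack(f"<{len(vector)}f", *vector)

query_blob = to_float32_bytes([0.1, 0.1, 0.5])
print(query_blob)  # b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'
```

Each float32 takes 4 bytes, so a 3-dimensional vector becomes a 12-byte blob; a blob of the wrong length for `"dims": 3` would be rejected by the server.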
@tylerhutcherson
Collaborator

Thanks for reporting. Will look into this today. Which version of redisvl and redis-py are you using?

@cancerberoSgx
Author

redis                    5.0.0
redisvl                  0.3.9

@abrookins abrookins added the bug Something isn't working label Jan 29, 2025
@abrookins
Collaborator

abrookins commented Jan 29, 2025

This looks like a bug! There appears to be a problem with the underlying FT.SEARCH query: HIGHLIGHT doesn't work alongside a KNN query. I'll talk to our search engineers and update this issue as we learn more.

This query works (full-text search with HIGHLIGHT):

127.0.0.1:6379> "FT.SEARCH" "user_simple" "@user:(Sebas*)" "RETURN" "4" "user" "age" "job" "credit_score"  "DIALECT" "2" "LIMIT" "0" "3" HIGHLIGHT FIELDS 1 user
1) (integer) 2
2) "user_simple_docs:7a65714003254c1a988325d7192c90da"
3) 1) "user"
   2) "<b>Sebastian</b> Gurin"
   3) "age"
   4) "1"
   5) "job"
   6) "engineer"
   7) "credit_score"
   8) "high"
4) "user_simple_docs:0aa7f56ab0604a388ebd73c69ce77e64"
5) 1) "user"
   2) "<b>Sebastian</b> Martinez"
   3) "age"
   4) "2"
   5) "job"
   6) "doctor"
   7) "credit_score"
   8) "low"

This query also works (full-text search pre-filter and KNN query without HIGHLIGHT):

127.0.0.1:6379> "FT.SEARCH" "user_simple" "@user:(Sebas*)=>[KNN 3 @user_embedding $vector AS vector_distance]" "RETURN" "5" "user" "age" "job" "credit_score" "vector_distance" "SORTBY" "vector_distance" "ASC" "DIALECT" "2" "LIMIT" "0" "3" "params" "2" "vector" "\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?"
1) (integer) 2
2) "user_simple_docs:0aa7f56ab0604a388ebd73c69ce77e64"
3)  1) "vector_distance"
    2) "0"
    3) "user"
    4) "Sebastian Martinez"
    5) "age"
    6) "2"
    7) "job"
    8) "doctor"
    9) "credit_score"
   10) "low"
4) "user_simple_docs:7a65714003254c1a988325d7192c90da"
5)  1) "vector_distance"
    2) "0.129070281982"
    3) "user"
    4) "Sebastian Gurin"
    5) "age"
    6) "1"
    7) "job"
    8) "engineer"
    9) "credit_score"
   10) "high"
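The `vector_distance` values above can be checked by hand: with `"distance_metric": "cosine"`, the reported distance is 1 minus the cosine similarity between the query vector and the stored embedding. A small sketch in plain Python doubles (Redis computes in float32, so the last few digits differ slightly):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [0.1, 0.1, 0.5]
d_martinez = cosine_distance(query, [0.1, 0.1, 0.5])  # identical vectors
d_gurin = cosine_distance(query, [0.4, 0.3, 0.5])
print(d_martinez, d_gurin)
```

For Sebastian Gurin this works out to 1 - 0.32 / (√0.27 · √0.5) ≈ 0.1290703, in line with the 0.129070281982 Redis reports, and Sebastian Martinez's embedding equals the query vector, hence a distance of 0.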

But this query fails (full-text pre-filter and KNN query, with HIGHLIGHT):

127.0.0.1:6379> "FT.SEARCH" "user_simple" "@user:(Sebas*)=>[KNN 3 @user_embedding $vector AS vector_distance]" "RETURN" "5" "user" "age" "job" "credit_score" "vector_distance" "SORTBY" "vector_distance" "ASC" "DIALECT" "2" "LIMIT" "0" "3" "params" "2" "vector" "\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?" HIGHLIGHT FIELDS 1 user
(error) Property `vector_distance` is not in schema
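Until HIGHLIGHT works together with KNN, one possible workaround (a sketch, not a redisvl feature) is to drop `.highlight()` from the VectorQuery and wrap matching terms client-side. The `highlight_prefix` helper below is hypothetical and only approximates the server's `<b>…</b>` output for a simple prefix query like `@user:(Sebas*)`; unlike the search engine it does no stemming or tokenization. The `results` list is hypothetical sample data shaped like the dicts the repro script prints.

```python
import re

def highlight_prefix(text, prefix, open_tag="<b>", close_tag="</b>"):
    """Hypothetical helper: wrap each word starting with `prefix`
    (case-insensitive) in open_tag/close_tag, roughly like HIGHLIGHT."""
    pattern = re.compile(r"\b(" + re.escape(prefix) + r"\w*)", re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_tag}{m.group(1)}{close_tag}", text)

# Hypothetical results of a VectorQuery run WITHOUT .highlight()
results = [
    {"user": "Sebastian Martinez", "vector_distance": "0"},
    {"user": "Sebastian Gurin", "vector_distance": "0.129070281982"},
]

for doc in results:
    doc["user"] = highlight_prefix(doc["user"], "Sebas")
    print(doc)
```

This keeps the KNN ranking from the server and only post-processes the returned text, so it cannot reproduce `.summarize()` fragments or the engine's exact term matching.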

@cancerberoSgx
Author

Aha, thanks! A couple of comments:

  • I tried adding a vector_distance field to the schema myself, but then it fails with "duplicated schema field vector_distance".
  • The same error also happens when using .summarize().

BTW, amazing library, keep up the good work!

@tylerhutcherson
Collaborator

Thanks @cancerberoSgx :) BTW, the error reported here comes straight from the Redis server and its search module in the core. It's a bit of a red herring: the issue has nothing to do with your schema (but nice attempt!).

We will raise this with Redis product management to find out the status of a fix. In the meantime, would you mind also sharing which Redis server version you are using?

@cancerberoSgx
Author

cancerberoSgx commented Jan 29, 2025

'redis_version': '7.4.1',

@tylerhutcherson @abrookins It would be awesome if you could share any follow-up on this bug in the upstream projects, such as a related issue or PR. Thanks!
