Bad performance with empty data #93

Open
iharshulhan opened this issue Feb 12, 2021 · 2 comments

Comments

@iharshulhan

Building the index with a huge number of empty vectors is very slow and may result in poor search performance. I would suggest either handling this case separately or issuing a warning to the user.

@masajiro
Member

Could you tell me more about your situation?
What do you mean by empty vector? Is the vector {} or {0.0, ..., 0.0}?
What is the number of dimensions of the empty vector?
Which distance function do you use for the empty vectors?

@iharshulhan
Author

I meant a vector of zeros, {0.0, ..., 0.0}. The vectors had 500 dimensions, there were ~3.5 million of them in total, and I used the cosine similarity function.

I believe the same applies to vectors with a single nonzero element, such as {0, 1, ..., 0}. The index got stuck at query time for such vectors.
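
To illustrate the "handle the case separately" suggestion, here is a minimal sketch of a pre-filter that could run before indexing. It relies only on NumPy and does not call the library discussed in this issue. The function name `split_zero_vectors` and the `eps` parameter are hypothetical. The reasoning is that cosine similarity divides by the vector norm, so it is undefined for an all-zero vector. The sketch covers only the all-zero case, not vectors with a single nonzero element.

```python
import warnings

import numpy as np


def split_zero_vectors(vectors, eps=0.0):
    """Separate zero-norm rows from the rest of a 2-D array.

    Hypothetical helper: returns (kept_rows, kept_ids, zero_ids), where the
    id arrays hold row positions in the original input.
    """
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=1)  # one L2 norm per row
    zero_mask = norms <= eps
    zero_ids = np.flatnonzero(zero_mask)
    kept_ids = np.flatnonzero(~zero_mask)
    if zero_ids.size:
        # Warn the caller instead of silently passing undefined cases on.
        warnings.warn(
            f"{zero_ids.size} of {len(vectors)} vectors have zero norm; "
            "cosine similarity is undefined for them"
        )
    return vectors[kept_ids], kept_ids, zero_ids


# Toy data: rows 0, 2 and 4 are all zeros.
data = np.zeros((5, 4))
data[1, 2] = 1.0
data[3, 0] = 2.0
kept, kept_ids, zero_ids = split_zero_vectors(data)
```

Only `kept` would then be passed to the index builder, with `kept_ids` used to map search results back to the original row positions.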
