Can't obtain results using Rust implementation #13

Open
paulbricman opened this issue Jul 8, 2021 · 2 comments


paulbricman commented Jul 8, 2021

I'm roughly using the following code:

let query_emb: Vec<f32>;
let doc_emb: Vec<Vec<f32>>; // contains 3 document embeddings

...

let mut lsh = LshMem::new(10, 30, 512).srp().unwrap();
let _x = lsh.store_vecs(&doc_emb[..]);
let result = lsh.query_bucket(&query_emb).unwrap();
println!("lsh-rs: {:?}", result);

Unfortunately, the result is empty. When I run the same query and documents through ngt-rs, I do get results (I'm looking for an alternative to ngt-rs that runs on Windows). Is this a matter of choosing better parameters?

Author

paulbricman commented Jul 8, 2021

It seems so: tweaking n_projections and n_hash_tables makes it return results some of the time. Do you know of effective heuristics for choosing values for the two? I plan on working with 100-10000 candidate vectors of dimension 512, but was just testing with 3 of them.

Owner

ritchie46 commented Jul 11, 2021

Here is a presentation I have on the subject:
LSH.pdf

And a notebook with some theory: notebook

The most important thing is to understand the gap amplification; see the last plot in the notebook. By choosing K and L you tune the collision probability for a given similarity value.
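To make the effect of K and L concrete, here is a minimal sketch of the standard amplification formula for signed random projections. It assumes K corresponds to n_projections and L to n_hash_tables, and that a query returns the union of the matching buckets across all tables; neither is confirmed in this thread. It does not call lsh-rs, and the function names are made up for illustration. Under this model, a single projection collides with probability p = 1 - θ/π, where θ is the angle between the two vectors, and a document becomes a candidate with probability 1 - (1 - p^K)^L.

```rust
use std::f64::consts::PI;

/// Probability that one signed random projection assigns the same bit
/// to two vectors with the given cosine similarity (p = 1 - theta / pi).
fn srp_bit_collision(cos_sim: f64) -> f64 {
    let theta = cos_sim.clamp(-1.0, 1.0).acos();
    1.0 - theta / PI
}

/// Probability that two vectors share a bucket in at least one of `l` tables,
/// where each table key is `k` concatenated bits: 1 - (1 - p^k)^l.
/// Assumption: k plays the role of n_projections, l of n_hash_tables.
fn collision_probability(cos_sim: f64, k: u32, l: u32) -> f64 {
    let p = srp_bit_collision(cos_sim);
    1.0 - (1.0 - p.powi(k as i32)).powi(l as i32)
}

/// Smallest number of tables such that a pair with similarity `cos_sim`
/// collides with probability at least `target` (0 < target < 1), for fixed `k`.
fn tables_needed(cos_sim: f64, k: u32, target: f64) -> u32 {
    let pk = srp_bit_collision(cos_sim).powi(k as i32);
    if pk >= 1.0 {
        return 1; // identical direction: always collides
    }
    if pk <= 0.0 {
        return u32::MAX; // opposite direction: never collides
    }
    ((1.0 - target).ln() / (1.0 - pk).ln()).ceil() as u32
}

fn main() {
    // Parameters taken from the snippet in the question: 10 projections, 30 tables.
    let (k, l) = (10, 30);
    for &s in &[0.0, 0.3, 0.5, 0.7, 0.9] {
        println!(
            "cos_sim = {:.1}: P(candidate) = {:.3}",
            s,
            collision_probability(s, k, l)
        );
    }
    println!(
        "tables for 90% recall at cos_sim 0.5, k = 10: {}",
        tables_needed(0.5, k, 0.9)
    );
}
```

With 10 projections and 30 tables, this model gives a pair of uncorrelated vectors (cosine similarity 0) only about a 3% chance of sharing a bucket, which would be consistent with an empty result on three test documents. Lowering K or raising L moves the curve toward lower similarities, at the cost of more candidates to check.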

P.S. You can play around with the Python version of this crate in the notebook:

https://pypi.org/project/floky/
