
Failure to reproduce paper's recall on IVF Product quantization #1045

Closed

mayya-sharipova opened this issue Nov 27, 2019 · 6 comments

@mayya-sharipova
I am trying to reproduce results from the paper “Product quantization for nearest neighbor search”, Jégou et al., PAMI 2011.
The paper reports that IVFADC on the SIFT dataset (1 million vectors, 128 dimensions) with k'=1024, m=8, w=8 produces a recall of around 0.9 (Fig. 7).

Running ann-benchmarks on the SIFT dataset with nprobe=8:

ncentroids = 1024  # number of coarse centroids (k' in the paper)
m = 8              # number of PQ sub-quantizers
code_size = 8      # passed as the nbits argument of IndexIVFPQ (bits per sub-quantizer)
self.coarse_quantizer = faiss.IndexFlatL2(X.shape[1])
index = faiss.IndexIVFPQ(self.coarse_quantizer, X.shape[1], ncentroids, m, code_size)

produces a recall of 0.372:
FaissIVFPQ(n_list=1024, n_probe=8) 0.372 4493.587

What would be the way to achieve recall as reported by the paper?
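For reference, the exact ground truth that both evaluations rely on can be computed by brute-force L2 search. A minimal numpy sketch with synthetic toy data (the sizes below are placeholders, not the SIFT1M shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
nb, nq, d = 1000, 10, 16  # toy sizes; SIFT1M uses nb=1_000_000, d=128
xb = rng.standard_normal((nb, d)).astype("float32")  # base vectors
xq = rng.standard_normal((nq, d)).astype("float32")  # query vectors

# squared L2 distance between every query and every base vector
d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(-1)

# top-100 exact nearest neighbors per query, nearest first
gt = np.argsort(d2, axis=1)[:, :100]
```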

@mdouze
Contributor

mdouze commented Nov 29, 2019

Here is code that reproduces the result:
repro_IVF_paper.ipynb
It gives 87.3% recall. There is probably an error in your eval code.

@mayya-sharipova
Author

Thanks so much for your reply. I will investigate further.

@mayya-sharipova
Author

mayya-sharipova commented Dec 1, 2019

@mdouze I am not clear on the way you calculate recall:
recall_at_100 = (I[:, :100] == gt[:, :1]).sum() / float(xq.shape[0])

For every query, we have a ground truth of 100 expected documents to find. So we need to compare these 100 expected docs with the 100 documents found during the search, count how many documents they have in common (the intersection), and that is the recall for this query. We do the same for all queries to get the average recall over the 10K queries.

But in your calculation, you only check whether the 1st expected document is present in the 100 documents found during the search. That is not the traditional definition of recall. Do you have references for where this definition of recall is used?

So, using your code in repro_IVF_paper.ipynb and calculating the recall in the traditional way:

total_matches = 0
for q in range(xq.shape[0]):
    # intersection of the retrieved IDs with the 100 ground-truth IDs
    matches = set(I[q]) & set(gt[q])
    total_matches += len(matches)
recall_at_100 = (total_matches / float(xq.shape[0])) / 100
print("recall@100: %.3f" % recall_at_100)

I am getting the result of:
recall@100: 0.436 or 43.6%
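For clarity, the intersection-based recall computed above can be illustrated on tiny hypothetical arrays (the IDs below are made up):

```python
import numpy as np

# toy example: 2 queries, top-5 retrieved IDs vs. 5 ground-truth IDs each
I = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
gt = np.array([[1, 2, 30, 40, 50],
               [6, 70, 80, 90, 100]])

# per-query intersection sizes (2 and 1), averaged and normalized by depth
total_matches = sum(len(set(I[q]) & set(gt[q])) for q in range(I.shape[0]))
intersection_recall = total_matches / float(I.shape[0]) / gt.shape[1]
print(intersection_recall)  # (2 + 1) / 2 / 5 = 0.3
```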

@mdouze
Contributor

mdouze commented Dec 2, 2019

Check the definition of recall@100 in the paper:

The search quality is measured with recall@R, i.e., the
proportion of query vectors for which the nearest neighbor
is ranked in the first R positions. This measure indicates
the fraction of queries for which the nearest neighbor is
retrieved correctly, if a short-list of R vectors is verified
using Euclidean distances. Furthermore, the curve obtained by
varying R corresponds to the distribution function of the ranks,
and the point R=1 corresponds to the “precision” measure used
in [9] to evaluate ANN methods.
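Under that definition, recall@R reduces to checking whether the single true nearest neighbor appears in the top R results. A toy sketch with made-up IDs:

```python
import numpy as np

nq, R = 4, 5
# toy ground truth: gt[:, 0] is the true nearest neighbor of each query
gt = np.array([[10], [20], [30], [40]])
# toy search results: the true NN is retrieved for queries 0, 1, 2 but not 3
I = np.array([
    [10, 1, 2, 3, 4],
    [5, 20, 6, 7, 8],
    [9, 11, 12, 30, 13],
    [14, 15, 16, 17, 18],
])

# paper's recall@R: fraction of queries whose true NN is among the first R results
recall_at_R = (I[:, :R] == gt[:, :1]).sum() / float(nq)
print(recall_at_R)  # 0.75
```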

@mayya-sharipova
Author

@mdouze Thank you so much for the clarification, I had missed this.
This definition of recall looks very relaxed, and I am not sure how useful it really is in practice.

@mdouze
Contributor

mdouze commented Jan 6, 2020

no activity, closing.

@mdouze mdouze closed this as completed Jan 6, 2020