First lookup for an ID is slow #376

scottbreyfogle · 2019-03-29T17:03:19Z

I'm seeing a problem when using the library where calling get_nns_by_vector is very slow on first execution for a given value (15s for an index with ~2m vectors of length 200), and then much faster in subsequent calls (subsecond). I've also seen similar behavior in get_item_vector, but I'm not sure if it's related. I've set prefault to true when loading the index from disk.

The strangest part is that it only occurs on some computers, and I'm not able to repro on my local machine to get details on what is going on. If I load the same index on my computer, the execution time is always subsecond. I'm continuing to look into this to see if I can find a reliable method of reproducing.

Has anyone seen performance patterns like this? Do you have thoughts on what the problem may be? I'm not sure that it's Annoy specifically, but would be good to know if it is.

My current guess is that it has to do with mmap: the index is not fully loaded into RAM, and the disk lookups are slow on some machines. I'm not very familiar with mmap and would appreciate outside thoughts on whether that's reasonable and/or how to verify if it's the problem.

erikbern · 2019-03-29T17:10:39Z

yeah seems possible the page cache just isn't warm and that prefault for whatever reason isn't working on your machine

if you want to warm up the page cache – consider doing a sequential scan or a few thousand random lookups first

scottbreyfogle · 2019-03-29T17:24:34Z

Looking at the code, it seems like prefault only triggers if MAP_POPULATE is defined. It does not seem to be set anywhere in the library. Is that something defined on the system in question? Part of compilation? Something I should have set in my code calling the library?

In short, is there a way for me to check on a particular computer/installation if that flag is enabled?

Edit: Thank you so much for the quick response, by the way! I really appreciate the library and documentation.

erikbern · 2019-03-29T21:46:16Z

There's some doc here: http://man7.org/linux/man-pages/man2/mmap.2.html

We should probably throw a warning or similar if MAP_POPULATE isn't defined and prefault = true here: https://github.com/spotify/annoy/blob/master/src/annoylib.h#L925
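
One rough way to see whether a given machine exposes MAP_POPULATE at all is to ask Python's own mmap module, which defines the constant only where the platform provides it (and only on Python 3.10 or later). This is a sketch, not a check of how Annoy itself was compiled: the helper name `platform_has_map_populate` is made up for illustration, and a False result on an older Python says nothing about the OS.

```python
import mmap
import sys


def platform_has_map_populate() -> bool:
    """Report whether this Python build exposes mmap.MAP_POPULATE.

    Hypothetical helper: it is a proxy for the OS headers, not a check
    of the flags Annoy was compiled with.
    """
    return hasattr(mmap, "MAP_POPULATE")


if __name__ == "__main__":
    print(sys.platform, "MAP_POPULATE available:", platform_has_map_populate())
```

On macOS this is expected to report that the constant is missing, since MAP_POPULATE is a Linux-specific mmap flag.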

scottbreyfogle · 2019-04-03T16:58:09Z

I ended up moving to a different ann library because of this. My best guess about the problem is that something about the cloud/kubernetes configuration where the code was running caused MAP_POPULATE to be undefined, but I did not confirm.

erikbern · 2019-04-03T19:50:57Z

@scottbreyfogle why is this a problem though? you can just do some random lookups to warm up the page cache (or just loop through all vectors)

anyway closing for now

scottbreyfogle · 2019-04-03T20:09:14Z

That is a possibility that I considered. It was really a cost-benefit. A 15 second latency hit was really not acceptable for us, and I decided that it would be more reliable to switch to something I understood more.

If I can't understand fully what is going wrong (only guess), then I'm not comfortable making guarantees about the comprehensiveness of a solution. Are there multiple data structures that need to be warmed? If I call a get_item_vector on all elements, will that fix the problem for get_nns_by_vector? Or do I need to loop on both, which would presumably take quite a long time? What about get_nns_by_vector on un-indexed vectors?

It was just simpler to move to a solution where I didn't have to test all that, especially when validation needed to be done mostly on a remote machine with a very large dataset, since the issue is hard to detect locally or with a small dataset.

Closing seems reasonable though.

erikbern · 2019-04-03T23:23:18Z

seems fair. generally i don't think the linux kernel guarantees anything about mmap and swapping it out from primary memory, but i could be wrong. i'll look into the mmap flags again at some point

shoegazerstella · 2019-04-04T16:17:20Z

Hello @erikbern,
We are experiencing the same 'issue'.
At the moment we have about 60k items stored in an Annoy index. We noticed that the first query (queryByVector) takes quite long (44 seconds), while the second already takes only 0.3-0.4 seconds.
This runs inside an API, and we could run some random queries right after loading the index. How many rounds should we run? You previously suggested a few thousand: is this number somehow related to the size of the index?

erikbern · 2019-04-04T23:53:00Z

yeah, basically you need to scan through the index and make sure every page is hit (i think the linux page size is 4kb?)

so scan through and hit maybe every 100 vectors, and that should be fine (it probably won't be much slower to hit every vector actually)
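
A minimal sketch of that scan, assuming the index object exposes `get_n_items()` and `get_item_vector(i)` as Annoy's Python `AnnoyIndex` does. The function name `warm_items` and the `stride` parameter are illustrative, not part of Annoy. Note that with 200 float32 components a vector alone is 800 bytes, so only about five fit in a 4 KiB page; a stride of 100 would then rely on kernel readahead to cover the pages in between, which is why the default below touches every item.

```python
def warm_items(index, stride: int = 1) -> int:
    """Touch every `stride`-th item vector so its pages get faulted in.

    `index` is any object with get_n_items() and get_item_vector(i);
    returns the number of items read.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    touched = 0
    for i in range(0, index.get_n_items(), stride):
        index.get_item_vector(i)  # result discarded; the read is the point
        touched += 1
    return touched
```

With a real index this would be called once after `load()`, e.g. `warm_items(index)`. It only reads item vectors, so whether it also warms the tree nodes used by `get_nns_by_vector` is not something this sketch establishes.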

loretoparisi · 2019-04-17T10:14:06Z

@erikbern what about the distribution of the hits across the index? Should we pick the hits at random (one per 100 vectors) so that they are spread roughly uniformly over the data?

sonots · 2019-06-14T16:49:23Z

A comment as an SRE: relying on the disk cache is unstable in terms of performance.
SOLUTION: Store the Annoy files on tmpfs.
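
As a sketch of the tmpfs approach: copy the index into a RAM-backed directory once at startup and load it from there. The directory `/dev/shm` is an assumption (it is a tmpfs mount on most Linux systems, but that is not guaranteed), and `stage_in_tmpfs` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def stage_in_tmpfs(index_path: str, tmpfs_dir: str = "/dev/shm") -> Path:
    """Copy an index file into a tmpfs directory and return the new path.

    Skips the copy when a file of the same size is already staged.
    """
    src = Path(index_path)
    dst = Path(tmpfs_dir) / src.name
    if not dst.exists() or dst.stat().st_size != src.stat().st_size:
        shutil.copyfile(src, dst)
    return dst
```

The returned path would then be passed to the index's `load()` call. Files on tmpfs count against RAM, so the instance needs enough memory to hold the whole index.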

chikubee · 2020-01-27T08:46:16Z

> That is a possibility that I considered. It was really a cost-benefit. A 15 second latency hit was really not acceptable for us, and I decided that it would be more reliable to switch to something I understood more.
>
> If I can't understand fully what is going wrong (only guess), then I'm not comfortable making guarantees about the comprehensiveness of a solution. Are there multiple data structures that need to be warmed? If I call a get_item_vector on all elements, will that fix the problem for get_nns_by_vector? Or do I need to loop on both, which would presumably take quite a long time? What about get_nns_by_vector on un-indexed vectors?
>
> It was just simpler to move to a solution where I didn't have to test all that, especially when validation needed to be done mostly on a remote machine with a very large dataset, since the issue is hard to detect locally or with a small dataset.
>
> Closing seems reasonable though.

@scottbreyfogle Which solution did you move to? I am facing similar issues.

erikbern · 2020-01-27T13:40:10Z

you can load it with prefault = True – i believe this should speed it up significantly

chikubee · 2020-01-27T14:04:09Z

@erikbern I am working on a Mac and get the following error: "prefault is set to true, but MAP_POPULATE is not defined on this platform".

erikbern · 2020-01-27T15:23:20Z

yeah i believe prefault doesn't work on os x unfortunately

erikbern · 2020-01-27T15:24:09Z

as a workaround, you could iterate over all indices and run get_item_vector just to warm up the page cache. or another way is you can just run cat index.ann > /dev/null on the command line

loretoparisi · 2020-01-27T17:47:35Z

@erikbern what will happen when doing cat index.ann > /dev/null? thanks

erikbern · 2020-01-27T18:40:18Z

@loretoparisi typically the kernel will cache that file in memory, meaning subsequent random access to it will be very fast

erikbern · 2020-01-27T18:42:21Z

https://serverfault.com/a/43391 confirms this :)
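
The same effect can be had from inside the process that serves queries, which avoids shelling out. A minimal sketch, assuming plain buffered reads populate the page cache the way `cat` does; `warm_file` and the 1 MiB chunk size are illustrative choices.

```python
def warm_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Read a file front to back and discard the data.

    Equivalent in spirit to `cat path > /dev/null`; returns bytes read.
    """
    total = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

If the machine has less free memory than the size of the file, the kernel can evict the pages again, so the warm-up may not stick.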

eddie-scio · 2020-03-12T18:17:12Z

I'm running into an interesting issue -- when I query locally (1.5m items, 128-d, num_trees = 100, k = 1000, search_k = 500000), I always get sub 100ms. This is on OSX, where prefault=True does nothing. When I deploy to Google App Engine, I'm getting O(15s) queries. I've tried with both prefault = True there and looping through the whole index and calling .get_item_vector(i), and neither of them resolve the slow first-query problem. I suspect that the ephemeral disk used with GAE flex might be interfering with the page cache. Any thoughts?

eddie-scio · 2020-03-12T18:40:50Z

I've confirmed with vmtouch that my local environment is correctly mmapping the index, whereas on GAE it is not (even with both prefault=True and the full item scan I mentioned above).

eddiezhou:~/workspace/vmtouch[11:29:52] (master) $ vmtouch /tmp/index.ann
           Files: 1
     Directories: 0
  Resident Pages: 734722/734722  2G/2G  100%
         Elapsed: 0.12743 seconds

root@cfa460207703:/home/vmagent/app/vmtouch# vmtouch /tmp/index.ann
           Files: 1
     Directories: 0
  Resident Pages: 216861/734722  847M/2G  29.5%
         Elapsed: 0.030743 seconds

erikbern · 2020-03-12T19:46:19Z

i don't think prefault is supported on all platforms

eddie-scio · 2020-03-12T21:16:59Z

I believe prefault is working on GAE (there's no warning like there is when I run locally). For anyone who's trying to get this working on GAE: allocating more RAM to my instance solved this problem. I think the index was being evicted from the filesystem cache because of memory pressure. Using tmpfs as suggested above is the way to go.

akshaykarangale · 2020-12-11T08:57:17Z

> as a workaround, you could iterate over all indices and run get_vector just to warm up the page cache. or another way is you can just run cat index.ann > /dev/null on the command line

This worked like magic. Thanks!

erikbern closed this as completed Apr 3, 2019

erikbern added a commit that referenced this issue Apr 10, 2019

Simplify changes made in #380, better error message for #376

a12da33

eddie-scio mentioned this issue May 22, 2020

Slowdown with two indices (using tmpfs) #482

Closed
