
Single Server | GpuIndexFlatL2 Write Strategy #1072

Closed
dexception opened this issue Dec 30, 2019 · 4 comments

@dexception

Running on:

  • CPU
  • [yes] GPU

Interface:

  • [yes] C++
  • Python

About my app:

  1. Multi-threaded HTTP-server-based application
  2. Accepts an id and a vector for each /add request
  3. Provides GpuIndexFlatL2 search functionality

However, all adding and searching happens in memory, so if the application closes or crashes the data is lost. My question is: since faiss supports writing the index via:

    const char *name = "index.bin";
    faiss::write_index(faiss::gpu::index_gpu_to_cpu(index), name);

how do I implement the most efficient index-saving strategy?

  1. Block all requests while the index is written to file after every new vector is added.
    This will decrease performance.

  2. Periodically write the index in the background, after every 10,000 new vectors.
    If the application crashes, the unwritten vectors are lost.

  3. Some other strategy?

Please help me. I have been scratching my head over this problem for the last 2 weeks.
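Whichever strategy is chosen, the snapshot write itself can be made crash-safe by serializing to a temporary file and atomically renaming it over the previous snapshot, so a crash mid-write never corrupts the last good file. A minimal sketch, with a plain byte payload standing in for the serialized index (the function and file names are illustrative, not faiss API):

```cpp
#include <cstdio>
#include <string>

// Crash-safe snapshot: write to `path + ".tmp"`, then atomically rename
// it over `path`. POSIX rename() replaces the target in one step, so a
// restart after a crash sees either the old snapshot or the new one,
// never a partial file.
bool save_snapshot_atomically(const std::string& path,
                              const std::string& payload) {
    const std::string tmp = path + ".tmp";
    std::FILE* f = std::fopen(tmp.c_str(), "wb");
    if (!f) return false;
    // In the real application this write would be the faiss serialization:
    //   faiss::write_index(faiss::gpu::index_gpu_to_cpu(index), tmp.c_str());
    std::fwrite(payload.data(), 1, payload.size(), f);
    std::fclose(f);
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

The rename is atomic only when the temporary file lives on the same filesystem as the target, which is why the sketch derives the temp name from the target path.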

@mdouze
Contributor

mdouze commented Dec 31, 2019

It really depends on the operating conditions.
One approach is with two indexes: one big one with most of the vectors, and one in which you add new vectors. At search time, you search in both.

Then you can save every 10k adds with:

  1. save the small index (fast) with incremental file names

  2. merge the small index into the big one (fast, in RAM)

  3. clear the small index.

At recovery time, you then need to load the small indexes to reconstruct the big one. You could have a background job that merges the small indexes on disk.
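The save cycle above can be sketched as follows. This is an illustrative skeleton of the bookkeeping only: plain vectors stand in for the two faiss indexes, the actual faiss calls are indicated in comments, and names such as `FLUSH_EVERY` and `delta_*.bin` are assumptions rather than anything in the library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Two-index write strategy: new vectors go into a small "delta" index;
// every FLUSH_EVERY additions the delta is saved under an incremental
// file name, merged into the big in-memory index, and cleared.
struct TwoIndexWriter {
    static constexpr std::size_t FLUSH_EVERY = 10000;  // assumed threshold

    std::vector<float> big;    // stand-in for the big faiss index
    std::vector<float> delta;  // stand-in for the small faiss index
    std::size_t dim;
    int snapshot_no = 0;
    std::vector<std::string> saved_files;  // what recovery would replay

    explicit TwoIndexWriter(std::size_t d) : dim(d) {}

    void add(const float* x) {
        delta.insert(delta.end(), x, x + dim);
        if (delta.size() / dim >= FLUSH_EVERY) flush();
    }

    void flush() {
        if (delta.empty()) return;
        // 1. save the small index under an incremental name, e.g.
        //      faiss::write_index(&small, name.c_str());
        std::string name = "delta_" + std::to_string(snapshot_no++) + ".bin";
        saved_files.push_back(name);
        // 2. merge the small index into the big one, in RAM; for flat
        //    indexes this is just appending the raw vectors.
        big.insert(big.end(), delta.begin(), delta.end());
        // 3. clear the small index (reset() on a faiss index).
        delta.clear();
    }

    // Search would query both indexes and merge the top-k results.
};
```

At recovery, the big snapshot is loaded first and then each `saved_files` entry is replayed into it, which is exactly the background merge job described above.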

@dexception
Author

dexception commented Dec 31, 2019

Thanks, I think this would work without data loss in case of failure.

Another question: how do you handle metadata for the vectors? The results of a distance search might not be relevant. For example:

In our application we have a clientId and a categoryId for each vector, and other attributes as well. So the topK results that are returned might not belong to that clientId. Is there an index that supports storing attributes for the vectors inside the index as well?

@mdouze
Contributor

mdouze commented Jan 2, 2020

No, you need to put metadata in a separate conventional table.
Rationale in #641
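The separate-table approach can be as simple as a hash map from vector id to application metadata, with post-filtering of the search results. A minimal sketch, assuming ids are the same int64 values passed to the index; the `Meta` fields and the over-fetching convention are illustrative, not faiss API:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Application-side metadata kept outside the faiss index, keyed by the
// same int64 ids used when adding vectors.
struct Meta {
    int clientId;
    int categoryId;
};

// Given the raw ids returned by a faiss search, keep only results whose
// metadata matches the requested client, stopping after k survivors.
// The caller should over-fetch (ask faiss for more than k neighbors) so
// that enough results survive the filter.
std::vector<std::int64_t> filter_by_client(
        const std::vector<std::int64_t>& result_ids,
        const std::unordered_map<std::int64_t, Meta>& table,
        int clientId, std::size_t k) {
    std::vector<std::int64_t> out;
    for (std::int64_t id : result_ids) {
        auto it = table.find(id);
        if (it != table.end() && it->second.clientId == clientId) {
            out.push_back(id);
            if (out.size() == k) break;
        }
    }
    return out;
}
```

The metadata table must be persisted alongside the index snapshots (and replayed with the same delta files) so that ids and attributes stay in sync after a crash.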

@dexception
Author

Resolved.
