
Python C code snippets

Matthijs Douze edited this page Nov 7, 2024 · 24 revisions

It is not always obvious how the C++ and Python layers interact. Therefore, we give some handy code in Python notebooks that can be copy/pasted to perform some useful operations.

They rely mostly on vector_to_array and a few other Python/C++ tricks.

The faiss.contrib.inspect_tools module has a few useful functions to inspect the Faiss objects. In particular inspect_tools.print_object_fields lists all the fields of an object and their values.

How can I get the PCA matrix in numpy from a PCA object?

Use the function faiss.contrib.inspect_tools.get_LinearTransform_matrix, or see this code: get_matrix_from_PCA.ipynb. This applies to any LinearTransform object.

How can I get / change the centroids from a ProductQuantizer or ResidualQuantizer object?

For PQ: see access_PQ_centroids.ipynb.

For RQ: see demo_replace_RQ_codebooks.ipynb

How can I get the content of inverted lists?

Use the function faiss.contrib.inspect_tools.get_invlist, or see this code: get_invlists.ipynb

How can I lookup the inverted list corresponding to a stored vector?

This does not require C++ magic. See #3555

How can I get the link structure of a HNSW index?

See this code snippet: demo_hnsw_struct.ipynb (alternative rendering).

How can I get the knn graph for an IndexNNDescent?

See demo_access_nndescent.ipynb

How can I merge normal ArrayInvertedLists?

See demo_merge_array_invertedlists.ipynb

Faiss/pytorch interop: how can I use a PQ codec without leaving the GPU?

See PQ_codec_pytorch.ipynb.

How to explore the contents of an opaque index?

Suppose we have an index file but don't know what's in it. When accessing the Index fields of a wrapper index, they show up as plain Index objects. The downcast_index function converts such a plain index to the "leaf" class it actually belongs to. This snippet shows how to use downcast_index to extract all the information from an index: demo_explore_indedex.ipynb

How can I get all the ids from an IDMap or an IDMap2?

IDMap2 inherits IDMap, so this code works for both.

How can I convert an IDMap2 to IDMap?

This code works for both directions: convert_idmap2_idmap.ipynb

How to train a CPU index with a GPU just for k-means?

See train_ivf_with_gpu.ipynb

How to use the GPU at add time?

See assign_on_gpu.ipynb.

How can I force the k-means initialization?

Plus: how to do this for IVF training.

See initial_centroids_demo.ipynb

How to transfer a trained OPQ and/or IVF centroids to another index?

See https://github.com/facebookresearch/faiss/issues/2455

How can I replace the inverted list content?

See demo_replace_invlists.ipynb

How can I get access to non-8 bit quantization code entries in PQ / IVFPQ / AQ ?

You need a BitStringReader, see #2285

Simulating an IndexPQ on GPU with a 1-centroid IVFPQ

IndexPQ is not supported on GPU, but it is relatively easy to simulate it with an IVFPQ.

See demo_1_centroid_PQ.ipynb.

Accessing the vectors of a graph-based index (NSG or HNSW)

The data is stored in a storage index, which is an IndexFlatCodes. See demo_access_NSG_data.ipynb.

To get the reconstructed vectors, use index2.reconstruct(vector_id) or index2.reconstruct_n().

Wrapping small C++ objects for use from Python

Sometimes it is useful to implement a small callback needed by Faiss in C++. However, it may be too specific or depend on external code, so it does not make sense to include it in Faiss (and Faiss is hard to compile ;-) ).

In that case, you can make a SWIG wrapper for a snippet of C++.

Here is an example for an IDSelector object that has an is_member callback: bow_id_selector.swig

To compile the code with Faiss installed via conda and SWIG 4.x on Linux:

# generate wrapper code 
swig -c++ -python -I$CONDA_PREFIX/include  bow_id_selector.swig 

# compile generated wrapper code: 
g++ -shared -O3 -g -fPIC bow_id_selector_wrap.cxx -o _bow_id_selector.so  \
  -I $( python -c "import sysconfig ; print(sysconfig.get_paths()['include'])" )  \
  -I $CONDA_PREFIX/include $CONDA_PREFIX/lib/libfaiss_avx2.so

This produces bow_id_selector.py and _bow_id_selector.so, which can be loaded in Python with:

import numpy as np
import faiss
import bow_id_selector

# very small sparse CSR matrix
n = 3
indptr = np.array([0, 2, 3, 6], dtype='int32')
indices = np.array([7, 8, 3, 1, 2, 3], dtype='int32')

# don't forget swig_ptr to convert from a numpy array to a C++ pointer
selector = bow_id_selector.IDSelectorBOW(n, faiss.swig_ptr(indptr), faiss.swig_ptr(indices))

selector.set_query_words(1, 2)
selector.is_member(0)   # returns False
selector.is_member(1)   # returns False
selector.is_member(2)   # returns True
selector.is_member(3)   # crashes! (id 3 is out of bounds for n = 3)

# And of course you can combine it with existing Faiss objects
params = faiss.SearchParameters(sel=selector)