
Faiss cuVS


cuVS Overview

cuVS contains state-of-the-art implementations of several algorithms for running approximate nearest neighbors and clustering on the GPU. The primary goal of cuVS is to simplify the use of GPUs for vector similarity search and clustering. cuVS is built on top of the RAPIDS RAFT library of high performance machine learning primitives.

RMM

cuVS internally uses the RAPIDS Memory Manager (RMM) to allow customizing device and host memory allocations. With cuVS enabled, FAISS' GPU resource manager is also configured to use RMM. The following example shows how to construct a PoolMemoryResource with an initial size of 1 GiB and a maximum size of 4 GiB. The pool uses CudaMemoryResource as its underlying “upstream” MR.

>>> import rmm
>>> pool = rmm.mr.PoolMemoryResource(
...     rmm.mr.CudaMemoryResource(),
...     initial_pool_size=2**30,
...     maximum_pool_size=2**32
... )
>>> rmm.mr.set_current_device_resource(pool)

In a cuVS-enabled build, the StandardGpuResources object uses the current RMM device resource set by the user for device memory allocations, and constructs new RMM resources for pinned and managed allocations.

Note: RMM's Python interface is not a direct dependency of FAISS and must be installed externally:

conda install -c rapidsai rmm
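As a minimal sketch (assuming a cuVS-enabled build of FAISS, and that the RMM pool from the example above has already been set as the current device resource), a StandardGpuResources object created afterwards will route its device allocations through that pool. The use_cuvs config field below is an assumption; earlier releases exposed the same switch as use_raft:

import faiss
import numpy as np

d = 64
xb = np.random.random((10000, d)).astype('float32')

# Picks up the RMM device resource installed above via
# rmm.mr.set_current_device_resource(pool).
res = faiss.StandardGpuResources()

config = faiss.GpuIndexFlatConfig()
config.use_cuvs = True   # assumed field name; older builds used use_raft

index = faiss.GpuIndexFlat(res, d, faiss.METRIC_L2, config)
index.add(xb)
D, I = index.search(xb[:5], 10)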

Implemented indexes

The GPU indexes GpuIndexFlat, GpuIndexIVFFlat, and GpuIndexIVFPQ can use cuVS implementations. In addition, the graph-based CAGRA index has been added to FAISS for faster search.
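As a sketch, opting an IVF-Flat index into the cuVS implementation looks the same as constructing the classical GPU index; only the config flag differs. The use_cuvs field name is an assumption (it was called use_raft in earlier releases):

import faiss
import numpy as np

d, nlist = 128, 1024
xb = np.random.random((100000, d)).astype('float32')

res = faiss.StandardGpuResources()

config = faiss.GpuIndexIVFFlatConfig()
config.use_cuvs = True   # assumed field name; formerly use_raft

index = faiss.GpuIndexIVFFlat(res, d, nlist, faiss.METRIC_L2, config)
index.train(xb)
index.add(xb)
D, I = index.search(xb[:5], 10)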

CAGRA

CAGRA, or (C)UDA (A)NN (GRA)ph-based, is a new graph-based index supported in FAISS through cuVS. It is loosely based on the popular navigable small-world graph (NSG) algorithm, but has been built from the ground up specifically for the GPU. CAGRA constructs a flat graph representation by first building a kNN graph of the training points and then removing redundant paths between neighbors.

The CAGRA algorithm has two basic steps:

  1. Construct a kNN graph
  2. Prune redundant routes from the kNN graph.

cuVS provides IVF-PQ and NN-Descent strategies for building the initial kNN graph, and these can be selected in the index params object during index construction. A cuVS CAGRA index can be built through FAISS and serialized to a CPU HNSW index, thus providing more flexibility when constructing HNSW indexes. More details are given in the next chapter.
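A rough sketch of building a CAGRA index through FAISS and copying it to a CPU HNSW-style index is shown below. The config field names (graph_degree, intermediate_graph_degree, build_algo) and the copyTo call mirror the C++ GpuIndexCagra API and should be treated as assumptions for the Python bindings:

import faiss
import numpy as np

d = 96
xb = np.random.random((50000, d)).astype('float32')

res = faiss.StandardGpuResources()

config = faiss.GpuIndexCagraConfig()
config.intermediate_graph_degree = 64   # degree of the initial kNN graph
config.graph_degree = 32                # degree of the final, pruned graph
# config.build_algo selects the kNN graph build strategy (IVF-PQ or NN-Descent)

gpu_index = faiss.GpuIndexCagra(res, d, faiss.METRIC_L2, config)
gpu_index.train(xb)                     # builds the kNN graph and prunes it
D, I = gpu_index.search(xb[:5], 10)

# Copy to a CPU index that exposes the CAGRA graph as an HNSW base layer.
cpu_index = faiss.IndexHNSWCagra(d, 32, faiss.METRIC_L2)
gpu_index.copyTo(cpu_index)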

Improvements over Regular FAISS GPU Indices

  • Relaxed parameter settings for GpuIndexIVFPQ (see the sketch after this list):
    • There is no fixed set of allowed code sizes, as long as the number of codes representing a vector is less than or equal to the base dimension
    • GpuIndexIVFPQ indexes with 56 bytes per code or more do not require the use of the float16 IVFPQ mode, and the shared memory limitations do not apply
    • cuVS allows the number of bits per code to be anywhere in the closed interval [4, 8], whereas regular FAISS GPU indexes only support 8 bits per PQ code
  • The use of RMM allows for automatic temporary memory allocations with pooled memory resources and gives users more control over how memory is allocated
  • Performance: cuVS index builds are optimized and are significantly faster than the Train + Add of regular FAISS GPU indexes. The performance is compared in the next chapter.
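As an illustration of the relaxed GpuIndexIVFPQ parameters, the sketch below uses 6 bits per PQ code, which the classical FAISS GPU index rejects. Again, the use_cuvs field name is an assumption (formerly use_raft):

import faiss
import numpy as np

d, nlist, M, nbits = 96, 1024, 48, 6    # 6-bit PQ codes: only possible with cuVS on the GPU
xb = np.random.random((100000, d)).astype('float32')

res = faiss.StandardGpuResources()

config = faiss.GpuIndexIVFPQConfig()
config.use_cuvs = True                  # assumed field name; formerly use_raft

index = faiss.GpuIndexIVFPQ(res, d, nlist, M, nbits, faiss.METRIC_L2, config)
index.train(xb)
index.add(xb)
D, I = index.search(xb[:5], 10)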

Limitations

  • Searching multi-GPU indexes is not supported for cuVS indexes through FAISS. In general, building and searching multi-GPU cuVS indexes can be done by installing cuVS directly
  • Precomputed tables are not supported for GpuIndexIVFPQ built with cuVS
  • Calling reserveVecs on a GpuIndexIVFPQ or GpuIndexIVFFlat is not supported
  • searchPreassigned, which finds nearest neighbors for IVF indices with pre-assigned centroids, is not supported
  • INDICES_64_BIT is the only indices storage option available for cuVS indexes
  • Building from source: compiling Faiss from source with cuVS enabled is slower