-
Notifications
You must be signed in to change notification settings - Fork 3k
Searching with HNSW index
HNSW (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm. It builds a multi-layer navigation structure for an image according to certain rules. In this structure, the upper layers are more sparse and the distances between nodes are farther; the lower layers are denser and he distances between nodes are closer. The search starts from the uppermost layer, finds the node closest to the target in this layer, and then enters the next layer to begin another search. After multiple iterations, it can quickly approach the target position.
In order to improve performance, HNSW limits the maximum degree of nodes on each layer of the graph to M
.
In addition, you can use efConstruction
(when building index) or ef
(when searching targets) to specify a search range.
-
Index building parameters
Parameter Description Range M
Maximum degree of the node [4, 64] efConstruction
Search scope [8, 512] -
Search parameters
Parameter Description Range ef
Search scope [ top_k
, 32768]
For example:
In python
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
# create a collection
collection_name = "milvus_HSNW"
default_fields = [
FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=d)
]default_schema = CollectionSchema(fields=default_fields, description="test collection")
print(f"\nCreate collection...")
collection = Collection(name= collection_name, schema=default_schema)
# insert data
import random
vectors = [[random.random() for _ in range(8)] for _ in range(10)]
entities = [vectors]
mr = collection.insert(entities)
print(collection.num_entities)
# create index
collection.create_index(field_name=field_name,
index_params={'index_type': 'HNSW',
'metric_type': 'L2',
'params': {
"M": 16, # int. 4~64
"efConstruction": 40 # int. 8~512
}})
collection.load()
# search
top_k = 10
results = collection.search(vectors[:5], anns_field="vector", param={
"ef": 64
}, limit=top_k)# show results
for result in results:
print(result.ids)
print(result.distance)