Fix the problem of low recall rate of glass(hnsw) in multi-core environment #11
Thanks for fixing this! Unfortunately, it looks like this version is non-deterministic due to some kind of concurrency issue:
import glassppy as glass
import numpy as np
import threadpoolctl
n, d = 10000, 128
np.random.seed(0)
X = np.random.randn(n, d)
Y = np.random.randn(d)
index = glass.Index(index_type="HNSW", dim=d, metric="L2", R=32, L=50)
with threadpoolctl.threadpool_limits(limits=1, user_api='openmp'):
    graph = index.build(X)
searcher = glass.Searcher(graph=graph, data=X, metric="L2", level=1)
searcher.set_ef(32)
ret = searcher.search(query=Y, k=10)
print(ret)
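One way to make the non-determinism concrete is to run the same build-and-search pipeline several times and compare the returned IDs. The helper below is a minimal sketch: `is_deterministic` and its `build_and_search` argument are hypothetical names, not part of glassppy, and the callable is assumed to wrap a script like the one above and return the result IDs.

```python
import numpy as np

def is_deterministic(build_and_search, runs=3):
    """Call build_and_search() `runs` times; True if all results are identical.

    build_and_search is a hypothetical zero-argument callable that rebuilds
    the index, runs the query, and returns the neighbor IDs.
    """
    first = np.asarray(build_and_search())
    for _ in range(runs - 1):
        if not np.array_equal(first, np.asarray(build_and_search())):
            return False
    return True
```

With a fixed seed and a single-threaded build, a check like this would be expected to pass; a failure points at ordering effects rather than at the random input data.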
I just tested it, and the recall is improved compared to the original. On the deep10m dataset, recall is 97.8973% at ef=500 and 99.2578% at ef=1000. The deep10m dataset was downloaded through create_dataset.py in https://github.com/harsha-simhadri/big-ann-benchmarks.
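For reference, recall figures like these are usually computed by comparing the IDs returned by the index against exact nearest neighbors. The thread does not show how the numbers above were measured, so the following is only a sketch of the common definition, using brute-force L2 search as ground truth; `exact_knn` and `recall_at_k` are illustrative names.

```python
import numpy as np

def exact_knn(base, queries, k):
    """Brute-force k nearest neighbors under squared L2 distance."""
    diff = queries[:, None, :] - base[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    return np.argsort(dist, axis=1)[:, :k]

def recall_at_k(found, truth):
    """Fraction of true neighbors that appear in the returned lists."""
    found = np.asarray(found)
    truth = np.asarray(truth)
    hits = sum(len(set(f) & set(t)) for f, t in zip(found, truth))
    return hits / truth.size

rng = np.random.default_rng(0)
base = rng.standard_normal((200, 8))
queries = rng.standard_normal((5, 8))
truth = exact_knn(base, queries, k=10)
```

Brute force is only practical on small samples; for a dataset the size of deep10m one would normally use the precomputed ground-truth file shipped with the benchmark.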
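My test dataset is SIFT1M (128 dimensions). With ef=1000 the accuracy is about 91%. Could you share the other parameters of your build call? For example, X and Y in index = std::make_unique<glass::HNSW>(dim, "L2", X, Y), and the parameter Z in Optimize(Z)?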
This PR fixes the issue mentioned in #10.
In the Graph structure of glass (HNSW), each neighbor list stores the neighbors' internal IDs, and an internal ID is also used to look up a node's neighbor list. I noticed that the neighbor list can instead store the neighbors' external IDs, with the external ID used to look up the corresponding neighbor list. The two methods are equivalent, but the latter avoids the low recall caused by inconsistency between internal and external IDs.
There is an example in the following branch: https://github.com/cwj0bzxg/pyglass/blob/fix_bug_deep10M/main.cpp
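The equivalence claimed above can be illustrated with a toy adjacency list. This is a sketch of the idea only, not the glass data structure: the dictionaries, the `relabel` and `reachable` helpers, and the ID values are all made up for illustration.

```python
from collections import deque

def relabel(neighbors, ext_of):
    """Rewrite an adjacency list keyed by internal ID into one keyed by external ID."""
    return {ext_of[i]: [ext_of[j] for j in nbrs] for i, nbrs in neighbors.items()}

def reachable(neighbors, start):
    """Breadth-first visiting order starting from `start`."""
    seen, queue, order = {start}, deque([start]), [start]
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
                order.append(v)
    return order

# Hypothetical graph keyed by internal ID, plus an internal -> external mapping.
internal = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
ext_of = {0: 42, 1: 7, 2: 19, 3: 3}

external = relabel(internal, ext_of)

# Walking the internal graph and translating at the end visits the same
# nodes, in the same order, as walking the external graph directly.
walk_internal = [ext_of[i] for i in reachable(internal, 0)]
walk_external = reachable(external, ext_of[0])
```

In the external-ID layout no translation step is needed after the walk, so there is no point at which a stale or mismatched mapping could corrupt the result.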