The recall rate of glass is lower than hnswlib under the same parameters. #10

Open
cwj0bzxg opened this issue May 13, 2024 · 2 comments

cwj0bzxg commented May 13, 2024

hnswlib distinguishes between internal IDs and external IDs. In glass's HNSW index, however, search returns the internal ID of each neighbor. With a single thread, internal and external IDs are identical, so returning the internal ID does not affect recall. With multiple threads, the two can diverge, and the returned IDs no longer match the ground truth, which lowers the measured recall.
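To make the mismatch concrete, here is a minimal Python sketch. It is not glass's or hnswlib's actual code: it assumes a hypothetical builder that hands out internal IDs in the order points happen to arrive, and the names `build_internal_to_label` and `to_labels` are invented for illustration. The arrival order is a fixed, made-up permutation standing in for a multi-threaded build.

```python
def build_internal_to_label(arrival_order):
    """Hypothetical builder: the i-th point to arrive gets internal ID i.

    arrival_order[i] is the external ID (label) of the i-th arriving point.
    Returns a list where index = internal ID and value = external ID.
    """
    return list(arrival_order)


def to_labels(internal_ids, internal_to_label):
    """Translate internal IDs from a search result back to external IDs."""
    return [internal_to_label[i] for i in internal_ids]


# Single-threaded build: points arrive in label order, so the mapping is the identity.
single = build_internal_to_label(range(8))

# Multi-threaded build (assumed): threads interleave, so arrival order is a permutation.
arrival = [3, 0, 1, 2, 7, 4, 6, 5]
multi = build_internal_to_label(arrival)

# Suppose the true nearest neighbors of a query have these external IDs.
true_neighbors = [2, 5, 7]

# The index finds the right points but reports their internal IDs.
label_to_internal = {label: i for i, label in enumerate(multi)}
reported = [label_to_internal[label] for label in true_neighbors]

print(reported)                    # internal IDs, compared against ground truth as-is
print(to_labels(reported, multi))  # external IDs after mapping back
```

In this sketch the search itself is correct, yet comparing `reported` directly with `true_neighbors` would count most results as misses. Mapping through `internal_to_label` before returning results is what removes the discrepancy.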

I ran a series of experiments that confirm the problem.
Server configuration: 96 cores, 512G memory
Datasets: deep1M, deep10M
Methods: glass (HNSW), hnswlib
Parameters: R=32 (M=16), efc=200, efs=500

Results (recall@100):
deep1M: glass 90.0532%, hnswlib 99.3282%
deep10M: glass 92.6417%, hnswlib 97.8977%
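For reference, a sketch of how recall@k is commonly computed. The issue does not say how the numbers above were measured, so this assumes the usual definition: the fraction of the true k nearest neighbors that appear in the returned top k, averaged over all queries. The function name `recall_at_k` and the toy data are illustrative only.

```python
def recall_at_k(results, ground_truth, k):
    """Average fraction of the true top-k IDs found in the returned top-k.

    results[q] and ground_truth[q] are lists of external IDs for query q.
    """
    hits = 0
    for found, truth in zip(results, ground_truth):
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(ground_truth))


# Toy example with two queries and k=3 (made-up IDs).
ground_truth = [[1, 2, 9], [4, 5, 6]]
results = [[1, 2, 3], [4, 5, 6]]
recall = recall_at_k(results, ground_truth, k=3)
print(recall)
```

Because the comparison is on IDs, any result list that contains internal IDs instead of external ones is scored as if the wrong points had been returned.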

cwj0bzxg (Author) commented

Note that with multiple cores, recall is also unstable, because concurrency issues make the internal and external IDs inconsistent.

Wainberg commented Jul 3, 2024

@hhy3 My own tests confirm that Glass's multi-threaded HNSW is unusable because of this bug. Please fix it!
