Computing hashes from embeddings #1
I'm trying to find a function f such that, for an embedding e of, say, 1024 floats, I have for most e1, e2 in the space of my embeddings:
One use case for such a function f would be efficient deduplication of items represented by embeddings.
I think it would be possible to train a neural net directly to be f, but I'm wondering whether the quantization techniques implemented in faiss could also be a good approach. Maybe the encodings produced by IndexLSH could work. The ones produced by a PQ index could be helpful too.
On the trained-network path: we could probably generate a set of positive and negative pairs using the existing faiss index. f would need to quantize the embedding into a small number of bytes that can be compared for equality.
But it seems to me that the quantization performed in faiss is quite similar to what we want here, although it might not be fully optimizable towards the right task.
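To make the quantization idea concrete, here is a minimal sketch of sign-random-projection hashing, which is in the spirit of what an LSH-style encoding does. It is not faiss's implementation: the function names (`make_hyperplanes`, `lsh_hash`, `hamming`), the 64-bit code length, and the noise level are all assumptions chosen for illustration, and it uses only the standard library rather than a real index.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes of dimension dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_hash(embedding, hyperplanes):
    """Pack one sign bit per hyperplane into a single integer code."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, embedding))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


# Toy data: e2 is a slightly perturbed copy of e1, e3 is unrelated.
rng = random.Random(42)
dim, n_bits = 1024, 64
e1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
e2 = [x + rng.gauss(0.0, 0.01) for x in e1]
e3 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

planes = make_hyperplanes(dim, n_bits)
h1, h2, h3 = (lsh_hash(e, planes) for e in (e1, e2, e3))

print("near-duplicate distance:", hamming(h1, h2))
print("unrelated distance:", hamming(h1, h3))
```

Near-duplicates tend to land on codes that differ in few bits, but not necessarily on identical codes, so exact-equality deduplication would likely need fewer bits or several hash tables. That trade-off is one reason a learned f might still be worth exploring.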
@rom1504 regarding the trained-network path, we might want to look at what people do in deep hashing. For example, DistillHash (https://openaccess.thecvf.com/content_CVPR_2019/papers/Yang_DistillHash_Unsupervised_Deep_Hashing_by_Distilling_Data_Pairs_CVPR_2019_paper.pdf) seems relevant: it specifically deals with the case where we can sample positive/negative pairs following some predefined criterion, and it learns a hash function that preserves the pair relationships.
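As a rough illustration of sampling positive/negative pairs under a predefined criterion, the sketch below labels pairs by cosine similarity. Everything here is an assumption made for the example: the thresholds (0.95 and 0.5), the helper names, and the brute-force pairwise loop, which stands in for the nearest-neighbour queries one would actually run against an existing faiss index. It also assumes no embedding is the zero vector.

```python
import math
import random
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sample_pairs(embeddings, pos_threshold=0.95, neg_threshold=0.5):
    """Label index pairs as positive (very similar) or negative (dissimilar).

    Pairs whose similarity falls between the two thresholds are skipped as ambiguous.
    """
    positives, negatives = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        if sim >= pos_threshold:
            positives.append((i, j))
        elif sim <= neg_threshold:
            negatives.append((i, j))
    return positives, negatives


# Toy data: three unrelated base vectors, each followed by a noisy copy.
rng = random.Random(0)
dim = 128
embeddings = []
for _ in range(3):
    base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    noisy = [x + rng.gauss(0.0, 0.05) for x in base]
    embeddings.extend([base, noisy])

positives, negatives = sample_pairs(embeddings)
print("positive pairs:", positives)
print("number of negative pairs:", len(negatives))
```

The resulting pair lists could then serve as supervision for a learned hash function, with a loss that pulls the codes of positive pairs together and pushes those of negative pairs apart.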
https://www.algolia.com/blog/ai/vectors-vs-hashes/ learns binary hashes with a neural network.
facebookresearch/faiss#2531 (comment) some thoughts here
https://docs.google.com/document/d/1AryWpV0dD_r9x82I_quUzBuRyzDotL_HHnKuNB9H3Zc/edit?usp=drivesdk more thoughts there
Also this LAION-AI/project-menu#28