Distances of vectors in the database #38374
-
I am not sure whether the following is possible, or how to do it efficiently with Milvus. For example, I can currently do it this way: get the subset of embeddings for a certain class from the database, and then search against the database with those embeddings and the same filter. However, returning embeddings is expensive, so this approach is prohibitively slow. Is there a way to do this inside Milvus, without first returning the embeddings and then running the similarity search?
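To pin down what is being asked, here is a minimal brute-force sketch in plain Python, independent of Milvus: take every entity of one class that passes a filter, and find its nearest neighbors among the entities that pass the same filter. The `Entity` fields, the function name `filtered_self_knn`, and the use of squared Euclidean distance are all assumptions for illustration, not anything Milvus exposes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical row: primary key, class label, scalar attribute, embedding."""
    pk: int
    label: str
    attr: float
    vec: Tuple[float, ...]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_self_knn(
    entities: List[Entity],
    keep: Callable[[Entity], bool],
    k: int,
) -> Dict[int, List[Tuple[int, float]]]:
    """For each entity passing `keep`, return its k nearest neighbors
    as (pk, squared L2 distance) among the *other* entities passing `keep`.

    The same filter selects both the query vectors and the search space,
    as in the question; results are keyed by primary key.
    """
    subset = [e for e in entities if keep(e)]
    result: Dict[int, List[Tuple[int, float]]] = {}
    for q in subset:
        candidates = [
            (other.pk, squared_l2(q.vec, other.vec))
            for other in subset
            if other.pk != q.pk  # drop the trivial self-match (distance 0)
        ]
        candidates.sort(key=lambda pair: (pair[1], pair[0]))  # distance, then pk
        result[q.pk] = candidates[:k]
    return result


if __name__ == "__main__":
    rows = [
        Entity(1, "A", 5.0, (0.0, 0.0)),
        Entity(2, "A", 6.0, (1.0, 0.0)),
        Entity(3, "A", 7.0, (3.0, 0.0)),
        Entity(4, "B", 6.0, (0.5, 0.0)),   # other class: filtered out
        Entity(5, "A", 99.0, (0.1, 0.0)),  # attribute out of range: filtered out
    ]
    in_range_a = lambda e: e.label == "A" and 4 < e.attr < 8
    print(filtered_self_knn(rows, in_range_a, k=1))
    # {1: [(2, 1.0)], 2: [(1, 1.0)], 3: [(2, 4.0)]}
```

Being brute force, this costs O(n²) distance computations for a filtered subset of n vectors, so it only makes the semantics concrete; it says nothing about how Milvus would execute such a request.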
-
Currently, there is no such interface.
If we don't return the embeddings, how would we know which entities are returned by query(X1 < attribute < X2 and class == A, limit=1000)? And then how would we know which entity the results belong to?
If you are using an in-memory index, the embeddings are already loaded into memory, so query() performs acceptably when returning embeddings, perhaps on the order of hundreds of milliseconds.
By the way, since the embeddings belong to different classes (A, B, C), why not create a collection for each class?
collection_A for class A
collection_B for class B
embeddings = collection_A.query("X1 < attribute < X2", limit=1000)
results = collection_A.search(embeddings, filter="X1 < attribute < X2")
-
Thanks for the fast reply. Then how can we know which entity the results belong to? I thought there might be something built in that would be faster than returning the embeddings.
-
I don't quite get your use case, but let me try to understand it: you have multiple embedding categories, and you can put them into different partitions based on some meta information. You want to query one of the partitions with a filter, and then batch-compute the nearest neighbors (NN) of those vectors against the vectors in another partition? So this is more like a filtered join: getting the NN of one set of vectors within another set of vectors.
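The filtered-join reading of the use case, the nearest neighbors of one filtered set of vectors within another set, can be sketched the same way. This is a hypothetical brute-force illustration in plain Python, not a Milvus API: the `Entity` record, the `filtered_join` name, the partition labels and squared L2 distance are all assumptions. It also shows one way to handle "which entity do the results belong to": carry each entity's primary key along with its vector. In Milvus itself, a search returns one result list per query vector, in the order the vectors were passed, so requesting the primary key together with the embedding in the query step should be enough to map results back to entities.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical row: primary key, partition label, scalar attribute, embedding."""
    pk: int
    label: str
    attr: float
    vec: Tuple[float, ...]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_join(
    entities: List[Entity],
    query_filter: Callable[[Entity], bool],
    target_filter: Callable[[Entity], bool],
    k: int,
) -> Dict[int, List[Tuple[int, float]]]:
    """Brute-force k-NN join: for every entity passing `query_filter`, find its
    k nearest entities (squared L2) among those passing `target_filter`."""
    # Step 1, like a filtered query: keep each vector paired with its primary key.
    queries = [(e.pk, e.vec) for e in entities if query_filter(e)]
    targets = [e for e in entities if target_filter(e)]
    joined: Dict[int, List[Tuple[int, float]]] = {}
    # Step 2, like one batched search over the target set: the pk carried
    # from step 1 identifies which entity each neighbor list belongs to.
    for pk, vec in queries:
        # Skip the self-match in case the query and target sets overlap.
        scored = ((squared_l2(vec, t.vec), t.pk) for t in targets if t.pk != pk)
        best = heapq.nsmallest(k, scored)  # ascending distance, ties by pk
        joined[pk] = [(t_pk, dist) for dist, t_pk in best]
    return joined


if __name__ == "__main__":
    rows = [
        Entity(1, "A", 5.0, (0.0, 0.0)),
        Entity(2, "A", 9.0, (2.0, 0.0)),  # fails 4 < attr < 8, so not a query
        Entity(3, "B", 1.0, (0.5, 0.0)),
        Entity(4, "B", 2.0, (3.0, 0.0)),
    ]
    print(filtered_join(
        rows,
        query_filter=lambda e: e.label == "A" and 4 < e.attr < 8,
        target_filter=lambda e: e.label == "B",
        k=2,
    ))
    # {1: [(3, 0.25), (4, 9.0)]}
```

A full sort of the candidates would give the same result as `heapq.nsmallest`; the heap just avoids sorting the whole candidate list when k is small.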