distances of vectors in the database #38374

maxshatskiy · 2024-12-11T09:00:29Z

maxshatskiy
Dec 11, 2024

I am not sure whether the following is possible, or how to do it efficiently with Milvus.
I have an index with more than 1 million vectors, which belong to different classes (A, B, C) and have additional attributes. I need all pairwise distances to the 1..k nearest neighbours within a subset of a certain class.

For example, I can do it in the following way. First, get a subset of embeddings for a certain class from the database:
embeddings = query(X1 < attribute < X2 and class == A, limit=1000)

and then search against the database with those embeddings and the same filter:
results = search(embeddings, filter = X1 < attribute < X2 and class == A)

However, returning embeddings is expensive, so this approach is prohibitively slow.

Is there a way to do this internally in Milvus, without first returning the embeddings and then doing a similarity search?
Or is there an efficient way to return embeddings?

Answered by yhmo

yhmo · 2024-12-11T09:35:25Z

yhmo
Dec 11, 2024
Collaborator

Currently, there is no such interface.
If we don't return the embeddings, how would we know which entities were returned by query(X1 < attribute < X2 and class == A, limit=1000)? And how would we then know which entity each search result belongs to?

If you are using an in-memory index, the embeddings are loaded into memory, so query() performance is acceptable when returning embeddings, perhaps hundreds of milliseconds.

By the way, since the embeddings belong to different classes A, B, C, why not create a collection for each class?
collection_A for class A
collection_B for class B

embeddings = collection_A.query(X1 < attribute < X2, limit=1000)
results = collection_A.search(embeddings, filter = X1 < attribute < X2)
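
For readers who want to try the two-step workaround discussed here (query for the embeddings, then search with them under the same filter), a minimal sketch with pymilvus's `MilvusClient` might look like the following. This is not a built-in Milvus feature. The function name `within_class_knn`, the field names `id` and `embedding`, and the example filter string are assumptions for illustration. The sketch also assumes each search hit is a dict with `"id"` and `"distance"` keys, which may differ between pymilvus versions. It requests `k + 1` hits because a stored vector normally finds itself as its own nearest neighbour.

```python
from typing import Any, Dict, List, Tuple


def within_class_knn(
    client: Any,                      # expected to be a pymilvus MilvusClient
    collection: str,
    expr: str,                        # scalar filter shared by both steps
    k: int = 10,
    limit: int = 1000,
    id_field: str = "id",             # assumed primary-key field name
    vector_field: str = "embedding",  # assumed vector field name
) -> Dict[Any, List[Tuple[Any, float]]]:
    """Map each matching entity id to up to k (neighbour_id, distance) pairs."""
    # Step 1: fetch ids and embeddings of the subset. Returning the id along
    # with the vector is what lets us attribute search results to entities.
    rows = client.query(
        collection_name=collection,
        filter=expr,
        output_fields=[id_field, vector_field],
        limit=limit,
    )
    if not rows:
        return {}

    # Step 2: one batched search using the fetched vectors and the same filter.
    hits_per_query = client.search(
        collection_name=collection,
        data=[row[vector_field] for row in rows],
        filter=expr,
        limit=k + 1,
        anns_field=vector_field,
    )

    result = {}
    for row, hits in zip(rows, hits_per_query):
        source_id = row[id_field]
        # Drop the self-match (if the index returned it) and keep at most k.
        neighbours = [(h["id"], h["distance"]) for h in hits if h["id"] != source_id]
        result[source_id] = neighbours[:k]
    return result


# Example call (needs a running Milvus server; all names are hypothetical):
# from pymilvus import MilvusClient
# client = MilvusClient(uri="http://localhost:19530")
# knn = within_class_knn(client, "all_classes",
#                        'class == "A" and attribute > 10 and attribute < 20', k=5)
```

The same function would work unchanged against a per-class collection such as `collection_A`, with the class condition dropped from the filter.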

maxshatskiy · 2024-12-11T10:16:47Z

maxshatskiy
Dec 11, 2024
Author

Thanks for the fast reply.
"Then how can we know the results belong to which entity?" I thought there might be something built in that would be faster than returning embeddings.
"By the way, as the embeddings belong to different classes A B C, why not create a collection for each class?" I have a use case where I have to search across all classes as well. But this is a good idea. Maybe I will create two databases: the first with all classes and an approximate search, and the second with a collection for each class, as you suggested, and a slower but more accurate search within a class. Then they can be run independently of each other.

xiaofan-luan · 2024-12-11T10:29:49Z

xiaofan-luan
Dec 11, 2024
Maintainer

I don't quite get your use case, but let me try to understand it:

You have multiple embedding categories, and you can put them into different partitions based on some meta information.

You want to query one of the partitions based on a filter.

And then batch-calculate the nearest neighbours against vectors in another partition?

So this is more like a filtered join, where we get the nearest neighbours of one set of vectors in another set of vectors.

2 replies

maxshatskiy Dec 11, 2024
Author

I have 2 use cases:
1.
Given: test vectors
Return: classes of the top k nearest neighbours
Here I need a Milvus collection with all classes.
2.
Given: test vectors + class label
Return: distances from the test vectors to their k nearest neighbours within the given class (or a subset of the given class),
and, for every vector in the given class (or subset), the distances to its k nearest neighbours among the embeddings within the same class (basically within-class distances). I thought that this part, finding distances between vectors that are already in the database, might be implemented more efficiently with some internal function than by getting the embeddings with a query and then doing a search.

For both of these cases I was using the same single collection.
But now I think that for the second use case it is better to use separate collections, as was suggested.
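
As a side note on the within-class part of use case 2: once the embeddings of the subset have been fetched with a query, the second search step could in principle be replaced by an exact computation on the client. The sketch below is one possible way to do that, not something Milvus provides. It assumes L2 distance and a subset small enough for an O(n²) loop (around 1000 vectors); the function name `exact_knn_within_set` is hypothetical. It uses only the standard library, and a vectorized NumPy version would likely be much faster.

```python
import heapq
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def exact_knn_within_set(
    vectors: Dict[Hashable, Sequence[float]], k: int
) -> Dict[Hashable, List[Tuple[Hashable, float]]]:
    """For every id, return its k nearest other ids with exact L2 distances.

    `vectors` maps entity id -> embedding (e.g. built from query() rows).
    Each neighbour list is sorted by ascending distance.
    """
    result = {}
    for source_id, source_vec in vectors.items():
        # Distances from this vector to every other vector in the subset.
        candidates = (
            (math.dist(source_vec, other_vec), other_id)
            for other_id, other_vec in vectors.items()
            if other_id != source_id
        )
        nearest = heapq.nsmallest(k, candidates)
        result[source_id] = [(other_id, dist) for dist, other_id in nearest]
    return result
```

Because the distances are computed exhaustively, the result is exact rather than approximate, at the cost of quadratic work in the subset size.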

xiaofan-luan Dec 11, 2024
Maintainer

What if you put vectors with different class labels into different partitions?
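
A rough sketch of how the partition suggestion could serve both use cases from a single collection: restrict the search to one partition for within-class queries, and omit the partition to search across all classes. It assumes one partition per class named like `class_A` (a hypothetical naming scheme) and uses the `partition_names` argument of pymilvus's `MilvusClient.search`. The helper name `search_by_class` is also an assumption.

```python
from typing import Any, List, Optional, Sequence


def search_by_class(
    client: Any,                          # expected to be a pymilvus MilvusClient
    collection: str,
    queries: Sequence[Sequence[float]],
    k: int,
    class_label: Optional[str] = None,    # None means "search all classes"
    expr: str = "",                       # optional extra scalar filter
) -> List[Any]:
    """Search one class partition, or the whole collection if no class is given."""
    kwargs = {}
    if class_label is not None:
        # Assumed naming scheme: one partition per class, e.g. "class_A".
        kwargs["partition_names"] = [f"class_{class_label}"]
    return client.search(
        collection_name=collection,
        data=[list(q) for q in queries],
        filter=expr,
        limit=k,
        **kwargs,
    )


# Example calls (need a running Milvus server; names are hypothetical):
# hits_all = search_by_class(client, "embeddings", test_vectors, k=10)
# hits_a = search_by_class(client, "embeddings", test_vectors, k=10, class_label="A")
```

Whether this is faster than a scalar filter on a class field would need to be measured on the actual data.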
