This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Performance benchmarks #41

Closed
mikepalei opened this issue Feb 7, 2020 · 8 comments
Labels
question Further information is requested

Comments

@mikepalei

Hi guys,

thanks for publishing this wonderful plugin. Do you by any chance have some performance benchmarks?

I used an EC2 instance (16 CPUs, 64 GB RAM) and indexed 100K documents with vectors of size 4K. It takes 70-80 ms to execute a search query. Is there a way to speed it up further?

Many thanks,
Mike

@vamshin
Member

vamshin commented Feb 7, 2020

Hi @mikepalei,

Thanks for your interest in the k-NN plugin. We are in the process of publishing performance benchmarks (#42). Stay tuned :)

A couple of suggestions to improve performance:

  • Force merge the index down to 1 segment.

Lucene goes through each segment of a shard sequentially to answer a search query, and each segment has its own graph. Reducing the index to 1 segment leaves you with just 1 graph to search.

  • Avoid loading the stored fields.

If you are just trying out vector search (all you need is the ids of the nearest documents to the query vector), you can improve performance by telling Elasticsearch not to read the stored fields.

Example:

{
  "size": 5,
  "stored_fields": "_none_",
  "docvalue_fields": ["_id"],
  "query": {
    "knn": {
      "v": {
        "vector": [-0.16490704, -0.047262248, -0.078923926],
        "k": 50
      }
    }
  }
}
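A minimal Python sketch of the same request, using only the standard library. It assumes the vector field is named `v` as in the example, that the cluster is reachable at a base URL you pass in, and that each hit returns its id under `fields._id` (the usual shape of a `docvalue_fields` response). `knn_search_body`, `extract_ids`, and `search_ids` are hypothetical helper names, not part of the plugin.

```python
import json
import urllib.request


def knn_search_body(vector, k, size, field="v"):
    """Build a k-NN query body that skips stored fields and returns only doc ids.

    Mirrors the JSON example above; "v" is the vector field name used there.
    """
    return {
        "size": size,
        "stored_fields": "_none_",    # do not read stored fields at all
        "docvalue_fields": ["_id"],   # fetch the ids from doc values instead
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


def extract_ids(response):
    """Collect doc ids from a parsed search response.

    Assumes each hit carries its id as hit["fields"]["_id"] = [id].
    """
    return [hit["fields"]["_id"][0] for hit in response["hits"]["hits"]]


def search_ids(base_url, index, vector, k=50, size=5):
    """POST the query to <base_url>/<index>/_search and return the matching ids."""
    body = json.dumps(knn_search_body(vector, k, size)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_ids(json.load(response))
```

For example, `search_ids("http://localhost:9200", "my-index", [-0.16490704, -0.047262248, -0.078923926])` would send the query above against a hypothetical local cluster and index name.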

Also, let us know which Elasticsearch version you are using. We recently fixed multiple leaks and improved performance in opendistro-1.4. We have yet to backport these changes to other releases.

@vamshin vamshin added the question Further information is requested label Feb 7, 2020
@mikepalei
Author

Thanks @vamshin! I will certainly try that and report the results.

@mnagaya

mnagaya commented May 19, 2020

Hello,

Is there a way to control ef_search in the query?

I am trying to benchmark the k-NN plugin:
jobergum/dense-vector-ranking-performance#4

Thanks,

@vamshin
Member

vamshin commented May 19, 2020

Sorry, we cannot control ef_search in the query, but it definitely seems like something we should support. Created issue #116.

@mnagaya

mnagaya commented May 19, 2020

Thanks @vamshin! I am watching it.

@vamshin
Member

vamshin commented May 19, 2020

Sorry, we cannot control ef_search in the query, but it definitely seems like something we should support. Created issue #116.

Also, on a side note, a couple of suggestions for the benchmark:

  • Graphs are loaded into memory only when you search, and are cached after that. So you might want to run warm-up queries, maybe for a couple of minutes, before measuring. We plan to expose a warm-up API to make this easier.

  • Force merge to a single segment so that there is only one graph:
    POST /<index_name>/_forcemerge?max_num_segments=1
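The warm-up advice above could be folded into a benchmark harness along these lines. This is a rough Python sketch, not an official tool: `benchmark` and its `run_query` callback are hypothetical, the 120-second default simply mirrors the "couple of minutes" suggested above, and the p99 uses the nearest-rank method. Running it after the force merge means the numbers reflect the single-graph layout.

```python
import math
import statistics
import time


def benchmark(run_query, warmup_seconds=120.0, num_queries=1000):
    """Warm up for warmup_seconds without recording, then time num_queries calls.

    run_query: hypothetical zero-argument callable that issues one search.
    Returns latency statistics in milliseconds.
    """
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")

    # Warm-up phase: early searches load the graphs into memory, so discard them.
    deadline = time.monotonic() + warmup_seconds
    while time.monotonic() < deadline:
        run_query()

    # Measurement phase: time each query individually.
    latencies_ms = []
    for _ in range(num_queries):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank 99th percentile: the ceil(0.99 * n)-th smallest value.
    p99_index = math.ceil(0.99 * len(latencies_ms)) - 1
    return {
        "count": len(latencies_ms),
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
    }
```

To use it, pass a lambda that sends one of your real k-NN queries; keeping the query set fixed between runs makes the latencies comparable.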

You might find this link useful for indexing and search tuning: https://medium.com/@kumon/how-to-realize-similarity-search-with-elasticsearch-3dd5641b9adb

@mnagaya

mnagaya commented May 19, 2020

@vamshin Thank you for the suggestions. I will check them out.

@vamshin
Member

vamshin commented Nov 5, 2020

Duplicate of #42

@vamshin vamshin marked this as a duplicate of #42 Nov 5, 2020
@vamshin vamshin closed this as completed Nov 5, 2020
3 participants