[DOC-12715] Document New Search Request Params for Search Fine-Tuning #291

Open
wants to merge 8 commits into
base: prerelease/7.6.4
4 changes: 4 additions & 0 deletions modules/search/examples/run-search-full-request.jsonc
@@ -47,6 +47,10 @@
"knn": [
{
"k": 10,
"params": {
"ivf_nprobe_pct": 1,
"ivf_max_codes_pct": 0.2
},
"field": "vector_field",
"vector": [ 0.707106781186548, 0, 0.707106781186548 ]
}
2 changes: 1 addition & 1 deletion modules/search/pages/search-index-params.adoc
@@ -1077,7 +1077,7 @@ The child field's type. Can be one of:

For more information about the available field data types, see xref:field-data-types-reference.adoc[].

|vector_index_optimized_for |String |Vector Only a|
|[#vector-index-optimized-param]#vector_index_optimized_for# |String |Vector Only a|

include::partial$vector-search-field-descriptions.adoc[tag=optimized_for]

40 changes: 40 additions & 0 deletions modules/search/pages/search-request-params.adoc
@@ -173,6 +173,12 @@ The Search Service returns the `k` closest vectors to the vector given in `vecto

NOTE: The <<size-limit,size or limit property>> overrides any value set in `k`.

|params |Object |No a|

Enter additional parameters to control how the Search Service compares vectors when running a Vector Search request.

For more information about the `params` object, see <<knn-params,>>.

|field |String |Yes a|

The name of the field that contains the vector data you want to search.
@@ -199,6 +205,40 @@ For more information about the dimension value, see the xref:search-index-params

|====

[#knn-params]
=== knn params Object

Use the `params` object inside a `knn` object to fine-tune the probes and centroids that the Search Service uses while running a Vector Search request.

The `params` object can contain the following properties:

[cols="1,1,1,4"]
|====
|Property |Type |Required? |Description

|ivf_nprobe_pct |Number (percentage) |No a|

Set the `ivf_nprobe_pct` value to control the percentage of clusters that the Search Service searches during a single Vector Search query, or the percentage of probes used.

The Search Service automatically calculates a default `nprobe` percentage based on the vectors in a given partition of your Vector Search index.
For more information about this calculation, see xref:vector-search:fine-tune-vector-search.adoc[].

If you set the value of `ivf_nprobe_pct` higher than this default calculated value, the Search Service will search a higher percentage of clusters in your processed vectors.
This can increase your accuracy and recall for Vector Search, but requires more compute time for each query.

In the example, the Search Service searches only `1%` of the total available clusters.

|ivf_max_codes_pct |Number (percentage out of 100) |No a|

Set the `ivf_max_codes_pct` value to control the maximum percentage of centroids that the Search Service accesses during a single Vector Search query.

The default value is 100%.

If you reduce your `ivf_max_codes_pct` value, the Search Service accesses fewer centroids, which reduces your Vector Search accuracy and recall, but gives faster compute times for your search.

In the example, the Search Service searches only `0.2%` of the available centroids in your vector data.
|====

[#query-object]
== Query Object

@@ -1,5 +1,5 @@
// tag::optimized_for[]
For a `vector` child field, choose whether the Search Service should prioritize recall or latency when returning similar vectors in search results:
For a `vector` child field, choose whether the Search Service should prioritize recall, latency, or memory efficiency when returning similar vectors in search results:

* *recall*: The Search Service prioritizes returning the most accurate result.
This may increase resource usage for Search queries.
@@ -12,6 +12,11 @@ This may reduce the accuracy of results.
+
The Search Service uses half the `nprobe` value calculated for *recall* priority.

* *memory-efficient*: The Search Service prioritizes reducing memory usage and optimizes search operations to use fewer resources.
This may degrade both accuracy (recall) and latency.
+
The Search Service uses either an inverted file index with scalar quantization, or a directly mapped index with exact vector comparisons, depending on the number of vectors in your data.

For more information about Vector Search indexes, see xref:vector-search:vector-search.adoc[] or xref:vector-search:create-vector-search-index-ui.adoc[].
// end::optimized_for[]
// tag::similarity_metric[]
151 changes: 151 additions & 0 deletions modules/vector-search/pages/fine-tune-vector-search.adoc
@@ -0,0 +1,151 @@
= Fine-Tuning a Vector Search Query
:stem: asciimath
:page-ui-name: {ui-name}
:page-product-name: {product-name}
:description: Add parameters to a Vector Search REST API call to tune the search for recall or latency.

[abstract]
{description}



The Search Service automatically tunes your Vector Search indexes to achieve a balance between:

* Recall, or the quality of your search results
* Latency, or your search response time
* Memory efficiency

This tuning occurs during indexing and querying.
You do not need to adjust these parameters manually.

Specifically, the Search Service dynamically adjusts two critical vector parameters:

`nlist`, also known as `Centroid` count::
The number of clusters used for indexing.
Centroids are used to quickly find the surrounding closest matches in the Vector Search index.
Increasing the number of centroids will increase accuracy but will decrease the speed of the search.
+
The `nlist` is determined dynamically based on the size of the dataset, or the number of vectors in a partition:
+
[%header, cols="3*a"]
|===
| Number of vectors in partition (`nvec`)
| `Centroid count` (`nlist` calculation)
| Notes

| stem:["nvec " ge " 200,000"]
| stem:[4 xx sqrt("nvec")]
| This formula is designed to handle larger datasets,
where increasing the number of clusters further does not yield significant improvements in recall.

| stem:["1000" le "nvec" le "200,000" ]
| stem:["nvec" / 100]
| This formula targets approximately 100 vectors per cluster,
which balances between too few and too many clusters, ensuring efficient indexing.

| stem:["nvec" lt 1000]
| N/A
| For fewer than 1000 vectors, the Search Service uses a straightforward one-to-one mapping between IDs and vectors, with an exact vector comparison.
Vectors are stored directly, without the additional processing needed for the `nlist` calculation.

|===

`nprobe` (or `probes`)::
The number of centroids that a Search query checks for similar vectors.
The `nprobe` value is only set when the Search Service uses an Inverted File Index.
The Search Service selects the best index type and comparison method depending on the size of the dataset and your `vector_index_optimized_for` setting.
+
For more information about the `vector_index_optimized_for` setting, see xref:search:search-index-params.adoc#vector-index-optimized-param[Search Index JSON Properties].
+
[%header, cols="3*a"]
|===
| Query optimization
| `nprobe` calculation
| Notes


| Default calculation
| stem:[sqrt("nlist")]
| This provides a balanced tradeoff between recall and latency by adjusting the number of clusters probed during queries.

| Latency-optimized calculation (`vector_index_optimized_for: latency`)
| stem:[sqrt("nlist") / 2]
| A minimum value of 1 is enforced to avoid setting `nprobe` too low.

|===
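
The two tables above can be sketched in Python as follows.
This is only an illustration of the documented formulas, not the Search Service's implementation: the function names `default_nlist` and `default_nprobe`, the rounding to the nearest integer, and the choice to apply the large-dataset formula at exactly 200,000 vectors are all assumptions made for the example.

```python
import math
from typing import Optional


def default_nlist(nvec: int) -> Optional[int]:
    """Centroid count (nlist) for a partition that holds nvec vectors.

    Returns None below 1000 vectors, where the vectors are compared
    exactly and no nlist is calculated.
    """
    if nvec < 1000:
        return None
    if nvec >= 200_000:
        # Assumption: the boundary value 200,000 uses the large-dataset formula.
        return round(4 * math.sqrt(nvec))
    # Targets approximately 100 vectors per cluster.
    return round(nvec / 100)


def default_nprobe(nlist: int, optimized_for: str = "recall") -> int:
    """Number of clusters probed per query for a given nlist."""
    nprobe = math.sqrt(nlist)
    if optimized_for == "latency":
        # Latency optimization halves the default value.
        nprobe /= 2
    # A minimum value of 1 is enforced.
    return max(1, round(nprobe))
```

For example, under these assumptions `default_nlist(50_000)` returns `500` and `default_nprobe(500)` returns `22`.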

== Default `nlist` and `nprobe` calculations on a Vector Search Index


The cluster maintains two dynamically adjusted parameters that will affect the speed, accuracy, and resources used during the search:

`ivf_nprobe_pct`::
The percentage of clusters searched during queries, allowing for fine-tuning of the balance between recall and performance.
If the value of `nprobe` is 5% of `nlist` (the centroid count), then setting the value of `ivf_nprobe_pct` higher than 5% will have the search cover a higher percentage of clusters, which will improve the accuracy of the search.

`ivf_max_codes_pct`::
The percentage of centroids that the Search Service visits during a search.
Reducing the value reduces the number of centroids visited, which decreases accuracy and recall but results in faster compute times.
The default value is 100, meaning that the search visits 100% of the centroids.


.Default calculation
====
If you have a Vector Search index with `vector_index_optimized_for` set to `"recall"` and `indexPartitions` set to `5`, then the `centroid count` (`nlist`) and `nprobe` are determined based on the current vector count in a given partition.

[options="noheader", frame="none", grid="none", cols="1,1"]
|===
| Total vectors in index (optimization = recall)
| 10,000,000

| Average vectors in a partition for 5 partitions total
| 2,000,000

| centroid count (`nlist`) = stem:[4 xx sqrt("vectors in partition")]
| 5657

| `nprobe` = stem:[sqrt("nlist")]
| 75

| Calculated default: `ivf_nprobe_pct` (`nprobe` as a percentage of `nlist`)
| approximately 1.326%

| Calculated default: `ivf_max_codes_pct`
| 100% (default value)

|===

====
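
The arithmetic in the example can be reproduced with a few lines of Python.
This is a sketch under stated assumptions: the variable names are illustrative, values are rounded to the nearest integer, and the conversion of a percentage into a cluster count is shown as a plain fraction because the Search Service's exact rounding is not documented here.

```python
import math

total_vectors = 10_000_000
index_partitions = 5

# nlist and nprobe are calculated per partition, not for the whole index.
vectors_per_partition = total_vectors // index_partitions

nlist = round(4 * math.sqrt(vectors_per_partition))
nprobe = round(math.sqrt(nlist))

# The default ivf_nprobe_pct is nprobe expressed as a percentage of nlist.
default_nprobe_pct = 100 * nprobe / nlist

# For comparison: an explicit ivf_nprobe_pct of 1 on the same partition.
override_pct = 1
clusters_at_override = nlist * override_pct / 100

print(f"nlist={nlist}, nprobe={nprobe}, default pct={default_nprobe_pct:.3f}")
print(f"clusters searched at {override_pct}%: about {clusters_at_override:.1f}")
```

With these numbers, the default works out to roughly 1.3% of clusters, so an explicit `ivf_nprobe_pct` of `1` would search slightly fewer clusters than the default in this particular index.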

== Fine-Tuning Query Parameters


You can set the values of `ivf_nprobe_pct` and `ivf_max_codes_pct` in your Vector Search queries to tune the balance between recall and latency for your search.

You can add the following parameters to your query:

[source, json]
.Using tuning parameters
----
{
"fields": ["*"],
"knn": [{
"k": 10,
"params": {
"ivf_nprobe_pct": 1,
"ivf_max_codes_pct": 0.2
},
"field": "embedding",
"vector": [0.024901132253900747, 1535]
}]
}
----

In the example above, `"fields": ["*"]` requests all fields in the search results, and the search looks in the `embedding` field for the 10 closest matches to the vector specified in `vector`.
The parameters are set to search 1% of the clusters and 0.2% of the centroids.

For more information about how to set and include these parameters in a Vector Search query, see xref:search:search-request-params.adoc#knn-params[Search Request JSON Properties].
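
As a minimal sketch, the request body can also be assembled programmatically so that `params` is only included when you override a default.
The helper name `build_knn_request`, the host `localhost:8094`, and the index name `my-vector-index` are hypothetical, the field and vector values are placeholders taken from the full request example, and authentication is omitted.

```python
import json
import urllib.request


def build_knn_request(field, vector, k=10, nprobe_pct=None, max_codes_pct=None):
    """Build a Search request body with optional knn tuning parameters."""
    knn = {"k": k, "field": field, "vector": vector}
    params = {}
    if nprobe_pct is not None:
        params["ivf_nprobe_pct"] = nprobe_pct
    if max_codes_pct is not None:
        params["ivf_max_codes_pct"] = max_codes_pct
    if params:
        # Omit params entirely to keep the Search Service defaults.
        knn["params"] = params
    return {"fields": ["*"], "knn": [knn]}


body = build_knn_request(
    "vector_field",
    [0.707106781186548, 0, 0.707106781186548],
    nprobe_pct=1,
    max_codes_pct=0.2,
)

# Hypothetical host and index name; adjust both for your cluster.
request = urllib.request.Request(
    "http://localhost:8094/api/index/my-vector-index/query",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Calling urllib.request.urlopen(request) would send the query;
# it is left out here because it needs a running cluster and credentials.
```

Leaving both tuning arguments unset produces a `knn` object without `params`, which keeps the automatically calculated values.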





2 changes: 2 additions & 0 deletions modules/vector-search/partials/nav.adoc
@@ -5,3 +5,5 @@
** xref:7.6@server:vector-search:run-vector-search-ui.adoc[]
** xref:7.6@server:vector-search:run-vector-search-rest-api.adoc[]
** xref:7.6@server:vector-search:run-vector-search-sdk.adoc[]
** xref:7.6@server:vector-search:fine-tune-vector-search.adoc[]