Publish blog and triage meeting details. #3027

Merged: 13 commits merged on Jun 27, 2024
2 changes: 1 addition & 1 deletion _community_members/navtat.md
@@ -1,7 +1,7 @@
---
name: Naveen Tatikonda
short_name: naveen
-photo: '/assets/media/community/members/navtat.jpg'
+photo: '/assets/media/community/members/navtat.png'
title: 'OpenSearch Community Member: Naveen Tatikonda'
primary_title: Naveen Tatikonda
breadcrumbs:
83 changes: 83 additions & 0 deletions _events/2024-0702-dev-triage-ml-commons.markdown
@@ -0,0 +1,83 @@
---
calendar_date: '2024-07-02'
eventdate: 2024-07-02 10:30:00 -0700
primary_title: Development Backlog & Triage Meeting - ml-commons - 2024-07-02
title: Development Backlog & Triage Meeting - ml-commons - 2024-07-02
online: true
signup:
url: https://www.meetup.com/opensearch/events/301901152
title: Join on Meetup

---

Join the OpenSearch ml-commons team for their next backlog & triage planning meeting.

(hosts: [Yaliang Wu](https://github.com/ylwu-amzn), [Dhrubo Saha](https://github.com/dhrubo-os), [Jing Zhang](https://github.com/jngz-es), & [Xun Zhang](https://github.com/Zhangxunmt))

---

**Join Zoom Meeting**
```
https://us02web.zoom.us/j/82164218920

Meeting ID: 821 6421 8920
Passcode: 259735

---

One tap mobile
+12532050468,,82164218920# US
+12532158782,,82164218920# US (Tacoma)

---
Dial by your location
• +1 253 205 0468 US
• +1 253 215 8782 US (Tacoma)
• +1 346 248 7799 US (Houston)
• +1 669 444 9171 US
• +1 669 900 9128 US (San Jose)
• +1 719 359 4580 US
• +1 646 558 8656 US (New York)
• +1 646 931 3860 US
• +1 689 278 1000 US
• +1 301 715 8592 US (Washington DC)
• +1 305 224 1968 US
• +1 309 205 3325 US
• +1 312 626 6799 US (Chicago)
• +1 360 209 5623 US
• +1 386 347 5053 US
• +1 507 473 4847 US
• +1 564 217 2000 US
• 877 853 5247 US Toll-free
• 888 788 0099 US Toll-free

Meeting ID: 821 6421 8920

Find your local number: https://us02web.zoom.us/u/kcboT3QOI

```

---

**Agenda:**

**Triage issues** *(add the triaged label once reviewed/ready. Issues can also be labeled as sprint backlog if we plan to queue them up next, or good first issue / help wanted when appropriate.)*

* [Backend ml-commons](https://github.com/opensearch-project/ml-commons/issues)
* [Dashboards ml-commons](https://github.com/opensearch-project/ml-commons-dashboards/issues)

**Sprint backlog** *(Examine whether it still reflects the work that we are committing to and whether it is in the right priority order)*

* [Backend ml-commons](https://github.com/opensearch-project/ml-commons/issues)
* [Dashboards ml-commons](https://github.com/opensearch-project/ml-commons-dashboards/issues)

**Backlog** *(anything we should move to sprint backlog? anything we should tag asking for help from the community?)*

* [Backend ml-commons](https://github.com/opensearch-project/ml-commons/issues)
* [Dashboards ml-commons](https://github.com/opensearch-project/ml-commons-dashboards/issues)


***Please see Meetup link for URL and required passcode.***


*By joining the Development Backlog & Triage Meeting, you grant OpenSearch, and our affiliates the right to record, film, photograph, and capture your voice and image during the Development Backlog & Triage Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.*
285 changes: 285 additions & 0 deletions _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md
@@ -0,0 +1,285 @@
---
layout: post
title: "Optimizing OpenSearch with Faiss FP16 scalar quantization: Enhancing memory efficiency and cost-effectiveness"
authors:
- naveen
- vamshin
- tal
date: 2024-06-27 00:00:00 -0700
categories:
- technical-posts
meta_keywords: faiss scalar quantization, OpenSearch k-NN plugin, FP16 scalar quantization, vector embeddings
meta_description: Learn how FP16 quantization in OpenSearch helps reduce memory requirements by up to 50% with minimal loss in quality.
has_science_table: true
---

The rise of large language models (LLMs) and generative AI has ushered in a new era of natural language processing capabilities. Vector databases have emerged as a crucial
component in this landscape, acting as external databases that can efficiently index, store, and retrieve embeddings generated by LLMs. However, as the scale and complexity
of LLMs continue to grow, vector database workloads have also increased significantly. Ingesting and querying billions of vectors can strain computational resources,
leading to higher memory requirements and increased operational costs. Faiss scalar quantization enables you to store vector embeddings with lower precision, which reduces memory consumption and, consequently, lowers costs.

## Why use Faiss scalar quantization?

When you index vectors in [OpenSearch 2.13](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) or later versions, you can configure your k-NN index to apply _scalar quantization_. Scalar quantization converts each dimension of a vector from a 32-bit floating-point (`fp32`) to a 16-bit floating-point (`fp16`) representation. Using the Faiss scalar quantizer (SQfp16), integrated in the k-NN plugin, saves about 50% of the memory with minimal reduction in recall (see [Benchmarking results](#benchmarking-results)). When used with [SIMD optimization](https://opensearch.org/docs/latest/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput.
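
To get a feel for what the `fp32` to `fp16` conversion does to individual values, the following Python sketch round-trips a small vector through half precision using the standard library `struct` module (format character `e` is IEEE 754 binary16). The sample vector and the helper name `to_fp16` are made up for illustration, and the sketch shows only the storage and precision trade-off, not how Faiss implements its encoder internally.

```python
import struct


def to_fp16(values):
    """Round each value to half precision and return (values, packed size in bytes)."""
    # "e" packs an IEEE 754 half-precision float: 2 bytes per dimension.
    packed = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values)}e", packed)), len(packed)


# Hypothetical 8-dimensional vector, used only for illustration.
vector = [55.82, 34.67, -1278.23, 90.62, 8.36, 0.1, -0.5, 1024.0]

# "f" packs a single-precision float: 4 bytes per dimension.
fp32_bytes = len(struct.pack(f"<{len(vector)}f", *vector))
quantized, fp16_bytes = to_fp16(vector)

print(f"fp32: {fp32_bytes} bytes, fp16: {fp16_bytes} bytes")
for original, stored in zip(vector, quantized):
    print(f"{original:>10} -> {stored}")
```

The vector data itself takes half the space, while each stored value differs from the original only by a small rounding error. Note that `struct.pack` raises `OverflowError` for values that do not fit in half precision, which mirrors the range restriction described in the next section.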

## How to use Faiss scalar quantization

To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index:

```json
PUT /test-index
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 8,
"method": {
"name": "hnsw",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"encoder": {
"name": "sq",
"parameters": {
"type": "fp16",
"clip": true
}
},
"ef_construction": 256,
"m": 8
}
}
}
}
}
}
```

For more information about the SQ parameters, see the [k-NN documentation](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#sq-parameters).
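
If you create indexes programmatically, the same request body can be assembled in code. The following Python sketch builds the mapping shown above as a dictionary. The helper name `fp16_index_body` and its default arguments are hypothetical, chosen to match the example request, and sending the body to a cluster is intentionally left out.

```python
import json


def fp16_index_body(field_name, dimension, m=8, ef_construction=256, clip=False):
    """Build a create-index body that enables the Faiss SQ encoder with fp16."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field_name: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {
                            # The encoder block is what turns on scalar quantization.
                            "encoder": {
                                "name": "sq",
                                "parameters": {"type": "fp16", "clip": clip},
                            },
                            "ef_construction": ef_construction,
                            "m": m,
                        },
                    },
                }
            }
        },
    }


body = fp16_index_body("my_vector1", dimension=8, clip=True)
print(json.dumps(body, indent=2))
```

Keeping the encoder settings in one function makes it easier to create otherwise identical indexes with and without quantization when comparing recall and memory usage.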

The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the range **[-65504.0, 65504.0]**.

The `clip` parameter above specifies how to handle out-of-range values:


* By default, `clip` is `false`, and any vectors containing out-of-range values are rejected.
* When `clip` is set to `true`, out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector is indexed as `[65504.0, -65504.0]`.

**Note**: We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall.
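
The two `clip` behaviors can be mimicked on the client side, which is useful for checking a dataset before ingestion. The following Python sketch is an illustration of the semantics described above, not the plugin's implementation; the function name `prepare_vector` and the error message are hypothetical.

```python
FP16_MAX = 65504.0  # bounds of the supported range stated above


def prepare_vector(vector, clip=False):
    """Reject or clamp out-of-range values, following the described clip semantics."""
    if clip:
        # Clamp every element into [-65504.0, 65504.0].
        return [max(-FP16_MAX, min(FP16_MAX, v)) for v in vector]
    out_of_range = [v for v in vector if not -FP16_MAX <= v <= FP16_MAX]
    if out_of_range:
        raise ValueError(f"values outside the fp16 range: {out_of_range}")
    return list(vector)


print(prepare_vector([65510.82, -65504.1], clip=True))

try:
    prepare_vector([65510.82, -65504.1])
except ValueError as err:
    print("rejected:", err)
```

Counting how many elements the `clip=False` path would reject is a quick way to decide whether clipping is acceptable for a dataset, in line with the recommendation to clip only when very few elements are out of range.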

During ingestion, make sure each dimension of the vector is within the supported range ([-65504.0, 65504.0]):

```json
PUT test-index/_doc/1
{
"my_vector1": [-65504.0, 65503.845, 55.82, -65300.456, 34.67, -1278.23, 90.62, 8.36]
}
```

During querying, there is no range limitation for the query vector:

```json
GET test-index/_search
{
"size": 2,
"query": {
"knn": {
"my_vector1": {
"vector": [265436.876, -120906.256, 99.84, 89.45, 100000.45, 9.23, -70.17, 6.93],
"k": 2
}
}
}
}
```

## HNSW memory estimation with fp16

The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector.

As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows:

`1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB`
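
The estimate is easy to script for other index sizes. The following Python sketch evaluates the formula above; the function name and the `bytes_per_dimension` parameter are hypothetical. The value `2` corresponds to `fp16`, and passing `4` to model unquantized `fp32` vectors is an assumption based on 4 bytes per dimension. Note that the 0.656 figure results from dividing the byte count by 1024^3.

```python
GIB = 1024 ** 3  # the 0.656 figure above corresponds to dividing by 1024^3


def hnsw_memory_bytes(num_vectors, dimension, m, bytes_per_dimension=2):
    """Estimate HNSW memory as 1.1 * (bytes_per_dimension * dimension + 8 * M) per vector."""
    return 1.1 * (bytes_per_dimension * dimension + 8 * m) * num_vectors


fp16_estimate = hnsw_memory_bytes(1_000_000, dimension=256, m=16)
# Assumption: fp32 vectors use the same formula with 4 bytes per dimension.
fp32_estimate = hnsw_memory_bytes(1_000_000, dimension=256, m=16, bytes_per_dimension=4)

print(f"fp16: {fp16_estimate / GIB:.3f} GB")
print(f"fp32: {fp32_estimate / GIB:.3f} GB")
```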

For more information about memory estimation for scalar quantization with the inverted file (IVF) algorithm, refer to [this documentation](https://opensearch.org/docs/latest/search-plugins/knn/knn-vector-quantization/#memory-estimation-1).

## Benchmarking results

We ran benchmarking tests on several popular datasets using our [opensearch-benchmark](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch) tool
to compare the indexing performance, search performance, and search result quality of Faiss scalar quantization (FP16) against Faiss with float vectors without any encoding (FP32). All tests were performed with [SIMD](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#simd-optimization-for-the-faiss-engine) (Single Instruction Multiple Data) enabled on x86 architecture with AVX2 optimization.

**Note**: Without SIMD optimization (AVX2 or NEON) or with AVX2 disabled (on x86 architecture), the quantization process introduces additional overhead, which leads to an increase in latency.
For information about processors that support AVX2, see [CPUs with AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX2). In an AWS environment, all community Amazon Machine Images (AMIs) with [HVM](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/virtualization_types.html) support AVX2 optimization for the x86 architecture.

### Benchmarking results using small workloads

We ran the following tests on a single-node cluster without any replicas.


#### Configuration

|`m` |`ef_construction` |`ef_search` |Replicas |
|--- |--- |--- |--- |
|16 |100 |100 |0 |

The dataset and other configuration details are listed in the following table.

|Dataset ID |Dataset |Vector dimension |Data size |Number of queries |Training data range |Query data range |Space type |Primary shards |Indexing clients|
|--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |
|Dataset 1 |gist-960-euclidean |960 |1,000,000 |1,000 |[ 0.0, 1.48 ] |[ 0.0, 0.729 ] |L2 |8 |16|
|Dataset 2 |mnist-784-euclidean |784 |60,000 |10,000 |[ 0.0, 255.0 ] |[ 0.0, 255.0 ] |L2 |1 |2|
|Dataset 3 |cohere-wiki-simple-embeddings-768 |768 |475,858 |10,000 |[ -4.1561704, 5.5478516 ] |[ -4.065383, 5.4902344 ] |L2 |4 |8|
|Dataset 4 |cohere-ip-1m |768 |1,000,000 |10,000 |[ -4.1073565, 5.504557 ] |[ -4.109505, 5.4809895 ] |innerproduct |8 |16|
|Dataset 5 |sift-128-euclidean |128 |1,000,000 |10,000 |[ 0.0, 218.0 ] |[ 0.0, 184.0 ] |L2 |8 |16|

#### Recall and memory results

|Dataset ID |Faiss HNSW recall@100 |Faiss HNSW SQfp16 recall@100 |Faiss HNSW memory estimate (GB) |Faiss HNSW SQfp16 memory estimate (GB) |Faiss HNSW memory usage (GB) |Faiss HNSW SQfp16 memory usage (GB) |% reduction in memory |
|--- |--- |--- |--- |--- |--- |--- |--- |
|Dataset 1 | 0.91 | 0.91 |4.07 |2.10 |3.72 |1.93 |48 |
|Dataset 2 | 0.99 | 0.99 |0.20 |0.10 |0.18 |0.10 |44|
|Dataset 3 | 0.95 | 0.95 |1.56 |0.81 |1.43 |0.75 |48|
|Dataset 4 | 0.94 | 0.94 |3.28 |1.70 |3.00 |1.57 |48|
|Dataset 5 | 0.99 | 0.99 |0.66 |0.39 |0.62 |0.38 |39|

#### Indexing and query results

|Dataset ID |Faiss HNSW mean throughput (docs/sec) |Faiss HNSW SQfp16 mean throughput (docs/sec) |Faiss HNSW p90 (ms) |Faiss HNSW SQfp16 p90 (ms) |Faiss HNSW p99 (ms) |Faiss HNSW SQfp16 p99 (ms) |
|--- |--- |--- |--- |--- |--- |--- |
|Dataset 1 |4681 |4696 |4.97 |5.08 |5.54 |5.50|
|Dataset 2 |4271 |4580 |2.01 |2.06 |2.16 |2.21|
|Dataset 3 |4690 |4698 |3.35 |3.33 |3.58 |3.57|
|Dataset 4 |6044 |6129 |4.61 |4.81 |5.16 |5.37|
|Dataset 5 |115499 |102060 |2.73 |2.68 |2.96 |2.89|

#### Analysis

When comparing the benchmarking results, note that:

* The recall obtained using Faiss HNSW SQfp16 matches that of Faiss HNSW (with a negligible difference).
* Using SQfp16, there is a significant reduction in memory usage of up to **48%**, with a slight reduction in disk usage. These results indicate that a larger vector dimension leads to greater memory reduction.
* When using SQfp16, the performance metrics are similar to those of `fp32` vectors.
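
The link between vector dimension and memory reduction follows from the HNSW estimate: the graph overhead (`8 * M` bytes per vector) is not quantized, so it dilutes the savings more for low-dimensional vectors. The following Python sketch computes the estimated reduction for the dimensions used in these benchmarks. It assumes that `fp32` vectors follow the same formula with 4 bytes per dimension; the function name is hypothetical.

```python
def estimated_reduction(dimension, m=16):
    """Fraction of estimated HNSW memory saved by fp16 relative to fp32."""
    fp32_bytes = 4 * dimension + 8 * m  # assumption: 4 bytes per dimension for fp32
    fp16_bytes = 2 * dimension + 8 * m  # per-vector term from the fp16 estimate
    # The 1.1 overhead factor appears in both estimates and cancels out.
    return 1 - fp16_bytes / fp32_bytes


for dimension in (128, 768, 784, 960):
    print(f"dimension {dimension}: {estimated_reduction(dimension):.1%} estimated reduction")
```

With `m` set to 16, the estimate is about 40% for 128 dimensions and about 48% for 960 dimensions, which is in line with the measured 39% and 48% for Dataset 5 and Dataset 1.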


### Benchmarking results using large workloads

To compare performance metrics and memory savings, we ran tests on the large-scale [Laion](https://laion.ai/about/) 100M dataset with 768 dimensions, using both Faiss HNSW SQfp16 and Faiss HNSW.


#### Configuration

| |Faiss HNSW SQfp16 |Faiss HNSW |
|--- |--- |--- |
|OpenSearch version |2.13 |2.13 |
|Engine |`faiss` |`faiss` |
|Vector dimension |768 |768 |
|Ingest vectors |100M |100M |
|Test vectors |1k |1k |
|Primary shards |36 |36 |
|Replica shards |0 |0 |
|Data nodes |4 |8 |
|Data node instance type |r5.4xlarge |r5.4xlarge |
|Cluster manager nodes |3 |3 |
|Cluster manager node instance type |c5.xlarge |c5.xlarge |
|Indexing clients |9 |9 |
|Query clients |1 |1 |
|Force merge segments |1 |1 |
|Client instance |r5.16xlarge |r5.16xlarge |

|Config ID |Optimization strategy |`m` |`ef_construction` |`ef_search` |
|--- |--- |--- |--- |--- |
|hnsw1 |Default configuration |16 |100 |100 |
|hnsw2 |Balance between latency, memory, and recall |16 |128 |128 |
|hnsw3 |Optimize for recall |16 |256 |256 |

Faiss HNSW SQfp16 requires 4 data nodes, half the number needed for Faiss HNSW (8), which reflects the approximately 50% reduction in memory requirements.
For more information about estimating the required memory and number of data nodes, see the [Appendix](#appendix-memory-and-data-node-requirement-estimation).

#### Recall and memory results

|Experiment ID |HNSW recall@1000 |HNSW SQfp16 recall@1000 |HNSW memory usage (GB) |HNSW SQfp16 memory usage (GB) |Reduction in memory (%) |
|--- |--- |--- |--- |--- |--- |
|hnsw1 |0.94 |0.94 |300.28 |157.23 |47.64 |
|hnsw2 |0.96 |0.96 |300.28 |157.23 |47.64 |
|hnsw3 |0.98 |0.98 |300.28 |157.23 |47.64 |

#### Indexing and query results

|Experiment ID |HNSW mean throughput (docs/sec) |HNSW SQfp16 mean throughput (docs/sec) |HNSW p90 (ms) |HNSW SQfp16 p90 (ms) |HNSW p99 (ms) |HNSW SQfp16 p99 (ms) |
|--- |--- |--- |--- |--- |--- |--- |
|hnsw1 |7544 |7657 |14.02 |16.99 |19.18 |20.83 |
|hnsw2 |7063 |7219 |14.21 |17.44 |18.86 |21.80 |
|hnsw3 |6004 |5848 |16.14 |20.85 |17.65 |24.73 |

#### Analysis

* For k=1000, the recall is identical for both Faiss HNSW and Faiss HNSW with SQfp16.
* Faiss HNSW with SQfp16 requires approximately half the memory of Faiss HNSW (as measured by the required number of data nodes). Based on the [k-NN stats API metrics](https://opensearch.org/docs/latest/search-plugins/knn/api/#stats), using SQfp16 reduced memory usage by 47.64%.
* In two of the three experiments, SQfp16 delivered higher indexing throughput than `fp32` vectors.
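
The memory reduction figure can be reproduced from the table values. The following is a minimal sketch that uses only the two memory numbers reported above; the function and variable names are illustrative and are not part of OpenSearch:

```python
# Memory usage values taken from the "Recall and memory results" table (GB).
hnsw_memory_gb = 300.28
hnsw_sqfp16_memory_gb = 157.23


def percent_reduction(baseline: float, reduced: float) -> float:
    """Return the percentage by which `reduced` is smaller than `baseline`."""
    return (baseline - reduced) / baseline * 100


reduction = percent_reduction(hnsw_memory_gb, hnsw_sqfp16_memory_gb)
print(f"Memory reduction: {reduction:.2f}%")
```

The result is slightly below 50% because the HNSW graph links are not quantized; only the vector values shrink from 4 bytes to 2 bytes per dimension.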

## Conclusion

Faiss SQfp16 scalar quantization is a powerful technique that provides significant memory savings while maintaining high recall performance similar to full-precision vectors. Converting vectors to a 16-bit floating-point representation can reduce memory requirements by up to 50%. When combined with SIMD optimization, SQfp16 scalar quantization also enhances indexing throughput and reduces search latency, leading to better overall performance. This method strikes an excellent balance between memory efficiency and accuracy, making it a valuable tool for large-scale similarity search applications.
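
To make the storage effect of a 16-bit representation concrete, here is a small illustrative sketch that uses Python's standard `struct` module to encode the same values as `fp32` and as IEEE 754 half precision. It only demonstrates the number format; it is not how Faiss or the k-NN plugin encodes vectors internally, and the sample values are hypothetical:

```python
import struct

# Hypothetical sample vector; real embeddings would have hundreds of dimensions.
vector = [0.12, -0.5, 3.75, 0.001]

# "f" packs 32-bit floats, "e" packs 16-bit (half-precision) floats.
fp32_bytes = struct.pack(f"<{len(vector)}f", *vector)
fp16_bytes = struct.pack(f"<{len(vector)}e", *vector)

# Decode the half-precision bytes to see how much precision was lost.
decoded = struct.unpack(f"<{len(vector)}e", fp16_bytes)
max_abs_error = max(abs(a - b) for a, b in zip(vector, decoded))

print(f"fp32 size: {len(fp32_bytes)} bytes, fp16 size: {len(fp16_bytes)} bytes")
print(f"largest absolute rounding error: {max_abs_error:.6f}")
```

The encoded size is halved, and the rounding error stays small for values well inside the half-precision range, which is consistent with recall remaining close to that of full-precision vectors.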

## Future scope

To achieve even greater memory efficiency, we plan to introduce `int8` quantization support using a [Faiss scalar quantizer](https://github.com/opensearch-project/k-NN/issues/1723) and a [Lucene scalar quantizer](https://github.com/opensearch-project/k-NN/issues/1277).
This technique will enable a 75% reduction in memory requirements (4x compression) compared to full-precision vectors, and we expect only a minimal reduction in recall.
The quantizers will accept `fp32` vectors as input, perform online training, and quantize the data into byte-sized vectors, eliminating the need for external quantization or extra training steps.

Furthermore, we aim to release binary vector support, which will enable a 32x compression rate and reduce memory consumption even further. We also plan to incorporate AVX-512 optimization, which will help reduce search latency.
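
The compression figures quoted here follow directly from the number of bits stored per dimension. The sketch below works through that arithmetic relative to 32-bit floats. It counts raw vector storage only and ignores graph and other index overhead, so actual savings may be lower; the names are illustrative:

```python
FP32_BITS = 32

# Bits stored per vector dimension for each representation.
bits_per_dimension = {"fp16": 16, "int8": 8, "binary": 1}


def compression(bits: int) -> tuple[float, float]:
    """Return (compression ratio, percent reduction) relative to fp32."""
    ratio = FP32_BITS / bits
    reduction = (1 - bits / FP32_BITS) * 100
    return ratio, reduction


for name, bits in bits_per_dimension.items():
    ratio, reduction = compression(bits)
    print(f"{name}: {ratio:g}x compression, {reduction:g}% smaller than fp32")
```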

Our ongoing analysis and tuning of OpenSearch lets you address large-scale similarity search while minimizing resource requirements and maximizing cost-effectiveness.

## Appendix: Memory and data node requirement estimation

The following estimates show the amount of memory and the number of data nodes needed for the large workload benchmarking test (100M vectors, 768 dimensions):

```
// Faiss HNSW SQfp16 memory estimation
Memory = 1.1 * (2 * dimension + 8 * M) * num_of_vectors * (1 + num_of_replicas) bytes

Let M = 16 and num_of_replicas = 0:

1.1 * (2 * 768 + 8 * 16) * 100000000 * (1 + 0) bytes ≈ 170.47 GB (1 GB = 1024^3 bytes), rounded up to 171 GB

An r5.4xlarge instance has 128 GB of memory, of which 32 GB is used for the JVM.
Assume a circuit breaker limit of 0.5.

Total available memory = (data node instance memory - JVM memory) * circuit breaker limit
Total available memory = (128 - 32) * 0.5 = 48 GB

Number of data nodes = 171 / 48 ≈ 3.56, rounded up to 4
```

```
// Faiss HNSW memory estimation
Memory = 1.1 * (4 * dimension + 8 * M) * num_of_vectors * (1 + num_of_replicas) bytes

Let M = 16 and num_of_replicas = 0:

1.1 * (4 * 768 + 8 * 16) * 100000000 * (1 + 0) bytes ≈ 327.83 GB (1 GB = 1024^3 bytes), rounded up to 328 GB

An r5.4xlarge instance has 128 GB of memory, of which 32 GB is used for the JVM.
Assume a circuit breaker limit of 0.5.

Total available memory = (data node instance memory - JVM memory) * circuit breaker limit
Total available memory = (128 - 32) * 0.5 = 48 GB

Number of data nodes = 328 / 48 ≈ 6.83, rounded up to 7, plus 1 for stability = 8
```
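
The two estimates above can be reproduced with a short script. This is a minimal sketch of the same arithmetic, assuming 1 GB = 1024^3 bytes and the instance, JVM, and circuit breaker values stated above; the function and parameter names (including `extra_nodes`) are illustrative and do not correspond to any OpenSearch API:

```python
import math

BYTES_PER_GB = 1024 ** 3


def estimate_memory_gb(bytes_per_dimension, dimension, m, num_vectors, num_replicas=0):
    """Estimate HNSW memory: 1.1 * (bytes_per_dimension * d + 8 * M) * vectors * (1 + replicas)."""
    total_bytes = (
        1.1 * (bytes_per_dimension * dimension + 8 * m) * num_vectors * (1 + num_replicas)
    )
    return total_bytes / BYTES_PER_GB


def estimate_data_nodes(memory_gb, instance_memory_gb=128, jvm_memory_gb=32,
                        circuit_breaker_limit=0.5, extra_nodes=0):
    """Round memory up to whole GB, divide by per-node available memory, and round up."""
    available_gb = (instance_memory_gb - jvm_memory_gb) * circuit_breaker_limit
    return math.ceil(math.ceil(memory_gb) / available_gb) + extra_nodes


# 2 bytes per dimension for SQfp16, 4 bytes per dimension for fp32.
sqfp16_gb = estimate_memory_gb(2, 768, 16, 100_000_000)
fp32_gb = estimate_memory_gb(4, 768, 16, 100_000_000)

print(f"SQfp16: {sqfp16_gb:.2f} GB -> {estimate_data_nodes(sqfp16_gb)} data nodes")
print(f"fp32:   {fp32_gb:.2f} GB -> {estimate_data_nodes(fp32_gb, extra_nodes=1)} data nodes")
```

Changing `dimension`, `m`, or `num_replicas` gives a rough sizing for other workloads, though real memory usage should still be confirmed using the k-NN stats API.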

## References

* [Benchmarking datasets](https://github.com/erikbern/ann-benchmarks?tab=readme-ov-file#data-sets)
* [Cohere/wikipedia-22-12-simple-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-simple-embeddings)
* [LAION](https://laion.ai/about/)
* Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., Schramowski, P., Kundurthy, S., Crowson, K., Schmidt, L., Kaczmarczyk, R., & Jitsev, J. (2022). LAION-5B: An open large-scale dataset for training next generation image-text models. arXiv (Cornell University). [https://doi.org/10.48550/arxiv.2210.08402](https://doi.org/10.48550/arxiv.2210.08402)
* Douze, Matthijs, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazar'e, Maria Lomeli, Lucas Hosseini and Herv'e J'egou. The Faiss library. [https://arxiv.org/abs/2401.08281](https://arxiv.org/abs/2401.08281)

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

View workflow job for this annotation

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Matthijs. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Matthijs. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 10}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

View workflow job for this annotation

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Alexandr. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Alexandr. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 20}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Guzhva. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Guzhva. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 29}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Chengqi. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Chengqi. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 37}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Gergely. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Gergely. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 65}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Szilvasy. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Szilvasy. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 73}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Mazar'e. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Mazar'e. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 99}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Lomeli. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Lomeli. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 114}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Hosseini. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Hosseini. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 128}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: Herv'e. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: Herv'e. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 141}}}, "severity": "ERROR"}

Check failure on line 285 in _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md

GitHub Actions / vale

[vale] _posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md#L285

[OpenSearch.Spelling] Error: J'egou. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.
Raw output
{"message": "[OpenSearch.Spelling] Error: J'egou. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_posts/2024-06-19-optimizing-opensearch-with-fp16-quantization.md", "range": {"start": {"line": 285, "column": 148}}}, "severity": "ERROR"}
Binary file removed assets/media/community/members/navtat.jpg
Binary file not shown.
Binary file added assets/media/community/members/navtat.png