Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Lucene Vector field with native engines to use KNNVectorFormat during segment creation #1945

Merged
merged 1 commit into from
Aug 12, 2024

Conversation

navneet1v
Copy link
Collaborator

@navneet1v navneet1v commented Aug 9, 2024

Description

Integrate Lucene Vector field with native engines, to use KNNVectorFormat during segment creation

What has changed and why:

  1. With this change I added the capability to use Lucene based vector field to for native engines. This feature is currently added behind a cluster setting to ensure that 2.x and main branch builds don't break. Once the code for adding the vector data structures is added for new KNNVectorsFormat this setting will be removed and new ITs will also be added.
  2. There is a loose attribute with name isIndexKNN added which will be refactored once this PR Refactor Around Mapper and Mapping #1939 is merged with FlatVectorsMapper class.
  3. If index.knn is false then we will use DocValuesBased Vector Field. Because if we use the Lucene based vector field then KNNCodec will not be triggered and default KNNFormat will be used which will create the HNSW graph as that is what default behavior of Lucene library. This is the reason why indexKNN check is added while deciding which VectorField to use.

Related Issues

#1853

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v navneet1v added backport 2.x Features Introduces a new unit of functionality that satisfies a requirement memory-reduction labels Aug 9, 2024
@navneet1v navneet1v changed the title Integrate Lucene Vector field with native engines, to use KNNVectorFormat during segment creation Integrate Lucene Vector field with native engines to use KNNVectorFormat during segment creation Aug 9, 2024
@navneet1v
Copy link
Collaborator Author

Checking why CIs are failing. it doesn't seem the issue with the code.

@jmazanec15
Copy link
Member

  1. If index.knn is false and we use the Lucene based vector field then KNNCodec will not be triggered and default KNNFormat will be used which will create the HNSW graph as that is what default behavior of Lucene library.

Does this mean if someone upgrades their index with knn=false, it is going to switch from binary doc values to lucene?

@navneet1v
Copy link
Collaborator Author

  1. If index.knn is false and we use the Lucene based vector field then KNNCodec will not be triggered and default KNNFormat will be used which will create the HNSW graph as that is what default behavior of Lucene library.

Does this mean if someone upgrades their index with knn=false, it is going to switch from binary doc values to lucene?

To limit the scope of this pr and refactoring you are doing if its not a knn index we will still BDV

@navneet1v navneet1v force-pushed the codec branch 2 times, most recently from 3b9a5a0 to 9958d0a Compare August 11, 2024 05:43
@navneet1v
Copy link
Collaborator Author

navneet1v commented Aug 11, 2024

  1. If index.knn is false and we use the Lucene based vector field then KNNCodec will not be triggered and default KNNFormat will be used which will create the HNSW graph as that is what default behavior of Lucene library.

Does this mean if someone upgrades their index with knn=false, it is going to switch from binary doc values to lucene?

No. I have clarified the description now. I see how the confusion was happening.

jmazanec15
jmazanec15 previously approved these changes Aug 11, 2024
Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Needs a rebase but approving

@navneet1v
Copy link
Collaborator Author

Looks good. Needs a rebase but approving

Thanks. I am fixing the conflicts. Will raise the PR in few hours.

jmazanec15
jmazanec15 previously approved these changes Aug 11, 2024
…mat during segment creation

Signed-off-by: Navneet Verma <[email protected]>
Copy link
Member

@ryanbogan ryanbogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment for the future, but LGTM!

@navneet1v navneet1v merged commit 5a5351f into opensearch-project:main Aug 12, 2024
30 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 12, 2024
…mat during segment creation (#1945)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 5a5351f)
navneet1v added a commit that referenced this pull request Aug 12, 2024
…mat during segment creation (#1945)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 5a5351f)
navneet1v added a commit that referenced this pull request Aug 12, 2024
…mat during segment creation (#1945) (#1948)

Signed-off-by: Navneet Verma <[email protected]>
(cherry picked from commit 5a5351f)

Co-authored-by: Navneet Verma <[email protected]>
akashsha1 pushed a commit to akashsha1/k-NN that referenced this pull request Sep 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Features Introduces a new unit of functionality that satisfies a requirement memory-reduction
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants