Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Fetch system index mappings from json file instead of string constants #3153

Merged

Conversation

pyek-bot
Copy link
Contributor

@pyek-bot pyek-bot commented Oct 23, 2024

Description

Moves the index mappings for system indices into json files and fetches the version from the file itself.
Version should be provided under "_meta"."schema_version"

All mappings were fetched by using the below method in MlIndex.java

public static void main(String[] args) {
        for (MLIndex index : values()) {
            System.out.println(index.getIndexName() + "\n"
                    + index.getMapping() + "\n");
        }
    }

Related Issues

Resolves #2951

Enhancements

  • Add a required_fields object under _meta object to validate if these fields are present in the mapping
  • Support placeholders to enable reusability

These enhancements will be raised in a separate PR: https://github.com/pyek-bot/ml-commons/pull/1/files

Testing

  • Mappings were compared by fetching the existing mappings from main and comparing against new mappings using the above mentioned code block
  • Added validation on mappings to ensure they follow a particular schema
  • Added a test case that initializes the enum so that any failures in mapping creation can be caught before runtime
    https://github.com/pyek-bot/ml-commons/pull/1/files
  • Added UTs for helper methods
  • Ran integration tests to ensure all indices are created
  • Manually created all system indices, fetched them via the GET api and validated the mappings

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 23, 2024 23:10 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 23, 2024 23:10 — with GitHub Actions Failure
Signed-off-by: Pavan Yekbote <[email protected]>
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 23, 2024 23:12 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 23, 2024 23:12 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 23, 2024 23:17 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 23, 2024 23:17 — with GitHub Actions Failure
Signed-off-by: Pavan Yekbote <[email protected]>
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 24, 2024 00:12 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 24, 2024 00:12 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 24, 2024 00:20 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 24, 2024 00:20 — with GitHub Actions Failure
@pyek-bot pyek-bot force-pushed the fetch_index_mappings_from_files branch from d50f8cf to 63a6d62 Compare October 24, 2024 00:22
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 24, 2024 00:22 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 24, 2024 00:22 — with GitHub Actions Failure
Signed-off-by: Pavan Yekbote <[email protected]>
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 24, 2024 17:49 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval October 24, 2024 17:49 — with GitHub Actions Failure
@pyek-bot pyek-bot marked this pull request as ready for review October 24, 2024 18:44
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2024 02:04 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2024 02:04 — with GitHub Actions Failure
@dhrubo-os
Copy link
Collaborator

How can we check that all the json files are property formatted? Can we have corresponding validator test for all those files?

@dhrubo-os
Copy link
Collaborator

IndexUtilsTest > testGetMappingFromFilesMalformedJson FAILED
    java.lang.AssertionError at IndexUtilsTest.java:88
        Caused by: java.lang.IllegalArgumentException at IndexUtilsTest.java:88
        ```

@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2024 02:39 — with GitHub Actions Failure
@pyek-bot
Copy link
Contributor Author

IndexUtilsTest > testGetMappingFromFilesMalformedJson FAILED
    java.lang.AssertionError at IndexUtilsTest.java:88
        Caused by: java.lang.IllegalArgumentException at IndexUtilsTest.java:88
        ```

This was due to the latest change with the error handling, i will fix it

@pyek-bot
Copy link
Contributor Author

pyek-bot commented Nov 26, 2024

How can we check that all the json files are property formatted? Can we have corresponding validator test for all those files?

In this enhancements PR: https://github.com/pyek-bot/ml-commons/pull/1/files, I have added a test so that we can catch any malformed mappings or failures before runtime: common/src/test/java/MLIndexTest.java

It simply references the Enum object so that the mapping path for all indices is triggered. The mapping path has inbuilt verification for these json files.

I can add explicit checks for each file but as we add system indices that will require that each file have its own test. This will also help with keeping the test suite small and not have redundant checks.

Moreover, I have added schema validation to the files themselves, the enforced schema can be found here: common/src/main/resources/index-mappings/schema.json in the above PR.

Let me know what you think!

@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2024 06:19 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2024 06:19 — with GitHub Actions Failure
Signed-off-by: Pavan Yekbote <[email protected]>
@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env-require-approval November 26, 2024 06:20 — with GitHub Actions Inactive
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2024 06:20 — with GitHub Actions Failure
@dhrubo-os
Copy link
Collaborator

How can we check that all the json files are property formatted? Can we have corresponding validator test for all those files?

In this enhancements PR: https://github.com/pyek-bot/ml-commons/pull/1/files, I have added a test so that we can catch any malformed mappings or failures before runtime: common/src/test/java/MLIndexTest.java

It simply references the Enum object so that the mapping path for all indices is triggered. The mapping path has inbuilt verification for these json files.

I can add explicit checks for each file but as we add system indices that will require that each file have its own test. This will also help with keeping the test suite small and not have redundant checks.

Moreover, I have added schema validation to the files themselves, the enforced schema can be found here: common/src/main/resources/index-mappings/schema.json in the above PR.

Let me know what you think!

It's a trade off between generic check vs tailored check. We can follow hybrid approach like for most important indices (like model, connector, config), we can add more deliberate tests. And Overall for all the indices we can have a generic malformed json checks.

@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2024 18:05 — with GitHub Actions Failure
@pyek-bot
Copy link
Contributor Author

How can we check that all the json files are property formatted? Can we have corresponding validator test for all those files?

In this enhancements PR: https://github.com/pyek-bot/ml-commons/pull/1/files, I have added a test so that we can catch any malformed mappings or failures before runtime: common/src/test/java/MLIndexTest.java
It simply references the Enum object so that the mapping path for all indices is triggered. The mapping path has inbuilt verification for these json files.
I can add explicit checks for each file but as we add system indices that will require that each file have its own test. This will also help with keeping the test suite small and not have redundant checks.
Moreover, I have added schema validation to the files themselves, the enforced schema can be found here: common/src/main/resources/index-mappings/schema.json in the above PR.
Let me know what you think!

It's a trade off between generic check vs tailored check. We can follow hybrid approach like for most important indices (like model, connector, config), we can add more deliberate tests. And Overall for all the indices we can have a generic malformed json checks.

Sure, I will add a couple tests in that PR (https://github.com/pyek-bot/ml-commons/pull/1/files) since it has most of the validation and verification logic!

@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 26, 2024 20:44 — with GitHub Actions Failure
@pyek-bot
Copy link
Contributor Author

I have addressed your comments @dhrubo-os @Zhangxunmt , please check and let me know

@dhrubo-os for explicit tests, u can find it here: pyek-bot@d8390ca

once this PR is merged, I will raise another PR for the enhancements to this feature to make it complete

@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env-require-approval November 27, 2024 00:10 — with GitHub Actions Failure
@dhrubo-os
Copy link
Collaborator

Have you done some sanity testing (end to end testing including agents setup) with this PR changes?

@pyek-bot
Copy link
Contributor Author

pyek-bot commented Nov 27, 2024

Have you done some sanity testing (end to end testing including agents setup) with this PR changes?

Yup, I have done some functional testing for these components. However, the most critical test imo is that I have compared the new mappings and old mappings via a script after having extracted them as mentioned in the description.

I plan to run the same script for the enhancements PR as well.

@dhrubo-os dhrubo-os merged commit 78a304a into opensearch-project:main Nov 27, 2024
7 of 8 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 27, 2024
…tring constants (#3153)

* feat(index mappings): fetch mappings and version from json file instead of string constants

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: changing exception being thrown

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: remove unused file

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: fix typo in comment

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: adding new line at the end of files

Signed-off-by: Pavan Yekbote <[email protected]>

* feat: add test cases

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: remove test code

Signed-off-by: Pavan Yekbote <[email protected]>

* fix(test): in main the versions were not updated appropriately

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: move mapping templates under common module

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: ensure that conversationindexconstants reference mlindex enums rather than use their own mappings

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: update comment

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: rename dir from mappings to index-mappings

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: add null checks

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: adding dependencies for testing

Signed-off-by: Pavan Yekbote <[email protected]>

* fix(test): compare json object rather than strings to avoid eol character issue

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: combine if statements into single check

Signed-off-by: Pavan Yekbote <[email protected]>

* refactoring: null handling + clean code

Signed-off-by: Pavan Yekbote <[email protected]>

* spotless apply

Signed-off-by: Pavan Yekbote <[email protected]>

---------

Signed-off-by: Pavan Yekbote <[email protected]>
(cherry picked from commit 78a304a)
Zhangxunmt pushed a commit that referenced this pull request Dec 4, 2024
…tring constants (#3153) (#3239)

* feat(index mappings): fetch mappings and version from json file instead of string constants

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: changing exception being thrown

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: remove unused file

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: fix typo in comment

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: adding new line at the end of files

Signed-off-by: Pavan Yekbote <[email protected]>

* feat: add test cases

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: remove test code

Signed-off-by: Pavan Yekbote <[email protected]>

* fix(test): in main the versions were not updated appropriately

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: move mapping templates under common module

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: ensure that conversationindexconstants reference mlindex enums rather than use their own mappings

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: update comment

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: rename dir from mappings to index-mappings

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: add null checks

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: adding dependencies for testing

Signed-off-by: Pavan Yekbote <[email protected]>

* fix(test): compare json object rather than strings to avoid eol character issue

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: combine if statements into single check

Signed-off-by: Pavan Yekbote <[email protected]>

* refactoring: null handling + clean code

Signed-off-by: Pavan Yekbote <[email protected]>

* spotless apply

Signed-off-by: Pavan Yekbote <[email protected]>

---------

Signed-off-by: Pavan Yekbote <[email protected]>
(cherry picked from commit 78a304a)

Co-authored-by: Pavan Yekbote <[email protected]>
tkykenmt pushed a commit to tkykenmt/ml-commons that referenced this pull request Dec 15, 2024
…tring constants (opensearch-project#3153)

* feat(index mappings): fetch mappings and version from json file instead of string constants

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: changing exception being thrown

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: remove unused file

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: fix typo in comment

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: adding new line at the end of files

Signed-off-by: Pavan Yekbote <[email protected]>

* feat: add test cases

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: remove test code

Signed-off-by: Pavan Yekbote <[email protected]>

* fix(test): in main the versions were not updated appropriately

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: move mapping templates under common module

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: ensure that conversationindexconstants reference mlindex enums rather than use their own mappings

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: update comment

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: rename dir from mappings to index-mappings

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: add null checks

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: adding dependencies for testing

Signed-off-by: Pavan Yekbote <[email protected]>

* fix(test): compare json object rather than strings to avoid eol character issue

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: combine if statements into single check

Signed-off-by: Pavan Yekbote <[email protected]>

* refactoring: null handling + clean code

Signed-off-by: Pavan Yekbote <[email protected]>

* spotless apply

Signed-off-by: Pavan Yekbote <[email protected]>

---------

Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: tkykenmt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement] Use index mapping json file
5 participants