Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix post_process_function bug on sort results for rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md #3248

Conversation

tkykenmt
Copy link
Contributor

@tkykenmt tkykenmt commented Dec 3, 2024

Description

fix post_process_function bug on sort results for docs/tutorials/rerank/rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md

Related Issues

Resolves #3247

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

"post_process_function": "\n \n def dataType = \"FLOAT32\";\n \n \n if (params.result == null)\n {\n return 'no result generated';\n //return params.response;\n }\n def outputs = params.result;\n \n def sorted_outputs = outputs;\n for (int i=0; i<outputs.length; i++) {\n def idx = new BigDecimal(outputs[i].index.toString()).intValue();\n sorted_outputs[idx] = outputs[i];\n }\n def resultBuilder = new StringBuilder('[');\n for (int i=0; i<outputs.length; i++) {\n resultBuilder.append(' {\"name\": \"similarity\", \"data_type\": \"FLOAT32\", \"shape\": [1],');\n //resultBuilder.append('{\"name\": \"similarity\"}');\n \n resultBuilder.append('\"data\": [');\n resultBuilder.append(outputs[i].score);\n resultBuilder.append(']}');\n if (i<outputs.length - 1) {\n resultBuilder.append(',');\n }\n }\n resultBuilder.append(']');\n \n return resultBuilder.toString();\n "
"post_process_function": """
if (params.result == null) {
return "no result generated";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should here throw exception rather than returning the string?

Copy link
Contributor Author

@tkykenmt tkykenmt Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, throwing exception is prefer than just returning string. As far as I can see, pre-built function throws exceptions for these situations.

I'd like to know implementation rule for pre or post processing functions. Do you know coding style or rule on the function? Other sample looks not include codes to throw exceptions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should write the exception in painless.

maybe try this way?

if (params.result == null) {
    throw new IllegalArgumentException("Result is null");
}

Copy link
Contributor Author

@tkykenmt tkykenmt Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested with throwing exception. When embedding throwing exception, client sees following results. It's vague output but better than returning 200 code.

generated error by throwing exception

TransportError: TransportError(500, 'm_l_exception', 'runtime error')

generated error by output error message

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "response": "no result generated"
          }
        }
      ],
      "status_code": 200
    }
  ]
}

I'll update to replace error handling.

"pre_process_function": "\n def query_text = params.query_text;\n def text_docs = params.text_docs;\n def textDocsBuilder = new StringBuilder('[');\n for (int i=0; i<text_docs.length; i++) {\n textDocsBuilder.append('\"');\n textDocsBuilder.append(text_docs[i]);\n textDocsBuilder.append('\"');\n if (i<text_docs.length - 1) {\n textDocsBuilder.append(',');\n }\n }\n textDocsBuilder.append(']');\n def parameters = '{ \"query\": \"' + query_text + '\", \"texts\": ' + textDocsBuilder.toString() + ' }';\n return '{\"parameters\": ' + parameters + '}';\n",
"post_process_function": "\n \n def dataType = \"FLOAT32\";\n \n \n if (params.result == null)\n {\n return 'no result generated';\n //return params.response;\n }\n def outputs = params.result;\n \n def sorted_outputs = outputs;\n for (int i=0; i<outputs.length; i++) {\n def idx = new BigDecimal(outputs[i].index.toString()).intValue();\n sorted_outputs[idx] = outputs[i];\n }\n def resultBuilder = new StringBuilder('[');\n for (int i=0; i<outputs.length; i++) {\n resultBuilder.append(' {\"name\": \"similarity\", \"data_type\": \"FLOAT32\", \"shape\": [1],');\n //resultBuilder.append('{\"name\": \"similarity\"}');\n \n resultBuilder.append('\"data\": [');\n resultBuilder.append(outputs[i].score);\n resultBuilder.append(']}');\n if (i<outputs.length - 1) {\n resultBuilder.append(',');\n }\n }\n resultBuilder.append(']');\n \n return resultBuilder.toString();\n "
"post_process_function": """
if (params.result == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result is an array, should check empty array as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the check logic

}
def outputs = params.result;
def scores = new Double[outputs.length];
for (int i=0; i<outputs.length; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some explanations in the doc to explain what does this post processing function does.

It's important to mention the order retained by the document index order.

Also, because you used bigdecimal. some precision will be loss, that's ok, but please note in the tutorials.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to add more details on the tutorial based on the discussion in #3247

@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval December 15, 2024 12:59 — with GitHub Actions Failure
@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval December 15, 2024 12:59 — with GitHub Actions Failure
@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval December 15, 2024 14:02 — with GitHub Actions Failure
@tkykenmt tkykenmt temporarily deployed to ml-commons-cicd-env-require-approval December 15, 2024 14:02 — with GitHub Actions Inactive
pyek-bot and others added 9 commits December 15, 2024 23:05
…tring constants (opensearch-project#3153)

* feat(index mappings): fetch mappings and version from json file instead of string constants

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: changing exception being thrown

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: remove unused file

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: fix typo in comment

Signed-off-by: Pavan Yekbote <[email protected]>

* chore: adding new line at the end of files

Signed-off-by: Pavan Yekbote <[email protected]>

* feat: add test cases

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: remove test code

Signed-off-by: Pavan Yekbote <[email protected]>

* fix(test): in main the versions were not updated appropriately

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: move mapping templates under common module

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: ensure that conversationindexconstants reference mlindex enums rather than use their own mappings

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: update comment

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: rename dir from mappings to index-mappings

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: add null checks

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: adding dependencies for testing

Signed-off-by: Pavan Yekbote <[email protected]>

* fix(test): compare json object rather than strings to avoid eol character issue

Signed-off-by: Pavan Yekbote <[email protected]>

* refactor: combine if statements into single check

Signed-off-by: Pavan Yekbote <[email protected]>

* refactoring: null handling + clean code

Signed-off-by: Pavan Yekbote <[email protected]>

* spotless apply

Signed-off-by: Pavan Yekbote <[email protected]>

---------

Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: tkykenmt <[email protected]>
…MTest (opensearch-project#3253)

* fix uneeded call to get model_id for task api within RestMLGuardrailsIT

Following opensearch-project#3244 this IT called the task api to check the model id again however this is redundant. Instead one can directly pull the model_id upon creating the model group. Manual testing was done to see that the behavior is intact, this should help reduce the calls within a IT to make it less flaky

Signed-off-by: Brian Flores <[email protected]>

* fix ToolIntegrationWithLLMTest model undeploy race condition

Previously the test class attempted to delete a model without fully knowing if the model was undeployed in time. This change adds a waiting for 5 retries each 1 second to check the status of the model and only when undeployed will it proceed to delete the model. When the number of retries are exceeded it throws a error indicating a deeper problem. Manual testing was done to check that the model is undeployed by searching for the specific model via the checkForModelUndeployedStatus method.

Signed-off-by: Brian Flores <[email protected]>

---------

Signed-off-by: Brian Flores <[email protected]>
Signed-off-by: tkykenmt <[email protected]>
…y for potential flaky behavior due to timeout (opensearch-project#3259)

* refactor: add retries and remove bloat

Signed-off-by: Pavan Yekbote <[email protected]>

* fix: adding return for valid inputs to prevent multiple runs

Signed-off-by: Pavan Yekbote <[email protected]>

---------

Signed-off-by: Pavan Yekbote <[email protected]>
Signed-off-by: tkykenmt <[email protected]>
Signed-off-by: Andriy Redko <[email protected]>
Signed-off-by: tkykenmt <[email protected]>
…tly (opensearch-project#3198)

* Use longs when splitting model zip file

Signed-off-by: Max Lepikhin <[email protected]>

* add test

Signed-off-by: Max Lepikhin <[email protected]>

* spotless

Signed-off-by: Max Lepikhin <[email protected]>

* clean up test

Signed-off-by: Max Lepikhin <[email protected]>

---------

Signed-off-by: Max Lepikhin <[email protected]>
Signed-off-by: tkykenmt <[email protected]>
…ersions (opensearch-project#3241)

* fix for sync up job not working in 2.17 when upgraded from previous versions

Signed-off-by: Bhavana Goud Ramaram <[email protected]>

* add sync-up job in missing places

Signed-off-by: Bhavana Goud Ramaram <[email protected]>

---------

Signed-off-by: Bhavana Goud Ramaram <[email protected]>
Signed-off-by: tkykenmt <[email protected]>
@tkykenmt tkykenmt force-pushed the patch.tutorial-rerank-bge-rerank-m3-v2 branch from 6954f4b to ad747f2 Compare December 15, 2024 14:06
@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval December 15, 2024 14:07 — with GitHub Actions Failure
@tkykenmt tkykenmt had a problem deploying to ml-commons-cicd-env-require-approval December 15, 2024 14:07 — with GitHub Actions Failure
@tkykenmt tkykenmt closed this Dec 15, 2024
@tkykenmt tkykenmt deleted the patch.tutorial-rerank-bge-rerank-m3-v2 branch December 15, 2024 14:09
@tkykenmt
Copy link
Contributor Author

moving to #3277

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] post_process_function on rerank_pipeline_with_bge-rerank-m3-v2_model_deployed_on_Sagemaker.md
8 participants