[BUG] Query workbench PPL 'parse' syntax doesn't work on nested/non-single line field values #3206

Open
opensearch1 opened this issue Dec 18, 2024 · 8 comments
Labels
bug Something isn't working PPL Piped processing language

Comments

@opensearch1

Describe the bug

The PPL 'parse' command doesn't seem to extract values from nested or multi-line fields. The documentation examples are very limited (see https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/cmd/parse.rst).

Related component

Search:Query Capabilities

To Reproduce

  1. Go to Query workbench then select PPL
  2. Query indexes that have nested field values

Expected behavior

Parse syntax should return a field value

Additional Details

[Screenshots: parse2, parse1]

@opensearch1 opensearch1 added bug Something isn't working untriaged labels Dec 18, 2024
@getsaurabh02 getsaurabh02 transferred this issue from opensearch-project/OpenSearch Dec 18, 2024
@YANG-DB YANG-DB removed the untriaged label Dec 18, 2024
@YANG-DB
Member

YANG-DB commented Dec 18, 2024

Thanks for bringing this to our attention.

@YANG-DB YANG-DB moved this to Todo in PPL Commands Dec 18, 2024
@YANG-DB YANG-DB added the PPL Piped processing language label Dec 18, 2024
@acarbonetto
Collaborator

I was able to run this query using parse on nested values, to get this result:

/_plugins/_ppl:
{
  "query" : "source=nested | parse message.info '.*a(?<name>.+)' | fields message.info, name"
}

Response: 
{
    "schema": [
        {
            "name": "message.info",
            "type": "string"
        },
        {
            "name": "name",
            "type": "string"
        }
    ],
    "datarows": [
        [
            "a",
            ""
        ],
        [
            "bab",
            "b"
        ],
        [
            "aba",
            "ba"
        ]
    ],
    "total": 3,
    "size": 3
}
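
The rows above are consistent with the pattern being matched against the whole field value, with an empty string returned when nothing matches. Here is a minimal sketch of that behaviour, using Python's `re` module as a stand-in for the Java regex engine that the plugin runs on (so the named group is spelled `(?P<name>...)` instead of `(?<name>...)`). `parse_name` is a hypothetical helper for illustration, not part of the plugin:

```python
import re

# Python spelling of the pattern from the query above.
# Java writes the named group as (?<name>...), Python as (?P<name>...).
PATTERN = re.compile(r".*a(?P<name>.+)")


def parse_name(value: str) -> str:
    """Hypothetical helper: whole-value match, empty string on no match.

    Assumption: this mirrors what the response above shows; it is not
    the plugin's actual implementation.
    """
    match = PATTERN.fullmatch(value)
    return match.group("name") if match else ""


for value in ["a", "bab", "aba"]:
    print(repr(value), "->", repr(parse_name(value)))
```

Run against the three sample values, this yields `''`, `'b'` and `'ba'`, the same as the `name` column in the response.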

The parse command doesn't work against multi-valued fields, and results in an error. For example, the mv field holds arrays in some rows:

/_plugins/_ppl:
{
  "query" : "source=nested | fields mv"
}

Response: 
{
    "schema": [
        {
            "name": "mv",
            "type": "string"
        }
    ],
    "datarows": [
        [
            [
                "aaab",
                "aab",
                "ab",
                "b"
            ]
        ],
        [
            "aaac"
        ],
        [
            [
                "d",
                "ad",
                "aad",
                "aaad"
            ]
        ]
    ],
    "total": 3,
    "size": 3
}

/_plugins/_ppl:
{
  "query" : "source=nested | parse mv '.*a(?<name>.+)' | fields mv, name"
}

Response: 
{
  "error": {
    "reason": "Invalid Query",
    "details": "failed to parse field \"mv\" with type [ARRAY]",
    "type": "SemanticCheckException"
  },
  "status": 400
}
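
Until multi-valued fields are handled, one possible workaround is to fetch the field without `parse` and apply the pattern on the client. The sketch below assumes the same whole-value matching as above and again uses Python's `re` in place of the Java engine. `parse_value` and `parse_multi` are hypothetical names, and nothing here is plugin behaviour:

```python
import re
from typing import List, Union

# Python spelling of '.*a(?<name>.+)' from the failing query.
PATTERN = re.compile(r".*a(?P<name>.+)")


def parse_value(value: str) -> str:
    """Whole-value match; empty string when the pattern does not match."""
    match = PATTERN.fullmatch(value)
    return match.group("name") if match else ""


def parse_multi(value: Union[str, List[str]]) -> Union[str, List[str]]:
    """Apply the pattern to a scalar, or to each element of a list."""
    if isinstance(value, list):
        return [parse_value(item) for item in value]
    return parse_value(value)


# The mv values exactly as they appear in the response above.
rows = [["aaab", "aab", "ab", "b"], "aaac", ["d", "ad", "aad", "aaad"]]
for mv in rows:
    print(mv, "->", parse_multi(mv))
```

Returning a list for list input keeps the result aligned with the original elements, which is only one of several reasonable designs (flattening into separate rows would be another).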

@acarbonetto
Collaborator

We could be more explicit about how the syntax works in the documentation: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/cmd/parse.rst

Also, add a limitation that multi-valued fields result in an error.

@YANG-DB
Member

YANG-DB commented Dec 18, 2024

@acarbonetto I've asked for details on the version in which this issue was discovered.

@acarbonetto
Collaborator

I think there's a problem with your match pattern: .+(?<name>.+) won't capture anything useful for name. The leading .+ is a greedy wildcard that picks up as much of the value as it can, leaving at most the last character for the name group. According to the screenshot, name is returning empty-string results (I believe that's what's going on).
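
To illustrate the greedy behaviour, here is a small sketch with Python's `re` standing in for the Java engine (the two agree on greedy quantifiers and on `.` not matching a newline by default). The sample values are made up, and the sketch assumes the pattern must match the whole value:

```python
import re

# Python spelling of .+(?<name>.+)
GREEDY = re.compile(r".+(?P<name>.+)")

single_line = "key=value"
m = GREEDY.fullmatch(single_line)
print(repr(m.group("name")))  # only the last character is left for name

# A value that spans two lines: "." stops at the newline, so the
# whole-value match fails.
multi_line = "key=value\nother=thing"
print(GREEDY.fullmatch(multi_line))

# With DOTALL, "." also matches newlines and the match succeeds again.
DOTALL = re.compile(r".+(?P<name>.+)", re.DOTALL)
print(repr(DOTALL.fullmatch(multi_line).group("name")))
```

If the field values contain line breaks, as the issue title suggests, the second case may be what produces the empty results. In Java regex the inline flag `(?s)` enables the same dot-matches-newline mode, but whether `parse` honours it has not been verified here.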

Depending on your data, you can try a pattern that anchors on literal text around the value you want, or one that uses more than one named group to split the field into several variables.
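
As a rough sketch of that idea, the example below splits one value into two named groups, each anchored by literal text. The field value and group names are hypothetical since the real data is only visible in the screenshots, and Python's `(?P<...>)` spelling again stands in for Java's `(?<...>)`:

```python
import re

# Hypothetical field value; the real data in the screenshots is not known.
value = "user=alice status=active"

# Literal text ("user=", " status=") anchors each group, so neither
# group can swallow the other's part of the value.
pattern = re.compile(r"user=(?P<user>\S+) status=(?P<status>\S+)")

match = pattern.fullmatch(value)
fields = match.groupdict() if match else {}
print(fields)
```

Using `\S+` rather than `.+` inside each group keeps the groups from overlapping, which avoids the greedy-wildcard problem described above.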

@opensearch1
Author

Hi,

This is being used in OpenSearch 2.15 under query workbench.

@opensearch1
Author

opensearch1 commented Dec 20, 2024

@acarbonetto It looks that way because I just copied the unparsed values. The value looks like a table (I'm unsure whether it's JSON or something else) containing a set of fields and values (see screenshot). The complete values appear when I hover the mouse over them or copy them.
[Screenshot: parse3]

@andy-k-improving
Contributor

> @acarbonetto It looks that way because I just copied the unparsed values. The value looks like a table (I'm unsure whether it's JSON or something else) containing a set of fields and values (see screenshot). The complete values appear when I hover the mouse over them or copy them.

@opensearch1 I can confirm that the hover behaviour is expected when displaying nested fields, similar to the geo, event and machine fields in the sample dataset that OpenSearch provides out of the box.


In this particular case, would you mind sharing the query, along with the pattern you expected the parse command to match but could not get working?

Also, in the case of a Bad Request, a Java exception should be thrown in the backend. It would be great if you could share it, if your policy allows.
