Improve concat_json
#2557
Signed-off-by: Nghia Truong <[email protected]>
Add `nullify_invalid_rows` parameter to `concat_json`
```cpp
// Currently, set `nullify_invalid_rows = false` as `concatenateJsonStrings` is used only for
// `from_json` with struct schema.
```
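To make the role of `nullify_invalid_rows` concrete, here is a minimal host-side C++ sketch of what a `concat_json`-style step might do. It is not the spark-rapids-jni implementation: the function `concat_json_sketch`, the `ConcatResult` struct, the `'\n'` delimiter, and the rule that a row is "invalid" when its first non-whitespace character is not `{` are all hypothetical simplifications for illustration. Null and whitespace-only rows are always flagged; other invalid rows are flagged only when `nullify_invalid_rows` is `true`.

```cpp
#include <cassert>  // for the usage checks
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type; not part of spark-rapids-jni.
struct ConcatResult {
  std::string buffer;                     // rows joined by '\n' (JSON Lines style)
  std::vector<bool> nullify_after_parse;  // true => this row becomes null after parsing
};

// JSON insignificant whitespace per RFC 8259: space, tab, line feed, carriage return.
inline bool is_json_ws(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// Index of the first non-whitespace character, or s.size() if there is none.
inline std::size_t first_non_ws(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size() && is_json_ws(s[i])) ++i;
  return i;
}

// Simplified model: assumes no input row contains the '\n' delimiter.
ConcatResult concat_json_sketch(const std::vector<std::optional<std::string>>& rows,
                                bool nullify_invalid_rows) {
  ConcatResult result;
  result.nullify_after_parse.reserve(rows.size());
  for (std::size_t r = 0; r < rows.size(); ++r) {
    const bool is_null = !rows[r].has_value();
    const std::size_t start = is_null ? 0 : first_non_ws(*rows[r]);
    const bool null_or_empty = is_null || start == rows[r]->size();
    // Assumed validity rule, for illustration only: a JSON object must start with '{'.
    const bool invalid = !null_or_empty && (*rows[r])[start] != '{';
    const bool nullify = null_or_empty || (nullify_invalid_rows && invalid);

    if (r > 0) result.buffer += '\n';
    // Flagged rows are replaced by an empty object so that the concatenated buffer
    // still yields exactly one record per input row.
    result.buffer += nullify ? std::string("{}") : *rows[r];
    result.nullify_after_parse.push_back(nullify);
  }
  return result;
}
```

In this sketch, with `nullify_invalid_rows = false` a row such as `null` or `[1]` is passed to the parser unchanged and left for it to handle; with `true` it is replaced by `{}` and flagged for nullification.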
...and this Java binding, `concatenateJsonStrings`, will be removed completely in the next PR, which adds the new `from_json` implementation.
```cpp
if (i + 3 < size &&
    (d_str[i] == 'n' && d_str[i + 1] == 'u' && d_str[i + 2] == 'l' && d_str[i + 3] == 'l')) {
  i += 4;

  // Skip the trailing whitespace characters.
  bool is_null_literal{true};
  for (; i < size; ++i) {
    ch = d_str[i];
    if (not_whitespace(ch)) {
      is_null_literal = false;
      break;
    }
  }

  // The current row contains only the `null` string literal and no other non-whitespace
  // characters. Such rows need to be masked out as null when doing concatenation.
  if (is_null_literal) {
    output[idx] = thrust::make_tuple(false, false);
    return;
  }
}
```
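The fragment above recognizes rows whose only non-whitespace content is the `null` literal; the review thread indicates this special case was removed. The following host-side C++ sketch restates the check so its behavior can be tested in isolation. The function name `is_null_literal_row` and the whitespace set are assumptions: the kernel's `not_whitespace` helper and the code preceding the fragment (presumably skipping leading whitespace) are not shown.

```cpp
#include <cassert>  // for the usage checks
#include <cstddef>
#include <string_view>

// JSON insignificant whitespace per RFC 8259 (assumed to match `not_whitespace` in the diff).
inline bool is_json_ws(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// Host-side model of the device check: true iff the row is the `null` literal
// surrounded only by whitespace.
bool is_null_literal_row(std::string_view s) {
  std::size_t i = 0;
  // Assumption: the surrounding kernel skips leading whitespace before this point.
  while (i < s.size() && is_json_ws(s[i])) ++i;
  // Equivalent to the `i + 3 < size && d_str[i] == 'n' && ...` test in the fragment.
  if (s.substr(i, 4) != "null") return false;
  i += 4;
  // Skip trailing whitespace; any other character means this is not a bare literal.
  while (i < s.size() && is_json_ws(s[i])) ++i;
  return i == s.size();
}
```

As the reviewer points out, a bare `null` row is just another invalid JSON string, so a dedicated check like this adds no distinct behavior.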
A `null` string literal is no different from any other garbage (invalid JSON) string.
FYI, Spark integration tests all passed with this removal.
This PR improves `concat_json` in `json_utils.*` by adding a parameter, `nullify_invalid_rows`, to `concat_json`. The parameter controls whether input rows containing invalid JSON objects are marked as "to be nullified" after the input JSON strings are parsed. It facilitates different behaviors when parsing JSON with different types of schema. In particular, Spark's `from_json` produces different null rows depending on whether the schema is specified as a map or a struct type: