
Optimize get_json_object by calling the main kernel only once #2129

Merged · 19 commits merged into NVIDIA:branch-24.08 on Jun 14, 2024

Conversation

@ttnghia (Collaborator) commented on Jun 7, 2024:

This PR optimizes the host code for the get_json_object function, improving performance by calling the main kernel only once.

Currently, the main kernel is called twice: once to compute the output string sizes and again to write the output strings. The idea of this work is to pre-allocate a temporary char column for writing the output without knowing the output string sizes in advance. The main kernel then writes out both the output strings and their sizes in a single call. A (relatively cheap) gathering step is then executed to extract the final output strings column.
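As a rough illustration of that flow, a minimal CUDA sketch is below. Everything in it (the kernel name, parameters, and the trivial copy used in place of the real JSON path evaluation) is a stand-in, not the actual spark-rapids-jni code.

// Minimal CUDA sketch of the single-pass idea: each row writes into its
// pre-sized slice of a padded scratch buffer and records its real output size;
// a gather step on the host side later compacts the results.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void json_path_kernel(char const* input,
                                 int64_t const* in_offsets,   // row boundaries in `input`
                                 char* scratch,               // pre-allocated, padded output buffer
                                 int64_t const* out_offsets,  // estimated write positions in `scratch`
                                 int64_t* out_sizes,          // actual output sizes, filled here
                                 int num_rows)
{
  int const row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= num_rows) { return; }

  char const* in       = input + in_offsets[row];
  int64_t const in_len = in_offsets[row + 1] - in_offsets[row];
  char* out            = scratch + out_offsets[row];

  // Stand-in for the real JSON path evaluation: copy the row through unchanged.
  for (int64_t i = 0; i < in_len; ++i) { out[i] = in[i]; }

  // Record the true output size; the host builds the final offsets from these
  // and gathers the strings out of `scratch` into the result column.
  out_sizes[row] = in_len;
}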

Closes #2124.

@ttnghia ttnghia self-assigned this Jun 7, 2024
Review thread on the scratch-buffer padding constant:

// size so that we will not write output strings into an out-of-bound position.
// Checking out-of-bound needs to be performed in the main kernel to make sure we will not have
// data corruption.
constexpr auto padding_ratio = 1.01;
A collaborator commented:
Question: how do we get this ratio? I think there are cases where the output can be larger than input_size * 1.01, such as number normalization, adding quotes under some styles, and escaping some characters.

@ttnghia (Collaborator, author) replied on Jun 11, 2024:

This is the total size, not one row's size. If we have 1M rows, this will be the size of approximately 1,010,000 rows. In other words, we have around 10,000 extra rows to avoid invalid memory access. We don't care if the data of row n is written across its boundary into row n+1, which causes corruption, since we will discard the entire buffer if such overflow is detected. The extra 10,000 rows at the end just ensure that the data of the last row will not be written out of bounds of the entire buffer.
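For illustration only, a hypothetical host-side check for that rule could look like the sketch below; the names and the exact condition are assumptions, not the project's code.

// A row "overflows" when it writes more bytes than the slot it was assigned in
// the scratch buffer, i.e. it spills into the next row's slot. If any row does
// this, the whole scratch buffer is discarded and the exact size-then-write
// path is taken instead.
#include <cstddef>
#include <cstdint>
#include <vector>

bool any_row_overflowed(std::vector<int64_t> const& actual_sizes,
                        std::vector<int64_t> const& slot_offsets)  // num_rows + 1 entries
{
  for (std::size_t i = 0; i + 1 < slot_offsets.size(); ++i) {
    if (actual_sizes[i] > slot_offsets[i + 1] - slot_offsets[i]) { return true; }
  }
  return false;
}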

@ttnghia (Collaborator, author) added on Jun 11, 2024:

The extra 10,000 rows seem to be too much. We could turn this padding into a dynamic value; for example, the padding could always be the average size of 100 rows. That should be large enough to avoid invalid memory access when writing the last row, but it is not guaranteed.

A collaborator replied:

One extreme case:

scala> val d = Seq("['\n']", "['\n\n\n\n\n\n\n\n\n\n']")
scala> val df = d.toDF
scala> df.createOrReplaceTempView("tab")
scala> spark.sql("select length(value) from tab").show()
+-------------+
|length(value)|
+-------------+
|             5|
|           17|
+-------------+

scala> spark.sql("select length(get_json_object(value, '$')) from tab").show()
+---------------------------------+
|length(get_json_object(value, $))|
+---------------------------------+
|                                 6|
|                               30|
+---------------------------------+

This case will cause invalid memory access:
rounding up, 2 * 1.01 = 3 rows; the average row size is 11, so the total allocated size is 33, but the total write size will be 35 (5 + 30).
This causes overlapped writes as expected (not an issue by itself), but the trailing write runs past the end of the buffer and causes invalid memory access.

One option:
Add a method in json_generator like:

void write_byte(char c) {
  if (curr_size >= max_allowed_size) {
    // Out of space: skip the write but keep counting, so the caller can
    // detect the overflow from curr_size afterwards.
    curr_size++;
  } else {
    *curr_pos++ = c;
    curr_size++;
  }
}

@ttnghia (Collaborator, author) replied:

In my latest code, I pad the buffer by 100 * average_row_size. In this case, avg_row_size = 22 / 2 = 11, so the buffer size will be 22 + 100 * 11, and there is no overflow.
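A back-of-the-envelope sketch of that sizing rule, with a hypothetical helper name:

// Pad the scratch buffer by 100 * average_row_size on top of the total input size.
#include <cstdint>

int64_t padded_scratch_size(int64_t total_input_bytes, int64_t num_rows)
{
  int64_t const avg_row_size = num_rows > 0 ? total_input_bytes / num_rows : 0;
  return total_input_bytes + 100 * avg_row_size;
}

// For the extreme case above: 22 input bytes over 2 rows gives avg_row_size = 11
// and a scratch size of 22 + 100 * 11 = 1122 bytes, comfortably larger than the
// actual output, so no overflow.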

@ttnghia (Collaborator, author) commented on Jun 12, 2024:

I'll add that code and check the benchmark to see how it affects overall performance.
Edit: I did attempt to implement such a bounds-checked write, but gave up because it requires modifying the entire JSON generator, JSON parser, and ftos converter. In the meantime, I have an idea to further optimize writing with shared memory (#2130), so let this be future work.

@ttnghia (Collaborator, author) commented on Jun 13, 2024:

I updated the padding, which is now computed using the max row size. There is some overhead with this, but it is small. This approach avoids invalid memory access in normal situations and also in extreme cases, such as when the column is all nulls except for one very long row.
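One plausible reading of that scheme, sketched with a hypothetical helper (the reduction needed to find the max row size is presumably the small overhead mentioned):

#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

int64_t padded_scratch_size_max(std::vector<int64_t> const& row_sizes)
{
  int64_t const total   = std::accumulate(row_sizes.begin(), row_sizes.end(), int64_t{0});
  int64_t const max_row = row_sizes.empty()
                            ? int64_t{0}
                            : *std::max_element(row_sizes.begin(), row_sizes.end());
  // Headroom for one extra max-sized row at the tail, which covers cases like a
  // column that is all nulls except one very long row.
  return total + max_row;
}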

@ttnghia (Collaborator, author) commented on Jun 11, 2024:

build

@res-life (Collaborator) commented on Jun 12, 2024:

It would be nice to add a unit test that:

  • causes the second kernel invocation (see the sketch below for one possible starting point).
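For reference, a self-contained sketch of the arithmetic behind such a test, using the extreme case from the review thread above and the original fixed 1.01 padding; the actual call into the project's get_json_object API is omitted.

// The two rows from the Spark example above have input lengths 5 and 17
// (22 bytes total); the escaped outputs have lengths 6 and 30 (36 bytes total).
// The reviewer's analysis estimates only ~33 bytes of padded scratch under the
// fixed 1.01 ratio, so the output cannot fit: the buffer must be discarded and
// the second (exact) kernel invocation must run, which is the path to test.
#include <cassert>
#include <cstddef>

int main()
{
  std::size_t const estimated_scratch = 33;      // from the analysis above
  std::size_t const actual_output     = 6 + 30;  // lengths reported by Spark above
  assert(actual_output > estimated_scratch);     // overflow, so the fallback pass runs
  return 0;
}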

@ttnghia (Collaborator, author) commented on Jun 13, 2024:

build

@ttnghia merged commit 41c1b52 into NVIDIA:branch-24.08 on Jun 14, 2024 (3 checks passed).
@ttnghia deleted the optimize_get_json branch on June 14, 2024 at 03:02.
wjxiz1992 pushed a commit to wjxiz1992/spark-rapids-jni that referenced this pull request Jun 17, 2024
Optimize get_json_object by calling the main kernel only once (NVIDIA#2129)

* Implement the CPU code
* Reimplement kernel
* Fix the kernel caller
* Optimize validity computation
* Cleanup
* Cleanup
* Cleanup
* Add comment
* Turning kernel occupancy
* Cleanup
* Change padding for the scratch buffer
* Update docs
* Add test for overflow case
* Pad the output buffer using max row size
* Update test
* Change the padding ratio

Signed-off-by: Nghia Truong <[email protected]>
Successfully merging this pull request may close these issues.

[FEA] Improve get_json_object by calling the main kernel only once