Improve performance of `reverse` function #14025

tlm365 · 2025-01-06T16:50:22Z

Which issue does this PR close?

Closes #.

Rationale for this change

Improve performance of reverse function.

What changes are included in this PR?

Are these changes tested?

By CI.

Are there any user-facing changes?

No.

Benchmark result

Compiling datafusion-functions v44.0.0 (/home/tailm/repos/github/datafusion/datafusion/functions)
    Finished `bench` profile [optimized] target(s) in 1m 02s
     Running benches/reverse.rs (target/release/deps/reverse-ee84c92e095fbd3f)
Gnuplot not found, using plotters backend
reverse_string_view [size=1024, str_len=8]
                        time:   [24.668 µs 24.701 µs 24.739 µs]
                        change: [-38.667% -38.454% -38.190%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe

reverse_string_view [size=1024, str_len=32]
                        time:   [72.822 µs 72.914 µs 73.004 µs]
                        change: [-39.081% -38.568% -38.119%] (p = 0.00 < 0.05)
                        Performance has improved.

reverse_string [size=1024, str_len=32]
                        time:   [69.446 µs 69.490 µs 69.542 µs]
                        change: [-40.553% -40.437% -40.332%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

reverse_string_view [size=4096, str_len=8]
                        time:   [96.201 µs 96.293 µs 96.398 µs]
                        change: [-38.778% -38.551% -38.312%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

reverse_string_view [size=4096, str_len=32]
                        time:   [279.38 µs 280.15 µs 281.21 µs]
                        change: [-40.007% -39.843% -39.678%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

reverse_string [size=4096, str_len=32]
                        time:   [276.02 µs 276.34 µs 276.72 µs]
                        change: [-40.441% -40.357% -40.266%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

Signed-off-by: Tai Le Manh <[email protected]>

simonvandel · 2025-01-07T02:47:37Z

datafusion/functions/src/unicode/reverse.rs

+
+    for string in string_array.iter() {
+        if let Some(s) = string {
+            let mut reversed = String::with_capacity(s.len());


I wonder if this allocation can be removed by using the Write impl? See https://arrow.apache.org/rust/arrow/array/type.GenericStringBuilder.html#example-incrementally-writing-strings-with-stdfmtwrite

Perhaps by iterating through the rev iterator, writing chars one at a time.

If the above is slower, it could also be interesting to see if reusing the String allocation with a clear() on every loop is faster
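The first suggestion above can be sketched without any Arrow dependency. `ConcatBuilder`, `finish_value`, `value` and `reverse_all_with_write` are hypothetical names introduced only for this illustration; the struct merely imitates the idea of a builder that implements `std::fmt::Write`, so that reversed characters are written straight into the shared output buffer and no intermediate `String` is allocated per row. It is not the `GenericStringBuilder` API and says nothing about its performance.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for a string builder: every value lives in one
/// shared buffer, and `ends[i]` records where value `i` stops.
struct ConcatBuilder {
    data: String,
    ends: Vec<usize>,
}

impl ConcatBuilder {
    fn new() -> Self {
        Self { data: String::new(), ends: Vec::new() }
    }

    /// Marks the end of the value currently being written.
    fn finish_value(&mut self) {
        self.ends.push(self.data.len());
    }

    /// Returns the `i`-th finished value.
    fn value(&self, i: usize) -> &str {
        let start = if i == 0 { 0 } else { self.ends[i - 1] };
        &self.data[start..self.ends[i]]
    }
}

impl Write for ConcatBuilder {
    // Appending to the shared buffer is all that `Write` requires;
    // `write_char` is provided by the trait on top of this method.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.data.push_str(s);
        Ok(())
    }
}

/// Reverses each input by writing its chars, last to first, directly
/// into the builder, one char at a time.
fn reverse_all_with_write(inputs: &[&str]) -> ConcatBuilder {
    let mut builder = ConcatBuilder::new();
    for s in inputs {
        for c in s.chars().rev() {
            builder.write_char(c).expect("write_str above never fails");
        }
        builder.finish_value();
    }
    builder
}
```

Calling `reverse_all_with_write(&["abc", "héllo"])` yields a builder whose values are the reversed strings. Note that this approach issues one small write per character, which may be part of why it can lose to a single bulk append.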

@simonvandel Thanks for reviewing

> I wonder if this allocation can be removed by using the Write impl? See https://arrow.apache.org/rust/arrow/array/type.GenericStringBuilder.html#example-incrementally-writing-strings-with-stdfmtwrite
> Perhaps by iterating through the rev iterator, writing chars one at a time.

I tested this solution and the performance is not as good as this PR.

> If the above is slower, it could also be interesting to see if reusing the String allocation with a clear() on every loop is faster

Indeed, this one is faster. I will update the code and provide benchmarks shortly. TYSM ❤️
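A minimal sketch of the buffer-reuse idea discussed in this thread, using only the standard library. `reverse_each` and its `sink` parameter are hypothetical names for illustration; the `sink` closure stands in for whatever consumes each row (for example a builder's append call), and the actual code in the PR may differ. The point is that one scratch `String` is allocated once and cleared per row, since `String::clear` drops the contents but keeps the capacity.

```rust
/// Reverses every non-null input and hands the result to `sink`.
/// A single scratch buffer is reused for all rows.
fn reverse_each<'a, I, F>(inputs: I, mut sink: F)
where
    I: IntoIterator<Item = Option<&'a str>>,
    F: FnMut(Option<&str>),
{
    // Allocated lazily on first use, then reused for every later row.
    let mut reversed = String::new();
    for input in inputs {
        match input {
            Some(s) => {
                // `clear` resets the length to zero but keeps the capacity,
                // so later rows of similar size do not reallocate.
                reversed.clear();
                reversed.extend(s.chars().rev());
                sink(Some(reversed.as_str()));
            }
            // Nulls pass through unchanged.
            None => sink(None),
        }
    }
}
```

For example, `reverse_each(vec![Some("abc"), None], |v| println!("{:?}", v))` passes each reversed value (or `None`) to the closure in input order.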

2010YOUY01 · 2025-01-07T06:07:08Z

datafusion/functions/src/unicode/reverse.rs

-        .map(|string| string.map(|string: &str| string.chars().rev().collect::<String>()))
-        .collect::<GenericStringArray<T>>();
+    let mut builder: GenericStringBuilder<T> =
+        GenericStringBuilder::with_capacity(string_array.len(), 1024);


I think we can use the actual data size here for pre-allocation instead of a constant 1024; the added complexity of passing another argument for the array size seems reasonable.

@2010YOUY01 Thanks for reviewing.

> I think we can use the actual data size here for pre-allocation instead of a constant 1024; the added complexity of passing another argument for the array size seems reasonable.

I agree that it would be better if we could pre-allocate the actual data size here, but I think it's difficult to compute accurately - it depends on context. Keeping it simple here seems reasonable as well.

Currently, GenericStringBuilder has new and with_capacity to initialize a new builder, and 1024 is the default size when using GenericStringBuilder::new (ref), which is why I chose 1024 here.

I wonder if this is a good alternative to 1024? 🤔

> I wonder if this is a good alternative to 1024? 🤔

Yes, I think so

> Yes, I think so

Unfortunately, it's a bit slower than the current implementation. It allocates more memory than I expected: string_array.get_array_memory_size() = 16632 when str_len = 1024, and 66168 when str_len = 4096 🤔

I noticed that get_array_memory_size() overestimates the source size; many other string functions also do not use an accurate estimate.

I think it's okay to keep it simple and use the default size for now. Perhaps we can introduce a function that calculates only the payload size in the future, and make the estimation correct for all usages of GenericStringBuilder. Thank you for the experiment.
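To illustrate what a payload-only estimate could look like, here is a sketch under stated assumptions: `payload_len` and `reverse_preallocated` are hypothetical helpers, not existing Arrow or DataFusion functions, and a slice of `Option<&str>` stands in for the string array. It relies on one fact: reversing a string by `char` keeps its UTF-8 byte length unchanged, so the total input byte count is exactly the output data size.

```rust
/// Hypothetical helper: total UTF-8 byte length of the non-null values,
/// i.e. the payload only, without offsets, validity bits or padding.
fn payload_len(inputs: &[Option<&str>]) -> usize {
    inputs.iter().flatten().map(|s| s.len()).sum()
}

/// Reverses all values into one contiguous buffer that is sized up front.
/// `ends[i]` is the end offset of row `i`; in this sketch a null row is
/// simply stored as an empty range.
fn reverse_preallocated(inputs: &[Option<&str>]) -> (String, Vec<usize>) {
    // Reversal by `char` preserves byte length, so this capacity is exact.
    let mut data = String::with_capacity(payload_len(inputs));
    let mut ends = Vec::with_capacity(inputs.len());
    for input in inputs {
        if let Some(s) = input {
            data.extend(s.chars().rev());
        }
        ends.push(data.len());
    }
    (data, ends)
}
```

The trade-off is an extra pass over the input to sum the lengths, which is cheap here but would need benchmarking in the real function before replacing the default capacity.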

alamb · 2025-01-08T22:19:57Z

Nice work @tlm365 @2010YOUY01 and @simonvandel ❤️

alamb · 2025-01-09T21:45:54Z

🚀

Improve performance of 'reverse' function

af3a6da

Signed-off-by: Tai Le Manh <[email protected]>

github-actions bot added the functions label Jan 6, 2025

tlm365 marked this pull request as ready for review January 6, 2025 17:37

simonvandel reviewed Jan 7, 2025

2010YOUY01 reviewed Jan 7, 2025

tlm365 added 2 commits January 7, 2025 14:27

Apply suggestion change

f00aebb

Fix typo

92e74a5

alamb approved these changes Jan 8, 2025

alamb merged commit f9d3133 into apache:main Jan 9, 2025
27 checks passed
