Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve perfomance of reverse function #14025

Merged
merged 3 commits into from
Jan 9, 2025
Merged

Conversation

tlm365
Copy link
Contributor

@tlm365 tlm365 commented Jan 6, 2025

Which issue does this PR close?

Closes #.

Rationale for this change

Improve performance of reverse function.

What changes are included in this PR?

Are these changes tested?

By CI.

Are there any user-facing changes?

No.

*Benchmark result

Compiling datafusion-functions v44.0.0 (/home/tailm/repos/github/datafusion/datafusion/functions)
    Finished `bench` profile [optimized] target(s) in 1m 02s
     Running benches/reverse.rs (target/release/deps/reverse-ee84c92e095fbd3f)
Gnuplot not found, using plotters backend
reverse_string_view [size=1024, str_len=8]
                        time:   [24.668 µs 24.701 µs 24.739 µs]
                        change: [-38.667% -38.454% -38.190%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe

reverse_string_view [size=1024, str_len=32]
                        time:   [72.822 µs 72.914 µs 73.004 µs]
                        change: [-39.081% -38.568% -38.119%] (p = 0.00 < 0.05)
                        Performance has improved.

reverse_string [size=1024, str_len=32]
                        time:   [69.446 µs 69.490 µs 69.542 µs]
                        change: [-40.553% -40.437% -40.332%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

reverse_string_view [size=4096, str_len=8]
                        time:   [96.201 µs 96.293 µs 96.398 µs]
                        change: [-38.778% -38.551% -38.312%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

reverse_string_view [size=4096, str_len=32]
                        time:   [279.38 µs 280.15 µs 281.21 µs]
                        change: [-40.007% -39.843% -39.678%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

reverse_string [size=4096, str_len=32]
                        time:   [276.02 µs 276.34 µs 276.72 µs]
                        change: [-40.441% -40.357% -40.266%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

@tlm365 tlm365 marked this pull request as ready for review January 6, 2025 17:37

for string in string_array.iter() {
if let Some(s) = string {
let mut reversed = String::with_capacity(s.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this allocation can be removed by using the Write impl? See https://arrow.apache.org/rust/arrow/array/type.GenericStringBuilder.html#example-incrementally-writing-strings-with-stdfmtwrite

Perhaps by iterating through the rev iterator, writing chars one at a time.


If the above is slower, it could also be interesting to see if reusing the String allocation with a clear() on every loop is faster

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonvandel Thanks for reviewing

I wonder if this allocation can be removed by using the Write impl? See https://arrow.apache.org/rust/arrow/array/type.GenericStringBuilder.html#example-incrementally-writing-strings-with-stdfmtwrite
Perhaps by iterating through the rev iterator, writing chars one at a time.

I tested this solution and the performance is not as good as this PR.

If the above is slower, it could also be interesting to see if reusing the String allocation with a clear() on every loop is faster

Indeed, this one is faster. I will update the code and provide benchmarks shortly. TYSM ❤️

.map(|string| string.map(|string: &str| string.chars().rev().collect::<String>()))
.collect::<GenericStringArray<T>>();
let mut builder: GenericStringBuilder<T> =
GenericStringBuilder::with_capacity(string_array.len(), 1024);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use the actual data size here for pre-allocation, instead of a constant 1024, the complexity of adding another argument for array size seems reasonable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@2010YOUY01 Thanks for reviewing,

I think we can use the actual data size here for pre-allocation, instead of a constant 1024, the complexity of adding another argument for array size seems reasonable

I agree that it would be better if we could pre-allocate the actual data size here, but I think it's difficult to compute accurately - it depends on context. Keeping it simple here seems reasonable as well.

Currently GenericStringBuilder have new and with_capacity to init new builder, and 1024 is default size if we using GenericStringBuilder::new (ref) that's why I choose 1024 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is a good alternative to 1024? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is a good alternative to 1024? 🤔

Yes, I think so

Copy link
Contributor Author

@tlm365 tlm365 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so

Unfortunately, it's a bit slower than the current implementation. It allocates more mem than I expected, string_array.get_array_memory_size() = 16632 when str_len = 1024, and = 66168 when str_len = 4096 🤔

Copy link
Contributor

@2010YOUY01 2010YOUY01 Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed get_array_memory_size() will overestimate source, also many other string function is not using the accurate estimation

I think it's okay to keep it simple and use the default size now, perhaps we can introduce a function to calculate only payload size in the future, and make estimation correct for all usages of GenericStringBuilder. Thank you for the experiment.

@alamb
Copy link
Contributor

alamb commented Jan 8, 2025

Nice work @tlm365 @2010YOUY01 and @simonvandel ❤️

@alamb
Copy link
Contributor

alamb commented Jan 9, 2025

🚀

@alamb alamb merged commit f9d3133 into apache:main Jan 9, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants