feat!: Default to coalesce=False in left outer join #16769

Merged: 5 commits merged into main from breaking-left-join on Jun 6, 2024


stinodego (Member) commented Jun 6, 2024

Closes #13441

Changes

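  • Change the default coalesce behavior of join(how="left") from True to False. The right join key is now kept as a separate column (named with the _right suffix) instead of being merged into the left key column.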
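To make the difference concrete, here is a minimal pure-Python model of a single-key left join with a `coalesce` flag. The `left_join` helper and its list-of-dicts row representation are hypothetical and exist only for illustration; this is not how Polars implements joins. The sketch assumes unique keys in the right table and no column-name clashes other than the join key.

```python
from typing import Any, Optional

Row = dict[str, Any]


def left_join(
    left: list[Row],
    right: list[Row],
    on: str,
    *,
    coalesce: bool,
    suffix: str = "_right",
) -> list[Row]:
    """Hypothetical single-key left join, for illustration only (not Polars code)."""
    # Column names of the right table, taken from its first row (assumes it is non-empty).
    right_cols = list(right[0].keys())
    # Index right rows by key; assumes keys in `right` are unique.
    by_key = {row[on]: row for row in right}
    out: list[Row] = []
    for lrow in left:
        match: Optional[Row] = by_key.get(lrow[on])
        new = dict(lrow)  # every left row is kept, including its own key column
        for col in right_cols:
            value = None if match is None else match[col]
            if col == on:
                if not coalesce:
                    # Non-coalesced: keep the right key under a suffixed name.
                    new[on + suffix] = value
                # Coalesced: the left key already holds the merged key value.
            else:
                new[col] = value  # None where the left row found no match
        out.append(new)
    return out


df1_rows = [{"a": 1, "b": 9}, {"a": 2, "b": 9}, {"a": 3, "b": 9}]
df2_rows = [{"a": 2, "c": 0}, {"a": 3, "c": 0}, {"a": 4, "c": 0}]

print(left_join(df1_rows, df2_rows, "a", coalesce=False)[0])
# {'a': 1, 'b': 9, 'a_right': None, 'c': None}
print(left_join(df1_rows, df2_rows, "a", coalesce=True)[0])
# {'a': 1, 'b': 9, 'c': None}
```

The column order in the model (a, b, a_right, c) mirrors the "After" output below.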

Example

Before:

>>> import polars as pl
>>> df1 = pl.DataFrame({"a": [1, 2, 3], "b": [9, 9, 9]})
>>> df2 = pl.DataFrame({"a": [2, 3, 4], "c": [0, 0, 0]})
>>> df1.join(df2, on="a", how="left")
shape: (3, 3)
┌─────┬─────┬──────┐
│ a   ┆ b   ┆ c    │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ i64  │
╞═════╪═════╪══════╡
│ 1   ┆ 9   ┆ null │
│ 2   ┆ 9   ┆ 0    │
│ 3   ┆ 9   ┆ 0    │
└─────┴─────┴──────┘

After:

>>> df1.join(df2, on="a", how="left")
shape: (3, 4)
┌─────┬─────┬─────────┬──────┐
│ a   ┆ b   ┆ a_right ┆ c    │
│ --- ┆ --- ┆ ---     ┆ ---  │
│ i64 ┆ i64 ┆ i64     ┆ i64  │
╞═════╪═════╪═════════╪══════╡
│ 1   ┆ 9   ┆ null    ┆ null │
│ 2   ┆ 9   ┆ 2       ┆ 0    │
│ 3   ┆ 9   ┆ 3       ┆ 0    │
└─────┴─────┴─────────┴──────┘

Use coalesce=True to get the previous behavior:

>>> df1.join(df2, on="a", how="left", coalesce=True)
shape: (3, 3)
┌─────┬─────┬──────┐
│ a   ┆ b   ┆ c    │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ i64  │
╞═════╪═════╪══════╡
│ 1   ┆ 9   ┆ null │
│ 2   ┆ 9   ┆ 0    │
│ 3   ┆ 9   ┆ 0    │
└─────┴─────┴──────┘

Specifying coalesce=True is not required if the right key column (a_right) is never projected, because it will then not be materialized. The following produces the same result as the code above:

>>> df1.join(df2, on="a", how="left").select("a", "b", "c")
shape: (3, 3)
┌─────┬─────┬──────┐
│ a   ┆ b   ┆ c    │
│ --- ┆ --- ┆ ---  │
│ i64 ┆ i64 ┆ i64  │
╞═════╪═════╪══════╡
│ 1   ┆ 9   ┆ null │
│ 2   ┆ 9   ┆ 0    │
│ 3   ┆ 9   ┆ 0    │
└─────┴─────┴──────┘

github-actions bot added the labels breaking (Change that breaks backwards compatibility), enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), and rust (Related to Rust Polars) Jun 6, 2024
stinodego marked this pull request as draft June 6, 2024 11:15
stinodego force-pushed the breaking-left-join branch from ccbac35 to 7f688fd June 6, 2024 11:46
stinodego marked this pull request as ready for review June 6, 2024 11:48
stinodego marked this pull request as draft June 6, 2024 11:49

codecov bot commented Jun 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.31%. Comparing base (4d35be2) to head (b8dea60).
Report is 7 commits behind head on main.

Current head b8dea60 differs from pull request most recent head 80dcc03

Please upload reports for the commit 80dcc03 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16769      +/-   ##
==========================================
+ Coverage   81.29%   81.31%   +0.01%     
==========================================
  Files        1424     1424              
  Lines      187205   187208       +3     
  Branches     2714     2713       -1     
==========================================
+ Hits       152194   152233      +39     
+ Misses      34514    34478      -36     
  Partials      497      497              


stinodego marked this pull request as ready for review June 6, 2024 12:44
stinodego force-pushed the breaking-left-join branch from b8dea60 to 80dcc03 June 6, 2024 16:21

codspeed-hq bot commented Jun 6, 2024

CodSpeed Performance Report

Merging #16769 will degrade performances by 21.98%

Comparing breaking-left-join (80dcc03) with main (1fbfa08)

Summary

❌ 1 regression
✅ 36 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark               main     breaking-left-join   Change
test_groupby_h2oai_q5   1.8 ms   2.3 ms               -21.98%

ritchie46 merged commit f4549f1 into main Jun 6, 2024
27 checks passed
ritchie46 deleted the breaking-left-join branch June 6, 2024 17:11

lyngc commented Jun 13, 2024

This is going to be annoying when trying to convince the team to use Polars. We will have to explain that every time we do a left join, we need to add another parameter to avoid having _right columns all over the place.

stinodego added the skip changelog (Do not include in changelog) label Jun 16, 2024
Linked issue: Default to coalesce=False in left outer join (#13441)
3 participants