feat: Add `index_of()` function to `Series` and `Expr` #19894

itamarst · 2024-11-20T18:14:30Z

Categoricals don't work yet; see #20171 and #20318.

codecov · 2024-11-20T18:49:19Z

Codecov Report

Attention: Patch coverage is 98.68421% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.00%. Comparing base (72cd66a) to head (8c3e0d4).
Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-ops/src/series/ops/index_of.rs | 98.76% | 1 Missing ⚠️ |
| .../polars-python/src/lazyframe/visitor/expr_nodes.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #19894      +/-   ##
==========================================
+ Coverage   78.95%   79.00%   +0.04%     
==========================================
  Files        1564     1566       +2     
  Lines      220882   221035     +153     
  Branches     2510     2510              
==========================================
+ Hits       174407   174619     +212     
+ Misses      45900    45842      -58     
+ Partials      575      574       -1

itamarst · 2024-12-02T18:31:29Z

I think I've figured out how to use row encoding, so now I just need to write lots and lots of tests and make sure it actually works beyond the trivial case I've already tested.

itamarst · 2024-12-02T23:04:58Z

Unfortunately, Categorical and Enum don't work (they also don't work for `search_sorted()`, which would be nice to fix). They ought to work, since e.g. `pl.Series(["A", "B"], dtype=pl.Categorical) == "B"` works, but I'm not sure how that differs from what I'm doing, so I'd appreciate any hints.

E.g. for Categorical:

>>> import polars as pl
>>> pl.Series(["a", "b", "a"], dtype=pl.Categorical).index_of("a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/itamarst/devel/polars/py-polars/polars/series/series.py", line 4771, in index_of
    return F.select(F.lit(self).index_of(element)).item()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/itamarst/devel/polars/py-polars/polars/functions/lazy.py", line 1913, in select
    return pl.DataFrame().select(*exprs, **named_exprs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/itamarst/devel/polars/py-polars/polars/dataframe/frame.py", line 9113, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/itamarst/devel/polars/py-polars/polars/lazyframe/frame.py", line 2029, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: got invalid or ambiguous dtypes: '[cat, str]' in expression 'index_of'

Consider explicitly casting your input types to resolve potential ambiguity.

Resolved plan until failure:

        ---> FAILED HERE RESOLVING 'select' <---
 SELECT [Series.index_of([String(a)])] FROM
  DF []; PROJECT */0 COLUMNS; SELECTION: None

coastalwhite · 2024-12-04T07:51:10Z

My guess is that you are treating the categorical as a string when it goes into the row encoding. If you want to compare the row encoding of one series with the row encoding of another, both need to have been encoded with the exact same dtype (i.e. the same RevMap as well); otherwise the output is undefined. If search_sorted doesn't do that either, that is a bug and I can look into it.
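
To illustrate the point about matching encodings, here is a minimal pure-Python sketch. It is not Polars code: `build_mapping`, `encode`, and the toy `index_of` are hypothetical helpers, and the plain dict only stands in for the RevMap. It shows how a needle encoded with a different mapping can silently match the wrong element.

```python
# Toy model of a categorical column: strings are stored as integer codes,
# and a per-column mapping (a stand-in for the RevMap) translates between them.
# All names here are hypothetical and exist only for illustration.

def build_mapping(values):
    """Assign integer codes in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return mapping


def encode(values, mapping):
    """Encode strings as integer codes using the given mapping."""
    return [mapping[v] for v in values]


def index_of(codes, needle_code):
    """Return the index of the first matching code, or None if absent."""
    for i, code in enumerate(codes):
        if code == needle_code:
            return i
    return None


column = ["b", "a", "b"]
column_mapping = build_mapping(column)   # {"b": 0, "a": 1}
codes = encode(column, column_mapping)   # [0, 1, 0]

# Needle encoded with an independently built mapping: "a" gets code 0,
# which is the code for "b" in the column, so the search hits index 0.
foreign_mapping = build_mapping(["a"])
wrong = index_of(codes, foreign_mapping["a"])

# Needle encoded with the column's own mapping: "a" gets code 1.
right = index_of(codes, column_mapping["a"])

print(wrong, right)
```

With the foreign mapping the search reports index 0 (which actually holds "b"); with the shared mapping it reports index 1, the true position of "a".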

itamarst · 2024-12-04T13:14:59Z

@coastalwhite search_sorted() does get it wrong, yes. And separately, if memory serves, if you pass in a non-matching pl.lit("a", dtype=pl.Categorical) it doesn't error out about mismatching categoricals; it gives the wrong result.
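
One way to turn that silent wrong result into an explicit error is to check that both sides share a mapping before searching. The following is a hedged pure-Python sketch of that idea only; `ToyCategorical` and `search_sorted_checked` are hypothetical names, not Polars APIs, and this is not how Polars implements `search_sorted()`.

```python
from bisect import bisect_left


class ToyCategorical:
    """Integer codes plus the mapping that produced them (hypothetical type)."""

    def __init__(self, values, mapping):
        self.mapping = mapping
        self.codes = [mapping[v] for v in values]


def search_sorted_checked(haystack, needle):
    """Binary-search the needle's code, refusing mismatched mappings."""
    # Codes are only comparable when both sides used the very same mapping.
    if haystack.mapping is not needle.mapping:
        raise ValueError("categoricals were encoded with different mappings")
    return bisect_left(haystack.codes, needle.codes[0])


shared = {"a": 0, "b": 1, "c": 2}
haystack = ToyCategorical(["a", "b", "c"], shared)  # codes [0, 1, 2], sorted

# Same mapping on both sides: the search is well defined.
ok = search_sorted_checked(haystack, ToyCategorical(["c"], shared))

# Different mapping: raise instead of returning a meaningless position.
try:
    search_sorted_checked(haystack, ToyCategorical(["c"], {"c": 0}))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(ok, mismatch_detected)
```

The identity check (`is not`) is the strictest possible choice; a real implementation might instead compare mapping contents or re-encode the needle.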

itamarst · 2024-12-04T13:16:03Z

@coastalwhite and the question is how/where I should convert to an enum/categorical; my attempts have failed so far.

itamarst · 2025-01-02T17:28:31Z

Thank you for the new casting logic! I've updated to use it, and addressed the other two comments.

ritchie46 · 2025-01-05T12:04:58Z

Alright, looks great @itamarst. Thanks. I believe we only need docs entries on the python side (so that they end up in the ref guide), then it is good to go.

rodrigogiraoserrao

Do we really need the tiny user-guide page? It's pretty much the same as the docstrings, so I feel like it's enough to have the docstrings.

docs/source/user-guide/expressions/searching.md

itamarst · 2025-01-07T13:03:13Z

OK, I figured out how to add index_of to the API reference guide, and removed the user guide page.

ritchie46 · 2025-01-07T13:28:11Z

Alright, can you rebase? I believe that that should resolve CI.

itamarst · 2025-01-07T15:24:30Z

Done.

ritchie46 · 2025-01-07T18:46:51Z

Alright, thanks @itamarst, looks good!

github-actions bot added the enhancement, python, and rust labels Nov 20, 2024

itamarst commented Nov 21, 2024

crates/polars-ops/src/series/ops/index_of.rs (outdated)

itamarst commented Nov 21, 2024

crates/polars-ops/src/series/ops/index_of.rs (outdated)

itamarst commented Nov 21, 2024

crates/polars-plan/src/dsl/mod.rs

itamarst marked this pull request as ready for review November 21, 2024 14:13

itamarst requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli, reswqa, wence- and orlp as code owners November 21, 2024 14:13

nameexhaustion reviewed Nov 21, 2024

crates/polars-ops/src/series/ops/index_of.rs (outdated)

ritchie46 reviewed Nov 22, 2024

crates/polars-ops/src/series/ops/index_of.rs (outdated)

crates/polars-ops/src/series/ops/index_of.rs (outdated)

crates/polars-ops/src/series/ops/index_of.rs (outdated)

itamarst marked this pull request as draft November 27, 2024 15:00

itamarst closed this Dec 2, 2024

itamarst reopened this Dec 2, 2024

itamarst changed the title ~~feat: Start of Series.index_of(), for primitive numeric types~~ feat: Series.index_of() Dec 2, 2024

coastalwhite reviewed Dec 4, 2024

crates/polars-ops/src/series/ops/index_of.rs (outdated)

coastalwhite reviewed Dec 4, 2024

crates/polars-plan/src/dsl/function_expr/index_of.rs (outdated)

itamarst mentioned this pull request Dec 5, 2024

search_sorted on Categorial and Enum Series fails to work if given a string #20171

Open

itamarst marked this pull request as ready for review December 5, 2024 16:16

pythonspeed added 4 commits December 17, 2024 08:26

Merge remote-tracking branch 'origin/main' into 5503-series-index_of

0a02b48

Enum literals work now.

dbebe7c

Add missing cfg

138bf73

Remove redundant type annotations

dbc0cbd

ritchie46 reviewed Dec 21, 2024

crates/polars-ops/src/series/ops/search_sorted.rs

crates/polars-plan/src/dsl/mod.rs (outdated)

crates/polars-ops/src/series/ops/index_of.rs

pythonspeed added 5 commits January 2, 2025 11:37

Merge remote-tracking branch 'origin/main' into 5503-series-index_of

35250de

Switch to strict casting.

731fd6a

Remove duplicate logic.

7ee4ede

Don't panic.

3179992

Improve testing slightly, and pacify mypy.

a9a06af

itamarst requested a review from ritchie46 January 2, 2025 17:28

pythonspeed added 4 commits January 6, 2025 09:18

Merge remote-tracking branch 'origin/main' into 5503-series-index_of

b0196ae

Minimal guide level documentation for index_of().

0fe814e

Pacify linter

b358e22

Reformat so dprint is happy.

541049a

rodrigogiraoserrao reviewed Jan 6, 2025

pythonspeed added 2 commits January 6, 2025 10:56

fix reference

68ebd34

Add index references

3cb65df

nameexhaustion changed the title ~~feat: Series.index_of()~~ feat: Add index_of() function to Series and Expr Jan 7, 2025

pythonspeed added 2 commits January 7, 2025 07:59

Remove user guide.

0965950

Add index_of to Python API docs

22f3e88

pythonspeed added 2 commits January 7, 2025 09:48

Merge remote-tracking branch 'origin/main' into 5503-series-index_of

0a61902

Update to changed API.

8c3e0d4

ritchie46 merged commit 785bb1e into pola-rs:main Jan 7, 2025
28 checks passed
