Unexpected index out of bounds error for specific dataset and set of operations #16830
Comments
@ritchie46, here is a repro for #16605.
Can reproduce. In case it helps with debugging: it does not seem to happen with the lazy API.
df = df.lazy()
df = df.filter(pl.col("val1") | pl.col("val3"))
df = df.with_columns(pl.col("val4").max().over("group1", "group2").fill_null(0).alias("val4"))
df = df.filter(pl.col("val4") > pl.col("val7").sum().over("group1", "group2"))
df.with_columns(pl.col("val4").floor()).collect()
# shape: (9, 10)
# ┌────────┬────────┬──────┬──────┬───┬───────┬──────┬──────────┬───────────┐
# │ group1 ┆ group2 ┆ val1 ┆ val2 ┆ … ┆ val5 ┆ val6 ┆ val7 ┆ val8 │
# │ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ bool ┆ f64 ┆ ┆ f64 ┆ i64 ┆ f64 ┆ f64 │
# ╞════════╪════════╪══════╪══════╪═══╪═══════╪══════╪══════════╪═══════════╡
# │ 1001 ┆ 100004 ┆ true ┆ null ┆ … ┆ 87.0 ┆ 0 ┆ 2.705119 ┆ 40.904418 │
# │ 1001 ┆ 100007 ┆ true ┆ null ┆ … ┆ 173.0 ┆ 0 ┆ 2.6165 ┆ 34.486 │
# │ 1001 ┆ 100009 ┆ true ┆ null ┆ … ┆ 211.0 ┆ 0 ┆ 4.458603 ┆ 77.95037 │
# │ 1001 ┆ 100010 ┆ true ┆ null ┆ … ┆ 178.0 ┆ 0 ┆ 2.3165 ┆ 37.77 │
# │ 1001 ┆ 100011 ┆ true ┆ null ┆ … ┆ 174.0 ┆ 0 ┆ 5.548593 ┆ 71.207139 │
# │ 1001 ┆ 100012 ┆ true ┆ null ┆ … ┆ 196.0 ┆ 0 ┆ 2.1685 ┆ 32.888 │
# │ 1001 ┆ 100015 ┆ true ┆ null ┆ … ┆ 89.0 ┆ 0 ┆ 2.400406 ┆ 39.732588 │
# │ 1003 ┆ 100008 ┆ true ┆ null ┆ … ┆ 238.0 ┆ 0 ┆ 4.913397 ┆ 93.076396 │
# │ 1003 ┆ 100013 ┆ true ┆ null ┆ … ┆ 101.5 ┆ 0 ┆ 2.254043 ┆ 45.486928 │
# └────────┴────────┴──────┴──────┴───┴───────┴──────┴──────────┴───────────┘
I cannot reproduce this 🤔
Surprisingly, I cannot reproduce it with the given data/code, yet I have the same issue. I will try to find time to write a minimal reproduction for my case.
Here it is. I was able to cut out a lot of the initial code:
import polars as pl
import numpy as np
df = pl.DataFrame({"index_1":np.repeat(np.arange(100), 10), "index_2":np.repeat(np.arange(100), 10)})
df = pl.concat([df[0:500], df[500:]])
df = df.filter(df["index_1"] == 0)
df = df.with_columns(index_2 = pl.Series(values=[0]*10))
df.set_sorted("index_2")  # Also crashes on write_parquet and some other operations
It crashes for me (Windows 11).
That one I can reproduce, thanks!
Taking a look.
Perhaps the original issue is platform-specific? I can reproduce it on macOS (same as @maxzw). @Elvynzs, I can reproduce your example as well. It seems to be a slightly different problem, related to your use of Series. Changing the filter to use an expression makes the example run for me:
df.filter(pl.col("index_1") == 0)
@Elvynzs, I'm not sure your issue is the same as the one in the description, but I'll check whether the fix also works for mine 😃
The original issue no longer reproduces for me, thanks to #16852.
I can confirm as well! Thanks @ritchie46! 💯 |
Checks
Reproducible example
With data of shape (872, 10):
repro_data.txt
Note: not all columns have operations performed on them, but they apparently need to be present for the error to occur!
Log output
Issue description
The operations result in an unexpected index out of bounds error.
Casting to pandas and back anywhere within these operations resolves the issue:
Expected behavior
I expect this error not to occur.
Installed versions