Platform Dependent pyo3_runtime.PanicException #17089

Closed
2 tasks done
HCelion opened this issue Jun 20, 2024 · 3 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

HCelion commented Jun 20, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Unfortunately, I don't have code that reproduces this exactly.

Log output

found multiple sources; run comm_subplan_elim
join parallel: false
keys/aggregates are not partitionable: running default HASH AGGREGATION
join parallel: false
join parallel: false
dataframe filtered
join parallel: false
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
dataframe filtered
LEFT join dataframes finished
thread '<unnamed>' panicked at crates/polars-core/src/series/mod.rs:213:42:
index out of bounds: the len is 1 but the index is 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 1967, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: index out of bounds: the len is 1 but the index is 1

Issue description

I believe this is related to the closed issue #16830.

I have code that joins two frames built from the same source:

frame_a = complicated_operation(source)
frame_b = other_complicated_operation(source)

I can run

frame_a_collected = frame_a.collect()
frame_b_collected = frame_b.collect()

frame_a_collected.join(frame_b_collected, on=['a', 'b', 'c'], how='left')

and it works on both platforms.
If I change the order of operations, however, and join the lazy frames before collecting

frame_a.join(frame_b, on=['a', 'b', 'c'], how='left').collect()

then I get an error on Linux, but not on Darwin (macOS).
The error is the same panic shown in the log output above (index out of bounds: the len is 1 but the index is 1).

Digging a bit deeper, I found a statement that seems to trigger the difference.

In frame_a, I calculate a feature:

from polars import col, when

frame_a = (
    source
    .with_columns(
        # Keep positive values, use 0.0 otherwise, then map any NaN to 0.0
        raw_value=when(col("value") > 0)
        .then(col("value"))
        .otherwise(0.0)
        .fill_nan(0.0)
    )
)

I do not join on this feature; however, when I remove it from frame_a, the join works fine on both platforms.

Expected behavior

I expect the code to run on Linux the same way it does on macOS (Darwin).

Installed versions

On Linux, where it breaks:

--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             Linux-4.14.322-246.539.amzn2.x86_64-x86_64-with-glibc2.36
Python:               3.12.3 (main, May 14 2024, 07:23:41) [GCC 12.2.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.1
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.1.4
pyarrow:              15.0.0
pydantic:             2.7.4
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           1.4.49
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

On macOS, where it runs:

--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             macOS-13.3.1-arm64-i386-64bit
Python:               3.12.2 (main, Feb  6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2024.6.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.1
nest_asyncio:         1.6.0
numpy:                1.26.4
openpyxl:             3.1.2
pandas:               2.1.4
pyarrow:              15.0.0
pydantic:             2.7.4
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           1.4.49
torch:                2.3.1
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
HCelion added the bug, needs triage, and python labels on Jun 20, 2024.
cmdlineluser (Contributor) commented

Perhaps you can test the latest pre-release version polars-1.0.0b1 to see if it still reproduces?

HCelion (Author) commented Jun 20, 2024

Thanks, will try and report back

HCelion (Author) commented Jun 20, 2024

I had to do quite a bit of rewriting, but it now seems to run on both platforms.

HCelion closed this as completed on Jun 20, 2024.