Support "large" arrow types in the parquet reader #398

Open
ptomecek opened this issue Nov 27, 2024 · 0 comments
Labels
adapter: parquet Issues and PRs related to our Apache Parquet/Arrow adapter

Comments

Collaborator

ptomecek commented Nov 27, 2024

Is your feature request related to a problem? Please describe.
At the moment, csp fails to read parquet files that contain the Arrow "large string" type. This is particularly problematic because polars uses large string by default and does not plan to change that. More info about polars string types here. As a result, parquet files written by polars are not directly readable by csp.

Describe the solution you'd like
The csp parquet reader should natively support the large string type.

Describe alternatives you've considered
We currently have utility functions that, when writing polars parquet files, convert the data to arrow, identify large types, cast them to the corresponding small types, and write the result using pyarrow.parquet. This is tedious and non-standard, and it doesn't cover other polars functionality (like the streaming engine) that might generate such parquet files.

Additional context
N/A

ptomecek added the "adapter: parquet" label (Issues and PRs related to our Apache Parquet/Arrow adapter) Nov 27, 2024