Truncate the table if the size exceeds more than 2GB #446

Merged on Jan 15, 2025 (2 commits)

Conversation

raghumdani
Collaborator

Summary

This change ensures that the resulting record batches do not exceed 2 GB in size. PyArrow uses 32-bit integer offsets (for example, in its default string and binary types), so a single array cannot hold more than 2 GB of data without the offset overflowing.
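
To illustrate the limit described above, here is a minimal sketch in plain Python. It is not the code from this PR (the diff is not shown in this conversation); `rows_that_fit` and `split_into_chunks` are hypothetical helper names, and the per-row byte sizes are assumed to be known up front. It shows how many leading rows can be kept before a running 32-bit offset would overflow, and how the rows could be grouped into chunks that each stay under the limit.

```python
from bisect import bisect_right
from itertools import accumulate

# Largest value a signed 32-bit offset can hold (just under 2 GiB).
MAX_INT32_OFFSET = 2**31 - 1


def rows_that_fit(row_sizes, limit=MAX_INT32_OFFSET):
    """Return how many leading rows fit before the end offset exceeds `limit`.

    Hypothetical helper: `row_sizes` is a list of non-negative byte sizes.
    """
    # end_offsets[i] is the byte offset just past row i, mirroring the
    # offsets buffer of a variable-length array.
    end_offsets = list(accumulate(row_sizes))
    # Offsets are non-decreasing, so a binary search finds the cut point.
    return bisect_right(end_offsets, limit)


def split_into_chunks(row_sizes, limit=MAX_INT32_OFFSET):
    """Group row indices into consecutive chunks of at most `limit` bytes each."""
    chunks, current, current_bytes = [], [], 0
    for index, size in enumerate(row_sizes):
        if size > limit:
            raise ValueError(f"row {index} alone exceeds the limit")
        # Start a new chunk when adding this row would pass the limit.
        if current and current_bytes + size > limit:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    one_gib = 2**30
    # Two 1 GiB rows add up to exactly 2**31 bytes, one past the int32 maximum.
    print(rows_that_fit([one_gib, one_gib, one_gib]))
    print(split_into_chunks([one_gib, one_gib, one_gib]))
```

With three 1 GiB rows, only the first row fits in a single truncated batch, because the second row's end offset would be 2**31, one more than a signed 32-bit integer can represent.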

Rationale

A fix for this limit had already been applied, but it missed a corner case.

Changes

Address the corner case.

Impact

Existing job runs are not affected. However, this change allows onboarding tables with large Parquet files.

Testing

Added functional tests for both cases.

Regression Risk

Very Low

Checklist

  • Unit tests covering the changes have been added

    • If this is a bugfix, regression tests have been added
  • E2E testing has been performed

Additional Notes

Any additional information or context relevant to this PR.

@raghumdani changed the title from "Truncate the table if the size exceeds more then 2GB" to "Truncate the table if the size exceeds more than 2GB" on Jan 14, 2025
Collaborator

@pfaraone pfaraone left a comment

LGTM!

Member

@Zyiqin-Miranda Zyiqin-Miranda left a comment

LGTM! Easier fix than I thought!

@raghumdani raghumdani merged commit 637b33f into main Jan 15, 2025
3 checks passed
@raghumdani raghumdani deleted the offset-pyarrow branch January 15, 2025 18:47
3 participants