don't parse_skip() for very many components #714
Signed-off-by: Martijn Govers <[email protected]>
I put this PR on hold. After running a benchmark on my side in Python, I do not see a significant performance difference between 1.9.35 and 1.9.34.
@mgovers I also ran the script on Windows. There is no significant performance difference between 1.9.35 and 1.9.34.
Did you |
I have no idea why, but I am also no longer able to reproduce the performance regression in the Python package that I found on Friday, neither in a custom-built package using editable mode nor in the package pulled from PyPI. I am, however, still able to reproduce the problem in which otherwise skipped components are terribly inefficient to parse.
@mgovers Since it does not affect the current row-based deserialization, I hereby close this PR. You can continue to investigate the issue with
Fixes very slow row-based deserialization, cf. #708 (comment).
NOTE: columnar deserialization is still potentially slow due to very many parse_skip calls when a certain column is not present.
Issue was introduced in #708.
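The NOTE above can be illustrated with a toy model. The sketch below is not the library's parser: `CountingParser`, `parse_column_per_element`, `parse_column_with_precheck`, and the attribute names are all hypothetical, and the `skip` method only counts calls as a stand-in for a parse_skip-like operation. It shows how the number of skip calls can grow with the number of elements when the requested column is absent, and how a presence check up front would avoid them. The precheck is one conceivable mitigation under these assumptions, not necessarily what this PR does.

```python
# Toy model only: this is NOT the library's parser. All names are hypothetical.
from typing import Dict, List, Set

Element = Dict[str, float]


class CountingParser:
    """Counts how many skip operations a parsing strategy performs."""

    def __init__(self) -> None:
        self.skip_calls = 0

    def skip(self) -> None:
        # Stand-in for a parse_skip-like call; here it only counts invocations.
        self.skip_calls += 1


def parse_column_per_element(
    elements: List[Element], wanted: str, parser: CountingParser
) -> List[float]:
    """Look for `wanted` in every element, skipping every other attribute."""
    column: List[float] = []
    for element in elements:
        for name, value in element.items():
            if name == wanted:
                column.append(value)
            else:
                parser.skip()
    return column


def parse_column_with_precheck(
    elements: List[Element], wanted: str, available: Set[str], parser: CountingParser
) -> List[float]:
    """Return early when the column is known to be absent (assumed metadata)."""
    if wanted not in available:
        return []
    return parse_column_per_element(elements, wanted, parser)


# 1000 elements with two attributes each; the requested column does not exist.
elements = [{"id": float(i), "u_rated": 10.5e3} for i in range(1000)]

naive = CountingParser()
parse_column_per_element(elements, "p_specified", naive)
print(naive.skip_calls)  # 2000: one skip per attribute per element

checked = CountingParser()
parse_column_with_precheck(elements, "p_specified", {"id", "u_rated"}, checked)
print(checked.skip_calls)  # 0: the absent column is detected before parsing
```

In this toy, the naive path performs elements × attributes skip calls for an absent column, which is the kind of scaling the NOTE warns about.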
A couple of issues are not yet explained and need further investigation, but are out of scope for the immediate regression mitigation:
parse multiple times on different components?