Modin 0.27.0
This release updates pandas to 2.2, introduces a lazy execution mode on Ray, adds new functions that support glob
syntax, and speeds up several more groupby cases. It also includes other new features, performance
optimizations, and many bug fixes.
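To make "glob syntax" concrete, here is a minimal standard-library sketch of the idea behind the new `*_glob` readers: one path pattern selects several files, and their contents are combined into a single result. It is an illustration only, not Modin's implementation; the helper names `read_json_lines_glob` and `write_demo_shards` and the shard file names are hypothetical.

```python
import glob
import json
import os
import tempfile


def read_json_lines_glob(pattern):
    """Read every JSON-lines file matching `pattern` into one list of records."""
    records = []
    # sorted() keeps the shard order deterministic; glob.glob gives no ordering guarantee.
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            records.extend(json.loads(line) for line in handle if line.strip())
    return records


def write_demo_shards(directory):
    """Write two small hypothetical shards plus one file the pattern should skip."""
    shards = {
        "part-0.json": [{"id": 1}, {"id": 2}],
        "part-1.json": [{"id": 3}],
        "notes.txt": [],
    }
    for name, rows in shards.items():
        with open(os.path.join(directory, name), "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    write_demo_shards(tmp)
    # "part-*.json" matches both shards and ignores notes.txt.
    rows = read_json_lines_glob(os.path.join(tmp, "part-*.json"))
    print([row["id"] for row in rows])  # prints [1, 2, 3]
```

The pattern, rather than an explicit file list, decides which shards are read, which is the convenience that glob-aware readers and writers are meant to provide.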
Key Features and Updates Since 0.26.0
- Stability and Bugfixes
- FIX-#2405: Make sure named aggregation work for Series objects (#6892)
- FIX-#5925: Put a sorting-hack into groupby tests to hide #6875 bug (#6896)
- FIX-#6830: Pass AWS related env vars to mpiexec (#6867)
- FIX-#6840: Call tolist function in DtypesDescriptor._merge_dtypes (#6844)
- FIX-#6855: Make sure read_parquet works with integer columns for pyarrow engine (#6874)
- FIX-#6879: Convert the right DF to single partition before broadcasting in query_compiler.merge (#6880)
- FIX-#6881: Make sure astype works correctly with int32 and float32 dtypes (#6884)
- FIX-#6897: Preprocess kernel function that aligns columns in groupby (#6898)
- FIX-#6897: Revert unidist specific fix for groupby (#6902)
- FIX-#6899: Avoid sending lazy categorical proxies to workers (#6900)
- FIX-#6904: Align levels of partially known dtypes with MultiIndex labels (#6905)
- FIX-#6911: Remove unidist specific workaround in .from_pandas() (#6912)
- FIX-#6916: Unpin pydantic dependency (#6917)
- FIX-#6924: HDK: Use JoinNode instead of MaskNode for non-range row_position (#6926)
- Performance enhancements
- Refactor Codebase
- REFACTOR-#6293: Corrected missmatch to mismatch in ErrorMessage.missmatch_with_pandas method (#6901)
- REFACTOR-#6812: Remove PyarrowOnRay execution in favour of pyarrow-backed pandas dataframes (#6848)
- REFACTOR-#6833: Remove SocksProxy, DoLogRpyc, DoTraceRpyc outdated classes (#6834)
- REFACTOR-#6845: Fix import issues found by CodeQL (#6837)
- REFACTOR-#6852: Remove OrderedDict in favor of builtin dict (#6853)
- REFACTOR-#6858: Rename _get_dimensions and change arguments (#6859)
- REFACTOR-#6889: Define __all__ in modin.config.__init__.py (#6886)
- REFACTOR-#6903: Remove duplicated definitions of create_test_series (#6910)
- REFACTOR-#6918: Docstring and type hints fixes (#6925)
- Update testing suite
- TEST-#6708: Create test files using tmp_path fixture (#6709)
- TEST-#6777: Make to_csv tests on Unidist more stable (for test-all-unidist CI job) (#6851)
- TEST-#6830: Use local s3 server instead of public s3 buckets (#6863)
- TEST-#6846: Skip unstable Unidist to_csv tests (#6847)
- TEST-#6868: Remove tests for gs remote protocol since we rely on fsspec (#6882)
- TEST-#6885: Switch to black>=24.1.0 (#6887)
- TEST-#6893: Added support for pytest 8.0.0 (#6894)
- TEST-#6920: Remove testing for Ray client (#6921)
- Documentation improvements
- New Features
- FEAT-#3450: Implement read_json_glob and to_json_glob (#6873)
- FEAT-#5809: New implementation of the Ray lazy execution queue (#6731)
- FEAT-#5925: Enable grouping on categoricals with range-partitioning impl (#6862)
- FEAT-#6382: Execute bitwise NOT (~) operations on HDK (#6383)
- FEAT-#6398: Improved performance of list-like objects insertion into HDK DataFrames (#6412)
- FEAT-#6830: Remove public s3 bucket reference (#6829)
- FEAT-#6831: Implement read_parquet_glob and to_parquet_glob (#6854)
- FEAT-#6832: Implement read_xml_glob, to_xml_glob (#6930)
- FEAT-#6835: Do not put binary functions to the Ray storage multiple times (#6836)
- FEAT-#6838: Prefer lazy execution for binary operations with scalar (#6839)
- FEAT-#6841: Fixing ray anti pattern with .length() and .width() being called in a loop (#6842)
- FEAT-#6849: Removing to_pandas call in merge and join functions (#6850)
- FEAT-#6883: Support grouping on a Series with range-partitioning impl (#6888)
- FEAT-#6906: Update to pandas 2.2.* (#6907)
- FEAT-#6908: Remove the warning regarding engine initialization (#6909)
- FEAT-#6914: Add a config for setting a number of threads per Dask worker (#6915)
- FEAT-#6918: Add auto mode to the lazy execution. (#6919)
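
The two configuration-related entries above (threads per Dask worker and the lazy execution auto mode) can be sketched as environment settings. The variable names `MODIN_DASK_THREADS_PER_WORKER` and `MODIN_LAZY_EXECUTION`, and the value `Auto`, are assumptions not stated in these notes; check them against the Modin configuration documentation before relying on them. The helper `apply_settings` is hypothetical.

```python
import os

# Assumed variable names and values; they are not given in these release notes.
ASSUMED_SETTINGS = {
    "MODIN_DASK_THREADS_PER_WORKER": "2",  # threads per Dask worker (FEAT-#6914)
    "MODIN_LAZY_EXECUTION": "Auto",        # lazy execution mode (FEAT-#6918)
}


def apply_settings(settings, environ):
    """Set each variable unless the caller's environment already defines it."""
    for name, value in settings.items():
        environ.setdefault(name, value)
    return {name: environ[name] for name in settings}


# Demonstrated on a plain dict so the example has no side effects;
# pass os.environ instead to affect the current process.
applied = apply_settings(ASSUMED_SETTINGS, environ={})
print(applied["MODIN_LAZY_EXECUTION"])  # prints Auto
```

Using `setdefault` leaves any value already exported in the shell untouched, so the sketch only fills in settings that are missing.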
Contributors
@AndreyPavlenko
@YarShev
@anmyachev
@arunjose696
@dchigarev
@leshikus
@vedant