
Modin 0.27.0

@anmyachev released this on 14 Feb 14:00
Tag 0.27.0, commit d54dcfd

This release updates pandas to 2.2, introduces a lazy execution mode on Ray and new functions that support glob
syntax, and speeds up several more groupby cases. It also includes other new features, performance
optimizations, and many bug fixes.
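The new glob-syntax functions (for example read_json_glob and to_json_glob) let a single path pattern stand for a whole set of files. The plain-Python sketch below only illustrates that idea and is not Modin's implementation. It assumes that the writer replaces a single `*` with a partition index and that the reader matches the same pattern and concatenates the files in index order. The helpers `to_json_glob_sketch` and `read_json_glob_sketch` are hypothetical. Consult the Modin API reference for the real functions and their signatures.

```python
import glob
import json
import os
import tempfile


def _check_pattern(path_pattern):
    # Assumption for this sketch: exactly one '*' marks the partition index.
    if path_pattern.count("*") != 1:
        raise ValueError("path_pattern must contain exactly one '*'")


def to_json_glob_sketch(partitions, path_pattern):
    """Hypothetical helper: write each partition (a list of records) to its own file."""
    _check_pattern(path_pattern)
    paths = []
    for index, records in enumerate(partitions):
        path = path_pattern.replace("*", str(index))
        with open(path, "w", encoding="utf-8") as f:
            json.dump(records, f)
        paths.append(path)
    return paths


def read_json_glob_sketch(path_pattern):
    """Hypothetical helper: read all files matching the pattern, in partition order."""
    _check_pattern(path_pattern)
    prefix, suffix = path_pattern.split("*")

    def partition_index(path):
        # Recover the text that '*' matched and sort numerically, so that
        # part_10 comes after part_9 rather than right after part_1.
        middle = path[len(prefix):len(path) - len(suffix)]
        return (0, int(middle), "") if middle.isdigit() else (1, 0, middle)

    records = []
    for path in sorted(glob.glob(path_pattern), key=partition_index):
        with open(path, encoding="utf-8") as f:
            records.extend(json.load(f))
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pattern = os.path.join(tmp, "part_*.json")
        to_json_glob_sketch([[{"x": i}] for i in range(12)], pattern)
        print([r["x"] for r in read_json_glob_sketch(pattern)])
```

`glob.glob` does not guarantee any ordering. The sketch therefore sorts matched files by their numeric index, so records come back in the order the partitions were written.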

Key Features and Updates Since 0.26.0

  • Stability and Bugfixes
    • FIX-#2405: Make sure named aggregation works for Series objects (#6892)
    • FIX-#5925: Put a sorting hack into groupby tests to hide the #6875 bug (#6896)
    • FIX-#6830: Pass AWS related env vars to mpiexec (#6867)
    • FIX-#6840: Call tolist function in DtypesDescriptor._merge_dtypes (#6844)
    • FIX-#6855: Make sure read_parquet works with integer columns for pyarrow engine (#6874)
    • FIX-#6879: Convert the right DF to single partition before broadcasting in query_compiler.merge (#6880)
    • FIX-#6881: Make sure astype works correctly with int32 and float32 dtypes (#6884)
    • FIX-#6897: Preprocess kernel function that aligns columns in groupby (#6898)
    • FIX-#6897: Revert unidist specific fix for groupby (#6902)
    • FIX-#6899: Avoid sending lazy categorical proxies to workers (#6900)
    • FIX-#6904: Align levels of partially known dtypes with MultiIndex labels (#6905)
    • FIX-#6911: Remove unidist specific workaround in .from_pandas() (#6912)
    • FIX-#6916: Unpin pydantic dependency (#6917)
    • FIX-#6924: HDK: Use JoinNode instead of MaskNode for non-range row_position (#6926)
  • Performance enhancements
    • PERF-#6876: Skip the masking stage on iloc where beneficial (#6878)
    • PERF-#6922: Set DaskThreadsPerWorker to 1 (#6923)
  • Refactor Codebase
    • REFACTOR-#6293: Corrected missmatch to mismatch in ErrorMessage.missmatch_with_pandas method (#6901)
    • REFACTOR-#6812: Remove PyarrowOnRay execution in favour of pyarrow-backed pandas dataframes (#6848)
    • REFACTOR-#6833: Remove SocksProxy, DoLogRpyc, DoTraceRpyc outdated classes (#6834)
    • REFACTOR-#6845: Fix import issues found by CodeQL (#6837)
    • REFACTOR-#6852: Remove OrderedDict in favor of builtin dict (#6853)
    • REFACTOR-#6858: Rename _get_dimensions and change arguments (#6859)
    • REFACTOR-#6889: Define __all__ in modin.config.__init__.py (#6886)
    • REFACTOR-#6903: Remove duplicated definitions of create_test_series (#6910)
    • REFACTOR-#6918: Docstring and type hints fixes (#6925)
  • Update testing suite
    • TEST-#6708: Create test files using tmp_path fixture (#6709)
    • TEST-#6777: Make to_csv tests on Unidist more stable (for test-all-unidist CI job) (#6851)
    • TEST-#6830: Use local s3 server instead of public s3 buckets (#6863)
    • TEST-#6846: Skip unstable Unidist to_csv tests (#6847)
    • TEST-#6868: Remove tests for gs remote protocol since we rely on fsspec (#6882)
    • TEST-#6885: Switch to black>=24.1.0 (#6887)
    • TEST-#6893: Added support for pytest 8.0.0 (#6894)
    • TEST-#6920: Remove testing for Ray client (#6921)
  • Documentation improvements
    • DOCS-#6860: Add an ecosystem page to the docs (#6861)
  • New Features
    • FEAT-#3450: Implement read_json_glob and to_json_glob (#6873)
    • FEAT-#5809: New implementation of the Ray lazy execution queue (#6731)
    • FEAT-#5925: Enable grouping on categoricals with range-partitioning impl (#6862)
    • FEAT-#6382: Execute bitwise NOT (~) operations on HDK (#6383)
    • FEAT-#6398: Improved performance of list-like objects insertion into HDK DataFrames (#6412)
    • FEAT-#6830: Remove public s3 bucket reference (#6829)
    • FEAT-#6831: Implement read_parquet_glob and to_parquet_glob (#6854)
    • FEAT-#6832: Implement read_xml_glob and to_xml_glob (#6930)
    • FEAT-#6835: Do not put binary functions into the Ray storage multiple times (#6836)
    • FEAT-#6838: Prefer lazy execution for binary operations with scalar (#6839)
    • FEAT-#6841: Fix a Ray anti-pattern of calling .length() and .width() in a loop (#6842)
    • FEAT-#6849: Remove the to_pandas call in merge and join functions (#6850)
    • FEAT-#6883: Support grouping on a Series with range-partitioning impl (#6888)
    • FEAT-#6906: Update to pandas 2.2.* (#6907)
    • FEAT-#6908: Remove the warning regarding engine initialization (#6909)
    • FEAT-#6914: Add a config for setting a number of threads per Dask worker (#6915)
    • FEAT-#6918: Add an auto mode to lazy execution (#6919)
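Several groupby items above (FEAT-#5925, FEAT-#6883) extend the range-partitioning groupby implementation. The general idea behind range partitioning can be sketched in plain Python. Rows are redistributed so that every group key falls into exactly one partition, and each partition covers a contiguous range of sorted keys. After that, each partition can be aggregated independently with no cross-partition merge. This is a simplified illustration, not Modin's code. It picks boundaries from all distinct keys instead of from a sample, and it works on lists of tuples instead of distributed DataFrame partitions. The names `range_partition` and `groupby_sum_range_partitioned` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def range_partition(rows, key, num_partitions):
    """Hypothetical helper: split rows into partitions covering contiguous key ranges.

    Simplification: boundaries are chosen from all distinct keys; a real
    system would typically pick them from a sample of the data.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    distinct_keys = sorted({key(row) for row in rows})
    step = max(1, len(distinct_keys) // num_partitions)
    # At most num_partitions - 1 boundaries, taken at evenly spaced positions.
    boundaries = distinct_keys[step::step][: num_partitions - 1]
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        # bisect_right sends a key equal to a boundary to the partition on its
        # right, so all rows sharing a key land in the same partition.
        partitions[bisect_right(boundaries, key(row))].append(row)
    return partitions


def groupby_sum_range_partitioned(rows, key, value, num_partitions):
    """Hypothetical helper: group-by-sum with each partition aggregated independently."""
    result = {}
    for partition in range_partition(rows, key, num_partitions):
        local = defaultdict(int)
        for row in partition:
            local[key(row)] += value(row)
        # Partitions are in ascending key order, so appending each one's
        # sorted groups yields a globally sorted result without a merge step.
        for group_key in sorted(local):
            result[group_key] = local[group_key]
    return result


rows = [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("d", 5), ("b", 6)]
print(groupby_sum_range_partitioned(rows, key=lambda r: r[0], value=lambda r: r[1], num_partitions=2))
# prints {'a': 6, 'b': 7, 'c': 3, 'd': 5}
```

Because no key is split across partitions, the per-partition results are already final. This is what makes range partitioning attractive compared with a scheme that aggregates partial results everywhere and then combines them.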

Contributors

@AndreyPavlenko
@YarShev
@anmyachev
@arunjose696
@dchigarev
@leshikus
@vedant