Release Modin 0.25.0 · modin-project/modin

This release introduces the modin.utils.execute function to improve the benchmarking experience and includes a new version of HDK, 0.9.
It also includes performance optimizations for sort_values, value_counts, 2D setitem and several others, as well as many bug fixes.

Key Features and Updates Since 0.24.0

Stability and Bugfixes
- FIX-#4507: Do not call ray.get() inside of the kernel executing call queues (#6633)
- FIX-#6585: Avoid FutureWarnings in rolling unless necessary (#6586)
- FIX-#6600: Fix usage of list of UDF functions in Series.groupby.agg (#6613)
- FIX-#6602: Refactor join to avoid distributing a dict object warning (#6612)
- FIX-#6604: HDK: Added support for list to DataFrame.agg() (#6606)
- FIX-#6607: Fix incorrect cache after .sort_values() (#6608)
- FIX-#6624: Add FutureWarnings for first/last/bool (#6625)
- FIX-#6628: Allow groupby.diff() for dates (#6631)
- FIX-#6632: Return Series instead of Dataframe for groupby.apply in case of experimental groupby (#6649)
- FIX-#6635: HDK: read_csv(): treat object dtype as string (#6636)
- FIX-#6637: Fix skiprows parameter usage for read_excel (#6638)
- FIX-#6642: Fix modin.numpy.array.sum on HDK (#6643)
- FIX-#6647: Added init file to make modin/experimental/sql/hdk/query.py part of modin package (#6646)
- FIX-#6651: Make sure Series.between works correctly (#6656)
- FIX-#6680: Specify navigation_with_keys=True to fix docs build (#6681)

Performance enhancements
- PERF-#2813: Distributed from_pandas() for numerical data in Ray (#6640)
- PERF-#5533: Improved sort_values by reducing the number of partitions (#6589)
- PERF-#6362: Implement 2D setitem without to-pandas conversion (#6618)
- PERF-#6614: HDK: Use MODIN_CPUS instead of os.cpu_count() for the fragment size calculation (#6615)
- PERF-#6629: HDK: Avoid LazyProxyCategoricalDtype materialization on merge (#6630)
- PERF-#6645: Avoid label synchronization for dot operation (#6644)
- PERF-#6653: value_counts(): Eliminate redundant sorting. (#6654)
- PERF-#6661: Do not convert columns dtypes if the new dtypes are the same (#6662)

Refactor Codebase
- REFACTOR-#6622: Don't use deprecated random_integers func (#6623)

Update testing suite
- TEST-#5489: Allow for pytest to print warnings in tests output (#6621)

Documentation improvements
- DOCS-#4085: Replace vague links to actual names of the pages/sections in docs (#4096)
- DOCS-#6658: Add a note how to enable object spilling in a multi-node Ray cluster (#6659)

New Features
- FEAT-#5221: Add execute to trigger lazy computations and wait for them to complete (#6648)
- FEAT-#5634: Introduce materialize parameter for partition.ip func (#6650)
- FEAT-#6675: Bump pyhdk version to 0.9 (#6676)
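
Because Modin may run operations lazily or asynchronously, a timer stopped right after a call can miss most of the work. The sketch below shows one way the new execute function (FEAT-#5221) might be used in a benchmark. The helpers timed and benchmark_sort are hypothetical names introduced for illustration, not part of Modin, and the sketch assumes execute is called with a single Modin object.

```python
import time
from typing import Any, Callable, Tuple


def timed(op: Callable[[], Any], wait: Callable[[Any], None]) -> Tuple[Any, float]:
    """Return (result, seconds); the clock stops only after wait(result) returns."""
    start = time.perf_counter()
    result = op()   # may return early if the backend is lazy or asynchronous
    wait(result)    # block until the work behind `result` has finished
    return result, time.perf_counter() - start


def benchmark_sort(n_rows: int = 1_000_000) -> float:
    """Time sort_values on a Modin frame (hypothetical helper).

    Requires modin and a configured engine; imports are local so that
    `timed` stays usable without Modin installed.
    """
    import numpy as np
    import modin.pandas as pd
    from modin.utils import execute  # added in this release

    df = pd.DataFrame({"a": np.random.rand(n_rows)})
    execute(df)  # finish building the frame before the timer starts
    _, seconds = timed(lambda: df.sort_values("a"), execute)
    return seconds
```

Calling execute on the input frame before starting the timer keeps frame construction out of the measurement, so only the sort itself is timed.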

Contributors

@AndreyPavlenko
@Egor-Krivov
@Garra1980
@YarShev
@anmyachev
@dchigarev
