
Modin 0.25.0

Released by @anmyachev on 26 Oct at 19:46
Tag: 0.25.0 · Commit: e12b217

This release introduces the modin.utils.execute function, which triggers lazy computations and waits for them to complete, to improve the benchmarking experience. It also includes a new version of HDK (0.9).
It also includes performance optimizations for sort_values, value_counts, 2D setitem, and several other operations, as well as many bug fixes.
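Because some Modin executions are lazy or asynchronous, a timer can stop before the work it is meant to measure has finished. Below is a minimal timing sketch that uses modin.utils.execute (as described in FEAT-#5221) to force completion before reading the clock. The timed helper is hypothetical, and the fallback to plain pandas with a no-op execute is an assumption of this sketch so that it still runs where Modin is not installed; it is not part of Modin.

```python
import time

try:
    import modin.pandas as pd
    from modin.utils import execute  # added in Modin 0.25.0
except ImportError:
    # Assumption for this sketch: without Modin, fall back to plain pandas,
    # which computes eagerly, so there is nothing left to wait for.
    import pandas as pd

    def execute(*objs):
        return None


def timed(fn, *args):
    """Hypothetical helper: run fn(*args), wait for it to finish, return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    # Without this call, a lazy backend may return before the work is actually done.
    execute(result)
    return result, time.perf_counter() - start


df = pd.DataFrame({"key": [3, 1, 2, 1], "value": [10, 20, 30, 40]})
sorted_df, elapsed = timed(lambda d: d.sort_values("key"), df)
print(list(sorted_df["key"]), f"{elapsed:.4f}s")
```

Calling execute inside the timed region, rather than after it, is what makes the measured time include the actual computation.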

Key Features and Updates Since 0.24.0

  • Stability and Bugfixes
    • FIX-#4507: Do not call ray.get() inside of the kernel executing call queues (#6633)
    • FIX-#6585: Avoid FutureWarnings in rolling unless necessary (#6586)
    • FIX-#6600: Fix usage of a list of UDFs in Series.groupby.agg (#6613)
    • FIX-#6602: Refactor join to avoid the "distributing a dict object" warning (#6612)
    • FIX-#6604: HDK: Add support for lists in DataFrame.agg() (#6606)
    • FIX-#6607: Fix incorrect cache after .sort_values() (#6608)
    • FIX-#6624: Add FutureWarnings for first/last/bool (#6625)
    • FIX-#6628: Allow groupby.diff() for dates (#6631)
    • FIX-#6632: Return a Series instead of a DataFrame for groupby.apply when experimental groupby is enabled (#6649)
    • FIX-#6635: HDK: read_csv(): treat object dtype as string (#6636)
    • FIX-#6637: Fix skiprows parameter usage for read_excel (#6638)
    • FIX-#6642: Fix modin.numpy.array.sum on HDK (#6643)
    • FIX-#6647: Add an __init__.py file so that modin/experimental/sql/hdk/query.py is part of the modin package (#6646)
    • FIX-#6651: Make sure Series.between works correctly (#6656)
    • FIX-#6680: Specify navigation_with_keys=True to fix docs build (#6681)
  • Performance enhancements
    • PERF-#2813: Distributed from_pandas() for numerical data in Ray (#6640)
    • PERF-#5533: Improved sort_values by reducing the number of partitions (#6589)
    • PERF-#6362: Implement 2D setitem without converting to pandas (#6618)
    • PERF-#6614: HDK: Use MODIN_CPUS instead of os.cpu_count() for the fragment size calculation (#6615)
    • PERF-#6629: HDK: Avoid LazyProxyCategoricalDtype materialization on merge (#6630)
    • PERF-#6645: Avoid label synchronization for dot operation (#6644)
    • PERF-#6653: value_counts(): Eliminate redundant sorting (#6654)
    • PERF-#6661: Do not convert column dtypes if the new dtypes are the same as the current ones (#6662)
  • Refactor Codebase
    • REFACTOR-#6622: Don't use the deprecated random_integers function (#6623)
  • Update testing suite
    • TEST-#5489: Allow pytest to print warnings in the test output (#6621)
  • Documentation improvements
    • DOCS-#4085: Replace vague link text with the actual names of the pages/sections in the docs (#4096)
    • DOCS-#6658: Add a note on how to enable object spilling in a multi-node Ray cluster (#6659)
  • New Features
    • FEAT-#5221: Add execute to trigger lazy computations and wait for them to complete (#6648)
    • FEAT-#5634: Introduce the materialize parameter for the partition.ip function (#6650)
    • FEAT-#6675: Bump pyhdk version to 0.9 (#6676)
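Several of the fixes above concern ordinary pandas API calls. The short sketch below exercises three of them: groupby.diff() on dates (FIX-#6628), a list of UDFs in Series.groupby.agg (FIX-#6600), and Series.between (FIX-#6651). The sample data and the helper functions value_range and first_value are hypothetical, and falling back to plain pandas when Modin is not installed is an assumption of this sketch; with Modin 0.25.0 these calls are expected to behave as they do in pandas.

```python
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd  # assumption: same API, eager execution

df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "date": pd.to_datetime(["2023-01-01", "2023-01-03", "2023-02-01", "2023-02-10"]),
        "value": [1, 5, 2, 8],
    }
)

# groupby.diff() on a datetime column: per-group gaps, NaT at each group's first row.
gaps = df.groupby("key")["date"].diff()


# Hypothetical UDFs; each becomes one output column named after the function.
def value_range(s):
    return s.max() - s.min()


def first_value(s):
    return s.iloc[0]


stats = df.groupby("key")["value"].agg([value_range, first_value])

# Series.between: both bounds are inclusive by default.
in_range = df["value"].between(2, 5)

print(gaps.tolist())
print(stats)
print(in_range.tolist())
```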

Contributors

@AndreyPavlenko
@Egor-Krivov
@Garra1980
@YarShev
@anmyachev
@dchigarev