0.12.0

@eyadgaran released this 03 Mar 08:33
6122a52
  • Changed internal dataset structure from mixins to direct inheritance
  • Condensed all pandas dataset types into a single base class
  • Added support for dask datasets
  • Added placeholders for additional dataset libraries
  • Added hashing support for dask dataframes
  • Refactored persistence ("save_patterns") package into a standalone, extensible framework
  • Added context manager support to registries for temporary overwrites
  • Refactored pipelines into library-based subclasses
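
To illustrate what a temporary registry overwrite via a context manager can look like, here is a minimal sketch. The `Registry` class and its `temporary_overwrite` method are hypothetical names chosen for this example and are not taken from the SimpleML codebase; the actual registry API may differ.

```python
from contextlib import contextmanager


class Registry:
    """Hypothetical name-to-object registry (not SimpleML's actual class)."""

    def __init__(self):
        self._items = {}

    def register(self, name, obj):
        self._items[name] = obj

    def get(self, name):
        return self._items[name]

    @contextmanager
    def temporary_overwrite(self, name, obj):
        """Swap in `obj` under `name`, then restore the prior state on exit."""
        missing = object()  # sentinel: distinguishes "absent" from a stored None
        previous = self._items.get(name, missing)
        self._items[name] = obj
        try:
            yield self
        finally:
            # Runs even if the body raises, so the overwrite never leaks.
            if previous is missing:
                self._items.pop(name, None)
            else:
                self._items[name] = previous


registry = Registry()
registry.register("dataset", "PandasDataset")

with registry.temporary_overwrite("dataset", "DaskDataset"):
    inside = registry.get("dataset")  # the temporary entry

after = registry.get("dataset")  # the original entry, restored
```

Restoring in a `finally` block is the key design choice here: the original entry comes back whether the body exits normally or raises.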

BREAKING CHANGES

  • The pandas dataset now defaults the squeeze_return param to False (classes that expect a series to be returned will need to be updated)
  • The numpy dataset is considered unstable and will be redesigned in a future release
  • Onedrive, Hickle, and database save patterns are removed (the functionality is still available, but a composed pattern is not predefined; these can be trivially added in user code if needed)
  • Changed pandas hash output from numpy.int64 to int (due to a breaking change in NumpyHasher)
  • Changed primitive deterministic hash from pickle to md5
  • Extracted data iterators into utility wrappers; pipelines no longer have flags to return iterators
  • Random split defaults are computed at runtime instead of being precalculated (affects hash)
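
As a rough illustration of an md5-based deterministic hash for primitives, the sketch below derives an integer from the md5 digest of an encoded value. The function name `md5_hash_primitive` and the "type name plus repr" encoding are assumptions made for this example; the encoding SimpleML actually uses may differ, so hashes produced here should not be expected to match the library's.

```python
import hashlib


def md5_hash_primitive(value):
    """Return a process-independent integer hash for a primitive value.

    Assumption: the value is encoded as its type name plus repr().
    This is an illustrative encoding, not SimpleML's.
    """
    payload = f"{type(value).__name__}:{value!r}".encode("utf-8")
    # Interpret the 128-bit md5 digest as a plain Python int.
    return int(hashlib.md5(payload).hexdigest(), 16)


first = md5_hash_primitive("random_split")
second = md5_hash_primitive("random_split")
same = first == second  # identical input always yields the same hash
```

Unlike the built-in `hash()`, which is salted per process for strings unless `PYTHONHASHSEED` is fixed, a digest-based hash stays stable across interpreter runs. Any change to the hashing scheme changes stored hashes, which is why it counts as a breaking change.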

Full Changelog: 0.11.0...0.12.0