0.12.0
- Changed internal dataset structure from mixins to direct inheritance
- Condensed all pandas dataset types into a single base class
- Adds support for dask datasets
- Adds placeholders for additional dataset libraries
- Adds hashing support for dask dataframes
- Refactored persistence ("save_patterns") package into standalone extensible framework
- Adds context manager support to registries for temporary overwrite
- Refactored pipelines into library-based subclasses
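The temporary-overwrite behavior for registries noted above can be pictured with a minimal sketch. The `Registry` class, its `temporary_overwrite` method, and the `"loader"` key are hypothetical names chosen for illustration, not the library's actual API; the sketch only assumes that an entry is swapped in for the duration of a `with` block and restored afterwards.

```python
from contextlib import contextmanager


class Registry:
    """Hypothetical name-to-object registry (illustrative, not the library's class)."""

    def __init__(self):
        self._items = {}

    def register(self, name, obj):
        self._items[name] = obj

    def get(self, name):
        return self._items[name]

    @contextmanager
    def temporary_overwrite(self, name, obj):
        """Swap in `obj` under `name` for the duration of the with-block."""
        missing = object()  # sentinel: distinguishes "absent" from "registered as None"
        previous = self._items.get(name, missing)
        self._items[name] = obj
        try:
            yield self
        finally:
            # Restore the prior state even if the block raised an exception
            if previous is missing:
                self._items.pop(name, None)
            else:
                self._items[name] = previous


registry = Registry()
registry.register("loader", "default_loader")

with registry.temporary_overwrite("loader", "patched_loader"):
    inside = registry.get("loader")  # the temporary entry is visible here

after = registry.get("loader")  # the original entry is back in place
```

Restoring in a `finally` clause means the original entry comes back even when the body of the `with` block fails, which is the main reason to prefer a context manager over manual save-and-restore.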
BREAKING CHANGES
- Pandas dataset param squeeze_return now defaults to False (classes expecting a series to be returned will need to be updated)
- Numpy dataset is considered unstable and will be redesigned in a future release
- Onedrive, Hickle, and database save patterns are removed (the functionality is still available, but composed patterns are no longer predefined; they can be trivially added in user code if needed)
- Changed pandas hash output to int from numpy.int64 (due to breaking change in NumpyHasher)
- Changed primitive deterministic hash from pickle to md5
- Extracted data iterators into utility wrappers. Pipelines no longer have flags to return iterators
- Random split defaults are computed at runtime instead of precalculated (affects hash)
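To illustrate why the move from pickle to md5 for primitive hashing changes stored hash values, here is a rough sketch of an md5-based deterministic hash. The helper name `md5_hash_primitive` and the choice to hash the `repr` of the value are assumptions made for this example; the library's actual implementation may differ.

```python
import hashlib


def md5_hash_primitive(value):
    """Illustrative helper: derive an int hash from the md5 digest of repr(value).

    Assumption: the primitive is serialized via repr() before hashing;
    this is not necessarily how the library serializes values.
    """
    digest = hashlib.md5(repr(value).encode("utf-8")).hexdigest()
    # Convert the hex digest to a plain Python int (not a numpy integer type)
    return int(digest, 16)


first = md5_hash_primitive("abc")
second = md5_hash_primitive("abc")
```

The same input always produces the same digest, regardless of the pickle protocol in use, whereas bytes produced by `pickle.dumps` depend on the protocol version. Hashes computed under the previous scheme should therefore not be expected to match after upgrading.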
What's Changed
- Ml management structure by @eyadgaran in #87
- Python eol by @eyadgaran in #92
- Dataset libraries by @eyadgaran in #90
- Pipeline refactor by @eyadgaran in #96
- additional testing coverage by @eyadgaran in #83
- Adding Ensemble Model Histogram-based Gradient Boosting Classifier by @aolopez in #91
- version bump by @eyadgaran in #98
New Contributors
Full Changelog: 0.11.0...0.12.0