Releases: eyadgaran/SimpleML
0.14.0 (2022-07-13)
- Standardized formatting with Black
- Split up ORM into a standalone swappable backend
- Persistables maintain weakrefs for lineage
- Persistables are normal python objects now
- Hashing flag to reject non-serializable objects
What's Changed
- Black formatting by @eyadgaran in #103
- dask tweaks by @eyadgaran in #104
- Orm separation by @eyadgaran in #99
- Allow user to request a raised exception if hash(content) will be inconsistent by @ptoman-pa in #107
- version bump and changelog by @eyadgaran in #109
New Contributors
- @ptoman-pa made their first contribution in #107
Full Changelog: 0.13.0...0.14.0
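The "reject non-serializable objects" hashing flag (#107) guards against content whose hash would differ between runs. SimpleML's actual API may differ; the idea can be sketched in plain Python, where any class without its own `__hash__` falls back to `id()`-based hashing, which is memory-address dependent and therefore inconsistent across processes (`check_consistent_hash` and its flag are hypothetical names, not SimpleML's):

```python
def check_consistent_hash(obj, raise_on_inconsistent=True):
    """Illustrative sketch: flag objects whose hash falls back to id().

    Plain `object` instances hash by memory address, so their hash is
    not reproducible across runs; primitives like int/str define a
    content-based __hash__ and are safe.
    """
    if type(obj).__hash__ is object.__hash__:
        if raise_on_inconsistent:
            raise TypeError(
                f"{type(obj).__name__} has no deterministic hash; "
                "its value would change between runs"
            )
        return False
    return True
```

With the flag set, an offending object raises immediately instead of silently producing an unstable lineage hash.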
0.13.0
- Path existence check for pandas serialization
What's Changed
- Path creation by @eyadgaran in #101
Full Changelog: 0.12.0...0.13.0
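The path existence check (#101) addresses a common failure mode: pandas writers error out when the destination directory does not exist. A minimal sketch of the idea, assuming a hypothetical `save_dataframe` helper rather than SimpleML's actual save pattern:

```python
from pathlib import Path

import pandas as pd


def save_dataframe(df, path):
    # Create the parent directory tree first so pandas' writer does not
    # fail with a missing-directory error on fresh environments.
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
```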
0.12.0
- Changed internal dataset structure from mixins to direct inheritance
- Condensed all pandas dataset types into a single base class
- Adds support for dask datasets
- Placeholders for additional dataset libraries
- Adds hashing support for dask dataframes
- Refactored persistence ("save_patterns") package into standalone extensible framework
- Adds context manager support to registries for temporary overwrite
- Refactor pipelines into library based subclasses
BREAKING CHANGES
- Pandas dataset param squeeze_return now defaults to False (classes expecting to return a series will need to be updated)
- Numpy dataset is considered unstable and will be redesigned in a future release
- Onedrive, Hickle, and database save patterns are removed (the functionality is still available, but a composed pattern is no longer predefined; these can be trivially added in user code if needed)
- Changed pandas hash output to int from numpy.int64 (due to breaking change in NumpyHasher)
- Changed primitive deterministic hash from pickle to md5
- Extracted data iterators into utility wrappers. Pipelines no longer have flags to return iterators
- Random split defaults are computed at runtime instead of precalculated (affects hash)
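The switch from pickle to md5 for primitive hashing is motivated by determinism: pickle bytes can vary across Python versions and pickle protocols, while a digest over a canonical string encoding is stable. A rough sketch of the idea; `primitive_hash` is a hypothetical name, not SimpleML's actual function:

```python
import hashlib


def primitive_hash(value):
    # Hypothetical sketch: md5 over a canonical string encoding of a
    # primitive (int, float, str, ...) instead of its pickle bytes,
    # which can differ across Python versions and pickle protocols.
    return hashlib.md5(repr(value).encode("utf-8")).hexdigest()
```

The same primitive then always maps to the same digest, across runs and interpreter versions.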
What's Changed
- Ml management structure by @eyadgaran in #87
- Python eol by @eyadgaran in #92
- Dataset libraries by @eyadgaran in #90
- Pipeline refactor by @eyadgaran in #96
- additional testing coverage by @eyadgaran in #83
- Adding Ensemble Model Histogram-based Gradient Boosting Classifier by @aolopez in #91
- version bump by @eyadgaran in #98
Full Changelog: 0.11.0...0.12.0
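The squeeze_return default change above parallels pandas' own DataFrame.squeeze behavior, which collapses a single-column frame to a Series. A small illustration of the two return shapes (the variable names are illustrative; SimpleML's flag is applied internally):

```python
import pandas as pd

df = pd.DataFrame({"label": [0, 1, 1]})

# squeeze_return=True behavior: a single-column frame collapses to a Series
squeezed = df.squeeze(axis=1)

# squeeze_return=False (the new default): callers keep the DataFrame shape
unsqueezed = df
```

Code that relied on receiving a Series must now squeeze explicitly or opt back in.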
0.11.0
- Added support to hasher for initialized objects
- Adds support for arbitrary dataset splits and sections
- Dataset hooks to validate dataframe setting
- Pipelines no longer cache dataset splits and proxy directly to dataset on every call
- Introduces pipeline splits as reproducible projections over dataset splits
- Database utility to recalculate hashes for existing persistables
BREAKING CHANGES
- Hash for an uninitialized class changed from repr(cls) to "cls.__module__.cls.__name__"
- Database migrations no longer recalculate hashes. That has to be done manually via a utility
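The class-hash change can be illustrated in plain Python: the new key is the dotted module path plus class name, which is a stable, human-readable identifier, unlike repr(cls), whose format is not guaranteed (and can be overridden by metaclasses). `class_hash_key` is a hypothetical name sketching the scheme:

```python
def class_hash_key(cls):
    # Sketch of the new scheme: a stable dotted path identifying the
    # class, rather than repr(cls) (e.g. "<class 'builtins.dict'>").
    return f"{cls.__module__}.{cls.__name__}"
```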
0.10.0
- Dataset external file setter with validation hooks
- Pandas changes to always return dataframe copies (does not extend to underlying python objects! eg lists, objects, etc)
- Pandas Dataset Subclasses for Single and Multi label datasets
- PersistableLoader methods do not require name as a parameter
BREAKING CHANGES
- PandasDataset is deprecated and will be dropped in a future release. Use SingleLabelPandasDataset or MultiLabelPandasDataset instead
- Pandas Dataset Classes require dataframe objects of type pd.DataFrame and will validate input (containers of pd.DataFrames are no longer supported)
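The caveat on dataframe copies is worth seeing concretely: pandas' copy duplicates the frame, but for object-dtype columns it does not recurse into the Python objects the cells hold, so mutating a contained list is still visible through the original frame:

```python
import pandas as pd

df = pd.DataFrame({"tags": [["a"], ["b"]]})

copy = df.copy()                 # new frame, new blocks...
copy.loc[0, "tags"].append("c")  # ...but the nested list is shared

# The mutation leaks through to the original frame's cell.
```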
0.9.3
0.9.2
0.9.1
0.9.0
- Refactored save patterns. Supports multiple concurrent save locations and arbitrary artifact declaration
- Registry centric model for easier extension and third party contrib
- Support for in-memory sqlite db
- Changed database JSON mapping class and dependency to support mutability tracking
- New import wrapper class to manage optional dependencies
- Added dataset_id as a Metric reference. Breaking workflow change! Will raise an error if a dataset is not added and the metric depends on it
- Dropped default Train pipeline split. Will return an empty split for split pipelines and a singleton full dataset split for NoSplitPipelines
- Explicitly migrated to tensorflow 2 and tf.keras
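The JSON-mapping change is about mutability tracking: ORMs typically only detect reassignment of a column value, not in-place mutation of a dict stored in it, so the mapping class must record mutations itself. SQLAlchemy ships this as its mutable extension; a dependency-free sketch of the underlying idea (`TrackedDict` is illustrative, not SimpleML's class):

```python
class TrackedDict(dict):
    """Illustrative sketch: a dict that flags in-place mutation, the
    behavior an ORM needs to know a JSON column is dirty."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.changed = False

    def __setitem__(self, key, value):
        self.changed = True
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self.changed = True
        super().__delitem__(key)
```

Without such tracking, `row.metadata["key"] = value` would silently never be persisted.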