History

v0.8.1 - 2022-12-09

This release fixes bugs in the existing metrics and reports. We also make the reports compatible with future SDV versions.

New Features

  • Filter out additional sdtypes that will be available in future versions of SDV - Issue #265 by @katxiao
  • NewRowSynthesis should ignore PrimaryKey column - Issue #260 by @katxiao

Bug Fixes

  • Visualization crashes if there are metric errors - Issue #272 by @katxiao
  • Score for TVComplement if synthetic data only has missing values - Issue #271 by @katxiao
  • Fix 'timestamp' column metadata in the multi table demo - Issue #267 by @katxiao
  • Fix 'duration' column in the single table demo - Issue #266 by @katxiao
  • README.md example has a bug - Issue #262 by @katxiao
  • Update README.md to fix a bug - Issue #263 by @katxiao
  • Visualization get_column_pair_plot: update parameter name to column_names - Issue #258 by @katxiao
  • "Column Shapes" and "Column Pair Trends" Calculation Inconsistency - Issue #254 by @katxiao
  • Diagnostic Report missing RangeCoverage for numerical columns - Issue #255 by @katxiao

v0.8.0 - 2022-11-02

This release introduces the DiagnosticReport, which helps a user verify – at a quick glance – that their data is valid. We also fix an existing bug with detection metrics.

New Features

  • Fixes for new metadata - Issue #253 by @katxiao
  • Add default synthetic sample size to DiagnosticReport - Issue #248 by @katxiao
  • Exclude pii columns from single table metrics - Issue #245 by @katxiao
  • Accept both old and new metadata - Issue #244 by @katxiao
  • Address Diagnostic Report and metric edge cases - Issue #243 by @katxiao
  • Update visualization average per table - Issue #242 by @katxiao
  • Add save and load functionality to multi-table DiagnosticReport - Issue #218 by @katxiao
  • Visualization methods for the multi-table DiagnosticReport - Issue #217 by @katxiao
  • Add getter methods to multi-table DiagnosticReport - Issue #216 by @katxiao
  • Create multi-table DiagnosticReport - Issue #215 by @katxiao
  • Visualization methods for the single-table DiagnosticReport - Issue #211 by @katxiao
  • Add getter methods to single-table DiagnosticReport - Issue #210 by @katxiao
  • Create single-table DiagnosticReport - Issue #209 by @katxiao
  • Add save and load functionality to single-table DiagnosticReport - Issue #212 by @katxiao
  • Add single table diagnostic report - Issue #237 by @katxiao

Bug Fixes

  • Detection test doesn't look at metadata when determining which columns to use - Issue #119 by @R-Palazzo

Internal Improvements

  • Remove torch dependency - Issue #233 by @katxiao
  • Update README - Issue #250 by @katxiao

v0.7.0 - 2022-09-27

This release introduces the QualityReport, which evaluates how well synthetic data captures mathematical properties from the real data. The QualityReport incorporates the new metrics introduced in the previous release, and allows users to get detailed results, visualize the scores, and save the report for future viewing. We also add utility methods for visualizing columns and pairs of columns.

New Features

  • Catch typeerror in new row synthesis query - Issue #234 by @katxiao
  • Add NewRowSynthesis Metric - Issue #207 by @katxiao
  • Update plot utilities API - Issue #228 by @katxiao
  • Fix column pairs visualization bug - Issue #230 by @katxiao
  • Save version - Issue #229 by @katxiao
  • Update efficacy metrics API - Issue #227 by @katxiao
  • Add RangeCoverage Metric - Issue #208 by @katxiao
  • Add get_column_pairs_plot utility method - Issue #223 by @katxiao
  • Parse date as datetime - Issue #222 by @katxiao
  • Update error handling for reports - Issue #221 by @katxiao
  • Visualization API update - Issue #220 by @katxiao
  • Bug fixes for QualityReport - Issue #219 by @katxiao
  • Update column pair metric calculation - Issue #214 by @katxiao
  • Add get score methods for multi table QualityReport - Issue #190 by @katxiao
  • Add multi table QualityReport visualization methods - Issue #192 by @katxiao
  • Add plot_column visualization utility method - Issue #193 by @katxiao
  • Add save and load behavior to multi table QualityReport - Issue #188 by @katxiao
  • Create multi-table QualityReport - Issue #186 by @katxiao
  • Add single table QualityReport visualization methods - Issue #191 by @katxiao
  • Add save and load behavior to single table QualityReport - Issue #187 by @katxiao
  • Add get score methods for single table Quality Report - Issue #189 by @katxiao
  • Create single-table QualityReport - Issue #185 by @katxiao

Internal Improvements

  • Auto apply "new" label instead of "pending review" - Issue #164 by @katxiao
  • fix typo - Issue #195 by @fealho

v0.6.0 - 2022-08-12

This release removes SDMetrics' dependency on the RDT library and introduces new quality and diagnostic metrics. Additionally, we introduce a new compute_breakdown method that returns a breakdown of metric results.

New Features

  • Handle null values correctly - Issue #194 by @katxiao
  • Add wrapper classes for new single and multi table metrics - Issue #169 by @katxiao
  • Add CorrelationSimilarity metric - Issue #143 by @katxiao
  • Add CardinalityShapeSimilarity metric - Issue #160 by @katxiao
  • Add CardinalityStatisticSimilarity metric - Issue #145 by @katxiao
  • Add ContingencySimilarity Metric - Issue #159 by @katxiao
  • Add TVComplement metric - Issue #142 by @katxiao
  • Add MissingValueSimilarity metric - Issue #139 by @katxiao
  • Add CategoryCoverage metric - Issue #140 by @katxiao
  • Add compute breakdown column for single column - Issue #152 by @katxiao
  • Add BoundaryAdherence metric - Issue #138 by @katxiao
  • Get KSComplement Score Breakdown - Issue #130 by @katxiao
  • Add StatisticSimilarity Metric - Issue #137 by @katxiao
  • New features for KSTest.compute - Issue #129 by @amontanez24

Internal Improvements

  • Add integration tests and fixes - Issue #183 by @katxiao
  • Remove rdt hypertransformer dependency in timeseries metrics - Issue #176 by @katxiao
  • Replace rdt LabelEncoder with sklearn - Issue #178 by @katxiao
  • Remove rdt as a dependency - Issue #182 by @katxiao
  • Use sklearn's OneHotEncoder instead of rdt - Issue #170 by @katxiao
  • Remove KSTestExtended - Issue #180 by @katxiao
  • Remove TSFClassifierEfficacy and TSFCDetection metrics - Issue #171 by @katxiao
  • Update the default tags for a feature request - Issue #172 by @katxiao
  • Bump github macos version - Issue #174 by @katxiao
  • Fix pydocstyle to check sdmetrics - Issue #153 by @pvk-developer
  • Update the RDT version to 1.0 - Issue #150 by @pvk-developer
  • Update slack invite link - Issue #132 by @pvk-developer

v0.5.0 - 2022-05-11

This release fixes an error where the relational KSTest crashes if a table doesn't have numerical columns. It also includes some housekeeping, updating the pomegranate and copulas version requirements.

Issues closed

  • Cap pomegranate to <0.14.7 - Issue #116 by @csala
  • Relational KSTest crashes with IncomputableMetricError if a table doesn't have numerical columns - Issue #109 by @katxiao

v0.4.1 - 2021-12-09

This release improves the handling of metric errors, and updates the default transformer behavior used in SDMetrics.

Issues closed

  • Report metric errors from compute_metrics - Issue #107 by @katxiao
  • Specify default categorical transformers - Issue #105 by @katxiao

v0.4.0 - 2021-11-16

This release adds support for Python 3.9, updates dependencies to ensure compatibility with the rest of the SDV ecosystem, and upgrades to the latest RDT release.

Issues closed

  • Replace sktime for pyts - Issue #103 by @pvk-developer
  • Add support for Python 3.9 - Issue #102 by @pvk-developer
  • Increase code style lint - Issue #80 by @fealho
  • Add pip check to CI workflows - Issue #79 by @pvk-developer
  • Upgrade dependency ranges - Issue #69 by @katxiao

v0.3.2 - 2021-08-16

This release makes pomegranate an optional dependency.

Issues closed

  • Make pomegranate an optional dependency - Issue #63 by @fealho

v0.3.1 - 2021-07-12

This release fixes a bug to make the privacy metrics available in the API docs. It also updates dependencies to ensure compatibility with the rest of the SDV ecosystem.

Issues closed

  • CategoricalSVM not being imported - Issue #65 by @csala

v0.3.0 - 2021-03-30

This release includes privacy metrics to evaluate whether the real data could be obtained or deduced from the synthetic samples. Additionally, all metrics now have a normalize method, which takes the raw_score generated by the metric and returns a value between 0 and 1.

Issues closed

  • Add normalize method to metrics - Issue #51 by @csala and @fealho
  • Implement privacy metrics - Issue #36 by @ZhuofanXie and @fealho
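
As a rough illustration of the normalize behavior described for this release, the sketch below maps a raw score onto the range 0 to 1. It is not the SDMetrics implementation: the function name `normalize_score`, its parameters, and the restriction to metrics with finite bounds are all assumptions made for this example.

```python
def normalize_score(raw_score, min_value, max_value, maximize=True):
    """Map a raw metric score onto [0, 1] (hypothetical helper, not SDMetrics code).

    Assumes the metric has finite lower and upper bounds. `maximize`
    indicates whether a higher raw score is better; when it is False,
    the scale is flipped so that 1 always means "best".
    """
    if not min_value < max_value:
        raise ValueError("min_value must be strictly less than max_value")
    if not min_value <= raw_score <= max_value:
        raise ValueError("raw_score must lie within [min_value, max_value]")

    # Linear rescaling of the raw score into the unit interval.
    scaled = (raw_score - min_value) / (max_value - min_value)
    return scaled if maximize else 1.0 - scaled
```

For example, `normalize_score(5, 0, 10)` returns `0.5`. A metric whose raw score is unbounded would need a different mapping, which this sketch does not attempt to cover.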

v0.2.0 - 2021-02-24

Dependency upgrades to ensure compatibility with the rest of the SDV ecosystem.

v0.1.3 - 2021-02-13

Updates the required dependencies to facilitate a conda release.

Issues closed

  • Upgrade sktime - Issue #49 by @fealho

v0.1.2 - 2021-01-27

Bug-fixing release that addresses several minor errors.

Issues closed

  • More splits than classes - Issue #46 by @fealho
  • Scipy 1.6.0 causes an AttributeError - Issue #44 by @fealho
  • Time series metrics fails with variable length timeseries - Issue #42 by @fealho
  • ParentChildDetection metrics KeyError - Issue #39 by @csala

v0.1.1 - 2020-12-30

This version adds Time Series Detection and Efficacy metrics, as well as a fix to ensure that Single Table binary classification efficacy metrics work well with binary targets which are not boolean.

Issues closed

  • Timeseries efficacy metrics - Issue #35 by @csala
  • Timeseries detection metrics - Issue #34 by @csala
  • Ensure binary classification targets are bool - Issue #33 by @csala

v0.1.0 - 2020-12-18

This release introduces a new project organization in which metrics are grouped by data modality and share a common API:

  • Single Column
  • Column Pair
  • Single Table
  • Multi Table
  • Time Series

Within each data modality, different families of metrics have been implemented:

  • Statistical
  • Detection
  • Bayesian Network and Gaussian Mixture Likelihood
  • Machine Learning Efficacy

v0.0.4 - 2020-11-27

Patch release to relax dependencies and avoid conflicts when using the latest SDV version.

v0.0.3 - 2020-11-20

Fix error on detection metrics when input data contains infinity or NaN values.

Issues closed

  • ValueError: Input contains infinity or a value too large for dtype('float64') - Issue #11 by @csala

v0.0.2 - 2020-08-08

Add support for Python 3.8 and a broader range of dependencies.

v0.0.1 - 2020-06-26

First release to PyPI.