Add custom sampling option to PySMO #1298
- User can explicitly define a distribution for sampling of each variable. Sampling options currently available are random, uniform and Gaussian.
Codecov Report
Coverage diff, main vs. #1298:

| Metric   | main   | #1298  | +/-    |
|----------|--------|--------|--------|
| Coverage | 77.36% | 77.38% | +0.02% |
| Files    | 390    | 390    |        |
| Lines    | 63462  | 63547  | +85    |
| Branches | 11671  | 11699  | +28    |
| Hits     | 49099  | 49179  | +80    |
| Misses   | 11832  | 11836  | +4     |
| Partials | 2531   | 2532   | +1     |
I think this looks very good, and the coverage is very high as well. Nicely done!
A few requests, mostly for more specific exception types and descriptive test names.
@pytest.mark.unit
@pytest.mark.parametrize("array_type", [np.array])
def test_sample_points_01(self, array_type):
These look like tests that could (and should) use fixed inputs, i.e. skip the sample generation and mock up a set of known distributions that we can use to check that we get the right answer.
As far as I can tell, sample_points is a case where we can test the method in isolation using manufactured data: provide the input specification, a mock-up of a random dataset, and the concrete answer we expect to get back.
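A minimal sketch of the fixed-input pattern described above, in case it helps. Both `FixedDraws` and `sample_column` are hypothetical stand-ins written for this illustration; they are not PySMO's `sample_points` or any part of its API. The idea is only that the "random" draws are manufactured, so the expected output is known exactly.

```python
class FixedDraws:
    """Hypothetical stand-in for a random generator: replays preset unit draws."""

    def __init__(self, unit_draws):
        self._unit_draws = list(unit_draws)

    def uniform(self, low, high):
        # Scale the next manufactured draw from [0, 1] into [low, high].
        return low + (high - low) * self._unit_draws.pop(0)


def sample_column(rng, low, high, n_samples):
    """Hypothetical function under test: draw n_samples values within bounds."""
    return [rng.uniform(low, high) for _ in range(n_samples)]


def test_sample_column_scales_fixed_draws_to_bounds():
    # Known "random" inputs give a concrete, checkable answer.
    rng = FixedDraws([0.0, 0.5, 1.0])
    assert sample_column(rng, 10.0, 20.0, 3) == [10.0, 15.0, 20.0]
```

Because the generator is injected, the test never depends on a seed or on the statistics of real draws, and the descriptive test name states what is being verified.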
@OOAmusat looks like this did not make the Dec 2.3.0 release...
Yes, I don't know why as it had the required approvals and tests passing...
Summary/Motivation:
This PR adds a sampling method that allows users to specify the sampling distribution to be used for each input variable/column. The current interface provides random, uniform and normal distribution sampling options.
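To illustrate the idea of one distribution per column, here is a rough standalone sketch. It is not the `CustomSampling` implementation: `sample_columns`, its arguments, and the way the normal distribution is centred and clamped are all assumptions made for this example, and the "random" option is left out because its exact meaning is not spelled out here.

```python
import random


def sample_columns(bounds, distributions, n_samples, seed=None):
    """Hypothetical helper: draw n_samples rows, one distribution per column."""
    if len(bounds) != len(distributions):
        raise ValueError("Need exactly one distribution per variable.")
    rng = random.Random(seed)
    columns = []
    for (low, high), dist in zip(bounds, distributions):
        if dist == "uniform":
            col = [rng.uniform(low, high) for _ in range(n_samples)]
        elif dist == "normal":
            # Assumption: centre on the midpoint, let 3 sigma span half the
            # range, and clamp any draw that falls outside the bounds.
            mean, std = (low + high) / 2, (high - low) / 6
            col = [min(max(rng.gauss(mean, std), low), high)
                   for _ in range(n_samples)]
        else:
            raise ValueError(f"Unsupported distribution: {dist!r}")
        columns.append(col)
    # Transpose the per-variable columns into sample rows.
    return [list(row) for row in zip(*columns)]


samples = sample_columns(
    bounds=[(0.0, 1.0), (10.0, 20.0)],
    distributions=["uniform", "normal"],
    n_samples=5,
    seed=0,
)
```

Each row of `samples` holds one value per variable, with the first column drawn uniformly and the second drawn from a clamped normal distribution.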
Changes proposed in this PR:
- CustomSampling class

Legal Acknowledgement
By contributing to this software project, I agree to the following terms and conditions for my contribution: