-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate inputs #85
base: main
Are you sure you want to change the base?
Validate inputs #85
Conversation
Codecov Report
@@ Coverage Diff @@
## dscim-v0.4.0_fixes #85 +/- ##
======================================================
+ Coverage 68.15% 68.72% +0.56%
======================================================
Files 17 17
Lines 1878 1912 +34
======================================================
+ Hits 1280 1314 +34
Misses 598 598
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
This validation script requires very strict input damages, but we may want to generalize in the future. The package itself requires very particular dimensions for each object, so we may want the config to include the dimensions which would allow more flexible input files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey y'all. I gave this a quick look and I have some comments here and in-line.
-
Having validation throw
assertions
is very common; unfortunately, it's possible to turn off assertions entirely. It also makes it hard for users to catch validation errors, specifically if assertions are getting thrown from different bodies of code in this pkg. Consider throwing a custom exception or something else instead. I'd consider what happens or what failures you'd see if this ran without throwing assertions and how that changes behavior. This might be something to consider in another PR later. -
Checking for chunking might be too rigid. Is there a really good reason for this? Some of these checks feel pretty rigid, generally. Just double-check that you actually need to validate this stuff. It's pretty common to dance around with how tight or loose checks are. I'd say you want to ensure the data has what you need, but you also want to avoid blocking people from doing good work in the future because you were overzealous with checks.
## Unreleased | ||
### Added | ||
- Function to validate input damages. ([PR #85](https://github.com/ClimateImpactLab/dscim/pull/83), [@JMGilbert](https://github.com/JMGilbert)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add this in a new "unreleased" section here and not in the one below?
@@ -863,3 +874,50 @@ def coastal_inputs( | |||
consolidated=True, | |||
mode="w", | |||
) | |||
|
|||
|
|||
def validate_damages(sector, path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstr? What are the inputs params and what is being checked? Expected behavior?
Be clear about what you're actually checking for. Don't be shy with inline comments, too because people will usually be back to read this kind of checking code in the future. (You did a pretty good job with comments here so 👍 )
|
||
|
||
def validate_damages(sector, path): | ||
inputs = xr.open_zarr(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why have IO here instead of reading in a zarr store? Would this make it so you don't need to monkey patch it for tests?
chunks_expected = dict(zip(dims, chunks)) | ||
|
||
assert dims_expected == dict(inputs.dims) | ||
for i in list(inputs.keys()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be able to just have this as inputs.keys()
without casting it to a list?
monkeypatch.setattr( | ||
"dscim.preprocessing.input_damages.validate_damages", lambda *args: True | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also applies to all monkeypatching below)
Why does this need to be monkey patched out?
Might be better if the validation function takes a dataset or dataarray as input instead of doing its own IO?
Patching like this in tests might be a symptom of a design problem.
|
||
|
||
@pytest.mark.parametrize("sector", ["mortality", "coastal"]) | ||
def test_validate_damages_correct(tmp_path, sector): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need docstr. What behavior is this testing for?
validate_damages(sector, path) # No assertion error should be raised | ||
|
||
|
||
def test_validate_damages_incorrect_batches(tmp_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need docstr. What's this testing. I can kinda get a feel for it from the title but might want to say more specifically what you're looking for.
These docstrs also can print when tests fail so its good to have even if they seem obvious.
) | ||
|
||
|
||
def test_validate_damages_incorrect_rcps(tmp_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need docstr. Harder to grok what this is testing for exactly.
|
||
|
||
@pytest.mark.parametrize("sector", ["mortality", "coastal"]) | ||
def test_validate_damages_incorrect_chunk_sizes(tmp_path, sector): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs docstr.
|
||
|
||
@pytest.mark.parametrize("sector", ["mortality", "coastal"]) | ||
def test_validate_damages_incorrect_region_chunk_sizes(tmp_path, sector): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neeeeds doooocstrr.
One more thing! Consider having an option to turn off hard, run-blocking validation in functions where you've hardcoded it. This is another thing that could help with testing or if the validation starts getting in the way for silly reasons. |
So far the input validation checks:
In general, I have tried to design the
validate_damages
function to be pretty flexible so it should be easy to change what checks we run on the input damages.