Improved CF decoding #6812
Changes from 3 commits
2a5686c
108586e
4eedd29
312acda
4615720
@@ -237,7 +237,7 @@ def _choose_float_dtype(dtype, has_offset):
         # Sensitivity analysis can be tricky, so we just use a float64
         # if there's any offset at all - better unoptimised than wrong!
         if not has_offset:
-            return np.float32
+            return np.float64
     # For all other types and circumstances, we just use float64.
     # (safe because eg. complex numbers are not supported in NetCDF)
     return np.float64
@@ -269,7 +269,7 @@ def decode(self, variable, name=None):
         if "scale_factor" in attrs or "add_offset" in attrs:
             scale_factor = pop_to(attrs, encoding, "scale_factor", name=name)
             add_offset = pop_to(attrs, encoding, "add_offset", name=name)
-            dtype = _choose_float_dtype(data.dtype, "add_offset" in attrs)
+            dtype = _choose_float_dtype(data.dtype, "add_offset" in encoding)
I suspect this fixed one issue, but the original issue still remains because we still aren't looking at the dtype of
Note - I think the conventions referred to above are: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch08.html or
             if np.ndim(scale_factor) > 0:
                 scale_factor = np.asarray(scale_factor).item()
             if np.ndim(add_offset) > 0:
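The CF conventions chapter linked in the review comment on this hunk (section 8.1, packed data) ties the unpacked data type to the type of the `scale_factor` and `add_offset` attributes rather than to whether an offset is present. A minimal sketch of that rule follows. The helper name `cf_unpacked_dtype` is hypothetical and is not part of xarray or of this PR; it only illustrates one reading of the convention.

```python
import numpy as np


def cf_unpacked_dtype(packed_dtype, scale_factor=None, add_offset=None):
    """Hypothetical helper: pick the unpacked dtype from the attribute dtypes."""
    # Collect the dtypes of whichever attributes are present.
    attr_dtypes = {
        np.asarray(attr).dtype
        for attr in (scale_factor, add_offset)
        if attr is not None
    }
    packed_dtype = np.dtype(packed_dtype)
    # Attributes of the same type as the packed variable: dtype is unchanged.
    if attr_dtypes <= {packed_dtype}:
        return packed_dtype
    # Otherwise both attributes must agree and be float32 or float64.
    if len(attr_dtypes) != 1:
        raise ValueError("scale_factor and add_offset must share one dtype")
    (dtype,) = attr_dtypes
    if dtype not in (np.dtype("float32"), np.dtype("float64")):
        raise ValueError(f"attributes must be float32 or float64, got {dtype}")
    return dtype
```

Under this reading, `int16` data with `float32` attributes would unpack to `float32`, and the same data with `float64` attributes would unpack to `float64`, regardless of which attributes are set.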
@@ -96,7 +96,7 @@ def test_coder_roundtrip() -> None:
     assert_identical(original, roundtripped)


-@pytest.mark.parametrize("dtype", "u1 u2 i1 i2 f2 f4".split())
+@pytest.mark.parametrize("dtype", "f2 f4".split())
 def test_scaling_converts_to_float32(dtype) -> None:
     original = xr.Variable(
         ("x",), np.arange(10, dtype=dtype), encoding=dict(scale_factor=10)
@@ -109,6 +109,19 @@ def test_scaling_converts_to_float32(dtype) -> None:
     assert roundtripped.dtype == np.float32


+@pytest.mark.parametrize("dtype", "u1 u2 i1 i2".split())
+def test_scaling_converts_to_float64(dtype) -> None:
+    original = xr.Variable(
+        ("x",), np.arange(10, dtype=dtype), encoding=dict(scale_factor=10)
+    )
+    coder = variables.CFScaleOffsetCoder()
+    encoded = coder.encode(original)
+    assert encoded.dtype == np.float64
+    roundtripped = coder.decode(encoded)
+    assert_identical(original, roundtripped)
+    assert roundtripped.dtype == np.float64
+
+
 @pytest.mark.parametrize("scale_factor", (10, [10]))
I'm not sure that
These kinds of things tend to happen though. Since we have tested for it, we should just keep it around.
 @pytest.mark.parametrize("add_offset", (0.1, [0.1]))
 def test_scaling_offset_as_list(scale_factor, add_offset) -> None:
I think the code matches the comments. It would be clearer if written as
Without your edits, if there is an offset, the condition does not trigger and we return np.float64 later.
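The trade-off behind that branch can be checked numerically. The sketch below assumes the usual unpacking formula, unpacked = packed * scale_factor + add_offset. The helper `unpack` and the sample values are illustrative only and do not come from the PR or from xarray.

```python
import numpy as np


def unpack(packed, dtype, scale_factor=1, add_offset=0):
    """Illustrative helper (not xarray API): unpack entirely in `dtype`."""
    return packed.astype(dtype) * dtype(scale_factor) + dtype(add_offset)


# Extremes of int16: every value times a small integer scale factor stays
# below 2**24, so float32 represents the result exactly.
packed = np.array([-32768, 0, 32767], dtype=np.int16)
scaled32 = unpack(packed, np.float32, scale_factor=10)
scaled64 = unpack(packed, np.float64, scale_factor=10)
scale_only_is_exact = np.array_equal(scaled32.astype(np.float64), scaled64)

# A large offset pushes the values to where float32 spacing is 8, so
# neighbouring packed values collapse onto the same float32 number.
neighbours = np.array([0, 1, 2], dtype=np.int16)
offset32 = unpack(neighbours, np.float32, add_offset=1e8)
offset64 = unpack(neighbours, np.float64, add_offset=1e8)
distinct32 = len(np.unique(offset32))
distinct64 = len(np.unique(offset64))
```

In this example the scale-only case agrees between float32 and float64, while the large-offset case keeps three distinct values only in float64, which is the situation the "better unoptimised than wrong" comment guards against.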
Thanks for reviewing this pull request. FYI my original comment (later edited) said:
Based on your comment, I think my original intuition - that this function needs a large rewrite - is correct. I'll look into this and submit additional commits to this PR.
Thanks for looking into this!