-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cf-coding #7654
cf-coding #7654
Conversation
I'll add to |
The one failed run might be spurious. Maybe a re-run of this would work. |
Thanks a lot for the quick PR! I can confirm that this fixes But not |
@basnijholt Thanks for testing. I can't reproduce this. Here everything works as expected. But I've a slightly different environment (full netcdf4-stack, latest versions of everything). The tests which check for dtype equality (vlen-case) do not raise in this PR for any backend and array container, so I'm assuming this should work as expected (at least for As stated over at #7652 I can't reproduce the int64->int32 conversion in any environments I tested so far. I'll have another look at your environment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some explanations and concerns. I'd very much appreciate comments and guidance here.
Pinging @shoyer and @max-sixty as authors of relevant code.
@basnijholt I've explained the issues whereabouts over at your issue #7652 (comment). |
4b65ffd
to
5901354
Compare
XRef: #2040 (comment) Citing @shoyer from above comment:
Another interesting issue by @shoyer: I'm really uncertain if using |
And yet another related issue: #1647 Currently both netcdf/h5netcdf are able to set a |
Let's discuss at next week's meeting. @kmuehlbauer can you make it? 9.30am MT Wed 29 Mar 2023 |
That's 15 UTC, or 17 CEST (my local time). Should work for me. I'll try to collect all available information on that topic and the current status. |
Just leaving a note here. I would expect that the datatype that was saved, is the datatype that is loaded. So preferably if I save a string array of e.g., type Thanks again @kmuehlbauer for digging into this problem and all your work! 😄 |
64ef0b9
to
b5dad00
Compare
After the dev-meeting I've taken a step back and first implemented the coders as mentioned in @shoyer's ToDo. I've fixed the one bool->int issue and it now derives the dtype for ScaleOffset coding from scale_factor add_offset. I've improved some test with regard to the scale/offset issue. I'll concentrate on the string fillvalue issues in a follow up PR. |
2e4a81d
to
fc29432
Compare
@dcherian @Illviljan Thanks for the first round of review. I've rebased everything on latest main. Now the code moving from To sum up this PR, it does:
|
@Illviljan I'm not able to figure out the typing if I want to use Data-types as functions to convert python numbers to array scalars. If you have any suggestion how to fix this, please let me know. |
def _choose_float_dtype(dtype: np.dtype, has_offset: bool) -> type[np.floating[Any]]: | ||
"""Return a float dtype that can losslessly represent `dtype` values.""" | ||
# Keep float32 as-is. Upcast half-precision to single-precision, | ||
def _choose_float_dtype( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mankoff do you have time to take a look here please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dcherian . I'm not sure what to look for. I will link to my open-but-stale PR that tried to start addressing this/similar issues: #6812
Perhaps also relevant are my last few comments on #2304 (see #2304 (comment) ). The problem for me is that (1) the CF standards are ambiguously defined and (2) xarray needs to address the many use-cases where the CF standards are not followed (usually this means different data types).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Does this PR fix your original issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mankoff, I'll have a look at your extensive notes over there and try to come up with aomething.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
typing by @Illviljan Co-authored-by: Illviljan <[email protected]>
for more information, see https://pre-commit.ci
Co-authored-by: Illviljan <[email protected]>
…k for float32/64 only.
…nflict, use _FillValue for missing_value if available
aeed54f
to
031cac5
Compare
Still hunting for corner cases and issues inside encode_cf_variable/decode_cf_variable. It looks like I already see some light again. Not sure, if this is the last iteration, but the testsuite is still running green with added and enhanced tests, which is not that bad. Unfortunately #2304 is still an issue for now. I'll clarify that later with an added test. |
@dcherian Just a heads-up: I find this PR getting more and more involved at different parts of the machinery and hard to follow for reviewers. I'll split this up and start with the more or less undisputed changes. |
I've converted to draft for now, as I'm still evaluating solutions for the CF encoding/decoding. |
I'll close this one. Most things have been addressed in other PR's. |
whats-new.rst
api.rst
transform numpy object-dtype strings (vlen) to numpy unicode stringsThis should also fix:
nan
values appearing when saving and loading fromnetCDF
due to encoding #7691and possible other issues which have the float32/float64 issue