-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dtype::normalized_num and dtype::num_of #5429
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg Is there a chance that you could help reviewing this pybind11/numpy.h PR?
It looks good to me, but I'm not very familiar with the details?
// This is needed to correctly handle situations where multiple typenums map to the same type, | ||
// e.g. NPY_LONG_ may be equivalent to NPY_INT_ or NPY_LONGLONG_ despite having a different | ||
// typenum. The normalized typenum always matches the values used in npy_format_descriptor. | ||
static constexpr int normalized_dtype_num[npy_api::NPY_VOID_ + 1] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow ... What if someone tries to make a changes to enum constants
, could that result in confusing behavior and time-consuming debugging?
Could it make sense to leave "if this than that" comments here and near the top of enum constants
?
// If you change this code, please review `normalized_dtype_num` below.
...
// If you change this code, please review `enum constants` above.
Maybe also: Could this new code be moved closer to the enum constants
code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit verbose on the possibility (a long long
is always 64bit on any supported platform for example), but seems fine to me.
I wonder if "bitsized" or so might be a clearer name than "normalized".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if someone tries to make a changes to enum constants, could that result in confusing behavior and time-consuming debugging?
Those constants are part of the Numpy API, so that would cause much larger problems. In any case I've added the suggested comment.
Maybe also: Could this new code be moved closer to the enum constants code?
Do you mean inside struct npy_api
? That would create some extra complications since it's a static array in a header-only library.
I wonder if "bitsized" or so might be a clearer name than "normalized".
Do you want to rename just the normalized_dtype_num
array or also dtype::normalized_num
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ... I think "normalized" better captures what this is about. What we really want is to normalize (for use in switch statements). That we're picking the bitsize names is a choice of secondary importance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I think normalize is fine. I agree in the context of switch statements, it is clear enough.
// This is needed to correctly handle situations where multiple typenums map to the same type, | ||
// e.g. NPY_LONG_ may be equivalent to NPY_INT_ or NPY_LONGLONG_ despite having a different | ||
// typenum. The normalized typenum always matches the values used in npy_format_descriptor. | ||
static constexpr int normalized_dtype_num[npy_api::NPY_VOID_ + 1] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ... I think "normalized" better captures what this is about. What we really want is to normalize (for use in switch statements). That we're picking the bitsize names is a choice of secondary importance.
Thanks @MaartenBaert, and thanks a lot for the review @seberg! |
Description
This commit adds two functions,
dtype::normalized_num
anddtype::num_of
, which make it possible to use a switch statement for dynamic dispatch based on the dtype of an array, like this:dtype::normalized_num
does the same asdtype::num
, except the type number is normalized to match the one used innpy_format_descriptor
. This is needed becausedtype::num
can return different values for equivalent types, e.g. even thoughlong
may be equivalent toint
orlong long
, they still have different type numbers (at least when the dtype is constructed by type number, rather than withdtype::of<T>()
). Without normalization, this leads to strange bugs where the switch statement inexplicably rejects arrays from certain sources despite superficially having the correct dtype. E.g. on x86_64 (linux), dtypeslong
(7) andlonglong
(9) both appear asint64
but have different type numbers:dtype::num_of<T>
is simply the constexpr equivalent ofdtype::of<T>().num()
, so it can be used for cases in a switch statement.While developing this, I ran into an inconsistency in the behavior of
is_fmt_numeric
which is relevant for the implementation ofdtype::normalized_num
, I have reported this as a separate issue: #5428Suggested changelog entry:
Add dtype::normalized_num and dtype::num_of