Add dtype::normalized_num and dtype::num_of #5429

MaartenBaert · 2024-11-03T18:36:55Z

Description

This commit adds two functions, dtype::normalized_num and dtype::num_of, which make it possible to use a switch statement for dynamic dispatch based on the dtype of an array, like this:

switch (arr.dtype().normalized_num()) {
    case py::dtype::num_of<int8_t>():
        return foo<int8_t>(arr);
    case py::dtype::num_of<uint8_t>():
        return foo<uint8_t>(arr);
    case py::dtype::num_of<int16_t>():
        return foo<int16_t>(arr);
    case py::dtype::num_of<uint16_t>():
        return foo<uint16_t>(arr);
// ...

dtype::normalized_num does the same as dtype::num, except that the type number is normalized to match the one used in npy_format_descriptor. This is needed because dtype::num can return different values for equivalent types: even though long may be equivalent to int or long long, the three still have different type numbers (at least when the dtype is constructed by type number rather than with dtype::of<T>()). Without normalization, this leads to strange bugs where the switch statement inexplicably rejects arrays from certain sources even though they superficially have the correct dtype. For example, on x86_64 Linux, the dtypes long (7) and longlong (9) both appear as int64 but have different type numbers:

In [1]: np.dtype('long')
Out[1]: dtype('int64')

In [2]: np.dtype('longlong')
Out[2]: dtype('int64')

In [3]: np.dtype('long').num
Out[3]: 7

In [4]: np.dtype('longlong').num
Out[4]: 9
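The normalization described above can be sketched outside of pybind11 as a lookup by storage size. This is only an illustration under stated assumptions: the `TypeNum` enum and the functions `size_of_typenum`, `canonical_signed`, and `normalize` are hypothetical names, not pybind11 API. The numeric values mirror NumPy's type numbers for three C integer types, and the preference order (long first) is an assumption for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mirror of NumPy's type numbers for three signed C integer types.
enum TypeNum : int { kInt = 5, kLong = 7, kLongLong = 9 };

// Storage size of the C type behind a type number (0 for anything not modeled here).
constexpr std::size_t size_of_typenum(int num) {
    return num == kInt        ? sizeof(int)
           : num == kLong     ? sizeof(long)
           : num == kLongLong ? sizeof(long long)
                              : 0;
}

// One canonical type number per storage size. Preferring long is an assumption.
constexpr int canonical_signed(std::size_t size) {
    return size == sizeof(long)        ? kLong
           : size == sizeof(long long) ? kLongLong
           : size == sizeof(int)       ? kInt
                                       : -1;
}

// Type numbers that are not modeled pass through unchanged.
constexpr int normalize(int num) {
    return size_of_typenum(num) == 0 ? num : canonical_signed(size_of_typenum(num));
}

// Equivalent types normalize to the same number, whatever the platform.
static_assert(sizeof(long) != sizeof(long long) || normalize(kLong) == normalize(kLongLong),
              "long and long long must agree when they have the same size");
static_assert(sizeof(long) != sizeof(int) || normalize(kLong) == normalize(kInt),
              "long and int must agree when they have the same size");
```

On a platform where long and long long are both 64 bits wide, such as x86_64 Linux, `normalize(kLongLong)` and `normalize(kLong)` both give 7, so arrays built from either type number would reach the same `case` label.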

dtype::num_of<T> is simply the constexpr equivalent of dtype::of<T>().num(), so it can be used for cases in a switch statement.
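The reason a constexpr function matters is that `case` labels must be constant expressions, so a runtime call cannot appear in them. The following standalone sketch mimics the idea with a hypothetical `type_num` trait, a hypothetical `num_of` function template, and a hypothetical `describe` helper. None of this is the pybind11 implementation, and the numeric values are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical trait mapping a C++ type to a compile-time type number.
template <typename T>
struct type_num;
template <> struct type_num<std::int8_t>   { static constexpr int value = 1; };
template <> struct type_num<std::uint8_t>  { static constexpr int value = 2; };
template <> struct type_num<std::int16_t>  { static constexpr int value = 3; };
template <> struct type_num<std::uint16_t> { static constexpr int value = 4; };

// constexpr, so the result is usable as a case label.
template <typename T>
constexpr int num_of() {
    return type_num<T>::value;
}

// Dispatch on a runtime type number using compile-time case labels.
inline const char *describe(int num) {
    switch (num) {
        case num_of<std::int8_t>():   return "int8";
        case num_of<std::uint8_t>():  return "uint8";
        case num_of<std::int16_t>():  return "int16";
        case num_of<std::uint16_t>(): return "uint16";
        default:                      return "unsupported";
    }
}
```

A `default` branch, as in `describe`, gives unsupported type numbers a defined outcome instead of silently falling out of the switch.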

While developing this, I ran into an inconsistency in the behavior of is_fmt_numeric that is relevant for the implementation of dtype::normalized_num. I have reported it as a separate issue: #5428

Suggested changelog entry:

Add dtype::normalized_num and dtype::num_of

rwgk

@seberg Is there a chance that you could help review this pybind11/numpy.h PR?

It looks good to me, but I'm not very familiar with the details.

rwgk · 2024-11-11T23:28:49Z

include/pybind11/numpy.h

+// This is needed to correctly handle situations where multiple typenums map to the same type,
+// e.g. NPY_LONG_ may be equivalent to NPY_INT_ or NPY_LONGLONG_ despite having a different
+// typenum. The normalized typenum always matches the values used in npy_format_descriptor.
+static constexpr int normalized_dtype_num[npy_api::NPY_VOID_ + 1] = {


Wow ... What if someone tries to make changes to enum constants, could that result in confusing behavior and time-consuming debugging?

Could it make sense to leave "if this then that" comments here and near the top of enum constants?

// If you change this code, please review `normalized_dtype_num` below.
...
// If you change this code, please review `enum constants` above.

Maybe also: Could this new code be moved closer to the enum constants code?

A bit verbose about the possibilities (long long is always 64-bit on any supported platform, for example), but it seems fine to me.
I wonder if "bitsized" or so might be a clearer name than "normalized".

> What if someone tries to make changes to enum constants, could that result in confusing behavior and time-consuming debugging?

Those constants are part of the NumPy API, so that would cause much larger problems. In any case I've added the suggested comment.

> Maybe also: Could this new code be moved closer to the enum constants code?

Do you mean inside struct npy_api? That would create some extra complications since it's a static array in a header-only library.

> I wonder if "bitsized" or so might be a clearer name than "normalized".

Do you want to rename just the normalized_dtype_num array or also dtype::normalized_num?
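The trade-off discussed in this thread can be sketched in isolation. The example assumes a hypothetical `api` struct standing in for `npy_api` and a hypothetical table `normalized_num_table`. The enum is shortened and the table contents are illustrative, not the values in the PR. The table is kept at namespace scope, as a header-only library targeting C++11 might do to avoid an out-of-class definition for a static data member. A `static_assert` is shown as one possible complement to the cross-reference comments, so that adding or removing a trailing enum constant fails at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the enum constants in npy_api (shortened for illustration).
struct api {
    // If you change this enum, please review normalized_num_table below.
    enum constants { kBool = 0, kByte = 1, kUByte = 2, kShort = 3, kVoid = 4 };
};

// If you change this table, please review api::constants above.
// Namespace scope with internal linkage: each translation unit gets its own copy.
static constexpr int normalized_num_table[] = {
    api::kBool,   // kBool
    api::kByte,   // kByte
    api::kUByte,  // kUByte
    api::kShort,  // kShort
    api::kVoid,   // kVoid
};

// Fails to compile if the table and the last enum constant drift apart.
static_assert(sizeof(normalized_num_table) / sizeof(normalized_num_table[0])
                  == static_cast<std::size_t>(api::kVoid) + 1,
              "table must have one entry per type number");

// Returns -1 for out-of-range input instead of reading past the table.
inline int normalized_num(int num) {
    return (num >= 0 && num <= api::kVoid) ? normalized_num_table[num] : -1;
}
```

Declaring the array without an explicit size is a deliberate choice in this sketch: with a fixed size, missing trailing entries would be zero-initialized silently rather than rejected.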

Hm ... I think "normalized" better captures what this is about. What we really want is to normalize (for use in switch statements). That we're picking the bitsize names is a choice of secondary importance.

Sounds good, I think normalize is fine. I agree in the context of switch statements, it is clear enough.

rwgk · 2024-11-16T21:29:59Z

Hm ... I think "normalized" better captures what this is about. What we really want is to normalize (for use in switch statements). That we're picking the bitsize names is a choice of secondary importance.

rwgk · 2024-11-17T15:55:58Z

Thanks @MaartenBaert, and thanks a lot for the review @seberg!

MaartenBaert added 4 commits November 3, 2024 18:30

Add dtype::normalized_num and dtype::num_of

bf9c0ba

Fix compiler warning and improve NumPy 1.x compatibility

7cb5799

Fix clang-tidy warning

963ea97

Fix another clang-tidy warning

e03450b

rwgk reviewed Nov 11, 2024

Add extra comment

7313b69

rwgk approved these changes Nov 16, 2024

rwgk merged commit f41dae3 into pybind:master Nov 17, 2024
81 checks passed

github-actions bot added the "needs changelog" label (Possibly needs a changelog entry) Nov 17, 2024

MaartenBaert commented Nov 3, 2024

rwgk left a comment

rwgk Nov 11, 2024

seberg Nov 12, 2024

MaartenBaert Nov 16, 2024

rwgk Nov 16, 2024

seberg Nov 17, 2024

rwgk Nov 16, 2024

rwgk commented Nov 17, 2024
