Add dtype::normalized_num and dtype::num_of #5429

Merged
merged 5 commits into from
Nov 17, 2024

MaartenBaert
Contributor

Description

This commit adds two functions, dtype::normalized_num and dtype::num_of, which make it possible to use a switch statement for dynamic dispatch based on the dtype of an array, like this:

```cpp
switch (arr.dtype().normalized_num()) {
    case py::dtype::num_of<int8_t>():
        return foo<int8_t>(arr);
    case py::dtype::num_of<uint8_t>():
        return foo<uint8_t>(arr);
    case py::dtype::num_of<int16_t>():
        return foo<int16_t>(arr);
    case py::dtype::num_of<uint16_t>():
        return foo<uint16_t>(arr);
    // ...
    default:
        throw py::type_error("unsupported dtype");
}
```

dtype::normalized_num does the same as dtype::num, except that the type number is normalized to match the one used in npy_format_descriptor. This is needed because dtype::num can return different values for equivalent types: even though long may be equivalent to int or long long, each still has its own type number (at least when the dtype is constructed from a type number rather than with dtype::of<T>()). Without normalization, this leads to strange bugs in which the switch statement inexplicably rejects arrays from certain sources even though they superficially have the correct dtype. For example, on x86_64 Linux the dtypes long (7) and longlong (9) both appear as int64 but have different type numbers:

```
In [1]: np.dtype('long')
Out[1]: dtype('int64')

In [2]: np.dtype('longlong')
Out[2]: dtype('int64')

In [3]: np.dtype('long').num
Out[3]: 7

In [4]: np.dtype('longlong').num
Out[4]: 9
```
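A minimal standalone sketch of the normalization idea, without pybind11. It assumes only the NumPy C API type numbers for the integer types (NPY_BYTE = 1 through NPY_ULONGLONG = 10, with signed types odd and unsigned types even). The helpers `c_type_size` and `normalize_int_typenum` are hypothetical, and picking the lowest-numbered equivalent typenum as the canonical one is an arbitrary choice of this sketch; pybind11 instead normalizes to the typenums used in npy_format_descriptor.

```cpp
#include <cassert>
#include <cstddef>

// NumPy C API type numbers for the builtin integer types.
// Signed types have odd numbers, unsigned types even numbers.
enum : int {
    NPY_BYTE = 1, NPY_UBYTE = 2, NPY_SHORT = 3, NPY_USHORT = 4,
    NPY_INT = 5, NPY_UINT = 6, NPY_LONG = 7, NPY_ULONG = 8,
    NPY_LONGLONG = 9, NPY_ULONGLONG = 10
};

// Hypothetical helper: size in bytes of the C type behind an integer typenum.
constexpr std::size_t c_type_size(int num) {
    return num <= NPY_UBYTE  ? sizeof(char)
         : num <= NPY_USHORT ? sizeof(short)
         : num <= NPY_UINT   ? sizeof(int)
         : num <= NPY_ULONG  ? sizeof(long)
                             : sizeof(long long);
}

// Hypothetical normalization (requires C++14 constexpr): map an integer typenum
// to the lowest typenum with the same signedness and size. Others pass through.
constexpr int normalize_int_typenum(int num) {
    if (num < NPY_BYTE || num > NPY_ULONGLONG) {
        return num;
    }
    for (int cand = (num % 2 == 1) ? NPY_BYTE : NPY_UBYTE; cand < num; cand += 2) {
        if (c_type_size(cand) == c_type_size(num)) {
            return cand;
        }
    }
    return num;
}

// On LP64 platforms such as x86_64 Linux, long and long long are both 8 bytes,
// so NPY_LONGLONG (9) collapses onto NPY_LONG (7).
static_assert(sizeof(long) != sizeof(long long) ||
                  normalize_int_typenum(NPY_LONGLONG) == NPY_LONG,
              "equal-sized long and long long share a normalized typenum");
```

With this mapping, a switch on the normalized number no longer depends on whether a dtype was created as `long` or `longlong`.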

dtype::num_of<T> is simply the constexpr equivalent of dtype::of<T>().num(), so it can be used as a case label in a switch statement.
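To illustrate why the constexpr part matters, here is a standalone sketch (no pybind11) in which a hypothetical `num_of<T>()` plays the role of py::dtype::num_of<T>(). As an assumption of this sketch, not pybind11's implementation, it encodes integer types with the NumPy integer typenums 1 through 10, choosing the lowest typenum of matching size and signedness. Because it is constexpr, its results can be case labels; because equal-sized types such as `long long` and `std::int64_t` get the same number, one case covers both.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for py::dtype::num_of<T>(): lowest NumPy integer typenum
// (NPY_BYTE=1 .. NPY_ULONGLONG=10, signed odd / unsigned even) with T's size.
template <typename T>
constexpr int num_of() {
    static_assert(std::is_integral<T>::value && !std::is_same<T, bool>::value,
                  "this sketch only handles non-bool integer types");
    return (sizeof(T) == sizeof(signed char) ? 1
            : sizeof(T) == sizeof(short)     ? 3
            : sizeof(T) == sizeof(int)       ? 5
            : sizeof(T) == sizeof(long)      ? 7
                                             : 9)
           + (std::is_signed<T>::value ? 0 : 1);
}

// Stand-in for foo<T>(arr): reports which instantiation was selected.
template <typename T>
std::string describe() {
    return (std::is_signed<T>::value ? "int" : "uint") + std::to_string(8 * sizeof(T));
}

// Dispatch on an already-normalized typenum; case labels must be constant
// expressions, which is exactly what a constexpr num_of<T>() provides.
std::string dispatch(int normalized_num) {
    switch (normalized_num) {
        case num_of<std::int8_t>():   return describe<std::int8_t>();
        case num_of<std::uint8_t>():  return describe<std::uint8_t>();
        case num_of<std::int16_t>():  return describe<std::int16_t>();
        case num_of<std::uint16_t>(): return describe<std::uint16_t>();
        case num_of<std::int32_t>():  return describe<std::int32_t>();
        case num_of<std::uint32_t>(): return describe<std::uint32_t>();
        case num_of<std::int64_t>():  return describe<std::int64_t>();
        case num_of<std::uint64_t>(): return describe<std::uint64_t>();
        default:                      return "unsupported";
    }
}
```

For example, `dispatch(num_of<long long>())` selects the `std::int64_t` branch on common 64-bit platforms, since both types map to the same number.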

While developing this, I ran into an inconsistency in the behavior of is_fmt_numeric that is relevant to the implementation of dtype::normalized_num. I have reported it as a separate issue: #5428

Suggested changelog entry:

Add dtype::normalized_num and dtype::num_of

Collaborator

@rwgk rwgk left a comment

@seberg Is there a chance that you could help review this pybind11/numpy.h PR?

It looks good to me, but I'm not very familiar with the details.

```cpp
// This is needed to correctly handle situations where multiple typenums map to the same type,
// e.g. NPY_LONG_ may be equivalent to NPY_INT_ or NPY_LONGLONG_ despite having a different
// typenum. The normalized typenum always matches the values used in npy_format_descriptor.
static constexpr int normalized_dtype_num[npy_api::NPY_VOID_ + 1] = {
```
Collaborator

Wow ... What if someone tries to make changes to the enum constants? Could that result in confusing behavior and time-consuming debugging?

Could it make sense to leave "if this then that" comments here and near the top of the enum constants?

```cpp
// If you change this code, please review `normalized_dtype_num` below.
...
// If you change this code, please review `enum constants` above.
```

Maybe also: Could this new code be moved closer to the enum constants code?

Contributor

A bit verbose about the possibilities (a long long is always 64-bit on any supported platform, for example), but seems fine to me.
I wonder if "bitsized" or so might be a clearer name than "normalized".

Contributor Author

What if someone tries to make changes to the enum constants? Could that result in confusing behavior and time-consuming debugging?

Those constants are part of the NumPy API, so changing them would cause much larger problems. In any case, I've added the suggested comments.

Maybe also: Could this new code be moved closer to the enum constants code?

Do you mean inside struct npy_api? That would create some extra complications since it's a static array in a header-only library.

I wonder if "bitsized" or so might be a clearer name than "normalized".

Do you want to rename just the normalized_dtype_num array or also dtype::normalized_num?
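On the header-only point: one likely complication is that, before C++17 inline variables, a `static constexpr` array data member of a struct such as npy_api that is indexed with a runtime value needs an out-of-class definition in exactly one translation unit, which a header-only library cannot easily provide. A namespace-scope `static constexpr` array has internal linkage instead, so each translation unit gets its own copy. The sketch below is a standalone illustration of such a table, assuming the NumPy C API type numbers NPY_BOOL (0) through NPY_VOID (20); the `sketch` namespace, the canonical choices, and the pass-through for out-of-range numbers are assumptions of this example, not pybind11's code.

```cpp
#include <cassert>

namespace sketch {

// NumPy C API type numbers for the builtin types: NPY_BOOL (0) .. NPY_VOID (20).
enum : int {
    NPY_BOOL, NPY_BYTE, NPY_UBYTE, NPY_SHORT, NPY_USHORT, NPY_INT, NPY_UINT,
    NPY_LONG, NPY_ULONG, NPY_LONGLONG, NPY_ULONGLONG, NPY_FLOAT, NPY_DOUBLE,
    NPY_LONGDOUBLE, NPY_CFLOAT, NPY_CDOUBLE, NPY_CLONGDOUBLE, NPY_OBJECT,
    NPY_STRING, NPY_UNICODE, NPY_VOID
};

// Namespace-scope `static constexpr`: internal linkage, so every translation unit
// that includes the header gets its own copy and no out-of-line definition is needed.
// Only the platform-dependent integer entries differ from the identity mapping.
static constexpr int normalized_dtype_num[NPY_VOID + 1] = {
    NPY_BOOL, NPY_BYTE, NPY_UBYTE, NPY_SHORT, NPY_USHORT, NPY_INT, NPY_UINT,
    sizeof(long) == sizeof(int) ? NPY_INT : NPY_LONG,
    sizeof(unsigned long) == sizeof(unsigned int) ? NPY_UINT : NPY_ULONG,
    sizeof(long long) == sizeof(int)    ? NPY_INT
    : sizeof(long long) == sizeof(long) ? NPY_LONG
                                        : NPY_LONGLONG,
    sizeof(unsigned long long) == sizeof(unsigned int)    ? NPY_UINT
    : sizeof(unsigned long long) == sizeof(unsigned long) ? NPY_ULONG
                                                          : NPY_ULONGLONG,
    NPY_FLOAT, NPY_DOUBLE, NPY_LONGDOUBLE, NPY_CFLOAT, NPY_CDOUBLE,
    NPY_CLONGDOUBLE, NPY_OBJECT, NPY_STRING, NPY_UNICODE, NPY_VOID,
};

// Typenums outside the table (e.g. user-defined dtypes) pass through unchanged here.
inline int normalized_num(int num) {
    return (num >= 0 && num <= NPY_VOID) ? normalized_dtype_num[num] : num;
}

}  // namespace sketch
```

The nested conditionals keep the mapping consistent: whenever two integer typenums have the same size, both end up at the same canonical entry.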

Collaborator

Hm ... I think "normalized" better captures what this is about. What we really want is to normalize (for use in switch statements). That we're picking the bitsize names is a choice of secondary importance.

Contributor

Sounds good, I think "normalize" is fine. I agree that in the context of switch statements it is clear enough.

@rwgk
Collaborator

rwgk commented Nov 17, 2024

Thanks @MaartenBaert, and thanks a lot for the review @seberg!

@rwgk rwgk merged commit f41dae3 into pybind:master Nov 17, 2024
81 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Nov 17, 2024