Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RISC-V Linux build #212

Merged
merged 1 commit into from
Jan 5, 2024
Merged

Conversation

markdryan
Copy link
Contributor

Cpuinfo was failing to build on RISC-V Linux distributions, e.g., Ubuntu 23.10, as it includes a header file sys/hwprobe.h that is not yet provided by glibc (although it is provided by bionic). We fix the issue by only including sys/hwprobe.h when building for Android, and invoking the hwprobe syscall directly on other Linux distributions. The Android specific check can be removed in the future once sys/hwprobe.h becomes available in glibc.

Cpuinfo was failing to build on RISC-V Linux distributions, e.g.,
Ubuntu 23.10, as it includes a header file sys/hwprobe.h that is
not yet provided by glibc (although it is provided by bionic).  We
fix the issue by only including sys/hwprobe.h when building for
Android, and invoking the hwprobe syscall directly on other
Linux distributions.  The Android specific check can be removed in
the future once sys/hwprobe.h becomes available in glibc.
@markdryan
Copy link
Contributor Author

@prashanthswami Could I ask you to verify that this patch works correctly on Android and that it doesn't change the hwprobe behaviour on Android builds in any way. It shouldn't do but I don't have a way to test this.

@prashanthswami
Copy link
Collaborator

Confirming - I can build this for Android RISC-V and verified that cpuinfo and isainfo both return properly. I also built against a toolchain that uses an older kernel header and it successfully built.

@markdryan
Copy link
Contributor Author

Confirming - I can build this for Android RISC-V and verified that cpuinfo and isainfo both return properly. I also built against a toolchain that uses an older kernel header and it successfully built.

Great. Thanks. Could you also possibly confirm that we're still taking the bionic path on Android, e.g., that we're executing __riscv_hwprobe here?

@prashanthswami
Copy link
Collaborator

Yup - in the previous run, I had added some logs to ensure that Android was running through the hwprobe path to make sure we weren't falling back to the direct-syscall path.

@markdryan
Copy link
Contributor Author

Yup - in the previous run, I had added some logs to ensure that Android was running through the hwprobe path to make sure we weren't falling back to the direct-syscall path.

Thanks!

@markdryan
Copy link
Contributor Author

markdryan commented Jan 2, 2024

@malfet, @prashanthswami what is the process for getting this PR reviewed and merged? I'm a little concerned that if we leave it much longer pytorch itself might stop building on standard Linux distributions of RISC-V. This could happen if pytorch were to update its submodule to include #190 but not this patch. Pytorch seems to have updated its cpuinfo submodule recently, but luckily it was updated to point to the patch before #4e5be9e. The next update will break the RISC-V builds unless this patch is merged.

@prashanthswami
Copy link
Collaborator

@markdryan - there's some context in #210 but TL;DR this project doesn't have an official maintainer at the moment and malfet doesn't have the bandwidth or hardware setup to review patches. I wasn't able to get into the Slack so I've started a conversation on the PyTorch discussion forums here, mentioning this patch as something that's eventually going to break a future roll: https://discuss.pytorch.org/t/maintenance-of-pytorch-cpuinfo/194722/1

I'll defer to Nikita to comment on what or if we can do anything in the meanwhile (though I'm happy to lend help to validate any patches) - if possible I'd like to see if we can land this just to avoid the breakage and then pause any future work until we figure out how this project gets resourced with the PyTorch group.

@malfet
Copy link
Contributor

malfet commented Jan 2, 2024

@prashanthswami Can we start an email thread instead? Also, what error are you seeing when trying to join slack?
@markdryan i'll try to review/validate PR today, but yes, we need to find a maintainer for the project (perhaps I should volunteer to do it for the first half of 2024, but I want to have a fallback)

@prashanthswami
Copy link
Collaborator

Ah sorry - "I wasn't able to get in" was a poor choice of words, I meant "I have not yet gotten an invitation link". I have the Google form confirmation of the request from the beginning of December, but haven't gotten feedback on a yes/no. Of course, joining the Slack is only for the purpose of the discussion thread.

I was redirected to post in https://dev-discuss.pytorch.org/t/maintenance-of-pytorch-cpuinfo/1767 - I'm happy to start an email thread if that's more appropriate though. What alias should be included from the PyTorch team?

@malfet
Copy link
Contributor

malfet commented Jan 4, 2024

@prashanthswami feel free to just email me (nshulga at meta) and I'll add the respective folks
@markdryan my problem with this PR is that it:

  • Essentially embeds headers/constants into the code rather than skip if they are not available
  • This also somewhat non-cmake friendly, as more conventional way of doing something like that would be to detect if header is present and use it during the build if it is

@markdryan
Copy link
Contributor Author

markdryan commented Jan 5, 2024

  • Essentially embeds headers/constants into the code rather than skip if they are not available

Unfortunately, skipping if the headers are not available is not really an option, as they are likely to be unavailable on build machines for the foreseeable future.  The sys/hwprobe.h header file does not yet exist on any glibc based Linux distributions as the glibc patches for hwprobe have not yet been merged.  This is why cpuinfo does not currently build on RISC-V Ubuntu and Fedora distributions.  On these distributions we could try including the kernel header file asm/hwprobe.h instead (which is the source of the constants in this PR), but there's another problem.  The asm/hwprobe.h file will only exist if the build machine is running a 6.4 kernel or greater.  As none of the RISC-V Ubuntu LTS releases have a 6.4 kernel (Ubuntu 22.04 has a 5.19 kernel) and cpuinfo binaries are likely to be built on the oldest distribution possible (x86_64 pytorch builds seem to be done on manylinux_2014 which is based on Centos 7 (which as far as I can tell has a 3.10 kernel)) it's unlikely that asm/hwprobe.h is going to be available at build time for a while, which would mean cpuinfo binaries would not be able to take advantage of hwprobe even when running on devices with the latest kernel, that support hwprobe.

There's also another complication, which admittedly is more of a problem for a future patch that uses hwprobe to detect extensions, but is worth mentioning in the context of this discussion.   The hwprobe API and headers files are being updated with each new kernel release.  Support for Zba, Zbb, Zbs an V was added in the 6.5 kernel.  Support for Zicboz was added in 6.7 and patches are pending to add support for many other extensions.  Each new kernel version adds new constants to the asm/hwprobe.h.  So let's say you build cpuinfo on a distribution with a 6.4 kernel.  The asm/hwprobe.h will be available but it won't contain the constants for Zba, Zbb, Zbs, V or Zicboz.  So if you want to use hwprobe to detect these extensions and you want your code to be buildable on a wide range of distributions or just older distributions, you have no option but to include the definitions for the constants in your code.

And it's worth pointing out here that if you want to detect things like Zba, Zbb, Zbs and Zicboz and other multi-letter extensions, of which there are many, you have no option but to use hwprobe.  You cannot simply fall back to hwcap if the hwprobe header files are not available, as hwcap cannot detect multi-letter extensions.

For all these reasons other projects that use hwprobe tend to duplicate the kernel constants in their own code.

Also note that there is a precedent for including constants from kernel header files in the cpuinfo code.  See

In summary then

  • It's highly likely that the hwprobe glibc and kernel headers will not be available at build time for the foreseeable future so the constants must be duplicated.
  • Other parts of cpuinfo already duplicate constants from the kernel header files so we're not doing anything in this patch that isn't already done elsewhere.
  • There's no alternative to hwprobe on RISC-V. It is the only way to detect the majority of RISC-V extensions.
  • This also somewhat non-cmake friendly, as more conventional way of doing something like that would be to detect if header is present and use it during the build if it is

Right, I wasn't sure about this bit. Using __has_include seemed to be the simplest solution. I'll update the patch to use CMake to check for the hwprobe headers.

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it unblock you (and affects only RiscV), let's land it as is and fix forward

@malfet malfet merged commit 313524a into pytorch:main Jan 5, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants