ld: cannot find ./nvidia/nv-kernel.o due to Kbuild changes in 6.13 #747
Comments
Thank you for reporting this. We're also tracking this internally as NVIDIA bug 4989092. If you can do a git-bisect, that would be very useful. Thanks. |
Guys, you should step up your game and check. It seems the symlinks are created incorrectly, as mentioned here: #746 (comment). For the 470xx series the community already has a patch: https://gist.github.com/joanbm/d1f89391a4b20f4b56ba931ef6ca62da. It should apply to newer and older Nvidia drivers just fine. |
@aritger the issue is |
So the patch definitely works, just apply https://gist.githubusercontent.com/joanbm/d1f89391a4b20f4b56ba931ef6ca62da/raw/8458c7c58249a0dceb5ab1b5aada7e705a88b4ff/nvidia-470xx-fix-linux-6.13.patch but I'm stuck on this part:
It looks like it's against the NVIDIA rules to post a bug for an RC kernel version, though this one was welcomed in #747 (comment). Since NVIDIA seems to consider it acceptable in this case, can someone explain to me why bug reports for release candidates are not allowed, or why they would be accepted in this particular case? |
Hi there. We'll need to revisit our policy on this, but I imagine it could look something roughly like:
I believe there are two reasons for the original policy. One is that because the early RCs are sometimes very unstable, we don't want to spend time chasing down issues that are not part of our stack. The other is that by far the overwhelming majority of these RC bug reports are due to some API changes that make the NV driver fail to build/load. This is something we are actively tracking internally, so these bug reports don't actually help us, it is just extra work to triage and respond. However, they do help everyone else, so I think the policy might get changed. Maybe we'll provide a separate bug template and tag for these issues. |
@mtijanic the RC1 is the most important kernel version there is. It normally includes all changes made in that kernel series. The following RCs exist to polish out the regressions and features introduced in that development cycle. Nvidia can simply join that development cycle, as other companies like Intel and AMD do, and also post fixes publicly as soon as they have them for the issues we open. As distro maintainers we have to prepare our systems as soon as possible. Some distributions, especially rolling ones, might adopt software as soon as it gets tagged stable. Therefore I have always ignored the rule against posting bugs for unreleased kernels. It simply makes no sense to have such a rule, as you have to change the code anyway. Sure, QA and official support might take Nvidia longer. However, if Nvidia decides to do open kernel modules like AMD, Nvidia has to join and develop within the development cycle at the given merge windows, as everybody else has to. So why not start early, as the trend is in that direction anyway? |
As an out-of-tree driver, we adapt our release (and thus QA) schedule not just to Linux mainline, but also to HW releases, security rollouts across multiple LTS driver branches, enterprise customer schedules, LTS kernel releases, and even Windows and the big game title releases. Unfortunately, this means we cannot always provide releases compatible with the latest mainline. There's certainly an argument to be made that we can still handle this much better even within all the above constraints, such as maybe pushing the kernel-compatibility patches out of band as they are developed. We are actively discussing this internally and looking to improve the situation for bleeding-edge rolling-release distros. |
That would be really good. Most of the patches the user community provides get tested even on older Nvidia drivers that are EOL, such as the 390xx series. You can follow those efforts, for example, at the AUR from Arch: https://aur.archlinux.org/cgit/aur.git/log/?h=nvidia-470xx-utils Joan also provides some patches for the 470xx driver series, as he still has to use that driver. Most of them also apply to newer or older drivers: https://gist.github.com/joanbm Most bleeding-edge distros are able to patch drivers as needed. The only issue is that this might violate the licenses of the proprietary drivers. However, we only modify the source-available parts, so the binary blobs are never touched. It might be a gray area. If that can also be added to the internal discussions, it would help us a lot. |
That's the point that Milos is making -- we already have started. We track not only the -rc kernels internally, but also linux-next, so we are typically aware of -rc1 build failures before -rc1 is released. So as he said: "these bug reports don't actually help us, it is just extra work to triage and respond." |
@mtijanic Thank you for your response. That's super helpful to understand the position. I agree that this bug report was helpful to me as it led me to the resolution of the issue I was having. I also understand that from a development/product owner standpoint it probably is not beneficial to understand these bugs. |
@ttabi the issue here is that Nvidia may know about the issues through its internal bug tracker. However, the distro maintainers and users who want to use the latest kernels, or who have to ship the drivers in combination with the latest kernels, would want to know about the issue and about any patches that already exist. So having those issues posted publicly helps the community and users of the Nvidia drivers. Posts like https://forums.developer.nvidia.com/t/patch-for-565-57-01-linux-kernel-6-12/313260 are a great example of publishing patches as soon as they are known and created. And even when the community posts issues you already know about, the proper response would be: hey here is the patch, will be officially released in 565.xxx, which is scheduled for release at |
Cherry-pick of abc88d52ee5e86decd5e98b3223d3feec0dd66bc from nvidia-driver tracking repo. Link: https://gist.github.com/joanbm/d1f89391a4b20f4b56ba931ef6ca62da Link: NVIDIA#747 Author: Joan Bruguera <[email protected]>
NVIDIA Open GPU Kernel Modules Version
565.57.01
Operating System and Version
Manjaro 24.2
Kernel Release
6.13.0-rc1-1-MANJARO
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Build Command
The Kbuild changes merged in 6.13-rc1 prevent the NVIDIA drivers from compiling.
Terminal output/Build Log
More Info
When I find time, I might do a git-bisect to find the commit causing this.
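Since the issue template asks reporters to confirm they are not on a `-rc` kernel, here is a minimal sketch of how such a check could be automated on a release string like the `6.13.0-rc1-1-MANJARO` reported above. The function names `rc_number` and `is_release_candidate` are hypothetical and are not part of any NVIDIA tooling; the pattern assumes the `-rcN` naming used by mainline release candidates, which distributions may decorate differently.

```python
import re

# Assumption: release candidates carry an "-rcN" component, as in
# "6.13.0-rc1-1-MANJARO". Distro-specific suffixes may follow it.
_RC_PATTERN = re.compile(r"-rc(\d+)\b")


def rc_number(release: str):
    """Return the release-candidate number, or None if no -rcN tag is found."""
    match = _RC_PATTERN.search(release)
    return int(match.group(1)) if match else None


def is_release_candidate(release: str) -> bool:
    """True when the release string contains an -rcN tag."""
    return rc_number(release) is not None
```

On a running system, the string to check could come from `platform.release()` in the Python standard library, which reports the same value as `uname -r` on Linux.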