The NVIDIA ICD JSON occasionally goes missing from 'nvidia-ctk cdi generate' #767

Open
debarshiray opened this issue Oct 31, 2024 · 5 comments
@debarshiray

debarshiray commented Oct 31, 2024

I have been playing with the NVIDIA Container Toolkit on Fedora 39 Workstation with the proprietary NVIDIA driver from RPM Fusion. I have noticed that the NVIDIA installable client driver (ICD) JSON for Vulkan occasionally goes missing from the output of nvidia-ctk cdi generate:

$ nvidia-ctk cdi generate --format yaml 2>/dev/null | grep vulkan
 - containerPath: /etc/vulkan/implicit_layer.d/nvidia_layers.json
   hostPath: /usr/share/vulkan/implicit_layer.d/nvidia_layers.json

... even though the file is present on the host operating system at /usr/share/vulkan/icd.d/nvidia_icd.x86_64.json and Vulkan support on the host is confirmed by:

$ vulkaninfo --summary
...
...
Devices:
========
GPU0:
	apiVersion         = 1.3.280
	driverVersion      = 560.35.3.0
	vendorID           = 0x10de
	deviceID           = 0x1cbc
	deviceType         = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
	deviceName         = Quadro P600
	driverID           = DRIVER_ID_NVIDIA_PROPRIETARY
	driverName         = NVIDIA
	driverInfo         = 560.35.3.0
	conformanceVersion = 1.3.8.2
	deviceUUID         = 2efa4848-ba99-ccd3-0a19-f497b31331ca
	driverUUID         = c3ca0510-c7e6-5f1c-86a1-dc0ed4ea4e21
...
...

This means that Podman containers don't have Vulkan support through the proprietary NVIDIA driver, and can only use LLVMpipe.
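
The mismatch described above, a manifest present on the host but absent from the generated spec, can be checked with a small script. This is only a sketch: missing_icds is a hypothetical helper, not part of the NVIDIA Container Toolkit, and it assumes the spec was generated in YAML format, where each mount appears as a hostPath: line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every NVIDIA Vulkan ICD manifest found in ICD_DIR
# that the CDI spec in SPEC_FILE does not mention as a hostPath.
# Usage: missing_icds SPEC_FILE ICD_DIR
missing_icds() {
    local spec="$1" icd_dir="$2" f
    for f in "$icd_dir"/nvidia_icd*.json; do
        [ -e "$f" ] || continue    # the glob matched nothing
        # -F treats the path as a fixed string, -q suppresses output;
        # a failed match means the spec has no mount for this file.
        grep -Fq -- "hostPath: $f" "$spec" || printf '%s\n' "$f"
    done
}

# Example (needs the NVIDIA Container Toolkit and driver on the host):
#   nvidia-ctk cdi generate --format yaml 2>/dev/null > /tmp/cdi.yaml
#   missing_icds /tmp/cdi.yaml /usr/share/vulkan/icd.d
```

On a host showing the problem, the example would be expected to print /usr/share/vulkan/icd.d/nvidia_icd.x86_64.json; an empty result means every manifest in that directory is covered by the spec.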

Right now, I am observing this problem with:

$ uname --kernel-release
6.11.4-101.fc39.x86_64
$ rpm -q kernel
kernel-6.5.6-300.fc39.x86_64
kernel-6.11.4-101.fc39.x86_64
$ rpm -q kmod-nvidia
kmod-nvidia-560.35.03-1.fc39.x86_64

@debarshiray
Author

I forgot to mention the NVIDIA Container Toolkit version:

$ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.16.1
$ rpm -qf $(which nvidia-ctk)
golang-github-nvidia-container-toolkit-1.16.1-1.fc39.x86_64

Note that the NVIDIA Container Toolkit version didn't change between the runs where the NVIDIA ICD JSON for Vulkan was listed and the runs where it was not. What changed was that I pulled in the RPM updates for the rest of the Fedora host.

@elezar
Member

elezar commented Nov 1, 2024

@debarshiray the host path you mention, /usr/share/vulkan/icd.d/nvidia_icd.x86_64.json, is not one that we explicitly search for. Could you please confirm which package provides that file? It could be that the 560.35.3.0 driver you're using now ships the file with the architecture string in its name.

(Looking at some older internal documentation, it seems that this has been the case for a while.)
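
To see whether a host carries the plain or the architecture-suffixed manifest name, and which package owns it, something like the following may help. It is a sketch under assumptions: icd_arch is a hypothetical helper, and the two directories in the usage comment are common Vulkan loader locations, not a statement of what nvidia-ctk searches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the architecture string embedded in an NVIDIA
# Vulkan ICD manifest name, or an empty line when the name has none.
icd_arch() {
    local name
    name=$(basename -- "$1")
    name=${name#nvidia_icd}      # drop the common prefix
    name=${name%.json}           # drop the extension
    printf '%s\n' "${name#.}"    # drop the separating dot, if present
}

# Example: list each manifest with its architecture string and owning package.
#   for f in /etc/vulkan/icd.d/nvidia_icd*.json /usr/share/vulkan/icd.d/nvidia_icd*.json; do
#       [ -e "$f" ] || continue
#       printf '%s arch=%s\n' "$f" "$(icd_arch "$f")"
#       rpm --query --file "$f"
#   done
```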

@debarshiray
Author

Thanks for looking into it, @elezar !

Meanwhile, I reinstalled different versions of Fedora a few times to see if the problem is specific to a particular combination of package versions. I could reproduce it reliably on Fedora 40 and 41, which was surprising because this used to work. :)

Now with Fedora 41 Workstation and the proprietary NVIDIA driver from RPM Fusion, I see:

$ rpm --query --file /usr/share/vulkan/icd.d/nvidia_icd.x86_64.json
xorg-x11-drv-nvidia-libs-560.35.03-5.fc41.x86_64

If I force /usr/share/vulkan/icd.d/nvidia_icd.x86_64.json to be present inside the container through an explicit bind mount, then I do get Vulkan support through the proprietary NVIDIA driver.

In all cases, Vulkan support is available through the proprietary driver on the host operating system, as shown in the vulkaninfo --summary snippet above.
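
The bind-mount workaround mentioned above could look roughly like this. It is a sketch, not the exact command used in this report: icd_volume_args is a hypothetical helper, IMAGE is a placeholder for any image that ships vulkaninfo, and mounting each file at the same path inside the container is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print podman --volume arguments, one per line, for
# every NVIDIA Vulkan ICD manifest in a directory.
# Assumption: the container should see each file at its host path, read-only.
icd_volume_args() {
    local icd_dir="$1" f
    for f in "$icd_dir"/nvidia_icd*.json; do
        [ -e "$f" ] || continue    # the glob matched nothing
        printf -- '--volume\n%s:%s:ro\n' "$f" "$f"
    done
}

# Example (bash; IMAGE is a placeholder):
#   mapfile -t mounts < <(icd_volume_args /usr/share/vulkan/icd.d)
#   podman run --rm --device nvidia.com/gpu=all "${mounts[@]}" IMAGE vulkaninfo --summary
```

Printing one argument per line keeps paths with spaces intact when they are read back into an array.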

@elezar
Member

elezar commented Nov 4, 2024

Who is the publisher of the xorg-x11-drv-nvidia-libs-560.35.03-5.fc41.x86_64 package above?

@debarshiray
Author

> Who is the publisher of the xorg-x11-drv-nvidia-libs-560.35.03-5.fc41.x86_64 package above?

It's RPM Fusion. That's where I got the proprietary NVIDIA driver from.

elezar self-assigned this Nov 16, 2024