Poor multi-GPU performance due to CPU-side stalls in gbm_surface_lock_front_buffer #743
Labels
bug
NVIDIA Open GPU Kernel Modules Version
565.57.01
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Arch Linux
Kernel Release
6.11.8
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
NVIDIA GeForce RTX 4090 Laptop GPU
Describe the bug
The NVIDIA driver blocks calls to gbm_surface_lock_front_buffer on the CPU until rendering to that buffer has fully finished. Because the wait happens on the CPU, it holds up mutter's rendering thread in multi-GPU setups where the NVIDIA GPU copies frames from a primary GPU. Mutter cannot do any further rendering on the primary GPU until the NVIDIA driver is done, which in turn delays the next frame on both the primary and the secondary GPU.

In effect, rendering on the primary and secondary GPUs proceeds in lockstep. This reduces performance both for displays attached to the primary GPU (which cannot continue rendering until the NVIDIA driver finishes rendering/copying) and for displays attached to the NVIDIA GPU (which cannot render/copy until the primary GPU is done rendering).
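The lockstep effect described above can be illustrated with a toy timing model. This is a minimal sketch under loud assumptions: each frame costs a fixed, hypothetical primary_ms on the primary GPU and secondary_ms for the NVIDIA GPU's copy; vsync and the primary GPU's own displays are ignored; and simulate and mean_interval are made-up names, not mutter or driver code. With the CPU-side wait, the frame interval is the sum of both stages. Without it (modeled simply by dropping the wait), the slower stage alone sets the pace.

```python
"""Toy model: how a CPU-side wait couples a two-GPU frame pipeline.

Hypothetical timings only; this is not mutter or NVIDIA driver code.
"""


def simulate(primary_ms: float, secondary_ms: float, frames: int, blocking: bool) -> list[float]:
    """Return the time (ms) at which each frame's copy on the secondary GPU completes."""
    primary_free = 0.0    # when the primary GPU can accept new work
    secondary_free = 0.0  # when the secondary GPU can accept a new copy
    cpu_free = 0.0        # when the compositor thread can submit the next frame
    done = []
    for _ in range(frames):
        # The compositor submits the next frame once its thread is unblocked
        # and the primary GPU has finished the previous frame.
        start = max(primary_free, cpu_free)
        primary_done = start + primary_ms
        primary_free = primary_done
        # The secondary GPU copies the frame after the primary has rendered it.
        copy_start = max(primary_done, secondary_free)
        copy_done = copy_start + secondary_ms
        secondary_free = copy_done
        if blocking:
            # Modeled CPU-side wait (the behavior reported for
            # gbm_surface_lock_front_buffer): the compositor thread stalls
            # until the secondary GPU has finished this frame.
            cpu_free = copy_done
        done.append(copy_done)
    return done


def mean_interval(done: list[float]) -> float:
    """Average time between consecutive completed frames, in ms."""
    return (done[-1] - done[0]) / (len(done) - 1)


if __name__ == "__main__":
    for blocking in (True, False):
        interval = mean_interval(simulate(5.0, 5.0, 100, blocking))
        label = "blocking" if blocking else "pipelined"
        print(f"{label}: {1000.0 / interval:.0f} fps")
    # blocking: 100 fps
    # pipelined: 200 fps
```

With equal 5 ms stages, the blocking variant halves the frame rate, which is enough to miss every refresh on a 144 Hz panel in this model.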
Why this happens is explained by Austin Shafer in this mutter thread. Michel Dänzer also describes potential solutions here that may be compatible with other graphics drivers.
To Reproduce
CLUTTER_SHOW_FPS=1, or tools such as glxgears, especially when moving its window around (the former is more accurate). Ideally, use high-refresh-rate monitors (120 Hz or higher) in all cases, since these make it much more visible when full performance is not reached. Such monitors are fairly common in high-end laptops nowadays.
There is a lot of debugging and profiling information to be found in this mutter ticket (see also above and below the linked comment).
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
I'm not sure how this affects single-GPU NVIDIA systems; the block likely also happens there but may have a less adverse effect.
I created this ticket to keep track of this problem and give it more visibility, as it is unique to the NVIDIA driver on Linux and is a performance problem that may be the root cause of other open issues that would otherwise be hard to track down. GNOME/mutter is discussed here, but this may also affect KDE and other desktop environments.
Note that mutter stable 47.1 has several multi-GPU performance bottlenecks of its own, so to test the full potential of any fix you may also need the following (these are also touched upon in the linked mutter issue):