cuda: fix check for GPU device availability #2510

Merged 1 commit on Nov 12, 2024

Commits on Nov 10, 2024

  1. cuda: fix check for GPU device availability

    The check for `/dev/nvidiactl` to determine if the CUDA plugin can be
    used is unreliable because in some cases the default path for driver
    installation is different [1]. This patch changes the logic to check
    if a GPU device is available in `/proc/driver/nvidia/gpus/`. This
    approach is similar to `torch.cuda.is_available()` and it is a more
    accurate indicator.
    
    The subsequent check for support of the `cuda-checkpoint --action`
    option then confirms whether the driver supports checkpoint/restore.
    
    [1] https://github.com/NVIDIA/gpu-operator
    
    Fixes: checkpoint-restore#2509
    
    Signed-off-by: Radostin Stoyanov <[email protected]>
    rst0git committed Nov 10, 2024
    de9d552
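
The availability check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `gpu_device_available` is hypothetical, and it assumes the driver exposes one entry per GPU under `/proc/driver/nvidia/gpus/`, so that a missing or empty directory means no usable device.

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* Path taken from the commit message; the driver is assumed to
 * create one entry per GPU in this directory. */
#define NVIDIA_GPUS_DIR "/proc/driver/nvidia/gpus/"

/*
 * Hypothetical helper: returns true if dir_path can be opened and
 * contains at least one entry other than "." and "..".
 */
bool gpu_device_available(const char *dir_path)
{
	DIR *dir = opendir(dir_path);
	struct dirent *entry;
	bool found = false;

	if (dir == NULL)
		return false; /* path missing, e.g. driver not loaded */

	while ((entry = readdir(dir)) != NULL) {
		if (strcmp(entry->d_name, ".") == 0 ||
		    strcmp(entry->d_name, "..") == 0)
			continue;
		found = true; /* at least one device entry exists */
		break;
	}

	closedir(dir);
	return found;
}
```

A caller would invoke `gpu_device_available(NVIDIA_GPUS_DIR)` and skip the CUDA plugin when it returns false. Taking the directory as a parameter keeps the helper testable on machines without an NVIDIA driver.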