all GPUs in host are visible to job submit with '--gpus=0' #662
Comments
/assign @happy2048
@cheyang: GitHub didn't allow me to assign the following users: happy2048. Note that only kubeflow members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
@nowenL 2 quick questions to confirm:
@wsxiaozhang
As mentioned above,
This is a minimal example to reproduce the issue. It can also happen in practice, for example when users reuse their GPU base image for a CPU training workload. In any case, the root cause is 'NVIDIA_VISIBLE_DEVICES=all'; the CUDA image is just one way to trigger it. Another major concern is that, by simply setting an env variable, any cluster user can gain control of GPUs on the host even if they are assigned to other jobs. This looks like a vulnerability and could become critical in some cases.
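To make the concern concrete, here is a rough sketch of a check that flags containers which set 'NVIDIA_VISIBLE_DEVICES' in the pod spec without requesting any `nvidia.com/gpu` resource. The helper names (`requested_gpus`, `risky_containers`) are made up for illustration and are not part of arena or Kubernetes. It assumes the pod spec is already loaded as a plain dict, and it cannot see an env variable baked into the image itself (as in the CUDA base image case), so treat it as an illustration of the condition rather than a fix.

```python
# Hypothetical helper names; this is a sketch, not arena's or Kubernetes' API.
GPU_RESOURCE = "nvidia.com/gpu"
ENV_NAME = "NVIDIA_VISIBLE_DEVICES"
# Values for which the NVIDIA runtime exposes no GPU devices.
HARMLESS_VALUES = (None, "", "none", "void")


def requested_gpus(container):
    """Return the nvidia.com/gpu limit of a container dict, or 0 if absent."""
    limits = (container.get("resources") or {}).get("limits") or {}
    return int(limits.get(GPU_RESOURCE, 0))


def risky_containers(pod_spec):
    """List names of containers that set NVIDIA_VISIBLE_DEVICES but request no GPU."""
    risky = []
    for container in pod_spec.get("containers") or []:
        env = {e["name"]: e.get("value", "") for e in container.get("env") or []}
        value = env.get(ENV_NAME)
        if value not in HARMLESS_VALUES and requested_gpus(container) == 0:
            risky.append(container["name"])
    return risky


if __name__ == "__main__":
    # A CPU-only job (no GPU limit) that still sets the env variable.
    spec = {
        "containers": [
            {"name": "train", "env": [{"name": ENV_NAME, "value": "all"}]},
        ]
    }
    print(risky_containers(spec))  # ['train']
```

A check like this could only catch the explicit case; the image-level default would still need to be handled on the node or runtime side.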
@nowenL got your points now, that's fair.
Env
arena version: v0.8.6+a2bec8c
k8s server version: {Major:"1", Minor:"20+", GitVersion:"v1.20.4-aliyun.1", GitCommit:"7a23884", GitTreeState:"", BuildDate:"2021-05-31T13:47:24Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Problem