Add the attributes needed in OpenShift i.e. nvidia.com/gpu specifically for RHODS to use GPU resource #111

Closed
Milstein opened this issue Aug 1, 2023 · 4 comments · Fixed by #123

@Milstein
Contributor

Milstein commented Aug 1, 2023

  • Set nvidia.com/gpu: 0 by default.
  • We can repurpose the current OpenStack quota attribute, i.e. OpenStack GPU Quota, as an Allocation GPU quota.
  • PIs/managers will submit a change request to enable the GPU resource in OpenShift as well.
@Milstein Milstein changed the title Add the attributes needed for RHODS i.e. nvidia.com/gpu Add the attributes needed in OpenShift i.e. nvidia.com/gpu in RHODS Aug 1, 2023
@Milstein Milstein changed the title Add the attributes needed in OpenShift i.e. nvidia.com/gpu in RHODS Add the attributes needed in OpenShift i.e. nvidia.com/gpu specifically for RHODS Aug 1, 2023
@Milstein Milstein changed the title Add the attributes needed in OpenShift i.e. nvidia.com/gpu specifically for RHODS Add the attributes needed in OpenShift i.e. nvidia.com/gpu specifically for RHODS to use GPU resource Aug 1, 2023
@Milstein
Contributor Author

@knikolla: any idea how we can get this feature added to our current OCP approval plugin?

@knikolla
Collaborator

@Milstein

Based on some reading, it seems nvidia.com/gpu is a resource type that a limit can be set on.

  1. First, openshift-acct-mgt needs to be made aware that such a limit exists by adding it to the quotas.json file. Ignore base and coefficient.
     ":limits.nvidia.com/gpu": { "base": 0, "coefficient": 0 },
  2. Second, create a new attribute OpenShift GPU Quota in attributes.py here. I don't think repurposing an existing attribute makes things any easier, and I would suggest creating a new one.
     QUOTA_LIMITS_GPU = 'OpenShift Limit on GPUs'
  3. Add the attribute under the static quota section for OpenShift as zero.
     {
         attributes.QUOTA_LIMITS_GPU: 0,
     }
  4. Add the quota key mapping in openshift.py here. This maps the attribute to the expected entry in the call to openshift-acct-mgt.
     attributes.QUOTA_LIMITS_GPU: lambda x: {":limits.nvidia.com/gpu": f"{x}"},
  5. Test.
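To make the data flow in the steps above concrete, here is a minimal, self-contained sketch of how the attribute, its static default, and the quota key mapping might fit together. It is not the plugin's actual code: `QUOTA_KEY_MAPPING`, `STATIC_QUOTA`, and `build_quota_payload` are hypothetical names used for illustration, and the attribute constant is written as `QUOTA_LIMITS_GPU` throughout. The quota key is copied from the steps as written; a later comment in this thread reports that the quota had to be set on `requests.nvidia.com/gpu` instead.

```python
# Hypothetical stand-ins for the plugin's attributes.py and openshift.py;
# only the attribute name and the quota key string come from the thread.
QUOTA_LIMITS_GPU = "OpenShift Limit on GPUs"

# Quota key exactly as written in the steps above (assumption: the leading
# ":" is the scope separator expected by openshift-acct-mgt).
GPU_QUOTA_KEY = ":limits.nvidia.com/gpu"

# Maps an allocation attribute to the entry sent to openshift-acct-mgt.
QUOTA_KEY_MAPPING = {
    QUOTA_LIMITS_GPU: lambda x: {GPU_QUOTA_KEY: f"{x}"},
}

# Static default: no GPUs unless the allocation says otherwise.
STATIC_QUOTA = {
    QUOTA_LIMITS_GPU: 0,
}


def build_quota_payload(allocation_attributes):
    """Merge allocation values over the static defaults and map them to quota keys."""
    payload = {}
    for name, default in STATIC_QUOTA.items():
        value = allocation_attributes.get(name, default)
        payload.update(QUOTA_KEY_MAPPING[name](value))
    return payload


print(build_quota_payload({}))                     # default: zero GPUs
print(build_quota_payload({QUOTA_LIMITS_GPU: 2}))  # after an approved change request
```

Keeping the default at zero means a project gets no GPU quota until a PI or manager requests a change, which matches the intent stated at the top of the issue.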

@joachimweyl

@jtriley it looks like @Milstein has assigned this to you. Do you feel you have the details you need to resolve this issue?

@jtriley
Contributor

jtriley commented Nov 9, 2023

@knikolla re: 1) I made a PR here CCI-MOC/openshift-acct-mgt#100

NOTE: from our testing, the quota has to be set on requests.nvidia.com/gpu, not limits.nvidia.com/gpu; otherwise users are still able to get a GPU.
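This observation is consistent with the Kubernetes resource quota documentation, which states that for extended resources such as nvidia.com/gpu only quota items with the `requests.` prefix are supported. As a rough illustration, the sketch below builds a plain `ResourceQuota` manifest as a Python dictionary. The function name, the quota name `gpu-quota`, and the namespace `example-project` are hypothetical, and this is not how openshift-acct-mgt itself creates quotas.

```python
import json


def gpu_resource_quota(namespace, gpus):
    """Return a ResourceQuota manifest capping GPU requests in a namespace.

    The names used here are illustrative assumptions, not values from the
    NERC configuration.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "gpu-quota", "namespace": namespace},
        "spec": {
            # Extended resources are quota-limited via the "requests." prefix.
            "hard": {"requests.nvidia.com/gpu": str(gpus)},
        },
    }


# Render the manifest as JSON, e.g. to pipe into `oc apply -f -`.
print(json.dumps(gpu_resource_quota("example-project", 0), indent=2))
```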

Similar PR here in the config repo:

OCP-on-NERC/nerc-ocp-config#315

Both have been merged

For 2-4) I have a draft PR here #123

For 5), I'm looking into whether it makes sense to test this from this repo via CI/CD.
