Partitions aren't created, but getting "MIG configuration applied successfully" message #18
@alekraus, did applying the configuration require a MIG mode change? On A100 and A30 devices, this would require a reboot. Also note that the MIG configuration (after a mode change) does not persist across reboots, so a config would need to be applied at startup.
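One way to check whether a MIG mode change is still waiting to take effect is to compare the current and pending MIG mode per GPU. The sketch below assumes the input is the text printed by `nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv,noheader`. The function name `pending_mig_mode_changes` is hypothetical and only parses that text; it does not call nvidia-smi itself.

```python
# Hypothetical helper: flag GPUs whose MIG mode change has not taken effect yet.
# Assumed input: output of
#   nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv,noheader
def pending_mig_mode_changes(csv_text: str) -> list[int]:
    """Return indices of GPUs where the current MIG mode differs from the pending one."""
    needs_reset = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, current, pending = (field.strip() for field in line.split(","))
        if current != pending:
            needs_reset.append(int(index))
    return needs_reset


sample = """0, Disabled, Enabled
1, Enabled, Enabled"""
print(pending_mig_mode_changes(sample))  # [0]
```

A GPU that shows up in this list has a requested mode that is not active yet, so any MIG devices configured on it would not appear until the change is applied.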
Apologies for the delay in responding, @elezar. The node automatically enables MIG mode on reboot. When applying the configuration, the node did not request a MIG mode change or a reboot. The node has two A100 GPUs. The first block below has the output when attempting to apply the
Output after running
Output of
Output of
Hi @alekraus. Which version of mig-parted are you using in this case? I have just done a sanity check on my side with an executable built off Note that for an 80GB device, the
and check whether applying this works as expected. If it does, then we have to improve our checks around valid profile names.
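MIG profile names depend on the GPU's memory size (for example, `1g.5gb` exists on a 40GB A100, while the smallest profile on an 80GB A100 is `1g.10gb`). A check around valid profile names could look roughly like the sketch below. The config dict mirrors the `version` / `mig-configs` / `devices` / `mig-enabled` / `mig-devices` layout of mig-parted's config.yaml. The `SUPPORTED_PROFILES` table and the `invalid_profiles` function are illustrative assumptions, not part of mig-parted. The authoritative list for a given GPU and driver comes from `nvidia-smi mig -lgip`.

```python
# Hypothetical validator: report profile names a GPU model does not support.
# SUPPORTED_PROFILES is an illustrative, incomplete table (an assumption);
# verify it against `nvidia-smi mig -lgip` on the target machine.
SUPPORTED_PROFILES = {
    "A100-40GB": {"1g.5gb", "2g.10gb", "3g.20gb", "4g.20gb", "7g.40gb"},
    "A100-80GB": {"1g.10gb", "2g.20gb", "3g.40gb", "4g.40gb", "7g.80gb"},
}


def invalid_profiles(config: dict, config_name: str, gpu_model: str) -> set[str]:
    """Return profile names requested by `config_name` that `gpu_model` lacks."""
    supported = SUPPORTED_PROFILES[gpu_model]
    requested = set()
    for entry in config["mig-configs"][config_name]:
        if entry.get("mig-enabled"):
            requested.update(entry.get("mig-devices", {}).keys())
    return requested - supported


# Same shape as a mig-parted config.yaml entry, written as a Python dict.
config = {
    "version": "v1",
    "mig-configs": {
        "all-1g.5gb": [
            {"devices": "all", "mig-enabled": True, "mig-devices": {"1g.5gb": 7}},
        ],
    },
}
print(invalid_profiles(config, "all-1g.5gb", "A100-80GB"))  # {'1g.5gb'}
```

A non-empty result means the config asks for a profile the GPU cannot create. Reporting this as an error is more helpful than silently applying nothing.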
Hi @elezar, your suggestion appears to have made the process work as expected. I created a file called
Output from running
Output from running
Thank you very much for your help! Much appreciated.
Thanks for confirming that this works. I have created an internal ticket to track adding a more verbose error if an unsupported profile name is requested. Could you please confirm the version of
As of v0.5.2 (i.e., the very latest), mig-parted should already error out if the requested MIG profile is not valid for the current platform. @alekraus, can you verify which version of mig-parted you were using that didn't do this?
Also note that this is not quite accurate:
A GPU reset is necessary, not a reboot, and mig-parted should automatically take care of bringing all GPU clients down and back up to allow the reset to happen when necessary. This is actually one of the major value-adds over using raw nvidia-smi, because this is not possible with nvidia-smi alone.
I have installed the mig-parted tool as root on a node. I am able to run the sample commands listed in the README, and I get the "MIG configuration applied successfully" message after applying different configurations from the config.yaml file. However, the partitions do not seem to be created, as checked with both nvidia-smi and nvidia-mig-parted export (the mig-devices field is "{}"). Do you have any guidance on what could be going on here?
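To confirm the symptom independently of mig-parted's success message, it can help to count MIG devices per GPU. The sketch below assumes the input is the text printed by `nvidia-smi -L`, where MIG devices appear as indented lines beginning with `MIG` under their parent `GPU` line. The function name `mig_devices_per_gpu` and the UUIDs in the sample are placeholders.

```python
# Hypothetical helper: count MIG devices under each GPU in `nvidia-smi -L` output.
# Assumes GPU lines look like "GPU 0: <name> (UUID: ...)" and MIG device lines
# start with "MIG" (after leading indentation) beneath their parent GPU.
def mig_devices_per_gpu(listing: str) -> dict[int, int]:
    counts: dict[int, int] = {}
    current_gpu = None
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith("GPU "):
            # "GPU 0: ..." -> second token is "0:" -> index 0
            current_gpu = int(line.split()[1].rstrip(":"))
            counts[current_gpu] = 0
        elif line.startswith("MIG ") and current_gpu is not None:
            counts[current_gpu] += 1
    return counts


listing = """GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-aaaa)
  MIG 1g.10gb     Device  0: (UUID: MIG-bbbb)
  MIG 1g.10gb     Device  1: (UUID: MIG-cccc)
GPU 1: NVIDIA A100-SXM4-80GB (UUID: GPU-dddd)"""
print(mig_devices_per_gpu(listing))  # {0: 2, 1: 0}
```

A count of 0 on a GPU where MIG mode is enabled would match the situation described above: the apply step reports success, but no MIG devices exist.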