Dynamic mig slicing #119

Open
lengrongfu opened this issue Aug 29, 2024 · 2 comments

@lengrongfu

Currently, MIG resources have to be partitioned in advance, but we would like to be able to partition MIG devices dynamically, at the time they are used.

@MehdiTantaoui-99

Once a MIG profile has been added to the GPU and is in use, trying to add another one later fails with an error:

cat <<EOF | ./nvidia-mig-parted apply -f -
version: v1
mig-configs:
  all-1g.6gb:
  - devices: 0
    mig-enabled: true
    mig-devices:
      1g.6gb: 2
EOF
ERRO[0000] Error clearing MigConfig: error walking gpu instances for '0': error walking compute instances for '0': error destroying Compute instance for profile '(0, 0)': ERROR_IN_USE 
ERRO[0000] Error clearing MIG config on GPU 0, erroneous devices may persist 
FATA[0000] Error applying MIG configuration with hooks: error setting MIGConfig: error attempting multiple config orderings: all orderings failed 

Is there a way to add MIG devices incrementally? It seems to be trying to destroy the existing instances instead of adding to them.

@klueska
Contributor

klueska commented Oct 10, 2024

That is not supported.

mig-parted is a declarative API that assumes you want to reflash a set of GPUs with a predefined set of MIG devices.

All workloads on the GPUs (or on MIG devices on those GPUs, if some already exist) must be shut down prior to running mig-parted.
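
As a rough sketch of what that means in practice: rather than describing only the extra device, the config would describe the full desired end state, and it would be applied only after everything on the GPU has been stopped. The file name `mig-config.yaml` and the config name `three-1g.6gb` are made up for illustration, and the count of 3 assumes the GPU has room for a third `1g.6gb` device.

```shell
# Write the full desired end state (not a delta) to a config file.
# File name and config name are hypothetical.
cat > mig-config.yaml <<'EOF'
version: v1
mig-configs:
  three-1g.6gb:
  - devices: [0]
    mig-enabled: true
    mig-devices:
      "1g.6gb": 3
EOF

# Apply only after every workload on GPU 0 (including those on existing
# MIG devices) has been shut down; otherwise the clear step is expected to
# fail with ERROR_IN_USE as in the log above.
# nvidia-mig-parted apply -f mig-config.yaml -c three-1g.6gb
```

The apply line is left commented out on purpose, since running it tears down the existing MIG devices on the selected GPU before creating the new set.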
