Dynamic mig slicing #119

lengrongfu · 2024-08-29T06:58:12Z

Currently, MIG resources must be partitioned in advance, but we would like to partition MIG devices dynamically, at the time they are needed.


MehdiTantaoui-99 · 2024-10-10T09:36:27Z

When a MIG profile has been created on the GPU and is in use, trying to add another one later fails with an error:

cat <<EOF | ./nvidia-mig-parted apply -f -
version: v1
mig-configs:
  all-1g.6gb:
  - devices: 0
    mig-enabled: true
    mig-devices:
      1g.6gb: 2
EOF

ERRO[0000] Error clearing MigConfig: error walking gpu instances for '0': error walking compute instances for '0': error destroying Compute instance for profile '(0, 0)': ERROR_IN_USE 
ERRO[0000] Error clearing MIG config on GPU 0, erroneous devices may persist 
FATA[0000] Error applying MIG configuration with hooks: error setting MIGConfig: error attempting multiple config orderings: all orderings failed
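The log suggests why the request fails: the apply step first tries to clear the existing MIG configuration, and it cannot destroy a compute instance that is still in use. Below is a toy Python model of that clear-then-create behavior, not mig-parted's actual code. `ToyGPU`, `MigDevice`, and `InUseError` are hypothetical names, and details such as trying multiple config orderings are not modeled.

```python
from dataclasses import dataclass, field


class InUseError(Exception):
    """Hypothetical stand-in for NVML's ERROR_IN_USE in this toy model."""


@dataclass
class MigDevice:
    profile: str
    in_use: bool = False  # True while a workload is running on it


@dataclass
class ToyGPU:
    devices: list = field(default_factory=list)

    def apply(self, desired: dict) -> None:
        """Replace the whole layout with `desired` (profile -> count); nothing is merged."""
        # Step 1: clear the existing layout. Any busy device blocks this,
        # so the GPU is left unchanged.
        for dev in self.devices:
            if dev.in_use:
                raise InUseError(f"error destroying MIG device {dev.profile!r}: in use")
        self.devices = []
        # Step 2: create exactly the requested devices.
        for profile, count in desired.items():
            self.devices.extend(MigDevice(profile) for _ in range(count))


busy = ToyGPU([MigDevice("1g.6gb", in_use=True)])
try:
    busy.apply({"1g.6gb": 2})
except InUseError as err:
    print(err)
```

An idle GPU accepts the new layout; a GPU with a busy device fails before anything is created, which mirrors the ERROR_IN_USE in the log.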

Is there a way to add MIG devices incrementally? It seems to try to destroy the existing ones instead of adding to them.

klueska · 2024-10-10T09:47:18Z

That is not supported.

mig-parted is a declarative API that assumes you want to reflash a set of GPUs with a predefined set of MIG devices.

All workloads on the GPUs (or on the MIG devices of those GPUs, if any already exist) must be shut down before running mig-parted.
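Given these semantics, "adding" a device amounts to computing the complete target layout and reapplying it once the GPU has been drained. A minimal sketch, assuming hypothetical helpers `full_layout` and `render_config` that emit a single-GPU config in the same v1 format as the example earlier in the thread. It does not check whether the layout actually fits on the GPU, and the GPU still has to be idle when mig-parted runs.

```python
def full_layout(current: dict, additional: dict) -> dict:
    """Return the complete target layout: current counts plus the requested additions."""
    target = dict(current)
    for profile, count in additional.items():
        target[profile] = target.get(profile, 0) + count
    return target


def render_config(name: str, device: int, layout: dict) -> str:
    """Render a single-GPU mig-parted v1 config listing every MIG device wanted."""
    lines = [
        "version: v1",
        "mig-configs:",
        f"  {name}:",
        f"  - devices: {device}",
        "    mig-enabled: true",
        "    mig-devices:",
    ]
    # The config is declarative: it lists the whole layout, not a delta.
    for profile, count in sorted(layout.items()):
        lines.append(f"      {profile}: {count}")
    return "\n".join(lines) + "\n"


print(render_config("all-1g.6gb", 0, full_layout({"1g.6gb": 1}, {"1g.6gb": 1})))
```

For example, going from one existing `1g.6gb` device to two reproduces the config from the thread, with the count set to the total rather than the increment.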
