Distributed inference - 16 GPU limit #11218

justinjja · 2025-01-13T18:19:04Z

I was trying to get DeepSeek V3 running across 3 machines and hit a 16 GPU limit.
I don't suppose there is a way around this?

Is this related to Nvidia/CUDA, or is it just that distributed inference is new and a single machine would never have more than 16 GPUs?

Answered by slaren

slaren · 2025-01-13T18:25:05Z

You should be able to use any number of devices by increasing the value of GGML_SCHED_MAX_BACKENDS.
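
As a rough illustration of what "increasing the value" means for a compile-time constant like this, here is a minimal sketch. It assumes the constant is an ordinary preprocessor `#define`; the default of 16 is taken from the limit reported in the question, and the `#ifndef` guard and the `devices_fit` helper are hypothetical, written only for this example and not copied from the ggml source.

```c
/* Sketch of a compile-time cap that can be raised at build time.
 * Assumption: the default of 16 matches the limit reported above.
 * The #ifndef guard is illustrative; if the real definition has no
 * such guard, edit the #define in the source directly instead. */
#ifndef GGML_SCHED_MAX_BACKENDS
#define GGML_SCHED_MAX_BACKENDS 16
#endif

/* Hypothetical helper (not part of ggml): returns 1 when n_devices
 * fits under the compiled-in cap, 0 otherwise. */
static int devices_fit(int n_devices) {
    return n_devices >= 0 && n_devices <= GGML_SCHED_MAX_BACKENDS;
}
```

With a guard like this, the cap could be raised without touching the file by passing a definition to the compiler, for example `-DGGML_SCHED_MAX_BACKENDS=64`. Either way the project has to be rebuilt, since the value is fixed at compile time.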

1 reply

Bumped it up to 64 and I'm good to go, thanks!