Replies: 1 comment 1 reply
-
Hey @adivoj, can you tell us a bit about the environment you're trying to run in? Which model are you trying to train, how many nodes, how many GPUs per node, and what kind of GPUs? I ask because the right distributed training approach depends on these factors. For example, if you want to train on a single node with multiple GPUs and the model is too large to fit on a single GPU, that is supported in Ludwig today. But if you want to do multi-node or data-parallel training, that can be done through Ludwig's DeepSpeed integration, though only up to 8-bit quantization today. Happy to give you more details once I better understand the use case and can offer more precise advice.
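
To make that concrete, here is a minimal sketch of what an LLM fine-tuning run with a LoRA adapter and quantization might look like through the Ludwig Python API. The base model name, dataset path, and exact config keys (`adapter`, `quantization`, `backend`) are assumptions based on recent Ludwig releases, so please check the docs for your installed version before relying on them:

```python
# Minimal sketch, assuming Ludwig's LLM fine-tuning config from recent releases;
# the base model name, dataset path, and exact config keys are illustrative only.
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",  # hypothetical base model
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "adapter": {"type": "lora"},      # LoRA adapter for parameter-efficient fine-tuning
    "quantization": {"bits": 8},      # per the note above, DeepSpeed works up to 8-bit today
    "trainer": {"type": "finetune", "epochs": 1},
    "backend": {
        "type": "ray",                # Ray backend for multi-GPU / multi-node training
        "trainer": {"strategy": "deepspeed", "use_gpu": True},
    },
}

model = LudwigModel(config)
model.train(dataset="train.csv")      # hypothetical dataset with prompt/response columns
```

The `backend` section is only needed for the multi-node / data-parallel case described above; for the single-node multi-GPU case it can be dropped.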
-
When will distributed training on quantized models (LLM with LoRA) be available? Do you know of any other simple framework I can try?