Thanks for releasing this great work! I was able to get training running with sequence length 1024 on the Llama 8B model on 24GB GPUs. I would also like to train with sequence lengths of 2048 and longer on these GPUs, and given the memory constraint, I believe I need HSDP. I tried setting `sharding_group_size` to 2 and `replica_group_size` to 1 in the FSDP config, but I'm still getting OOM. Is there something else I need to do to get hybrid sharding to work?
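For reference, here is a minimal sketch of the HSDP layout I'm aiming for, written against plain PyTorch FSDP rather than the repo's config (the field names and wiring inside the repo may differ; the group-size arithmetic and the `HYBRID_SHARD` strategy here are my assumptions, not the exact internals):

```python
# Sketch only: shard parameters within small groups, replicate across groups.
import os
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

world_size = dist.get_world_size()
sharding_group_size = 2                                  # shard params across 2 GPUs
replica_group_size = world_size // sharding_group_size   # replicate across the rest

# 2D device mesh: outer dim = replication, inner dim = sharding.
mesh = init_device_mesh("cuda", (replica_group_size, sharding_group_size))

model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for the Llama 8B model

model = FSDP(
    model,
    device_mesh=mesh,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,  # shard within a group, replicate across groups
)
```

I'm not sure whether I also need to select the hybrid sharding strategy explicitly in the config, or whether setting the two group sizes alone is supposed to be enough.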