Could somebody share an sdxl-env.sh that works with a 24GB GPU (3090/4090)? I keep getting CUDA OOM #219
-
Thanks in advance. I've been following this repo's SDXL development for a while now and am excited to finally have time to sit down and test it out. If someone has successfully finetuned on a 3090/4090, please share your config so I have a starting point to work with.

I'm just trying to get it running: I've followed all the instructions I can find here on GitHub, configured accelerate/deepspeed, and edited sdxl-env.sh, and yet I still get CUDA OOM errors on the 3090 and the 4090. The accelerate config follows the example in DEEPSPEED.MD, and in sdxl-env.sh I kept everything the same except:

The training machine is Ubuntu 22.04 with 2x 3090, 1x 4090, and 64GB RAM. The end goal is to use DeepSpeed to split the SDXL model across multiple GPUs (3090s/4090s). Any info towards that goal would be appreciated.

Error output:
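For reference, an accelerate + DeepSpeed config of the kind described in DEEPSPEED.MD has roughly this shape. This is a minimal sketch only; the zero_stage, offload, and process-count values below are illustrative assumptions, not the repo's recommended settings:

```yaml
# Sketch of ~/.cache/huggingface/accelerate/default_config.yaml (values illustrative)
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2                    # ZeRO-2: shard optimizer state and gradients
  offload_optimizer_device: cpu    # push optimizer state into system RAM
  offload_param_device: none
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  zero3_init_flag: false
mixed_precision: bf16
num_machines: 1
num_processes: 2                   # one process per GPU taking part in training
machine_rank: 0
main_training_function: main
use_cpu: false
```

With ZeRO stage 2 plus optimizer offload, the optimizer state sits in the 64GB of system RAM rather than on the 24GB cards, which is what this kind of setup is meant to buy.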
-
I believe the PyTorch version has a substantial impact here. Can you share your library versions and your DeepSpeed config?
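Something like the snippet below prints the versions most likely to matter (the module list is the usual one for this stack; adjust as needed), and `accelerate env` will dump the accelerate/DeepSpeed config:

```python
# Print the versions of the libraries most relevant to SDXL + DeepSpeed OOM issues.
import accelerate
import deepspeed
import diffusers
import torch
import transformers

for mod in (torch, deepspeed, accelerate, diffusers, transformers):
    print(f"{mod.__name__}=={mod.__version__}")
print(f"CUDA runtime: {torch.version.cuda}")
```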
-
Is it still possible to train SDXL with 24GB VRAM using the latest release branch, now that the only supported optimizer is Adam? I always get CUDA OOM with DeepSpeed stage 1, and stage 2 is way too slow at 100+ s/it.
8-bit Adam actually wasn't the reason 24G was doable before; that was Adafactor.

But bf16 support can be introduced to any of the optimizers, with the exception of Bits and Bytes, which would require altering that upstream project. For any of the pure-Python optimizers, just open a pull request with a stochastic bf16 variant.
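For anyone wanting to take that on, the usual PyTorch trick for stochastic bf16 rounding looks like the sketch below. This is illustrative code, not something from this repo; the function name and where it would be called inside an optimizer step are assumptions:

```python
import torch

def stochastic_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    """Stochastically round an fp32 tensor to bf16.

    bf16 is the top 16 bits of the fp32 bit pattern. Adding uniform random
    noise to the low 16 bits before truncating rounds each value up or down
    with probability proportional to its distance from the two neighbouring
    bf16 values, so the rounding error is zero in expectation.
    """
    assert x.dtype == torch.float32
    bits = x.view(torch.int32)                    # reinterpret bits, no copy
    noise = torch.randint_like(bits, 0, 1 << 16)  # random low-order 16 bits
    rounded = (bits + noise) & -65536             # mask 0xFFFF0000: keep top 16 bits
    return rounded.view(torch.float32).to(torch.bfloat16)
```

In a pure-Python optimizer, the step would compute the update in fp32 as usual and write the bf16 parameter back through this rounding instead of a plain `.to(torch.bfloat16)`, so the quantization error averages out over many steps rather than accumulating.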