
Retraining SAM #1

Open
alexcbb opened this issue Jul 17, 2023 · 2 comments

Comments

@alexcbb

alexcbb commented Jul 17, 2023

Hello, thank you for your very interesting work.

I was wondering about the training of SAM. I went through your code and saw that you use classical Data Parallelism for training; did you encounter any memory issues during training? I also checked the config file you use: did you retrain the whole model with a batch size of only 2, and did you only test the SAM-b version? If so, how long did the training take on the 4 A40 GPUs?

Thank you in advance for your answer!

@JasonQSY
Owner

JasonQSY commented Jul 17, 2023

Thanks for your interest in our work. To clarify:

> you use classical Data Parallelism for training; did you encounter any memory issues during training?

Yes, we only use DDP for training. The code also adds and tests support for mixed-precision training, but it's not necessary when training on A40s: https://github.com/JasonQSY/3DOI/blob/main/monoarti/configs/sam.yaml#L23
It's doable, but we do reduce the batch size and use vit_b as the backbone to fit the model into GPU memory.
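
For reference, here is a minimal sketch of what a DDP + mixed-precision training step looks like in plain PyTorch. The model, data, and the `use_amp` flag are placeholders, not our actual training code:

```python
# Minimal sketch of per-process DDP training with optional mixed precision.
# Placeholder model/data; NOT the actual 3DOI training loop.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    dist.init_process_group("nccl")
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 10).cuda(local_rank)  # stand-in for the real model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    use_amp = True  # analogous to a mixed-precision flag in the config
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

    for _ in range(100):  # stand-in for iterating a DistributedSampler loader
        x = torch.randn(2, 1024, device=local_rank)  # batch size 2 per GPU
        y = torch.randint(0, 10, (2,), device=local_rank)
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = torch.nn.functional.cross_entropy(model(x), y)
        scaler.scale(loss).backward()  # scaled backward avoids fp16 underflow
        scaler.step(optimizer)
        scaler.update()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

On a 4-GPU machine this would be launched with something like `torchrun --nproc_per_node=4 train.py`.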

> did you retrain the whole model with a batch size of only 2, and did you only test the SAM-b version?

Yes. Other backbones need more GPU memory; vit_h uses too much under our current DDP setup. I tried it and realized it needs more tricks to save GPU memory (such as DeepSpeed or FSDP). To keep things simple, I just use vit_b.
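
For a rough idea of what such a trick looks like, here is a minimal FSDP sketch in plain PyTorch (placeholder model, not part of this repo); DeepSpeed ZeRO achieves a similar memory-sharding effect:

```python
# Minimal sketch of sharding a larger backbone with PyTorch FSDP.
# Placeholder model; not part of the 3DOI codebase.
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    dist.init_process_group("nccl")
    torch.cuda.set_device(local_rank)

    # Stand-in for a large (vit_h-scale) backbone.
    big_model = torch.nn.Sequential(
        *[torch.nn.Linear(4096, 4096) for _ in range(8)]
    ).cuda(local_rank)

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # so per-GPU memory drops roughly with the number of GPUs.
    model = FSDP(big_model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(2, 4096, device=local_rank)
    loss = model(x).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```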

> how long did the training take on the 4 A40 GPUs?

I don't remember the exact time, so here is a rough estimate: for SAM, it takes approximately 36 hours to train for 200 epochs.

Please let me know if you'd like more implementation details and I can help.

@alexcbb
Author

alexcbb commented Jul 18, 2023

Thank you very much for your clarification, it was all I needed!
