Hi everyone, I am running end-to-end finetuning (updating both the DINOv2 encoder and the linear classifier) and noticed that, with the same batch size, finetuning DINOv2 ViT-B/14 (86M parameters) is almost 5-7 times slower than a visual encoder of similar size, e.g. CLIP ViT-B/32 (88M parameters). Is the longer training time due to something specific in the DINOv2 architecture? Does anyone have any ideas?
Thanks a lot!
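For reference, here is a rough token-count comparison, assuming 224x224 inputs and standard non-overlapping ViT patching (the input resolution is an assumption on my part); the patch size difference (14 vs 32) may be relevant, since self-attention cost grows roughly quadratically with sequence length even when parameter counts are similar:

```python
# Rough per-image token counts for two ViT patch sizes,
# assuming 224x224 inputs and non-overlapping patches.
for name, patch in [("DINOv2 ViT-B/14", 14), ("CLIP ViT-B/32", 32)]:
    tokens = (224 // patch) ** 2 + 1  # patch tokens + [CLS] token
    print(f"{name}: {tokens} tokens per image")

# Output:
# DINOv2 ViT-B/14: 257 tokens per image
# CLIP ViT-B/32: 50 tokens per image
```

So at the same resolution, ViT-B/14 processes roughly 5x as many tokens per image, which alone could plausibly account for a slowdown of this magnitude.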