Hi everyone, I am running end-to-end finetuning (updating both the DINOv2 encoder and the linear classifier) and noticed that, with the same batch size, finetuning DINOv2 ViT-B/14 (86M parameters) is almost 5-7 times slower than a visual encoder of similar size, e.g. CLIP ViT-B/32 (88M parameters). Is the longer training time due to something specific in the DINOv2 architecture? Does anyone have any ideas?
Thanks a lot!
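For reference, here is a rough token-count comparison, assuming 224x224 inputs and standard non-overlapping ViT patching (the input resolution is an assumption on my part); the patch size difference (14 vs 32) may be relevant, since self-attention cost grows roughly quadratically with sequence length even when parameter counts are similar:

```python
# Rough per-image token counts for two ViT patch sizes,
# assuming 224x224 inputs and non-overlapping patches.
for name, patch in [("DINOv2 ViT-B/14", 14), ("CLIP ViT-B/32", 32)]:
    tokens = (224 // patch) ** 2 + 1  # patch tokens + [CLS] token
    print(f"{name}: {tokens} tokens per image")

# Output:
# DINOv2 ViT-B/14: 257 tokens per image
# CLIP ViT-B/32: 50 tokens per image
```

So at the same resolution, ViT-B/14 processes roughly 5x as many tokens per image, which alone could plausibly account for a slowdown of this magnitude.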