Question regarding value range (-1,1) #68
-
Hi, thank you so much for releasing code for these inspiring works. I notice that the config file uses Thank you very much for your time and help. |
Replies: 6 comments
-
We used |
-
Thank you very much for your kind reply. We've successfully reproduced the vit_s_i1k result after 90 epochs! If you don't mind, one more thing that is troubling us is the speed of loading data with TFDS from a GCS bucket. Our TFRecords are organized differently, and we load them in an old-fashioned way that seems quite slow (on a v3-64, 90 epochs take 2 h instead of 6.5 / 8 ≈ 0.8 h). Would you mind telling us what makes our approach so much slower than yours? Huge thanks!
-
I would try the following two experiments to find the root cause of the slowness:
-
Thank you so much for your kind advice! We found that it's indeed 2. that is slowing us down, and we will improve it. Another thing we noticed is that in the vit_s16_i1k experiment, we additionally return the |
-
That's probably due to the big_vision/big_vision/models/vit.py Line 162 in 47ac2fd |
-
Oh, I see! The learning rate at the 1st iteration is 0, so no parameter is updated. In the 2nd iteration, the kernel weights of the last head layer are still 0, and thus no gradient is backpropagated to the earlier layers. Thank you so much for helping us out! Really appreciate it!
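To make the reasoning above concrete, here is a minimal sketch on a toy two-layer scalar network with hand-written gradients. It is not the ViT from big_vision: the names `forward_backward` and `simulate`, the input, the target, and the learning-rate schedule are all made up for illustration. The only things it assumes from the thread are a zero-initialized head and a learning rate of 0 at the first step.

```python
def forward_backward(x, target, w_body, w_head):
    """Toy network y = w_head * (w_body * x) with a squared-error loss."""
    h = w_body * x                    # output of the "earlier layer"
    y = w_head * h                    # output of the "head"
    dy = y - target                   # dLoss/dy for loss = 0.5 * (y - target)**2
    grad_head = dy * h                # dLoss/dw_head
    grad_body = dy * w_head * x       # dLoss/dw_body, scaled by the head weight
    return grad_body, grad_head


def simulate(lrs, x=2.0, target=1.0, w_body=0.5, w_head=0.0):
    """Run plain SGD with one learning rate per step; the head starts at zero."""
    history = []
    for lr in lrs:
        grad_body, grad_head = forward_backward(x, target, w_body, w_head)
        w_body -= lr * grad_body
        w_head -= lr * grad_head
        history.append(
            {"grad_body": grad_body, "grad_head": grad_head,
             "w_body": w_body, "w_head": w_head}
        )
    return history


# Hypothetical schedule: learning rate 0 at the first step, then 0.1.
history = simulate([0.0, 0.1, 0.1])
for step, record in enumerate(history, start=1):
    print(step, record)
```

In this toy setup the gradient for the earlier layer is zero at steps 1 and 2, because it is multiplied by the head weight, which is still zero when those gradients are computed. Only the head moves at step 2, and the earlier layer starts receiving a non-zero gradient from step 3 onward.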
We used `value_range(-1,1)` for most of our experiments, so you should use that same preprocessing when you use the models. If you use pre-trained models, you should check the original configuration (e.g. `big_vision/configs/vit_i1k.py`). If you train from scratch, it shouldn't make a difference whether you use `value_range(-1,1)` or `vgg_value_range`.
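For readers unfamiliar with the two options, the sketch below illustrates what each kind of preprocessing typically computes. It is written in plain Python and is not the big_vision implementation: the function names, the assumption that pixels arrive in [0, 255], and the commonly used ImageNet per-channel mean and standard deviation are assumptions made here for illustration.

```python
# Commonly used ImageNet per-channel statistics (assumed here, on a 0..1 scale).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def to_value_range(pixel, vmin=-1.0, vmax=1.0, in_min=0.0, in_max=255.0):
    """Linearly map a pixel value from [in_min, in_max] to [vmin, vmax]."""
    scaled = (pixel - in_min) / (in_max - in_min)   # now in [0, 1]
    return vmin + scaled * (vmax - vmin)


def to_vgg_style_range(rgb):
    """Standardize one RGB pixel with per-channel mean and std (VGG-style)."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


print(to_value_range(0.0), to_value_range(127.5), to_value_range(255.0))
print(to_vgg_style_range((0, 0, 0)))
print(to_vgg_style_range((255, 255, 255)))
```

Both variants are affine rescalings of the input, which is consistent with the remark that the choice should not matter when training from scratch. A pre-trained model, however, expects whichever scaling it saw during training.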