Thanks for sharing this work. Did all of the distillation experiments also use the CE loss? I have a question about the training strategy. First, the well-trained teacher model has its parameters fixed; then a linear dimension-transfer layer is added before the last classification layer of the teacher and student models respectively, and this transfer layer is trainable along with the student model. But the CE loss is applied after the original last layer of the teacher and student models, so it has no relationship with the linear dimension-transfer layer. This seems a little strange to me: the transfer layer has no connection to the final classification task, so how does it learn? A second question: if my student's and teacher's penultimate layers have the same dimension, can I drop the linear dimension-transfer layer?
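To make the setup concrete, here is a minimal PyTorch sketch of the strategy as I understand it. The module names, dimensions, and the MSE feature-matching loss are my own assumptions (not your actual code), and I assume a single projection on the student side mapping into the teacher's feature space:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dummy backbones for illustration; all dimensions are made up.
teacher_backbone = nn.Linear(32, 128)   # teacher penultimate features: 128-d
teacher_head = nn.Linear(128, 10)       # teacher's original classifier
                                        # (kept to mirror the description; unused below)
student_backbone = nn.Linear(32, 64)    # student penultimate features: 64-d
student_head = nn.Linear(64, 10)        # student's original classifier

# Freeze the well-trained teacher.
for p in list(teacher_backbone.parameters()) + list(teacher_head.parameters()):
    p.requires_grad = False

# Linear dimension-transfer layer: maps student features (64-d) into the
# teacher's feature space (128-d); trained jointly with the student.
transfer = nn.Linear(64, 128)

optimizer = torch.optim.SGD(
    list(student_backbone.parameters())
    + list(student_head.parameters())
    + list(transfer.parameters()),
    lr=0.01,
)

x = torch.randn(8, 32)                  # dummy batch
y = torch.randint(0, 10, (8,))          # dummy labels

with torch.no_grad():
    t_feat = teacher_backbone(x)        # teacher penultimate features

s_feat = student_backbone(x)            # student penultimate features
logits = student_head(s_feat)           # CE is computed on the ORIGINAL head,
ce_loss = F.cross_entropy(logits, y)    # so its gradient never reaches `transfer`.

# The transfer layer receives gradient only from the feature-matching
# (distillation) term, e.g. an MSE between transferred and teacher features.
distill_loss = F.mse_loss(transfer(s_feat), t_feat)

optimizer.zero_grad()
loss = ce_loss + distill_loss
loss.backward()
optimizer.step()
```

In this sketch, `transfer` learns entirely through `distill_loss`, which is exactly why the CE loss looks disconnected from it; and if the penultimate dimensions matched (both 128-d), `transfer` would seemingly reduce to an identity map, which is the motivation for my second question.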
Thanks very much for your reply.