Label smoothing in training #261
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##              dev     #261     +/-  ##
==========================================
+ Coverage   89.43%   89.47%   +0.03%
==========================================
  Files          12       12
  Lines         909      912      +3
==========================================
+ Hits          813      816      +3
  Misses         96       96
This looks good to me aside from needing a unit test.
This looks good to me!
After encountering NaN outputs from the model halfway through training in a few runs, I experimented with minimal label smoothing when calculating the training loss as a mitigation strategy. With the same setup, I was able to rerun the same training for significantly longer without encountering NaNs, and with performance metrics similar to those of the original runs.
Smoothing affects the loss calculation only during training steps, not validation, and I have tentatively made the minimal label smoothing factor the default option.
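For readers unfamiliar with the technique, here is a minimal pure-Python sketch of cross-entropy with uniform label smoothing, applied only when a step is a training step. It is not the code in this PR: the function names, the `training` flag, and the `0.01` factor are hypothetical and chosen only for illustration.

```python
import math


def smoothed_cross_entropy(logits, target, smoothing=0.0):
    """Cross-entropy for a single example with uniform label smoothing.

    The smoothed target puts (1 - smoothing) on the true class and spreads
    `smoothing` evenly over all classes, including the true one.
    """
    n = len(logits)
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]

    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = smoothing / n + ((1.0 - smoothing) if i == target else 0.0)
        loss -= q * lp
    return loss


def step_loss(logits, target, training, train_smoothing=0.01):
    """Hypothetical helper: smooth during training, plain loss in validation.

    The default of 0.01 is an illustrative value, not the PR's setting.
    """
    smoothing = train_smoothing if training else 0.0
    return smoothed_cross_entropy(logits, target, smoothing)


logits = [2.0, 0.0, 0.0]
train_loss = step_loss(logits, target=0, training=True)
val_loss = step_loss(logits, target=0, training=False)
```

In this sketch the validation loss is the ordinary cross-entropy, so validation metrics remain comparable with runs trained without smoothing. In a PyTorch codebase the same effect is typically obtained with the `label_smoothing` argument of `torch.nn.CrossEntropyLoss`; whether this PR uses that argument is an assumption here.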