KL loss becomes nan #7

Open
NewEricWang opened this issue Mar 28, 2019 · 0 comments

Hi, I am using my own data to train the teacher model and the student model. The teacher model trains normally, but the KL loss of the student model becomes nan at around step 50k. The log output is as follows:
Global Step : 67438, [55, 100] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.301 nan]
100 Step Time : 143.40849208831787
Global Step : 67538, [55, 200] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.3145 nan]
100 Step Time : 142.97523188591003
Global Step : 67638, [55, 300] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.3899 nan]
100 Step Time : 142.70189571380615
Global Step : 67738, [55, 400] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.2744 nan]
100 Step Time : 142.38775205612183
Global Step : 67838, [55, 500] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.436 nan]
100 Step Time : 142.91834259033203
Global Step : 67938, [55, 600] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.3337 nan]
100 Step Time : 142.9723343849182
Global Step : 68038, [55, 700] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.5294 nan]
100 Step Time : 142.89931344985962
Global Step : 68138, [55, 800] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.3567 nan]
100 Step Time : 143.14595890045166
Global Step : 68238, [55, 900] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.3591 nan]
100 Step Time : 143.44508004188538
Global Step : 68338, [55, 1000] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.2139 nan]
100 Step Time : 143.32597756385803
Global Step : 68438, [55, 1100] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.402 nan]
100 Step Time : 142.90216040611267
Global Step : 68538, [55, 1200] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.4041 nan]
100 Step Time : 142.8380832672119
55 Epoch Training Loss : nan
100 [Total, KL, Reg, Frame Loss] : [ nan nan 2.042 nan]
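
To make it easier to see which terms go bad, here is a minimal sketch that parses log lines in the format above and lists the loss components that are not finite. The names `LINE_RE`, `parse_loss_line` and `non_finite_terms` are hypothetical helpers written for this illustration; they are not part of the training code.

```python
import math
import re

# Matches lines such as:
# Global Step : 67438, [55, 100] [Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.301 nan]
# (hypothetical pattern, written only against the log format shown in this issue)
LINE_RE = re.compile(
    r"Global Step : (\d+), \[(\d+), (\d+)\] \[([^\]]+)\] : \[([^\]]+)\]"
)


def parse_loss_line(line):
    """Return step, epoch, batch and a name -> value loss dict, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    names = [name.strip() for name in match.group(4).split(",")]
    values = [float(value) for value in match.group(5).split()]  # float("nan") is valid
    return {
        "step": int(match.group(1)),
        "epoch": int(match.group(2)),
        "batch": int(match.group(3)),
        "losses": dict(zip(names, values)),
    }


def non_finite_terms(record):
    """List the loss names whose value is nan or inf."""
    return [name for name, value in record["losses"].items() if not math.isfinite(value)]


sample = (
    "Global Step : 67438, [55, 100] "
    "[Total Loss, KL Loss, Reg Loss, Frame Loss] : [ nan nan 2.301 nan]"
)
record = parse_loss_line(sample)
print(record["step"], non_finite_terms(record))
# prints: 67438 ['Total Loss', 'KL Loss', 'Frame Loss']
```

On the lines above, this reports Total, KL and Frame loss as nan while Reg loss stays finite. Running it over the full log should give the first global step where a term stops being finite.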
