Training stops after computing the results of only 1 epoch #121
Comments
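While debugging, I found that execution does not continue past the first line of the code below. Is this optimization step where the model's optimization method gets selected? I'm a beginner, so my understanding may be off.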
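If you comment this section out, it no longer stops.
Yes, I solved it later. The cause was a configuration problem: after I updated the CUDA driver to 10.0 and installed the matching tensorflow==1.12.0, training ran normally. One small problem remains: it often fails at runtime with an error saying it could not initialize:
I haven't run into that problem.
I'm hitting this problem now as well. Did you manage to solve it?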
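Has the author ever run into the case where, after starting training with python run_cnn.py train, only the results of 1 epoch are computed and then training stops?
I checked the GPU memory usage and found no memory leak. I then tried two ways of allocating GPU memory: (1) allocating a fraction of 0.4 of the memory, and (2) letting the allocation adapt automatically. The result was the same as above in both cases: training stopped after a single epoch.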
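For readers who want to reproduce the two allocation modes mentioned above, here is a minimal sketch assuming the TensorFlow 1.x session API (`ConfigProto` and `GPUOptions`). The helper names `gpu_options_kwargs` and `make_session_config` are hypothetical and are not taken from the repository; how `run_cnn.py` actually builds its session is not shown in this thread.

```python
def gpu_options_kwargs(mode, fraction=0.4):
    """Return GPUOptions keyword arguments for one allocation mode.

    Hypothetical helper: "fraction" caps the process at a share of GPU
    memory, "growth" lets the allocation grow on demand.
    """
    if mode == "fraction":
        return {"per_process_gpu_memory_fraction": fraction}
    if mode == "growth":
        return {"allow_growth": True}
    raise ValueError("unknown mode: %r" % (mode,))


def make_session_config(mode, fraction=0.4):
    """Build a session config for the chosen mode (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so the helper above works without it

    # Use tf.compat.v1 where it exists; older 1.x releases such as 1.12
    # expose ConfigProto and GPUOptions directly on the tf module.
    v1 = getattr(getattr(tf, "compat", tf), "v1", tf)
    options = v1.GPUOptions(**gpu_options_kwargs(mode, fraction))
    return v1.ConfigProto(gpu_options=options)
```

The returned object would be passed as `config=` when creating the session, for example `tf.Session(config=make_session_config("growth"))`. Since both modes gave the same result here, the memory settings are probably not the cause of the early stop.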
Configuring TensorBoard and Saver...
Loading training and validation data...
Time usage: 0:00:11
2019-06-03 11:40:30.224462: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1405] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
totalMemory: 6.00GiB freeMemory: 4.89GiB
2019-06-03 11:40:30.237900: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1484] Adding visible gpu devices: 0
2019-06-03 11:40:30.996786: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-03 11:40:31.005045: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0
2019-06-03 11:40:31.010727: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:984] 0: N
2019-06-03 11:40:31.015885: I c:\users\user\source\repos\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2457 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
Training and evaluating...
Epoch: 1
Iter: 0, Train Loss: 2.3, Train Acc: 10.94%, Val Loss: 2.3, Val Acc: 10.02%, Time: 0:00:02 *
Could you help explain this?