ConnectionRefusedError: [Errno 111] Connection refused #128
I use CenterNet to train on VOC2007, but training breaks at iteration 79748/180000 (in the 64th epoch). I tried again and it broke at iteration 68364/180000. My GPU memory usage is 8051MiB/5116MiB. On the second run, the last output was:

training loss at iteration 68355: 5.786685466766357
training loss at iteration 68360: 6.084456443786621

After that, the program no longer makes progress. Here is the log from the first run, including the error:
training loss at iteration 79735: 5.6166815757751465
focal loss at iteration 79735: 5.0547027587890625
pull loss at iteration 79735: 0.0331345796585083
push loss at iteration 79735: 0.30962249636650085
regr loss at iteration 79735: 0.219222292304039
training loss at iteration 79740: 3.3387136459350586
focal loss at iteration 79740: 2.8270068168640137
pull loss at iteration 79740: 0.02639671042561531
push loss at iteration 79740: 0.2322157919406891
regr loss at iteration 79740: 0.25309425592422485
44%|█████████████▎ | 79741/180000 [36:08:34<45:26:33, 1.63s/it]
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/zhanghan/anaconda3/envs/CornerNet_Lite/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/home/zhanghan/anaconda3/envs/CornerNet_Lite/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "train.py", line 51, in pin_memory
    data = data_queue.get()
  File "/home/zhanghan/anaconda3/envs/CornerNet_Lite/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/zhanghan/anaconda3/envs/CornerNet_Lite/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 256, in rebuild_storage_fd
    fd = df.detach()
  File "/home/zhanghan/anaconda3/envs/CornerNet_Lite/lib/python3.7/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/zhanghan/anaconda3/envs/CornerNet_Lite/lib/python3.7/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/zhanghan/anaconda3/envs/CornerNet_Lite/lib/python3.7/multiprocessing/connection.py", line 492, in Client
    c = SocketClient(address)
  File "/home/zhanghan/anaconda3/envs/CornerNet_Lite/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
training loss at iteration 79745: 1.7480967044830322
focal loss at iteration 79745: 1.15070378780365
pull loss at iteration 79745: 0.019453493878245354
push loss at iteration 79745: 0.3843255937099457
regr loss at iteration 79745: 0.19361379742622375
44%|█████████████▎ | 79748/180000 [36:08:45<45:26:22, 1.63s/it]
^C
Traceback (most recent call last):
  File "train.py", line 203, in
Process Process-5:
Process Process-2:
Process Process-1:
Process Process-4:
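The failing frame is `rebuild_storage_fd` in `torch/multiprocessing/reductions.py`, which is how PyTorch rebuilds tensor storage that another process shared as a file descriptor (the `file_descriptor` sharing strategy, the default on Linux). One thing worth checking, on the unconfirmed assumption that the run is hitting the per-process limit on open file descriptors, is that limit itself. The sketch below uses only the standard-library `resource` module; the helper names and the 4096 target are illustrative choices, not part of `train.py`:

```python
import resource

# Illustrative target only; 4096 is not a value taken from train.py.
TARGET_NOFILE = 4096


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_fd_soft_limit(target=TARGET_NOFILE):
    """Try to raise the soft open-file limit to `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. An unprivileged process may
    raise its soft limit up to (but not beyond) its hard limit.
    """
    soft, hard = fd_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # never ask for more than the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough; leave it unchanged
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return fd_limits()[0]


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"open-file limit before: soft={soft}, hard={hard}")
    print(f"open-file limit after:  soft={raise_fd_soft_limit()}")
```

Limits changed this way apply to the calling process and are inherited by processes it starts afterwards, so the call would have to run before the data-loading processes are created. If the limit cannot be raised, the PyTorch multiprocessing documentation suggests the `file_system` strategy instead, selected with `torch.multiprocessing.set_sharing_strategy("file_system")`. Neither change is confirmed to fix this particular error.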