
training log #31

vshk42 opened this issue May 26, 2019 · 2 comments

vshk42 commented May 26, 2019

Hi,

Would you be able to provide the training log? I'm especially interested in the point matching loss after a few epochs. I'm training on a Google Compute Engine VM with 4 NVIDIA Tesla K80 GPUs, but the training speed is only about 2.2 frames/sec. The point matching loss has stayed between 10 and 11 for the last few hours (although the flow loss and the mask loss show a good downward trend).

Epoch[3] Batch [260] Speed: 2.26 samples/sec Train-Flow_L2Loss=0.321852, Flow_CurLoss=0.000000, PointMatchingLoss=10.396450, MaskLoss=0.108871,
Epoch[3] Batch [280] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.309070, Flow_CurLoss=0.000000, PointMatchingLoss=10.542913, MaskLoss=0.109692,
Epoch[3] Batch [300] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.309031, Flow_CurLoss=0.000000, PointMatchingLoss=10.530530, MaskLoss=0.109173,
Epoch[3] Batch [320] Speed: 2.36 samples/sec Train-Flow_L2Loss=0.299514, Flow_CurLoss=0.000000, PointMatchingLoss=10.616660, MaskLoss=0.108448,
Epoch[3] Batch [340] Speed: 2.37 samples/sec Train-Flow_L2Loss=0.291039, Flow_CurLoss=0.000000, PointMatchingLoss=10.707121, MaskLoss=0.107355,
Epoch[3] Batch [360] Speed: 2.34 samples/sec Train-Flow_L2Loss=0.280393, Flow_CurLoss=0.000000, PointMatchingLoss=10.794026, MaskLoss=0.107118,
Epoch[3] Batch [380] Speed: 2.35 samples/sec Train-Flow_L2Loss=0.276359, Flow_CurLoss=0.202416, PointMatchingLoss=10.857506, MaskLoss=0.110217,
Epoch[3] Batch [400] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.265881, Flow_CurLoss=0.000000, PointMatchingLoss=11.151155, MaskLoss=0.109521,
Epoch[3] Batch [420] Speed: 2.36 samples/sec Train-Flow_L2Loss=0.256921, Flow_CurLoss=0.000000, PointMatchingLoss=11.267041, MaskLoss=0.110485,
Epoch[3] Batch [440] Speed: 2.32 samples/sec Train-Flow_L2Loss=0.250065, Flow_CurLoss=0.000000, PointMatchingLoss=11.396601, MaskLoss=0.111172,
Epoch[3] Batch [460] Speed: 2.32 samples/sec Train-Flow_L2Loss=0.244040, Flow_CurLoss=0.000000, PointMatchingLoss=11.543343, MaskLoss=0.110957,
Epoch[3] Batch [480] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.239315, Flow_CurLoss=0.000000, PointMatchingLoss=11.643859, MaskLoss=0.114532,
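
For context on what the 10-11 figure measures: a point matching loss of the kind used in DeepIM is the mean distance between the object's 3D model points transformed by the predicted pose and by the ground-truth pose, so its absolute scale depends on the units of the model points. A minimal NumPy sketch (illustrative array names and L1 distance, not the repository's exact implementation):

```python
import numpy as np

def point_matching_loss(points, R_gt, t_gt, R_pred, t_pred):
    """points: (N, 3) model points; R_*: (3, 3) rotations; t_*: (3,) translations."""
    pts_gt   = points @ R_gt.T + t_gt      # model points in the ground-truth pose
    pts_pred = points @ R_pred.T + t_pred  # model points in the predicted pose
    # Mean L1 distance between corresponding points; the magnitude of the loss
    # therefore tracks the units of the model points (mm vs m).
    return np.abs(pts_gt - pts_pred).sum(axis=1).mean()
```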


vshk42 commented May 26, 2019

Training for train_and_test_deepim_ape.sh finished. I'm getting an error during the test, which I'm debugging:
Traceback (most recent call last):
  File "experiments/deepim/deepim_train_test.py", line 22, in <module>
    test.main()
  File "experiments/deepim/../../deepim/test.py", line 210, in main
    test_deepim()
  File "experiments/deepim/../../deepim/test.py", line 203, in test_deepim
    pairdb=pairdb,
  File "experiments/deepim/../../deepim/core/tester.py", line 590, in pred_eval
    data_batch = update_data_batch(config, data_batch, update_package)
  File "experiments/deepim/../../lib/pair_matching/data_pair.py", line 79, in update_data_batch
    package = update_package[ctx_idx]
IndexError: list index out of range
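
For illustration only (only the names in the traceback are from the code; the surrounding logic is an assumption): update_package looks like a per-context (per-GPU) list indexed by ctx_idx, so this IndexError is what you would see if the test runs with more contexts than the package list was built for, e.g. a GPU count mismatch between the training and test configs:

```python
# Hypothetical reconstruction of the failing pattern, not the repository's code.
update_package = [{"updated_pose": None}, {"updated_pose": None}]  # built for 2 GPUs
ctx_list = ["gpu0", "gpu1", "gpu2", "gpu3"]                        # config asks for 4 GPUs

for ctx_idx, ctx in enumerate(ctx_list):
    package = update_package[ctx_idx]  # IndexError once ctx_idx >= len(update_package)
```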

I will update once the test runs.
Attached are the PointMatchingLoss and learning-rate plots; do they match what you observe?

[Attached image: PointMatchingLoss and learning-rate plots]

huberl commented Jun 11, 2019

I'm facing the same behavior: all other metrics show a significant decrease while the point matching loss increases at the same time.
