
training log #31

vshk42 opened this issue May 26, 2019 · 2 comments

vshk42 commented May 26, 2019

Hi,

Would you be able to provide the training log? I'm especially interested in the point matching loss after a few epochs. I'm training on a Google Compute Engine VM with 4 NVIDIA Tesla K80 GPUs, but the training speed is only about 2.2 frames/sec. The point matching loss has stayed between 10 and 11 for the last few hours (although the flow loss and the mask loss show a good downward trend).

Epoch[3] Batch [260] Speed: 2.26 samples/sec Train-Flow_L2Loss=0.321852, Flow_CurLoss=0.000000, PointMatchingLoss=10.396450, MaskLoss=0.108871,
Epoch[3] Batch [280] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.309070, Flow_CurLoss=0.000000, PointMatchingLoss=10.542913, MaskLoss=0.109692,
Epoch[3] Batch [300] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.309031, Flow_CurLoss=0.000000, PointMatchingLoss=10.530530, MaskLoss=0.109173,
Epoch[3] Batch [320] Speed: 2.36 samples/sec Train-Flow_L2Loss=0.299514, Flow_CurLoss=0.000000, PointMatchingLoss=10.616660, MaskLoss=0.108448,
Epoch[3] Batch [340] Speed: 2.37 samples/sec Train-Flow_L2Loss=0.291039, Flow_CurLoss=0.000000, PointMatchingLoss=10.707121, MaskLoss=0.107355,
Epoch[3] Batch [360] Speed: 2.34 samples/sec Train-Flow_L2Loss=0.280393, Flow_CurLoss=0.000000, PointMatchingLoss=10.794026, MaskLoss=0.107118,
Epoch[3] Batch [380] Speed: 2.35 samples/sec Train-Flow_L2Loss=0.276359, Flow_CurLoss=0.202416, PointMatchingLoss=10.857506, MaskLoss=0.110217,
Epoch[3] Batch [400] Speed: 2.33 samples/sec Train-Flow_L2Loss=0.265881, Flow_CurLoss=0.000000, PointMatchingLoss=11.151155, MaskLoss=0.109521,
Epoch[3] Batch [420] Speed: 2.36 samples/sec Train-Flow_L2Loss=0.256921, Flow_CurLoss=0.000000, PointMatchingLoss=11.267041, MaskLoss=0.110485,
Epoch[3] Batch [440] Speed: 2.32 samples/sec Train-Flow_L2Loss=0.250065, Flow_CurLoss=0.000000, PointMatchingLoss=11.396601, MaskLoss=0.111172,
Epoch[3] Batch [460] Speed: 2.32 samples/sec Train-Flow_L2Loss=0.244040, Flow_CurLoss=0.000000, PointMatchingLoss=11.543343, MaskLoss=0.110957,
Epoch[3] Batch [480] Speed: 2.31 samples/sec Train-Flow_L2Loss=0.239315, Flow_CurLoss=0.000000, PointMatchingLoss=11.643859, MaskLoss=0.114532,
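
For context on what the 10-11 figure measures: a point matching loss of the kind used in DeepIM is the mean distance between the object's 3D model points transformed by the predicted pose and by the ground-truth pose, so its absolute scale depends on the units of the model points. A minimal NumPy sketch (illustrative array names and L1 distance, not the repository's exact implementation):

```python
import numpy as np

def point_matching_loss(points, R_gt, t_gt, R_pred, t_pred):
    """points: (N, 3) model points; R_*: (3, 3) rotations; t_*: (3,) translations."""
    pts_gt   = points @ R_gt.T + t_gt      # model points in the ground-truth pose
    pts_pred = points @ R_pred.T + t_pred  # model points in the predicted pose
    # Mean L1 distance between corresponding points; the magnitude of the loss
    # therefore tracks the units of the model points (mm vs m).
    return np.abs(pts_gt - pts_pred).sum(axis=1).mean()
```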


vshk42 commented May 26, 2019

Training for train_and_test_deepim_ape.sh finished. I'm getting an error during the test, which I'm debugging:
Traceback (most recent call last):
  File "experiments/deepim/deepim_train_test.py", line 22, in <module>
    test.main()
  File "experiments/deepim/../../deepim/test.py", line 210, in main
    test_deepim()
  File "experiments/deepim/../../deepim/test.py", line 203, in test_deepim
    pairdb=pairdb,
  File "experiments/deepim/../../deepim/core/tester.py", line 590, in pred_eval
    data_batch = update_data_batch(config, data_batch, update_package)
  File "experiments/deepim/../../lib/pair_matching/data_pair.py", line 79, in update_data_batch
    package = update_package[ctx_idx]
IndexError: list index out of range
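
For illustration only (only the names in the traceback are from the code; the surrounding logic is an assumption): update_package looks like a per-context (per-GPU) list indexed by ctx_idx, so this IndexError is what you would see if the test runs with more contexts than the package list was built for, e.g. a GPU count mismatch between the training and test configs:

```python
# Hypothetical reconstruction of the failing pattern, not the repository's code.
update_package = [{"updated_pose": None}, {"updated_pose": None}]  # built for 2 GPUs
ctx_list = ["gpu0", "gpu1", "gpu2", "gpu3"]                        # config asks for 4 GPUs

for ctx_idx, ctx in enumerate(ctx_list):
    package = update_package[ctx_idx]  # IndexError once ctx_idx >= len(update_package)
```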

I will update once the test runs.
Attached are the PointMatchingLoss and learning-rate plots; do they match what you observe?

[Attached image: PointMatchingLoss and learning-rate plots]

huberl commented Jun 11, 2019

I'm facing the same behavior: all other metrics show a significant decrease while the point matching loss increases at the same time.
