NaN values in Predictions #7

nhafez · 2018-10-11T14:48:38Z

After following the instructions in the latest commit and running train_and_test_deepim_all.sh, I got the following error:

```
Traceback (most recent call last):
  File "experiments/deepim/deepim_train_test.py", line 20, in <module>
    train.main()
  File "experiments/deepim/../../deepim/train.py", line 287, in main
    config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
  File "experiments/deepim/../../deepim/train.py", line 280, in train_net
    prefix=prefix)
  File "experiments/deepim/../../deepim/core/module.py", line 1026, in fit
    data_batch = interBatchUpdater.forward(data_batch, preds, config)
  File "experiments/deepim/../../lib/pair_matching/batch_updater_py_multi.py", line 231, in forward
    rot_type='QUAT')
  File "experiments/deepim/../../lib/pair_matching/RT_transform.py", line 34, in calc_RT_delta
    r = mat2quat(Rm_delta)
  File "experiments/deepim/../../lib/pair_matching/RT_transform.py", line 459, in mat2quat
    vals, vecs = np.linalg.eigh(K)
  File "/home/saadhana/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1410, in eigh
    w, vt = gufunc(a, signature=signature, extobj=extobj)
  File "/home/saadhana/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 95, in _raise_linalgerror_eigenvalues_nonconvergence
    raise LinAlgError("Eigenvalues did not converge")
numpy.linalg.linalg.LinAlgError: Eigenvalues did not converge
```
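The traceback ends in mat2quat (lib/pair_matching/RT_transform.py), which calls np.linalg.eigh(K). Below is a minimal sketch of a guarded conversion, assuming mat2quat uses the same largest-eigenvector method as the transforms3d library's mat2quat (the traceback only confirms the eigh call). The name mat2quat_checked and the finiteness check are hypothetical debugging additions, not part of DeepIM.

```python
import numpy as np


def mat2quat_checked(M):
    """Rotation matrix -> quaternion [w, x, y, z] via the eigenvector of the
    largest eigenvalue of a symmetric 4x4 matrix K (transforms3d-style method).

    Non-finite input raises a descriptive ValueError before np.linalg.eigh runs.
    """
    M = np.asarray(M, dtype=np.float64)
    if M.shape != (3, 3):
        raise ValueError("expected a 3x3 rotation matrix, got shape {}".format(M.shape))
    if not np.all(np.isfinite(M)):
        raise ValueError("rotation matrix contains NaN/Inf:\n{}".format(M))
    Qxx, Qyx, Qzx, Qxy, Qyy, Qzy, Qxz, Qyz, Qzz = M.flat
    # Only the lower triangle is filled; eigh reads the lower triangle by default.
    K = np.array([
        [Qxx - Qyy - Qzz, 0, 0, 0],
        [Qyx + Qxy, Qyy - Qxx - Qzz, 0, 0],
        [Qzx + Qxz, Qzy + Qyz, Qzz - Qxx - Qyy, 0],
        [Qyz - Qzy, Qzx - Qxz, Qxy - Qyx, Qxx + Qyy + Qzz]]) / 3.0
    vals, vecs = np.linalg.eigh(K)
    # Eigenvector of the largest eigenvalue, reordered from [x, y, z, w] to [w, x, y, z].
    q = vecs[[3, 0, 1, 2], np.argmax(vals)]
    # q and -q encode the same rotation; keep w >= 0.
    if q[0] < 0:
        q = -q
    return q
```

A guard like this only turns the opaque LinAlgError into a message that shows the offending matrix; per the report, the NaNs already come from the network's pose predictions.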

It looks like the predicted poses are all NaN.
I printed the predicted rotation and translation:

```
[array([[nan, nan, nan, nan],
       [nan, nan, nan, nan],
       [nan, nan, nan, nan],
       [nan, nan, nan, nan]], dtype=float32)] [array([[nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan]], dtype=float32)]
```

Has anybody successfully trained the network on the LINEMOD or OCCLUSION datasets?

liyi14 · 2018-10-12T02:00:11Z

I ran train_and_test_deepim_all.sh and did not encounter this error. Can you provide more information, such as the context in which it occurs? Does it fail in the first iteration or after a few iterations?

nhafez · 2018-10-12T09:51:04Z

It completes one iteration on the first batch; on the next iteration, this error occurs because the predictions are NaN.
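Because the failure appears only on the second iteration, it can help to check the network outputs right after each forward pass, before they reach calc_RT_delta. A minimal sketch, assuming the rotation predictions are N×4 quaternions and the translations N×3 vectors (consistent with rot_type='QUAT' and the printout above); nonfinite_pairs and check_preds are hypothetical helpers, not DeepIM functions.

```python
import numpy as np


def nonfinite_pairs(rot_preds, trans_preds):
    """Return the indices of pairs whose predicted rotation or translation
    contains NaN or Inf.

    rot_preds is assumed to hold N quaternions (N x 4) and trans_preds N
    translations (N x 3); lists wrapping one array, as printed above, also work.
    """
    rot = np.asarray(rot_preds, dtype=np.float64).reshape(-1, 4)
    trans = np.asarray(trans_preds, dtype=np.float64).reshape(-1, 3)
    finite = np.isfinite(rot).all(axis=1) & np.isfinite(trans).all(axis=1)
    return np.flatnonzero(~finite)


def check_preds(iteration, rot_preds, trans_preds):
    """Fail with the iteration number and pair indices instead of a LinAlgError."""
    bad = nonfinite_pairs(rot_preds, trans_preds)
    if bad.size > 0:
        raise FloatingPointError(
            "non-finite pose predictions at iteration {}: pairs {}".format(
                iteration, bad.tolist()))
```

Running such a check every iteration pins down exactly when the outputs first become non-finite, which is useful alongside a low logging interval.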

liyi14 · 2018-10-13T00:11:47Z

Can you set `frequent` under `default` in experiments/deepim/cfg/*_any/all.yaml (abbreviated as "config" below) to 1, then apply each of the following modifications separately and report the result of each run:

1. Rerun using train_test_deepim_all.sh.
2. Run train_test_deepim_ape.yaml.
3. Set `train_iter_size` under config->network to 1 and rerun any config that previously reported the error.
4. Replace `dataset: LM6D_REFINE+LM6D_REFINE_SYN` with `dataset: LM6D_REFINE`, and `image_set: train_+train_` with `image_set: train_`.
5. Set config->TRAIN->warmup_lr to 0.0.

Please tell me what happens after applying each of these modifications. Thank you.
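Since the modifications are meant to be tried one at a time on top of `frequent: 1`, one way to avoid mixing them up is to generate one config per change. A minimal sketch, assuming the YAML has been loaded into a nested dict; the section names follow the paths in the comment, but placing `dataset` and `image_set` under a `dataset` section is an assumption, `make_variants` is a hypothetical helper, and the values are copied verbatim from the comment.

```python
import copy


def make_variants(config):
    """Return one config per modification, each with default.frequent = 1.

    config is a nested dict (e.g. loaded from the YAML file); it is not modified.
    """
    base = copy.deepcopy(config)
    base["default"]["frequent"] = 1

    def variant(**changes):
        # changes maps section name -> {key: new value}
        cfg = copy.deepcopy(base)
        for section, updates in changes.items():
            cfg[section].update(updates)
        return cfg

    return {
        # Items 1 and 2: only frequent changes; run with the script / ape config.
        "frequent_only": base,
        # Item 3: train_iter_size set to 1.
        "train_iter_size_1": variant(network={"train_iter_size": 1}),
        # Item 4: drop LM6D_REFINE_SYN (assumption: keys live under "dataset").
        "real_only": variant(dataset={"dataset": "LM6D_REFINE", "image_set": "train_"}),
        # Item 5: TRAIN.warmup_lr set to 0.0.
        "no_warmup_lr": variant(TRAIN={"warmup_lr": 0.0}),
    }
```

Each variant can then be written to its own file, for example with PyYAML's `yaml.safe_dump`, so that every reported result corresponds to exactly one change.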

wangg12 added the awaiting_response label Dec 6, 2018
