NaN values in Predictions #7

Open
nhafez opened this issue Oct 11, 2018 · 3 comments

Comments

@nhafez

nhafez commented Oct 11, 2018

After following the instructions in the latest commit and running train_and_test_deepim_all.sh, I got the following error:

Traceback (most recent call last):
  File "experiments/deepim/deepim_train_test.py", line 20, in <module>
    train.main()
  File "experiments/deepim/../../deepim/train.py", line 287, in main
    config.TRAIN.begin_epoch, config.TRAIN.end_epoch, config.TRAIN.lr, config.TRAIN.lr_step)
  File "experiments/deepim/../../deepim/train.py", line 280, in train_net
    prefix=prefix)
  File "experiments/deepim/../../deepim/core/module.py", line 1026, in fit
    data_batch = interBatchUpdater.forward(data_batch, preds, config)
  File "experiments/deepim/../../lib/pair_matching/batch_updater_py_multi.py", line 231, in forward
    rot_type='QUAT')
  File "experiments/deepim/../../lib/pair_matching/RT_transform.py", line 34, in calc_RT_delta
    r = mat2quat(Rm_delta)
  File "experiments/deepim/../../lib/pair_matching/RT_transform.py", line 459, in mat2quat
    vals, vecs = np.linalg.eigh(K)
  File "/home/saadhana/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 1410, in eigh
    w, vt = gufunc(a, signature=signature, extobj=extobj)
  File "/home/saadhana/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 95, in _raise_linalgerror_eigenvalues_nonconvergence
    raise LinAlgError("Eigenvalues did not converge")
numpy.linalg.linalg.LinAlgError: Eigenvalues did not converge

It looks like the predicted poses are all NaN.
I printed the predicted rotation and translation:

[array([[nan, nan, nan, nan],
       [nan, nan, nan, nan],
       [nan, nan, nan, nan],
       [nan, nan, nan, nan]], dtype=float32)] [array([[nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan]], dtype=float32)]

Has anybody successfully trained the network for LINEMOD or OCCLUSION datasets?
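
For anyone trying to localize this, a minimal sketch of a finiteness check that could be called on the predictions before they reach `mat2quat`, so the run fails with a message naming the bad array instead of the eigenvalue error. The helper `assert_finite` and the variable names `rot_pred` and `trans_pred` are hypothetical and not part of the repository; the array shapes are copied from the printout above.

```python
import numpy as np


def assert_finite(name, arr):
    """Raise a descriptive error if arr contains NaN or Inf (hypothetical helper)."""
    arr = np.asarray(arr)
    bad = ~np.isfinite(arr)
    if bad.any():
        raise FloatingPointError(
            "%s has %d non-finite entries out of %d"
            % (name, int(bad.sum()), arr.size)
        )
    return arr


# Stand-ins shaped like the printed predictions: a 4x4 rotation array
# and a 4x3 translation array, all NaN.
rot_pred = np.full((4, 4), np.nan, dtype=np.float32)
trans_pred = np.full((4, 3), np.nan, dtype=np.float32)

try:
    assert_finite("rot_pred", rot_pred)
    assert_finite("trans_pred", trans_pred)
except FloatingPointError as err:
    message = str(err)
    print("bad prediction:", message)
```

Placing a check like this right after the forward pass, rather than inside the pose update, would show whether the NaNs come from the network output itself or from a later step.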

@liyi14
Owner

liyi14 commented Oct 12, 2018

I ran train_and_test_deepim_all.sh and did not see this error. Can you provide more information, such as the context in which it occurs? Does it fail in the first iteration or after a few iterations?

@nhafez
Author

nhafez commented Oct 12, 2018

On the first batch it completes one iteration; on the next iteration this error occurs because the predictions are NaN.

@liyi14
Owner

liyi14 commented Oct 13, 2018

Can you set frequent to 1 under config->default in experiments/deepim/cfg/*_any/all.yaml (referred to as "config" below), then try each of the following modifications separately and tell me the result:

  1. rerun using train_test_deepim_all.sh
  2. run the train_test_deepim_ape.yaml
  3. change train_iter_size in config->network to 1 and run any config that reported the error before
  4. replace dataset: LM6D_REFINE+LM6D_REFINE_SYN with dataset: LM6D_REFINE, and image_set: train_+train_ with image_set: train_
  5. change config->TRAIN->warmup_lr to 0.0

Please tell me what happened after each modification. Thank you.
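
The point of applying the modifications separately is that each run differs from the baseline by exactly one setting. A rough sketch of that bookkeeping, using a plain nested dict as a stand-in for the parsed YAML: the helper `with_override` is hypothetical, the starting values are placeholders rather than the repository's real defaults, and only the keys whose location is stated in the list above are included.

```python
import copy

# Hypothetical stand-in for the parsed config; values are placeholders.
base_cfg = {
    "default": {"frequent": 20},
    "network": {"train_iter_size": 4},
    "TRAIN": {"warmup_lr": 0.0001},
}


def with_override(cfg, path, value):
    """Return a deep copy of cfg with the key at `path` set to `value`."""
    new_cfg = copy.deepcopy(cfg)
    node = new_cfg
    for key in path[:-1]:
        node = node[key]
    if path[-1] not in node:
        # Catch typos instead of silently adding a new key.
        raise KeyError("unknown config key: " + "->".join(path))
    node[path[-1]] = value
    return new_cfg


# Log every iteration in all runs, as requested.
logged_cfg = with_override(base_cfg, ("default", "frequent"), 1)

# One variant per modification, each changing a single setting.
variants = {
    "train_iter_size=1": with_override(logged_cfg, ("network", "train_iter_size"), 1),
    "warmup_lr=0.0": with_override(logged_cfg, ("TRAIN", "warmup_lr"), 0.0),
}

for name, cfg in sorted(variants.items()):
    print(name, cfg)
```

Because each variant is a deep copy, a NaN that disappears in one run can be attributed to that single change.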
