Error while training pretrained model #209

Open
Khaled-Maani opened this issue Jun 1, 2018 · 4 comments

Comments

@Khaled-Maani

Khaled-Maani commented Jun 1, 2018

Hi,
Thanks for sharing your code.
I am trying to train the VGG16 model following the instructions,
and I get this error:
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x17ec9d) [0x7f70d833ac9d]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x17f068) [0x7f70d833b068]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27abe26) [0x7f70da967e26]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27af461) [0x7f70da96b461]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27ac01b) [0x7f70da96801b]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f70ce23fc80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f70e03f26ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f70e012841d]

Aborted (core dumped)

This warning is shown first:

Warning: using pre-installed version of mxnet may cause unexpected error... (export MXNET_EXAMPLE_SSD_DISABLE_PRE_INSTALLED=1) to prevent loading pre-installed mxnet.
It still appears even though I exported the variable.
I should mention that I am training on the CPU. I selected it by editing the train file and removing the condition in main(), because the option is not listed in the help.
I would be grateful for your help.
I look forward to your solution.
Regards.
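
If the warning persists after exporting the variable, one thing worth checking is whether the variable is actually visible to the Python process that runs the training, since an `export` only affects the shell it was run in and the processes started from that shell. A minimal sketch that makes no assumption about how the SSD example reads the flag internally; the helper name `flag_is_set` is hypothetical:

```python
import os

# Variable name copied from the warning text above.
FLAG = "MXNET_EXAMPLE_SSD_DISABLE_PRE_INSTALLED"

def flag_is_set(name, environ=None):
    """Return True if `name` is present in the environment with a non-empty value."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(name, ""))

if __name__ == "__main__":
    # Run this with the same interpreter and shell used for training.
    print(FLAG, "visible to this process:", flag_is_set(FLAG))
```

If this prints `False`, the export did not reach the process, and the warning would be expected regardless of what the training script does with the flag.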

@zhreshold
Owner

Can you show the full stack trace? It might be related to the CPU, because nobody, myself included, has trained on the CPU before.

@Khaled-Maani
Author

Here you are:
Using mxnet as:
<module 'mxnet' from '/usr/local/lib/python2.7/dist-packages/mxnet/__init__.pyc'>
Warning: using pre-installed version of mxnet may cause unexpected error...
(export MXNET_EXAMPLE_SSD_DISABLE_PRE_INSTALLED=1) to prevent loading pre-installed mxnet.
[22:11:22] src/io/iter_image_det_recordio.cc:281: ImageDetRecordIOParser: /home/eng-khaled/mxnet-ssd/data/train.rec, use 3 threads for decoding..
[22:11:51] src/io/iter_image_det_recordio.cc:334: ImageDetRecordIOParser: /home/eng-khaled/mxnet-ssd/data/train.rec, label padding width: 350
[22:11:53] src/io/iter_image_det_recordio.cc:281: ImageDetRecordIOParser: /home/eng-khaled/mxnet-ssd/data/val.rec, use 3 threads for decoding..
[22:12:02] src/io/iter_image_det_recordio.cc:334: ImageDetRecordIOParser: /home/eng-khaled/mxnet-ssd/data/val.rec, label padding width: 350
INFO:root:Start training with (cpu(0)) from pretrained model /home/eng-khaled/mxnet-ssd/model/vgg16_reduced
[22:12:04] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[22:12:04] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
INFO:root:Freezed parameters: [conv1_1_weight,conv1_1_bias,conv1_2_weight,conv1_2_bias,conv2_1_weight,conv2_1_bias,conv2_2_weight,conv2_2_bias]
[22:12:26] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 6912 bytes with malloc directly
[22:13:01] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 147456 bytes with malloc directly
terminate called after throwing an instance of 'dmlc::Error'
what(): [22:13:53] src/engine/./threaded_engine.h:379: std::exception
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 8 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x17ec9d) [0x7f612443dc9d]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x17f068) [0x7f612443e068]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27abe26) [0x7f6126a6ae26]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27af461) [0x7f6126a6e461]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x27ac01b) [0x7f6126a6b01b]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f611a342c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f612c4f56ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f612c22b41d]

Aborted (core dumped)
I appreciate your time.

@Khaled-Maani
Author

Sir, did I misunderstand your request? Did you mean that I should follow the instructions in the error message?
One more question, please: can I train on the CPU, or is that impossible?
Many thanks.
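
The suggestion in the error message can be followed by setting `MXNET_ENGINE_TYPE` to `NaiveEngine` for a single debugging run, so that the setting does not linger afterwards. A minimal sketch; the helper names `naive_engine_env` and `run_synchronously`, and the `train.py` script name in the commented-out call, are assumptions and not taken from the repository:

```python
import os
import subprocess
import sys

def naive_engine_env(base_env=None):
    """Return a copy of the environment with MXNET_ENGINE_TYPE=NaiveEngine."""
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"  # value named in the error message
    return env

def run_synchronously(argv):
    """Run `argv` as a child process with the modified environment; return its exit code."""
    return subprocess.call(argv, env=naive_engine_env())

# Example call (the script name is hypothetical, adjust to the actual train file):
# run_synchronously([sys.executable, "train.py"])
```

Because the variable is set only in the child's environment, the parent shell is unaffected, which covers the message's reminder to reset it after debugging.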

@MedericTrungVU

I have the same problem. Could anyone help?
