Error training inceptionv3 #226

Open
ssdevel opened this issue Nov 10, 2018 · 4 comments

ssdevel commented Nov 10, 2018

Hi, I want to train the inceptionv3 network. I use the following command:

python train.py --network inceptionv3 --prefix final\inception\new\ssd --finetune 1 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec

I also tried adding the pretrained parameter. I renamed the files, but I got this error:

Traceback (most recent call last):
  File "train.py", line 149, in <module>
    tensorboard=args.tensorboard)
  File "C:\Users\stefa\Desktop\mxnet-ssd-master\train\train_net.py", line 256, in train_net
    exe = net.simple_bind(mx.cpu(), data=(1, 3, data_shape[0], data_shape[1]), label=(1, 1, 5), grad_req='null')
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\symbol\symbol.py", line 1519, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1, 3, 3, 512)
label: (1, 1, 5)
Error in operator conv_1_conv2d: [14:17:19] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\nn\convolution.cc:191: Check failed: dilated_ksize_y <= AddPad(dshape[2], param_.pad[0]) (3 vs. 1) kernel size exceed input
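
A note on this first traceback: the bound shape `data: (1, 3, 3, 512)` has a height of 3, which is what one would see if `data_shape` were already a `(channels, height, width)` tuple such as `(3, 512, 512)` at the point where `simple_bind` is called. That is an inference from the log, not something confirmed in the thread. The sketch below assumes the usual Inception v3 stem (a 3x3 stride-2 convolution followed by a 3x3 stride-1 convolution, both unpadded); `conv_out` and `stem_heights` are hypothetical helpers, not part of MXNet or this repository.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length of a convolution along one spatial axis."""
    padded = size + 2 * pad
    if kernel > padded:
        # Mirrors the wording of the check that failed in the log.
        raise ValueError("kernel size exceed input (%d vs. %d)" % (kernel, padded))
    return (padded - kernel) // stride + 1


def stem_heights(height):
    """Heights after the first two stem convolutions (assumed layout)."""
    h1 = conv_out(height, kernel=3, stride=2)  # assumed first conv: 3x3, stride 2
    h2 = conv_out(h1, kernel=3, stride=1)      # assumed second conv: 3x3, stride 1
    return h1, h2


# Assumption: data_shape is a (channels, height, width) tuple here.
data_shape = (3, 512, 512)

# Indexing [0] and [1] picks channels and height, giving the shape in the log.
bad_shape = (1, 3, data_shape[0], data_shape[1])   # (1, 3, 3, 512)

# Prepending the batch axis keeps height and width intact.
good_shape = (1,) + data_shape                     # (1, 3, 512, 512)
```

Under these assumptions, `stem_heights(512)` gives `(255, 253)`, while `stem_heights(3)` raises with `(3 vs. 1)`: the first convolution shrinks a height of 3 to 1, and the second 3x3 kernel no longer fits.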

When I run this command:
python train.py --network inceptionv3 --prefix final\inception\ssd_inceptionv3_512 --begin-epoch 215 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec --pretrained final\inception\ssd_inceptionv3_512

I got this error:
Traceback (most recent call last):
  File "train.py", line 149, in <module>
    tensorboard=args.tensorboard)
  File "C:\Users\stefa\Desktop\mxnet-ssd-master\train\train_net.py", line 355, in train_net
    monitor=monitor)
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\base_module.py", line 488, in fit
    allow_missing=allow_missing, force_init=force_init)
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\module.py", line 309, in init_params
    _impl(desc, arr, arg_params)
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\module\module.py", line 297, in _impl
    cache_arr.copyto(arr)
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\ndarray\ndarray.py", line 1970, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 25, in _copyto
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\_ctypes\ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "C:\Users\stefa\Anaconda2\lib\site-packages\mxnet\base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:27:05] c:\jenkins\workspace\mxnet-tag\mxnet\src\operator\elemwise_op_common.h:123: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node at 0-th output: expected [126], got [12]
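
A note on this second traceback: the failure happens while a loaded parameter is copied into the network, and the two sizes in the message factor neatly as 126 = 6 x 21 and 12 = 6 x 2. One possible reading, which the thread does not confirm, is that this is a class-prediction parameter where one side was built for 20 classes plus background and the other for 1 class plus background, with 6 anchors per location. The sketch below only illustrates that arithmetic and a plain-Python way to list shape mismatches; the parameter names and the anchor count are hypothetical.

```python
def cls_pred_channels(num_classes, anchors_per_location):
    """Channels of an SSD-style class predictor: one score per class
    (plus background) for every anchor at a location."""
    return anchors_per_location * (num_classes + 1)


def shape_mismatches(network_shapes, checkpoint_shapes):
    """List (name, network_shape, checkpoint_shape) for shared names
    whose shapes differ."""
    return sorted(
        (name, network_shapes[name], checkpoint_shapes[name])
        for name in network_shapes
        if name in checkpoint_shapes
        and network_shapes[name] != checkpoint_shapes[name]
    )


# Hypothetical parameter names and an assumed 6 anchors per location.
one_class_shapes = {
    "example_cls_pred_bias": (cls_pred_channels(1, 6),),    # (12,)
    "example_conv_weight": (32, 3, 3, 3),
}
twenty_class_shapes = {
    "example_cls_pred_bias": (cls_pred_channels(20, 6),),   # (126,)
    "example_conv_weight": (32, 3, 3, 3),
}

mismatches = shape_mismatches(one_class_shapes, twenty_class_shapes)
```

With a real checkpoint, the same comparison could presumably be made on the `arg_params` returned by `mx.model.load_checkpoint`, checking each array's `.shape` against what the network expects for the chosen `--num-class`.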

When I use this command:

python train.py --network inceptionv3 --prefix final\inception\ssd_inceptionv3_512 --begin-epoch 215 --end-epoch 400 --num-class 1 --class-names billboard --data-shape 512 --num-example 2340 --batch-size 4 --train-path records\final_30_train.rec --val-path records\final_30_val.rec

the model does not converge.

Can you help me?

Thanks

ssdevel commented Nov 11, 2018

I solved this issue

@NewbYang

> I solved this issue

Can you help me solve the same problem?

@liuzhenhui

Can you tell me the reason? I got a similar error, like this:

RuntimeError: simple_bind error. Arguments:
label: (40, 11, 6)
data: (40, 3, 160, 48)
Error in operator broadcast_mul0: [13:59:59] src/operator/tensor/./elemwise_binary_broadcast_op.h:68: Check failed: l == 1 || r == 1 operands could not be broadcast together with shapes [40,64,20,6] [40,64,21,7]

Thank you!
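
A note on the `broadcast_mul0` error above: the check `l == 1 || r == 1` in the message reflects the usual broadcasting rule, where two sizes on the same axis must either match or one of them must be 1. The sketch below applies that rule to the two shapes from the log. `broadcast_conflicts` is a hypothetical helper, and it assumes both shapes have the same number of axes; it does not explain why the two branches of the network ended up one pixel apart.

```python
def broadcast_conflicts(lhs, rhs):
    """Return (axis, lhs_size, rhs_size) for each axis that cannot be
    broadcast: the sizes differ and neither of them is 1."""
    if len(lhs) != len(rhs):
        raise ValueError("this sketch only handles shapes of equal rank")
    return [
        (axis, l, r)
        for axis, (l, r) in enumerate(zip(lhs, rhs))
        if l != r and l != 1 and r != 1
    ]


# Shapes copied from the error message.
conflicts = broadcast_conflicts((40, 64, 20, 6), (40, 64, 21, 7))
# conflicts == [(2, 20, 21), (3, 6, 7)]
```

Both spatial axes are off by one here (20 vs. 21 and 6 vs. 7), so the two operands come from paths that disagree on the feature-map size for a `(160, 48)` input.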

ssdevel commented Nov 27, 2018

I used this command and it worked:
python train.py --network inceptionv3 --prefix --begin-epoch 1 --end-epoch 200 --num-example 1 --class-names --data-shape 512 --num-example --batch-size --train-path --val-path
