
CUDA illegal memory access when training fpn+ROITransform #10

Open
husthkk opened this issue Jul 18, 2019 · 0 comments

Comments


husthkk commented Jul 18, 2019

out of memory
invalid argument
an illegal memory access was encountered
an illegal memory access was encountered
an illegal memory access was encountered
an illegal memory access was encountered
an illegal memory access was encountered
an illegal memory access was encountered
an illegal memory access was encountered
an illegal memory access was encountered
[15:49:11] /home/hkk/mxnet/dmlc-core/include/dmlc/./logging.h:304: [15:49:11] /home/hkk/mxnet/mshadow/mshadow/./tensor_gpu-inl.h:69: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 10 entries:
[bt] (0) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d863bc46c]
[bt] (1) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4CopyINS_3gpuENS_3cpuELi1EfEEvNS_6TensorIT_XT1_ET2_EENS3_IT0_XT1_ES5_EE14cudaMemcpyKindPNS_6StreamIS1_EE+0x1f8) [0x7f5d88074cd8]
[bt] (2) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7ndarray4CopyIN7mshadow3cpuENS2_3gpuEEEvRKNS_5TBlobEPS5_NS_7ContextES9_NS_10RunContextE+0x3267) [0x7f5d88050c87]
[bt] (3) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(+0x136d44f) [0x7f5d8708344f]
[bt] (4) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataOS1_OS3_+0x3d) [0x7f5d86f59fdd]
[bt] (5) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x87) [0x7f5d87399507]
[bt] (6) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x100) [0x7f5d873a1c70]
[bt] (7) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f5d8739b85e]
[bt] (8) /home/hkk/miniconda3/envs/python2/bin/../lib/libstdc++.so.6(+0xb8678) [0x7f5ddef2f678]
[bt] (9) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5de15366ba]

[15:49:11] /home/hkk/mxnet/dmlc-core/include/dmlc/./logging.h:304: [15:49:11] /home/hkk/mxnet/mshadow/mshadow/./tensor_gpu-inl.h:69: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 10 entries:
[bt] (0) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d863bc46c]
[bt] (1) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4CopyINS_3gpuENS_3cpuELi1EfEEvNS_6TensorIT_XT1_ET2_EENS3_IT0_XT1_ES5_EE14cudaMemcpyKindPNS_6StreamIS1_EE+0x1f8) [0x7f5d88074cd8]
[bt] (2) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7ndarray4CopyIN7mshadow3cpuENS2_3gpuEEEvRKNS_5TBlobEPS5_NS_7ContextES9_NS_10RunContextE+0x3267) [0x7f5d88050c87]
[bt] (3) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(+0x136d44f) [0x7f5d8708344f]
[bt] (4) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataOS1_OS3_+0x3d) [0x7f5d86f59fdd]
[bt] (5) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x87) [0x7f5d87399507]
[bt] (6) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x100) [0x7f5d873a1c70]
[bt] (7) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f5d8739b85e]
[bt] (8) /home/hkk/miniconda3/envs/python2/bin/../lib/libstdc++.so.6(+0xb8678) [0x7f5ddef2f678]
[bt] (9) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5de15366ba]

[15:49:11] /home/hkk/mxnet/dmlc-core/include/dmlc/./logging.h:304: [15:49:11] src/engine/./threaded_engine.h:329: [15:49:11] /home/hkk/mxnet/mshadow/mshadow/./tensor_gpu-inl.h:69: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 10 entries:
[bt] (0) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d863bc46c]
[bt] (1) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4CopyINS_3gpuENS_3cpuELi1EfEEvNS_6TensorIT_XT1_ET2_EENS3_IT0_XT1_ES5_EE14cudaMemcpyKindPNS_6StreamIS1_EE+0x1f8) [0x7f5d88074cd8]
[bt] (2) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7ndarray4CopyIN7mshadow3cpuENS2_3gpuEEEvRKNS_5TBlobEPS5_NS_7ContextES9_NS_10RunContextE+0x3267) [0x7f5d88050c87]
[bt] (3) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(+0x136d44f) [0x7f5d8708344f]
[bt] (4) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataOS1_OS3_+0x3d) [0x7f5d86f59fdd]
[bt] (5) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x87) [0x7f5d87399507]
[bt] (6) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x100) [0x7f5d873a1c70]
[bt] (7) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f5d8739b85e]
[bt] (8) /home/hkk/miniconda3/envs/python2/bin/../lib/libstdc++.so.6(+0xb8678) [0x7f5ddef2f678]
[bt] (9) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5de15366ba]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 7 entries:
[bt] (0) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d863bc46c]
[bt] (1) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x317) [0x7f5d87399797]
[bt] (2) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x100) [0x7f5d873a1c70]
[bt] (3) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f5d8739b85e]
[bt] (4) /home/hkk/miniconda3/envs/python2/bin/../lib/libstdc++.so.6(+0xb8678) [0x7f5ddef2f678]
[bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5de15366ba]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5de0b5c41d]

terminate called after throwing an instance of 'dmlc::Error'
what(): [15:49:11] src/engine/./threaded_engine.h:329: [15:49:11] /home/hkk/mxnet/mshadow/mshadow/./tensor_gpu-inl.h:69: Check failed: e == cudaSuccess CUDA: an illegal memory access was encountered

Stack trace returned 10 entries:
[bt] (0) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d863bc46c]
[bt] (1) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN7mshadow4CopyINS_3gpuENS_3cpuELi1EfEEvNS_6TensorIT_XT1_ET2_EENS3_IT0_XT1_ES5_EE14cudaMemcpyKindPNS_6StreamIS1_EE+0x1f8) [0x7f5d88074cd8]
[bt] (2) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet7ndarray4CopyIN7mshadow3cpuENS2_3gpuEEEvRKNS_5TBlobEPS5_NS_7ContextES9_NS_10RunContextE+0x3267) [0x7f5d88050c87]
[bt] (3) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(+0x136d44f) [0x7f5d8708344f]
[bt] (4) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataOS1_OS3_+0x3d) [0x7f5d86f59fdd]
[bt] (5) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x87) [0x7f5d87399507]
[bt] (6) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x100) [0x7f5d873a1c70]
[bt] (7) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f5d8739b85e]
[bt] (8) /home/hkk/miniconda3/envs/python2/bin/../lib/libstdc++.so.6(+0xb8678) [0x7f5ddef2f678]
[bt] (9) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5de15366ba]

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 7 entries:
[bt] (0) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f5d863bc46c]
[bt] (1) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE+0x317) [0x7f5d87399797]
[bt] (2) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt17_Function_handlerIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEZZNS2_23ThreadedEnginePerDevice13PushToExecuteEPNS2_8OprBlockEbENKUlvE0_clEvEUlS5_E_E9_M_invokeERKSt9_Any_dataOS5_+0x100) [0x7f5d873a1c70]
[bt] (3) /home/hkk/mxnet/python/mxnet/../../lib/libmxnet.so(_ZNSt6thread5_ImplISt12_Bind_simpleIFSt8functionIFvSt10shared_ptrIN5mxnet6engine10ThreadPool11SimpleEventEEEES8_EEE6_M_runEv+0x4e) [0x7f5d8739b85e]
[bt] (4) /home/hkk/miniconda3/envs/python2/bin/../lib/libstdc++.so.6(+0xb8678) [0x7f5ddef2f678]
[bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f5de15366ba]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5de0b5c41d]
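MXNet's own message in the log recommends setting `MXNET_ENGINE_TYPE` to `NaiveEngine` and running under a debugger. With that setting, every operation runs synchronously, so the backtrace points at the call that actually failed instead of at an asynchronous engine worker thread. The sketch below is a minimal illustration and assumes the training job is started from the command line. The helpers `naive_engine_env` and `run_with_naive_engine` are hypothetical and not part of MXNet or this repository. They set the variable only in a child process, so nothing has to be reset afterwards. The variable should be in place before MXNet is imported in that process.

```python
import os
import subprocess


def naive_engine_env(base=None):
    """Return a copy of `base` (default: os.environ) with MXNET_ENGINE_TYPE=NaiveEngine.

    The input mapping is never modified; a new dict is returned.
    """
    env = dict(os.environ if base is None else base)
    # NaiveEngine makes MXNet execute operations synchronously, per the hint in the log.
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return env


def run_with_naive_engine(cmd):
    """Run `cmd` (a list of argv strings) in a child process that uses NaiveEngine.

    Only the child's environment changes, so the current shell keeps its
    original MXNET_ENGINE_TYPE. Returns the child's exit code.
    subprocess.call is used because it exists on both Python 2 and Python 3.
    """
    return subprocess.call(cmd, env=naive_engine_env())
```

For example, `run_with_naive_engine(["gdb", "--args", "python", "train.py"])` would start a hypothetical `train.py` under gdb. Substitute the actual FPN + RoI Transformer training command used here.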
