This repository has been archived by the owner on Oct 26, 2022. It is now read-only.
Hi,
When I train the de-en model with the command from the GitHub README, I get the following error:
| [en] Dictionary: 24738 types
| [de] Dictionary: 35474 types
| IndexedDataset: loaded data-bin/iwslt14.tokenized.de-en with 160215 examples
| IndexedDataset: loaded data-bin/iwslt14.tokenized.de-en with 7282 examples
| IndexedDataset: loaded data-bin/iwslt14.tokenized.de-en with 6750 examples
| IndexedDataset: loaded data-bin/iwslt14.tokenized.de-en with 7282 examples
| IndexedDataset: loaded data-bin/iwslt14.tokenized.de-en with 6750 examples
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9601/cutorch/lib/THC/generic/THCTensorMath.cu line=26 error=77 : an illegal memory access was encountered
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9601/cutorch/lib/THC/generic/THCStorage.cu line=66 error=77 : an illegal memory access was encountered
/home/yulinlin/torch/install/bin/luajit: ...yulinlin/torch/install/share/lua/5.1/threads/threads.lua:183: [thread 6 callback] /home/yulinlin/torch/install/share/lua/5.1/nn/Container.lua:67:
In 3 module of nn.Sequential:
/home/yulinlin/torch/install/share/lua/5.1/nn/Dropout.lua:26: Creating MTGP constants failed. at /tmp/luarocks_cutorch-scm-1-9601/cutorch/lib/THC/THCTensorRandom.cu:33
stack traceback:
[C]: in function 'bernoulli'
/home/yulinlin/torch/install/share/lua/5.1/nn/Dropout.lua:26: in function </home/yulinlin/torch/install/share/lua/5.1/nn/Dropout.lua:17>
[C]: in function 'xpcall'
/home/yulinlin/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...e/yulinlin/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
...yulinlin/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...yulinlin/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
...yulinlin/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...yulinlin/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:370: in function <...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:347>
[C]: in function 'xpcall'
...yulinlin/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...e/yulinlin/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...e/yulinlin/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...e/yulinlin/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/yulinlin/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...e/yulinlin/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'func'
...yulinlin/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...yulinlin/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'func'
...yulinlin/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
...yulinlin/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:370: in function <...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:347>
[C]: in function 'xpcall'
...yulinlin/torch/install/share/lua/5.1/threads/threads.lua:234: in function 'callback'
...e/yulinlin/torch/install/share/lua/5.1/threads/queue.lua:65: in function <...e/yulinlin/torch/install/share/lua/5.1/threads/queue.lua:41>
[C]: in function 'pcall'
...e/yulinlin/torch/install/share/lua/5.1/threads/queue.lua:40: in function 'dojob'
[string " local Queue = require 'threads.queue'..."]:13: in main chunk
stack traceback:
[C]: in function 'error'
...yulinlin/torch/install/share/lua/5.1/threads/threads.lua:183: in function 'dojob'
...yulinlin/torch/install/share/lua/5.1/threads/threads.lua:264: in function 'synchronize'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:385: in function 'doTrain'
...hare/lua/5.1/fairseq/torchnet/ResumableDPOptimEngine.lua:189: in function 'train'
...in/torch/install/share/lua/5.1/fairseq/scripts/train.lua:410: in main chunk
[C]: in function 'require'
...rch/install/lib/luarocks/rocks/fairseq/scm-1/bin/fairseq:17: in main chunk
[C]: at 0x00406670
Segmentation fault
Does anyone know what causes this?
Lingogo changed the title from "thcudacheck fail" to "realloc(): invalid next size" (Sep 8, 2017)
Lingogo changed the title from "realloc(): invalid next size" to "train the example model error: Segmentation fault" (Sep 8, 2017)
The backtrace points to an error in the nn.Dropout module. I can only guess, but could you be running out of GPU memory? Does your GPU work well for other use cases?
The GPU has enough memory to train the model, but I think the error may still be caused by the GPU environment: when I switched to another computer, everything worked.
I will look into it. Thanks a lot, @jgehring.