Many different problems #15
Comments
Thanks for the detailed feedback, @Fred-Erik! I'm looking into these issues now.
Okay, most of the problems reported by @Fred-Erik are fixed now.
Thanks, everything is working now! I get about 2-3 results per second when evaluating the working model you uploaded. Now I'm going to try to get it to work with release_model.lua, in the hope that I can deploy this model to an ARM platform. :) You didn't perchance make any progress with it already? #5
Not yet, since I'm using cudnn in the CNN part. I'll look into making this work with GPU (note that it would be much slower, though).
I guess you mean without GPU? Anyway, thank you very much! I'd be glad to hear when you've got something working. :)
Hello all,
I get a lot of very basic problems when I try to use Im2Text. Could anyone please help me? It seems unlikely to me that I am the only one who encounters these problems, so it does make me wonder if I am the only one trying to use the model via the Quick start section.

First, I installed OpenNMT using

```
luarocks install --local https://raw.githubusercontent.com/OpenNMT/OpenNMT/master/rocks/opennmt-scm-1.rockspec
```

but then I get the error `OpenNMT not found. Please enter the path to OpenNMT`. If I then enter the path to OpenNMT where it is installed according to `luarocks list`, it still can't find it. The same goes when I manually clone OpenNMT from GitHub. But if I uninstall OpenNMT, Im2Text doesn't give any error and everything seems to work! So is this part of the Quick start outdated?

Next, I'm able to train the test data model without errors, and the validation perplexity is going down. But when I try to run the model provided in the Quick start, I get this error:

```
/home/frederik/torch/install/bin/luajit: ./src/model.lua:55: attempt to get length of field 'idToVocab' (a nil value)
stack traceback:
	./src/model.lua:55: in function 'load'
	src/train.lua:234: in function 'main'
	src/train.lua:288: in main chunk
	[C]: in function 'dofile'
	...erik/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50
```

So maybe this model was made for an older version of Im2Text and doesn't work anymore?
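Regarding the install/path confusion above, here is a minimal sketch of how one could check where (if anywhere) luarocks actually placed the opennmt rock. The tree path below is an assumption for a default `luarocks install --local`; adjust it to your setup.

```shell
# Search a luarocks tree for an installed rock directory by name.
# (Sketch only: the default --local tree path is an assumption.)
find_rock() {
  local tree="$1" name="$2"
  find "$tree" -maxdepth 3 -type d -iname "${name}*" 2>/dev/null | head -n 1
}

# Typical location for rocks installed with `luarocks install --local`:
find_rock "$HOME/.luarocks/lib/luarocks/rocks" opennmt
```

If this prints nothing, the rock never landed in the tree Im2Text is looking at, which would be consistent with the "OpenNMT not found" prompt.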
I then try to run my own trained model, which seems to work, but it finishes in 1 second and only writes an empty results.txt:

```
[07/31/17 14:45:56 INFO] Loading data state from /home/frederik/Documents/dev/experimental/Im2Text/model/model_190-data
[07/31/17 14:45:56 INFO] Loaded
[07/31/17 14:45:56 INFO] Running...
[07/31/17 14:45:56 INFO] Results saved to /home/frederik/Documents/dev/experimental/Im2Text/results/results.txt.
```
Looking into train.lua, I see that the problem is that the current epoch is extracted from `trainData.epoch`, and the maximum number of epochs is set to 1 for testing. So I set `trainData.epoch = 1` when `phase == test`, which works for a couple of images (depending on how many steps I trained my model for), because it tries to test on the training data. When I then remove `trainData:load(dataPath)` in line 272 of train.lua, it does execute on `data/test.txt`.

But the test speed is very slow. With a batch_size of 1 it takes 13(!) seconds to get the results for one image; 10 images take 2 min 23 s! When I set the batch_size to 16 (it still only takes 2-4 images at once), it uses only marginally more time, at 15 seconds. The results are this, but that's probably because of the few training steps:
```
[07/31/17 14:56:41 INFO] 55358c150e.png { } { } { } { } { } { } { } { } { } { } { } { } ...
```

(the `{ }` token repeats like this for the rest of the line)
But is it supposed to be this slow? Evaluating the model during training takes 4 seconds for a batch_size of 10, so there does seem to be something wrong, I'd say.
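To pin down that slowdown, one could wrap the test invocation in a small wall-clock timing helper and compare batch sizes; the `th src/train.lua` line in the comments below is only an assumed example of the real invocation, not a verified flag set.

```shell
# Generic timing helper for comparing runs (a sketch; the Im2Text
# command in the usage comment is an assumption, substitute your own).
time_run() {
  local label="$1"; shift
  local start end
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  end=$(date +%s)
  echo "$label: $((end - start))s"
}

# Hypothetical usage, substituting the real test invocation:
#   time_run "batch_size=1"  th src/train.lua -phase test -batch_size 1
#   time_run "batch_size=16" th src/train.lua -phase test -batch_size 16
```

If the reported times barely change with batch size, as described above, the per-batch overhead (e.g. data loading) rather than the forward pass is the likely bottleneck.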
Also, the training takes an enormous amount of VRAM. I had to reduce the batch_size to 18 during training for it to fit into the 12 GB I have available. During test time it fares better, with the model at batch_size 1 "only" using 1 GB of VRAM. Is the RNN so memory-intensive? Because the CNN is quite small, and with `Number of parameters: 9382588` (9M parameters) and the default settings, I'd say the whole network should not be too computationally expensive.

Finally, should I be able to train a working model using the data and model provided in the Quick start? Because the step perplexity doesn't go below ~35, the validation perplexity doesn't go below 45.8, and the results keep being the same as described above (exactly the same for every image).