
Many different problems #15

Closed

Fred-Erik opened this issue Jul 31, 2017 · 5 comments

Comments


Fred-Erik commented Jul 31, 2017

Hello all,

I run into a lot of very basic problems when I try to use Im2Text. Could anyone please help me? It seems unlikely that I am the only one who encounters these problems, which makes me wonder whether I am the only one trying to use the model via the Quick start section.

  1. First, I installed OpenNMT using luarocks install --local https://raw.githubusercontent.com/OpenNMT/OpenNMT/master/rocks/opennmt-scm-1.rockspec, but then I get the error OpenNMT not found. Please enter the path to OpenNMT. If I then enter the path where opennmt is installed according to luarocks list, it still can't find it. The same goes when I manually clone OpenNMT from GitHub. But if I uninstall opennmt, Im2Text doesn't give any error and everything seems to work! So is this part of the Quick start outdated?

  2. Next, I'm able to train the test-data model without errors, and the validation perplexity goes down. But when I try to run the model provided in the Quick start I get this error:

        /home/frederik/torch/install/bin/luajit: ./src/model.lua:55: attempt to get length of field 'idToVocab' (a nil value)
        stack traceback:
            ./src/model.lua:55: in function 'load'
            src/train.lua:234: in function 'main'
            src/train.lua:288: in main chunk
            [C]: in function 'dofile'
            ...erik/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
            [C]: at 0x00405d50

     So maybe this model was made for an older version of Im2Text and doesn't work anymore?

  3. I then try to run my own trained model, which seems to work but finishes in 1 second and only writes an empty results.txt:

        [07/31/17 14:45:56 INFO] Loading data state from /home/frederik/Documents/dev/experimental/Im2Text/model/model_190-data
        [07/31/17 14:45:56 INFO] Loaded
        [07/31/17 14:45:56 INFO] Running...
        [07/31/17 14:45:56 INFO] Results saved to /home/frederik/Documents/dev/experimental/Im2Text/results/results.txt.

     Looking into train.lua, I see that the problem is that the current epoch is extracted from trainData.epoch, while the maximum number of epochs is set to 1 for testing. So I set trainData.epoch = 1 when phase == test, which works for a couple of images (depending on how many steps I trained my model for), because it then tries to test the training data. When I additionally remove trainData:load(dataPath) in line 272 of train.lua, it does execute on data/test.txt.

  4. But the test speed is very slow. With a batch_size of 1 it takes 13(!) seconds to get the results for one image; 10 images take 2 min 23 sec. When I set the batch_size to 16 (it still only takes 2-4 images at once) it uses only marginally more time, at 15 seconds. The results look like this, but that's probably because of the few training steps:

        [07/31/17 14:56:41 INFO] 55358c150e.png { } { } { } { } { } { } { } { } ... (the tokens "{ }" repeated for the full output length)

     But is it supposed to be this slow? Evaluating the model during training takes 4 seconds for batch_size 10, so something does seem to be wrong, I'd say.

  5. Also, training takes an enormous amount of VRAM. I had to reduce the batch_size to 18 during training for it to fit into the 12 GB I have available. At test time it fares better, with the model "only" using 1 GB of VRAM at batch_size 1. Is the RNN really so memory-intensive? The CNN is quite small, and with Number of parameters: 9382588 (9M parameters) and the default settings, I'd say the whole network should not be too computationally expensive.

  6. Finally, should I be able to train a working model using the data and model provided in the Quick start? The step perplexity doesn't go below ~35, the validation perplexity doesn't go below 45.8, and the results stay the same as described above (exactly the same for every image).
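To make point 3 concrete: my workaround boils down to not letting the test loop terminate on the epoch counter carried over from the training checkpoint. A minimal Python sketch of the control flow (the names and the apply_fix flag are illustrative, not the actual train.lua API):

```python
# Illustrative only: the loop condition reused the training epoch counter in
# the test phase, so a checkpoint saved past max_epochs made testing exit
# immediately. Resetting the counter when phase == "test" avoids that.
def iterate(phase, data, checkpoint_epoch, max_epochs, step, apply_fix=True):
    # Buggy behavior: the test phase inherits the checkpoint's epoch counter,
    # so checkpoint_epoch > max_epochs means the loop body never runs.
    epoch = 1 if (apply_fix and phase == "test") else checkpoint_epoch
    out = []
    while epoch <= max_epochs:
        for example in data:
            out.append(step(example))
        epoch += 1
    return out

decode = lambda name: name + " -> decoded"

# With the reset, a checkpoint at epoch 190 still decodes the test set once:
results = iterate("test", ["a.png", "b.png"], checkpoint_epoch=190,
                  max_epochs=1, step=decode)
print(len(results))  # 2

# Without it, the loop never runs and results.txt stays empty:
buggy = iterate("test", ["a.png", "b.png"], checkpoint_epoch=190,
                max_epochs=1, step=decode, apply_fix=False)
print(len(buggy))  # 0
```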

Fred-Erik changed the title from "Is this actually maintained?" to "Many different problems" on Jul 31, 2017
da03 (Collaborator) commented Jul 31, 2017

Thanks for the detailed feedback, @Fred-Erik! I'm looking into these issues now.

da03 (Collaborator) commented Aug 9, 2017

Okay, most of the problems reported by @Fred-Erik are fixed now.
For 4, the reason testing is even slower than training is that the model is not well trained, so it cannot produce an END-OF-SEQUENCE token reliably; in that case beam search runs until max_num_tokens steps (500 by default) have been reached. For a well-trained model, beam search typically ends within a few dozen steps, which is much faster.
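To illustrate with a toy beam search (this is not our actual decoding code; the scoring functions are made up): decoding can stop early only once every hypothesis on the beam ends in the END token, so a model that never assigns EOS high probability decodes for the full max_num_tokens steps.

```python
# Toy beam search: an untrained model that never prefers EOS runs to
# max_steps; a model that reliably emits EOS stops after a few steps.
import math

EOS = "</s>"

def beam_search(step_logprobs, beam_size=2, max_steps=500):
    """step_logprobs(prefix) -> {token: logprob} (assumed interface)."""
    beams = [([], 0.0)]
    for t in range(max_steps):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == EOS:          # finished hypothesis: keep as-is
                candidates.append((seq, score))
                continue
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp))
        beams = sorted(candidates, key=lambda x: -x[1])[:beam_size]
        if all(seq and seq[-1] == EOS for seq, _ in beams):
            return beams[0][0], t + 1           # early stop: every beam ended
    return beams[0][0], max_steps

def trained(seq):      # strongly prefers EOS after 3 tokens
    if len(seq) >= 3:
        return {EOS: math.log(0.9), "x": math.log(0.1)}
    return {"x": math.log(0.9), EOS: math.log(0.1)}

def untrained(seq):    # EOS is never likely: output degenerates to "{ } { } ..."
    return {"{": math.log(0.5), "}": math.log(0.5)}

_, steps_trained = beam_search(trained, max_steps=50)
_, steps_untrained = beam_search(untrained, max_steps=50)
print(steps_trained, steps_untrained)  # 4 50
```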
For 5, that is normal behavior: for training we need to keep most decoder hidden states, including the attentions and contexts, which are the size of the image feature map. This was mentioned in our paper, as well as by Bluche, Théodore, Jérôme Louradour, and Ronaldo Messina. "Scan, attend and read: End-to-end handwritten paragraph recognition with MDLSTM attention." arXiv preprint arXiv:1604.03286 (2016).
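The scale of this is easy to check with back-of-envelope arithmetic: the weights themselves are tiny (9.4M parameters ≈ 36 MB in fp32); what dominates is keeping, for backprop, the encoder feature map attended over at every decoder step. With assumed shapes (the feature-map, step, and hidden sizes below are illustrative, not our actual defaults):

```python
# Rough activation-memory estimate; all shapes are assumptions for
# illustration, not the real Im2Text defaults.
bytes_per_float = 4
params = 9_382_588
param_mb = params * bytes_per_float / 2**20   # weight memory in MB

batch = 18                  # the batch size that just fit in 12 GB
steps = 150                 # decoder steps retained for backprop
feat_positions = 60 * 15    # assumed encoder feature-map positions (W' x H')
hidden = 512                # assumed hidden/context width

# Attention contexts kept alive for the backward pass:
context_gb = (batch * steps * feat_positions * hidden
              * bytes_per_float) / 2**30
print(round(param_mb), round(context_gb, 1))  # 36 (MB of weights), 4.6 (GB of contexts)
```

So even under conservative assumptions, the stored attention contexts outweigh the parameters by two orders of magnitude, which is why the batch size, not the parameter count, is what hits the 12 GB limit.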
For 6, the provided sample set is quite small, so you cannot expect a reasonable model from it. As mentioned in our paper (https://arxiv.org/pdf/1609.04938.pdf), at least 20K instances are required for decent performance.

Fred-Erik (Author) commented:

Thanks, everything is working now! I get about 2-3 results per second when evaluating the working model you uploaded.

Now I'm going to try to get it to work with release_model.lua, in the hope that I can deploy this model to an ARM platform. :) You didn't perchance make any progress with it already? #5

da03 (Collaborator) commented Aug 10, 2017

Not yet, since I'm using cudnn in the CNN part. I'll look into making it work with GPU (note that it would be much slower though).

Fred-Erik (Author) commented:

I guess you mean without GPU? Anyway, thank you very much! I'd be glad to hear when you've got something working. :)
