size mismatch for weights and bias #3
Comments
Looks like the number of tokens is different from the number of tokens when it was training. Did you change the dataset or run BPE again?
No, I don't think I did. The weird thing is that my friend also ran into this problem, and the difference between the two dimensions is bigger than mine: he got [1084, 512] and [1092, 512] respectively. One way we work around this is to train a few more times and select another checkpoint; sometimes it works. I'm not sure what goes wrong here, or whether it could be in the "segment-level recurrence" part? I have no idea since I haven't reviewed the code carefully.
This sounds like a bug. The dimensions of the embedding weights are the number of tokens and the number of embedding features (d_model).
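A minimal sketch of how a changed vocabulary size reproduces this kind of error (the toy embedding and sizes below are illustrative, not the project's actual TransformerXLModel):

```python
import torch.nn as nn

# Checkpoint was saved when the BPE vocabulary had 1096 tokens
old_embed = nn.Embedding(num_embeddings=1096, embedding_dim=512)
state = old_embed.state_dict()

# The model is rebuilt later after the tokenizer reports 1097 tokens
new_embed = nn.Embedding(num_embeddings=1097, embedding_dim=512)

# Raises: RuntimeError: Error(s) in loading state_dict for Embedding:
#   size mismatch for weight: copying a param with shape torch.Size([1096, 512])
#   from checkpoint, the shape in current model is torch.Size([1097, 512])
new_embed.load_state_dict(state)
```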
I will give it a try and see if I can reproduce. Are you running the latest master? Did you make changes? Also, is the dataset the same?
Ok, try it and see what happens, lol. Yep, I downloaded this model several days ago, so I suppose I'm running the latest master. I haven't made any changes other than commenting out the data-downloading part; I used my own data and copied it directly from another folder. I also suspected at first that I had edited some key code, but I don't think so after comparing with the original version. I tried again a few times from the very beginning, from downloading the model to making it work, and a similar problem still exists. Btw, my friend also ran into this, so maybe there is something wrong in the model? And yeah, I always keep the dataset the same.
Hi there. After training the model, I ran "python serve.py" to test whether the model is usable; before this I changed run_uuid to that of my model and checkpoint. Any idea why it raises the error "RuntimeError: Error(s) in loading state_dict for TransformerXLModel:"? Thanks.
(autocomplete) daijianbo@ubuntu18:~/python_autocomplete-master-old/python_autocomplete$ python serve.py
LABML WARNING
Not a valid git repository: /home/daijianbo/python_autocomplete-master-old
Prepare model...
Prepare n_tokens...
Prepare tokenizer...[DONE] 1.27ms
Prepare n_tokens...[DONE] 2.10ms
Prepare transformer...[DONE] 1.33ms
Prepare ffn...[DONE] 0.30ms
Prepare device...
Prepare device_info...[DONE] 23.29ms
Prepare device...[DONE] 23.51ms
Prepare model...[DONE] 107.18ms
Selected experiment = source_code run = b32da5eea23711eb982bccbbfe110075 checkpoint = 1744896
Loading checkpoint...[FAIL] 840.09ms
Traceback (most recent call last):
File "serve.py", line 18, in
predictor = get_predictor()
File "/home/daijianbo/python_autocomplete-master-old/python_autocomplete/evaluate/factory.py", line 39, in get_predictor conf = load_experiment()
File "/home/daijianbo/python_autocomplete-master-old/python_autocomplete/evaluate/factory.py", line 33, in load_experiment
experiment.start()
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/experiment.py", line 256, in start
return _experiment_singleton().start(run_uuid=_load_run_uuid, checkpoint=_load_checkpoint)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/init.py", line 407, in start
global_step = self.__start_from_checkpoint(run_uuid, checkpoint)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/init.py", line 312, in __start_from_check point
self._load_checkpoint(checkpoint_path)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/init.py", line 280, in _load_checkpoint
self.checkpoint_saver.load(checkpoint_path)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/init.py", line 118, in load
saver.load(checkpoint_path, info[name])
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/pytorch.py", line 66, in load self.model.load_state_dict(state)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TransformerXLModel:
size mismatch for src_embed.weight: copying a param with shape torch.Size([1096, 512]) from checkpoint, the shape in current model is torch.Size([1097, 512]).
size mismatch for generator.weight: copying a param with shape torch.Size([1096, 512]) from checkpoint, the shape in current model is torch.Size([1097, 512]).
size mismatch for generator.bias: copying a param with shape torch.Size([1096]) from checkpoint, the shape in current model is torch.Size([1097]).
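One way to confirm which side changed is to inspect the saved weights directly; a rough sketch, assuming the checkpoint can be loaded as a plain state_dict with torch.load (the path below is only a placeholder for wherever labml stored this run's checkpoint files):

```python
import torch

# Placeholder path; substitute the actual checkpoint file for run
# b32da5eea23711eb982bccbbfe110075, checkpoint step 1744896
checkpoint_path = 'logs/source_code/b32da5eea23711eb982bccbbfe110075/checkpoints/1744896/model.pth'

state = torch.load(checkpoint_path, map_location='cpu')

# First dimension of the embedding weight = vocabulary size at training time
print('n_tokens in checkpoint:', state['src_embed.weight'].shape[0])  # 1096 here

# Compare with the n_tokens reported when serve.py builds the model
# ("Prepare n_tokens..." above gave 1097); if the two differ, the BPE
# vocabulary or its cached token list was rebuilt between training and serving.
```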