size mismatch for weights and bias #3
Comments
Looks like the number of tokens is different from the number of tokens when it was training. Did you change the dataset or run BPE again?
No, I don't think I did. The weird thing is that my friend also ran into this problem, and the difference between the two dimensions is bigger than mine: he got [1084, 512] and [1092, 512] respectively. One way we work around this is to train a few more times and select another checkpoint; sometimes it works. I'm not sure what goes wrong here, or whether it could be in the "segment-level recurrence" part? I have no idea since I haven't reviewed the code carefully.
This sounds like a bug. The dimensions of the embedding weights are the number of tokens and the number of embedding features (d_model).
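A minimal sketch of how a changed vocabulary size reproduces this kind of error (the toy embedding and sizes below are illustrative, not the project's actual TransformerXLModel):

```python
import torch.nn as nn

# Checkpoint was saved when the BPE vocabulary had 1096 tokens
old_embed = nn.Embedding(num_embeddings=1096, embedding_dim=512)
state = old_embed.state_dict()

# The model is rebuilt later after the tokenizer reports 1097 tokens
new_embed = nn.Embedding(num_embeddings=1097, embedding_dim=512)

# Raises: RuntimeError: Error(s) in loading state_dict for Embedding:
#   size mismatch for weight: copying a param with shape torch.Size([1096, 512])
#   from checkpoint, the shape in current model is torch.Size([1097, 512])
new_embed.load_state_dict(state)
```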
I will give it a try and see if I can reproduce. Are you running the latest master? Did you make changes? Also, is the dataset the same?
Ok, try it and see what happens, lol. Yep, I downloaded this model several days ago, so I suppose I'm running the latest master. I haven't made any changes other than commenting out the data-downloading part; I used my own data and copied it directly from another folder. I also suspected at first that I had edited some key code, but I don't think so after comparing with the original version. I tried again a few times from the very beginning, from downloading the model to making it work, and a similar problem still exists. Btw, my friend also ran into this, so maybe there is something wrong in the model? And yeah, I always keep the dataset the same.
Hi there. After training the model, I ran "python serve.py" to test whether the model is usable; before this I changed run_uuid to that of my model and checkpoint. Any idea why it raises the error "RuntimeError: Error(s) in loading state_dict for TransformerXLModel:"? Thanks.
(autocomplete) daijianbo@ubuntu18:~/python_autocomplete-master-old/python_autocomplete$ python serve.py
LABML WARNING
Not a valid git repository: /home/daijianbo/python_autocomplete-master-old
Prepare model...
Prepare n_tokens...
Prepare tokenizer...[DONE] 1.27ms
Prepare n_tokens...[DONE] 2.10ms
Prepare transformer...[DONE] 1.33ms
Prepare ffn...[DONE] 0.30ms
Prepare device...
Prepare device_info...[DONE] 23.29ms
Prepare device...[DONE] 23.51ms
Prepare model...[DONE] 107.18ms
Selected experiment = source_code run = b32da5eea23711eb982bccbbfe110075 checkpoint = 1744896
Loading checkpoint...[FAIL] 840.09ms
Traceback (most recent call last):
File "serve.py", line 18, in
predictor = get_predictor()
File "/home/daijianbo/python_autocomplete-master-old/python_autocomplete/evaluate/factory.py", line 39, in get_predictor conf = load_experiment()
File "/home/daijianbo/python_autocomplete-master-old/python_autocomplete/evaluate/factory.py", line 33, in load_experiment
experiment.start()
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/experiment.py", line 256, in start
return _experiment_singleton().start(run_uuid=_load_run_uuid, checkpoint=_load_checkpoint)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/init.py", line 407, in start
global_step = self.__start_from_checkpoint(run_uuid, checkpoint)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/init.py", line 312, in __start_from_check point
self._load_checkpoint(checkpoint_path)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/init.py", line 280, in _load_checkpoint
self.checkpoint_saver.load(checkpoint_path)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/init.py", line 118, in load
saver.load(checkpoint_path, info[name])
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/labml/internal/experiment/pytorch.py", line 66, in load self.model.load_state_dict(state)
File "/home/daijianbo/miniconda3/envs/autocomplete/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TransformerXLModel:
size mismatch for src_embed.weight: copying a param with shape torch.Size([1096, 512]) from checkpoint, the shape in current model is torch.Size([1097, 512]).
size mismatch for generator.weight: copying a param with shape torch.Size([1096, 512]) from checkpoint, the shape in current model is torch.Size([1097, 512]).
size mismatch for generator.bias: copying a param with shape torch.Size([1096]) from checkpoint, the shape in current model is torch.Size([1097]).
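One way to confirm which side changed is to inspect the saved weights directly; a rough sketch, assuming the checkpoint can be loaded as a plain state_dict with torch.load (the path below is only a placeholder for wherever labml stored this run's checkpoint files):

```python
import torch

# Placeholder path; substitute the actual checkpoint file for run
# b32da5eea23711eb982bccbbfe110075, checkpoint step 1744896
checkpoint_path = 'logs/source_code/b32da5eea23711eb982bccbbfe110075/checkpoints/1744896/model.pth'

state = torch.load(checkpoint_path, map_location='cpu')

# First dimension of the embedding weight = vocabulary size at training time
print('n_tokens in checkpoint:', state['src_embed.weight'].shape[0])  # 1096 here

# Compare with the n_tokens reported when serve.py builds the model
# ("Prepare n_tokens..." above gave 1097); if the two differ, the BPE
# vocabulary or its cached token list was rebuilt between training and serving.
```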