Parameters are not updated during training when AutoModel replaces build_transformer_model #140
Comments
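Also, after renaming the pretrained model's parameters with convert_deberta_v2.py and loading it with build_transformer_model, I found once training finished that train_model.bert.embeddings.word_embeddings.weight was "not" updated, while other layers were (for example train_model.bert.encoderLayer[0].multiHeadAttention.o.weight). |
I just printed the sum() of the weights every few steps. The output shows that they do change, only by a somewhat smaller amount than the other layers.
2023-07-02 21:48:03 - Start Training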
2023-07-02 21:48:03 - Epoch: 1/10
9/1129 [..............................] - ETA: 8:15 - loss: 0.7241 - accuracy: 0.5417 [embedding]: -11801.388671875 [o.weight]: 7.179624557495117
19/1129 [..............................] - ETA: 5:35 - loss: 0.6300 - accuracy: 0.6184 [embedding]: -11801.509765625 [o.weight]: 7.172887325286865
29/1129 [..............................] - ETA: 4:48 - loss: 0.6178 - accuracy: 0.6422 [embedding]: -11801.685546875 [o.weight]: 7.165395736694336
39/1129 [>.............................] - ETA: 4:28 - loss: 0.5923 - accuracy: 0.6603 [embedding]: -11801.83203125 [o.weight]: 7.176580429077148
49/1129 [>.............................] - ETA: 4:15 - loss: 0.5714 - accuracy: 0.6862 [embedding]: -11801.955078125 [o.weight]: 7.194809436798096
59/1129 [>.............................] - ETA: 4:03 - loss: 0.5740 - accuracy: 0.6833 [embedding]: -11801.974609375 [o.weight]: 7.193059921264648
69/1129 [>.............................] - ETA: 3:57 - loss: 0.5531 - accuracy: 0.7029 [embedding]: -11801.9990234375 [o.weight]: 7.181048393249512
79/1129 [=>............................] - ETA: 3:50 - loss: 0.5431 - accuracy: 0.7144 [embedding]: -11801.96484375 [o.weight]: 7.179488182067871
89/1129 [=>............................] - ETA: 3:44 - loss: 0.5396 - accuracy: 0.7191 [embedding]: -11801.923828125 [o.weight]: 7.181138038635254
99/1129 [=>............................] - ETA: 3:39 - loss: 0.5331 - accuracy: 0.7216 [embedding]: -11801.8671875 [o.weight]: 7.16298246383667 |
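Yes, the loss does go down, but only some of the model's parameters are updated. |
I loaded Erlangshen-DeBERTa-v2-97M-Chinese. |
Are you using build_transformer_model here? |
I just looked at this example, and the weights I printed do change slightly. Could you try it directly with huggingface and see what happens there? |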
Try changing it like this and see whether the printed values change:
class Evaluator(Callback):
    """Evaluate and save
    """
    def __init__(self):
        self.best_val_acc = 0.
    def on_batch_begin(self, global_step, local_step, logs=None):
        # Every 50 steps, print a 4x4 corner of the word embedding matrix
        if (global_step+1) % 50 == 0:
            print('[embedding]: ', model.bert.embeddings.word_embeddings.weight[:4,:4].detach()) |
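Sorry, I now know why the embedding looked unchanged on my side: |
As for the earlier case where AutoModel.from_pretrained replaced build_transformer_model, the attention-layer weights I looked at did not change between before and after training. |
Right, a token's row in the embedding weights should only be updated when that token actually appears in the corpus. |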
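The point about embedding rows can be checked on a toy model. The following is a minimal sketch in plain PyTorch, not the code from this issue: the vocabulary size, the token ids, and the loss are all made up for illustration, and plain SGD without weight decay is assumed so that a row with zero gradient stays exactly unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy vocabulary of 10 tokens with 4-dimensional embeddings (illustrative sizes).
emb = nn.Embedding(10, 4)
before = emb.weight.detach().clone()

# Plain SGD without weight decay: rows with zero gradient are left untouched.
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

# The batch only contains tokens 1, 3 and 5.
token_ids = torch.tensor([[1, 3, 5, 3]])
loss = emb(token_ids).pow(2).sum()
loss.backward()
opt.step()

# Indices of the rows whose values differ from the snapshot.
changed_rows = (emb.weight.detach() != before).any(dim=1).nonzero().flatten().tolist()
print(changed_rows)  # [1, 3, 5]
```

With an optimizer that applies weight decay, such as AdamW with a nonzero weight_decay, every row shrinks slightly on each step, so rows of unseen tokens can also drift a little. Printing only the first few rows of the matrix, which usually belong to special or rare tokens, can therefore make the embedding look frozen.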
After using AutoModel.from_pretrained instead of build_transformer_model(config_path, checkpoint_path) as the backbone, I found that training does not update the backbone's parameters (requires_grad=True). Could you help me with this problem?
|
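The loss is going down, so some parameters are definitely being updated. Try recording the weight sum of every parameter tensor to see which layers change and which do not. |
I don't think the framework matters here. Using bert4torch or the hf trainer should not be what causes this problem. |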
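The suggestion above can be sketched as a pair of small helpers. This is an illustrative sketch in plain PyTorch: snapshot_sums and changed_parameters are hypothetical names, and the two-layer model with a deliberately frozen "backbone" only stands in for the real model so the output has something to show.

```python
import torch
import torch.nn as nn


def snapshot_sums(model: nn.Module) -> dict:
    """Return {parameter name: sum of its values}, accumulated in float64."""
    return {name: p.detach().double().sum().item() for name, p in model.named_parameters()}


def changed_parameters(before: dict, after: dict, tol: float = 0.0) -> list:
    """Names of parameters whose sum moved by more than tol."""
    return [name for name in before if abs(after[name] - before[name]) > tol]


# Toy stand-in for backbone + head; the backbone is frozen on purpose.
torch.manual_seed(0)
model = nn.ModuleDict({"backbone": nn.Linear(4, 4), "head": nn.Linear(4, 2)})
for p in model["backbone"].parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)
before = snapshot_sums(model)

# One training step on random data.
x = torch.randn(8, 4)
loss = model["head"](model["backbone"](x)).pow(2).sum()
loss.backward()
opt.step()

after = snapshot_sums(model)
changed = changed_parameters(before, after)
print(changed)
```

A sum is a coarse signal: values can move while the sum barely does, and for a large matrix the change may only show up in the last printed digits. Keeping clones of the tensors and comparing them with torch.equal is a stricter check when memory allows.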
I'm using the CasRel code, and only the parameters outside self.bert get updated, such as self.linear1:
class Model(BaseModel):
    def __init__(self) -> None:
        super().__init__()
        # self.bert = build_transformer_model(config_path, checkpoint_path, model='deberta_v2')
        self.bert = AutoModel.from_pretrained("../../data/bert/Erlangshen-DeBERTa-v2-97M-Chinese")
        self.linear1 = nn.Linear(768, 2)
        self.condLayerNorm = LayerNorm(hidden_size=768, conditional_size=768 * 2)
        self.LayerNorm = LayerNorm(hidden_size=768)
        self.linear2 = nn.Linear(768, len(predicate2id) * 2)
Below is the printed output when loading bert, which is normal:
bert.encoder.layer[0].attention.output.dense.weight:
tensor([[ 0.0147, -0.0067, -0.0006, -0.0297],
[ 0.0141, -0.0764, -0.1015, -0.0069],
[-0.0212, 0.0386, -0.0464, -0.0098],
[ 0.0502, 0.0950, -0.0278, -0.0396]], device='cuda:7')
10/31 [========>.....................] - ETA: 15s - loss: 0.6156 - subject_loss: 0.1724 - object_loss: 0.4431
bert.encoder.layer[0].attention.output.dense.weight:
tensor([[ 0.0148, -0.0065, -0.0003, -0.0295],
[ 0.0140, -0.0765, -0.1016, -0.0070],
[-0.0205, 0.0393, -0.0459, -0.0091],
[ 0.0505, 0.0953, -0.0279, -0.0393]], device='cuda:7')
20/31 [==================>...........] - ETA: 5s - loss: 0.4846 - subject_loss: 0.1588 - object_loss: 0.3258
bert.encoder.layer[0].attention.output.dense.weight:
tensor([[ 0.0149, -0.0064, -0.0002, -0.0294],
[ 0.0141, -0.0764, -0.1016, -0.0069],
[-0.0203, 0.0395, -0.0458, -0.0089],
[ 0.0506, 0.0953, -0.0279, -0.0392]], device='cuda:7')
30/31 [============================>.] - ETA: 0s - loss: 0.4405 - subject_loss: 0.1542 - object_loss: 0.2863
bert.encoder.layer[0].attention.output.dense.weight:
tensor([[ 0.0150, -0.0064, -0.0001, -0.0294],
[ 0.0141, -0.0764, -0.1016, -0.0069],
[-0.0202, 0.0397, -0.0458, -0.0088],
[ 0.0506, 0.0953, -0.0279, -0.0392]], device='cuda:7')
31/31 [==============================] - 13s 430ms/step - loss: 0.4373 - subject_loss: 0.1535 - object_loss: 0.2837 |
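After I used AutoModel.from_pretrained instead of build_transformer_model(config_path, checkpoint_path) as the backbone, I found that training does not update the backbone's parameters (requires_grad=True), while the linear layers added on top are still updated normally.
Could you give me a hint about where the problem is?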
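One way to narrow a question like this down in plain PyTorch is to check two things after a backward pass: whether each trainable parameter was handed to the optimizer, and whether it received a non-zero gradient. The sketch below is not taken from this thread and does not claim to be the cause here. The helper names are hypothetical, and the toy model deliberately builds its optimizer from the head only to show what the check reports.

```python
import torch
import torch.nn as nn


def params_missing_from_optimizer(model: nn.Module, optimizer: torch.optim.Optimizer) -> list:
    """Trainable parameters that the optimizer does not hold."""
    in_opt = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return [n for n, p in model.named_parameters() if p.requires_grad and id(p) not in in_opt]


def params_without_grad(model: nn.Module) -> list:
    """Trainable parameters whose .grad is None or all zeros after backward()."""
    return [
        n for n, p in model.named_parameters()
        if p.requires_grad and (p.grad is None or not p.grad.any())
    ]


torch.manual_seed(0)
model = nn.ModuleDict({"backbone": nn.Linear(4, 4), "head": nn.Linear(4, 2)})

# Deliberate mistake for illustration: the optimizer only sees the head.
opt = torch.optim.SGD(model["head"].parameters(), lr=0.1)

loss = model["head"](model["backbone"](torch.randn(8, 4))).pow(2).sum()
loss.backward()

missing = params_missing_from_optimizer(model, opt)
no_grad = params_without_grad(model)
print(missing)  # ['backbone.weight', 'backbone.bias']
print(no_grad)
```

If the first list names backbone parameters, the optimizer was built without them. If the second list does, the gradient is being cut somewhere in the forward pass, for example by a detach() or a torch.no_grad() block.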