
Why not detach the hidden state of the GRU from the computational graph? #44

Open
MejiroSilence opened this issue Dec 19, 2024 · 0 comments

MejiroSilence commented Dec 19, 2024

Hello author, I have a question about your code: why isn't the hidden state of the GRU detached from the computational graph? Backpropagating through the full unrolled sequence can lead to exploding/vanishing gradients. Other RNN code I have seen detaches the hidden state from the previous step; it seems that only PyMARL and its various derived extensions do not. I checked https://github.com/oxwhirl/pymarl and PyMARL itself is written this way, and all the improved repositories based on PyMARL handle it the same way. I hope to get an answer. Thank you.

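For concreteness, here is a minimal PyTorch sketch of the two schemes the question contrasts (this is not the repository's actual code; the `GRUCell` setup, shapes, and loss are illustrative assumptions). Without `detach()`, `backward()` propagates through every unrolled step (full BPTT); detaching the hidden state before each step truncates the graph so gradients only flow through the most recent step, which is the usual remedy for exploding/vanishing gradients over long rollouts:

```python
import torch
import torch.nn as nn

# Illustrative setup (not PyMARL's code): a GRUCell unrolled over T steps.
torch.manual_seed(0)
cell = nn.GRUCell(input_size=4, hidden_size=8)
xs = torch.randn(5, 3, 4)  # (T, batch, input)

def run(detach_hidden: bool) -> torch.Tensor:
    """Unroll the cell and return the gradient of the recurrent weights."""
    h = torch.zeros(3, 8)
    for x in xs:
        if detach_hidden:
            # Cut the graph: gradients will not flow into earlier steps.
            h = h.detach()
        h = cell(x, h)
    loss = h.pow(2).mean()
    cell.zero_grad()
    loss.backward()
    return cell.weight_hh.grad.clone()

g_full = run(detach_hidden=False)   # full BPTT through all 5 steps
g_trunc = run(detach_hidden=True)   # gradients from the last step only
```

Comparing `g_full` and `g_trunc` shows the two schemes produce different gradients: the full-graph version accumulates contributions from every timestep, which is exactly the path through which gradients can blow up or vanish over long sequences.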
