
Why use ReLU to compute additive attention #28

Open
yuboona opened this issue May 2, 2020 · 0 comments


yuboona commented May 2, 2020

1. Attention's formula

  • In the standard additive version, the attention score is:
score = v * tanh(W * [hidden; encoder_outputs])
  • In your code it is (see the sketch below):
score = v * relu(W * [hidden; encoder_outputs])
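
For concreteness, here is a minimal PyTorch sketch of the additive attention above. The `activation` argument is my own addition (not in the repository) so the tanh and relu variants can be compared directly; the shapes and layer names are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Additive attention: score = v * activation(W * [hidden; encoder_outputs])."""

    def __init__(self, hidden_dim, activation=torch.tanh):
        super().__init__()
        # W projects the concatenated decoder state and encoder output
        self.W = nn.Linear(hidden_dim * 2, hidden_dim)
        # v maps the activated energy down to one scalar score per position
        self.v = nn.Linear(hidden_dim, 1, bias=False)
        # pass activation=torch.relu to get the variant used in this repository
        self.activation = activation

    def forward(self, hidden, encoder_outputs):
        # hidden:          (batch, hidden_dim)          decoder state
        # encoder_outputs: (batch, src_len, hidden_dim) encoder states
        src_len = encoder_outputs.size(1)
        hidden = hidden.unsqueeze(1).expand(-1, src_len, -1)
        energy = self.activation(self.W(torch.cat((hidden, encoder_outputs), dim=2)))
        scores = self.v(energy).squeeze(2)      # (batch, src_len)
        return torch.softmax(scores, dim=1)     # attention weights
```

One practical difference: tanh keeps the energies bounded in [-1, 1] and preserves negative values, while relu zeroes everything negative, so some positions' energies collapse to 0 before the `v` projection.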

2. Question

Is there some trick here, or is this the result of an experimental comparison?
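
As a quick (hypothetical) numerical illustration of where the two activations diverge before the `v` projection:

```python
import torch

e = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])   # example pre-activation energies
print(torch.tanh(e))  # tensor([-0.9640, -0.4621,  0.0000,  0.4621,  0.9640])
print(torch.relu(e))  # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])
```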
