Can I use MonarchMixer to replace a cross-attention layer? #9
Comments
This is something we're very interested in and still working on! We don't have a formula for it quite yet.
That doesn't sound like good news. It looks like I'll just have to mix MonarchMixer with cross attention. Is there a performance loss compared to raw attention?
We've seen that we can match self-attention in quality with some gated convolutions (see the paper for details). Cross attention is still an open problem, which we'll be working on!
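For readers unfamiliar with the idea, here is a minimal sketch of a gated long-convolution sequence mixer, the kind of operator being referred to as able to match self-attention in quality. This is not the repo's actual module; the class name, shapes, and the use of a plain depthwise `conv1d` (rather than the paper's Monarch/FFT-based convolution) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvMixer(nn.Module):
    """Toy gated long-convolution sequence mixer (illustrative, not the M2 code)."""
    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        # One learned filter per channel, as long as the sequence (a "long conv").
        self.filter = nn.Parameter(torch.randn(d_model, seq_len) * 0.02)
        self.in_proj = nn.Linear(d_model, 2 * d_model)   # values + gate
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        v = v.transpose(1, 2)                  # (batch, d_model, seq_len)
        L = v.shape[-1]
        v = F.pad(v, (L - 1, 0))               # left-pad so the conv is causal
        # Depthwise convolution along the sequence dimension mixes tokens.
        y = F.conv1d(v, self.filter.unsqueeze(1), groups=v.shape[1])
        y = y.transpose(1, 2)                  # back to (batch, seq_len, d_model)
        # Elementwise gating stands in for attention's data-dependent mixing.
        return self.out_proj(torch.sigmoid(g) * y)
```

Note that the gating here multiplies two projections of the *same* sequence elementwise, which is part of why extending this to cross attention (two different sequences) is still open.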
If I use M2, can I drop positional encodings? M2 looks a bit similar to a convolution, which would already let the model know positional information.
The sequence mixer in the paper doesn't seem able to mix sequences of unequal lengths the way cross attention does, because it uses elementwise multiplication. Is this a misunderstanding on my part, or is Monarch Mixer not a replacement for cross attention?
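To make the shape constraint in the question concrete, here is a toy comparison (assumed shapes, not code from this repo) of why an elementwise gated mixer needs both operands to share a sequence length, while cross attention does not.

```python
import torch

B, L_q, L_kv, D = 2, 16, 48, 64
q = torch.randn(B, L_q, D)    # "query" sequence
kv = torch.randn(B, L_kv, D)  # "context" sequence of a different length

# Cross attention: the (L_q, L_kv) score matrix lets unequal lengths interact.
scores = torch.softmax(q @ kv.transpose(1, 2) / D ** 0.5, dim=-1)  # (B, L_q, L_kv)
cross_out = scores @ kv                                            # (B, L_q, D)

# Elementwise gating: both tensors must share the same sequence length, so
# q * kv would fail here unless kv were first pooled or resampled to length L_q.
# gated = q * kv  # RuntimeError: shapes (16) and (48) don't match at dim 1
```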