Can I use MonarchMixer to replace a cross-attention layer? #9
Comments
This is something we're very interested in and still working on! We don't have a formula for it quite yet.
That doesn't sound like good news. It looks like I'll just have to mix MonarchMixer with cross attention. Is there a performance loss compared to raw attention?
We've seen that we can match self-attention in quality with some gated convolutions (see the paper for details). Cross attention is still an open problem, which we'll be working on!
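For readers unfamiliar with the idea, here is a minimal sketch of a gated long-convolution sequence mixer, the kind of operator being referred to as able to match self-attention in quality. This is not the repo's actual module; the class name, shapes, and the use of a plain depthwise `conv1d` (rather than the paper's Monarch/FFT-based convolution) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvMixer(nn.Module):
    """Toy gated long-convolution sequence mixer (illustrative, not the M2 code)."""
    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        # One learned filter per channel, as long as the sequence (a "long conv").
        self.filter = nn.Parameter(torch.randn(d_model, seq_len) * 0.02)
        self.in_proj = nn.Linear(d_model, 2 * d_model)   # values + gate
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        v, g = self.in_proj(x).chunk(2, dim=-1)
        v = v.transpose(1, 2)                  # (batch, d_model, seq_len)
        L = v.shape[-1]
        v = F.pad(v, (L - 1, 0))               # left-pad so the conv is causal
        # Depthwise convolution along the sequence dimension mixes tokens.
        y = F.conv1d(v, self.filter.unsqueeze(1), groups=v.shape[1])
        y = y.transpose(1, 2)                  # back to (batch, seq_len, d_model)
        # Elementwise gating stands in for attention's data-dependent mixing.
        return self.out_proj(torch.sigmoid(g) * y)
```

Note that the gating here multiplies two projections of the *same* sequence elementwise, which is part of why extending this to cross attention (two different sequences) is still open.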
If I use M2, can I drop positional encodings? M2 looks a bit similar to a convolution, which would already let the model know positional information.
The sequence mixer in the paper doesn't seem able to mix sequences of unequal lengths the way cross attention does, because it uses elementwise multiplication. Is this a misunderstanding on my part, or is Monarch Mixer not a replacement for cross attention?
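To make the shape constraint in the question concrete, here is a toy comparison (assumed shapes, not code from this repo) of why an elementwise gated mixer needs both operands to share a sequence length, while cross attention does not.

```python
import torch

B, L_q, L_kv, D = 2, 16, 48, 64
q = torch.randn(B, L_q, D)    # "query" sequence
kv = torch.randn(B, L_kv, D)  # "context" sequence of a different length

# Cross attention: the (L_q, L_kv) score matrix lets unequal lengths interact.
scores = torch.softmax(q @ kv.transpose(1, 2) / D ** 0.5, dim=-1)  # (B, L_q, L_kv)
cross_out = scores @ kv                                            # (B, L_q, D)

# Elementwise gating: both tensors must share the same sequence length, so
# q * kv would fail here unless kv were first pooled or resampled to length L_q.
# gated = q * kv  # RuntimeError: shapes (16) and (48) don't match at dim 1
```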