multihead attention #2375
Conversation
Force-pushed from c818c76 to e51a42f.
Sorry for the slow response. I added a couple of comments.
Force-pushed from d692767 to 6ff1d77.
I added a couple of comments.
I think it's perhaps better to pass the three matrices (query, key, and value), concatenated into one matrix, each time. It would simplify the interface.
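For illustration only, a minimal sketch of how a Forward() call could split one concatenated input back into query, key, and value; the row-wise stacking, names, and dimensions here are assumptions, not the PR's actual interface:

```c++
#include <armadillo>

// Hypothetical helper: the query, key, and value are assumed to be stacked
// row-wise into a single input matrix; the configured row counts tell us
// where to cut it apart again.
void SplitQKV(const arma::mat& input,
              const size_t queryRows,
              const size_t keyRows,
              arma::mat& query,
              arma::mat& key,
              arma::mat& value)
{
  query = input.rows(0, queryRows - 1);
  key   = input.rows(queryRows, queryRows + keyRows - 1);
  value = input.rows(queryRows + keyRows, input.n_rows - 1);
}

int main()
{
  // 4 query rows, 6 key rows, 6 value rows, batch of 3 columns.
  arma::mat input(16, 3, arma::fill::randn);
  arma::mat q, k, v;
  SplitQKV(input, 4, 6, q, k, v);
  return 0;
}
```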
Some comments for the Forward() implementation.
Wow! Tests are failing. Locally they were passing. 😕
Some comments.
I added some minor comments. I'm still looking through the implementation.
I think these memory errors aren't your fault. I verified your tests with
Ohh right. I had to scroll up again and again to see that comment, so I missed the part that
Could you apply this patch, please? restore-ordering.patch.zip It restores the natural ordering.
Wow!!! Thank you so much for working this out.
@lozhnikov I was trying to implement it using the Concat and other layers as you suggested, but I'm running into problems extending it to further decoder blocks.
If I try to extend it to the other decoder blocks, how should I do it?
The input sources are totally different in this case. And if I club those inputs together and then use Subview again to split them, it will run the whole encoder stack again just to concatenate the last encoder output with the output of the first decoder, and the same for the second, third, and so on.
I'm also clueless about how the residual connections would be employed, since the residual connections are made between the query and the output of the attention block. For the encoder side it won't be a problem, but how can I do it for the decoder-side residual connections when we use concatenated
It's hard to explain. Let me write a prototype for
It's easy to avoid. You just need to broadcast the encoder output
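The comment is cut off here; as one possible reading, here is a minimal Armadillo sketch of reusing a single cached encoder output for every decoder block instead of re-running the encoder stack. All names and shapes below are illustrative assumptions, not the thread's actual code:

```c++
#include <armadillo>

int main()
{
  // Illustrative shapes only: embedDim = 4, srcSeqLen = 6, tgtSeqLen = 5,
  // batch of 2 sequences (mlpack stores one sample per column).
  const arma::uword embedDim = 4, srcSeqLen = 6, tgtSeqLen = 5, batchSize = 2;

  // Pretend this is the cached output of a single encoder forward pass.
  arma::mat encoderOutput(embedDim * srcSeqLen, batchSize, arma::fill::randn);

  // Pretend this is the output of the decoder's self-attention sub-block.
  arma::mat decoderHidden(embedDim * tgtSeqLen, batchSize, arma::fill::randn);

  // Reusing the cached encoder output: stack it under the decoder input so
  // the next attention block sees both, without re-running the encoder.
  arma::mat attentionInput = arma::join_cols(decoderHidden, encoderOutput);

  attentionInput.print("concatenated decoder-attention input:");
  return 0;
}
```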
I've implemented a rough model of the encoder block. It should answer your question about the residual connections. I'll continue thinking of the decoder block in the morning.

```c++
Sequential<>* CreateEncoder() {
  Sequential<>* encoder = new Sequential<>;

  // Self-attention sub-block with a residual (identity) branch.
  {
    Concat<>* selfAttentionInput = new Concat<>();
    selfAttentionInput->Add<IdentityLayer<>>();
    selfAttentionInput->Add<IdentityLayer<>>();
    selfAttentionInput->Add<IdentityLayer<>>();

    Sequential<>* selfAttention = new Sequential<>();
    selfAttention->Add(selfAttentionInput);
    selfAttention->Add<MultiheadAttention<>>();

    AddMerge<>* residualAddMerge = new AddMerge<>();
    residualAddMerge->Add(selfAttention);
    residualAddMerge->Add<IdentityLayer<>>();
    encoder->Add(residualAddMerge);
  }
  encoder->Add<LayerNorm<>>();

  // Point-wise feed-forward sub-block with a residual (identity) branch.
  {
    Sequential<>* pointWiseFeedForwardNetwork = new Sequential<>();
    // pointWiseFeedForwardNetwork->Add(......);
    // pointWiseFeedForwardNetwork->Add(......);

    AddMerge<>* residualAddMerge = new AddMerge<>();
    residualAddMerge->Add(pointWiseFeedForwardNetwork);
    residualAddMerge->Add<IdentityLayer<>>();
    encoder->Add(residualAddMerge);
  }
  encoder->Add<LayerNorm<>>();

  return encoder;
}
```

Upd: I meant the encoder. Now I'm thinking about the decoder.
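Not part of the draft above, just a small usage sketch under the assumption that CreateEncoder() and the same mlpack ANN layer types are in scope; it shows how several such encoder blocks could be stacked:

```c++
// Hypothetical helper: stack numBlocks copies of the encoder block built by
// CreateEncoder() into one Sequential<> model, mirroring the N-layer encoder
// stack of the Transformer architecture.
Sequential<>* BuildEncoderStack(const size_t numBlocks)
{
  Sequential<>* stack = new Sequential<>();
  for (size_t i = 0; i < numBlocks; ++i)
    stack->Add(CreateEncoder());
  return stack;
}
```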
@mrityunjay-tripathi Finally I implemented the whole draft of the transformer model. Despite the fact that it's a draft, I think it's quite accurate. It supports an arbitrary number of decoders and encoders. You just need to put the correct arguments into the layer constructors, especially the correct arguments to the https://gist.github.com/lozhnikov/aabb9231c0bb72528ff64a4f9bc19923 Tell me if you need any help with this.
I think the memory failure has something to do with the skipped tests and not the failed tests. It shows some .xml file is invalid. Why??
I think these memory issues are unrelated to your PR. Looks like there are some memory issues in other ANN tests/methods.
Looks good to me. I added some minor style suggestions.
Co-authored-by: Mikhail Lozhnikov <[email protected]>
Second approval provided automatically after 24 hours. 👍
I merged the PR. The memory issues were unrelated to this PR; I checked the tests with valgrind locally. Thanks for the contribution!
Thanks, @lozhnikov! Finally this is done 😅. Thanks for all the reviews, suggestions, and help :)
Hi everyone,
I've worked on the implementation of multihead attention. The multihead attention layer will be required for the Transformer model. Debugging and refactoring of the code will come subsequently, but this is the initial structure I will be working on. The implementation is mostly motivated by PyTorch and TensorFlow.
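For reference, a minimal Armadillo sketch of the scaled dot-product attention that multihead attention is built from, i.e. softmax(Q Kᵀ / sqrt(d_k)) V. This is the standard textbook formulation, not the code in this PR; the matrix layout and names are illustrative:

```c++
#include <armadillo>
#include <cmath>

// Standard scaled dot-product attention: softmax(Q * K^T / sqrt(d_k)) * V.
// Q is (seqLenQ x d_k), K is (seqLenK x d_k), V is (seqLenK x d_v); this
// layout is illustrative, not the one used inside the mlpack layer.
arma::mat ScaledDotProductAttention(const arma::mat& Q,
                                    const arma::mat& K,
                                    const arma::mat& V)
{
  arma::mat scores = Q * K.t() / std::sqrt((double) K.n_cols);

  // Row-wise softmax over the keys, stabilized by subtracting the row max.
  scores.each_row([](arma::rowvec& r)
  {
    r -= r.max();
    r = arma::exp(r);
    r /= arma::accu(r);
  });

  return scores * V;
}

int main()
{
  arma::mat Q(5, 8, arma::fill::randn);   // 5 queries, d_k = 8
  arma::mat K(7, 8, arma::fill::randn);   // 7 keys,    d_k = 8
  arma::mat V(7, 8, arma::fill::randn);   // 7 values,  d_v = 8
  ScaledDotProductAttention(Q, K, V).print("attention output (5 x 8):");
  return 0;
}
```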