[WIP] Attention across documents. #213
base: main
Conversation
@@ -742,6 +742,11 @@ def parse_args(args):
        action="store_true",
        help="If set, allow model to do multiple data passes over our dataset, in order to reach the desired number of tokens.",
    )
    parser.add_argument(
        "--mask-across-documents",
I think this should be an int, not a bool, so that a user can specify their EOT token.
Makes sense - will update the parameter.
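For reference, a minimal sketch of the reviewer's suggestion, with the flag taking the EOT token id as an int (the type, default, help text, and example id below are assumptions, not the PR's final code):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--mask-across-documents",
    type=int,
    default=None,
    help="Token id of the EOT token; if set, attention is not allowed to cross document boundaries.",
)

# Hypothetical usage: the user passes their own tokenizer's EOT id.
args = parser.parse_args(["--mask-across-documents", "50279"])
if args.mask_across_documents is not None:
    eot_token_id = args.mask_across_documents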
if args.mask_across_documents:
    # Some input samples contain EOT as the final token. The prediction after that is meaningless, so it
    # should not contribute to the loss.
    ignore_indices = torch.nonzero(inputs == SpecialTokens.END_OF_TEXT.value, as_tuple=True)
I prefer not to hard-code our EOT, to keep open_lm tokenizer-agnostic.
Agreed - I'll change it so that it uses the user-defined EOT token.
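A possible shape for that change, sketched under assumptions (the helper name and the example EOT id are hypothetical; the real code would read the id from the user-defined flag):

import torch

def eot_positions(inputs: torch.Tensor, eot_token_id: int):
    # Same lookup as above, but with a user-supplied token id instead of the
    # hard-coded SpecialTokens.END_OF_TEXT.value.
    return torch.nonzero(inputs == eot_token_id, as_tuple=True)

# Toy batch; the EOT id is assumed to be 0 here.
inputs = torch.tensor([[5, 7, 0, 9], [3, 0, 4, 0]])
batch_idx, pos_idx = eot_positions(inputs, eot_token_id=0)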
* Update .gitignore
* Fix requirements for env.
* Remove test data prep file erroneously committed.
* Revert requirements.
* Update makefile.

Co-authored-by: George Smyrnis <[email protected]>
Force-pushed from 4c322d1 to 7234b31
    # Some input samples contain EOT as the final token. The prediction after that is meaningless, so it
    # should not contribute to the loss.
    ignore_indices = torch.nonzero(inputs == SpecialTokens.END_OF_TEXT.value, as_tuple=True)
    targets = targets.detach().clone()  # Clone this because it shares mem with input!
Interesting, is the detach necessary here? When args.mask_across_documents is False, should we also do a detach()?
Detach is not necessary, but clone is - because the targets and the input share the underlying tensor, explicitly setting the target also modifies the input.
When args.mask_across_documents is False, this is not an issue - neither the target nor the input is explicitly changed.
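To make the sharing concrete, here is a small self-contained example of why clone() is needed when the targets are a shifted view over the same storage as the inputs (this mirrors the usual next-token setup; it is not copied from open_lm):

import torch

tokens = torch.arange(10)
inputs, targets = tokens[:-1], tokens[1:]  # overlapping views of one storage

safe_targets = targets.clone()   # clone breaks the storage sharing
safe_targets[0] = -100           # e.g. a loss ignore index; inputs is untouched
assert inputs[1].item() == 1

targets[0] = -100                # without the clone, inputs[1] changes too
assert inputs[1].item() == -100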
This adds a flag that stops attention from going across documents, where documents are identified by the EOT token.
The loss for the token right after the EOT token is ignored.
TODO: add some tests for the shape of the mask.
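As a rough illustration of the masking described above (a hypothetical sketch, not the PR's implementation: the function name, the example EOT id, and the choice to keep each EOT token inside the document it terminates are assumptions), a causal mask that also blocks attention across documents could be built like this:

import torch

def document_causal_mask(tokens: torch.Tensor, eot_token_id: int) -> torch.Tensor:
    # Boolean (seq_len, seq_len) mask; True means attention is allowed.
    seq_len = tokens.shape[0]
    # Document id per position: increments after every EOT token, so each EOT
    # stays with the document it terminates.
    doc_ids = torch.cumsum(tokens == eot_token_id, dim=0)
    doc_ids = torch.cat([torch.zeros(1, dtype=doc_ids.dtype), doc_ids[:-1]])
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return causal & same_doc

# Two documents separated by an EOT id of 0 (assumed).
tokens = torch.tensor([5, 7, 0, 9, 3])
mask = document_causal_mask(tokens, eot_token_id=0)
assert mask.shape == (5, 5)  # the kind of shape check the TODO above mentions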