Non causal sliding window mask ? #73

Closed
Optimox opened this issue Nov 2, 2024 · 3 comments

Comments

@Optimox

Optimox commented Nov 2, 2024

Hi,

I am trying to port Gemma 2 to the torchtune library.

Following the code for sliding window mask generation, I first used a binary causal mask, but this seems to create a non-causal sliding window mask: 1s for previous tokens, 0s for future tokens inside the sliding window, and -inf for all other tokens.

I have shared a minimal reproducible example demonstrating the problem here.

Could someone clarify what format is expected for the causal or block causal masks in your implementation?

I would also be curious to know why you are using -2.3819763e38 instead of -torch.inf.

Thank you for helping me clarify the situation!

@Gopi-Uppari
Collaborator

Hi @Optimox,

In the Gemma implementation, causal masks ensure that each token in a sequence can attend only to itself and to preceding tokens.
The expected format for a causal mask is as follows: elements on or below the diagonal are set to 0 (attention allowed), while elements above the diagonal
are set to a large negative value, -2.3819763e38, which masks future tokens by giving them negligible attention weight.
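
To make the format above concrete, here is a minimal PyTorch sketch of such a causal mask, and of what happens when a sliding window is applied to it versus to a binary (1/0) mask. It assumes the mask is added to the attention scores before softmax, and that the sliding-window step keeps the incoming mask inside a symmetric band around the diagonal and writes the large negative value everywhere else. Both points are assumptions made for illustration, not details confirmed in this thread, and the helper names `additive_causal_mask` and `apply_sliding_window` are hypothetical.

```python
import torch

# Large finite negative value quoted in this thread.
MASK_VALUE = -2.3819763e38


def additive_causal_mask(seq_len: int) -> torch.Tensor:
    """Return 0.0 on and below the diagonal, MASK_VALUE strictly above it."""
    full = torch.full((seq_len, seq_len), MASK_VALUE, dtype=torch.float32)
    # triu with diagonal=1 keeps only the strictly upper triangle, zeros elsewhere.
    return torch.triu(full, diagonal=1)


def apply_sliding_window(mask: torch.Tensor, window: int) -> torch.Tensor:
    """Hypothetical helper (assumed behavior): keep `mask` inside the band
    |row - col| < window and write MASK_VALUE outside it."""
    ones = torch.ones_like(mask)
    band = torch.triu(ones, diagonal=-(window - 1)) * torch.tril(ones, diagonal=window - 1)
    return torch.where(band == 1, mask, torch.full_like(mask, MASK_VALUE))


# Additive input: only the current token and the previous one remain at 0.
additive = apply_sliding_window(additive_causal_mask(5), window=2)

# Binary input (1 = attend, 0 = masked): the next token keeps the value 0,
# so adding this mask to the scores does not suppress it.
binary = apply_sliding_window(torch.tril(torch.ones(5, 5)), window=2)

print(additive == 0)
print(binary)
```

With the additive input, every position outside the causal window holds the large negative value. With the binary input, the next token stays inside the band with value 0, which leaves its score unchanged once the mask is added. That is consistent with the non-causal pattern described in the question.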

-2.3819763e38 is used instead of -torch.inf for two reasons: numerical stability (a very large finite negative value can prevent floating-point issues during computation)
and compatibility (some hardware accelerators or libraries may not handle -inf gracefully, leading to undefined behavior or errors).
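
One concrete case where the two values behave differently is a row in which every position is masked, for example a padding row. The sketch below is an illustration of that single case, not a statement of the maintainers' exact reasoning; it assumes the mask is added to the scores before softmax, and `masked_softmax` is a hypothetical helper.

```python
import torch

# Large finite negative value quoted in this thread.
MASK_VALUE = -2.3819763e38


def masked_softmax(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: add the mask to the scores, then normalize."""
    return torch.softmax(scores + mask, dim=-1)


scores = torch.zeros(1, 4)

# Fully masked row: -inf leaves softmax with nothing to normalize,
# while the finite value degrades to a uniform distribution.
probs_inf = masked_softmax(scores, torch.full((1, 4), float("-inf")))
probs_finite = masked_softmax(scores, torch.full((1, 4), MASK_VALUE))

# Partially masked row: both choices give the same probabilities.
partial_inf = masked_softmax(scores, torch.tensor([[0.0, float("-inf"), float("-inf"), float("-inf")]]))
partial_finite = masked_softmax(scores, torch.tensor([[0.0, MASK_VALUE, MASK_VALUE, MASK_VALUE]]))

print(probs_inf, probs_finite)
print(partial_inf, partial_finite)
```

In the fully masked row, `-inf` produces NaN values that can propagate through later layers, whereas the finite value yields well-defined probabilities. When at least one position is unmasked, the two choices give the same result.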

For more detail, please see this reference.

Thank you.

@Gopi-Uppari
Collaborator

Hi @Optimox,

Could you please confirm whether the above comment resolves this issue for you? If it does, please feel free to close the issue.

Thank you.

@Optimox
Author

Optimox commented Nov 5, 2024

Yes thank you for your help!

Optimox closed this as completed Nov 5, 2024