support for gemma-2 #1813

Closed
almugabo opened this issue Oct 11, 2024 · 5 comments
Labels
community help wanted We would love the community's help completing this issue

Comments

@almugabo

Could you please add fine-tuning support for gemma-2? It has good multilingual capabilities and is a good candidate for fine-tuning for languages other than English.
Its different sizes also make it attractive for fine-tuning for different tasks.
I would gladly help but am not knowledgeable enough.
Thank you

@krammnic
Contributor

Actually, inspired by a current Kaggle competition, it would be a really good idea to add this pretty soon.

@ebsmothers
Contributor

Thanks @almugabo for creating the issue. I think this will take a bit of effort; quickly jotting down a couple of things I'm aware of that we'd need to support:

  • Logit softcapping in attention layer
  • Logit softcapping in output layer
  • Sliding window attention
  • Expose sliding window attention in a configurable set of layers
  • Post layernorm for both attention and FFN (a hacky option is to just use attn_scale and mlp_scale; see the sketch after this list)
  • Model builders for 2B, 9B, 27B sizes
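
For the post-layernorm point, here's a minimal sketch of a Gemma-2-style block layout in plain PyTorch. The class name and sub-module attributes are illustrative only, not existing torchtune APIs, and `nn.RMSNorm` assumes a recent PyTorch (otherwise swap in the repo's own RMSNorm):

```python
import torch
from torch import nn

class Gemma2StyleBlock(nn.Module):
    # Illustrative block showing Gemma 2's pre- and post-layernorm placement;
    # `attn` and `mlp` stand in for whatever attention/FFN modules the model uses.
    def __init__(self, dim: int, attn: nn.Module, mlp: nn.Module) -> None:
        super().__init__()
        self.attn = attn
        self.mlp = mlp
        self.pre_attn_norm = nn.RMSNorm(dim)
        self.post_attn_norm = nn.RMSNorm(dim)
        self.pre_mlp_norm = nn.RMSNorm(dim)
        self.post_mlp_norm = nn.RMSNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # norm -> attention -> norm, then residual add
        x = x + self.post_attn_norm(self.attn(self.pre_attn_norm(x)))
        # norm -> FFN -> norm, then residual add
        x = x + self.post_mlp_norm(self.mlp(self.pre_mlp_norm(x)))
        return x
```

The attn_scale / mlp_scale route mentioned in the list would instead pass the post-norms in as the scaling hooks of an existing transformer layer rather than defining a new block.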

For logit softcapping and sliding window attention, I suspect we can use FlexAttention APIs. See this blog post where they give explicit examples of each.
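
A rough sketch of how that could look with FlexAttention (a recent PyTorch is assumed; the softcap and window values below are Gemma 2's commonly reported defaults and should be checked against the released config):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

ATTN_SOFTCAP = 50.0  # Gemma 2's attention logit softcap (verify against the config)
WINDOW = 4096        # Gemma 2's sliding window size (verify against the config)

def softcap_score_mod(score, b, h, q_idx, kv_idx):
    # tanh soft-capping of the attention logits
    return ATTN_SOFTCAP * torch.tanh(score / ATTN_SOFTCAP)

def sliding_window_causal(b, h, q_idx, kv_idx):
    # causal attention restricted to the most recent WINDOW tokens
    return (q_idx >= kv_idx) & (q_idx - kv_idx < WINDOW)

B, H, S, D = 1, 8, 8192, 256
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16) for _ in range(3))

block_mask = create_block_mask(sliding_window_causal, B=None, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, score_mod=softcap_score_mod, block_mask=block_mask)
```

The output-layer softcap doesn't need FlexAttention at all: it's just `logits = cap * torch.tanh(logits / cap)` applied to the final projection.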

ebsmothers added the community help wanted label on Oct 11, 2024
@Optimox
Contributor

Optimox commented Oct 15, 2024

Hello, I have started adding gemma2 and will now open my PR in WIP mode. I haven't run any tests yet but will do so soon!

Edit: My PR is here: #1835

@ebsmothers it would be great if you could have a quick look to validate the choices I made to implement sliding windows, pre/post layer normalisation, and softcapping. I tried to keep all changes as minimal as possible, and I would be happy to do things differently if needed.

Optimox mentioned this issue on Oct 15, 2024
@Optimox
Contributor

Optimox commented Oct 15, 2024

I didn't know about FlexAttention, I will look into it!

joecummings mentioned this issue on Oct 15, 2024
@joecummings
Contributor

ADDED!
