support for gemma-2 #1813

Closed
almugabo opened this issue Oct 11, 2024 · 5 comments
Labels
community help wanted We would love the community's help completing this issue

Comments

@almugabo

Could you please add fine-tuning support for gemma-2? It has good multilingual capabilities and is a good candidate for fine-tuning for languages other than English.
Its different sizes also make it attractive for fine-tuning for different tasks.
I would gladly help but am not knowledgeable enough.
Thank you

@krammnic
Contributor

Actually, inspired by a current Kaggle competition, it would be a really good idea to add this pretty soon.

@ebsmothers
Contributor

Thanks @almugabo for creating the issue. I think this will take a bit of effort; quickly jotting down a couple of things I'm aware of that we'd need to support:

  • Logit softcapping in attention layer
  • Logit softcapping in output layer
  • Sliding window attention
  • Expose sliding window attention in a configurable set of layers
  • Post layernorm for both attention and FFN (a hacky option is to just use attn_scale and mlp_scale; see the sketch after this list)
  • Model builders for 2B, 9B, 27B sizes
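
For the post-layernorm point, here's a minimal sketch of a Gemma-2-style block layout in plain PyTorch. The class name and sub-module attributes are illustrative only, not existing torchtune APIs, and `nn.RMSNorm` assumes a recent PyTorch (otherwise swap in the repo's own RMSNorm):

```python
import torch
from torch import nn

class Gemma2StyleBlock(nn.Module):
    # Illustrative block showing Gemma 2's pre- and post-layernorm placement;
    # `attn` and `mlp` stand in for whatever attention/FFN modules the model uses.
    def __init__(self, dim: int, attn: nn.Module, mlp: nn.Module) -> None:
        super().__init__()
        self.attn = attn
        self.mlp = mlp
        self.pre_attn_norm = nn.RMSNorm(dim)
        self.post_attn_norm = nn.RMSNorm(dim)
        self.pre_mlp_norm = nn.RMSNorm(dim)
        self.post_mlp_norm = nn.RMSNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # norm -> attention -> norm, then residual add
        x = x + self.post_attn_norm(self.attn(self.pre_attn_norm(x)))
        # norm -> FFN -> norm, then residual add
        x = x + self.post_mlp_norm(self.mlp(self.pre_mlp_norm(x)))
        return x
```

The attn_scale / mlp_scale route mentioned in the list would instead pass the post-norms in as the scaling hooks of an existing transformer layer rather than defining a new block.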

For logit softcapping and sliding window attention, I suspect we can use FlexAttention APIs. See this blog post where they give explicit examples of each.
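
A rough sketch of how that could look with FlexAttention (a recent PyTorch is assumed; the softcap and window values below are Gemma 2's commonly reported defaults and should be checked against the released config):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

ATTN_SOFTCAP = 50.0  # Gemma 2's attention logit softcap (verify against the config)
WINDOW = 4096        # Gemma 2's sliding window size (verify against the config)

def softcap_score_mod(score, b, h, q_idx, kv_idx):
    # tanh soft-capping of the attention logits
    return ATTN_SOFTCAP * torch.tanh(score / ATTN_SOFTCAP)

def sliding_window_causal(b, h, q_idx, kv_idx):
    # causal attention restricted to the most recent WINDOW tokens
    return (q_idx >= kv_idx) & (q_idx - kv_idx < WINDOW)

B, H, S, D = 1, 8, 8192, 256
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16) for _ in range(3))

block_mask = create_block_mask(sliding_window_causal, B=None, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, score_mod=softcap_score_mod, block_mask=block_mask)
```

The output-layer softcap doesn't need FlexAttention at all: it's just `logits = cap * torch.tanh(logits / cap)` applied to the final projection.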

ebsmothers added the community help wanted label on Oct 11, 2024
@Optimox
Contributor

Optimox commented Oct 15, 2024

Hello, I have started adding gemma2 and will now open my PR in WIP mode. I haven't run any tests yet but will do so soon!

Edit: My PR is here: #1835

@ebsmothers it would be great if you could have a quick look to validate the choices I made to implement sliding windows, pre/post layer normalisation, and softcapping. I tried to keep all changes as minimal as possible, and I would be happy to do things differently if needed.

Optimox mentioned this issue on Oct 15, 2024
@Optimox
Contributor

Optimox commented Oct 15, 2024

I didn't know about FlexAttention, I will look into it!

joecummings mentioned this issue on Oct 15, 2024
@joecummings
Contributor

ADDED!
