support for gemma-2 #1813
Comments
Actually, inspired by a current Kaggle competition, I think it would be a really good idea to add this pretty soon.
Thanks @almugabo for creating the issue. I think this will be a bit of effort, quickly jotting down a couple of things I'm aware of that we'd need to support:
For logit softcapping and sliding window attention, I suspect we can use FlexAttention APIs. See this blog post where they give explicit examples of each.
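To make the two ideas above concrete, here is a minimal plain-Python sketch of the math involved. It is not torchtune's or the linked PR's implementation. The function names are hypothetical, and the argument order `(score, b, h, q_idx, kv_idx)` / `(b, h, q_idx, kv_idx)` is only modeled on the callback style FlexAttention uses for score and mask modifications. The cap of 50.0 and window of 4096 are assumed defaults for illustration, and whether the window boundary is inclusive is also an assumption.

```python
import math

# Assumed illustrative defaults; check the model config before relying on them.
ATTN_SOFTCAP = 50.0
SLIDING_WINDOW = 4096


def softcap_score(score, b, h, q_idx, kv_idx, cap=ATTN_SOFTCAP):
    """Squash a raw attention score smoothly into the open interval (-cap, cap)."""
    return cap * math.tanh(score / cap)


def sliding_window_causal(b, h, q_idx, kv_idx, window=SLIDING_WINDOW):
    """Return True if the query at q_idx may attend to the key at kv_idx."""
    causal = kv_idx <= q_idx              # never look at future tokens
    in_window = q_idx - kv_idx <= window  # only look back a limited distance
    return causal and in_window


if __name__ == "__main__":
    # Small scores pass through almost unchanged; large ones saturate near the cap.
    print(softcap_score(1.0, 0, 0, 0, 0))
    print(softcap_score(1000.0, 0, 0, 0, 0))
    # With a window of 4, position 10 can see position 6 but not position 5.
    print(sliding_window_causal(0, 0, 10, 6, window=4))
    print(sliding_window_causal(0, 0, 10, 5, window=4))
```

With real tensors the same expressions would use `torch.tanh` and `&` instead of `math.tanh` and `and`.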
Hello, I have started adding gemma2 and will now open my PR in WIP mode. I haven't run any tests yet but will do so soon! Edit: My PR is here: #1835. @ebsmothers, it would be great if you could have a quick look to validate the choices I made to implement sliding windows, pre-post layer normalisation and softcapping. I would be happy to do things differently to keep the changes minimal (I tried as much as possible to keep all changes minimal).
I didn't know about FlexAttention, I will look into it!
ADDED!
Could you please add fine-tuning support for gemma-2? It has good multilingual capabilities and is a good candidate for fine-tuning for languages other than English.
Its different sizes also make it attractive for fine-tuning for different tasks.
I would gladly help but am not knowledgeable enough.
Thank you