
OLMo 2 (WIP) #1897

Open · wants to merge 10 commits into main
Conversation

ysjprojects (Contributor) commented:

https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
https://arxiv.org/abs/2501.00656

Version 2 of OLMo released by Ai2.

Comes in 7B and 13B sizes, with Base and Instruct variants as well as additional SFT and DPO models.

First, we find that OLMo 2 7B and 13B are the best fully open models to date, often outperforming open-weight models of equivalent size. Not only do we observe a dramatic improvement in performance across all tasks compared to our earlier OLMo 0424 model, but, notably, OLMo 2 7B outperforms Llama-3.1 8B and OLMo 2 13B outperforms Qwen 2.5 7B despite lower total training FLOPs. The OLMo 2 models sit at the Pareto frontier of training FLOPs vs. model average performance.
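For reference, here's a minimal sketch of how the new checkpoints could be exercised through LitGPT's existing Python API once this PR is merged. The repo ID used below (allenai/OLMo-2-1124-7B-Instruct) is taken from the linked Hugging Face collection and is an assumption until the configs are finalized.

```python
# Minimal sketch (not part of this PR): loading an OLMo 2 checkpoint through
# LitGPT's Python API once the configs land. The repo ID below is assumed from
# the linked Hugging Face collection and may differ until the PR is finalized.
from litgpt import LLM

# Download/convert the checkpoint and load it for inference.
llm = LLM.load("allenai/OLMo-2-1124-7B-Instruct")

# Quick sanity check that the converted weights produce sensible text.
print(llm.generate("What is the capital of France?", max_new_tokens=50))
```

The same checkpoint should also be usable from the CLI (e.g. `litgpt chat <repo_id>`) once the config is registered; treat that as an assumption as well until the PR is done.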

ysjprojects changed the title from "OLMo 2" to "OLMo 2 (WIP)" on Jan 4, 2025.
rasbt (Collaborator) commented on Jan 8, 2025:

Hi there,
just wanted to say thanks for taking on this PR (I know this is a lot of work)! The OLMo models are awesome, and it'd be great to have OLMo 2 in LitGPT.

ysjprojects (Contributor, Author) commented:

> Hi there, just wanted to say thanks for taking on this PR (I know this is a lot of work)! The OLMo models are awesome, and it'd be great to have OLMo 2 in LitGPT.

Thanks mate!

Currently on vacation, will resume working on this PR once I'm back.
