add methods #18

Open
Andron00e opened this issue Oct 15, 2024 · 13 comments

Comments

@Andron00e
Collaborator

Andron00e commented Oct 15, 2024

  • SOAP
  • Muon (add schedules)
  • Shampoo (only DistributedShampoo)
  • Adam-mini
  • Lion
  • Sophia
  • AdEMAMix
  • Schedule-Free
  • Adafactor
  • Adalayer
  • Signum, signSGD
  • AdaHessian
  • Prodigy
  • SGDF
  • ADOPT
  • ...
@Andron00e Andron00e added the enhancement New feature or request label Oct 15, 2024
@Andron00e Andron00e self-assigned this Oct 15, 2024
@Andron00e
Collaborator Author

Andron00e commented Oct 17, 2024

Also add schedules | link

upd: this has been added via this commit

@Andron00e
Collaborator Author

Andron00e commented Oct 17, 2024

I had some problems installing the latest version of schedulefree, so I added it manually.
See: https://github.com/epfml/llm-baselines/blob/soap/src/optim/schedulefree.py

@martinjaggi
Member

is there a pull request for this? would be nice to collaborate

@Andron00e
Collaborator Author

> is there a pull request for this? would be nice to collaborate

hi, we are deploying it to the soap branch together with @mpagli

@Andron00e
Collaborator Author

some useful settings:

@Andron00e
Collaborator Author

Andron00e commented Oct 20, 2024

Adam-mini Note

I use model.named_parameters() for Adam-mini instead of group_specs, therefore in main.py it looks like:

  elif args.opt == "adam-mini":
      opt = Adam_mini(
          device=args.device,
          world_size=args.world_size,
          named_parameters=model.named_parameters(),  # check
          lr=args.lr,
          betas=(args.beta1, args.beta2),
          weight_decay=args.weight_decay,
          model_sharding=args.model_sharding,
          dim=args.n_embd,
          n_heads=args.n_head,
          n_kv_heads=args.n_kv_head,
          verbose=args.adam_mini_verbose,
      )

TODO: update partition names

@kylematoba

hi, I'll add sophia and adafactor.

@Andron00e
Collaborator Author

> hi, I'll add sophia and adafactor.

Hello! Great, just develop this in your branch and then open a PR to soap. I am a bit overloaded these days, but I wanted to try Sophia as well.

Note: the official repository does not include SophiaH (with Hutchinson's preconditioner), only SophiaG. We want to have both methods here. SophiaH is nicely implemented in optax for now, but it's not so hard to write in PyTorch, see: this link

Thx)
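
For the SophiaH note above, here is a minimal, dependency-free sketch of the Hutchinson idea: the diagonal of the Hessian is estimated as the average of `u * (H u)` over random vectors `u`. This is only an illustration, not the optax or official Sophia code. The helper names (`hessian_vector_product`, `hutchinson_diag`), the toy matrix `H`, and the choice of Rademacher (±1) probes are assumptions made for the example; in PyTorch the Hessian-vector product would normally come from autograd rather than from an explicit matrix.

```python
import random


def hessian_vector_product(hessian, v):
    """Multiply a dense Hessian (list of rows) by a vector (hypothetical helper)."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in hessian]


def hutchinson_diag(hvp, dim, n_samples=1000, seed=0):
    """Estimate diag(H) as the mean of u * (H u) over Rademacher vectors u."""
    rng = random.Random(seed)
    est = [0.0] * dim
    for _ in range(n_samples):
        # Each entry of u is +1 or -1 with equal probability.
        u = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hu = hvp(u)
        for i in range(dim):
            est[i] += u[i] * hu[i] / n_samples
    return est


# Toy quadratic f(x) = 0.5 * x^T H x, whose Hessian is H itself (assumed example).
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 2.0]]

diag_est = hutchinson_diag(lambda v: hessian_vector_product(H, v), dim=3)
print("estimated diagonal:", diag_est)
print("true diagonal:     ", [H[i][i] for i in range(3)])
```

The estimate is unbiased because the off-diagonal terms `H[i][j] * u[i] * u[j]` average to zero; its noise shrinks as more probe vectors are averaged.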

@kylematoba

hi, Bristen is back early, so I'll get back to that.

I did some research on Sophia, though, main findings:

Adafactor is simple, it's already close to being released officially, see pytorch/pytorch#129905.

When I get some time next I'll return to this if you haven't.

@martinjaggi
Member

muon optimizer should also be a good one to add. i think @doikov might be interested in that one too:
https://x.com/Yuchenj_UW/status/1846964136204173318

@martinjaggi
Member

once we have a handful, we'll have a nice benchmark collection for LLM optimizers, probably worth a small writeup soon

@Andron00e
Collaborator Author

Andron00e commented Oct 29, 2024

> muon optimizer should also be a good one to add. i think @doikov might be interested in that one too: https://x.com/Yuchenj_UW/status/1846964136204173318

yes, i am working on that.
I already have some test runs of Muon, but, again, it is hard to draw conclusions when the batch size is less than 0.5M tokens.

btw, an interesting exercise would be to try this new muon/soap/whatever on the banana function :)
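
A rough sketch of what such a banana-function (Rosenbrock) test bed could look like. The function and its gradient are standard; everything else is an assumption for illustration: the `run` harness, the `step_fn(x, y, grad)` interface, the starting point, and the plain gradient-descent step, which only stands in for whichever optimizer is being tried.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """Banana function; global minimum 0 at (a, a**2)."""
    return (a - x) ** 2 + b * (y - x * x) ** 2


def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Analytic gradient of the function above."""
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dy = 2.0 * b * (y - x * x)
    return dx, dy


def gd_step(x, y, grad, lr=1e-4):
    """Plain gradient descent; a placeholder for the optimizer under test."""
    gx, gy = grad
    return x - lr * gx, y - lr * gy


def run(step_fn, start=(-1.5, 2.0), n_steps=2000):
    """Apply step_fn repeatedly and record the loss at every iterate."""
    x, y = start
    losses = [rosenbrock(x, y)]
    for _ in range(n_steps):
        x, y = step_fn(x, y, rosenbrock_grad(x, y))
        losses.append(rosenbrock(x, y))
    return (x, y), losses


final_point, losses = run(gd_step)
print(f"start loss {losses[0]:.3f}, final loss {losses[-1]:.3f} at {final_point}")
```

Swapping `gd_step` for another update rule with the same signature lets the loss curves be compared directly on the same starting point.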

@Andron00e
Collaborator Author

Andron00e commented Oct 29, 2024

> hi, Bristen is back early, so I'll get back to that.
>
> I did some research on Sophia, though, main findings:
>
> Adafactor is simple, it's already close to being released officially, see pytorch/pytorch#129905.
>
> When I get some time next I'll return to this if you haven't.

I mean, for the official version of SophiaG, you can just look at the paper's repo: https://github.com/Liuhong99/Sophia
