add methods #18

Andron00e · 2024-10-15T20:53:40Z

~~SOAP~~
Muon (add schedules)
Shampoo (only DistributedShampoo)
~~Adam-mini~~
~~Lion~~
~~Sophia~~
~~AdEMAMix~~
~~Schedule-Free~~
Adafactor
Adalayer
~~Signum, signSGD~~
AdaHessian
~~Prodigy~~
~~SGDF~~
ADOPT
...

Andron00e · 2024-10-17T07:02:16Z

~~add schedules also | link~~

upd: has been added via this commit

Andron00e · 2024-10-17T12:53:38Z

there were some problems installing the latest version of schedulefree, so I added it manually
see: https://github.com/epfml/llm-baselines/blob/soap/src/optim/schedulefree.py
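For context, Schedule-Free SGD (Defazio et al., 2024) replaces the learning-rate schedule with iterate averaging: gradients are taken at an interpolation y between the SGD sequence z and its running average x, and x is what gets evaluated. The sketch below is a minimal NumPy rendering of those update equations on a toy quadratic. It is not the schedulefree package and not the file linked above. The function name, the objective, and the hyperparameters are illustrative assumptions, and warmup and weight decay are omitted.

```python
import numpy as np

def schedule_free_sgd(grad_fn, w0, lr=0.5, beta=0.9, steps=2000):
    """Schedule-Free SGD sketch (no warmup, no weight decay).

    z: the base SGD sequence, x: the running average that is returned,
    y: the interpolation point where gradients are evaluated.
    """
    z = np.array(w0, dtype=float)
    x = z.copy()
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x        # gradient is taken at y
        z = z - lr * grad_fn(y)              # plain SGD step on z
        c = 1.0 / (t + 1)                    # uniform averaging weight
        x = (1 - c) * x + c * z              # x is the average of the z iterates
    return x

# Toy quadratic f(w) = 0.5 * ||w - target||^2, so grad f(w) = w - target.
target = np.array([1.0, -2.0])
w_final = schedule_free_sgd(lambda w: w - target, w0=[0.0, 0.0])
print(w_final)  # should be close to target
```

No learning-rate schedule appears anywhere in the loop. The averaging weight 1/(t+1) plays that role.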

martinjaggi · 2024-10-17T14:05:53Z

is there a pull request for this? would be nice to collaborate

Andron00e · 2024-10-17T14:26:16Z

> is there a pull request for this? would be nice to collaborate

hi, we are deploying it to the soap branch together with @mpagli

Andron00e · 2024-10-17T18:14:59Z

useful settings:

anything but sgd
...

Andron00e · 2024-10-20T20:40:39Z

Adam-mini Note

I use model.named_parameters() for Adam-mini instead of group_specs, therefore in main.py it looks like:

  elif args.opt == "adam-mini":
      opt = Adam_mini(
          device=args.device,
          world_size=args.world_size,
          named_parameters=model.named_parameters(),  # check
          lr=args.lr,
          betas=(args.beta1, args.beta2),
          weight_decay=args.weight_decay,
          model_sharding=args.model_sharding,
          dim=args.n_embd,
          n_heads=args.n_head,
          n_kv_heads=args.n_kv_head,
          verbose=args.adam_mini_verbose,
      )

TODO: update partition names
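Why the parameter names matter: Adam-mini keeps a single second-moment value per parameter block (for example, one per attention head) instead of one per entry. The partition has to be derived from the parameters themselves, which is why named_parameters() is passed in. The NumPy sketch below illustrates that idea only. It is not the Adam_mini class called in main.py. The function name, the `n_blocks` rule (equal contiguous blocks along the first axis, e.g. n_heads for a query projection), and the hyperparameters are hypothetical.

```python
import numpy as np

def adam_mini_step(params, grads, state, n_blocks=None, lr=1e-2,
                   betas=(0.9, 0.999), eps=1e-8):
    """One Adam-mini-style step (sketch); updates params in place.

    params, grads: dict of name -> float ndarray.
    n_blocks: dict of name -> number of equal blocks along the first axis
      (hypothetical partition rule, e.g. n_heads); default 1 block per tensor.
    """
    n_blocks = n_blocks or {}
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    for name, p in params.items():
        g = grads[name]
        k = n_blocks.get(name, 1)
        # First moment: per entry, exactly as in Adam.
        m = beta1 * state.get(("m", name), np.zeros_like(p)) + (1 - beta1) * g
        # Second moment: one scalar per block, the mean of g^2 over the block.
        block_sq = (g * g).reshape(k, -1).mean(axis=1)
        v = beta2 * state.get(("v", name), np.zeros(k)) + (1 - beta2) * block_sq
        state[("m", name)], state[("v", name)] = m, v
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Broadcast each block's scalar over all entries of that block.
        update = m_hat.reshape(k, -1) / (np.sqrt(v_hat)[:, None] + eps)
        p -= lr * update.reshape(p.shape)

# Usage on a toy loss 0.5 * sum(p**2), whose gradient is p itself.
params = {"wq": np.ones((4, 3)), "bias": np.ones(3)}
state = {}
adam_mini_step(params, {k: v.copy() for k, v in params.items()}, state,
               n_blocks={"wq": 2})
print(params["wq"][0, 0], state[("v", "wq")].shape)
```

The memory saving comes from `v`. It has `k` entries per tensor rather than one per parameter.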

kylematoba · 2024-10-28T07:35:57Z

hi, I'll add sophia and adafactor.

Andron00e · 2024-10-28T08:33:26Z

> hi, I'll add sophia and adafactor.

Hello! Great, just develop this in your branch and then open a PR to soap. I am a bit overloaded these days, but I also wanted to try Sophia.

Note: the official repository does not include SophiaH (with Hutchinson's preconditioner), only SophiaG. We want to have both methods here. SophiaH is currently implemented nicely in optax, but it's not so hard to write in PyTorch, see: this link

Thx)
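To make the SophiaH side concrete: Hutchinson's estimator approximates diag(H) by u * (H u) for random probe vectors u. Sophia then takes the clipped step theta <- theta - lr * clip(m / max(gamma * h, eps), 1), where m is an EMA of gradients and h is an EMA of the diagonal estimate. The NumPy sketch below shows both pieces on a quadratic whose Hessian-vector product is known in closed form, so no autodiff is needed. It is not the optax or official implementation, and it omits weight decay. The Rademacher probes, the refresh interval `k`, and all hyperparameter values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hutchinson_diag(hvp, dim, rng, n_samples=1):
    """Estimate diag(H) as the average of u * (H u) over Rademacher vectors u."""
    est = np.zeros(dim)
    for _ in range(n_samples):
        u = rng.choice([-1.0, 1.0], size=dim)
        est += u * hvp(u)
    return est / n_samples

def sophia_h(grad_fn, hvp, theta0, steps=300, lr=0.05, beta1=0.9,
             beta2=0.99, gamma=0.1, eps=1e-12, k=10, seed=0):
    """SophiaH-style loop (sketch, no weight decay).

    m: EMA of gradients; h: EMA of the Hutchinson diagonal-Hessian estimate,
    refreshed every k steps; the preconditioned step is clipped to [-1, 1].
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)
    h = np.zeros_like(theta)
    for t in range(steps):
        m = beta1 * m + (1 - beta1) * grad_fn(theta)
        if t % k == 0:
            h = beta2 * h + (1 - beta2) * hutchinson_diag(hvp, theta.size, rng)
        ratio = m / np.maximum(gamma * h, eps)   # eps guards h <= 0
        theta = theta - lr * np.clip(ratio, -1.0, 1.0)
    return theta

# Toy quadratic 0.5 * theta^T diag(A) theta: gradient A * theta, HVP A * v.
A = np.array([1.0, 10.0])
loss = lambda th: 0.5 * np.sum(A * th ** 2)
theta0 = np.array([3.0, -2.0])
theta = sophia_h(lambda th: A * th, lambda v: A * v, theta0)
print(loss(theta0), loss(theta))
```

In PyTorch, the `hvp` callable would come from a double-backward Hessian-vector product. A SophiaG variant would swap `hutchinson_diag` for a Gauss-Newton-style estimator.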

kylematoba · 2024-10-29T13:14:06Z

hi, Bristen is back early, so I'll get back to that.

I did some research on Sophia, though, main findings:

- The official implementation of SophiaG makes some weird choices, described here https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1244/final-projects/CaiaMaiCostelloJasonDanielLazar.pdf.
- The levanter implementation does not have SophiaG, only SophiaH.
- There's a quite readable Julia implementation of SophiaH here https://github.com/SciML/Optimization.jl/blob/master/src/sophia.jl.

Adafactor is simple, it's already close to being released officially, see pytorch/pytorch#129905.

When I get some time next I'll return to this if you haven't.
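On Adafactor: the main memory saving is that, for a matrix parameter, it keeps only EMAs of the row sums and column sums of the squared gradient. It rebuilds a rank-1 second-moment estimate from them. The NumPy sketch below shows that factored step with the 1 - t^(-0.8) decay and RMS update clipping (threshold d) from the Adafactor paper. It is not the PyTorch API from the linked PR. It leaves out relative step sizes and first-moment momentum, and the function name and defaults are illustrative.

```python
import numpy as np

def adafactor_step(W, G, state, lr=1e-2, eps1=1e-30, clip_d=1.0, decay=-0.8):
    """One factored Adafactor-style step for a 2-D parameter W (in place).

    Returns the rank-1 second-moment estimate V_hat for inspection.
    """
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    beta2 = 1.0 - t ** decay                 # 0 at t=1, tends to 1 later
    sq = G * G + eps1
    # Only O(n + m) state: EMAs of the row sums and column sums of G^2.
    R = beta2 * state.get("R", np.zeros(W.shape[0])) + (1 - beta2) * sq.sum(axis=1)
    C = beta2 * state.get("C", np.zeros(W.shape[1])) + (1 - beta2) * sq.sum(axis=0)
    state["R"], state["C"] = R, C
    V_hat = np.outer(R, C) / R.sum()         # rank-1 reconstruction
    U = G / np.sqrt(V_hat)
    rms = np.sqrt(np.mean(U * U))
    U = U / max(1.0, rms / clip_d)           # update clipping
    W -= lr * U
    return V_hat

# For a rank-1 gradient the factored estimate is exact on the first step.
G = np.outer([1.0, 2.0], [3.0, 1.0, 0.5])
W = np.zeros_like(G)
V_hat = adafactor_step(W, G, state={})
print(V_hat)
```

For an n x m matrix this stores n + m numbers instead of n * m. Exactness holds only when G^2 is itself rank-1, as in the example.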

martinjaggi · 2024-10-29T13:17:53Z

muon optimizer should also be a good one to add. i think @doikov might be interested in that one too:
https://x.com/Yuchenj_UW/status/1846964136204173318
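For reference, Muon's distinctive step is to orthogonalize the momentum-averaged gradient of each 2-D weight with a few Newton-Schulz iterations before applying it. The NumPy sketch below follows the quintic iteration with coefficients (3.4445, -4.7750, 2.0315) from the public Muon reference code. The coefficients are reproduced from memory, so verify them against the source. The wrapper `muon_step`, its hyperparameters, and the plain (non-Nesterov) momentum are simplifying assumptions, not the actual optimizer.

```python
import numpy as np

def newton_schulz5(G, steps=5, eps=1e-7):
    """Approximately map G = U S V^T to U V^T with a quintic Newton-Schulz
    iteration. Singular values end up near 1, not exactly 1."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)        # Frobenius norm => spectral norm <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                           # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    return X.T if transposed else X

def muon_step(W, G, buf, lr=0.02, momentum=0.95):
    """Hypothetical minimal Muon-style step: momentum, then orthogonalize."""
    buf *= momentum
    buf += G
    W -= lr * newton_schulz5(buf)

# Example: a tall 4x3 matrix with singular values 3, 2, 1.
G = np.zeros((4, 3))
G[0, 0], G[1, 1], G[2, 2] = 3.0, 2.0, 1.0
O = newton_schulz5(G)
print(np.linalg.svd(O, compute_uv=False))
```

Vectors such as biases and norm gains are usually handled by a different optimizer (e.g. AdamW), since the orthogonalization only makes sense for matrices.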

martinjaggi · 2024-10-29T13:18:45Z

once we have a handful, we'll have a nice benchmark collection for LLM optimizers, probably worth a small writeup soon

Andron00e · 2024-10-29T14:03:07Z

> muon optimizer should also be a good one to add. i think @doikov might be interested in that one too: https://x.com/Yuchenj_UW/status/1846964136204173318

yes, I am working on that.
I already have some test runs of Muon, but again, it is hard to draw conclusions when the batch size is less than 0.5M tokens.

btw, an interesting exercise: try this new Muon/SOAP/whatever on the banana (Rosenbrock) function :)
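A minimal harness for the banana-function exercise, assuming the standard 2-D Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2 with its minimum at (1, 1). A hand-written Adam loop serves as the baseline. The update loop is the part to swap out. Muon and SOAP precondition matrix-shaped parameters, so a 2-D point is only a rough proxy for them. The starting point and hyperparameters are arbitrary choices.

```python
import numpy as np

def rosenbrock(p):
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rosenbrock_grad(p):
    x, y = p
    return np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                     200 * (y - x ** 2)])

def run_adam(p0, steps=3000, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """Baseline optimizer; replace this loop to compare other methods."""
    p = np.array(p0, dtype=float)
    m = np.zeros_like(p)
    v = np.zeros_like(p)
    for t in range(1, steps + 1):
        g = rosenbrock_grad(p)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        p -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return p

start = np.array([-1.5, 2.0])
p_final = run_adam(start)
print(p_final, rosenbrock(p_final))
```

Tracking the trajectory (not just the endpoint) is what makes this exercise informative, since the narrow curved valley exposes how each method handles ill-conditioning.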

Andron00e · 2024-10-29T14:05:30Z

> hi, Bristen is back early, so I'll get back to that.
>
> I did some research on Sophia, though, main findings:
>
> - The official implementation of SophiaG makes some weird choices, described here https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1244/final-projects/CaiaMaiCostelloJasonDanielLazar.pdf.
> - The levanter implementation does not have SophiaG, only SophiaH.
> - There's a quite readable Julia implementation of SophiaH here https://github.com/SciML/Optimization.jl/blob/master/src/sophia.jl.
>
> Adafactor is simple, it's already close to being released officially, see pytorch/pytorch#129905.
>
> When I get some time next I'll return to this if you haven't.

I mean, for the official version of SophiaG, you can just look at the paper's repo: https://github.com/Liuhong99/Sophia

Andron00e added the enhancement (New feature or request) label on Oct 15, 2024.

Andron00e self-assigned this on Oct 15, 2024.

Andron00e mentioned this issue on Oct 17, 2024 in "A bunch of new optimizers and schedules" #21 (open).

Andron00e mentioned this issue on Nov 2, 2024 in "add methods" Andron00e/learning-at-scale#2 (open).
