Opening this on behalf of @winglian
An optimizer that many folks have been interested in is Shampoo (https://arxiv.org/abs/1802.09568): its fans say it converges faster because it preconditions updates with second-order statistics of the gradients, while still keeping memory requirements in check.
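For context, here is a minimal sketch of the core Shampoo update for a single 2-D parameter. It only covers the basic form from the paper (left/right preconditioners and inverse 4th roots); a production implementation adds epsilon damping, preconditioner update intervals, grafting, etc., and the helper names below are purely illustrative.

```python
import torch

def matrix_inverse_power(mat, power, eps=1e-6):
    # Inverse matrix power via eigendecomposition; fine for a sketch,
    # real Shampoo implementations use cheaper/more stable schemes.
    vals, vecs = torch.linalg.eigh(mat)
    return vecs @ torch.diag(vals.clamp(min=eps).pow(-power)) @ vecs.T

@torch.no_grad()
def shampoo_step_2d(param, grad, state, lr=1e-2):
    # Accumulate the left/right preconditioner statistics L and R.
    if "L" not in state:
        state["L"] = torch.zeros(grad.shape[0], grad.shape[0], device=grad.device)
        state["R"] = torch.zeros(grad.shape[1], grad.shape[1], device=grad.device)
    state["L"] += grad @ grad.T
    state["R"] += grad.T @ grad
    # Precondition the gradient: L^(-1/4) @ G @ R^(-1/4), then take the step.
    update = matrix_inverse_power(state["L"], 0.25) @ grad @ matrix_inverse_power(state["R"], 0.25)
    param.add_(update, alpha=-lr)
```

Note that the state is one m×m and one n×n matrix per m×n weight, which is where the memory cost lives and why quantizing it is attractive.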
To keep the memory requirements in check even further we can quantize the optimizer state! There are existing papers that give good recipes for how this could work for int4, e.g. https://arxiv.org/abs/2405.18144
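The usual recipe is block-wise quantization of the state tensors: flatten each tensor, split it into small blocks, keep one scale per block, and round to the int4 grid. A rough sketch, using an int8 tensor as the container (a real implementation packs two int4 values per byte); the function names and block size here are illustrative, not torchao APIs:

```python
import torch
import torch.nn.functional as F

def quantize_int4_blockwise(x, block_size=128):
    flat = x.reshape(-1)
    pad = (-flat.numel()) % block_size  # pad the tail so it splits into full blocks
    blocks = F.pad(flat, (0, pad)).reshape(-1, block_size)
    scale = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 7  # signed int4 range is [-8, 7]
    q = (blocks / scale).round().clamp(-8, 7).to(torch.int8)
    return q, scale

def dequantize_int4_blockwise(q, scale, shape):
    numel = int(torch.Size(shape).numel())
    return (q.float() * scale).reshape(-1)[:numel].reshape(shape)
```

The linked paper and the torchao prototype cover the refinements this sketch leaves out (rounding schemes, which state tensors are worth quantizing, etc.).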
As far as implementation goes, we have several reference examples of int8, int4, and fp8 Adam and AdamW at https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim, and there is an in-progress contribution in #1231.
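For a sense of how the existing ones are used, the prototype optimizers are meant as drop-in replacements for their torch.optim counterparts, roughly like this (the class name is taken from the prototype folder above; double-check it against the torchao version you have installed, since the prototype API may change):

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # int4 and fp8 variants live in the same module

model = torch.nn.Linear(1024, 1024).cuda()
optim = AdamW8bit(model.parameters(), lr=1e-3)

loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optim.step()
optim.zero_grad()
```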
Ideally the work above can be turned into a guide on how to implement a new low-bit optimizer, so that someone who already understands the optimizer they want to implement can follow it and have a working version in about a day's worth of work.
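To give a feel for what such a guide would cover, the recurring pattern in the existing implementations is: keep the state quantized between steps, dequantize inside `step()`, do the optimizer math in full precision, then requantize. A very rough skeleton that reuses `matrix_inverse_power` and the block-wise int4 helpers sketched above (this is not the torchao implementation, just the shape of one):

```python
import torch

class Int4ShampooSketch(torch.optim.Optimizer):
    """Illustrative skeleton only; reuses the helper functions sketched above."""

    def __init__(self, params, lr=1e-2):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None or p.ndim != 2:  # this sketch only handles 2-D params
                    continue
                g, state = p.grad, self.state[p]
                if not state:
                    # Preconditioner statistics live in int4 containers between steps.
                    state["L"] = quantize_int4_blockwise(torch.zeros(g.shape[0], g.shape[0], device=g.device))
                    state["R"] = quantize_int4_blockwise(torch.zeros(g.shape[1], g.shape[1], device=g.device))
                # 1) Dequantize the state, 2) do the Shampoo math in full precision,
                # 3) requantize before the next step.
                L = dequantize_int4_blockwise(*state["L"], (g.shape[0], g.shape[0])) + g @ g.T
                R = dequantize_int4_blockwise(*state["R"], (g.shape[1], g.shape[1])) + g.T @ g
                update = matrix_inverse_power(L, 0.25) @ g @ matrix_inverse_power(R, 0.25)
                p.add_(update, alpha=-group["lr"])
                state["L"], state["R"] = quantize_int4_blockwise(L), quantize_int4_blockwise(R)
```

A real implementation would add int4 packing, fused dequant/update kernels, and care around quantization error in the accumulated statistics; the prototype folder linked above shows how the existing Adam/AdamW variants wire a similar pattern into their quantized state tensors.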
cc @gau-nernst @andrewor14 @vkuzo @janeyx99 @supriyar