[in progress] fp16 memory optimizations #96

Draft · wants to merge 1 commit into master

Conversation

@nwatx commented Apr 17, 2024

  • still need to bench performance accurately (will add a bench suite soon)

  • working torch.half() / floating point

  • model memory optimization

  • kv cache memory optimization

  • clean up code
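
Not the actual diff, just a minimal sketch of the kind of change the first items describe, using a placeholder `nn.TransformerDecoderLayer` in place of the repo's modules (all names and shapes here are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the decoder; the repo's actual modules differ.
device = "cuda"
layer = nn.TransformerDecoderLayer(d_model=512, nhead=8, batch_first=True).to(device)

# Cast the weights to fp16 (roughly halves weight memory).
layer = layer.half()

# Float activations are cast to match the fp16 weights; integer token ids
# should stay int64 and never be .half()'d.
tgt = torch.randn(1, 16, 512, device=device).half()
mem = torch.randn(1, 32, 512, device=device).half()
out = layer(tgt, mem)   # fp16 in, fp16 out
print(out.dtype)        # torch.float16
```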

@rishikksh20 commented

@nwatx Hi, does fp16 give good output and a speedup compared to the default precision?

@nwatx (Author) commented Apr 19, 2024

I haven't measured the speedup, but from observation it seems to reduce memory consumption.
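
For reference, a minimal way to put a number on that (assumes a CUDA device; `run_generation` is a hypothetical stand-in for the actual generation call, not a function in this repo):

```python
import torch

def run_generation():
    # Hypothetical placeholder for the repo's actual inference call.
    x = torch.randn(1, 4096, 2048, device="cuda", dtype=torch.float16)
    return x @ x.transpose(-1, -2)

torch.cuda.reset_peak_memory_stats()
run_generation()
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak allocated: {peak_gib:.2f} GiB")
```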

@nwatx (Author) commented Apr 19, 2024

The output seems to be of similar quality.

@jasonppy self-assigned this Apr 20, 2024
@Ph0rk0z (Contributor) commented Apr 27, 2024

I tested this and found no difference beyond just changing the KV_CACHE to fp16. The autocasting and related changes give no benefit that I can see. I was hoping this did something I hadn't already tried, but no such luck.

On a side note, I can generate on my 2080 22GB using the fp16 cache; previously it would OOM, but so far it has not.
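
For anyone reading along, the cache change being described is just the dtype of the cache buffers, and the saving follows directly from the element size (shapes below are illustrative, not the repo's actual ones):

```python
import torch

# Illustrative cache shapes; the repo's actual names and sizes differ.
batch, heads, max_len, head_dim = 1, 16, 4096, 64

kv_fp32 = torch.zeros(2, batch, heads, max_len, head_dim, dtype=torch.float32)
kv_fp16 = torch.zeros(2, batch, heads, max_len, head_dim, dtype=torch.float16)

# 4 bytes vs 2 bytes per element -> the fp16 cache is half the size.
print(kv_fp32.nelement() * kv_fp32.element_size() / 2**20, "MiB")  # 32.0 MiB
print(kv_fp16.nelement() * kv_fp16.element_size() / 2**20, "MiB")  # 16.0 MiB
```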

@jasonppy (Owner) commented

FlashAttention might help.
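
One way to try that without pulling in a separate kernel: PyTorch's fused `scaled_dot_product_attention` can dispatch to a FlashAttention backend on supported GPUs. This is only a sketch with placeholder shapes, not the repo's attention code:

```python
import torch
import torch.nn.functional as F

# Placeholder q/k/v in (batch, heads, seq, head_dim) layout; not the repo's code.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Fused attention: avoids materializing the full seq x seq score matrix and
# picks a FlashAttention / memory-efficient kernel automatically when available.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```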

@Ph0rk0z (Contributor) commented Apr 28, 2024

There's a vLLM Triton implementation that works on all tensor-core cards: https://github.com/vllm-project/vllm/blob/main/vllm/attention/ops/triton_flash_attention.py

The current FlashAttention only supports Ampere and newer.

I'm not sure how to wrap it around your forward methods.

@aashay-sarvam commented

@nwatx I am getting an error while running; specifically, text_tokens.half() is causing an issue.
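
A guess at the cause (not a confirmed diagnosis): calling `.half()` on integer token ids turns them into fp16, and embedding lookups require integer indices, so only floating-point tensors should be cast:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(100, 16)
text_tokens = torch.randint(0, 100, (1, 8))  # int64 token ids

# emb(text_tokens.half())  # fails: embedding indices must be an integer dtype
out = emb(text_tokens)     # keep the ids as int64
out = out.half()           # cast the resulting float embeddings instead
print(out.dtype)           # torch.float16
```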
