Request for mixed precision (i8 and f16) gemm support #1144

Unanswered

sjw36 asked this question in Ideas

sjw36
Jul 7, 2023
Maintainer

From Harris:

Some LLMs in the zoo store their weights as i8 but compute in f16. Keeping the parameters in lower precision saves memory and bandwidth.

To support this for GEMM (and Conv2d), rocMLIR should allow conversion ops (i.e. tosa.cast) on the inputs to be fused into the kernel.
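
As a rough illustration of the semantics being requested, here is a minimal NumPy sketch. It is not rocMLIR code, and the function name `gemm_i8_weights_f16` is hypothetical. It assumes the weights are stored as i8, cast to f16, and then multiplied in f16. A fused kernel would presumably do the cast while loading the weights, so the f16 copy of the weight matrix would never have to be written to memory.

```python
import numpy as np

def gemm_i8_weights_f16(a_f16: np.ndarray, w_i8: np.ndarray) -> np.ndarray:
    """Reference semantics only: cast i8 weights to f16, then GEMM in f16."""
    # This cast is the step the request asks to fuse into the GEMM.
    w_f16 = w_i8.astype(np.float16)
    return a_f16 @ w_f16

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float16)       # f16 activations
w = rng.integers(-128, 128, size=(8, 3), dtype=np.int8)  # i8 weights

out = gemm_i8_weights_f16(a, w)

# i8 storage takes half the bytes of the equivalent f16 weights.
i8_bytes = w.nbytes
f16_bytes = w.astype(np.float16).nbytes
```

Without fusion, the cast runs as a separate pass that materializes the full f16 weight tensor, which gives back the memory and bandwidth savings that motivated storing i8 in the first place.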

I will update with a link to these models.

Replies: 0 comments
