int8*int8 -> float? #203
Comments
For such use cases, we typically have the matmul output raw `int32` accumulators. In gemmlowp, you get raw `int32` output by using an empty output pipeline (see lines 1211 to 1230 at commit fda83bd).
May I suggest taking a look at the ruy library instead of gemmlowp? It's basically gemmlowp's successor: it's what TFLite has been using by default on ARM for 18 months now, and it supports both float and quantized, any combination of int8 and uint8, with or without zero points, and more quantization-flavor variations. I've added an example for getting raw `int32` output.
@bjacob thank you, that will do nicely. I think I'll use ruy. Looking at the test, as far as I can see, only […]?
Yes, exactly.
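The raw-accumulator approach discussed above can be sketched with a plain, unoptimised reference implementation. Assuming symmetric quantisation (both zero points are 0, as the question proposes), the raw `int32` accumulators dequantise to `float` with a single combined scale. The function name and row-major layout here are illustrative, not gemmlowp's or ruy's API:

```cpp
#include <cstdint>
#include <vector>

// Reference int8 x int8 matmul producing raw int32 accumulators, then
// dequantised to float. With symmetric (zero-point-free) quantisation,
// real = scale * q, so real_result = lhs_scale * rhs_scale * raw_acc.
std::vector<float> MatmulInt8ToFloat(
    const std::vector<std::int8_t>& lhs,  // rows x depth, row-major
    const std::vector<std::int8_t>& rhs,  // depth x cols, row-major
    int rows, int depth, int cols,
    float lhs_scale, float rhs_scale) {
  std::vector<float> result(rows * cols);
  const float combined_scale = lhs_scale * rhs_scale;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      std::int32_t acc = 0;  // the raw accumulator a gemmlowp/ruy call would hand back
      for (int d = 0; d < depth; ++d) {
        acc += static_cast<std::int32_t>(lhs[r * depth + d]) *
               static_cast<std::int32_t>(rhs[d * cols + c]);
      }
      result[r * cols + c] = combined_scale * static_cast<float>(acc);
    }
  }
  return result;
}
```

An optimised library produces the same raw accumulators; only the final multiply by `lhs_scale * rhs_scale` (and an optional bias add) is needed to reach `float`.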
Hey,

I'm looking to perform `int8 * int8 -> fp32`, where at the output stage I dequantise the `int32_t` result into `float` (and then potentially add a bias). I was following the example from https://github.com/google/gemmlowp/blob/master/doc/quantization_example.cc#L305. But it seems that in order to dequantise to `float` you compute the quantisation parameters from the fp32 result that you had already computed before, which in practice I wouldn't know. I can compute it with a compensation factor, but it becomes incredibly complicated and computationally (and memory) expensive. Any alternatives?

If I am able to assume quantisation into `int8` as opposed to `uint8` as in the example, I would be able to quantise without the `zero_point` parameter (assuming a zero-centred distribution), which would massively simplify dequantisation. Do you support this? Do you have any examples in the codebase where something like this is done?