
A question about result_scale and result_zero_point #137

Open

victorygogogo opened this issue Jun 1, 2018 · 9 comments

I saw doc/quantization_example.cc.

I have a question about result_scale and result_zero_point: how are they determined?

Do you compute the real (float) result first and then derive them from it? But if we did that every time in a real network, it would obviously be slower.

Can anyone help me solve this problem?

bjacob (Contributor) commented Jun 1, 2018

Maybe our paper gives more context:
https://arxiv.org/abs/1712.05877

haoyan01 commented Jun 26, 2018

@bjacob
Hi Benoit,
I read the paper you mentioned, but I still have the same question.

result_quantized_value = result_zero_point +
    (lhs_scale * rhs_scale / result_scale) *
    Sum_over_i( (lhs_quantized_value[i] - lhs_zero_point) *
                (rhs_quantized_value[i] - rhs_zero_point) )        (5)

The above equation is the basic scheme for computing the quantized matrix multiplication. Since the input matrices are given, lhs_scale * rhs_scale and the Sum_over_i part are easy to compute. But how to calculate result_scale and result_zero_point is not well described in either the paper or the gemmlowp documents.
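For reference, equation (5) spelled out as plain, non-optimized C++ might look like the sketch below; the names come from the equation above, not from gemmlowp's actual API.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference form of equation (5): integer accumulation followed by one
// floating-point rescale into the result's quantized domain.
std::uint8_t QuantizedDotProduct(
    const std::vector<std::uint8_t>& lhs_quantized_value,
    const std::vector<std::uint8_t>& rhs_quantized_value,
    float lhs_scale, std::int32_t lhs_zero_point,
    float rhs_scale, std::int32_t rhs_zero_point,
    float result_scale, std::int32_t result_zero_point) {
  // Sum_over_i( (lhs[i] - lhs_zero_point) * (rhs[i] - rhs_zero_point) )
  std::int32_t acc = 0;
  for (std::size_t i = 0; i < lhs_quantized_value.size(); ++i) {
    acc += (static_cast<std::int32_t>(lhs_quantized_value[i]) - lhs_zero_point) *
           (static_cast<std::int32_t>(rhs_quantized_value[i]) - rhs_zero_point);
  }
  // Rescale, round, and clamp to the uint8 range [0, 255].
  const float result =
      result_zero_point + (lhs_scale * rhs_scale / result_scale) * acc;
  return static_cast<std::uint8_t>(
      std::fmin(255.f, std::fmax(0.f, std::round(result))));
}
```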
Assuming the result quantized values are 8-bit, my guess is:

255 = result_quantized_value_max = result_zero_point + (lhs_scale * rhs_scale / result_scale) * Sum_over_i_max        (a)

and

0 = result_quantized_value_min = result_zero_point + (lhs_scale * rhs_scale / result_scale) * Sum_over_i_min        (b)

Subtracting (b) from (a), we get:

255 = (lhs_scale * rhs_scale / result_scale) * (Sum_over_i_max - Sum_over_i_min)        (c)

Then:

result_scale = (lhs_scale * rhs_scale / 255) * (Sum_over_i_max - Sum_over_i_min)

Since Sum_over_i_max and Sum_over_i_min can be calculated, result_scale can be obtained from the above equation. Is this correct, and is this the way you calculate result_scale and result_zero_point? Thank you so much.

bjacob (Contributor) commented Jun 26, 2018

The result scale and zero_point are not to be inferred from the input scales and zero points; that's why neither our example code nor our paper gives a formula for them. There is no such formula.

Instead, the quantization parameters of the result must be given by the user.

In a typical quantized neural network application, as in our paper, it is the training process that will record the min-max used for each matrix, including for the result matrix. The quantization and inference process will then use that pre-recorded min-max to quantize the result matrix.
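Concretely, once a min-max range has been recorded for the result matrix, its scale and zero_point follow directly from that range. Below is a simplified sketch along the lines of ChooseQuantizationParams in doc/quantization_example.cc (not the exact code; the degenerate-range handling here is a placeholder):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct QuantizationParams {
  float scale;
  std::uint8_t zero_point;
};

QuantizationParams ChooseQuantizationParams(float min, float max) {
  // Extend the range to contain 0 so that the real value 0 is exactly
  // representable (needed e.g. for zero-padding).
  min = std::min(min, 0.f);
  max = std::max(max, 0.f);
  const float qmin = 0.f;   // quantized uint8 range
  const float qmax = 255.f;
  const float scale = (max - min) / (qmax - qmin);
  if (scale == 0.f) {
    return QuantizationParams{1.f, 0};  // degenerate all-zero range
  }
  // zero_point is the quantized value representing the real value 0,
  // rounded and clamped into the quantized range.
  const float initial_zero_point = qmin - min / scale;
  const float nudged =
      std::min(qmax, std::max(qmin, std::round(initial_zero_point)));
  return QuantizationParams{scale, static_cast<std::uint8_t>(nudged)};
}
```

At inference time these parameters are fixed constants, so nothing has to be measured on the actual result values; that is also why there is no speed cost of the kind the original question worried about.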

haoyan01 commented Jul 3, 2018

@bjacob
Thanks Benoit. Is there any pretrained quantized model, such as MobileNet, that contains the scales and zero points?

bjacob (Contributor) commented Jul 3, 2018

I think there is; explore around
https://www.tensorflow.org/mobile/tflite/
and maybe ask on the issue tracker there if it's not obvious.

sxsxsx commented Jul 19, 2018

@bjacob Hello,
From your paper https://arxiv.org/abs/1712.05877 I understand that during training with simulated quantization, you only quantize the weights and activations, so we can get the corresponding scale and zero_point for those.

(1) Could you tell me how to get the result scale and zero_point during the training process?
Is it right that you run inference on the unquantized model, collect [a; b] ranges for the result, and handle them just like the activations during training with simulated quantization?

You said that "The quantization and inference process will then use that pre-recorded min-max to quantize the result matrix."
(2) How can we ensure that a quantized model using pre-recorded min-max still generalizes?

Thanks a lot, good luck to you @bjacob
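For illustration, the per-tensor range collection that Section 3 of the paper describes could be sketched as a moving-average min/max tracker like the following; the decay constant and all names are illustrative assumptions, not TensorFlow's implementation:

```cpp
#include <algorithm>
#include <vector>

// Tracks a smoothed min/max range for one tensor across training steps;
// the frozen range is what inference-time quantization later uses.
struct EmaRange {
  float min = 0.f;
  float max = 0.f;
  bool initialized = false;

  // Fold one (non-empty) batch of float activations into the
  // exponential moving average of the observed range.
  void Update(const std::vector<float>& batch, float decay = 0.99f) {
    const float batch_min = *std::min_element(batch.begin(), batch.end());
    const float batch_max = *std::max_element(batch.begin(), batch.end());
    if (!initialized) {
      min = batch_min;
      max = batch_max;
      initialized = true;
    } else {
      min = decay * min + (1.f - decay) * batch_min;
      max = decay * max + (1.f - decay) * batch_max;
    }
  }
};
```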

bjacob (Contributor) commented Jul 19, 2018

Redirecting these questions to @skligys, who wrote Section 3 of the paper on training and is generally the training expert :-)

zyc4me commented Nov 30, 2018

Same question. I trained a quantized model with the TF object detection API, but when I inspect the global variables in the ".ckpt", I only find the weight min/max and the min/max after relu6 (0 / 5.9997); there is no output min/max for the conv. Why?
The min/max tensor names look like this:

FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/min:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/max:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/min/biased:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/min/local_step:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/max/biased:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_0/act_quant/max/local_step:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/weights_quant/min:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/weights_quant/max:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/act_quant/min:0
FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_1_depthwise/act_quant/max:0

bjacob (Contributor) commented Nov 30, 2018 via email
