int8*int8 -> float? #203
Comments
For such use cases, we typically have the matmul output raw `int32` accumulators. In gemmlowp, you get raw `int32` output by using an empty output pipeline (see lines 1211 to 1230 at commit fda83bd).
May I suggest taking a look at the ruy library instead of gemmlowp? It's basically gemmlowp's successor: it's what TFLite has been using by default on ARM for 18 months now, and it supports both float and quantized, any combination of int8 and uint8, with or without zero points, and more quantization-flavor variations. I've added an example for getting raw `int32` output.
@bjacob thank you, that will do nicely. I think I'll use ruy. Looking at the test, as far as I can see, only […]?
Yes, exactly.
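The raw-accumulator approach discussed above can be sketched with a plain, unoptimised reference implementation. Assuming symmetric quantisation (both zero points are 0, as the question proposes), the raw `int32` accumulators dequantise to `float` with a single combined scale. The function name and row-major layout here are illustrative, not gemmlowp's or ruy's API:

```cpp
#include <cstdint>
#include <vector>

// Reference int8 x int8 matmul producing raw int32 accumulators, then
// dequantised to float. With symmetric (zero-point-free) quantisation,
// real = scale * q, so real_result = lhs_scale * rhs_scale * raw_acc.
std::vector<float> MatmulInt8ToFloat(
    const std::vector<std::int8_t>& lhs,  // rows x depth, row-major
    const std::vector<std::int8_t>& rhs,  // depth x cols, row-major
    int rows, int depth, int cols,
    float lhs_scale, float rhs_scale) {
  std::vector<float> result(rows * cols);
  const float combined_scale = lhs_scale * rhs_scale;
  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      std::int32_t acc = 0;  // the raw accumulator a gemmlowp/ruy call would hand back
      for (int d = 0; d < depth; ++d) {
        acc += static_cast<std::int32_t>(lhs[r * depth + d]) *
               static_cast<std::int32_t>(rhs[d * cols + c]);
      }
      result[r * cols + c] = combined_scale * static_cast<float>(acc);
    }
  }
  return result;
}
```

An optimised library produces the same raw accumulators; only the final multiply by `lhs_scale * rhs_scale` (and an optional bias add) is needed to reach `float`.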
Hey,

I'm looking to perform `int8 * int8 -> fp32`, where at the output stage I dequantise the `int32_t` result into `float` (and then potentially add a bias). I was following the example from https://github.com/google/gemmlowp/blob/master/doc/quantization_example.cc#L305. But it seems that in order to dequantise to `float` you compute the quantisation parameters from the fp32 result that you had already computed before, which in practice I wouldn't know. I can compute it with a compensation factor, but it becomes incredibly complicated and computationally (and memory) expensive. Any alternatives?

If I am able to assume quantisation into `int8` as opposed to `uint8` as in the example, I would be able to quantise without the `zero_point` parameter (assuming a zero-centred distribution), which would massively simplify dequantisation. Do you support this? Do you have any examples in the codebase where something like this is done?