Release Note

The following are the highlights in this release:

Support Quantization For MACE Micro

At the beginning of this year, we released MACE Micro to fully support ultra-low-power inference scenarios on mobile phones and IoT devices. In this version, MACE Micro supports quantization and integrates CMSIS5 for better support of Cortex-M chips.
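
To make the idea of quantization concrete, here is a minimal sketch of asymmetric 8-bit affine quantization in plain Python. It is an illustration only: the function names (`quantize_params`, `quantize`, `dequantize`) are hypothetical, and the exact scheme MACE Micro uses is not described in this note and may differ.

```python
def quantize_params(min_val, max_val, num_bits=8):
    """Return (scale, zero_point) mapping [min_val, max_val] onto unsigned integers.

    Hypothetical helper for illustration; not part of the MACE API.
    """
    qmax = (1 << num_bits) - 1
    # Widen the range to include zero so that 0.0 is exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / qmax
    if scale == 0.0:
        return 1.0, 0
    zero_point = int(round(-min_val / scale))
    zero_point = max(0, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clamping out-of-range values."""
    qmax = (1 << num_bits) - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


scale, zero_point = quantize_params(-1.0, 3.0)
q = quantize([-1.0, 0.0, 0.5, 3.0], scale, zero_point)
approx = dequantize(q, scale, zero_point)
```

Each value is stored in one byte instead of four, at the cost of a round-trip error of at most about half a quantization step for values inside the calibrated range.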

Support More Model Formats

More and more R&D engineers are using the PyTorch framework to train their models. In previous versions, MACE converted PyTorch models by using the ONNX format as a bridge. To serve PyTorch developers better, this version supports direct conversion of PyTorch models, which improves model inference performance.
We also cooperated with MEGVII to support its MegEngine model format. If you trained your models with the MegEngine framework, you can now use MACE to deploy them on mobile phones or IoT devices.

Support More Data Precision

Armv8.2 provides half-precision floating-point data processing instructions. In this version, we support fp16 computation using the Armv8.2 fp16 instructions, which increases inference speed by roughly 40% for models such as mobilenet-v1.
bfloat16 (Brain Floating Point) is a floating-point number format that occupies 16 bits in memory. We also support bfloat16 precision in this version, which increases inference speed by roughly 40% for models such as mobilenet-v1/2 on some low-end chips.
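
The difference between the two 16-bit formats can be sketched with the Python standard library. fp16 uses 5 exponent bits and 10 fraction bits, while bfloat16 keeps the 8 exponent bits of float32 and only 7 fraction bits, so a bfloat16 value is the upper half of a float32 bit pattern. The helper names below are hypothetical and the code handles finite inputs only; it illustrates the formats, not MACE's kernels.

```python
import struct


def to_bfloat16_bits(x):
    """Round a finite float to bfloat16 (nearest-even) and return its 16-bit pattern."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(bits16):
    """Expand a bfloat16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]


def round_to_fp16(x):
    """Round a float to IEEE half precision using struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


value = 0.1
as_bf16 = from_bfloat16_bits(to_bfloat16_bits(value))
as_fp16 = round_to_fp16(value)
print(f"bfloat16: {as_bf16!r}  fp16: {as_fp16!r}")
```

For values near 1, fp16 is the more precise of the two; bfloat16 trades that precision for the full float32 exponent range, so a value such as 100000.0 stays finite in bfloat16 but is out of range for fp16.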

Others

In this version, we also add the following features:

Support more operators, such as GroupNorm, ExtractImagePatches, Elu, etc.
Optimize the performance of the framework and operators, such as the Reduce operator.
Support dynamic filter of conv2d/deconv2d.
Integrate MediaTek APU support on mt6873, mt6885, and mt6853.
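
For readers unfamiliar with two of the operators named above, the following plain-Python sketch gives their standard mathematical definitions. It is a reference illustration, not MACE's implementation: the function names and the channel-list layout are assumptions, and the learned per-channel scale and bias that GroupNorm normally applies are omitted.

```python
import math


def elu(x, alpha=1.0):
    """ELU activation: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample given as a list of per-channel value lists.

    Channels are split into num_groups consecutive groups; each group is
    normalized with its own mean and variance.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    group_size = len(channels) // num_groups
    normalized = []
    for g in range(num_groups):
        group = channels[g * group_size:(g + 1) * group_size]
        flat = [v for channel in group for v in channel]
        mean = sum(flat) / len(flat)
        variance = sum((v - mean) ** 2 for v in flat) / len(flat)
        inv_std = 1.0 / math.sqrt(variance + eps)
        normalized.extend([[(v - mean) * inv_std for v in channel] for channel in group])
    return normalized


sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
result = group_norm(sample, num_groups=2)
```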

Acknowledgement

Thanks to the following contributors, whose code makes MACE better.

@ZhangZhijing1, who contributed the bf16 code which was then committed by someone else.
@yungchienhsu, @Yi-Kai-Chen, @Eric-YK-Chen, @yzchen, @gasgallo, @lq, @huahang, @elswork, @LovelyBuggies, @freewym.

Performance Optimization

We found that missing OP implementations on devices (GPU, HTA, etc.) led to inefficient model execution, because memory synchronization between the device and the CPU consumed a lot of time. We therefore added and enhanced some operators on the GPU (reshape, lpnorm, mvnorm, etc.) and on HTA (s2d, d2s, sub, etc.) to improve the efficiency of model execution.
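
The reasoning above can be illustrated with a toy model: every time consecutive operators run on different processors, a memory synchronization is needed, so each operator missing on the device can add two switches. The sketch below is a simplification under stated assumptions. The function name, the example operator sequence, and the "unsupported ops fall back to the CPU" placement rule are hypothetical and do not describe MACE's actual scheduler.

```python
def count_device_switches(ops, device_ops):
    """Count CPU<->device transitions for a linear sequence of operators.

    An operator runs on the device if it is in device_ops, otherwise on the CPU.
    """
    placements = ["device" if op in device_ops else "cpu" for op in ops]
    return sum(1 for a, b in zip(placements, placements[1:]) if a != b)


# Hypothetical model: a linear chain of operators.
model_ops = ["conv2d", "reshape", "conv2d", "lpnorm", "softmax"]

before = count_device_switches(model_ops, {"conv2d", "softmax"})
after = count_device_switches(model_ops, {"conv2d", "softmax", "reshape", "lpnorm"})
print(f"switches before: {before}, after: {after}")
```

In this toy chain, adding device implementations for the two missing operators removes every transition, which is the effect the added GPU and HTA operators aim for.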

Releases: lu229/repository

v0.13.0

v0.12.0