RGBA 32-bit and initial aarch64 SIMD

@bitbank2 bitbank2 released this 04 Jan 18:32
· 44 commits to master since this release
c7e9af2

I corrected some errors in the 16 different permutations of subsampling and scaling options. I also added experimental code that optimizes the color conversion on aarch64 (Arm NEON) for 4:2:0 subsampling with full-size output. On my MacBook Air M1, it nearly doubles the decode speed: a 126K 938x698 file decodes in just 8 milliseconds (previously 15 milliseconds). I can optimize this code for x86 and Arm desktop usage, but I need to evaluate the cost/benefit of investing the time. I believe my code can beat libjpeg-turbo in certain situations if I fully deploy SIMD optimizations. Please let me know if you need this code optimized for your desktop application.
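
For readers unfamiliar with the color conversion step mentioned above, here is a minimal scalar sketch in C of what a 4:2:0 YCbCr to 32-bit RGBA conversion computes at full-size output. It is not the code from this release: the function names `ycbcr420_to_rgba` and `clamp_u8`, the planar input layout, and the 16-bit fixed-point JFIF coefficients are all assumptions chosen for illustration. A NEON version would do the same arithmetic on several pixels per instruction, which is presumably where a speedup of this kind comes from.

```c
#include <stdint.h>

/* Hypothetical helper: clamp an intermediate value to the 0..255 channel range. */
static uint8_t clamp_u8(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical sketch: convert planar 4:2:0 YCbCr to packed RGBA.
 * Assumed layout: y is width*height bytes; cb and cr are each
 * ((width+1)/2) * ((height+1)/2) bytes, so one chroma sample is shared
 * by a 2x2 block of luma samples. rgba receives width*height*4 bytes.
 */
static void ycbcr420_to_rgba(const uint8_t *y, const uint8_t *cb,
                             const uint8_t *cr, int width, int height,
                             uint8_t *rgba) {
    int cw = (width + 1) / 2; /* chroma plane width */
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            int Y  = y[row * width + col];
            int Cb = cb[(row / 2) * cw + (col / 2)] - 128;
            int Cr = cr[(row / 2) * cw + (col / 2)] - 128;
            /* JFIF coefficients scaled by 2^16 (1.402, 0.34414, 0.71414,
             * 1.772). The shifts assume arithmetic right shift of negative
             * values, which common compilers provide. */
            int r = Y + ((91881 * Cr + 32768) >> 16);
            int g = Y - ((22554 * Cb + 46802 * Cr + 32768) >> 16);
            int b = Y + ((116130 * Cb + 32768) >> 16);
            uint8_t *px = &rgba[(row * width + col) * 4];
            px[0] = clamp_u8(r);
            px[1] = clamp_u8(g);
            px[2] = clamp_u8(b);
            px[3] = 255; /* opaque alpha */
        }
    }
}
```

Calling `ycbcr420_to_rgba` on a 2x2 block with neutral chroma (Cb = Cr = 128) yields gray pixels whose R, G, and B all equal the luma value, which is a quick sanity check for any optimized variant.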