Releases: ARM-software/astc-encoder
5.1.0
Status: November 2024
The 5.1.0 release is an optimization release, giving moderate performance improvements on all platforms. There are no image quality differences.
- General:
- Feature: Added a new CMake build option to control use of native gathers, as they can be slower than scalar loads on some common x86 microarchitectures. Build with -DASTCENC_X86_GATHERS=OFF to disable use of native gathers in AVX2 builds.
- Optimization: Added new gather() abstraction for gathers using byte indices, allowing implementations without gather hardware to skip the byte-to-int index conversion.
- Optimization: Optimized compute_lowest_and_highest_weight() to pre-compute min/max outside of the main loop.
- Optimization: Added improved intrinsics sequence for SSE and AVX2 integer hmin() and hmax().
- Optimization: Added improved intrinsics sequence for vint4(uint8_t*) on systems implementing Arm SVE.
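As an illustration of the gather build option above, the following minimal sketch renders a CMake configure command from a mapping of option names to booleans. The helper name `cmake_configure_command`, the `..` source directory, and the pairing with `ASTCENC_ISA_AVX2` are assumptions made for this example; only `-DASTCENC_X86_GATHERS=OFF` is taken from the notes above.

```python
def cmake_configure_command(source_dir, options):
    """Render a CMake configure command from {option_name: bool}.

    Hypothetical helper for illustration; it only builds the argument
    list and does not run CMake.
    """
    args = ["cmake"]
    for name, enabled in options.items():
        # CMake boolean cache options are conventionally set to ON or OFF.
        args.append(f"-D{name}={'ON' if enabled else 'OFF'}")
    args.append(source_dir)
    return args


if __name__ == "__main__":
    # Assumed configuration: an AVX2 build with native gathers disabled.
    command = cmake_configure_command(
        "..",
        {"ASTCENC_ISA_AVX2": True, "ASTCENC_X86_GATHERS": False},
    )
    print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` from a build directory. Whether disabling gathers helps depends on the target microarchitecture, so it is worth benchmarking both settings.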
Binary release sha256 checksums
a9f61f954f8c6f75675b9b8187454a3b1328c172cb62747715a30937bb8fe7bb astcenc-5.1.0-linux-x64.zip
4ecb330104e05a35febfc0d83b0c128ee2f51b671577e9952ca4278b5455197b astcenc-5.1.0-macos-universal.zip
ff01a6f988aaf5d7f1a139d2cff4c1988f70252b6f53fa3ba72888cc732ba154 astcenc-5.1.0-windows-arm64.zip
c3e5954b9f263e4e13d92892a9e9ef4728abf4acffbf013bb466dd97cc2a0a15 astcenc-5.1.0-windows-x64.zip
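The published digests can be checked after download. The sketch below is one possible way to do that with Python's standard `hashlib`; the function names and the assumption that the archive sits in a local path are illustrative, while the digests are copied from the 5.1.0 list above.

```python
import hashlib
from pathlib import Path

# Digests copied from the 5.1.0 release list above.
EXPECTED_SHA256 = {
    "astcenc-5.1.0-linux-x64.zip":
        "a9f61f954f8c6f75675b9b8187454a3b1328c172cb62747715a30937bb8fe7bb",
    "astcenc-5.1.0-macos-universal.zip":
        "4ecb330104e05a35febfc0d83b0c128ee2f51b671577e9952ca4278b5455197b",
    "astcenc-5.1.0-windows-arm64.zip":
        "ff01a6f988aaf5d7f1a139d2cff4c1988f70252b6f53fa3ba72888cc732ba154",
    "astcenc-5.1.0-windows-x64.zip":
        "c3e5954b9f263e4e13d92892a9e9ef4728abf4acffbf013bb466dd97cc2a0a15",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Compare a downloaded archive with the digest published for its name."""
    published = expected.get(Path(path).name)
    if published is None:
        raise KeyError(f"no published checksum for {Path(path).name}")
    return sha256_of(path) == published
```

For example, `verify_download("astcenc-5.1.0-linux-x64.zip")` returns `True` only if the local file matches the published digest. The same approach applies to the other releases by swapping in their digest lists.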
5.0.0
Status: November 2024
The 5.0.0 release is the first stable release in the 5.x series. The main new feature over the 4.x series is support for the Arm Scalable Vector Extensions (SVE) SIMD instruction set, with both 128-bit and 256-bit backends provided. This gives up to 60% performance improvement on Neoverse V2 with its 256-bit SVE implementation.
- General:
- Bug fix: Fixed incorrect return type in "None" vector library reference implementation.
- Bug fix: Fixed sincos table index under/overflow.
- Feature: Changed ASTCENC_ISA_NATIVE builds to use -march=native and -mcpu=native.
- Feature: Added backend for Arm SVE fixed-width 256-bit builds. These can only run on hardware implementing 256-bit SVE.
- Feature: Added backend for Arm SVE 128-bit builds. These are portable builds and can run on hardware implementing any SVE vector length, but the explicit SVE use is augmented NEON and will only use the bottom 128-bits of each SVE vector.
- Feature: Optimized NEON mask any() and all() functions.
- Feature: Migrated build and test to GitHub Actions pipelines.
Binary release sha256 checksums
70183c4346f9fc0f55cd8e3ca5b326cbb4675233b026c02df60657e344c44cc0 astcenc-5.0.0-linux-x64.zip
b8e450250932f07765d868318709964da2a36d41be0538f5c018d8fc72a41e70 astcenc-5.0.0-macos-universal.zip
dbaf1a1329f6fd909457b43e7872092e979ba8ebd9bdf9dbbd950069fa533124 astcenc-5.0.0-windows-arm64.zip
6c22b89f3d437d457c45036c8297ce9cec400f8bdfa019eed71e3b8d343f5ebb astcenc-5.0.0-windows-x64.zip
4.8.0
Status: May 2024
The 4.8.0 release is a minor maintenance release.
Reminder - the codec library API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.
- General:
- Bug fix: Native builds on macOS will now correctly build for arm64 when run outside of Rosetta on an Apple silicon device.
- Bug fix: Multiple small improvements to remove use of undefined language behavior, to improve support for deployment using Emscripten.
- Feature: Builds using Clang can now build with undefined behavior sanitizer by setting -DASTCENC_UBSAN=ON on the CMake configure line.
- Feature: Updated to Wuffs library 0.3.4, which ignores tRNS alpha chunks for type 4 (LA) and 6 (RGBA) PNGs, to improve compatibility with libpng.
Binary release sha256 checksums
dfd7fb8056aeb1c9fa19d8c101f9a6a710ffa2000e62d5a536d4a7ca06c81c8d astcenc-4.8.0-linux-x64.zip
26b82cac7a41c99c4a5789df5025d1bc42a02067be4670a5f4639e176083cee2 astcenc-4.8.0-macos-universal.zip
cdf073b42c86e766959e0c363b84b16ca4908b0d1a6304eafa6ae8d32b2dd393 astcenc-4.8.0-windows-arm64.zip
a821d4c58fa5bb5ecf421aabbdb514f0083baea0c0bc0e6d10fba65fe3ceb7b2 astcenc-4.8.0-windows-x64.zip
4.7.0
Status: January 2024
The 4.7.0 release is a major maintenance release, fixing rounding behavior in the decompressor to match the Khronos specification. This fix includes the addition of explicit support for optimizing for decode_unorm8 rounding.
Reminder - the codec library API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.
- General:
- Bug fix: sRGB LDR decompression now uses the correct endpoint expansion method to create the 16-bit RGB endpoint colors, and removes the previous correction code from the interpolation function. This bug could result in LSB bit flips relative to the standard specification.
- Bug fix: Decompressing to an 8-bit per component output image now matches the decode_unorm8 extension rounding rules. This bug could result in LSB bit flips relative to the standard specification.
- Bug fix: Code now avoids using alignas() in the reference C implementation, as the default alignas(16) is narrower than the native minimum alignment requirement on some CPU architectures.
- Feature: Library configuration supports a new flag, ASTCENC_FLG_USE_DECODE_UNORM8. This flag indicates that the image will be used with the decode_unorm8 decode mode. When set during compression this allows the compressor to use the correct rounding when determining the best encoding.
- Feature: Command line tool supports a new option, -decode_unorm8. This option indicates that the image will be used with the decode_unorm8 decode mode. This option will automatically be set for decompression (-d*) and trial (-t*) tool operation if the decompressed output image is stored to an 8-bit per component file format. This option must be set manually for compression (-c*) tool operation, as the desired decode mode cannot be reliably determined.
- Feature: Library configuration supports a new optional progress reporting callback. This is called during compression to allow interactive tooling use cases to display incremental progress. The command line tool uses this feature to show compression progress unless -silent is used.
- Feature: Prebuilt Linux binary releases updated to use Clang 14 (previously Clang 9) which gives a small performance improvement.
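To show why the decode mode matters for rounding, the sketch below compares two ways of reducing a 16-bit interpolated color value to 8 bits. It assumes that decode_unorm8 keeps the top 8 bits of the 16-bit value, and that the alternative path normalizes to a float and rounds to nearest. Both rules are my reading of the Khronos decode mode behavior, not something stated in these notes, and neither function is astcenc code.

```python
def to_unorm8_via_float(value16):
    """Assumed float path: normalize to [0, 1], then round to nearest."""
    normalized = 1.0 if value16 == 0xFFFF else value16 / 65536.0
    return int(normalized * 255.0 + 0.5)


def to_unorm8_top_bits(value16):
    """Assumed decode_unorm8 rule: keep the top 8 bits of the value."""
    return value16 >> 8


if __name__ == "__main__":
    for value16 in (0x0000, 0x8080, 0x8100, 0xFFFF):
        print(hex(value16),
              to_unorm8_via_float(value16),
              to_unorm8_top_bits(value16))
```

Under these assumptions most inputs agree, but some differ in the least significant bit (0x8100 gives 128 on the float path and 129 from the top bits). This is the kind of LSB difference the bug fixes above describe, and it is why the compressor benefits from knowing which rounding rule the decoder will apply.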
Binary release sha256 checksums
4f596546808c58f2b7e0271302d05df40159cdc7c47645cc96f331229d22ab66 astcenc-4.7.0-linux-x64.zip
176b2c0bf2673d15eb1ba11c1e2691ccb041b79e4076fccd6cd0ece3f75aa2bc astcenc-4.7.0-macos-universal.zip
092bc9023195c9ccef811c97f1e0746e0f803a5397e5a18773566615b077914d astcenc-4.7.0-windows-arm64.zip
deb20ea0cb4ef522ca1eecee82358bfc9cdafd8d4101bc22f28502a0165180bd astcenc-4.7.0-windows-x64.zip
4.6.1
Status: November 2023
The 4.6.1 release is a minor maintenance release to fix a performance scaling issue on large core count Windows systems. No other performance or image changes are expected for this release.
- General:
- Optimization: Windows builds of the astcenc command line tool can now use more than 64 cores on large core count systems. This change doubles command line performance for -exhaustive compression when testing on a 96 core/192 thread system.
- Feature: Windows Arm64 native builds of the astcenc command line tool are now included in the prebuilt release binaries.
Binary release sha256 checksums
e360aeabf3b5aeda6a7cfabddc49af8b204e28befa04ab8e8942c85620ba071a astcenc-4.6.1-linux-x64.zip
40f19df27799f6f2ad6890c147165f8e077ff6547be57b02d7949677d3f1ea9e astcenc-4.6.1-macos-universal.zip
92cd085b6a2f8f748fd384f2dc8e3c977756ffbe13e263729695cd18233b55a8 astcenc-4.6.1-windows-arm64.zip
0c4ba7af8b5ec22e9bd4f4173866985d2d10b5be137753171481b9629054e38e astcenc-4.6.1-windows-x64.zip
4.6.0
Status: November 2023
The 4.6.0 release is a minor release with a few code quality improvements, and a small performance boost.
Reminder - the codec library API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.
- General:
- Bug-fix: Fixed context allocation for contexts allocated with the ASTCENC_FLG_DECOMPRESS_ONLY flag.
- Bug-fix: Reduced use of reinterpret_cast in the core codec to avoid strict aliasing violations.
- Optimization: -medium search quality no longer tests 4 partition encodings for block sizes between 25 and 83 texels (inclusive). This improves performance for a tiny drop in image quality.
- Optimization: -thorough and higher search qualities no longer test the mode0 first search for block sizes between 25 and 83 texels (inclusive). This improves performance for a tiny drop in image quality.
- Optimization: TUNE_MAX_PARTITIONING_CANDIDATES reduced from 32 to 8 to reduce the size of stack allocated data structures. This causes a tiny drop in image quality for the -verythorough and -exhaustive presets.
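The "25 to 83 texels" range above is easier to read as a list of block sizes. The sketch below enumerates the standard 2D ASTC block footprints and filters them by texel count. It covers 2D blocks only, and the constant and function names are made up for this example.

```python
# Standard 2D ASTC block footprints as (width, height).
ASTC_2D_BLOCK_SIZES = [
    (4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6), (8, 8),
    (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12),
]


def blocks_in_texel_range(low=25, high=83):
    """Return 2D block sizes whose texel count lies in [low, high]."""
    return [
        f"{width}x{height}"
        for width, height in ASTC_2D_BLOCK_SIZES
        if low <= width * height <= high
    ]


if __name__ == "__main__":
    print(blocks_in_texel_range())
```

With the default bounds this selects 5x5 through 10x8, so 4x4, 5x4, and the footprints from 10x10 upwards fall outside the range covered by these two search changes.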
Binary release sha256 checksums
321229025183e9f8f1cdb766b1a036da33a192d56e46bdf8b44295759c36fc9e astcenc-4.6.0-linux-x64.zip
19adae19a7a46b05739fe9285dbac6c960e08504780890bbef4f8eca6663e0d7 astcenc-4.6.0-macos-universal.zip
a140454aa6c2dee29e85a1dc162430a6914123aea62546bfad9885ca336bff24 astcenc-4.6.0-windows-x64.zip
4.5.0
Status: June 2023
The 4.5.0 release is a minor release with minor image quality improvements, and multiple build system quality-of-life improvements.
- General:
- Bug-fix: Improved handling of compiler arguments in CMake, including consistent use of MSVC-style command line arguments for ClangCL.
- Bug-fix: Invariant Clang builds now use -ffp-model=precise with -ffp-contract=off which is needed to restore invariance due to recent changes in compiler defaults.
- Change: macOS binary releases are now distributed as a single universal binary for all platforms.
- Change: Windows binary releases are now compiled with VS2022.
- Change: Invariant MSVC builds for VS2022 now use /fp:precise instead of /fp:strict, which is now possible because precise no longer implies contraction. This should improve performance for MSVC builds.
- Change: Non-invariant Clang builds now use -ffp-model=precise with -ffp-contract=on. This should improve performance on older Clang versions which defaulted to no contraction.
- Change: Non-invariant MSVC builds for VS2022 now use /fp:precise with /fp:contract. This should improve performance for MSVC builds.
- Change: CMake config variables now use an ASTCENC_ prefix to add a namespace and group options when the library is used in a larger project.
- Change: CMake config ASTCENC_UNIVERSAL_BUILD for building macOS universal binaries has been improved to include the x86_64h slice for AVX2 builds. Universal builds are now on by default for macOS, and always include NEON (arm64), SSE4.1 (x86_64), and AVX2 (x86_64h) variants.
- Change: CMake config ASTCENC_NO_INVARIANCE has been inverted to remove the negated option, and is now ASTCENC_INVARIANCE with a default of ON. Disabling this option can substantially improve performance, but images can differ across platforms and compilers.
- Optimization: Color quantization and packing for LDR RGB and RGBA has been vectorized to improve performance.
- Change: Color quantization for LDR RGB and RGBA endpoints will now try multiple quantization packing methods, and pick the one with the lowest endpoint encoding error. This gives a minor image quality improvement, for no significant performance impact when combined with the vectorization optimizations.
Binary release sha256 checksums
fe2a1e5c8e57fc77175c6e9d0e1a10e583816507c82c748c568694bf39ae9f57 astcenc-4.5.0-linux-x64.zip
bc8895222820106135575b7dd1bef4d9f184be7ef7de6e684ab563b64c22d163 astcenc-4.5.0-macos-universal.zip
7c63f167558c65e607f72a4dc30b86d631ae3b561cffaee9d2afb5d5149d72e0 astcenc-4.5.0-windows-x64.zip
4.4.0
Status: March 2023
The 4.4.0 release is a minor release with image quality improvements, a small performance boost, a few new quality-of-life features, and a few minor fixes for uncommon build configurations.
- General:
- Change: Core library no longer checks availability of required instruction set extensions, such as SSE4.1 or AVX2. Checking compatibility is now the responsibility of the caller. See astcenccli_entry.cpp for an example of code performing this check.
- Change: Core library can be built as a shared object by setting the -DSHAREDLIB=ON CMake option, resulting in e.g. libastcenc-avx2-shared.so. Note that the command line tool is always statically linked.
- Change: Decompressed 3D images will now write one output file per slice, if the target format is a 2D image format, rather than just writing slice zero.
- Change: Command line tool errors print to stderr instead of stdout.
- Change: Color encoding uses new quantization tables, that now factor in floating-point rounding if a distance tie is found when using the integer quant256 value. This improves image quality for 4x4 and 5x5 block sizes.
- Optimization: Partition selection uses a simplified line calculation with a faster approximation. This improves performance for all block sizes.
- Bug-fix: Fixed missing symbol error in dead code for decompressor-only builds.
- Bug-fix: Fixed infinity handling in debug trace JSON files.
Binary release sha256 checksums
98267b6e23f188658de1e275816bad6bf1e9fe3ae113a1bac4109bf7ee75d579 astcenc-4.4.0-linux-x64.zip
d88c90c82f0e5cf15b9fae7582da394e5855a7cdeee776887351b07f5b6d21ea astcenc-4.4.0-macos-aarch64.zip
8f97f78dd9bedc8a21cfdafdd712ee1f8ec29414804c4e38bd630f4c8894517f astcenc-4.4.0-macos-x64.zip
e5407a8d7c4a0355aa1f0614aee21f3407c7f10cac2349bf3e57e8aaa4a21d53 astcenc-4.4.0-windows-x64.zip
4.3.1
Status: January 2023
The 4.3.1 release is a minor maintenance release. No performance or image quality changes are expected.
- General:
- Bug-fix: Fixed typo in -2/3/4partitioncandidatelimit CLI options.
- Bug-fix: Fixed handling for -3/4partitionindexlimit CLI options.
- Bug-fix: Updated to stb_image.h v2.28, which includes multiple fixes for image loading.
Binary release sha256 checksums
93ccb1c96493a9066487f641c2bc5eca87c7ec58b2270f7645af95db795dda7f astcenc-4.3.1-linux-x64.zip
ddd548181b5beda535171c6eda6280ff99e175868cc75e22efc65bbcccd37fa9 astcenc-4.3.1-macos-aarch64.zip
eaa425ad34a0455960bb660271fe853ba374543adf6c22823dbaa96d8aff7593 astcenc-4.3.1-macos-x64.zip
95bbd32112fcf21ef17927dcc4d446b24bfc5838efd9c4d8bb34edbfb85849e8 astcenc-4.3.1-windows-x64.zip
4.3.0
Status: January 2023
The 4.3.0 release is an optimization release. There are minor performance improvements, minor image quality improvements, significant memory footprint improvements, and library interface changes in this release.
Reminder - the codec library API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.
- General:
- Bug-fix: Use lower case windows.h include for MinGW compatibility.
- Change: The -mask command line option, ASTCENC_FLG_MAP_MASK in the library API, has been removed.
- Optimization: Always skip blue-contraction for QUANT_256 encodings. This gives a small image quality improvement for images using the 4x4 block size.
- Optimization: Always skip RGBO vector calculation for LDR encodings.
- Optimization: Defer color packing and scrambling to physical layer.
- Optimization: Remove folded decimation_info lookup tables. This significantly reduces compressor memory footprint and improves context creation time. Impact increases with block size.
- Optimization: Increased trial and refinement pruning by using stricter target errors when determining whether to skip iterations.
Binary release sha256 checksums
7d74df06ec5e186f1ae2c3718df4eac649f80b0d3e38bcd1378a8803574e5459 astcenc-4.3.0-linux-x64.zip
1e3ad1c07f234c97d2668639910fc032677b5a7716425e478cba219708cc6a65 astcenc-4.3.0-macos-aarch64.zip
fa6d1322c59c8b1cdbfbf607bdfe546d6325c6fe067c740f704e026efdacbe43 astcenc-4.3.0-macos-x64.zip
e3b648bcbbc016f67091aef5563983cafeffc8c3feb3ef6a8d4e988fabce152a astcenc-4.3.0-windows-x64.zip