Highlights since version 0.0.4.2:
Use __builtin_shuffle for 'generic' SIMD code
Some compilers, including GCC, provide a built-in function to shuffle vector elements efficiently without resorting to platform-specific SIMD intrinsics. If it is available, we now use this function instead of a hand-written byte-wise implementation in the 'generic' implementations of the SIMD routines. The non-generic implementations keep their hand-written intrinsics, since the code generated by __builtin_shuffle for them is slightly more complicated than the hand-written intrinsics code.
See commit 724dddb for more information, including how compiler support is checked in the ./configure script.
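As a rough illustration of the approach (not the code from the commit), the sketch below shuffles a 16-byte vector with __builtin_shuffle when it is available and falls back to a byte-wise loop otherwise. The names v16u8, shuffle_bytes, reverse16 and reverse16_selfcheck, and the HAVE_BUILTIN_SHUFFLE macro, are hypothetical; the real project relies on its ./configure check rather than on the compiler guess made here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 16 unsigned bytes in one vector, declared with the GCC vector extension. */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* HAVE_BUILTIN_SHUFFLE is a hypothetical macro standing in for whatever
 * ./configure defines. When it is unset we guess from the compiler. */
#if !defined(HAVE_BUILTIN_SHUFFLE)
# if defined(__GNUC__) && !defined(__clang__)
#  define HAVE_BUILTIN_SHUFFLE 1
# else
#  define HAVE_BUILTIN_SHUFFLE 0
# endif
#endif

/* result[k] = table[idx[k] mod 16] for each of the 16 lanes. */
static v16u8 shuffle_bytes(v16u8 table, v16u8 idx)
{
#if HAVE_BUILTIN_SHUFFLE
    /* The compiler picks a suitable instruction sequence for the target. */
    return __builtin_shuffle(table, idx);
#else
    /* Byte-wise fallback with the same semantics. */
    uint8_t t[16], i[16], r[16];
    v16u8 out;
    memcpy(t, &table, 16);
    memcpy(i, &idx, 16);
    for (int k = 0; k < 16; k++)
        r[k] = t[i[k] & 15];
    memcpy(&out, r, 16);
    return out;
#endif
}

/* Reverses the 16 bytes of src into dst by shuffling with indices 15..0. */
static void reverse16(const uint8_t src[16], uint8_t dst[16])
{
    v16u8 table, idx, out;
    uint8_t rev[16];
    for (int k = 0; k < 16; k++)
        rev[k] = (uint8_t)(15 - k);
    memcpy(&table, src, 16);
    memcpy(&idx, rev, 16);
    out = shuffle_bytes(table, idx);
    memcpy(dst, &out, 16);
}

/* Small self-check: reversing 0..15 yields 15..0. */
static void reverse16_selfcheck(void)
{
    uint8_t src[16], dst[16];
    for (int k = 0; k < 16; k++)
        src[k] = (uint8_t)k;
    reverse16(src, dst);
    for (int k = 0; k < 16; k++)
        assert(dst[k] == 15 - k);
}
```

Both branches compute the same result, so callers do not need to know which one was compiled in.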
See: 724dddb
Detect and use system-provided __get_cpuid_count
The x86 system headers shipped with GCC 6.3 now provide a definition of __get_cpuid_count in cpuid.h. We also define this function in a cbits module (for compilers whose headers do not provide an implementation), so the two definitions conflict.
./configure now tests for the declaration, and if the system provides it, the system version of the routine is used.
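A minimal sketch of how such a guard might look in C follows; it is not the code from the commit. It assumes the check result arrives as a HAVE_DECL___GET_CPUID_COUNT macro (the name Autoconf's AC_CHECK_DECLS would produce, which may differ from what the project uses). The names example_get_cpuid_count and example_cpuid_selfcheck are hypothetical; the sketch uses a separate wrapper name so that it compiles whether or not the header already declares __get_cpuid_count.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
# include <cpuid.h>
#endif

/* Queries CPUID for the given leaf and subleaf.
 * Returns 1 and fills the four registers on success, 0 if the leaf is
 * unsupported or the target is not x86. */
static int example_get_cpuid_count(unsigned int leaf, unsigned int subleaf,
                                   unsigned int *eax, unsigned int *ebx,
                                   unsigned int *ecx, unsigned int *edx)
{
#if defined(__x86_64__) || defined(__i386__)
# if HAVE_DECL___GET_CPUID_COUNT
    /* The system header provides the routine: use it as-is. */
    return __get_cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);
# else
    /* Fallback built from the older cpuid.h primitives. */
    unsigned int max = __get_cpuid_max(leaf & 0x80000000u, 0);
    if (max == 0 || max < leaf)
        return 0;
    __cpuid_count(leaf, subleaf, *eax, *ebx, *ecx, *edx);
    return 1;
# endif
#else
    (void)leaf; (void)subleaf;
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return 0;
#endif
}

/* Small self-check: an absurdly high basic leaf must be rejected. */
static void example_cpuid_selfcheck(void)
{
    unsigned int a = 0, b = 0, c = 0, d = 0;
    assert(example_get_cpuid_count(0x7fffffffu, 0, &a, &b, &c, &d) == 0);
}
```

When the macro is undefined, the preprocessor treats it as 0 and the fallback branch is compiled, which matches the case of a compiler whose headers lack the routine.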
See: 0458a96
Various
- Dependency version bounds of optparse-applicative are widened to support the current Stackage nightly. Related API changes are handled as well. See 495369d.