v0.8.2

@mr-c released this 02 May 10:07
· 62 commits to master since this release

SIMDe 0.8.2

Summary

  • Start of RISCV64 optimized implementation using the RVV1.0 vector extension! Thank you @eric900115 @howjmay @zengdage
  • 62 of the Arm NEON intrinsics added in SIMDe 0.8.0 (from the FCVTZS/FCVTMS/FCVTPS/FCVTNS families) had to be removed
    because they did not exactly match the specification and real hardware. This reduces coverage of the NEON functions from 100% to 99.07%.
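
For readers unfamiliar with the instruction families named above: they are the AArch64 floating-point to signed-integer conversions, which differ only in rounding direction (FCVTZS toward zero, FCVTMS toward minus infinity, FCVTPS toward plus infinity, FCVTNS to nearest with ties to even). The scalar sketch below illustrates those four rounding modes for a float to int32 conversion, assuming NaN converts to 0 and out-of-range inputs saturate. All names (`sketch_*`) are hypothetical and are not SIMDe functions; this is not the removed code and makes no claim about which details the removed intrinsics got wrong.

```c
#include <stdint.h>

/* Hypothetical rounding-mode selector, for illustration only. */
enum sketch_round {
  SKETCH_TOWARD_ZERO,   /* FCVTZS */
  SKETCH_TOWARD_MINUS,  /* FCVTMS */
  SKETCH_TOWARD_PLUS,   /* FCVTPS */
  SKETCH_NEAREST_EVEN   /* FCVTNS */
};

/* Convert a float to int32_t with an explicit rounding mode.
 * Assumed edge cases: NaN gives 0, out-of-range values saturate. */
static int32_t sketch_cvt_f32_s32(float x, enum sketch_round mode) {
  if (x != x) return 0;                       /* NaN */
  if (x >= 2147483648.0f) return INT32_MAX;   /* too large, includes +inf */
  if (x <= -2147483648.0f) return INT32_MIN;  /* too small, includes -inf */

  int64_t t = (int64_t) x;                /* C casts truncate toward zero */
  double frac = (double) x - (double) t;  /* same sign as x, magnitude < 1 */

  switch (mode) {
    case SKETCH_TOWARD_ZERO:
      break;
    case SKETCH_TOWARD_MINUS:
      if (frac < 0.0) t -= 1;
      break;
    case SKETCH_TOWARD_PLUS:
      if (frac > 0.0) t += 1;
      break;
    case SKETCH_NEAREST_EVEN:
      /* On an exact tie, move away from zero only if t is odd. */
      if (frac > 0.5 || (frac == 0.5 && (t & 1))) t += 1;
      else if (frac < -0.5 || (frac == -0.5 && (t & 1))) t -= 1;
      break;
  }
  return (int32_t) t;
}

/* One wrapper per instruction family, named after the mnemonic. */
static int32_t sketch_fcvtzs(float x) { return sketch_cvt_f32_s32(x, SKETCH_TOWARD_ZERO); }
static int32_t sketch_fcvtms(float x) { return sketch_cvt_f32_s32(x, SKETCH_TOWARD_MINUS); }
static int32_t sketch_fcvtps(float x) { return sketch_cvt_f32_s32(x, SKETCH_TOWARD_PLUS); }
static int32_t sketch_fcvtns(float x) { return sketch_cvt_f32_s32(x, SKETCH_NEAREST_EVEN); }
```

With these helpers, an input of -2.5f gives -2 for the zero, plus and nearest-even variants and -3 for the minus variant, which shows why a portable fallback has to pick the rounding direction explicitly rather than rely on a plain C cast.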

Details

Implementation of Arm intrinsics

NEON

  • arm neon: disable some FCVTZS/FCVTMS/FCVTPS/FCVTNS family intrinsics 339ffe4 @mr-c
  • arm neon sm3: check constant range 3d34fcd @mr-c
  • arm 32 bits: native def fixes; workarounds for gcc 22900e6 @Cuda-Chen
  • x86 implementations: allow _m128 access from SSE 114c3cd @mr-c

WASM intrinsics

  • wasm x86 impl: some were incorrectly marked SSE instead of SSE2 fee149a @mr-c

x86 intrinsics

SVML

  • SSE is good enough for native __m128i and __m128d types & functions 9982b27 @mr-c

XOP

  • fix some native functions 608200b @mr-c

Arch support

arm / arm64

  • arm platform: cleanup feature detection. 08c21f3 @mr-c
  • arm: enable more intrinsic functions for armv7 416091e @zengdage

RISCV64

  • Initial Support for the RISC-V Vector Extension (RVV1.0) in ARM NEON (#1130) b4e805a @eric900115
  • arm: fix some neon2rvv intrinsic function errors 2a548e5 @zengdage
  • arm: Add neon2rvv support in vand series intrinsics dac67f3 @howjmay
  • arm: improve performance in vabd_xxx for risc-v b63ba04 @zengdage
  • arm: improve performance in vhadd_xxx for risc-v a68fa90 @zengdage

Compiler Specific

Clang

  • detect clang versions 18 & 19 ed4a5cd @mr-c
  • arm neon clang: skip vrnd native before clang v18 e647f10 @mr-c
  • apple clang arm64: ignore SHA2 be48ef8 @mr-c

Emscripten

  • use __builtin_roundeven{f,} from version 3.1.43 onwards 4379740 @mr-c

MSVC

  • x86 test msvc: really disable warning 4799,4730 487507d @mr-c
  • sse2 MSVC _mm_pause implementation for x86 8d95f83 @mr-c
  • SSE is good enough for native __m128i and __m128d types & functions 9982b27 @mr-c

Testing with Docker/Podman & CI

  • CI: don't run twice on dependabot branches 70748cd @mr-c

Cirrus CI

  • upgrade to clang-17 7ab3240 @mr-c

GitHub Actions

  • test Mac arm64 0080b28 @mr-c
  • macos: report log if there is a configuration failure. df3e930 @mr-c
  • build(deps): bump actions/checkout from 3 to 4 (#1149) 9605608 @dependabot[bot]
  • build(deps): bump codecov/codecov-action from 3 to 4 25382c1 @dependabot[bot]
  • codecov: use token 2c45dd4 @mr-c
  • Add gcc arm 32bit armv8-a test in CI 72bde75 @Cuda-Chen
  • build for AMD Bulldozer version 2 9746537 @mr-c

Packit CI

  • Drop i386 (i686) support. (#1155) cf68aaf @junaruga

Semaphore CI

  • stop testing on GCC 5 & 6, clang 3.9 & 4 due to forced upgrade to Ubuntu 20.04 9982f10 @mr-c

Misc

  • update list of fully implemented instruction sets (#1152) b568fcd @mr-c
  • typo fixes from codespell 8639fef @mr-c
  • README.md - move CLMUL to partial, list more of the CI.yml architectures 285b50d @Torinde
  • Update README.md - link to VPCLMULQDQ; mention MSA (#1157) 517da84 @Torinde
  • Update README.md (#1156) b88a66d @mr-c
  • README: two more related projects 7429dff @mr-c

New Contributors

Full Changelog: v0.8.0...v0.8.2