`fmaf` is an empty unreachable function with optimizations on cortex-m33 #123387

PiJoules · 2025-01-17T19:06:13Z

The generic `fmaf` implementation in llvm-libc is:

namespace LIBC_NAMESPACE_DECL {

LLVM_LIBC_FUNCTION(float, fmaf, (float x, float y, float z)) {
  return fputil::fma<float>(x, y, z);
}

} // namespace LIBC_NAMESPACE_DECL

On Cortex-M33, fputil::fma<float> is specialized as:

template <> LIBC_INLINE float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}

However, at -O0 clang lowers the __builtin_fmaf call to a call to the ordinary fmaf library function:

; Function Attrs: mustprogress noinline nounwind optnone
define linkonce_odr hidden noundef float @_ZN22__llvm_libc_20_0_0_git6fputil3fmaIffEET_T0_S3_S3_(float noundef %x, float noundef %y, float noundef %z) #0 comdat {
entry:
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  %z.addr = alloca float, align 4
  store float %x, ptr %x.addr, align 4
  store float %y, ptr %y.addr, align 4
  store float %z, ptr %z.addr, align 4
  %0 = load float, ptr %x.addr, align 4
  %1 = load float, ptr %y.addr, align 4
  %2 = load float, ptr %z.addr, align 4
  %call = call float @fmaf(float noundef %0, float noundef %1, float noundef %2) #2
  ret float %call
}

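Since fmaf just calls fputil::fma<float>, which in turn calls fmaf again, we end up with mutual recursion. With -Os or -O3, clang removes the function body for fmaf, presumably because unconditional infinite recursion with no side effects is undefined behavior in C++ (note the mustprogress attribute above):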

define hidden noundef float @fmaf(float %x, float %y, float %z) #0 {
entry:
  unreachable
}

and since the entire function is unreachable, it ends up as an empty function with no instructions at all:

fmaf:
        .fnstart
@ %bb.0:                                @ %entry
.Lfunc_end0:
        .size   fmaf, .Lfunc_end0-fmaf
        .cantunwind
        .fnend
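The cycle described above can be illustrated with a small host-side sketch. All names here (`sketch::entry_fmaf`, `sketch::helper_fma`, `g_calls`, `kMaxCalls`) are hypothetical stand-ins, not llvm-libc code, and the call counter exists only so the demonstration terminates; the real code has no such cut-off, which is why the optimizer is free to treat the body as unreachable.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical call counter and limit, present only so this demo terminates.
static std::size_t g_calls = 0;
constexpr std::size_t kMaxCalls = 8;

// Stand-in for the public fmaf entry point.
float entry_fmaf(float x, float y, float z);

// Stand-in for fputil::fma<float> once the builtin has been lowered to a
// plain call to fmaf: it simply forwards back to the entry point.
float helper_fma(float x, float y, float z) {
  return entry_fmaf(x, y, z);
}

float entry_fmaf(float x, float y, float z) {
  // Artificial cut-off (assumption for illustration only). Without it,
  // entry_fmaf and helper_fma call each other forever.
  if (++g_calls >= kMaxCalls) {
    return x * y + z;
  }
  return helper_fma(x, y, z);
}

}  // namespace sketch
```

Calling `sketch::entry_fmaf` once bounces between the two functions until the artificial limit is hit; no call ever reaches a real multiply-add on its own.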

It seems either (1) clang is incorrectly lowering __builtin_fmaf to fmaf, or (2) llvm-libc should have tighter checks to ensure that a builtin call won't be lowered to a libcall. I would assume this shouldn't be lowered to a libcall in the first place since __ARM_FEATURE_FMA is defined, but perhaps it is permissible for a compiler to lower a builtin call to a normal function call?

This can be reproduced by building the armv8m.main-unknown-none-eabi runtimes from the Fuchsia cache file. Alternatively, the fma.cpp code expands to:

namespace [[gnu::visibility("hidden")]] __llvm_libc_20_0_0_git {

namespace fputil {
inline float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}
}  // namespace fputil

float fmaf(float x, float y, float z);

decltype(__llvm_libc_20_0_0_git::fmaf) __fmaf_impl__ __asm__("fmaf");
decltype(__llvm_libc_20_0_0_git::fmaf) fmaf [[gnu::alias("fmaf")]];
float __fmaf_impl__ (float x, float y, float z) {
  return fputil::fma(x, y, z);
}

}  // namespace __llvm_libc_20_0_0_git

which can be compiled with:

clang++ --target=armv8m.main-none-eabi -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Os -ffreestanding -nostdlibinc -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti /tmp/test.cc


llvmbot · 2025-01-17T19:06:43Z

@llvm/issue-subscribers-bug


llvmbot · 2025-01-17T19:06:43Z

@llvm/issue-subscribers-libc


PiJoules · 2025-01-17T20:44:32Z

I believe that, for the short term, we can replace the __builtin_fmaf with either a vfma.f32 instruction or the equivalent plain C code. For a freestanding environment, it should still be permissible for the compiler to lower the __builtin_ equivalent of a libc function to the actual libc function.

For the long term, it might be useful to audit all the __builtin_ functions that have libc equivalents and ensure that they don't end up calling that libc equivalent. Ideally, for each __builtin_ with a libc equivalent, we would have target-specific preprocessor checks that ensure it gets lowered to either (1) a target-specific intrinsic (i.e. one that doesn't have a libc equivalent), (2) inline asm for the actual intended instruction, or (3) language-level source code.

PiJoules added the libc and bug labels Jan 17, 2025

EugeneZelenko removed the bug label Jan 17, 2025

PiJoules assigned lntue Jan 17, 2025
