Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fmaf is an empty unreachable function with optimizations on cortex-m33 #123387

Open
PiJoules opened this issue Jan 17, 2025 · 3 comments
Open

fmaf is an empty unreachable function with optimizations on cortex-m33 #123387

PiJoules opened this issue Jan 17, 2025 · 3 comments
Assignees
Labels

Comments

@PiJoules
Copy link
Contributor

PiJoules commented Jan 17, 2025

The generic fmaf implementation in llvm-libc is

namespace LIBC_NAMESPACE_DECL {

LLVM_LIBC_FUNCTION(float, fmaf, (float x, float y, float z)) {
  return fputil::fma<float>(x, y, z);
}

} // namespace LIBC_NAMESPACE_DECL

On cortex-m33, fputil::fma<float> is

template <> LIBC_INLINE float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}

However, clang replaces the __builtin_fmaf call with a call to a normal fmaf at -O0

; Function Attrs: mustprogress noinline nounwind optnone
define linkonce_odr hidden noundef float @_ZN22__llvm_libc_20_0_0_git6fputil3fmaIffEET_T0_S3_S3_(float noundef %x, float noundef %y, float noundef %z) #0 comdat {
entry:
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  %z.addr = alloca float, align 4
  store float %x, ptr %x.addr, align 4
  store float %y, ptr %y.addr, align 4
  store float %z, ptr %z.addr, align 4
  %0 = load float, ptr %x.addr, align 4
  %1 = load float, ptr %y.addr, align 4
  %2 = load float, ptr %z.addr, align 4
  %call = call float @fmaf(float noundef %0, float noundef %1, float noundef %2) #2
  ret float %call
}

Since fmaf just calls fputil::fma<float>, we end up with mutual recursion. With -Os or -O3, clang removes the function body for fmaf

define hidden noundef float @fmaf(float %x, float %y, float %z) #0 {
entry:
  unreachable
}

and since the entire function is unreachable, this just ends up becoming an empty function

fmaf:
        .fnstart
@ %bb.0:                                @ %entry
.Lfunc_end0:
        .size   fmaf, .Lfunc_end0-fmaf
        .cantunwind
        .fnend

It seems either (1) clang is incorrectly lowering the __builtin_fmaf to fmaf or (2) llvm-libc should have tighter checks on ensuring that a builtin call won't be lowered to a libcall. I would assume this shouldn't be lowered to a libcall in the first place since __ARM_FEATURE_FMA is defined, but perhaps it might be permissible for any compiler to lower a builtin call to a normal function call?

This can be reproduced by building armv8m.main-unknown-none-eabi runtimes from the fuchsia cache file. Alternatively, the fma.cpp code expands to

namespace [[gnu::visibility("hidden")]] __llvm_libc_20_0_0_git {

namespace fputil {
inline float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}
}  // namespace fputil

float fmaf(float x, float y, float z);

decltype(__llvm_libc_20_0_0_git::fmaf) __fmaf_impl__ __asm__("fmaf");
decltype(__llvm_libc_20_0_0_git::fmaf) fmaf [[gnu::alias("fmaf")]];
float __fmaf_impl__ (float x, float y, float z) {
  return fputil::fma(x, y, z);
}

}  // namespace __llvm_libc_20_0_0_git

which can be compiled with

clang++ --target=armv8m.main-none-eabi -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Os -ffreestanding -nostdlibinc -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti /tmp/test.cc
@PiJoules PiJoules added libc bug Indicates an unexpected problem or unintended behavior labels Jan 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/issue-subscribers-bug

Author: None (PiJoules)

The generic `fmaf` implementation in llvm-libc is
namespace LIBC_NAMESPACE_DECL {

LLVM_LIBC_FUNCTION(float, fmaf, (float x, float y, float z)) {
  return fputil::fma&lt;float&gt;(x, y, z);
}

} // namespace LIBC_NAMESPACE_DECL

On cortex-m33, fputil::fma&lt;float&gt; is

template &lt;&gt; LIBC_INLINE float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}

However, clang replaces the __builtin_fmaf call with a call to a normal fmaf at -O0

; Function Attrs: mustprogress noinline nounwind optnone
define linkonce_odr hidden noundef float @<!-- -->_ZN22__llvm_libc_20_0_0_git6fputil3fmaIffEET_T0_S3_S3_(float noundef %x, float noundef %y, float noundef %z) #<!-- -->0 comdat {
entry:
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  %z.addr = alloca float, align 4
  store float %x, ptr %x.addr, align 4
  store float %y, ptr %y.addr, align 4
  store float %z, ptr %z.addr, align 4
  %0 = load float, ptr %x.addr, align 4
  %1 = load float, ptr %y.addr, align 4
  %2 = load float, ptr %z.addr, align 4
  %call = call float @<!-- -->fmaf(float noundef %0, float noundef %1, float noundef %2) #<!-- -->2
  ret float %call
}

Since fmaf just calls fputil::fma&lt;float&gt;, we end up with a circular dependency. With -Os or -O3, clang removes the function body for fmaf

define hidden noundef float @<!-- -->fmaf(float %x, float %y, float %z) #<!-- -->0 {
entry:
  unreachable
}

and since the entire function is unreachable, this just ends up becoming an empty function

fmaf:
        .fnstart
@ %bb.0:                                @ %entry
.Lfunc_end0:
        .size   fmaf, .Lfunc_end0-fmaf
        .cantunwind
        .fnend

It seems either (1) clang is incorrectly lowering the __builtin_fmaf to fmaf or (2) llvm-libc should have tighter checks on ensuring that a builtin call won't be lowered to a libcall. I would assume this shouldn't be lowered to a libcall in the first place since __ARM_FEATURE_FMA is defined, but perhaps it might be permissible for any compiler to lower a builtin call to a normal function call?

This can be reproduced by building armv8m.main-unknown-none-eabi runtimes from the fuchsia cache file. Alternatively, the fma.cpp code expands to

namespace [[gnu::visibility("hidden")]] __llvm_libc_20_0_0_git {

namespace fputil {
inline float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}
}  // namespace fputil

float fmaf(float x, float y, float z);

decltype(__llvm_libc_20_0_0_git::fmaf) __fmaf_impl__ __asm__("fmaf");
decltype(__llvm_libc_20_0_0_git::fmaf) fmaf [[gnu::alias("fmaf")]];
float __fmaf_impl__ (float x, float y, float z) {
  return fputil::fma(x, y, z);
}

}  // namespace __llvm_libc_20_0_0_git

which can be compiled with

clang++ --target=armv8m.main-none-eabi -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Os -ffreestanding -nostdlibinc -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti /tmp/test.cc

@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/issue-subscribers-libc

Author: None (PiJoules)

The generic `fmaf` implementation in llvm-libc is
namespace LIBC_NAMESPACE_DECL {

LLVM_LIBC_FUNCTION(float, fmaf, (float x, float y, float z)) {
  return fputil::fma&lt;float&gt;(x, y, z);
}

} // namespace LIBC_NAMESPACE_DECL

On cortex-m33, fputil::fma&lt;float&gt; is

template &lt;&gt; LIBC_INLINE float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}

However, clang replaces the __builtin_fmaf call with a call to a normal fmaf at -O0

; Function Attrs: mustprogress noinline nounwind optnone
define linkonce_odr hidden noundef float @<!-- -->_ZN22__llvm_libc_20_0_0_git6fputil3fmaIffEET_T0_S3_S3_(float noundef %x, float noundef %y, float noundef %z) #<!-- -->0 comdat {
entry:
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  %z.addr = alloca float, align 4
  store float %x, ptr %x.addr, align 4
  store float %y, ptr %y.addr, align 4
  store float %z, ptr %z.addr, align 4
  %0 = load float, ptr %x.addr, align 4
  %1 = load float, ptr %y.addr, align 4
  %2 = load float, ptr %z.addr, align 4
  %call = call float @<!-- -->fmaf(float noundef %0, float noundef %1, float noundef %2) #<!-- -->2
  ret float %call
}

Since fmaf just calls fputil::fma&lt;float&gt;, we end up with a circular dependency. With -Os or -O3, clang removes the function body for fmaf

define hidden noundef float @<!-- -->fmaf(float %x, float %y, float %z) #<!-- -->0 {
entry:
  unreachable
}

and since the entire function is unreachable, this just ends up becoming an empty function

fmaf:
        .fnstart
@ %bb.0:                                @ %entry
.Lfunc_end0:
        .size   fmaf, .Lfunc_end0-fmaf
        .cantunwind
        .fnend

It seems either (1) clang is incorrectly lowering the __builtin_fmaf to fmaf or (2) llvm-libc should have tighter checks on ensuring that a builtin call won't be lowered to a libcall. I would assume this shouldn't be lowered to a libcall in the first place since __ARM_FEATURE_FMA is defined, but perhaps it might be permissible for any compiler to lower a builtin call to a normal function call?

This can be reproduced by building armv8m.main-unknown-none-eabi runtimes from the fuchsia cache file. Alternatively, the fma.cpp code expands to

namespace [[gnu::visibility("hidden")]] __llvm_libc_20_0_0_git {

namespace fputil {
inline float fma(float x, float y, float z) {
  return __builtin_fmaf(x, y, z);
}
}  // namespace fputil

float fmaf(float x, float y, float z);

decltype(__llvm_libc_20_0_0_git::fmaf) __fmaf_impl__ __asm__("fmaf");
decltype(__llvm_libc_20_0_0_git::fmaf) fmaf [[gnu::alias("fmaf")]];
float __fmaf_impl__ (float x, float y, float z) {
  return fputil::fma(x, y, z);
}

}  // namespace __llvm_libc_20_0_0_git

which can be compiled with

clang++ --target=armv8m.main-none-eabi -mthumb -mfloat-abi=softfp -march=armv8m.main+fp+dsp -mcpu=cortex-m33 -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Os -ffreestanding -nostdlibinc -fno-builtin -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti /tmp/test.cc

@EugeneZelenko EugeneZelenko removed the bug Indicates an unexpected problem or unintended behavior label Jan 17, 2025
@PiJoules
Copy link
Contributor Author

I believe for the short term, we can replace the __builtin_fmaf with either a vfma.f32 instruction or the normal equivalent C code. For a freestanding environment, it should still be permissible for the compiler to lower the __builtin_ equivalent for a libc function to the actual libc function.

For the long term, it might be useful to have an audit of all the __builtin_ functions that have libc-equivalents and ensuring that they don't end up calling that libc-equivalent. Ideally for each __builtin_ with a libc-equivalent, we would have some target-checking preprocessor checks that ensure it gets lowered to either (1) a target-specific intrinsic (ie. it doesn't have a libc-equivalent), or (2) inline asm to the actual intended instruction, or (3) language-level source code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants