Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm_float_to_q31: why not to use VCVT.F32.S32 on Cortex-M4F #212

Open
AlanCui4080 opened this issue Oct 21, 2024 · 9 comments
Open

arm_float_to_q31: why not to use VCVT.F32.S32 on Cortex-M4F #212

AlanCui4080 opened this issue Oct 21, 2024 · 9 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@AlanCui4080
Copy link

Hi,

As what we know, Cortex-M4F implemented a little set of FPU instructions including VCVT.F32.S32, and GCC did have a builtin intrinsic for it.
But the question is why the intrinsic is only enabled when NEON available in GCC, and also, why not to use it in arm_float_to_q31.

Alan.

@christophe0606 christophe0606 added question Further information is requested review Under review enhancement New feature or request and removed review Under review labels Oct 21, 2024
@christophe0606
Copy link
Contributor

@AlanCui4080 I don't see any reason. Perhaps the function was first developped for M0 and was not upgraded to support all other architectures.

@AlanCui4080
Copy link
Author

AlanCui4080 commented Oct 21, 2024

@christophe0606

Sorry for my mistake, it's ok to include arm_neon.h on M4F, even there is only a subset of NEON implemented on M4F, but the vcvt_s32_f32 will call a SIMD instruction "FCVTZS Vd.2S,Vn.2S" which is invalid on M4F. The only vaild one is "VCVT.F32.S32 Sd,Sm #fbits" included in FPv4-SP.
I'm testing it on my STM32G4, if that be ok, i will put a pull request.

@christophe0606
Copy link
Contributor

christophe0606 commented Oct 21, 2024

@AlanCui4080 I won't include arm_neon.h for M4F.

There are thus two possibilities : this intrinsics is supported by the Arm C language extensions (ACLE). Unfortunately, there are still too many compilers that are not fully implementing all of the ACLE.

That's why most of the intrinsics used by CMSIS-DSP are coming from CMSIS-Core (part of CMSIS-6). And probably, you'll need to open an issue on CMSIS-Core if you want this new intrinsic to be supported

@AlanCui4080
Copy link
Author

AlanCui4080 commented Oct 21, 2024

@christophe0606
I figure out it, ACLE have no single precision version (FPv4-SP) for vcvt_s32_f32, it's either d-register or q-register in ACLE.
So I do believe someone forget to add it into ACLE, because this instruction is unpopular and not wide used.

And the CMSIS-Core seems to be not include any part of FPU instructions, where should i put it in.

Note: following inline asm is proved usable as a replacement of arm_float_to_q31.

asm("vcvt.s32.f32 %0, %0, #31": "+t"(thetaf_divpi.f)::);

@AlanCui4080
Copy link
Author

@christophe0606
Hi,
Is that ok to add a inline assembly specilized for M4? If so, i will open a pull request.

@christophe0606
Copy link
Contributor

@AlanCui4080 asm definitions are provided through CMSIS-Core by defining new intrinsics macros in the compiler headers

So you can try to open a github issue on the CMSIS repository.

@JonatanAntoni
Copy link
Member

@christophe0606, @AlanCui4080, where does this new instrinsic belong to?
Is this one the correct location? https://github.com/ARM-software/CMSIS_6/blob/8c4dc58928b3347f6aa98b6fb2bf6770f32a72b7/CMSIS/Core/Include/cmsis_gcc.h#L877-L992

Please feel free to raise a PR adding the required define. Please bare in mind that this is location not specific to Cortex-M4F. Nor is is specialized for Arm-v7EM but is pulled for A- and R-class as well. If this is strictly for Arm-v7EM, it would need to go into https://github.com/ARM-software/CMSIS_6/blob/main/CMSIS/Core/Include/m-profile/cmsis_gcc_m.h et al.

@christophe0606
Copy link
Contributor

@JonatanAntoni @AlanCui4080 It must be added to each compiler and would be for Cortex-M only. (Since the intrinsics is defined with Neon header).

I don't have compiler specific implementation (except in case of major compiler bug) in CMSIS-DSP so this new intrinsics should be made available for all compiler so that CMSIS-DSP can still be built with all of them.

Otherwise, it is now possible to define functions as WEAK for linker in CMSIS-DSP. So, another possibility is for a user to replace a specific function with a different implementation if for some reason we can't do it on our side.

@AlanCui4080
Copy link
Author

@christophe0606, @AlanCui4080, where does this new instrinsic belong to? Is this one the correct location? https://github.com/ARM-software/CMSIS_6/blob/8c4dc58928b3347f6aa98b6fb2bf6770f32a72b7/CMSIS/Core/Include/cmsis_gcc.h#L877-L992

@JonatanAntoni
No, that is not a DSP instruction, it's belong to FPU. I have never seen intrinsics for FPUs in CMSIS-Core.
I think that's because CMSIS-DSP is designed for fixed-point at beginning, and FPU instructions are duty of compliers.

Meanwhile, there is no complier include intrinsics for FPU instuctions (But have MVE&NEON) to itself. Well, so i think people all focused on newer architecture such one have MVE. So armv7 old FPU just be forgot like 8087 insturctions.

See https://developer.arm.com/documentation/ddi0439/b/BEHJADED, since there is no absolute value statement, square root statement, or q31 to float statement in C language, compliers will never generate such instructions to accelerate program running. For newlib, they wrote a inline asm for sqrtf() see https://github.com/bminor/newlib/blob/master/newlib/libm/machine/arm/e_sqrt.c, this fact also pointed out that "there is no complier include intrinsics for FPU instuctions". By the way, i didn't see they did the same thing for "FABS.F32" or "FCVT.XX.XX", may because just no one cared.

These FPU instructions are orphans, i have no idea about where to add them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants