Message ID | 20220118104338.2081259-1-giulio.benetti@benettiengineering.com |
---|---|
Headers | show |
Series | Use the best FPU strategies on 32-bits Arm Cortex | expand |
Hello, On Tue, 18 Jan 2022 11:43:10 +0100 Giulio Benetti <giulio.benetti@benettiengineering.com> wrote: > this patchset aims to enable the best FPU strategy for every board with > 32-bits Arm Cortex actually present in Buildroot. I don't own these boards > so I can't test these changes. What is about Allwinner doesn't worry me > because I've tested a lot of cases with Olimex boards, but the other > changes are still to be tested. > > So I ask to the board maintainers to test the patches that involve their > boards if possible. Anyway I've checked well all the SoCs Datasheet and I > think I've made it correctly, even if in some of them it's not specified > if VFPv4 means -D32. I assume it like that because otherwise -D16 is > usually specified. Do we have a good understand of what -mfpu=neon-vfpv4 does? Your patch series basically converts many defconfigs to use this -mfpu value, but it's not clear to me how it works. What does it mean to combine NEON and VFPv4 instructions? Regarding VFPv4 D16 vs. D32, https://developer.arm.com/documentation/den0018/a/Compiling-NEON-Instructions/GCC-command-line-options/Option-to-specify-the-FPU tells us: VFPv3 and VFPv4 implementations provide 32 double-precision registers. However, when NEON unit is not present, the top sixteen registers (D16-D31) become optional. This is shown by the -d16 in the option name, which means that the top sixteen D registers are not available. So, my understanding is that when NEON is available, the VFPv4 is guaranteed to have the 32 double-precision registers (D32). Thomas
Thomas, Giulio, All, On 2022-01-22 15:32 +0100, Thomas Petazzoni spake thusly: > On Tue, 18 Jan 2022 11:43:10 +0100 > Giulio Benetti <giulio.benetti@benettiengineering.com> wrote: > > > this patchset aims to enable the best FPU strategy for every board with > > 32-bits Arm Cortex actually present in Buildroot. I don't own these boards > > so I can't test these changes. What is about Allwinner doesn't worry me > > because I've tested a lot of cases with Olimex boards, but the other > > changes are still to be tested. > > > > So I ask to the board maintainers to test the patches that involve their > > boards if possible. Anyway I've checked well all the SoCs Datasheet and I > > think I've made it correctly, even if in some of them it's not specified > > if VFPv4 means -D32. I assume it like that because otherwise -D16 is > > usually specified. > > Do we have a good understand of what -mfpu=neon-vfpv4 does? Your patch > series basically converts many defconfigs to use this -mfpu value, but > it's not clear to me how it works. What does it mean to combine NEON > and VFPv4 instructions? The gcc man page states that specifying Neon as part of the fpu setting has no effect, unless the -funsafe-math-optimizations is also specified, because Neon is not compliant zith IEEE 754: If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=neon), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision. So it is my understanding that using Neon is not a good idea overall. It should only be requested on-demand by people who know what they are doing, and most probably, be setting appropriate CFLAGS on a per-package basis. Additionally, changing the default FPU setting on those defconfigs is not really interesting. Indeed, the base system that (most of) those defconfig build are probably not exercising the FPU setting much (the busybox login and shell are probably not using much FPU insns). So, for me, this series is a no-go, first because it has not been tsted on actual hardware, second because some of the changes introduce a dubious feature which is actually a no-op at best, or worse will generate incorrect code. Regards, Yann E. MORIN. > Regarding VFPv4 D16 vs. D32, > https://developer.arm.com/documentation/den0018/a/Compiling-NEON-Instructions/GCC-command-line-options/Option-to-specify-the-FPU > tells us: > > VFPv3 and VFPv4 implementations provide 32 double-precision > registers. However, when NEON unit is not present, the top sixteen > registers (D16-D31) become optional. This is shown by the -d16 in the > option name, which means that the top sixteen D registers are not > available. > > So, my understanding is that when NEON is available, the VFPv4 is > guaranteed to have the 32 double-precision registers (D32). > > Thomas > -- > Thomas Petazzoni, co-owner and CEO, Bootlin > Embedded Linux and Kernel engineering and training > https://bootlin.com
Hi Yann, Thomas, All, On 25/01/22 21:55, Yann E. MORIN wrote: > Thomas, Giulio, All, > > On 2022-01-22 15:32 +0100, Thomas Petazzoni spake thusly: >> On Tue, 18 Jan 2022 11:43:10 +0100 >> Giulio Benetti <giulio.benetti@benettiengineering.com> wrote: >> >>> this patchset aims to enable the best FPU strategy for every board with >>> 32-bits Arm Cortex actually present in Buildroot. I don't own these boards >>> so I can't test these changes. What is about Allwinner doesn't worry me >>> because I've tested a lot of cases with Olimex boards, but the other >>> changes are still to be tested. >>> >>> So I ask to the board maintainers to test the patches that involve their >>> boards if possible. Anyway I've checked well all the SoCs Datasheet and I >>> think I've made it correctly, even if in some of them it's not specified >>> if VFPv4 means -D32. I assume it like that because otherwise -D16 is >>> usually specified. >> >> Do we have a good understand of what -mfpu=neon-vfpv4 does? Your patch >> series basically converts many defconfigs to use this -mfpu value, but >> it's not clear to me how it works. What does it mean to combine NEON >> and VFPv4 instructions? > > The gcc man page states that specifying Neon as part of the fpu setting > has no effect, unless the -funsafe-math-optimizations is also specified, > because Neon is not compliant zith IEEE 754: > > If the selected floating-point hardware includes the NEON extension > (e.g. -mfpu=neon), note that floating-point operations are not > generated by GCC's auto-vectorization pass unless > -funsafe-math-optimizations is also specified. This is because NEON > hardware does not fully implement the IEEE 754 standard for > floating-point arithmetic (in particular denormal values are treated > as zero), so the use of NEON instructions may lead to a loss of > precision. > > So it is my understanding that using Neon is not a good idea overall. It > should only be requested on-demand by people who know what they are > doing, and most probably, be setting appropriate CFLAGS on a per-package > basis. > > Additionally, changing the default FPU setting on those defconfigs is > not really interesting. Indeed, the base system that (most of) those > defconfig build are probably not exercising the FPU setting much (the > busybox login and shell are probably not using much FPU insns). > > So, for me, this series is a no-go, first because it has not been tsted > on actual hardware, second because some of the changes introduce a > dubious feature which is actually a no-op at best, or worse will > generate incorrect code. I agree with you, it's too dangerous. I mean, I've tested on all Allwinner Axx series, but haven't check that in the end the assembly code corresponding. What I plan to do later, so now you can reject this patchset, is to use the smallest package that uses fpu instructions for sure, then enable for example: -mfpu=vfpv4 check AND -mfpu=neon-vfpv4 This way I will have my real answer before eventually re-submitting while explaining this in detail. Also, once I will have cleared this topic and it comes out that fpu is not use with neon-vfpv4, then I'll send patches to remove that fpu strategy on the boards I've set it to it. I thought it was easier, instead this is an insidious topic. So yes, let's drop this and thank you all for helping me to clarify. Kind regards!