Message ID | 20210816031437.15856-1-samuel@sholland.org |
---|---|
State | Superseded |
Delegated to: | Tom Rini |
Series | ARM: Prevent the compiler from using NEON registers |
On Sun, 15 Aug 2021 22:14:37 -0500
Samuel Holland <samuel@sholland.org> wrote:

Hi,

in general I think the patch makes sense, and we should use that option
since we also specify -msoft-float.

> For ARMv8-A, NEON is standard,

It should be noted that the ARMv8-A architecture itself treats FP and
AdvSIMD as optional, and little cores like Cortex-A53 make this even an
integration option [1]. This also gives another reason for this patch,
as we cannot assume NEON support for *every* core we are compiling for
(even though most A53s out there seem to include NEON).
Anyway GCC decides to include both +fp and +simd when the basic (and
probably default) "armv8-a" is used for -march [2], so we must indeed
restrict it explicitly, when we want to avoid it.

[1] https://developer.arm.com/documentation/ddi0500/j/Introduction/Implementation-options
[2] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html -march=name

> so the compiler can use it even when no
> special target flags are provided. For example, it can use stores from
> NEON registers to zero-initialize large structures. GCC 11 decides to
> do this inside the DRAM init code for the Allwinner H6, which breaks
> boot on that platform, as NEON is not available in SPL.

And that brings up the question: why?
The Cortex cores in all Allwinner SoCs support NEON, and we always clear
CPTR_EL3 in start.S, so it should be usable.
So I did some experiments, and I guess it's our old friend "unaligned
access" again, because SIMD instructions themselves work (movi, str q0).
But the SPL runs with the MMU off, so everything is device memory, and
natural alignment is mandatory, even with SCTLR.A cleared.
"stp q0, q0, [x0]" worked when x0 was 16-byte aligned, but hung when it
was not. The same applied to "stur q0, [x0]", which is used with an
unaligned offset in the generated code (https://tpaste.us/qPEw).

So this deserves some more research, for instance to find out whether
GCC ignores -mstrict-align here.

Cheers,
Andre

> Fix this by
> restricting the compiler to using GPRs only, not vector registers.
>
> Signed-off-by: Samuel Holland <samuel@sholland.org>
> ---
>  arch/arm/config.mk | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm/config.mk b/arch/arm/config.mk
> index 16c63e12667..964c6b026ec 100644
> --- a/arch/arm/config.mk
> +++ b/arch/arm/config.mk
> @@ -25,6 +25,7 @@ endif
>
>  PLATFORM_RELFLAGS += -fno-common -ffixed-r9
>  PLATFORM_RELFLAGS += $(call cc-option, -msoft-float) \
> +		     $(call cc-option,-mgeneral-regs-only) \
>  		     $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,))
>
>  # LLVM support
diff --git a/arch/arm/config.mk b/arch/arm/config.mk
index 16c63e12667..964c6b026ec 100644
--- a/arch/arm/config.mk
+++ b/arch/arm/config.mk
@@ -25,6 +25,7 @@ endif
 
 PLATFORM_RELFLAGS += -fno-common -ffixed-r9
 PLATFORM_RELFLAGS += $(call cc-option, -msoft-float) \
+		     $(call cc-option,-mgeneral-regs-only) \
 		     $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,))
 
 # LLVM support
For ARMv8-A, NEON is standard, so the compiler can use it even when no
special target flags are provided. For example, it can use stores from
NEON registers to zero-initialize large structures. GCC 11 decides to
do this inside the DRAM init code for the Allwinner H6, which breaks
boot on that platform, as NEON is not available in SPL. Fix this by
restricting the compiler to using GPRs only, not vector registers.

Signed-off-by: Samuel Holland <samuel@sholland.org>
---
 arch/arm/config.mk | 1 +
 1 file changed, 1 insertion(+)
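For illustration only, here is a minimal C sketch of the kind of code the commit message describes: zero-initializing a structure large enough that a compiler targeting plain "armv8-a" may choose to emit vector-register stores for it. The struct, its fields, and both function names are hypothetical and are not taken from the U-Boot DRAM init code. The alignment helper only restates the 16-byte condition observed in the reply for "stp q0, q0, [x0]".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block; NOT the real Allwinner H6 DRAM struct.
 * The odd-sized array means members are not all at 16-byte offsets. */
struct dram_params_example {
    uint32_t clk;
    uint32_t type;
    uint8_t  timing[61];
    uint32_t flags;
};

/* Zero the whole structure, then set one field.
 * Without -mgeneral-regs-only, an AArch64 compiler is free to lower the
 * zeroing into stores from q registers; with the flag it has to use
 * general-purpose registers instead. */
void init_params_example(struct dram_params_example *p, uint32_t clk)
{
    *p = (struct dram_params_example){ 0 };
    p->clk = clk;
}

/* Returns 1 when a 16-byte store to addr would be naturally aligned,
 * which is the case reported as working with the MMU off. */
int is_q_store_aligned(uintptr_t addr)
{
    return (addr & 15u) == 0;
}
```

Compiling such a function for AArch64 with and without -mgeneral-regs-only and comparing the disassembly is one way to check whether vector stores are generated; the exact output depends on the compiler version and optimization level.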