Message ID | 0612833c-daa9-4008-a376-07fb81d80811@arm.com |
---|---|
State | New |
Headers | show |
Series | aarch64: Fix ACLE SME streaming mode error in neon-sve-bridge | expand |
Richard Ball <richard.ball@arm.com> writes: > When using LTO, handling the pragma for sme before the pragma > for the neon-sve-bridge caused the following error on svset_neonq, > in the neon-sve-bridge.c test. > > error: ACLE function '0' can only be called when SME streaming mode is enabled. > > Handling the pragmas the other way around fixes this. Why is that the right fix though? And where did the weird function name in the error message come from? Trying it locally, I think the patch works for the testcase because it mimics the testcase's particular include order. But the patch regresses LTO for: #include <arm_sme.h> int main (void) { __builtin_printf ("%d\n", (int) svcntsb ()); return 0; } because the handle_*() calls no longer match the arm_sme.h include order. I guess this is where assigning dynamic ids breaks down. It worked well for one header file, and arguably for two (given that one required the other). But now we need a slightly different approach. Some ideas for fixes: (1) Treat arm_neon_sve_bridge.h and arm_sme.h as separate enums in aarch64_builtin_class. (2) Subdivide AARCH64_BUILTIN_SVE internally within arm_sve.h into arm_sve.h, arm_neon_sve_bridge.h and arm_sme.h, using the low 2 bits of the function code. (3) Manually reserve a number of functions for each header file and assert that the reserved number is big enough. So arm_sve.h function codes would start at 0, arm_sme.h at N (for some "nice" N that is bigger than the number of functions in arm_sve.h), and arm_neon_sve_bridge.h at N+M, etc. (4) Like (3), but continue to maintain ids dynamically. Have a way of cycling through all the functions in a header file *without* declaring them, and just pushing nulls to registered_functions instead. I think (1) goes against the current design, since aarch64_builtin_class really selects a framework rather than a header file. (2) would mean maintaining separate datastructures for each header file. (3) could be wasteful, since we'd either have padding elements in registered_functions, or we'd need to convert registered_functions to some other datastructure. So I'm leaning towards (4). Thanks, Richard > > No regressions on aarch64-none-elf. > > > gcc/ChangeLog: > > * config/aarch64/aarch64-sve-builtins.cc (init_builtins): > Re-order handle pragma functions for lto. > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc > index c2f1486315f02c3e38f808b3124a9b4cf49a4708..10f1df233ca9ae4e4a639a45a25db1f055a7d7c3 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc > @@ -4511,8 +4511,8 @@ init_builtins () > if (in_lto_p) > { > handle_arm_sve_h (); > - handle_arm_sme_h (); > handle_arm_neon_sve_bridge_h (); > + handle_arm_sme_h (); > } > } >
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index c2f1486315f02c3e38f808b3124a9b4cf49a4708..10f1df233ca9ae4e4a639a45a25db1f055a7d7c3 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -4511,8 +4511,8 @@ init_builtins () if (in_lto_p) { handle_arm_sve_h (); - handle_arm_sme_h (); handle_arm_neon_sve_bridge_h (); + handle_arm_sme_h (); } }