Message ID | ZbkWAKenx6eXMx13@arm.com |
---|---|
State | New |
Series | aarch64: Avoid out-of-range shrink-wrapped saves [PR111677] |

Alex Coplan <alex.coplan@arm.com> writes:
> Hi,
>
> The PR shows us ICEing due to an unrecognizable TFmode save emitted by
> aarch64_process_components. The problem is that for T{I,F,D}mode we
> conservatively require mems to be in range for x-register ldp/stp. That
> is because (at least for TImode) it can be allocated to both GPRs and
> FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
> a q-register load/store.
>
> As Richard pointed out in the PR, aarch64_get_separate_components
> already checks that the offsets are suitable for a single load, so we
> just need to choose a mode in aarch64_reg_save_mode that gives the full
> q-register range. In this patch, we choose V16QImode as an alternative
> 16-byte "bag-of-bits" mode that doesn't have the artificial range
> restrictions imposed on T{I,F,D}mode.
>
> For T{F,D}mode in GCC 15 I think we could consider relaxing the
> restriction imposed in aarch64_classify_address, as AFAIK T{F,D}mode can
> only be allocated to FPRs (unlike TImode). But such a change seems too
> invasive to consider for GCC 14 at this stage (let alone backports).

GPRs can hold all three, due to the way aarch64_hard_regno_mode_ok is
defined. (They can also hold individual Advanced SIMD vectors.)
But the ABI says that TFmode is passed in FPRs, so I agree that
it seems better to optimise for the FPR range. Same for TDmode.

> Fortunately the new flexible load/store pair patterns in GCC 14 allow
> this mode change to work without further changes. The backports are
> more involved as we need to adjust the load/store pair handling to cater
> for V16QImode in a few places.
>
> Note that for the testcase we are relying on the torture options to add
> -funroll-loops at -O3 which is necessary to trigger the ICE on trunk
> (but not on the 13 branch).
>
> Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?
>
> Thanks,
> Alex
>
> gcc/ChangeLog:
>
>     PR target/111677
>     * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
>     V16QImode for the full 16-byte FPR saves in the vector PCS case.
>
> gcc/testsuite/ChangeLog:
>
>     PR target/111677
>     * gcc.target/aarch64/torture/pr111677.c: New test.

OK, thanks.

Richard

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a37d47b243e..4556b8dd504 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -2361,7 +2361,7 @@ aarch64_reg_save_mode (unsigned int regno)
       case ARM_PCS_SIMD:
         /* The vector PCS saves the low 128 bits (which is the full
            register on non-SVE targets). */
-        return TFmode;
+        return V16QImode;
 
       case ARM_PCS_SVE:
         /* Use vectors of DImode for registers that need frame
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
new file mode 100644
index 00000000000..6bb640c42c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-options "-ffast-math -fstack-protector-strong -fopenmp" } */
+typedef struct {
+  long size_z;
+  int width;
+} dt_bilateral_t;
+typedef float dt_aligned_pixel_t[4];
+#pragma omp declare simd
+void dt_bilateral_splat(dt_bilateral_t *b) {
+  float *buf;
+  long offsets[8];
+  for (; b;) {
+    int firstrow;
+    for (int j = firstrow; j; j++)
+      for (int i; i < b->width; i++) {
+        dt_aligned_pixel_t contrib;
+        for (int k = 0; k < 4; k++)
+          buf[offsets[k]] += contrib[k];
+      }
+    float *dest;
+    for (int j = (long)b; j; j++) {
+      float *src = (float *)b->size_z;
+      for (int i = 0; i < (long)b; i++)
+        dest[i] += src[i];
+    }
+  }
+}
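
For readers unfamiliar with the range mismatch discussed in the thread, the following is a rough sketch in C of the immediate-offset ranges involved. It is not GCC's code: the helper names are hypothetical, and the "conservative" rule is only a simplified model of the T{I,F,D}mode restriction described above (the offset must suit an x-register ldp/stp as well as a single q-register load/store). The numeric ranges are the architectural ones for LDP/STP of X registers (signed 7-bit immediate scaled by 8), LDR/STR of a Q register (unsigned 12-bit immediate scaled by 16) and LDUR/STUR of a Q register (signed 9-bit unscaled immediate).

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; these are NOT functions
   from GCC's aarch64 backend.  */

/* LDP/STP of X registers: signed 7-bit immediate, scaled by 8.  */
static bool x_pair_offset_ok(long off) {
  return off % 8 == 0 && off >= -512 && off <= 504;
}

/* LDR/STR of a Q register: unsigned 12-bit immediate, scaled by 16.  */
static bool q_scaled_offset_ok(long off) {
  return off % 16 == 0 && off >= 0 && off <= 65520;
}

/* LDUR/STUR of a Q register: signed 9-bit immediate, unscaled.  */
static bool q_unscaled_offset_ok(long off) {
  return off >= -256 && off <= 255;
}

/* A single q-register load/store may use either addressing form.  */
static bool q_single_offset_ok(long off) {
  return q_scaled_offset_ok(off) || q_unscaled_offset_ok(off);
}

/* Simplified model of the conservative T{I,F,D}mode rule from the
   thread: the offset must work for the GPR case (x-register pair)
   and for the FPR case (single q-register access).  */
static bool conservative_16byte_offset_ok(long off) {
  return x_pair_offset_ok(off) && q_single_offset_ok(off);
}
```

Under this model, an offset of 512 is accepted by `q_single_offset_ok` but rejected by `conservative_16byte_offset_ok`, since it falls just outside the x-register pair range. That is the kind of gap the patch avoids by saving in V16QImode, which is only subject to the q-register range.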