Message ID | 87tvo31u3a.fsf@arm.com |
---|---|
State | New |
Series | Allow inner-loop reductions with variable-length vectors |
On August 9, 2018 4:40:41 PM GMT+02:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
>While working on PR 86871, I noticed we were being overly restrictive
>when handling variable-length vectors. For:
>
>  for (i : ...)
>    {
>      res = ...;
>      for (j : ...)
>        res op= ...;
>      a[i] = res;
>    }
>
>we don't need a reduction operation (although we do for double
>reductions like:
>
>  res = ...;
>  for (i : ...)
>    for (j : ...)
>      res op= ...;
>  a[i] = res;
>
>which must still be rejected).
>
>Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf and
>x86_64-linux-gnu. OK to install?

OK.

Richard.

>Richard
>
>
>2018-08-09  Richard Sandiford  <richard.sandiford@arm.com>
>
>gcc/
>	* tree-vect-loop.c (vectorizable_reduction): Allow inner-loop
>	reductions for variable-length vectors.
>
>gcc/testsuite/
>	* gcc.target/aarch64/sve/reduc_8.c: New test.
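The difference between the two loop shapes can be seen by hand-simulating outer-loop vectorization in scalar C. This is a minimal sketch, not GCC's code generation: `N`, `M`, `VF` and the functions `inner_scalar`, `inner_lanes`, `double_scalar` and `double_lanes` are hypothetical names introduced only for illustration, and `VF` is fixed here even though with SVE it is unknown at compile time. In the inner-loop case each lane owns a different `i`, so its `res` is already the final `a[i]`. In the double-reduction case each lane only holds a partial result, and a horizontal combine across lanes is still needed at the end, which is the step that requires a reduction operation.

```c
/* Hypothetical sizes, chosen small for illustration.  */
#define N 8   /* outer trip count */
#define M 5   /* inner trip count */
#define VF 4  /* assumed lane count; with SVE this is not a compile-time constant */

/* Scalar reference with the same shape as reduc_8.c.  */
static void
inner_scalar (int *a, const int *b, const int *c)
{
  for (int i = 0; i < N; ++i)
    {
      int res = 0;
      for (int j = 0; j < M; ++j)
        if (b[i + j] != 0)
          res = c[i + j];
      a[i] = res;
    }
}

/* Model of outer-loop vectorization: lane l handles i = i0 + l, so each
   lane's res is stored directly and no cross-lane reduction is needed.  */
static void
inner_lanes (int *a, const int *b, const int *c)
{
  for (int i0 = 0; i0 < N; i0 += VF)
    {
      int res[VF] = { 0 };
      for (int j = 0; j < M; ++j)
        for (int l = 0; l < VF && i0 + l < N; ++l)
          /* Per-lane select under a "b != 0" predicate.  */
          res[l] = b[i0 + l + j] != 0 ? c[i0 + l + j] : res[l];
      for (int l = 0; l < VF && i0 + l < N; ++l)
        a[i0 + l] = res[l];
    }
}

/* Double reduction: one accumulator carried around both loops.  */
static int
double_scalar (const int *c)
{
  int res = 0;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      res += c[i + j];
  return res;
}

static int
double_lanes (const int *c)
{
  int part[VF] = { 0 };  /* one partial sum per lane */
  for (int i0 = 0; i0 < N; i0 += VF)
    for (int j = 0; j < M; ++j)
      for (int l = 0; l < VF && i0 + l < N; ++l)
        part[l] += c[i0 + l + j];
  /* Horizontal combine across lanes: the step that needs a reduction
     operation when the number of lanes is not known at compile time.  */
  int res = 0;
  for (int l = 0; l < VF; ++l)
    res += part[l];
  return res;
}
```

Both vectorized models agree with their scalar references, but only `double_lanes` needs the final loop over `part[]`.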
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2018-08-01 16:14:50.227052736 +0100
+++ gcc/tree-vect-loop.c	2018-08-09 15:38:35.230258362 +0100
@@ -6711,6 +6711,7 @@ vectorizable_reduction (stmt_vec_info st
     }
 
   if (reduction_type != EXTRACT_LAST_REDUCTION
+      && (!nested_cycle || double_reduc)
       && reduc_fn == IFN_LAST
       && !nunits_out.is_constant ())
     {
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c
===================================================================
--- /dev/null	2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c	2018-08-09 15:38:35.230258362 +0100
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+int
+reduc (int *restrict a, int *restrict b, int *restrict c)
+{
+  for (int i = 0; i < 100; ++i)
+    {
+      int res = 0;
+      for (int j = 0; j < 100; ++j)
+        if (b[i + j] != 0)
+          res = c[i + j];
+      a[i] = res;
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-9]+\.s, } 1 } } */
+/* We ought to use the CMPNE result for the SEL too. */
+/* { dg-final { scan-assembler-not {\tcmpeq\tp[0-9]+\.s, } { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s, } 1 } } */
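The hunk adds one conjunct to the rejection test in `vectorizable_reduction`. The sketch below models that test as a plain boolean predicate so the before/after behaviour can be checked case by case. It is only an illustration: the struct `reduc_info_model` and the functions `rejected_before` and `rejected_after` are hypothetical, and each field merely stands in for the corresponding condition in the hunk (`reduction_type`, `nested_cycle`, `double_reduc`, `reduc_fn`, `nunits_out`).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the check inspects.  */
struct reduc_info_model
{
  bool extract_last;    /* reduction_type == EXTRACT_LAST_REDUCTION */
  bool nested_cycle;    /* reduction is in the inner loop of the vectorized loop */
  bool double_reduc;    /* accumulator is also carried around the outer loop */
  bool have_reduc_fn;   /* reduc_fn != IFN_LAST */
  bool constant_nunits; /* nunits_out.is_constant () */
};

/* Rejection condition before the patch.  */
static bool
rejected_before (struct reduc_info_model r)
{
  return !r.extract_last && !r.have_reduc_fn && !r.constant_nunits;
}

/* Rejection condition after the patch: an inner-loop (nested-cycle)
   reduction that is not a double reduction is no longer rejected.  */
static bool
rejected_after (struct reduc_info_model r)
{
  return !r.extract_last
         && (!r.nested_cycle || r.double_reduc)
         && !r.have_reduc_fn
         && !r.constant_nunits;
}
```

Only the inner-loop case changes outcome; double reductions and ordinary reductions without a reduction function are still rejected for variable-length vectors.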