Allow inner-loop reductions with variable-length vectors

Message ID 87tvo31u3a.fsf@arm.com
State New
Series
  • Allow inner-loop reductions with variable-length vectors

Commit Message

Richard Sandiford Aug. 9, 2018, 2:40 p.m.
While working on PR 86871, I noticed we were being overly restrictive
when handling variable-length vectors.  For:

  for (i : ...)
    {
      res = ...;
      for (j : ...)
        res op= ...;
      a[i] = res;
    }

we don't need a reduction operation (although we do for double
reductions like:

  res = ...;
  for (i : ...)
    for (j : ...)
      res op= ...;
  a[i] = res;

which must still be rejected).
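
As a concrete illustration of the two shapes above, here is a minimal C sketch. The function names, the choice of + for "op", and the b[i + j] indexing are assumptions made for the example and are not part of the patch. The intent is to show why the first shape needs no final reduction operation when the outer loop is vectorized: each vector lane can carry the running value for a different i, and that value is stored straight into a[i]. In the second shape the lanes still have to be combined into a single scalar at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Inner-loop reduction: res is re-initialised on every outer iteration,
   so each value of i has its own independent accumulator.
   b must have at least n + m - 1 elements.  */
void
inner_reduc (int *restrict a, const int *restrict b, size_t n, size_t m)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; ++i)
    {
      int res = 0;
      for (size_t j = 0; j < m; ++j)
        res += b[i + j];
      a[i] = res;
    }
}

/* Double reduction: res is carried across both loops, so a vectorized
   version has to fold all lanes into one scalar after the loops.  */
int
double_reduc (const int *b, size_t n, size_t m)
{
  assert (b != NULL);
  int res = 0;
  for (size_t i = 0; i < n; ++i)
    for (size_t j = 0; j < m; ++j)
      res += b[i + j];
  return res;
}
```

For b = {1, 2, 3, 4}, n = 2 and m = 3, inner_reduc stores 6 and 9 into a, while double_reduc returns their sum, 15.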

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf and
x86_64-linux-gnu.  OK to install?

Richard


2018-08-09  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-loop.c (vectorizable_reduction): Allow inner-loop
	reductions for variable-length vectors.

gcc/testsuite/
	* gcc.target/aarch64/sve/reduc_8.c: New test.

Comments

Richard Biener Aug. 9, 2018, 3:42 p.m. | #1
On August 9, 2018 4:40:41 PM GMT+02:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
>While working on PR 86871, I noticed we were being overly restrictive
>when handling variable-length vectors.  For:
>
>  for (i : ...)
>    {
>      res = ...;
>      for (j : ...)
>        res op= ...;
>      a[i] = res;
>    }
>
>we don't need a reduction operation (although we do for double
>reductions like:
>
>  res = ...;
>  for (i : ...)
>    for (j : ...)
>      res op= ...;
>  a[i] = res;
>
>which must still be rejected).
>
>Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf and
>x86_64-linux-gnu.  OK to install?

OK. 

Richard. 

>Richard
>
>
>2018-08-09  Richard Sandiford  <richard.sandiford@arm.com>
>
>gcc/
>	* tree-vect-loop.c (vectorizable_reduction): Allow inner-loop
>	reductions for variable-length vectors.
>
>gcc/testsuite/
>	* gcc.target/aarch64/sve/reduc_8.c: New test.
>
>Index: gcc/tree-vect-loop.c
>===================================================================
>--- gcc/tree-vect-loop.c	2018-08-01 16:14:50.227052736 +0100
>+++ gcc/tree-vect-loop.c	2018-08-09 15:38:35.230258362 +0100
>@@ -6711,6 +6711,7 @@ vectorizable_reduction (stmt_vec_info st
>     }
> 
>   if (reduction_type != EXTRACT_LAST_REDUCTION
>+      && (!nested_cycle || double_reduc)
>       && reduc_fn == IFN_LAST
>       && !nunits_out.is_constant ())
>     {
>Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c
>===================================================================
>--- /dev/null	2018-07-26 10:26:13.137955424 +0100
>+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c	2018-08-09 15:38:35.230258362 +0100
>@@ -0,0 +1,20 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -ftree-vectorize" } */
>+
>+int
>+reduc (int *restrict a, int *restrict b, int *restrict c)
>+{
>+  for (int i = 0; i < 100; ++i)
>+    {
>+      int res = 0;
>+      for (int j = 0; j < 100; ++j)
>+	if (b[i + j] != 0)
>+	  res = c[i + j];
>+      a[i] = res;
>+    }
>+}
>+
>+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-9]+\.s, } 1 } } */
>+/* We ought to use the CMPNE result for the SEL too.  */
>+/* { dg-final { scan-assembler-not {\tcmpeq\tp[0-9]+\.s, } { xfail *-*-* } } } */
>+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s, } 1 } } */

Patch

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2018-08-01 16:14:50.227052736 +0100
+++ gcc/tree-vect-loop.c	2018-08-09 15:38:35.230258362 +0100
@@ -6711,6 +6711,7 @@  vectorizable_reduction (stmt_vec_info st
     }
 
   if (reduction_type != EXTRACT_LAST_REDUCTION
+      && (!nested_cycle || double_reduc)
       && reduc_fn == IFN_LAST
       && !nunits_out.is_constant ())
     {
Index: gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c
===================================================================
--- /dev/null	2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c	2018-08-09 15:38:35.230258362 +0100
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+int
+reduc (int *restrict a, int *restrict b, int *restrict c)
+{
+  for (int i = 0; i < 100; ++i)
+    {
+      int res = 0;
+      for (int j = 0; j < 100; ++j)
+	if (b[i + j] != 0)
+	  res = c[i + j];
+      a[i] = res;
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-9]+\.s, } 1 } } */
+/* We ought to use the CMPNE result for the SEL too.  */
+/* { dg-final { scan-assembler-not {\tcmpeq\tp[0-9]+\.s, } { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s, } 1 } } */
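
The effect of the one-line change to the guard in vectorizable_reduction can be modelled as a standalone predicate. The sketch below is not GCC code. The function name and its boolean parameters are hypothetical stand-ins for the state the real condition tests: has_reduc_fn stands for reduc_fn != IFN_LAST and nunits_constant for nunits_out.is_constant (). The body of the if is not shown in the hunk, so the sketch assumes, from the commit message, that it rejects the reduction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched guard: returns true when the
   reduction would be rejected because a final reduction operation is
   needed but none is available for a variable-length vector.  */
bool
rejects_variable_length_reduction (bool extract_last, bool nested_cycle,
                                   bool double_reduc, bool has_reduc_fn,
                                   bool nunits_constant)
{
  return (!extract_last
          && (!nested_cycle || double_reduc)
          && !has_reduc_fn
          && !nunits_constant);
}

/* Walk through the cases discussed in the commit message.  */
void
check_model (void)
{
  /* Inner-loop reduction without a reduction function: no longer rejected.  */
  assert (!rejects_variable_length_reduction (false, true, false, false, false));
  /* Double reduction: still rejected.  */
  assert (rejects_variable_length_reduction (false, true, true, false, false));
  /* Ordinary single-loop reduction: rejected, as before the patch.  */
  assert (rejects_variable_length_reduction (false, false, false, false, false));
  /* Constant-length vectors never reach this rejection.  */
  assert (!rejects_variable_length_reduction (false, true, true, false, true));
}
```

Under these assumptions, the only combination whose result changes with the added && (!nested_cycle || double_reduc) term is the first one: a nested cycle that is not a double reduction.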