diff mbox series

[vect] Account for epilogue's peeling for gaps when checking if we have enough niters for epilogue

Message ID 6b8319c5-9bb4-78f4-7abc-92ad91bdd781@arm.com
State New
Headers show
Series [vect] Account for epilogue's peeling for gaps when checking if we have enough niters for epilogue | expand

Commit Message

Andre Vieira (lists) Nov. 8, 2019, 2:48 p.m. UTC
Hi,

As I mentioned in the patch to disable epilogue vectorization for loops 
with SIMDUID set, there were still some aarch64 libgomp failures. This 
patch fixes those.

The problem was that we were vectorizing a reduction that was only using 
one of the parts from a complex number, creating data accesses with 
gaps. For this we set PEELING_FOR_GAPS which forces us to peel an extra 
scalar iteration.

What was happening in the testcase I looked at was that we had a known 
niters of 10. The first VF was 4, leaving 10 % 4 = 2 scalar iterations. 
The epilogue had VF 2, which meant the current code thought we could do 
it. However, given the PEELING_FOR_GAPS it would create a scalar 
epilogue and we would end up doing too many iterations, surprisingly 12 
as I think the code assumed we hadn't created said epilogue.

I ran a local check where I upped the iterations of the fortran test to 
11 and I see GCC vectorizing the epilogue with VF = 2 and a scalar 
epilogue for one iteration, so that looks good too. I have transformed 
it into a test that would reproduce the issue in C and without openacc 
so I can run it in gcc's normal testsuite more easily.

Bootstrap on aarch64 and x86_64.

Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* tree-vect-loop-manip.c (vect_do_peeling): Take epilogue gaps
         into account when checking if there are enough iterations to
         vectorize epilogue.

gcc/testsuite/ChangeLog:
2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>

	* gcc.dg/vect/vect-reduc-epilogue-gaps.c: New test.

Comments

Richard Biener Nov. 11, 2019, 7:08 a.m. UTC | #1
On Fri, 8 Nov 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> As I mentioned in the patch to disable epilogue vectorization for loops with
> SIMDUID set, there were still some aarch64 libgomp failures. This patch fixes
> those.
> 
> The problem was that we were vectorizing a reduction that was only using one
> of the parts from a complex number, creating data accesses with gaps. For this
> we set PEELING_FOR_GAPS which forces us to peel an extra scalar iteration.
> 
> What was happening in the testcase I looked at was that we had a known niters
> of 10. The first VF was 4, leaving 10 % 4 = 2 scalar iterations. The epilogue
> had VF 2, which meant the current code thought we could do it. However, given
> the PEELING_FOR_GAPS it would create a scalar epilogue and we would end up
> doing too many iterations, surprisingly 12 as I think the code assumed we
> hadn't created said epilogue.
> 
> I ran a local check where I upped the iterations of the fortran test to 11 and
> I see GCC vectorizing the epilogue with VF = 2 and a scalar epilogue for one
> iteration, so that looks good too. I have transformed it into a test that
> would reproduce the issue in C and without openacc so I can run it in gcc's
> normal testsuite more easily.
> 
> Bootstrap on aarch64 and x86_64.
> 
> Is this OK for trunk?

OK.

Richard.

> Cheers,
> Andre
> 
> gcc/ChangeLog:
> 2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
> 	* tree-vect-loop-manip.c (vect_do_peeling): Take epilogue gaps
>         into account when checking if there are enough iterations to
>         vectorize epilogue.
> 
> gcc/testsuite/ChangeLog:
> 2019-11-08  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
> 	* gcc.dg/vect/vect-reduc-epilogue-gaps.c: New test.
> 
>
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-epilogue-gaps.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-epilogue-gaps.c
new file mode 100644
index 0000000000000000000000000000000000000000..dc5704f5607fc94fd02036e5db4a9bd37ce5169b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-epilogue-gaps.c
@@ -0,0 +1,45 @@ 
+/* { dg-options "-O3 -fno-vect-cost-model" } */
+struct {
+    float real;
+    float img;
+} g[11];
+
+float __attribute__ ((noclone))
+foo_11 (void)
+{
+  float sum = 0.0;
+  for (int i = 0; i < 11; ++i)
+    sum += g[i].real;
+  return sum;
+}
+
+float __attribute__ ((noclone))
+foo_10 (void)
+{
+  float sum = 0.0;
+  for (int i = 0; i < 10; ++i)
+    sum += g[i].real;
+  return sum;
+}
+
+int main (void)
+{
+  float check_10 = 0.0;
+  float check_11 = 0.0;
+  for (int i = 0; i < 11; ++i)
+    {
+      asm volatile ("" : : : "memory");
+      g[i].real = (float) i;
+      g[i].img = (float) -i;
+      if (i < 10)
+	check_10 += (float) i;
+      check_11 += (float) i;
+    }
+
+  if (foo_10 () != check_10)
+    __builtin_abort ();
+  if (foo_11 () != check_11)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 54f3ccf3ec373b5621e7778e6e80bab853a57687..559d59bbe78738e53e7e6c1d64e7f87eed255d76 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2530,9 +2530,11 @@  vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	= eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
 
       unsigned int ratio;
+      unsigned int epilogue_gaps
+	= LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
       while (!(constant_multiple_p (loop_vinfo->vector_size,
 				    epilogue_vinfo->vector_size, &ratio)
-	       && eiters >= lowest_vf / ratio))
+	       && eiters >= lowest_vf / ratio + epilogue_gaps))
 	{
 	  delete epilogue_vinfo;
 	  epilogue_vinfo = NULL;
@@ -2543,6 +2545,7 @@  vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	    }
 	  epilogue_vinfo = loop_vinfo->epilogue_vinfos[0];
 	  loop_vinfo->epilogue_vinfos.ordered_remove (0);
+	  epilogue_gaps = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo);
 	}
     }
   /* Prolog loop may be skipped.  */