diff mbox

Improve vectorizer peeling for alignment costmodel

Message ID alpine.LSU.2.20.1705030957290.17885@zhemvz.fhfr.qr
State New
Headers show

Commit Message

Richard Biener May 3, 2017, 8:19 a.m. UTC
The following extends the very simplistic cost modeling I added somewhen
late in the release process to, for all unknown misaligned refs, also
apply this model for loops containing stores.

The model basically says it's useless to peel for alignment if there's
only a single DR that is affected or if, in case we'll end up using
hw-supported misaligned loads, the cost of misaligned loads is the same
as of aligned ones.  Previously we'd usually align one of the stores
with the theory that this improves (precious) store-bandwith.

Note this is only a so slightly conservative (aka less peeling).  We'll
still apply peeling for alignment if you make the testcase use +=
because then we'll align both the load and the store from v1.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-05-03  Richard Biener  <rguenther@suse.de>

	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
	When all DRs have unknown misaligned do not always peel
	when there is a store but apply the same costing model as if
	there were only loads.

	* gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New testcase.

Comments

Christophe Lyon May 5, 2017, 3:01 p.m. UTC | #1
Hi Richard,


On 3 May 2017 at 10:19, Richard Biener <rguenther@suse.de> wrote:
>
> The following extends the very simplistic cost modeling I added somewhen
> late in the release process to, for all unknown misaligned refs, also
> apply this model for loops containing stores.
>
> The model basically says it's useless to peel for alignment if there's
> only a single DR that is affected or if, in case we'll end up using
> hw-supported misaligned loads, the cost of misaligned loads is the same
> as of aligned ones.  Previously we'd usually align one of the stores
> with the theory that this improves (precious) store-bandwith.
>
> Note this is only a so slightly conservative (aka less peeling).  We'll
> still apply peeling for alignment if you make the testcase use +=
> because then we'll align both the load and the store from v1.
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2017-05-03  Richard Biener  <rguenther@suse.de>
>
>         * tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
>         When all DRs have unknown misaligned do not always peel
>         when there is a store but apply the same costing model as if
>         there were only loads.
>
>         * gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c: New testcase.
>

This patch (r247544) caused regressions on aarch64 and arm:
  - PASS now FAIL             [PASS => FAIL]:

  Executed from: gcc.dg/vect/vect.exp
    gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Alignment of access forced using peeling" 1
    gcc.dg/vect/vect-44.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Vectorizing an unaligned access" 2
    gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Alignment of
access forced using peeling" 1
    gcc.dg/vect/vect-44.c scan-tree-dump-times vect "Vectorizing an
unaligned access" 2
    gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Alignment of access forced using peeling" 1
    gcc.dg/vect/vect-50.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Vectorizing an unaligned access" 2
    gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Alignment of
access forced using peeling" 1
    gcc.dg/vect/vect-50.c scan-tree-dump-times vect "Vectorizing an
unaligned access" 2

Thanks,

Christophe

> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   (revision 247498)
> +++ gcc/tree-vect-data-refs.c   (working copy)
> @@ -1715,18 +1741,18 @@ vect_enhance_data_refs_alignment (loop_v
>              dr0 = first_store;
>          }
>
> -      /* In case there are only loads with different unknown misalignments, use
> -         peeling only if it may help to align other accesses in the loop or
> +      /* Use peeling only if it may help to align other accesses in the loop or
>          if it may help improving load bandwith when we'd end up using
>          unaligned loads.  */
>        tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
> -      if (!first_store
> -         && !STMT_VINFO_SAME_ALIGN_REFS (
> -                 vinfo_for_stmt (DR_STMT (dr0))).length ()
> +      if (STMT_VINFO_SAME_ALIGN_REFS
> +           (vinfo_for_stmt (DR_STMT (dr0))).length () == 0
>           && (vect_supportable_dr_alignment (dr0, false)
>               != dr_unaligned_supported
> -             || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
> -                 == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
> +             || (DR_IS_READ (dr0)
> +                 && (builtin_vectorization_cost (vector_load, dr0_vt, 0)
> +                     == builtin_vectorization_cost (unaligned_load,
> +                                                    dr0_vt, -1)))))
>          do_peeling = false;
>      }
>
>
> Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c    (nonexistent)
> +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c    (working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +
> +void func(double * __restrict__ v1, double * v2, unsigned n)
> +{
> +  for (unsigned i = 0; i < n; ++i)
> +    v1[i] = v2[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */
diff mbox

Patch

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 247498)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -1715,18 +1741,18 @@  vect_enhance_data_refs_alignment (loop_v
             dr0 = first_store;
         }
 
-      /* In case there are only loads with different unknown misalignments, use
-         peeling only if it may help to align other accesses in the loop or
+      /* Use peeling only if it may help to align other accesses in the loop or
 	 if it may help improving load bandwith when we'd end up using
 	 unaligned loads.  */
       tree dr0_vt = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr0)));
-      if (!first_store
-	  && !STMT_VINFO_SAME_ALIGN_REFS (
-		  vinfo_for_stmt (DR_STMT (dr0))).length ()
+      if (STMT_VINFO_SAME_ALIGN_REFS
+	    (vinfo_for_stmt (DR_STMT (dr0))).length () == 0
 	  && (vect_supportable_dr_alignment (dr0, false)
 	      != dr_unaligned_supported
-	      || (builtin_vectorization_cost (vector_load, dr0_vt, 0)
-		  == builtin_vectorization_cost (unaligned_load, dr0_vt, -1))))
+	      || (DR_IS_READ (dr0)
+		  && (builtin_vectorization_cost (vector_load, dr0_vt, 0)
+		      == builtin_vectorization_cost (unaligned_load,
+						     dr0_vt, -1)))))
         do_peeling = false;
     }
 

Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-alignpeel.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+
+void func(double * __restrict__ v1, double * v2, unsigned n)
+{
+  for (unsigned i = 0; i < n; ++i)
+    v1[i] = v2[i];
+}
+
+/* { dg-final { scan-tree-dump-not "Alignment of access forced using peeling" "vect" } } */