[3/6] Avoid accounting for non-existent vector loop versioning

Message ID mpty2wus8hn.fsf@arm.com
State New
Series Optionally pick the cheapest loop_vec_info

Commit Message

Richard Sandiford Nov. 5, 2019, 2:28 p.m. UTC
vect_analyze_loop_costing uses two profitability thresholds: a runtime
one and a static compile-time one.  The runtime one is simply the point
at which the vector loop is cheaper than the scalar loop, while the
static one also takes into account the cost of choosing between the
scalar and vector loops at runtime.  We compare this static cost against
the expected execution frequency to decide whether it's worth generating
any vector code at all.

However, we never reclaimed the cost of applying the runtime threshold
if it turned out that the vector code can always be used.  And we only
know whether that's true once we've calculated what the runtime
threshold would be.
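
As an illustration of the adjustment described above, here is a minimal
standalone sketch.  It is not the GCC code itself: LoopCostInfo and
static_threshold are hypothetical stand-ins for loop_vec_info and for the
relevant part of vect_analyze_loop_costing, and the field names are
assumptions chosen to echo the macros used in the patch.

```cpp
// Hypothetical model of the loop_vec_info fields that matter here.
struct LoopCostInfo {
  bool niters_known;              // iteration count known at compile time
  unsigned cost_model_threshold;  // runtime profitability threshold (th)
  unsigned vf_for_cost;           // vectorization factor used for costing
  bool requires_versioning;       // some other reason to version the loop
  bool peeling_for_niter;         // an epilogue loop is needed
  bool peeling_for_alignment;     // a prologue loop is needed
};

// Same condition as vect_apply_runtime_profitability_check_p in the patch:
// a runtime check is only applied when the iteration count is unknown and
// the threshold is at least the vectorization factor.
bool apply_runtime_profitability_check_p(const LoopCostInfo &info) {
  return !info.niters_known
         && info.cost_model_threshold >= info.vf_for_cost;
}

// Return the static threshold to compare against the expected execution
// frequency.  If nothing forces a runtime choice between the scalar and
// vector loops, drop the cost of that choice and fall back to the plain
// break-even point.
unsigned static_threshold(const LoopCostInfo &info,
                          unsigned min_profitable_iters,
                          unsigned min_profitable_estimate) {
  if (min_profitable_estimate > min_profitable_iters
      && !info.requires_versioning
      && !info.peeling_for_niter
      && !info.peeling_for_alignment
      && !apply_runtime_profitability_check_p(info))
    return min_profitable_iters;
  return min_profitable_estimate;
}
```

For example, with an unknown iteration count, a threshold of 2 and a
vectorization factor of 4, no runtime check is applied, so the sketch
returns min_profitable_iters rather than min_profitable_estimate.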


2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vectorizer.h (vect_apply_runtime_profitability_check_p):
	New function.
	* tree-vect-loop-manip.c (vect_loop_versioning): Use it.
	* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
	(vect_transform_loop): Likewise.
	(vect_analyze_loop_costing): Don't take the cost of versioning
	into account for the static profitability threshold if it turns
	out that no versioning is needed.

Comments

Richard Biener Nov. 6, 2019, 12:05 p.m. UTC | #1
On Tue, Nov 5, 2019 at 3:28 PM Richard Sandiford
<Richard.Sandiford@arm.com> wrote:
>
> vect_analyze_loop_costing uses two profitability thresholds: a runtime
> one and a static compile-time one.  The runtime one is simply the point
> at which the vector loop is cheaper than the scalar loop, while the
> static one also takes into account the cost of choosing between the
> scalar and vector loops at runtime.  We compare this static cost against
> the expected execution frequency to decide whether it's worth generating
> any vector code at all.
>
> However, we never reclaimed the cost of applying the runtime threshold
> if it turned out that the vector code can always be used.  And we only
> know whether that's true once we've calculated what the runtime
> threshold would be.

OK.

>
> 2019-11-04  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-vectorizer.h (vect_apply_runtime_profitability_check_p):
>         New function.
>         * tree-vect-loop-manip.c (vect_loop_versioning): Use it.
>         * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
>         (vect_transform_loop): Likewise.
>         (vect_analyze_loop_costing): Don't take the cost of versioning
>         into account for the static profitability threshold if it turns
>         out that no versioning is needed.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2019-11-05 11:14:42.786884473 +0000
> +++ gcc/tree-vectorizer.h       2019-11-05 14:19:33.829371745 +0000
> @@ -1557,6 +1557,17 @@ vect_get_scalar_dr_size (dr_vec_info *dr
>    return tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_info->dr))));
>  }
>
> +/* Return true if LOOP_VINFO requires a runtime check for whether the
> +   vector loop is profitable.  */
> +
> +inline bool
> +vect_apply_runtime_profitability_check_p (loop_vec_info loop_vinfo)
> +{
> +  unsigned int th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> +  return (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +         && th >= vect_vf_for_cost (loop_vinfo));
> +}
> +
>  /* Source location + hotness information. */
>  extern dump_user_location_t vect_location;
>
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> --- gcc/tree-vect-loop-manip.c  2019-11-05 10:38:31.838181047 +0000
> +++ gcc/tree-vect-loop-manip.c  2019-11-05 14:19:33.825371773 +0000
> @@ -3173,8 +3173,7 @@ vect_loop_versioning (loop_vec_info loop
>      = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
>    unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>
> -  if (th >= vect_vf_for_cost (loop_vinfo)
> -      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +  if (vect_apply_runtime_profitability_check_p (loop_vinfo)
>        && !ordered_p (th, versioning_threshold))
>      cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
>                              build_int_cst (TREE_TYPE (scalar_loop_iters),
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2019-11-05 11:14:42.782884501 +0000
> +++ gcc/tree-vect-loop.c        2019-11-05 14:19:33.829371745 +0000
> @@ -1689,6 +1689,24 @@ vect_analyze_loop_costing (loop_vec_info
>        return 0;
>      }
>
> +  /* The static profitability threshold min_profitable_estimate includes
> +     the cost of having to check at runtime whether the scalar loop
> +     should be used instead.  If it turns out that we don't need or want
> +     such a check, the threshold we should use for the static estimate
> +     is simply the point at which the vector loop becomes more profitable
> +     than the scalar loop.  */
> +  if (min_profitable_estimate > min_profitable_iters
> +      && !LOOP_REQUIRES_VERSIONING (loop_vinfo)
> +      && !LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
> +      && !LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
> +      && !vect_apply_runtime_profitability_check_p (loop_vinfo))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location, "no need for a runtime"
> +                        " choice between the scalar and vector loops\n");
> +      min_profitable_estimate = min_profitable_iters;
> +    }
> +
>    HOST_WIDE_INT estimated_niter;
>
>    /* If we are vectorizing an epilogue then we know the maximum number of
> @@ -2225,8 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>
>        /*  Use the same condition as vect_transform_loop to decide when to use
>           the cost to determine a versioning threshold.  */
> -      if (th >= vect_vf_for_cost (loop_vinfo)
> -         && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +      if (vect_apply_runtime_profitability_check_p (loop_vinfo)
>           && ordered_p (th, niters_th))
>         niters_th = ordered_max (poly_uint64 (th), niters_th);
>
> @@ -8268,14 +8285,13 @@ vect_transform_loop (loop_vec_info loop_
>       run at least the (estimated) vectorization factor number of times
>       checking is pointless, too.  */
>    th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> -  if (th >= vect_vf_for_cost (loop_vinfo)
> -      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> +  if (vect_apply_runtime_profitability_check_p (loop_vinfo))
>      {
> -       if (dump_enabled_p ())
> -         dump_printf_loc (MSG_NOTE, vect_location,
> -                          "Profitability threshold is %d loop iterations.\n",
> -                          th);
> -       check_profitability = true;
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "Profitability threshold is %d loop iterations.\n",
> +                        th);
> +      check_profitability = true;
>      }
>
>    /* Make sure there exists a single-predecessor exit bb.  Do this before
Patch

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2019-11-05 11:14:42.786884473 +0000
+++ gcc/tree-vectorizer.h	2019-11-05 14:19:33.829371745 +0000
@@ -1557,6 +1557,17 @@  vect_get_scalar_dr_size (dr_vec_info *dr
   return tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_info->dr))));
 }
 
+/* Return true if LOOP_VINFO requires a runtime check for whether the
+   vector loop is profitable.  */
+
+inline bool
+vect_apply_runtime_profitability_check_p (loop_vec_info loop_vinfo)
+{
+  unsigned int th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
+  return (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+	  && th >= vect_vf_for_cost (loop_vinfo));
+}
+
 /* Source location + hotness information. */
 extern dump_user_location_t vect_location;
 
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c	2019-11-05 10:38:31.838181047 +0000
+++ gcc/tree-vect-loop-manip.c	2019-11-05 14:19:33.825371773 +0000
@@ -3173,8 +3173,7 @@  vect_loop_versioning (loop_vec_info loop
     = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
   unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
 
-  if (th >= vect_vf_for_cost (loop_vinfo)
-      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+  if (vect_apply_runtime_profitability_check_p (loop_vinfo)
       && !ordered_p (th, versioning_threshold))
     cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
 			     build_int_cst (TREE_TYPE (scalar_loop_iters),
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2019-11-05 11:14:42.782884501 +0000
+++ gcc/tree-vect-loop.c	2019-11-05 14:19:33.829371745 +0000
@@ -1689,6 +1689,24 @@  vect_analyze_loop_costing (loop_vec_info
       return 0;
     }
 
+  /* The static profitability threshold min_profitable_estimate includes
+     the cost of having to check at runtime whether the scalar loop
+     should be used instead.  If it turns out that we don't need or want
+     such a check, the threshold we should use for the static estimate
+     is simply the point at which the vector loop becomes more profitable
+     than the scalar loop.  */
+  if (min_profitable_estimate > min_profitable_iters
+      && !LOOP_REQUIRES_VERSIONING (loop_vinfo)
+      && !LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
+      && !LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+      && !vect_apply_runtime_profitability_check_p (loop_vinfo))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "no need for a runtime"
+			 " choice between the scalar and vector loops\n");
+      min_profitable_estimate = min_profitable_iters;
+    }
+
   HOST_WIDE_INT estimated_niter;
 
   /* If we are vectorizing an epilogue then we know the maximum number of
@@ -2225,8 +2243,7 @@  vect_analyze_loop_2 (loop_vec_info loop_
 
       /*  Use the same condition as vect_transform_loop to decide when to use
 	  the cost to determine a versioning threshold.  */
-      if (th >= vect_vf_for_cost (loop_vinfo)
-	  && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      if (vect_apply_runtime_profitability_check_p (loop_vinfo)
 	  && ordered_p (th, niters_th))
 	niters_th = ordered_max (poly_uint64 (th), niters_th);
 
@@ -8268,14 +8285,13 @@  vect_transform_loop (loop_vec_info loop_
      run at least the (estimated) vectorization factor number of times
      checking is pointless, too.  */
   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
-  if (th >= vect_vf_for_cost (loop_vinfo)
-      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+  if (vect_apply_runtime_profitability_check_p (loop_vinfo))
     {
-	if (dump_enabled_p ())
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "Profitability threshold is %d loop iterations.\n",
-			   th);
-	check_profitability = true;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "Profitability threshold is %d loop iterations.\n",
+			 th);
+      check_profitability = true;
     }
 
   /* Make sure there exists a single-predecessor exit bb.  Do this before