Fix PR80457

Submitted by James Greenhalgh on April 21, 2017, 10:41 a.m.

Details

Message ID 20170421104113.GA8839@arm.com
State New
Headers show

Commit Message

James Greenhalgh April 21, 2017, 10:41 a.m.
On Tue, Apr 18, 2017 at 05:38:48PM -0500, Bill Schmidt wrote:
> Hi,
> 
> While investigating a performance issue, I happened to notice that vectorized
> COND_EXPRs were not contributing to the vectorizer cost model.  This patch
> addresses that.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this ok for
> trunk, or should it wait for GCC 8?

Hi Bill,

I had a similar patch sitting in my tree waiting for GCC 8, with a few
differences. The main one is that you don't record the cost of the
vector conditional itself, only of the then/else branches.

I've attached what I was working on. In my patch I first generalize
vect_model_simple_cost to fix the FORNOW: Assuming maximum 2 args per
stmts, and to cost the broadcast as a broadcast rather than a vector
statiement as the conditionals need a judgement on the prologue cost
of 4 args. That means taking an extra argument for the number of arguments
a statement needs to cost which in turn means updating the rest of the
functions which call vect_model_simple_cost.

I haven't hacked on the vector cost model before, and your patch acheieves
the main part of what I was looking to do, but if you think any parts of
this are still valuable after your change, I'd be happy to rebase it once
the branch opens.

My patch was bootstrapped and tested on aarch64-none-linux-gnu with no
issues, with a ChangeLog which looks like:

gcc/

2017-04-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* tree-vect-stmts.c (vect_model_simple_cost): Model the cost
	of all arguments to a statement as scalar_to_vec operations.
	(vectorizable_call): Adjust call to vect_model_simple_cost for
	new parameter.
	(vectorizable_conversion): Likewise.
	(vectorizable_assignment): Likewise.
	(vectorizable_shift): Likewise.
	(vectorizable_operation): Likewise.
	(vectorizable_comparison): Likewise.
	(vect_is_simple_cond): Record the def types for operands.
	(vectorizable_condition): Likewise, call vect_model_simple_cost.
	* tree-vectorizer.h (vect_model_simple_cost): Add new parameter
	for statement argument count.

Thanks,
James

Comments

Richard Guenther April 21, 2017, 12:04 p.m.
On Fri, Apr 21, 2017 at 12:41 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Tue, Apr 18, 2017 at 05:38:48PM -0500, Bill Schmidt wrote:
>> Hi,
>>
>> While investigating a performance issue, I happened to notice that vectorized
>> COND_EXPRs were not contributing to the vectorizer cost model.  This patch
>> addresses that.
>>
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this ok for
>> trunk, or should it wait for GCC 8?
>
> Hi Bill,
>
> I had a similar patch sitting in my tree waiting for GCC 8, with a few
> differences. The main one is that you don't record the cost of the
> vector conditional itself, only of the then/else branches.
>
> I've attached what I was working on. In my patch I first generalize
> vect_model_simple_cost to fix the FORNOW: Assuming maximum 2 args per
> stmts, and to cost the broadcast as a broadcast rather than a vector
> statiement as the conditionals need a judgement on the prologue cost
> of 4 args. That means taking an extra argument for the number of arguments
> a statement needs to cost which in turn means updating the rest of the
> functions which call vect_model_simple_cost.
>
> I haven't hacked on the vector cost model before, and your patch acheieves
> the main part of what I was looking to do, but if you think any parts of
> this are still valuable after your change, I'd be happy to rebase it once
> the branch opens.
>
> My patch was bootstrapped and tested on aarch64-none-linux-gnu with no
> issues, with a ChangeLog which looks like:

Looks good to me as well.

Thanks,
Richard.

> gcc/
>
> 2017-04-21  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * tree-vect-stmts.c (vect_model_simple_cost): Model the cost
>         of all arguments to a statement as scalar_to_vec operations.
>         (vectorizable_call): Adjust call to vect_model_simple_cost for
>         new parameter.
>         (vectorizable_conversion): Likewise.
>         (vectorizable_assignment): Likewise.
>         (vectorizable_shift): Likewise.
>         (vectorizable_operation): Likewise.
>         (vectorizable_comparison): Likewise.
>         (vect_is_simple_cond): Record the def types for operands.
>         (vectorizable_condition): Likewise, call vect_model_simple_cost.
>         * tree-vectorizer.h (vect_model_simple_cost): Add new parameter
>         for statement argument count.
>
> Thanks,
> James
>
William J. Schmidt April 21, 2017, 8:57 p.m.
> On Apr 21, 2017, at 5:41 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> 
> On Tue, Apr 18, 2017 at 05:38:48PM -0500, Bill Schmidt wrote:
>> Hi,
>> 
>> While investigating a performance issue, I happened to notice that vectorized
>> COND_EXPRs were not contributing to the vectorizer cost model.  This patch
>> addresses that.
>> 
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  Is this ok for
>> trunk, or should it wait for GCC 8?
> 
> Hi Bill,
> 
> I had a similar patch sitting in my tree waiting for GCC 8, with a few
> differences. The main one is that you don't record the cost of the
> vector conditional itself, only of the then/else branches.
> 
> I've attached what I was working on. In my patch I first generalize
> vect_model_simple_cost to fix the FORNOW: Assuming maximum 2 args per
> stmts, and to cost the broadcast as a broadcast rather than a vector
> statiement as the conditionals need a judgement on the prologue cost
> of 4 args. That means taking an extra argument for the number of arguments
> a statement needs to cost which in turn means updating the rest of the
> functions which call vect_model_simple_cost.
> 
> I haven't hacked on the vector cost model before, and your patch acheieves
> the main part of what I was looking to do, but if you think any parts of
> this are still valuable after your change, I'd be happy to rebase it once
> the branch opens.

This looks great to me as well!  Since Richard is happy with your patch,
please just go ahead with it and I'll withdraw mine.  It's good that you're
adding this additional flexibility to the cost model.

Thanks!
Bill

> 
> My patch was bootstrapped and tested on aarch64-none-linux-gnu with no
> issues, with a ChangeLog which looks like:
> 
> gcc/
> 
> 2017-04-21  James Greenhalgh  <james.greenhalgh@arm.com>
> 
> 	* tree-vect-stmts.c (vect_model_simple_cost): Model the cost
> 	of all arguments to a statement as scalar_to_vec operations.
> 	(vectorizable_call): Adjust call to vect_model_simple_cost for
> 	new parameter.
> 	(vectorizable_conversion): Likewise.
> 	(vectorizable_assignment): Likewise.
> 	(vectorizable_shift): Likewise.
> 	(vectorizable_operation): Likewise.
> 	(vectorizable_comparison): Likewise.
> 	(vect_is_simple_cond): Record the def types for operands.
> 	(vectorizable_condition): Likewise, call vect_model_simple_cost.
> 	* tree-vectorizer.h (vect_model_simple_cost): Add new parameter
> 	for statement argument count.
> 
> Thanks,
> James
> 
> <0001-Patch-Expand-vector-cost-model-to-cover-vectorizable.patch>

Patch hide | download patch | download mbox

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index bfb7185..4b6f337 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -801,6 +801,7 @@  vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo)
 void
 vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies,
 			enum vect_def_type *dt,
+			int ndts,
 			stmt_vector_for_cost *prologue_cost_vec,
 			stmt_vector_for_cost *body_cost_vec)
 {
@@ -811,10 +812,12 @@  vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies,
   if (PURE_SLP_STMT (stmt_info))
     return;
 
-  /* FORNOW: Assuming maximum 2 args per stmts.  */
-  for (i = 0; i < 2; i++)
+  /* Cost the "broadcast" of a scalar operand in to a vector operand.
+     Use scalar_to_vec to cost the broadcast, as elsewhere in the vector
+     cost model.  */
+  for (i = 0; i < ndts; i++)
     if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
-      prologue_cost += record_stmt_cost (prologue_cost_vec, 1, vector_stmt,
+      prologue_cost += record_stmt_cost (prologue_cost_vec, 1, scalar_to_vec,
 					 stmt_info, 0, vect_prologue);
 
   /* Pass the inside-of-loop statements to the target-specific cost model.  */
@@ -2599,6 +2602,7 @@  vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   gimple *def_stmt;
   enum vect_def_type dt[3]
     = {vect_unknown_def_type, vect_unknown_def_type, vect_unknown_def_type};
+  int ndts = 3;
   gimple *new_stmt = NULL;
   int ncopies, j;
   vec<tree> vargs = vNULL;
@@ -2804,7 +2808,7 @@  vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "=== vectorizable_call ==="
                          "\n");
-      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+      vect_model_simple_cost (stmt_info, ncopies, dt, ndts, NULL, NULL);
       if (ifn != IFN_LAST && modifier == NARROW && !slp_node)
 	add_stmt_cost (stmt_info->vinfo->target_cost_data, ncopies / 2,
 		       vec_promote_demote, stmt_info, 0, vect_body);
@@ -4023,6 +4027,7 @@  vectorizable_conversion (gimple *stmt, gimple_stmt_iterator *gsi,
   tree new_temp;
   gimple *def_stmt;
   enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
+  int ndts = 2;
   gimple *new_stmt = NULL;
   stmt_vec_info prev_stmt_info;
   int nunits_in;
@@ -4301,7 +4306,7 @@  vectorizable_conversion (gimple *stmt, gimple_stmt_iterator *gsi,
       if (code == FIX_TRUNC_EXPR || code == FLOAT_EXPR)
         {
 	  STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
-	  vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+	  vect_model_simple_cost (stmt_info, ncopies, dt, ndts, NULL, NULL);
 	}
       else if (modifier == NARROW)
 	{
@@ -4610,7 +4615,8 @@  vectorizable_assignment (gimple *stmt, gimple_stmt_iterator *gsi,
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   tree new_temp;
   gimple *def_stmt;
-  enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
+  enum vect_def_type dt[1] = {vect_unknown_def_type};
+  int ndts = 1;
   int ncopies;
   int i, j;
   vec<tree> vec_oprnds = vNULL;
@@ -4710,7 +4716,7 @@  vectorizable_assignment (gimple *stmt, gimple_stmt_iterator *gsi,
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
                          "=== vectorizable_assignment ===\n");
-      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+      vect_model_simple_cost (stmt_info, ncopies, dt, ndts, NULL, NULL);
       return true;
     }
 
@@ -4822,6 +4828,7 @@  vectorizable_shift (gimple *stmt, gimple_stmt_iterator *gsi,
   machine_mode optab_op2_mode;
   gimple *def_stmt;
   enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
+  int ndts = 2;
   gimple *new_stmt = NULL;
   stmt_vec_info prev_stmt_info;
   int nunits_in;
@@ -5080,7 +5087,7 @@  vectorizable_shift (gimple *stmt, gimple_stmt_iterator *gsi,
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
                          "=== vectorizable_shift ===\n");
-      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+      vect_model_simple_cost (stmt_info, ncopies, dt, ndts, NULL, NULL);
       return true;
     }
 
@@ -5196,6 +5203,7 @@  vectorizable_operation (gimple *stmt, gimple_stmt_iterator *gsi,
   gimple *def_stmt;
   enum vect_def_type dt[3]
     = {vect_unknown_def_type, vect_unknown_def_type, vect_unknown_def_type};
+  int ndts = 3;
   gimple *new_stmt = NULL;
   stmt_vec_info prev_stmt_info;
   int nunits_in;
@@ -5407,7 +5415,7 @@  vectorizable_operation (gimple *stmt, gimple_stmt_iterator *gsi,
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
                          "=== vectorizable_operation ===\n");
-      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+      vect_model_simple_cost (stmt_info, ncopies, dt, ndts, NULL, NULL);
       return true;
     }
 
@@ -7659,15 +7667,16 @@  vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 
    Output:
    *COMP_VECTYPE - the vector type for the comparison.
+   *DTS - The def types for the arguments of the comparison
 
    Returns whether a COND can be vectorized.  Checks whether
    condition operands are supportable using vec_is_simple_use.  */
 
 static bool
-vect_is_simple_cond (tree cond, vec_info *vinfo, tree *comp_vectype)
+vect_is_simple_cond (tree cond, vec_info *vinfo,
+		     tree *comp_vectype, enum vect_def_type *dts)
 {
   tree lhs, rhs;
-  enum vect_def_type dt;
   tree vectype1 = NULL_TREE, vectype2 = NULL_TREE;
 
   /* Mask case.  */
@@ -7676,7 +7685,7 @@  vect_is_simple_cond (tree cond, vec_info *vinfo, tree *comp_vectype)
     {
       gimple *lhs_def_stmt = SSA_NAME_DEF_STMT (cond);
       if (!vect_is_simple_use (cond, vinfo, &lhs_def_stmt,
-			       &dt, comp_vectype)
+			       &dts[0], comp_vectype)
 	  || !*comp_vectype
 	  || !VECTOR_BOOLEAN_TYPE_P (*comp_vectype))
 	return false;
@@ -7692,21 +7701,25 @@  vect_is_simple_cond (tree cond, vec_info *vinfo, tree *comp_vectype)
   if (TREE_CODE (lhs) == SSA_NAME)
     {
       gimple *lhs_def_stmt = SSA_NAME_DEF_STMT (lhs);
-      if (!vect_is_simple_use (lhs, vinfo, &lhs_def_stmt, &dt, &vectype1))
+      if (!vect_is_simple_use (lhs, vinfo, &lhs_def_stmt, &dts[0], &vectype1))
 	return false;
     }
-  else if (TREE_CODE (lhs) != INTEGER_CST && TREE_CODE (lhs) != REAL_CST
-	   && TREE_CODE (lhs) != FIXED_CST)
+  else if (TREE_CODE (lhs) == INTEGER_CST || TREE_CODE (lhs) == REAL_CST
+	   || TREE_CODE (lhs) == FIXED_CST)
+    dts[0] = vect_constant_def;
+  else
     return false;
 
   if (TREE_CODE (rhs) == SSA_NAME)
     {
       gimple *rhs_def_stmt = SSA_NAME_DEF_STMT (rhs);
-      if (!vect_is_simple_use (rhs, vinfo, &rhs_def_stmt, &dt, &vectype2))
+      if (!vect_is_simple_use (rhs, vinfo, &rhs_def_stmt, &dts[1], &vectype2))
 	return false;
     }
-  else if (TREE_CODE (rhs) != INTEGER_CST && TREE_CODE (rhs) != REAL_CST
-	   && TREE_CODE (rhs) != FIXED_CST)
+  else if (TREE_CODE (rhs) == INTEGER_CST || TREE_CODE (rhs) == REAL_CST
+	   || TREE_CODE (rhs) == FIXED_CST)
+    dts[1] = vect_constant_def;
+  else
     return false;
 
   if (vectype1 && vectype2
@@ -7746,7 +7759,10 @@  vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi,
   tree vec_compare;
   tree new_temp;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  enum vect_def_type dt, dts[4];
+  enum vect_def_type dts[4]
+    = {vect_unknown_def_type, vect_unknown_def_type,
+       vect_unknown_def_type, vect_unknown_def_type};
+  int ndts = 4;
   int ncopies;
   enum tree_code code, cond_code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
   stmt_vec_info prev_stmt_info = NULL;
@@ -7808,15 +7824,16 @@  vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi,
   then_clause = gimple_assign_rhs2 (stmt);
   else_clause = gimple_assign_rhs3 (stmt);
 
-  if (!vect_is_simple_cond (cond_expr, stmt_info->vinfo, &comp_vectype)
+  if (!vect_is_simple_cond (cond_expr, stmt_info->vinfo,
+			    &comp_vectype, &dts[0])
       || !comp_vectype)
     return false;
 
   gimple *def_stmt;
-  if (!vect_is_simple_use (then_clause, stmt_info->vinfo, &def_stmt, &dt,
+  if (!vect_is_simple_use (then_clause, stmt_info->vinfo, &def_stmt, &dts[2],
 			   &vectype1))
     return false;
-  if (!vect_is_simple_use (else_clause, stmt_info->vinfo, &def_stmt, &dt,
+  if (!vect_is_simple_use (else_clause, stmt_info->vinfo, &def_stmt, &dts[3],
 			   &vectype2))
     return false;
 
@@ -7900,8 +7917,13 @@  vectorizable_condition (gimple *stmt, gimple_stmt_iterator *gsi,
 		return false;
 	    }
 	}
-      return expand_vec_cond_expr_p (vectype, comp_vectype,
-				     cond_code);
+      if (expand_vec_cond_expr_p (vectype, comp_vectype,
+				     cond_code))
+	{
+	  vect_model_simple_cost (stmt_info, ncopies, dts, ndts, NULL, NULL);
+	  return true;
+	}
+      return false;
     }
 
   /* Transform.  */
@@ -8104,6 +8126,7 @@  vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
   tree new_temp;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type};
+  int ndts = 2;
   unsigned nunits;
   int ncopies;
   enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
@@ -8229,7 +8252,7 @@  vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
     {
       STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
       vect_model_simple_cost (stmt_info, ncopies * (1 + (bitop2 != NOP_EXPR)),
-			      dts, NULL, NULL);
+			      dts, ndts, NULL, NULL);
       if (bitop1 == NOP_EXPR)
 	return expand_vec_cmp_expr_p (vectype, mask_type, code);
       else
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 12bb904..c0bc493 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1079,7 +1079,7 @@  extern bool supportable_narrowing_operation (enum tree_code, tree, tree,
 extern stmt_vec_info new_stmt_vec_info (gimple *stmt, vec_info *);
 extern void free_stmt_vec_info (gimple *stmt);
 extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
-                                    stmt_vector_for_cost *,
+				    int, stmt_vector_for_cost *,
 				    stmt_vector_for_cost *);
 extern void vect_model_store_cost (stmt_vec_info, int, vect_memory_access_type,
 				   enum vect_def_type, slp_tree,

--------------2.6.4.2.gae996d8--