diff mbox series

Do not attempt to vectorize uniform CTORs

Message ID nycvar.YFH.7.76.1911200954300.5566@zhemvz.fhfr.qr
State New
Headers show
Series Do not attempt to vectorize uniform CTORs | expand

Commit Message

Richard Biener Nov. 20, 2019, 8:56 a.m. UTC
We're doing quite some useless work here and in the case we
actually manage to "vectorize" it, we've done a no-op (bb-slp-42.c).

It also refactors the routine a bit and only dumps about "vectorizable"
CTORs when we actually analyze the SLP tree (when all CTOR elements
were internally defined).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-11-20  Richard Biener  <rguenther@suse.de>

	* tree-vect-slp.c (vect_analyze_slp_instance): Dump
	constructors we are actually analyzing.
	(vect_slp_check_for_constructors): Do not vectorize uniform
	constuctors, do not dump here.

	* gcc.dg/vect/bb-slp-42.c: Adjust.

Comments

Richard Sandiford Nov. 21, 2019, 7:02 p.m. UTC | #1
Richard Biener <rguenther@suse.de> writes:
> We're doing quite some useless work here and in the case we
> actually manage to "vectorize" it, we've done a no-op (bb-slp-42.c).

The point of that test was that we could share a vector load at b with
four constructors, so it's not really a no-op:

  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01240.html

Is this because x86 prefers to load-and-duplicate four individual
elements rather than load them all at once?

Thanks,
Richard

>
> It also refactors the routine a bit and only dumps about "vectorizable"
> CTORs when we actually analyze the SLP tree (when all CTOR elements
> were internally defined).
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> Richard.
>
> 2019-11-20  Richard Biener  <rguenther@suse.de>
>
> 	* tree-vect-slp.c (vect_analyze_slp_instance): Dump
> 	constructors we are actually analyzing.
> 	(vect_slp_check_for_constructors): Do not vectorize uniform
> 	constuctors, do not dump here.
>
> 	* gcc.dg/vect/bb-slp-42.c: Adjust.
>
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c	(revision 278484)
> +++ gcc/tree-vect-slp.c	(working copy)
> @@ -2183,6 +2183,10 @@ vect_analyze_slp_instance (vec_info *vin
>  	  else
>  	    return false;
>  	}
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			 "Analyzing vectorizable constructor: %G\n",
> +			 stmt_info->stmt);
>      }
>    else
>      {
> @@ -3123,31 +3127,22 @@ vect_slp_check_for_constructors (bb_vec_
>    gimple_stmt_iterator gsi;
>  
>    for (gsi = bb_vinfo->region_begin;
> -      gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
> +       gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
>      {
> -      gimple *stmt = gsi_stmt (gsi);
> -
> -      if (is_gimple_assign (stmt)
> -	  && gimple_assign_rhs_code (stmt) == CONSTRUCTOR
> -	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
> -	  && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE)
> -	{
> -	  tree rhs = gimple_assign_rhs1 (stmt);
> -
> -	  if (CONSTRUCTOR_NELTS (rhs) == 0)
> -	    continue;
> -
> -	  poly_uint64 subparts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs));
> +      gassign *stmt = dyn_cast <gassign *> (gsi_stmt (gsi));
> +      if (!stmt || gimple_assign_rhs_code (stmt) != CONSTRUCTOR)
> +	continue;
>  
> -	  if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs)))
> -	    continue;
> +      tree rhs = gimple_assign_rhs1 (stmt);
> +      if (!VECTOR_TYPE_P (TREE_TYPE (rhs))
> +	  || maybe_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)),
> +		       CONSTRUCTOR_NELTS (rhs))
> +	  || VECTOR_TYPE_P (TREE_TYPE (CONSTRUCTOR_ELT (rhs, 0)->value))
> +	  || uniform_vector_p (rhs))
> +	continue;
>  
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_NOTE, vect_location,
> -			     "Found vectorizable constructor: %G\n", stmt);
> -	  stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
> -	  BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
> -	}
> +      stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
> +      BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
>      }
>  }
>  
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-42.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/bb-slp-42.c	(revision 278484)
> +++ gcc/testsuite/gcc.dg/vect/bb-slp-42.c	(working copy)
> @@ -44,6 +44,5 @@ main ()
>  
>  }
>  
> -/* See that we vectorize an SLP instance.  */
> -/* { dg-final { scan-tree-dump "Found vectorizable constructor" "slp1" { target { ! vect_fully_masked } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "slp1" { target { ! vect_fully_masked } } } } */
> +/* See that we do not try to vectorize the uniform CTORs.  */
> +/* { dg-final { scan-tree-dump-not "Analyzing vectorizable constructor" "slp1" } } */
Richard Biener Nov. 22, 2019, 7:30 a.m. UTC | #2
On November 21, 2019 8:02:14 PM GMT+01:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
>Richard Biener <rguenther@suse.de> writes:
>> We're doing quite some useless work here and in the case we
>> actually manage to "vectorize" it, we've done a no-op (bb-slp-42.c).
>
>The point of that test was that we could share a vector load at b with
>four constructors, so it's not really a no-op:
>
>  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01240.html

Ah, so it becomes a cost issue. The testcase wasn't too verbose at that... 

>Is this because x86 prefers to load-and-duplicate four individual
>elements rather than load them all at once?

Yeah, it can splat from a memory operand and thereby avoid the dependence on a single load plus eventually the STLF penalty if timely earlier stores were not vector. So I thought that's always more profitable than a vector load and then four shufflings. On x86 shuffling resources are also limited while the splat can be done plenty in parallel (even combined with other vector ops with AVX 512, saving a register). 

Richard. 

>Thanks,
>Richard
>
>>
>> It also refactors the routine a bit and only dumps about
>"vectorizable"
>> CTORs when we actually analyze the SLP tree (when all CTOR elements
>> were internally defined).
>>
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>>
>> Richard.
>>
>> 2019-11-20  Richard Biener  <rguenther@suse.de>
>>
>> 	* tree-vect-slp.c (vect_analyze_slp_instance): Dump
>> 	constructors we are actually analyzing.
>> 	(vect_slp_check_for_constructors): Do not vectorize uniform
>> 	constuctors, do not dump here.
>>
>> 	* gcc.dg/vect/bb-slp-42.c: Adjust.
>>
>> Index: gcc/tree-vect-slp.c
>> ===================================================================
>> --- gcc/tree-vect-slp.c	(revision 278484)
>> +++ gcc/tree-vect-slp.c	(working copy)
>> @@ -2183,6 +2183,10 @@ vect_analyze_slp_instance (vec_info *vin
>>  	  else
>>  	    return false;
>>  	}
>> +      if (dump_enabled_p ())
>> +	dump_printf_loc (MSG_NOTE, vect_location,
>> +			 "Analyzing vectorizable constructor: %G\n",
>> +			 stmt_info->stmt);
>>      }
>>    else
>>      {
>> @@ -3123,31 +3127,22 @@ vect_slp_check_for_constructors (bb_vec_
>>    gimple_stmt_iterator gsi;
>>  
>>    for (gsi = bb_vinfo->region_begin;
>> -      gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next
>(&gsi))
>> +       gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next
>(&gsi))
>>      {
>> -      gimple *stmt = gsi_stmt (gsi);
>> -
>> -      if (is_gimple_assign (stmt)
>> -	  && gimple_assign_rhs_code (stmt) == CONSTRUCTOR
>> -	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
>> -	  && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) ==
>VECTOR_TYPE)
>> -	{
>> -	  tree rhs = gimple_assign_rhs1 (stmt);
>> -
>> -	  if (CONSTRUCTOR_NELTS (rhs) == 0)
>> -	    continue;
>> -
>> -	  poly_uint64 subparts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs));
>> +      gassign *stmt = dyn_cast <gassign *> (gsi_stmt (gsi));
>> +      if (!stmt || gimple_assign_rhs_code (stmt) != CONSTRUCTOR)
>> +	continue;
>>  
>> -	  if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs)))
>> -	    continue;
>> +      tree rhs = gimple_assign_rhs1 (stmt);
>> +      if (!VECTOR_TYPE_P (TREE_TYPE (rhs))
>> +	  || maybe_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)),
>> +		       CONSTRUCTOR_NELTS (rhs))
>> +	  || VECTOR_TYPE_P (TREE_TYPE (CONSTRUCTOR_ELT (rhs, 0)->value))
>> +	  || uniform_vector_p (rhs))
>> +	continue;
>>  
>> -	  if (dump_enabled_p ())
>> -	    dump_printf_loc (MSG_NOTE, vect_location,
>> -			     "Found vectorizable constructor: %G\n", stmt);
>> -	  stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
>> -	  BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
>> -	}
>> +      stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
>> +      BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
>>      }
>>  }
>>  
>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-42.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/vect/bb-slp-42.c	(revision 278484)
>> +++ gcc/testsuite/gcc.dg/vect/bb-slp-42.c	(working copy)
>> @@ -44,6 +44,5 @@ main ()
>>  
>>  }
>>  
>> -/* See that we vectorize an SLP instance.  */
>> -/* { dg-final { scan-tree-dump "Found vectorizable constructor"
>"slp1" { target { ! vect_fully_masked } } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4
>"slp1" { target { ! vect_fully_masked } } } } */
>> +/* See that we do not try to vectorize the uniform CTORs.  */
>> +/* { dg-final { scan-tree-dump-not "Analyzing vectorizable
>constructor" "slp1" } } */
diff mbox series

Patch

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	(revision 278484)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -2183,6 +2183,10 @@  vect_analyze_slp_instance (vec_info *vin
 	  else
 	    return false;
 	}
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "Analyzing vectorizable constructor: %G\n",
+			 stmt_info->stmt);
     }
   else
     {
@@ -3123,31 +3127,22 @@  vect_slp_check_for_constructors (bb_vec_
   gimple_stmt_iterator gsi;
 
   for (gsi = bb_vinfo->region_begin;
-      gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
+       gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
     {
-      gimple *stmt = gsi_stmt (gsi);
-
-      if (is_gimple_assign (stmt)
-	  && gimple_assign_rhs_code (stmt) == CONSTRUCTOR
-	  && TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
-	  && TREE_CODE (TREE_TYPE (gimple_assign_lhs (stmt))) == VECTOR_TYPE)
-	{
-	  tree rhs = gimple_assign_rhs1 (stmt);
-
-	  if (CONSTRUCTOR_NELTS (rhs) == 0)
-	    continue;
-
-	  poly_uint64 subparts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs));
+      gassign *stmt = dyn_cast <gassign *> (gsi_stmt (gsi));
+      if (!stmt || gimple_assign_rhs_code (stmt) != CONSTRUCTOR)
+	continue;
 
-	  if (maybe_ne (subparts, CONSTRUCTOR_NELTS (rhs)))
-	    continue;
+      tree rhs = gimple_assign_rhs1 (stmt);
+      if (!VECTOR_TYPE_P (TREE_TYPE (rhs))
+	  || maybe_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs)),
+		       CONSTRUCTOR_NELTS (rhs))
+	  || VECTOR_TYPE_P (TREE_TYPE (CONSTRUCTOR_ELT (rhs, 0)->value))
+	  || uniform_vector_p (rhs))
+	continue;
 
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, vect_location,
-			     "Found vectorizable constructor: %G\n", stmt);
-	  stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
-	  BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
-	}
+      stmt_vec_info stmt_info = bb_vinfo->lookup_stmt (stmt);
+      BB_VINFO_GROUPED_STORES (bb_vinfo).safe_push (stmt_info);
     }
 }
 
Index: gcc/testsuite/gcc.dg/vect/bb-slp-42.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-42.c	(revision 278484)
+++ gcc/testsuite/gcc.dg/vect/bb-slp-42.c	(working copy)
@@ -44,6 +44,5 @@  main ()
 
 }
 
-/* See that we vectorize an SLP instance.  */
-/* { dg-final { scan-tree-dump "Found vectorizable constructor" "slp1" { target { ! vect_fully_masked } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "slp1" { target { ! vect_fully_masked } } } } */
+/* See that we do not try to vectorize the uniform CTORs.  */
+/* { dg-final { scan-tree-dump-not "Analyzing vectorizable constructor" "slp1" } } */