diff mbox

[2/14,Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result

Message ID 541AC71C.5010702@arm.com
State New
Headers show

Commit Message

Alan Lawrence Sept. 18, 2014, 11:50 a.m. UTC
This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.

These are presently documented as producing a vector with the result in element 
0, and this is inconsistent with their use in tree-vect-loop.c (which on 
bigendian targets pulls the bits out of the wrong end of the vector result). 
This leads to bugs on bigendian targets - see also 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.

I discounted "fixing" the vectorizer (to read from element 0) and then making 
bigendian targets (whose architectural insn produces the result in lane N-1) 
permute the result vector, as optimization of vectors in RTL seems unlikely to 
remove such a permute and would lead to a performance regression.

Instead it seems more natural for the tree code to produce a scalar result 
(producing a vector with the result in lane 0 has already caused confusion, e.g. 
https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).

However, this patch preserves the meaning of the optab (producing a result in 
lane 0 on little-endian architectures or N-1 on bigendian), thus generally 
avoiding the need to change backends. Thus, expr.c extracts an 
endianness-dependent element from the optab result to give the result expected 
for the tree code.

Previously posted as an RFC 
https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra 
VIEW_CONVERT_EXPR if the types of the reduction/result do not match.

Testing:
	x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++
	aarch64-none-linux-gnu: bootstrap
	aarch64-none-elf:  check-gcc, check-g++
	arm-none-eabi: check-gcc

	aarch64_be-none-elf: check-gcc, showing
	FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
	FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
	Passes the (previously-failing) reduced testcase on
	 	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114

	Have also assembler/stage-1 tested that testcase on PowerPC, also fixed.

gcc/ChangeLog:

	* expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
	extract_bit_field around optab result.

	* fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce
	scalar not vector.

	* tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type
	for REDUC_{MIN,MAX,PLUS}_EXPR.

	* tree-vect-loop.c (vect_analyze_loop): Update comment.
	(vect_create_epilog_for_reduction): For direct vector reduction, use
	result of tree code directly without extract_bit_field.

	* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
	comment.

Comments

Richard Biener Sept. 22, 2014, 10:34 a.m. UTC | #1
On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes.
>
> These are presently documented as producing a vector with the result in
> element 0, and this is inconsistent with their use in tree-vect-loop.c
> (which on bigendian targets pulls the bits out of the wrong end of the
> vector result). This leads to bugs on bigendian targets - see also
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.
>
> I discounted "fixing" the vectorizer (to read from element 0) and then
> making bigendian targets (whose architectural insn produces the result in
> lane N-1) permute the result vector, as optimization of vectors in RTL seems
> unlikely to remove such a permute and would lead to a performance
> regression.
>
> Instead it seems more natural for the tree code to produce a scalar result
> (producing a vector with the result in lane 0 has already caused confusion,
> e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).
>
> However, this patch preserves the meaning of the optab (producing a result
> in lane 0 on little-endian architectures or N-1 on bigendian), thus
> generally avoiding the need to change backends. Thus, expr.c extracts an
> endianness-dependent element from the optab result to give the result
> expected for the tree code.
>
> Previously posted as an RFC
> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra
> VIEW_CONVERT_EXPR if the types of the reduction/result do not match.

Huh.  Does that ever happen?  Please use a NOP_EXPR instead of
a VIEW_CONVERT_EXPR.

Ok with that change.

Thanks,
Richard.

> Testing:
>         x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++
>         aarch64-none-linux-gnu: bootstrap
>         aarch64-none-elf:  check-gcc, check-g++
>         arm-none-eabi: check-gcc
>
>         aarch64_be-none-elf: check-gcc, showing
>         FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
>         FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
>         Passes the (previously-failing) reduced testcase on
>                 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114
>
>         Have also assembler/stage-1 tested that testcase on PowerPC, also
> fixed.

> gcc/ChangeLog:
>
>         * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
>         extract_bit_field around optab result.
>
>         * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR,
> produce
>         scalar not vector.
>
>         * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand
> type
>         for REDUC_{MIN,MAX,PLUS}_EXPR.
>
>         * tree-vect-loop.c (vect_analyze_loop): Update comment.
>         (vect_create_epilog_for_reduction): For direct vector reduction, use
>         result of tree code directly without extract_bit_field.
>
>         * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
>         comment.
Alan Lawrence Sept. 22, 2014, 1:23 p.m. UTC | #2
Richard Biener wrote:
> 
> Huh.  Does that ever happen?  Please use a NOP_EXPR instead of
> a VIEW_CONVERT_EXPR.

Yes, the testcase is gcc.target/i386/pr51235.c which performs black magic*** 
with void *. (This testcase otherwise fails the verify_gimple_assign_unary check 
in tree-cfg.c .)   However, test passes also with your suggestion of NOP_EXPR so 
that's good by me.

***that is, computes the minimum

--Alan

> 
> Ok with that change.
> 
> Thanks,
> Richard.
> 
>> Testing:
>>         x86_86-none-linux-gnu: bootstrap, check-gcc, check-g++
>>         aarch64-none-linux-gnu: bootstrap
>>         aarch64-none-elf:  check-gcc, check-g++
>>         arm-none-eabi: check-gcc
>>
>>         aarch64_be-none-elf: check-gcc, showing
>>         FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
>>         FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
>>         Passes the (previously-failing) reduced testcase on
>>                 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114
>>
>>         Have also assembler/stage-1 tested that testcase on PowerPC, also
>> fixed.
> 
>> gcc/ChangeLog:
>>
>>         * expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add
>>         extract_bit_field around optab result.
>>
>>         * fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR,
>> produce
>>         scalar not vector.
>>
>>         * tree-cfg.c (verify_gimple_assign_unary): Check result vs operand
>> type
>>         for REDUC_{MIN,MAX,PLUS}_EXPR.
>>
>>         * tree-vect-loop.c (vect_analyze_loop): Update comment.
>>         (vect_create_epilog_for_reduction): For direct vector reduction, use
>>         result of tree code directly without extract_bit_field.
>>
>>         * tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update
>>         comment.
>
diff mbox

Patch

diff --git a/gcc/expr.c b/gcc/expr.c
index 58b87ba7ed7eee156b9730b61679af946694e8df..a293c06489f09586ed56dff1381467401687be45 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9020,7 +9020,17 @@  expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
       {
         op0 = expand_normal (treeop0);
         this_optab = optab_for_tree_code (code, type, optab_default);
-        temp = expand_unop (mode, this_optab, op0, target, unsignedp);
+        enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
+        temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+        gcc_assert (temp);
+        /* The tree code produces a scalar result, but (somewhat by convention)
+           the optab produces a vector with the result in element 0 if
+           little-endian, or element N-1 if big-endian.  So pull the scalar
+           result out of that element.  */
+        int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
+        int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+        temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
+				  target, mode, mode);
         gcc_assert (temp);
         return temp;
       }
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index d44476972158b125aecd8c4a5c8d6176ad3b0e5c..b8baa94d37a74ebb824e2a4d03f2a10befcdf749 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -8475,12 +8475,13 @@  fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
     case REDUC_MAX_EXPR:
     case REDUC_PLUS_EXPR:
       {
-	unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i;
+	unsigned int nelts, i;
 	tree *elts;
 	enum tree_code subcode;
 
 	if (TREE_CODE (op0) != VECTOR_CST)
 	  return NULL_TREE;
+        nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (op0));
 
 	elts = XALLOCAVEC (tree, nelts);
 	if (!vec_cst_ctor_to_array (op0, elts))
@@ -8499,10 +8500,9 @@  fold_unary_loc (location_t loc, enum tree_code code, tree type, tree op0)
 	    elts[0] = const_binop (subcode, elts[0], elts[i]);
 	    if (elts[0] == NULL_TREE || !CONSTANT_CLASS_P (elts[0]))
 	      return NULL_TREE;
-	    elts[i] = build_zero_cst (TREE_TYPE (type));
 	  }
 
-	return build_vector (type, elts);
+	return elts[0];
       }
 
     default:
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 9d1de01021cfda296c3fe53c9212c3aa6bd627c5..49986cc40758bb5998e395c727142e75f7d6e9f4 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3527,12 +3527,21 @@  verify_gimple_assign_unary (gimple stmt)
 
         return false;
       }
-
-    case VEC_UNPACK_HI_EXPR:
-    case VEC_UNPACK_LO_EXPR:
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
+      if (!VECTOR_TYPE_P (rhs1_type)
+	  || !useless_type_conversion_p (lhs_type, TREE_TYPE (rhs1_type)))
+        {
+	  error ("reduction should convert from vector to element type");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  return true;
+	}
+      return false;
+
+    case VEC_UNPACK_HI_EXPR:
+    case VEC_UNPACK_LO_EXPR:
     case VEC_UNPACK_FLOAT_HI_EXPR:
     case VEC_UNPACK_FLOAT_LO_EXPR:
       /* FIXME.  */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 7e013f3b549a07bd44789bd4d3e3701eec7c51dc..36f51977845bf5ce451564ccd1eb8ad5f39ac8de 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1892,9 +1892,9 @@  vect_analyze_loop (struct loop *loop)
 
    Output:
    REDUC_CODE - the corresponding tree-code to be used to reduce the
-      vector of partial results into a single scalar result (which
-      will also reside in a vector) or ERROR_MARK if the operation is
-      a supported reduction operation, but does not have such tree-code.
+      vector of partial results into a single scalar result, or ERROR_MARK
+      if the operation is a supported reduction operation, but does not have
+      such a tree-code.
 
    Return FALSE if CODE currently cannot be vectorized as reduction.  */
 
@@ -4167,6 +4167,7 @@  vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
   if (reduc_code != ERROR_MARK && !slp_reduc)
     {
       tree tmp;
+      tree vec_elem_type;
 
       /*** Case 1:  Create:
            v_out2 = reduc_expr <v_out1>  */
@@ -4175,14 +4176,26 @@  vect_create_epilog_for_reduction (vec<tree> vect_defs, gimple stmt,
         dump_printf_loc (MSG_NOTE, vect_location,
 			 "Reduce using direct vector reduction.\n");
 
-      vec_dest = vect_create_destination_var (scalar_dest, vectype);
-      tmp = build1 (reduc_code, vectype, new_phi_result);
-      epilog_stmt = gimple_build_assign (vec_dest, tmp);
-      new_temp = make_ssa_name (vec_dest, epilog_stmt);
+      vec_elem_type = TREE_TYPE (TREE_TYPE (new_phi_result));
+      if (!useless_type_conversion_p (scalar_type, vec_elem_type))
+	{
+          tree tmp_dest =
+	      vect_create_destination_var (scalar_dest, vec_elem_type);
+	  tmp = build1 (reduc_code, vec_elem_type, new_phi_result);
+	  epilog_stmt = gimple_build_assign (tmp_dest, tmp);
+	  new_temp = make_ssa_name (tmp_dest, epilog_stmt);
+	  gimple_assign_set_lhs (epilog_stmt, new_temp);
+	  gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
+
+	  tmp = build1 (VIEW_CONVERT_EXPR, scalar_type, new_temp);
+	}
+      else
+	tmp = build1 (reduc_code, scalar_type, new_phi_result);
+      epilog_stmt = gimple_build_assign (new_scalar_dest, tmp);
+      new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
       gimple_assign_set_lhs (epilog_stmt, new_temp);
       gsi_insert_before (&exit_gsi, epilog_stmt, GSI_SAME_STMT);
-
-      extract_scalar_result = true;
+      scalar_results.safe_push (new_temp);
     }
   else
     {
diff --git a/gcc/tree.def b/gcc/tree.def
index 84ffe93aa6fdc827f18ca81225bca007d50b50f6..e9af52e554babb100d49ea14f47c805cd5024949 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1157,10 +1157,9 @@  DEFTREECODE (TRANSACTION_EXPR, "transaction_expr", tcc_expression, 1)
    result (e.g. summing the elements of the vector, finding the minimum over
    the vector elements, etc).
    Operand 0 is a vector.
-   The expression returns a vector of the same type, with the first
-   element in the vector holding the result of the reduction of all elements
-   of the operand.  The content of the other elements in the returned vector
-   is undefined.  */
+   The expression returns a scalar, with type the same as the elements of the
+   vector, holding the result of the reduction of all elements of the operand.
+   */
 DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
 DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
 DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)