diff mbox

[3/14] Add new optabs for reducing vectors to scalars

Message ID 541AC800.2060907@arm.com
State New
Headers show

Commit Message

Alan Lawrence Sept. 18, 2014, 11:54 a.m. UTC
These match their corresponding tree codes, by taking a vector and returning a 
scalar; this is more architecturally neutral than the (somewhat loosely defined) 
previous optab that took a vector and returned a vector with the result in the 
least significant bits (i.e. element 0 for little-endian or N-1 for bigendian). 
However, the old optabs are preserved so as not to break existing backends, so 
clients check for both old + new optabs.

Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu.
aarch64.exp + vect.exp on aarch64{,_be}-none-elf.
(of course at this point in the series all these are using the old optab + 
migration path.)

gcc/ChangeLog:

	* doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal
	optabs, and note in reduc_[us](plus|min|max) to prefer the former.

	* expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall
	back to old reduc_... + BIT_FIELD_REF only if not.

	* optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR,
	return the reduce-to-scalar (reduc_..._scal) optab.
	(scalar_reduc_to_vector): New.

	* optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab,
	reduc_plus_scal_optab, reduc_umax_scal_optab, reduc_umin_scal_optab):
	New.

	* optabs.h (scalar_reduc_to_vector): Declare.

	* tree-vect-loop.c (vectorizable_reduction): Look for optabs reducing
	to either scalar or vector.

Comments

Richard Biener Sept. 22, 2014, 10:40 a.m. UTC | #1
On Thu, Sep 18, 2014 at 1:54 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> These match their corresponding tree codes, by taking a vector and returning
> a scalar; this is more architecturally neutral than the (somewhat loosely
> defined) previous optab that took a vector and returned a vector with the
> result in the least significant bits (i.e. element 0 for little-endian or
> N-1 for bigendian). However, the old optabs are preserved so as not to break
> existing backends, so clients check for both old + new optabs.
>
> Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu.
> aarch64.exp + vect.exp on aarch64{,_be}-none-elf.
> (of course at this point in the series all these are using the old optab +
> migration path.)

scalar_reduc_to_vector misses a comment.

I wonder if at the end we wouldn't transition all backends and then
renaming reduc_*_scal_optab back to reduc_*_optab makes sense.

The optabs have only one mode - I wouldn't be surprised if an ISA
invents for example v4si -> di reduction?  So do we want to make
reduc_plus_scal_optab a little bit more future proof (maybe there
is already an ISA that supports this kind of reduction?).

Otherwise the patch looks good to me.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         * doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal
>         optabs, and note in reduc_[us](plus|min|max) to prefer the former.
>
>         * expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall
>         back to old reduc_... + BIT_FIELD_REF only if not.
>
>         * optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR,
>         return the reduce-to-scalar (reduc_..._scal) optab.
>         (scalar_reduc_to_vector): New.
>
>         * optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab,
>         reduc_plus_scal_optab, reduc_umax_scal_optab,
> reduc_umin_scal_optab):
>         New.
>
>         * optabs.h (scalar_reduc_to_vector): Declare.
>
>         * tree-vect-loop.c (vectorizable_reduction): Look for optabs
> reducing
>         to either scalar or vector.
Alan Lawrence Sept. 22, 2014, 1:26 p.m. UTC | #2
Richard Biener wrote:
> 
> scalar_reduc_to_vector misses a comment.

Ok to reuse the comment in optabs.h in optabs.c also?

> I wonder if at the end we wouldn't transition all backends and then
> renaming reduc_*_scal_optab back to reduc_*_optab makes sense.

Yes, that sounds like a plan, the _scal is a bit of a mouthful.

> The optabs have only one mode - I wouldn't be surprised if an ISA
> invents for example v4si -> di reduction?  So do we want to make
> reduc_plus_scal_optab a little bit more future proof (maybe there
> is already an ISA that supports this kind of reduction?).

That sounds like a plausible thing for an ISA to do, indeed. However given these 
names are only used by the autovectorizer rather than directly, the question is 
what the corresponding source code looks like, and/or what changes to the 
autovectorizer we might have to make to (look for code to) exploit such an 
instruction. At this point I could go for a 
reduc_{plus,min_max}_scal_<mode><mode> which reduces from the first vector mode 
to the second scalar mode, and then make the vectorizer look only for cases 
where the second mode was the element type of the first; but I'm not sure I want 
to do anything more complicated than that at this stage. (However, indeed it 
would leave the possibility open for the future.)

--Alan
Richard Biener Sept. 22, 2014, 1:38 p.m. UTC | #3
On Mon, Sep 22, 2014 at 3:26 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> Richard Biener wrote:
>>
>>
>> scalar_reduc_to_vector misses a comment.
>
>
> Ok to reuse the comment in optabs.h in optabs.c also?

Sure.

>> I wonder if at the end we wouldn't transition all backends and then
>> renaming reduc_*_scal_optab back to reduc_*_optab makes sense.
>
>
> Yes, that sounds like a plan, the _scal is a bit of a mouthful.
>
>> The optabs have only one mode - I wouldn't be surprised if an ISA
>> invents for example v4si -> di reduction?  So do we want to make
>> reduc_plus_scal_optab a little bit more future proof (maybe there
>> is already an ISA that supports this kind of reduction?).
>
>
> That sounds like a plausible thing for an ISA to do, indeed. However given
> these names are only used by the autovectorizer rather than directly, the
> question is what the corresponding source code looks like, and/or what
> changes to the autovectorizer we might have to make to (look for code to)
> exploit such an instruction.

Ah, indeed.  Would be sth like a REDUC_WIDEN_SUM_EXPR or so.

> At this point I could go for a
> reduc_{plus,min_max}_scal_<mode><mode> which reduces from the first vector
> mode to the second scalar mode, and then make the vectorizer look only for
> cases where the second mode was the element type of the first; but I'm not
> sure I want to do anything more complicated than that at this stage.
> (However, indeed it would leave the possibility open for the future.)

Yeah, agreed.  For the min/max case a widen variant isn't useful anyway.

Thanks,
Richard.

> --Alan
>
diff mbox

Patch

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index dd7861188afb8afd01971f9f75f0e32da9f9f826..3f5fd6f0e3ac3fcc30f6c961e3e2709a35f4d413 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4811,29 +4811,48 @@  it is unspecified which of the two operands is returned as the result.
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
 Find the signed minimum/maximum of the elements of a vector. The vector is
-operand 1, and the scalar result is stored in the least significant bits of
+operand 1, and the result is stored in the least significant bits of
 operand 0 (also a vector). The output and input vector should have the same
-modes.
+modes. These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_smin_scal_@var{m}} and @samp{reduc_smax_scal_@var{m}}.
 
 @cindex @code{reduc_umin_@var{m}} instruction pattern
 @cindex @code{reduc_umax_@var{m}} instruction pattern
 @item @samp{reduc_umin_@var{m}}, @samp{reduc_umax_@var{m}}
 Find the unsigned minimum/maximum of the elements of a vector. The vector is
-operand 1, and the scalar result is stored in the least significant bits of
+operand 1, and the result is stored in the least significant bits of
 operand 0 (also a vector). The output and input vector should have the same
-modes.
+modes. These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_umin_scal_@var{m}} and @samp{reduc_umax_scal_@var{m}}.
 
 @cindex @code{reduc_splus_@var{m}} instruction pattern
-@item @samp{reduc_splus_@var{m}}
-Compute the sum of the signed elements of a vector. The vector is operand 1,
-and the scalar result is stored in the least significant bits of operand 0
-(also a vector). The output and input vector should have the same modes.
-
 @cindex @code{reduc_uplus_@var{m}} instruction pattern
-@item @samp{reduc_uplus_@var{m}}
-Compute the sum of the unsigned elements of a vector. The vector is operand 1,
-and the scalar result is stored in the least significant bits of operand 0
+@item @samp{reduc_splus_@var{m}}, @samp{reduc_uplus_@var{m}}
+Compute the sum of the signed/unsigned elements of a vector. The vector is
+operand 1, and the result is stored in the least significant bits of operand 0
 (also a vector). The output and input vector should have the same modes.
+These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_plus_scal_@var{m}}.
+
+@cindex @code{reduc_smin_scal_@var{m}} instruction pattern
+@cindex @code{reduc_smax_scal_@var{m}} instruction pattern
+@item @samp{reduc_smin_scal_@var{m}}, @samp{reduc_smax_scal_@var{m}}
+Find the signed minimum/maximum of the elements of a vector. The vector is
+operand 1, and operand 0 is the scalar result, with mode equal to the mode of
+the elements of the input vector.
+
+@cindex @code{reduc_umin_scal_@var{m}} instruction pattern
+@cindex @code{reduc_umax_scal_@var{m}} instruction pattern
+@item @samp{reduc_umin_scal_@var{m}}, @samp{reduc_umax_scal_@var{m}}
+Find the unsigned minimum/maximum of the elements of a vector. The vector is
+operand 1, and operand 0 is the scalar result, with mode equal to the mode of
+the elements of the input vector.
+
+@cindex @code{reduc_plus_scal_@var{m}} instruction pattern
+@item @samp{reduc_plus_scal_@var{m}}
+Compute the sum of the elements of a vector. The vector is operand 1, and
+operand 0 is the scalar result, with mode equal to the mode of the elements of
+the input vector.
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
diff --git a/gcc/expr.c b/gcc/expr.c
index a293c06489f09586ed56dff1381467401687be45..11930ca121e4e1f3807261a2e5b0ca4f6723176d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9018,21 +9018,39 @@  expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
       {
-        op0 = expand_normal (treeop0);
-        this_optab = optab_for_tree_code (code, type, optab_default);
-        enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
-        temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
-        gcc_assert (temp);
-        /* The tree code produces a scalar result, but (somewhat by convention)
-           the optab produces a vector with the result in element 0 if
-           little-endian, or element N-1 if big-endian.  So pull the scalar
-           result out of that element.  */
-        int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
-        int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
-        temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
-				  target, mode, mode);
-        gcc_assert (temp);
-        return temp;
+	op0 = expand_normal (treeop0);
+	enum machine_mode vec_mode = GET_MODE (op0);
+	this_optab = optab_for_tree_code (code, type, optab_default);
+
+	if (optab_handler (this_optab, vec_mode) != CODE_FOR_nothing)
+	  {
+	    struct expand_operand ops[2];
+	    enum insn_code icode = optab_handler (this_optab, vec_mode);
+
+	    create_output_operand (&ops[0], target, mode);
+	    create_input_operand (&ops[1], op0, vec_mode);
+	    if (maybe_expand_insn (icode, 2, ops))
+	      {
+		target = ops[0].value;
+		if (GET_MODE (target) != mode)
+		  return gen_lowpart (tmode, target);
+		return target;
+	      }
+	  }
+	/* Fall back to optab with vector result, and then extract scalar.  */
+	this_optab = scalar_reduc_to_vector (this_optab, type);
+	temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+	gcc_assert (temp);
+	/* The tree code produces a scalar result, but (somewhat by convention)
+	   the optab produces a vector with the result in element 0 if
+	   little-endian, or element N-1 if big-endian.  So pull the scalar
+	   result out of that element.  */
+	int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
+	int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+	temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
+	                          target, mode, mode);
+	gcc_assert (temp);
+	return temp;
       }
 
     case VEC_LSHIFT_EXPR:
diff --git a/gcc/optabs.c b/gcc/optabs.c
index d6412ec42d7c4908a74d098e80d7038e068ca557..e422bcce18d06a39b26547b510c35858efc2303e 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -505,13 +505,15 @@  optab_for_tree_code (enum tree_code code, const_tree type,
       return fma_optab;
 
     case REDUC_MAX_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_umax_optab : reduc_smax_optab;
+      return TYPE_UNSIGNED (type)
+	     ? reduc_umax_scal_optab : reduc_smax_scal_optab;
 
     case REDUC_MIN_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_umin_optab : reduc_smin_optab;
+      return TYPE_UNSIGNED (type)
+	     ? reduc_umin_scal_optab : reduc_smin_scal_optab;
 
     case REDUC_PLUS_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab;
+      return reduc_plus_scal_optab;
 
     case VEC_LSHIFT_EXPR:
       return vec_shl_optab;
@@ -607,7 +609,22 @@  optab_for_tree_code (enum tree_code code, const_tree type,
       return unknown_optab;
     }
 }
-
+
+optab
+scalar_reduc_to_vector (optab unoptab, const_tree type)
+{
+  switch (unoptab)
+    {
+    case reduc_plus_scal_optab:
+      return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab;
+
+    case reduc_smin_scal_optab: return reduc_smin_optab;
+    case reduc_umin_scal_optab: return reduc_umin_optab;
+    case reduc_smax_scal_optab: return reduc_smax_optab;
+    case reduc_umax_scal_optab: return reduc_umax_optab;
+    default: return unknown_optab;
+    }
+}
 
 /* Expand vector widening operations.
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b75547006585267d9f5b4f17ba972ba388852cf5..131ea048b012b073345be3b426d4ac8f33061809 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -243,12 +243,20 @@  OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* Vector reduction to a scalar.  */
+OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
+OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")
+OPTAB_D (reduc_plus_scal_optab, "reduc_plus_scal_$a")
+OPTAB_D (reduc_umax_scal_optab, "reduc_umax_scal_$a")
+OPTAB_D (reduc_umin_scal_optab, "reduc_umin_scal_$a")
+/* (Old) Vector reduction, returning a vector with the result in one lane.  */
 OPTAB_D (reduc_smax_optab, "reduc_smax_$a")
 OPTAB_D (reduc_smin_optab, "reduc_smin_$a")
 OPTAB_D (reduc_splus_optab, "reduc_splus_$a")
 OPTAB_D (reduc_umax_optab, "reduc_umax_$a")
 OPTAB_D (reduc_umin_optab, "reduc_umin_$a")
 OPTAB_D (reduc_uplus_optab, "reduc_uplus_$a")
+
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 089b15a6fcd261bb15c898f185a157f1257284ba..d9f4900620a13d74fc3dfb1bac9bcb34416012de 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -162,6 +162,11 @@  enum optab_subtype
    vector shifts and rotates */
 extern optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
 
+/* Given optab UNOPTAB that reduces a vector to a scalar, find instead the old
+   optab that produces a vector with the reduction result in one element,
+   for a tree with type TYPE.  */
+extern optab scalar_reduc_to_vector (optab unoptab, const_tree type);
+
 /* The various uses that a comparison can have; used by can_compare_p:
    jumps, conditional moves, store flag operations.  */
 enum can_compare_purpose
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 36f51977845bf5ce451564ccd1eb8ad5f39ac8de..d0a29d312bfd9a7eb552d937e3c64cf9b30d558a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5101,15 +5101,17 @@  vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
 
           epilog_reduc_code = ERROR_MARK;
         }
-
-      if (reduc_optab
-          && optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
+      else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
         {
-          if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "reduc op not supported by target.\n");
+          optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
+          if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
+            {
+              if (dump_enabled_p ())
+	        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "reduc op not supported by target.\n");
 
-          epilog_reduc_code = ERROR_MARK;
+	      epilog_reduc_code = ERROR_MARK;
+	    }
         }
     }
   else