[1/4] middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

Message ID patch-14433-tamar@arm.com
State New

Commit Message

Tamar Christina May 5, 2021, 5:38 p.m. UTC
Hi All,

This patch adds support for a dot product where the signs of the multiplication
arguments differ, i.e. one is signed and one is unsigned but the precisions are
the same.

#define N 480
#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 signed
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 unsigned

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

The operations are performed as if the operands were extended to a 32-bit value.
As such, this operation isn't valid if there is an intermediate conversion to an
unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
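
To make the SIGNEDNESS_2 restriction concrete, here is a minimal sketch of a
single loop iteration with a signed and with an unsigned 16-bit intermediate.
The function names are made up for illustration and are not part of the patch.
The sketch assumes a 32-bit int and, for simplicity, uses a signed accumulator
rather than the unsigned one in the example above.

```cpp
#include <cassert>
#include <cstdint>

// One loop iteration with SIGNEDNESS_2 == signed: the 16-bit product is
// sign-extended when it is added to the 32-bit accumulator.
std::int32_t
step_signed_mult (std::int32_t res, signed char a, unsigned char b)
{
  int av = a;
  int bv = b;
  std::int16_t mult = static_cast<std::int16_t> (av * bv);
  return res + mult;
}

// The same iteration with SIGNEDNESS_2 == unsigned: the truncated product is
// zero-extended instead, so a negative product becomes a large positive one.
std::int32_t
step_unsigned_mult (std::int32_t res, signed char a, unsigned char b)
{
  int av = a;
  int bv = b;
  std::uint16_t mult = static_cast<std::uint16_t> (av * bv);
  return res + mult;
}

// Hypothetical self-check: -1 * 2 accumulates differently in the two variants.
void
check_intermediate_sign ()
{
  assert (step_signed_mult (0, -1, 2) == -2);
  assert (step_unsigned_mult (0, -1, 2) == 65534);
}
```

With the unsigned intermediate the loop no longer computes a plain widened
sum of products, which is why that form cannot be mapped onto a single
dot-product operation.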

Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped, the same
optab is used but the operands are flipped in the optab expansion.

To support this, the patch extends the dot-product detection to optionally
ignore operands with different signs and stores this information in the optab
subtype, which is now a bitfield.

The subtype can now additionally control which optab an EXPR can expand to.
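
As a rough standalone model of that selection logic (not GCC code: every
`toy_` name and `choose_dot_prod` are hypothetical, and the returned strings
only stand in for the real optabs), the subtype bits and the operand flip
could be sketched like this:

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for the subtype bits; the values mirror the patch, but the
// names are illustrative only.
enum toy_subtype : unsigned
{
  toy_default = 1u << 0,
  toy_signed_to_unsigned = 1u << 3,
  toy_unsigned_to_signed = 1u << 4
};

struct toy_choice
{
  std::string optab;   // name of the pattern that would be selected
  bool swap_operands;  // whether the two multiplication operands are exchanged
};

// Pick a pattern from the signedness of the two multiplication operands.
toy_choice
choose_dot_prod (bool op0_unsigned, bool op1_unsigned)
{
  unsigned subtype = toy_default;
  bool swap = false;
  if (op0_unsigned != op1_unsigned)
    {
      if (!op0_unsigned)
        {
          // Signed first, unsigned second: same pattern, operands flipped.
          subtype |= toy_signed_to_unsigned;
          swap = true;
        }
      else
        subtype |= toy_unsigned_to_signed;
    }

  if (subtype & (toy_signed_to_unsigned | toy_unsigned_to_signed))
    return { "usdot_prod", swap };
  return { op0_unsigned ? "udot_prod" : "sdot_prod", false };
}

// Hypothetical self-check of the four sign combinations.
void
check_choose_dot_prod ()
{
  assert (choose_dot_prod (true, true).optab == "udot_prod");
  assert (choose_dot_prod (false, false).optab == "sdot_prod");
  assert (choose_dot_prod (true, false).optab == "usdot_prod");
  assert (choose_dot_prod (false, true).swap_operands);
}
```

Only the mixed-sign combinations reach the new pattern, and only one of the
two needs its operands exchanged.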

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs-tree.h (enum optab_subtype): Likewise.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vect_determine_dot_kind): New.
	(vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

--- inline copy of patch -- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644


--

Comments

Richard Biener May 7, 2021, 11:45 a.m. UTC | #1
On Wed, 5 May 2021, Tamar Christina wrote:

> Hi All,
> 
> This patch adds support for a dot product where the signs of the multiplication
> arguments differ, i.e. one is signed and one is unsigned but the precisions are
> the same.
> 
> #define N 480
> #define SIGNEDNESS_1 unsigned
> #define SIGNEDNESS_2 signed
> #define SIGNEDNESS_3 signed
> #define SIGNEDNESS_4 unsigned
> 
> SIGNEDNESS_1 int __attribute__ ((noipa))
> f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
>    SIGNEDNESS_4 char *restrict b)
> {
>   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>     {
>       int av = a[i];
>       int bv = b[i];
>       SIGNEDNESS_2 short mult = av * bv;
>       res += mult;
>     }
>   return res;
> }
> 
> The operations are performed as if the operands were extended to a 32-bit value.
> As such, this operation isn't valid if there is an intermediate conversion to an
> unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
> 
> Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped, the same
> optab is used but the operands are flipped in the optab expansion.
> 
> To support this, the patch extends the dot-product detection to optionally
> ignore operands with different signs and stores this information in the optab
> subtype, which is now a bitfield.
> 
> The subtype can now additionally control which optab an EXPR can expand to.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* optabs.def (usdot_prod_optab): New.
> 	* doc/md.texi: Document it.
> 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> 	* optabs-tree.h (enum optab_subtype): Likewise.
> 	* optabs.c (expand_widen_pattern_expr): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> 	(vectorizable_reduction): Query dot-product kind.
> 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> 	optab subtype.
> 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> 	mismatch types.
> 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>  @item @samp{sdot_prod@var{m}}
>  @cindex @code{udot_prod@var{m}} instruction pattern
>  @itemx @samp{udot_prod@var{m}}
> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@itemx @samp{usdot_prod@var{m}}
>  Compute the sum of the products of two signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> -wider than the mode of the product. The result is placed in operand 0, which
> -is of the same mode as operand 3.
> +Operand 1 and operand 2 are of the same mode but may differ in signs. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.

This doesn't really say what the 's', 'u' and 'us' specify.  Since
we're doing a widening multiplication and then a non-widening addition,
we only need to know the effective sign of the multiplication, so
I think the existing 's' and 'u' are enough to cover all cases?

The tree.def docs say the sum is also possibly widening but I don't see
this covered by the optab so we should eventually remove this
feature from the tree side.  In fact the tree-cfg.c verifier requires
the addition to be not widening - thus only tree.def needs adjustment.

>  @cindex @code{ssad@var{m}} instruction pattern
>  @item @samp{ssad@var{m}}
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
>     shift amount vs. machines that take a vector for the shift amount.  */
>  enum optab_subtype
>  {
> -  optab_default,
> -  optab_scalar,
> -  optab_vector
> +  optab_default = 1 << 0,
> +  optab_scalar = 1 << 1,
> +  optab_vector = 1 << 2,
> +  optab_signed_to_unsigned = 1 << 3,
> +  optab_unsigned_to_signed = 1 << 4
>  };
>  
> +/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
> +inline enum optab_subtype&
> +operator |= (enum optab_subtype& a, enum optab_subtype b)
> +{
> +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> +					  | static_cast<int>(b));
> +}
> +
> +/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
> +inline enum optab_subtype
> +operator | (enum optab_subtype a, enum optab_subtype b)
> +{
> +    return static_cast<optab_subtype>(static_cast<int>(a)
> +				      | static_cast<int>(b));
> +}
> +
>  /* Return the optab used for computing the given operation on the type given by
>     the second argument.  The third argument distinguishes between the types of
>     vector shifts and rotates.  */
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>        return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
>  
>      case DOT_PROD_EXPR:
> -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> +      {
> +	gcc_assert (subtype & optab_default
> +		    || subtype & optab_vector
> +		    || subtype & optab_signed_to_unsigned
> +		    || subtype & optab_unsigned_to_signed);
> +
> +	if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
> +	  return usdot_prod_optab;
> +
> +	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
> +      }
>  
>      case SAD_EXPR:
>        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac678597c0d00098 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>    bool sbool = false;
>  
>    oprnd0 = ops->op0;
> +  if (nops >= 2)
> +    oprnd1 = ops->op1;
> +  if (nops >= 3)
> +    oprnd2 = ops->op2;
> +
>    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
>    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
>        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> @@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
>        sbool = true;
>      }
> +  else if (ops->code == DOT_PROD_EXPR)
> +    {
> +      enum optab_subtype subtype = optab_default;
> +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> +      if (sign1 == sign2)
> +	;
> +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> +	{
> +	  subtype |= optab_signed_to_unsigned;
> +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> +	  std::swap (op0, op1);
> +	}
> +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> +	subtype |= optab_unsigned_to_signed;
> +      else
> +	gcc_unreachable ();
> +
> +      widen_pattern_optab
> +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +    }
>    else
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> @@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>    gcc_assert (icode != CODE_FOR_nothing);
>  
>    if (nops >= 2)
> -    {
> -      oprnd1 = ops->op1;
> -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> -    }
> +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
>    else if (sbool)
>      {
>        nops = 2;
> @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
>      {
>        gcc_assert (tmode1 == tmode0);
>        gcc_assert (op1);
> -      oprnd2 = ops->op2;
>        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
>      }
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
>  OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb00808fd2678b42 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
>  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
>  		 || (!INTEGRAL_TYPE_P (lhs_type)
>  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> -	    || !types_compatible_p (rhs1_type, rhs2_type)
> +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))

That's not restrictive enough.  I suggest you use

            && element_precision (rhs1_type) != element_precision (rhs2_type)

instead.
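
A toy model may help show the difference between the two conditions. It
assumes a type is fully described by a precision and a sign, which is far
simpler than what types_compatible_p actually checks; `toy_type` and the
functions below are hypothetical names, not GCC API.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for an element type.
struct toy_type
{
  unsigned precision;
  bool is_unsigned;
};

// Simplified notion of compatibility: same precision and same sign.
bool
toy_compatible (toy_type a, toy_type b)
{
  return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

// Condition as posted: incompatible operands are rejected only when their
// signs are equal, so any pair with differing signs passes.
bool
rejected_as_posted (toy_type a, toy_type b)
{
  return !toy_compatible (a, b) && a.is_unsigned == b.is_unsigned;
}

// Condition as suggested: incompatible operands are rejected when their
// precisions differ, so only a pure sign mismatch passes.
bool
rejected_as_suggested (toy_type a, toy_type b)
{
  return !toy_compatible (a, b) && a.precision != b.precision;
}

// Hypothetical self-check with an 8-bit signed, 8-bit unsigned and 32-bit
// unsigned element type.
void
check_verifier_conditions ()
{
  const toy_type s8 = { 8, false }, u8 = { 8, true }, u32 = { 32, true };
  assert (!rejected_as_posted (s8, u8) && !rejected_as_suggested (s8, u8));
  assert (!rejected_as_posted (s8, u32));
  assert (rejected_as_suggested (s8, u32));
}
```

In this model the posted condition lets a signed 8-bit operand pair with an
unsigned 32-bit one, while the suggested condition only tolerates a sign
mismatch at equal precision.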

As said, I'm not sure all the changes in this patch are required.

Please elaborate.

Thanks,
Richard.

>  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
>  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
>  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d19fec29ec6e4176 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code, tree vop[3], tree mask,
>      }
>  }
>  
> +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> +   most CODE this will be optab_vector, however for certain operations such as
> +   DOT_PROD_EXPR where the operation can different signs for the operands we
> +   need to be able to pick the right optabs.  */
> +
> +static enum optab_subtype
> +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> +{
> +  enum optab_subtype subtype = optab_vector;
> +  switch (code)
> +    {
> +      case DOT_PROD_EXPR:
> +	{
> +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
> +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
> +	  if (rhs1_sign != rhs2_sign)
> +	    subtype |= optab_unsigned_to_signed;
> +	  break;
> +	}
> +      default:
> +	break;
> +    }
> +
> +  return subtype;
> +}
> +
>  /* Function vectorizable_reduction.
>  
>     Check if STMT_INFO performs a reduction operation that can be vectorized.
> @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        bool ok = true;
>  
>        /* 4.1. check support for the operation in the loop  */
> -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> +      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
> +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
>        if (!optab)
>  	{
>  	  if (dump_enabled_p ())
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841fa84942316846d5e 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
>  static bool
>  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>  				 tree itype, tree *vecotype_out,
> -				 tree *vecitype_out = NULL)
> +				 tree *vecitype_out = NULL,
> +				 enum optab_subtype subtype = optab_default)
>  {
>    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>    if (!vecitype)
> @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>    if (!vecotype)
>      return false;
>  
> -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>    if (!optab)
>      return false;
>  
> @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
>  }
>  
>  /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
> -   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
> +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE and NEW_TYPE
> +   may be of different signs but equal precision.   */
>  
>  static bool
> -vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> +vect_joust_widened_type (tree type, tree new_type, tree *common_type,
> +			 bool allow_short_sign_mismatch = false)
>  {
>    if (types_compatible_p (*common_type, new_type))
>      return true;
>  
> +  /* Check if the mismatch is only in the sign and if we have
> +     allow_short_sign_mismatch then allow it.  */
> +  if (allow_short_sign_mismatch
> +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> +    {
> +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> +      tree eq_type
> +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> +					  sign);
> +
> +      if (types_compatible_p (*common_type, eq_type))
> +	return true;
> +    }
> +
>    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
>    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
>        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
> @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>     to a type that (a) is narrower than the result of STMT_INFO and
>     (b) can hold all leaf operand values.
>  
> +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the operands
> +   may differ in signs but not in precision.
> +
>     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
>     exists.  */
>  
> @@ -539,7 +560,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		      tree_code widened_code, bool shift_p,
>  		      unsigned int max_nops,
> -		      vect_unpromoted_value *unprom, tree *common_type)
> +		      vect_unpromoted_value *unprom, tree *common_type,
> +		      bool allow_short_sign_mismatch = false)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -600,7 +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		= vinfo->lookup_def (this_unprom->op);
>  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>  					   widened_code, shift_p, max_nops,
> -					   this_unprom, common_type);
> +					   this_unprom, common_type,
> +					   allow_short_sign_mismatch);
>  	      if (nops == 0)
>  		return 0;
>  
> @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  	      if (i == 0)
>  		*common_type = this_unprom->type;
>  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> -						 common_type))
> +						 common_type,
> +						 allow_short_sign_mismatch))
>  		return 0;
>  	    }
>  	}
> @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>  
>     Try to find the following pattern:
>  
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
>       sum_0 = phi <init, sum_1>
>       S1  x_t = ...
>       S2  y_t = ...
> -     S3  x_T = (TYPE1) x_t;
> -     S4  y_T = (TYPE1) y_t;
> +     S3  x_T = (TYPE3) x_t;
> +     S4  y_T = (TYPE4) y_t;
>       S5  prod = x_T * y_T;
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
>  
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction
>     computation.
>  
>     Input:
> @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>  
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +	  DY = (TYPE2) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +	  DDPROD = (TYPE3) DPROD;
>            sum_1 = DDPROD + sum_0;
>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between DX, DY and DPROD can differ. The sign of DPROD
> +       is one of the signs of DX or DY.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.
>  
> @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> -			     false, 2, unprom0, &half_type))
> +			     false, 2, unprom0, &half_type, true))
>      return NULL;
>  
> +  /* Check to see if there is a sign change happening in the operands of the
> +     multiplication and pick the appropriate optab subtype.  */
> +  enum optab_subtype subtype;
> +  tree rhs_type1 = unprom0[0].type;
> +  tree rhs_type2 = unprom0[1].type;
> +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> +     subtype = optab_default;
> +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> +     subtype = optab_signed_to_unsigned;
> +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> +     subtype = optab_unsigned_to_signed;
> +  else
> +    gcc_unreachable ();
> +
> +  /* If we have a sign changing dot product we need to check that the
> +     promoted type if unsigned has at least the same precision as the final
> +     type of the dot-product.  */
> +  if (subtype != optab_default)
> +    {
> +      tree mult_type = TREE_TYPE (unprom_mult.op);
> +      if (TYPE_SIGN (mult_type) == UNSIGNED
> +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> +	return NULL;
> +    }
> +
>    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
>  
>    tree half_vectype;
>    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> -					type_out, &half_vectype))
> +					type_out, &half_vectype, subtype))
>      return NULL;
>  
>    /* Get the inputs in the appropriate types.  */
> @@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>  		       unprom0, half_vectype);
>  
>    var = vect_recog_temp_ssa_var (type, NULL);
> +
> +  /* If we have a sign changing dot-product the dot-product itself does any
> +     sign conversions, so consume the type and use the unpromoted types.  */
> +  tree mult_arg1, mult_arg2;
> +  if (subtype == optab_default)
> +    {
> +      mult_arg1 = mult_oprnd[0];
> +      mult_arg2 = mult_oprnd[1];
> +    }
> +  else
> +    {
> +      mult_arg1 = unprom0[0].op;
> +      mult_arg2 = unprom0[1].op;
> +    }
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> +				      mult_arg1, mult_arg2, oprnd1);
>  
>    return pattern_stmt;
>  }
> 
> 
>
Tamar Christina May 7, 2021, 12:42 p.m. UTC | #2
Hi Richi,

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, May 7, 2021 12:46 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Wed, 5 May 2021, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch adds support for a dot product where the signs of the
> > multiplication arguments differ, i.e. one is signed and one is
> > unsigned but the precisions are the same.
> >
> > #define N 480
> > #define SIGNEDNESS_1 unsigned
> > #define SIGNEDNESS_2 signed
> > #define SIGNEDNESS_3 signed
> > #define SIGNEDNESS_4 unsigned
> >
> > SIGNEDNESS_1 int __attribute__ ((noipa))
> > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> >    SIGNEDNESS_4 char *restrict b)
> > {
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> >     {
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >     }
> >   return res;
> > }
> >
> > The operations are performed as if the operands were extended to a 32-bit
> > value.  As such, this operation isn't valid if there is an intermediate
> > conversion to an unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
> >
> > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped,
> > the same optab is used but the operands are flipped in the optab
> > expansion.
> >
> > To support this, the patch extends the dot-product detection to
> > optionally ignore operands with different signs and stores this
> > information in the optab subtype, which is now a bitfield.
> >
> > The subtype can now additionally control which optab an EXPR can expand
> > to.
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* optabs.def (usdot_prod_optab): New.
> > 	* doc/md.texi: Document it.
> > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > 	(vectorizable_reduction): Query dot-product kind.
> > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > 	optab subtype.
> > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > 	mismatch types.
> > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> >  @item @samp{sdot_prod@var{m}}
> >  @cindex @code{udot_prod@var{m}} instruction pattern
> >  @itemx @samp{udot_prod@var{m}}
> > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > +@itemx @samp{usdot_prod@var{m}}
> >  Compute the sum of the products of two signed/unsigned elements.
> > -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> > -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> > -wider than the mode of the product. The result is placed in operand 0, which
> > -is of the same mode as operand 3.
> > +Operand 1 and operand 2 are of the same mode but may differ in signs. Their
> > +product, which is of a wider mode, is computed and added to operand 3.
> > +Operand 3 is of a mode equal or wider than the mode of the product. The
> > +result is placed in operand 0, which is of the same mode as operand 3.
> 
> This doesn't really say what the 's', 'u' and 'us' specify.  Since we're doing a
> widen multiplication and then a non-widening addition we only need to
> know the effective sign of the multiplication so I think the existing 's' and 'u'
> are enough to cover all cases?

The existing 's' and 'u' enforce that both operands of the multiplication have the
same sign.  So for e.g. 'u' both operands must be unsigned.

In the `us` case one can be signed and one unsigned. Operationally, the signed value is
sign extended to the wider type, the unsigned value is zero extended, and the
sign-extended value is then converted to unsigned to perform an
unsigned multiplication, conforming to the C promotion rules.

TL;DR: Without a new optab I can't tell during expansion which semantics the operation
had at the gimple/C level, as modes don't carry signs.

Long version:

The problem with using the existing patterns is that they require `av` and `bv` to have
the same sign, so we can't remove the explicit sign extensions; yet the multiplication must be done
on the sign/zero extended char inputs in the same sign.

Which means (unless I am mistaken) that to get the correct result you can use neither `udot` nor `sdot`, as
semantically these would zero or sign extend both operands from char to int to perform the multiplication
in the same sign.  Whereas in this case, one operand is zero extended and the other is sign extended, and the result
is always an unsigned number.

So basically

udot<unsigned c, unsigned a, unsigned b> ==
   c = zero-ext (a) * zero-ext (b)
sdot<signed c, signed a, signed b> ==
   c = sign-ext (a) * sign-ext (b)
usdot<unsigned c, unsigned a, signed b> ==
   c = zero-ext (a) * ((unsigned-conv) sign-ext (b))

So semantically the existing optabs won't fit here. udot would internally promote both operands to unsigned types before
the multiplication, so the result of the multiplication would be wrong.  sdot would promote both to signed
and do a signed multiplication, so the result is also wrong.
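
A small numeric sketch of the three extension choices, applied to the same
pair of byte values, makes the difference visible. The helper names are made
up, the accumulation step is left out, and a signed 32-bit result is used for
readability. Following the operand types listed for usdot above, `a` is the
unsigned operand and `b` the signed one. Reinterpreting a byte above 127 as
int8_t is assumed to wrap in two's complement, as it does on GCC targets.

```cpp
#include <cassert>
#include <cstdint>

// udot-like: both bytes are zero-extended before multiplying.
std::int32_t
mul_udot_like (std::uint8_t a, std::uint8_t b)
{
  return std::int32_t (a) * std::int32_t (b);
}

// sdot-like: both bytes are reinterpreted as signed and sign-extended.
std::int32_t
mul_sdot_like (std::uint8_t a, std::uint8_t b)
{
  return std::int32_t (static_cast<std::int8_t> (a))
         * std::int32_t (static_cast<std::int8_t> (b));
}

// usdot-like: a is zero-extended, b is sign-extended.
std::int32_t
mul_usdot_like (std::uint8_t a, std::uint8_t b)
{
  return std::int32_t (a) * std::int32_t (static_cast<std::int8_t> (b));
}

// Hypothetical self-check with a = 200 (unsigned) and b = -3 (signed).
void
check_extension_choice ()
{
  const std::uint8_t a = 0xC8;  // 200 as unsigned char
  const std::uint8_t b = 0xFD;  // -3 as signed char, 253 as unsigned char
  assert (mul_usdot_like (a, b) == -600);   // the intended 200 * -3
  assert (mul_udot_like (a, b) == 50600);   // 200 * 253
  assert (mul_sdot_like (a, b) == 168);     // -56 * -3
}
```

The same two bytes give three different products, so the extension of each
operand has to be known when the pattern is chosen.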

Now if I relax the constraint on the signs of udot and sdot there are two problems, both because
RTL modes don't contain signs, so a target can't tell me how the operands will be promoted:

1) I can't really check which semantics the target will adhere to on expansion.
2) At expand time I have no way to differentiate between the two instruction variants; given just modes
     I can't tell whether to expand to the normal dot-product or the new instruction.

Regards,
Tamar

> 
> The tree.def docs say the sum is also possibly widening but I don't see this
> covered by the optab so we should eventually remove this feature from the
> tree side.  In fact the tree-cfg.c verifier requires the addition to be not
> widening - thus only tree.def needs adjustment.
> 
> >  @cindex @code{ssad@var{m}} instruction pattern
> >  @item @samp{ssad@var{m}}
> > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
> > --- a/gcc/optabs-tree.h
> > +++ b/gcc/optabs-tree.h
> > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> >     shift amount vs. machines that take a vector for the shift amount.  */
> >  enum optab_subtype
> >  {
> > -  optab_default,
> > -  optab_scalar,
> > -  optab_vector
> > +  optab_default = 1 << 0,
> > +  optab_scalar = 1 << 1,
> > +  optab_vector = 1 << 2,
> > +  optab_signed_to_unsigned = 1 << 3,
> > +  optab_unsigned_to_signed = 1 << 4
> >  };
> >
> > +/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
> > +inline enum optab_subtype&
> > +operator |= (enum optab_subtype& a, enum optab_subtype b)
> > +{
> > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > +					  | static_cast<int>(b));
> > +}
> > +
> > +/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
> > +inline enum optab_subtype
> > +operator | (enum optab_subtype a, enum optab_subtype b)
> > +{
> > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > +				      | static_cast<int>(b));
> > +}
> > +
> >  /* Return the optab used for computing the given operation on the type given by
> >     the second argument.  The third argument distinguishes between the types of
> >     vector shifts and rotates.  */
> > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> > index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
> > --- a/gcc/optabs-tree.c
> > +++ b/gcc/optabs-tree.c
> > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
> >        return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
> >
> >      case DOT_PROD_EXPR:
> > -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> > +      {
> > +	gcc_assert (subtype & optab_default
> > +		    || subtype & optab_vector
> > +		    || subtype & optab_signed_to_unsigned
> > +		    || subtype & optab_unsigned_to_signed);
> > +
> > +	if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
> > +	  return usdot_prod_optab;
> > +
> > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
> > +      }
> >
> >      case SAD_EXPR:
> >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > diff --git a/gcc/optabs.c b/gcc/optabs.c
> > index f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac678597c0d00098 100644
> > --- a/gcc/optabs.c
> > +++ b/gcc/optabs.c
> > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
> >    bool sbool = false;
> >
> >    oprnd0 = ops->op0;
> > +  if (nops >= 2)
> > +    oprnd1 = ops->op1;
> > +  if (nops >= 3)
> > +    oprnd2 = ops->op2;
> > +
> >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> +290,27
> > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> wide_op,
> >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> >        sbool = true;
> >      }
> > +  else if (ops->code == DOT_PROD_EXPR)
> > +    {
> > +      enum optab_subtype subtype = optab_default;
> > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > +      if (sign1 == sign2)
> > +	;
> > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > +	{
> > +	  subtype |= optab_signed_to_unsigned;
> > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > +	  std::swap (op0, op1);
> > +	}
> > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > +	subtype |= optab_unsigned_to_signed;
> > +      else
> > +	gcc_unreachable ();
> > +
> > +      widen_pattern_optab
> > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > +    }
> >    else
> >      widen_pattern_optab
> >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> (sepops ops, rtx op0, rtx op1, rtx wide_op,
> >    gcc_assert (icode != CODE_FOR_nothing);
> >
> >    if (nops >= 2)
> > -    {
> > -      oprnd1 = ops->op1;
> > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > -    }
> > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> >    else if (sbool)
> >      {
> >        nops = 2;
> > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
> >      {
> >        gcc_assert (tmode1 == tmode0);
> >        gcc_assert (op1);
> > -      oprnd2 = ops->op2;
> >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> >      }
> >
> > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> b7c
> > 18615baae928 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> OPTAB_D
> > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> (usad_optab,
> > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> 00
> > 808fd2678b42 100644
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> 
> That's not restrictive enough.  I suggest you use
> 
>             && element_precision (rhs1_type) != element_precision
> (rhs2_type)
> 
> instead.
> 
> As said, I'm not sure all the changes in this patch are required.
> 
> Please elaborate.
> 
> Thanks,
> Richard.
> 
> >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> diff --git
> > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> 9f
> > ec29ec6e4176 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code,
> tree vop[3], tree mask,
> >      }
> >  }
> >
> > +/* Determine the optab_subtype to use for the given CODE and STMT.
> For
> > +   most CODE this will be optab_vector, however for certain operations
> such as
> > +   DOT_PROD_EXPR where the operation can different signs for the
> operands we
> > +   need to be able to pick the right optabs.  */
> > +
> > +static enum optab_subtype
> > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo) {
> > +  enum optab_subtype subtype = optab_vector;
> > +  switch (code)
> > +    {
> > +      case DOT_PROD_EXPR:
> > +	{
> > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> (stmt)));
> > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> (stmt)));
> > +	  if (rhs1_sign != rhs2_sign)
> > +	    subtype |= optab_unsigned_to_signed;
> > +	  break;
> > +	}
> > +      default:
> > +	break;
> > +    }
> > +
> > +  return subtype;
> > +}
> > +
> >  /* Function vectorizable_reduction.
> >
> >     Check if STMT_INFO performs a reduction operation that can be
> vectorized.
> > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> loop_vinfo,
> >        bool ok = true;
> >
> >        /* 4.1. check support for the operation in the loop  */
> > -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> stmt_info);
> > +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
> >        if (!optab)
> >  	{
> >  	  if (dump_enabled_p ())
> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index
> >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> a84
> > 942316846d5e 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> > var)  static bool  vect_supportable_direct_optab_p (vec_info *vinfo,
> > tree otype, tree_code code,
> >  				 tree itype, tree *vecotype_out,
> > -				 tree *vecitype_out = NULL)
> > +				 tree *vecitype_out = NULL,
> > +				 enum optab_subtype subtype =
> optab_default)
> >  {
> >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> >    if (!vecitype)
> > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree otype, tree_code code,
> >    if (!vecotype)
> >      return false;
> >
> > -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> >    if (!optab)
> >      return false;
> >
> > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > shift_p, tree op,  }
> >
> >  /* Return true if the common supertype of NEW_TYPE and
> *COMMON_TYPE
> > -   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> */
> > +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE
> and NEW_TYPE
> > +   may be of different signs but equal precision.   */
> >
> >  static bool
> > -vect_joust_widened_type (tree type, tree new_type, tree
> *common_type)
> > +vect_joust_widened_type (tree type, tree new_type, tree
> *common_type,
> > +			 bool allow_short_sign_mismatch = false)
> >  {
> >    if (types_compatible_p (*common_type, new_type))
> >      return true;
> >
> > +  /* Check if the mismatch is only in the sign and if we have
> > +     allow_short_sign_mismatch then allow it.  */
> > +  if (allow_short_sign_mismatch
> > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > +    {
> > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > +      tree eq_type
> > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > +					  sign);
> > +
> > +      if (types_compatible_p (*common_type, eq_type))
> > +	return true;
> > +    }
> > +
> >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> (*common_type)))
> > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> new_type, tree *common_type)
> >     to a type that (a) is narrower than the result of STMT_INFO and
> >     (b) can hold all leaf operand values.
> >
> > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> operands
> > +   may differ in signs but not in precision.
> > +
> >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> >     exists.  */
> >
> > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> >  		      tree_code widened_code, bool shift_p,
> >  		      unsigned int max_nops,
> > -		      vect_unpromoted_value *unprom, tree *common_type)
> > +		      vect_unpromoted_value *unprom, tree *common_type,
> > +		      bool allow_short_sign_mismatch = false)
> >  {
> >    /* Check for an integer operation with the right code.  */
> >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@ -600,7
> > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> stmt_info, tree_code code,
> >  		= vinfo->lookup_def (this_unprom->op);
> >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> >  					   widened_code, shift_p, max_nops,
> > -					   this_unprom, common_type);
> > +					   this_unprom, common_type,
> > +					   allow_short_sign_mismatch);
> >  	      if (nops == 0)
> >  		return 0;
> >
> > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
> >  	      if (i == 0)
> >  		*common_type = this_unprom->type;
> >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > -						 common_type))
> > +						 common_type,
> > +						 allow_short_sign_mismatch))
> >  		return 0;
> >  	    }
> >  	}
> > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> >
> >     Try to find the following pattern:
> >
> > -     type x_t, y_t;
> > +     type1a x_t
> > +     type1b y_t;
> >       TYPE1 prod;
> >       TYPE2 sum = init;
> >     loop:
> >       sum_0 = phi <init, sum_1>
> >       S1  x_t = ...
> >       S2  y_t = ...
> > -     S3  x_T = (TYPE1) x_t;
> > -     S4  y_T = (TYPE1) y_t;
> > +     S3  x_T = (TYPE3) x_t;
> > +     S4  y_T = (TYPE4) y_t;
> >       S5  prod = x_T * y_T;
> >       [S6  prod = (TYPE2) prod;  #optional]
> >       S7  sum_1 = prod + sum_0;
> >
> > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > +   bigger and must be the same sign. This is a special case of a
> > + reduction
> >     computation.
> >
> >     Input:
> > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >
> >    /* Look for the following pattern
> >            DX = (TYPE1) X;
> > -          DY = (TYPE1) Y;
> > +	  DY = (TYPE2) Y;
> >            DPROD = DX * DY;
> > -          DDPROD = (TYPE2) DPROD;
> > +	  DDPROD = (TYPE3) DPROD;
> >            sum_1 = DDPROD + sum_0;
> >       In which
> >       - DX is double the size of X
> >       - DY is double the size of Y
> >       - DX, DY, DPROD all have the same type but the sign
> > -       between DX, DY and DPROD can differ.
> > +       between DX, DY and DPROD can differ. The sign of DPROD
> > +       is one of the signs of DX or DY.
> >       - sum is the same size of DPROD or bigger
> >       - sum has been recognized as a reduction variable.
> >
> > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >       inside the loop (in case we are analyzing an outer-loop).  */
> >    vect_unpromoted_value unprom0[2];
> >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> WIDEN_MULT_EXPR,
> > -			     false, 2, unprom0, &half_type))
> > +			     false, 2, unprom0, &half_type, true))
> >      return NULL;
> >
> > +  /* Check to see if there is a sign change happening in the operands of
> the
> > +     multiplication and pick the appropriate optab subtype.  */
> > +  enum optab_subtype subtype;
> > +  tree rhs_type1 = unprom0[0].type;
> > +  tree rhs_type2 = unprom0[1].type;
> > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > +     subtype = optab_default;
> > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > +     subtype = optab_signed_to_unsigned;
> > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > +     subtype = optab_unsigned_to_signed;
> > +  else
> > +    gcc_unreachable ();
> > +
> > +  /* If we have a sign changing dot product we need to check that the
> > +     promoted type if unsigned has at least the same precision as the final
> > +     type of the dot-product.  */
> > +  if (subtype != optab_default)
> > +    {
> > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > +	return NULL;
> > +    }
> > +
> >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> >
> >    tree half_vectype;
> >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
> half_type,
> > -					type_out, &half_vectype))
> > +					type_out, &half_vectype, subtype))
> >      return NULL;
> >
> >    /* Get the inputs in the appropriate types.  */ @@ -1002,8 +1057,22
> > @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >  		       unprom0, half_vectype);
> >
> >    var = vect_recog_temp_ssa_var (type, NULL);
> > +
> > +  /* If we have a sign changing dot-product the dot-product itself does any
> > +     sign conversions, so consume the type and use the unpromoted
> > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > + optab_default)
> > +    {
> > +      mult_arg1 = mult_oprnd[0];
> > +      mult_arg2 = mult_oprnd[1];
> > +    }
> > +  else
> > +    {
> > +      mult_arg1 = unprom0[0].op;
> > +      mult_arg2 = unprom0[1].op;
> > +    }
> >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > +				      mult_arg1, mult_arg2, oprnd1);
> >
> >    return pattern_stmt;
> >  }
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Richard Biener May 10, 2021, 11:39 a.m. UTC | #3
On Fri, 7 May 2021, Tamar Christina wrote:

> Hi Richi,
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, May 7, 2021 12:46 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> > 
> > On Wed, 5 May 2021, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch adds support for a dot product where the sign of the
> > > multiplication arguments differ. i.e. one is signed and one is
> > > unsigned but the precisions are the same.
> > >
> > > #define N 480
> > > #define SIGNEDNESS_1 unsigned
> > > #define SIGNEDNESS_2 signed
> > > #define SIGNEDNESS_3 signed
> > > #define SIGNEDNESS_4 unsigned
> > >
> > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > SIGNEDNESS_3 char *restrict a,
> > >    SIGNEDNESS_4 char *restrict b)
> > > {
> > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > >     {
> > >       int av = a[i];
> > >       int bv = b[i];
> > >       SIGNEDNESS_2 short mult = av * bv;
> > >       res += mult;
> > >     }
> > >   return res;
> > > }
> > >
> > > The operations are performed as if the operands were extended to a 32-bit
> > value.
> > > As such this operation isn't valid if there is an intermediate
> > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> > >
> > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped
> > > the same optab is used but the operands are flipped in the optab
> > expansion.
> > >
> > > To support this the patch extends the dot-product detection to
> > > optionally ignore operands with different signs and stores this
> > > information in the optab subtype which is now made a bitfield.
> > >
> > > > The subtype can now additionally control which optab an EXPR can expand
> > to.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* optabs.def (usdot_prod_optab): New.
> > > 	* doc/md.texi: Document it.
> > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > 	(vectorizable_reduction): Query dot-product kind.
> > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > optional
> > > 	optab subtype.
> > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> > ignore
> > > 	mismatch types.
> > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> > d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > f2
> > > e66bc80d7d23 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> > takes
> > > an additional mask operand  @item @samp{sdot_prod@var{m}}  @cindex
> > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > @samp{udot_prod@var{m}}
> > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > +@samp{usdot_prod@var{m}}
> > >  Compute the sum of the products of two signed/unsigned elements.
> > > -Operand 1 and operand 2 are of the same mode. Their product, which is
> > > of a -wider mode, is computed and added to operand 3. Operand 3 is of
> > > a mode equal or -wider than the mode of the product. The result is
> > > placed in operand 0, which -is of the same mode as operand 3.
> > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > +Their product, which is of a wider mode, is computed and added to
> > operand 3.
> > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > 
> > This doesn't really say what the 's', 'u' and 'us' specify.  Since we're doing a
> > widen multiplication and then a non-widening addition we only need to
> > know the effective sign of the multiplication so I think the existing 's' and 'u'
> > are enough to cover all cases?
> 
> The existing 's' and 'u' enforce that both operands of the multiplication are of the
> same sign.  So for e.g. 'u' both operand must be unsigned.
> 
> In the `us` case one operand can be signed and the other unsigned. Operationally, the signed
> value is sign-extended to the wider type and the unsigned value is zero-extended; the
> sign-extended value is then converted to unsigned so that an unsigned multiplication is
> performed, conforming to the C promotion rules.
> 
> TL;DR: Without a new optab I can't tell during expansion which semantics the operation
> had at the gimple/C level, as modes don't carry signs.
> 
> Long version:
> 
> The problem with using the existing patterns is that they enforce `av` and `bv` having the same
> sign, so we can't remove the explicit sign extensions, yet the multiplication must be done
> on the sign/zero-extended char inputs in a single sign.
> 
> This means (unless I am mistaken) that to get the correct result you can use neither `udot` nor `sdot`, as
> semantically these would zero- or sign-extend both operands from char to int to perform the multiplication
> in the same sign.  Whereas in this case one operand is zero-extended, the other is sign-extended, and the result
> is always an unsigned number.
> 
> So basically
> 
> udot<unsigned c, unsigned a, unsigned b> ==
>    c = zero-ext (a) * zero-ext (b)
> sdot<signed c, signed a, signed b> ==
>    c = sign-ext (a) * sign-ext (b)
> usdot<unsigned c, unsigned a, signed b> ==
>    c = zero-ext (a) * ((unsigned-conv) sign-ext (b))
> 
> So semantically the existing optabs won't fit here. udot would internally promote to unsigned types before
> the multiplication so the result of the multiplication would be wrong.  sdot would promote both to signed
> and do signed multiplication, so the result is also wrong.
> 
> Now if I relax the constraint on the signs of udot and sdot there are two problems:
> RTL Modes don't contain signs.  So a target can't tell me how the operands will be promoted.
> So:
> 
> 1) I can't really check which semantics the target will adhere to on expansion.
> 2) at expand time I have no way to differentiate between the two instruction variants, given just modes
>      I can't tell whether I expand to the normal dot-product or the new instruction.
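
To make the three semantics quoted above concrete, here is a scalar C++ model of what each variant computes. It is only a sketch: the names `udot_model`, `sdot_model` and `usdot_model` are made up for this mail and are not GCC or target code, 8-bit inputs and a 32-bit accumulator are assumed as in the testcase, and the accumulator is kept unsigned so that wraparound is well defined.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar models, for illustration only (not GCC code).
// Each one widens 8-bit inputs to 32 bits, multiplies, and accumulates.
// The accumulator is unsigned so that overflow wraps instead of being UB.

// udot: both inputs are unsigned, so both are zero-extended.
uint32_t
udot_model (uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    acc += static_cast<uint32_t> (a[i]) * static_cast<uint32_t> (b[i]);
  return acc;
}

// sdot: both inputs are signed, so both are sign-extended.
uint32_t
sdot_model (uint32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int32_t prod = static_cast<int32_t> (a[i]) * static_cast<int32_t> (b[i]);
      acc += static_cast<uint32_t> (prod);
    }
  return acc;
}

// usdot: A is unsigned (zero-extended), B is signed (sign-extended).
// The product is formed in 32 bits and then accumulated modulo 2^32.
uint32_t
usdot_model (uint32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int32_t prod = static_cast<int32_t> (a[i]) * static_cast<int32_t> (b[i]);
      acc += static_cast<uint32_t> (prod);
    }
  return acc;
}
```

For the bytes 0xC8 and 0xFF, read as unsigned 200 and signed -1, the mixed-sign product is -200, while treating both bytes as unsigned gives 51000 and treating both as signed gives 56. Neither existing variant reproduces the mixed-sign result from the same bit patterns.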

Ah, OK.  Indeed with such a weird instruction the new variant makes
sense.  Still can you please amend the optab documentation to say
which operand is unsigned and which is signed?  Just 'may differ in signs'
is bad.

Since the multiplication is commutative I wonder why you need to handle
both signed_to_unsigned and unsigned_to_signed - we should just enforce
a canonical order (like the optab does).  I also think it's a
particularly bad fit for the bad optab_for_tree_code API - would any of
that improve when using a direct internal function here?  In
particular all the changes around optab_subtype look like they make
a bad API worse ... at least a single optab_vector_mixed_sign should
suffice here, no need to turn the enum into a set of flags.
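
A rough sketch of what that could look like, in standalone C++. Everything here is a hypothetical stand-in (the `sketch_*` enums, `dot_operands`, `canonicalize_mixed_sign`, `pick_dot_optab`), not the real GCC declarations, and it assumes the canonical order is "unsigned operand first, signed operand second", matching the case the patch expands without swapping.

```cpp
#include <utility>

// Hypothetical stand-ins for illustration; not the real GCC types.
enum sketch_subtype
{
  sketch_default,
  sketch_scalar,
  sketch_vector,
  sketch_vector_mixed_sign	/* One extra plain value, no bit flags.  */
};

enum sketch_optab { sketch_udot_prod, sketch_sdot_prod, sketch_usdot_prod };

// The two multiplication operands and their signedness.
struct dot_operands
{
  bool op0_unsigned;
  bool op1_unsigned;
  int op0;
  int op1;
};

// Put mixed-sign operands into the assumed canonical order (unsigned
// first).  Returns true if the signs differ, false otherwise.
inline bool
canonicalize_mixed_sign (dot_operands &ops)
{
  if (ops.op0_unsigned == ops.op1_unsigned)
    return false;
  if (!ops.op0_unsigned)
    {
      // Multiplication is commutative, so swapping is always safe.
      std::swap (ops.op0, ops.op1);
      std::swap (ops.op0_unsigned, ops.op1_unsigned);
    }
  return true;
}

// Choose the optab: mixed sign maps to usdot, otherwise the sign of the
// (common) operand type decides, as in the existing code.
inline sketch_optab
pick_dot_optab (bool type_unsigned, sketch_subtype subtype)
{
  if (subtype == sketch_vector_mixed_sign)
    return sketch_usdot_prod;
  return type_unsigned ? sketch_udot_prod : sketch_sdot_prod;
}
```

With the operands canonicalized once at pattern-recognition time, later consumers only need to test for the single mixed-sign value and never have to decide which way round the operands are.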

+  /* If we have a sign changing dot product we need to check that the
+     promoted type if unsigned has at least the same precision as the final
+     type of the dot-product.  */
+  if (subtype != optab_default)
+    {
+      tree mult_type = TREE_TYPE (unprom_mult.op);
+      if (TYPE_SIGN (mult_type) == UNSIGNED
+         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
+       return NULL;
+    }

I don't understand this - how do we ever arrive at a result with
less precision?  And why's this not an issue for signed multiplication?
Also...

+  /* If we have a sign changing dot-product the dot-product itself does any
+     sign conversions, so consume the type and use the unpromoted types.  */
+  tree mult_arg1, mult_arg2;
+  if (subtype == optab_default)
+    {
+      mult_arg1 = mult_oprnd[0];
+      mult_arg2 = mult_oprnd[1];
+    }
+  else
+    {
+      mult_arg1 = unprom0[0].op;
+      mult_arg2 = unprom0[1].op;
+    }
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
-                                     mult_oprnd[0], mult_oprnd[1], oprnd1);
+                                     mult_arg1, mult_arg2, oprnd1);

I thought DOT_PROD always performs the promotion.  Maybe
mult_oprnd and unprom0 are just misnamed here?

Richard.

> Regards,
> Tamar
> 
> > 
> > The tree.def docs say the sum is also possibly widening but I don't see this
> > covered by the optab so we should eventually remove this feature from the
> > tree side.  In fact the tree-cfg.c verifier requires the addition to be not
> > widening - thus only tree.def needs adjustment.
> > 
> > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > index
> > >
> > c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > 19
> > > 90e0548ba08d 100644
> > > --- a/gcc/optabs-tree.h
> > > +++ b/gcc/optabs-tree.h
> > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> > >     shift amount vs. machines that take a vector for the shift amount.
> > > */  enum optab_subtype  {
> > > -  optab_default,
> > > -  optab_scalar,
> > > -  optab_vector
> > > +  optab_default = 1 << 0,
> > > +  optab_scalar = 1 << 1,
> > > +  optab_vector = 1 << 2,
> > > +  optab_signed_to_unsigned = 1 << 3,
> > > +  optab_unsigned_to_signed = 1 << 4
> > >  };
> > >
> > > +/* Override the OrEqual-operator so we can use optab_subtype as a bit
> > > +flag.  */ inline enum optab_subtype& operator |= (enum
> > optab_subtype&
> > > +a, enum optab_subtype b) {
> > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > +					  | static_cast<int>(b));
> > > +}
> > > +
> > > +/* Override the Or-operator so we can use optab_subtype as a bit
> > > +flag.  */ inline enum optab_subtype operator | (enum optab_subtype a,
> > > +enum optab_subtype b) {
> > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > +				      | static_cast<int>(b));
> > > +}
> > > +
> > >  /* Return the optab used for computing the given operation on the type
> > given by
> > >     the second argument.  The third argument distinguishes between the
> > types of
> > >     vector shifts and rotates.  */
> > > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > >
> > 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > 1e
> > > 5c22b7453072 100644
> > > --- a/gcc/optabs-tree.c
> > > +++ b/gcc/optabs-tree.c
> > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code,
> > const_tree type,
> > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > ssum_widen_optab;
> > >
> > >      case DOT_PROD_EXPR:
> > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> > > +      {
> > > +	gcc_assert (subtype & optab_default
> > > +		    || subtype & optab_vector
> > > +		    || subtype & optab_signed_to_unsigned
> > > +		    || subtype & optab_unsigned_to_signed);
> > > +
> > > +	if (subtype & (optab_unsigned_to_signed |
> > optab_signed_to_unsigned))
> > > +	  return usdot_prod_optab;
> > > +
> > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > sdot_prod_optab);
> > > +      }
> > >
> > >      case SAD_EXPR:
> > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; diff
> > > --git a/gcc/optabs.c b/gcc/optabs.c index
> > >
> > f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > 67
> > > 8597c0d00098 100644
> > > --- a/gcc/optabs.c
> > > +++ b/gcc/optabs.c
> > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> > rtx op1, rtx wide_op,
> > >    bool sbool = false;
> > >
> > >    oprnd0 = ops->op0;
> > > +  if (nops >= 2)
> > > +    oprnd1 = ops->op1;
> > > +  if (nops >= 3)
> > > +    oprnd2 = ops->op2;
> > > +
> > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> > +290,27
> > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> > wide_op,
> > >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> > >        sbool = true;
> > >      }
> > > +  else if (ops->code == DOT_PROD_EXPR)
> > > +    {
> > > +      enum optab_subtype subtype = optab_default;
> > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > +      if (sign1 == sign2)
> > > +	;
> > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > +	{
> > > +	  subtype |= optab_signed_to_unsigned;
> > > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > > +	  std::swap (op0, op1);
> > > +	}
> > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > +	subtype |= optab_unsigned_to_signed;
> > > +      else
> > > +	gcc_unreachable ();
> > > +
> > > +      widen_pattern_optab
> > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > +    }
> > >    else
> > >      widen_pattern_optab
> > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > >    gcc_assert (icode != CODE_FOR_nothing);
> > >
> > >    if (nops >= 2)
> > > -    {
> > > -      oprnd1 = ops->op1;
> > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > -    }
> > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > >    else if (sbool)
> > >      {
> > >        nops = 2;
> > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> > rtx op1, rtx wide_op,
> > >      {
> > >        gcc_assert (tmode1 == tmode0);
> > >        gcc_assert (op1);
> > > -      oprnd2 = ops->op2;
> > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > >      }
> > >
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > >
> > b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > b7c
> > > 18615baae928 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > OPTAB_D
> > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > (usad_optab,
> > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > >
> > 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > 00
> > > 808fd2678b42 100644
> > > --- a/gcc/tree-cfg.c
> > > +++ b/gcc/tree-cfg.c
> > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> > 
> > That's not restrictive enough.  I suggest you use
> > 
> >             && element_precision (rhs1_type) != element_precision
> > (rhs2_type)
> > 
> > instead.
> > 
> > As said, I'm not sure all the changes in this patch are required.
> > 
> > Please elaborate.
> > 
> > Thanks,
> > Richard.
> > 
> > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> > >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> > diff --git
> > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > >
> > 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > 9f
> > > ec29ec6e4176 100644
> > > --- a/gcc/tree-vect-loop.c
> > > +++ b/gcc/tree-vect-loop.c
> > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code code,
> > tree vop[3], tree mask,
> > >      }
> > >  }
> > >
> > > +/* Determine the optab_subtype to use for the given CODE and STMT.
> > For
> > > +   most CODE this will be optab_vector, however for certain operations
> > such as
> > > +   DOT_PROD_EXPR where the operation can different signs for the
> > operands we
> > > +   need to be able to pick the right optabs.  */
> > > +
> > > +static enum optab_subtype
> > > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo) {
> > > +  enum optab_subtype subtype = optab_vector;
> > > +  switch (code)
> > > +    {
> > > +      case DOT_PROD_EXPR:
> > > +	{
> > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> > (stmt)));
> > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> > (stmt)));
> > > +	  if (rhs1_sign != rhs2_sign)
> > > +	    subtype |= optab_unsigned_to_signed;
> > > +	  break;
> > > +	}
> > > +      default:
> > > +	break;
> > > +    }
> > > +
> > > +  return subtype;
> > > +}
> > > +
> > >  /* Function vectorizable_reduction.
> > >
> > >     Check if STMT_INFO performs a reduction operation that can be
> > vectorized.
> > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > loop_vinfo,
> > >        bool ok = true;
> > >
> > >        /* 4.1. check support for the operation in the loop  */
> > > -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> > > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> > stmt_info);
> > > +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
> > >        if (!optab)
> > >  	{
> > >  	  if (dump_enabled_p ())
> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index
> > >
> > 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > a84
> > > 942316846d5e 100644
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> > > var)  static bool  vect_supportable_direct_optab_p (vec_info *vinfo,
> > > tree otype, tree_code code,
> > >  				 tree itype, tree *vecotype_out,
> > > -				 tree *vecitype_out = NULL)
> > > +				 tree *vecitype_out = NULL,
> > > +				 enum optab_subtype subtype =
> > optab_default)
> > >  {
> > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > >    if (!vecitype)
> > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> > tree otype, tree_code code,
> > >    if (!vecotype)
> > >      return false;
> > >
> > > -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > >    if (!optab)
> > >      return false;
> > >
> > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > > shift_p, tree op,  }
> > >
> > >  /* Return true if the common supertype of NEW_TYPE and
> > *COMMON_TYPE
> > > -   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > */
> > > +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that *COMMON_TYPE
> > and NEW_TYPE
> > > +   may be of different signs but equal precision.   */
> > >
> > >  static bool
> > > -vect_joust_widened_type (tree type, tree new_type, tree
> > *common_type)
> > > +vect_joust_widened_type (tree type, tree new_type, tree
> > *common_type,
> > > +			 bool allow_short_sign_mismatch = false)
> > >  {
> > >    if (types_compatible_p (*common_type, new_type))
> > >      return true;
> > >
> > > +  /* Check if the mismatch is only in the sign and if we have
> > > +     allow_short_sign_mismatch then allow it.  */
> > > +  if (allow_short_sign_mismatch
> > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > +    {
> > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > +      tree eq_type
> > > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > +					  sign);
> > > +
> > > +      if (types_compatible_p (*common_type, eq_type))
> > > +	return true;
> > > +    }
> > > +
> > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > (*common_type)))
> > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > new_type, tree *common_type)
> > >     to a type that (a) is narrower than the result of STMT_INFO and
> > >     (b) can hold all leaf operand values.
> > >
> > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> > operands
> > > +   may differ in signs but not in precision.
> > > +
> > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > >     exists.  */
> > >
> > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > >  		      tree_code widened_code, bool shift_p,
> > >  		      unsigned int max_nops,
> > > -		      vect_unpromoted_value *unprom, tree *common_type)
> > > +		      vect_unpromoted_value *unprom, tree *common_type,
> > > +		      bool allow_short_sign_mismatch = false)
> > >  {
> > >    /* Check for an integer operation with the right code.  */
> > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@ -600,7
> > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> > stmt_info, tree_code code,
> > >  		= vinfo->lookup_def (this_unprom->op);
> > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > >  					   widened_code, shift_p, max_nops,
> > > -					   this_unprom, common_type);
> > > +					   this_unprom, common_type,
> > > +					   allow_short_sign_mismatch);
> > >  	      if (nops == 0)
> > >  		return 0;
> > >
> > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > stmt_vec_info stmt_info, tree_code code,
> > >  	      if (i == 0)
> > >  		*common_type = this_unprom->type;
> > >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > > -						 common_type))
> > > +						 common_type,
> > > +						 allow_short_sign_mismatch))
> > >  		return 0;
> > >  	    }
> > >  	}
> > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> > >
> > >     Try to find the following pattern:
> > >
> > > -     type x_t, y_t;
> > > +     type1a x_t
> > > +     type1b y_t;
> > >       TYPE1 prod;
> > >       TYPE2 sum = init;
> > >     loop:
> > >       sum_0 = phi <init, sum_1>
> > >       S1  x_t = ...
> > >       S2  y_t = ...
> > > -     S3  x_T = (TYPE1) x_t;
> > > -     S4  y_T = (TYPE1) y_t;
> > > +     S3  x_T = (TYPE3) x_t;
> > > +     S4  y_T = (TYPE4) y_t;
> > >       S5  prod = x_T * y_T;
> > >       [S6  prod = (TYPE2) prod;  #optional]
> > >       S7  sum_1 = prod + sum_0;
> > >
> > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > +   bigger and must be the same sign. This is a special case of a
> > > + reduction
> > >     computation.
> > >
> > >     Input:
> > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > >
> > >    /* Look for the following pattern
> > >            DX = (TYPE1) X;
> > > -          DY = (TYPE1) Y;
> > > +	  DY = (TYPE2) Y;
> > >            DPROD = DX * DY;
> > > -          DDPROD = (TYPE2) DPROD;
> > > +	  DDPROD = (TYPE3) DPROD;
> > >            sum_1 = DDPROD + sum_0;
> > >       In which
> > >       - DX is double the size of X
> > >       - DY is double the size of Y
> > >       - DX, DY, DPROD all have the same type but the sign
> > > -       between DX, DY and DPROD can differ.
> > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > +       is one of the signs of DX or DY.
> > >       - sum is the same size of DPROD or bigger
> > >       - sum has been recognized as a reduction variable.
> > >
> > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > >       inside the loop (in case we are analyzing an outer-loop).  */
> > >    vect_unpromoted_value unprom0[2];
> > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > WIDEN_MULT_EXPR,
> > > -			     false, 2, unprom0, &half_type))
> > > +			     false, 2, unprom0, &half_type, true))
> > >      return NULL;
> > >
> > > +  /* Check to see if there is a sign change happening in the operands of
> > the
> > > +     multiplication and pick the appropriate optab subtype.  */
> > > +  enum optab_subtype subtype;
> > > +  tree rhs_type1 = unprom0[0].type;
> > > +  tree rhs_type2 = unprom0[1].type;
> > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > +     subtype = optab_default;
> > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > +     subtype = optab_signed_to_unsigned;
> > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > +     subtype = optab_unsigned_to_signed;
> > > +  else
> > > +    gcc_unreachable ();
> > > +
> > > +  /* If we have a sign changing dot product we need to check that the
> > > +     promoted type if unsigned has at least the same precision as the final
> > > +     type of the dot-product.  */
> > > +  if (subtype != optab_default)
> > > +    {
> > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > +	return NULL;
> > > +    }
> > > +
> > >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> > >
> > >    tree half_vectype;
> > >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
> > half_type,
> > > -					type_out, &half_vectype))
> > > +					type_out, &half_vectype, subtype))
> > >      return NULL;
> > >
> > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8 +1057,22
> > > @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > >  		       unprom0, half_vectype);
> > >
> > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > +
> > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > +     sign conversions, so consume the type and use the unpromoted
> > > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > > + optab_default)
> > > +    {
> > > +      mult_arg1 = mult_oprnd[0];
> > > +      mult_arg2 = mult_oprnd[1];
> > > +    }
> > > +  else
> > > +    {
> > > +      mult_arg1 = unprom0[0].op;
> > > +      mult_arg2 = unprom0[1].op;
> > > +    }
> > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > +				      mult_arg1, mult_arg2, oprnd1);
> > >
> > >    return pattern_stmt;
> > >  }
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
Tamar Christina May 10, 2021, 12:58 p.m. UTC | #4
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, May 10, 2021 12:40 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Fri, 7 May 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Friday, May 7, 2021 12:46 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > where the sign for the multiplicant changes.
> > >
> > > On Wed, 5 May 2021, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch adds support for a dot product where the sign of the
> > > > multiplication arguments differ. i.e. one is signed and one is
> > > > unsigned but the precisions are the same.
> > > >
> > > > #define N 480
> > > > #define SIGNEDNESS_1 unsigned
> > > > #define SIGNEDNESS_2 signed
> > > > #define SIGNEDNESS_3 signed
> > > > #define SIGNEDNESS_4 unsigned
> > > >
> > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > SIGNEDNESS_3 char *restrict a,
> > > >    SIGNEDNESS_4 char *restrict b)
> > > > {
> > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > >     {
> > > >       int av = a[i];
> > > >       int bv = b[i];
> > > >       SIGNEDNESS_2 short mult = av * bv;
> > > >       res += mult;
> > > >     }
> > > >   return res;
> > > > }
> > > >
> > > > The operations are performed as if the operands were extended to a
> > > > 32-bit
> > > value.
> > > > As such this operation isn't valid if there is an intermediate
> > > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> > > >
> > > > more over if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > flipped the same optab is used but the operands are flipped in the
> > > > optab
> > > expansion.
> > > >
> > > > To support this the patch extends the dot-product detection to
> > > > optionally ignore operands with different signs and stores this
> > > > information in the optab subtype which is now made a bitfield.
> > > >
> > > > The subtype can now additionally controls which optab an EXPR can
> > > > expand
> > > to.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	* optabs.def (usdot_prod_optab): New.
> > > > 	* doc/md.texi: Document it.
> > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > > optional
> > > > 	optab subtype.
> > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> > > ignore
> > > > 	mismatch types.
> > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > >
> > > > --- inline copy of patch --
> > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > >
> > >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > f2
> > > > e66bc80d7d23 100644
> > > > --- a/gcc/doc/md.texi
> > > > +++ b/gcc/doc/md.texi
> > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> > > takes
> > > > an additional mask operand  @item @samp{sdot_prod@var{m}}
> @cindex
> > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > @samp{udot_prod@var{m}}
> > > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > > +@samp{usdot_prod@var{m}}
> > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > -Operand 1 and operand 2 are of the same mode. Their product,
> > > > which is of a -wider mode, is computed and added to operand 3.
> > > > Operand 3 is of a mode equal or -wider than the mode of the
> > > > product. The result is placed in operand 0, which -is of the same mode
> as operand 3.
> > > > +Operand 1 and operand 2 are of the same mode but may differ in
> signs.
> > > > +Their product, which is of a wider mode, is computed and added to
> > > operand 3.
> > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > +The result is placed in operand 0, which is of the same mode as
> operand 3.
> > >
> > > This doesn't really say what the 's', 'u' and 'us' specify.  Since
> > > we're doing a widen multiplication and then a non-widening addition
> > > we only need to know the effective sign of the multiplication so I think
> the existing 's' and 'u'
> > > are enough to cover all cases?
> >
> > The existing 's' and 'u' enforce that both operands of the
> > multiplication are of the same sign.  So for e.g. 'u' both operands must
> > be unsigned.
> >
> > In the `us` case one can be signed and one unsigned. Operationally
> > the signed value is sign extended to the wider type, the unsigned value
> > is zero extended, and the sign-extended value is then converted to
> > unsigned to perform an unsigned multiplication, conforming to the C
> > promotion rules.
> >
> > TL;DR; Without a new optab I can't tell during expansion which
> > semantic the operation had at the gimple/C level as modes don't carry signs.
> >
> > Long version:
> >
> > The problem with using the existing patterns is that, because they
> > require `av` and `bv` to have the same sign, we can't remove the
> > explicit sign extensions, yet the multiplication must be done on the
> > sign/zero extended char inputs in the same sign.
> >
> > Which means (unless I am mistaken) that to get the correct result you
> > can use neither `udot` nor `sdot`, as semantically these would zero
> > or sign extend both operands from char to int to perform the
> > multiplication in the same sign.  Whereas in this case, one parameter is
> > zero extended and one parameter is sign extended, and the result is
> > always an unsigned number.
> >
> > So basically
> >
> > udot<unsigned c, unsigned a, unsigned b> ==
> >    c = zero-ext (a) * zero-ext (b)
> > sdot<signed c, signed a, signed b> ==
> >    c = sign-ext (a) * sign-ext (b)
> > usdot<unsigned c, unsigned a, signed b> ==
> >    c = zero-ext (a) * ((unsigned-conv) sign-ext (b))
> >
> > So semantically the existing optabs won't fit here. udot would
> > internally promote to unsigned types before the multiplication so the
> > result of the multiplication would be wrong.  sdot would promote both to
> > signed and do signed multiplication, so the result is also wrong.
> >
> > Now if I relax the constraint on the signs of udot and sdot there are two
> > problems.  RTL modes don't contain signs, so a target can't tell me how the
> > operands will be promoted.  So:
> >
> > 1) I can't really check which semantics the target will adhere to on
> >    expansion.
> > 2) At expand time I have no way to differentiate between the two
> >    instruction variants: given just modes, I can't tell whether to expand
> >    to the normal dot-product or the new instruction.
> 
> Ah, OK.  Indeed with such a weird instruction the new variant makes sense.
> Still can you please amend the optab documentation to say which operand is
> unsigned and which is signed?  Just 'may differ in signs'
> is bad.

Sure, will expand on it.
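
For reference, a minimal scalar sketch of the three semantics discussed above, one accumulation step each.  The function names are illustrative and are not GCC APIs.  The (unsigned, signed) operand order for the mixed-sign variant is an assumption, based on the order the expander in this patch canonicalizes to.

```c
#include <stdint.h>

/* Scalar models of a single dot-product accumulation step.  These are
   hypothetical helpers for illustration, not GCC functions.  */

/* udot-style: both inputs are zero extended before the multiply.  */
uint32_t
udot_step (uint32_t acc, uint8_t a, uint8_t b)
{
  return acc + (uint32_t) a * (uint32_t) b;
}

/* sdot-style: both inputs are sign extended before the multiply.  The
   addition is done in unsigned arithmetic to avoid signed overflow.  */
int32_t
sdot_step (int32_t acc, int8_t a, int8_t b)
{
  int32_t prod = (int32_t) a * (int32_t) b;
  return (int32_t) ((uint32_t) acc + (uint32_t) prod);
}

/* usdot-style (assumed operand order: unsigned first, signed second):
   A is zero extended, B is sign extended, and the product is converted
   to unsigned before it is accumulated.  */
uint32_t
usdot_step (uint32_t acc, uint8_t a, int8_t b)
{
  int32_t prod = (int32_t) a * (int32_t) b;
  return acc + (uint32_t) prod;
}
```

With a = 2 and the second operand holding the byte 0xff, udot_step adds 2 * 255 while usdot_step adds 2 * -1, which is why neither existing optab can stand in for the mixed-sign case.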

> 
> Since the multiplication is commutative I wonder why you need to handle
> both signed_to_unsigned and unsigned_to_signed - we should just enforce
> a canonical order (like the optab does). 

Sure. I thought it would be better to change the order at expand time,
but I can do it at detection time instead.
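
A rough sketch of what canonicalizing at detection time could look like.  The struct and function below are hypothetical stand-ins (GCC would look at TYPE_SIGN of the operand types instead), and the (unsigned, signed) order is assumed to match what the expander in the patch swaps to.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a multiplication operand.  */
struct mul_operand
{
  bool is_unsigned;
  int id;
};

/* Put a mixed-sign pair into (unsigned, signed) order.  Returns true if
   the pair has mixed signs, false if both signs match (in which case the
   operands are left untouched).  */
bool
canonicalize_mixed_sign (struct mul_operand *op0, struct mul_operand *op1)
{
  if (op0->is_unsigned == op1->is_unsigned)
    return false;

  if (!op0->is_unsigned)
    {
      /* Signed operand came first: swap, which is safe because the
         multiplication is commutative.  */
      struct mul_operand tmp = *op0;
      *op0 = *op1;
      *op1 = tmp;
    }
  return true;
}
```

With the order fixed this early, later stages only need to distinguish "mixed sign" from "same sign" rather than two separate directions.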

> I also think it's a particularly bad fit for
> the bad optab_for_tree_code API - would any of that improve when using a
> direct internal function here? 

Somewhat, but this has considerable knock-on effects. For example, DOT_PROD is currently
treated as a widening operation and so is handled by supportable_widening_operation,
which does not support calls. There is also a significant number of places that work on the
tree EXPR (including constant folding), all of which would need to be changed.

> In particular all the changes around
> optab_subtype look like they make a bad API worse ... at least a single
> optab_vector_mixed_sign should suffice here, no need to make it a flags
> kind.

I did so because the subtype currently differs depending on where the query is made:
detection uses optab_default and vectorization uses optab_vector.  For this instruction the
difference doesn't seem to be used, but I did not want to lose the information in case
something depended on it.

But I can make it just one.
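
As a sketch only, the single-enumerator variant could look roughly like this.  Everything with a `sketch_` prefix, and the `dot_optab_kind` enum, is hypothetical; only the name optab_vector_mixed_sign comes from the suggestion above.

```c
#include <stdbool.h>

/* Plain enum with one extra enumerator instead of bit flags.  */
enum sketch_optab_subtype
{
  sketch_optab_default,
  sketch_optab_scalar,
  sketch_optab_vector,
  sketch_optab_vector_mixed_sign
};

/* Stand-ins for sdot_prod_optab, udot_prod_optab and usdot_prod_optab.  */
enum dot_optab_kind
{
  DOT_SIGNED,
  DOT_UNSIGNED,
  DOT_MIXED
};

/* Model of the DOT_PROD_EXPR case of an optab query: the mixed-sign
   subtype selects the new optab, otherwise the type's sign decides.  */
enum dot_optab_kind
dot_prod_optab_for (enum sketch_optab_subtype subtype, bool type_unsigned)
{
  if (subtype == sketch_optab_vector_mixed_sign)
    return DOT_MIXED;
  return type_unsigned ? DOT_UNSIGNED : DOT_SIGNED;
}
```

This keeps the enum usable in a switch and avoids the operator overloads, at the cost of not recording whether the query came from detection or vectorization.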

> 
> +  /* If we have a sign changing dot product we need to check that the
> +     promoted type if unsigned has at least the same precision as the
> final
> +     type of the dot-product.  */
> +  if (subtype != optab_default)
> +    {
> +      tree mult_type = TREE_TYPE (unprom_mult.op);
> +      if (TYPE_SIGN (mult_type) == UNSIGNED
> +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> +       return NULL;
> +    }
> 
> I don't understand this - how do we ever arrive at a result with less precision?

The user could have manually truncated the result, e.g. note `mult` in the example from the patch description:

      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;

`mult` is a short, so the code manually truncates the multiplication, which the instruction performs as int.
If `mult` is unsigned, that truncation changes the result whenever the signed input to usdot is negative, unless
the intermediate calculation has the same precision as the instruction. i.e. if `mult` is unsigned int there is
no truncation: the cast is from int to unsigned int, so it's safe to use usdot because the instruction does the
same thing internally.
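
A small scalar sketch of the difference, using one loop iteration with a signed char `a` and an unsigned char `b`.  The helper names are made up for illustration, and step_usdot only models the "multiply as int, accumulate as unsigned" behaviour described above; it is not the real instruction.

```c
#include <stdint.h>

/* Intermediate is a signed short: the product of a signed char and an
   unsigned char always fits, so no information is lost.  */
uint32_t
step_signed_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a, bv = b;
  short mult = (short) (av * bv);
  return res + (uint32_t) mult;
}

/* Intermediate is an unsigned short: a negative product wraps to a
   16-bit value and is then zero extended when added.  */
uint32_t
step_unsigned_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a, bv = b;
  unsigned short mult = (unsigned short) (av * bv);
  return res + mult;
}

/* Intermediate is an unsigned int: same precision as the accumulator,
   so the conversion only reinterprets the sign.  */
uint32_t
step_unsigned_int (uint32_t res, int8_t a, uint8_t b)
{
  int av = a, bv = b;
  unsigned int mult = (unsigned int) (av * bv);
  return res + mult;
}

/* Model of the mixed-sign dot product step.  */
uint32_t
step_usdot (uint32_t res, int8_t a, uint8_t b)
{
  return res + (uint32_t) ((int32_t) a * (int32_t) b);
}
```

For res = 10, a = -1, b = 2 the signed short and unsigned int variants agree with step_usdot, while the unsigned short variant does not, which is the case the extra precision check rejects.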

> And why's this not an issue for signed multiplication?

It is, but in that case it's handled by the type jousting, which doesn't allow the type mismatch. i.e.

#define SIGNEDNESS_1 unsigned
#define SIGNEDNESS_2 unsigned
#define SIGNEDNESS_3 signed
#define SIGNEDNESS_4 signed

SIGNEDNESS_1 int __attribute__ ((noipa))
f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
   SIGNEDNESS_4 char *restrict b)
{
  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
    {
      int av = a[i];
      int bv = b[i];
      SIGNEDNESS_2 short mult = av * bv;
      res += mult;
    }
  return res;
}

is also not detected as a dot product.  Adding the carve-out to the widening multiplication detection now
lets this case through, so I handle it in the detection code.  Thinking about it now, it seems more logical
to handle this case inside the type jousting code, as I don't think it's ever something you'd want.

> Also...
> 
> +  /* If we have a sign changing dot-product the dot-product itself does
> any
> +     sign conversions, so consume the type and use the unpromoted types.
> */
> +  tree mult_arg1, mult_arg2;
> +  if (subtype == optab_default)
> +    {
> +      mult_arg1 = mult_oprnd[0];
> +      mult_arg2 = mult_oprnd[1];
> +    }
> +  else
> +    {
> +      mult_arg1 = unprom0[0].op;
> +      mult_arg2 = unprom0[1].op;
> +    }
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> -                                     mult_oprnd[0], mult_oprnd[1],
> oprnd1);
> +                                     mult_arg1, mult_arg2, oprnd1);
> 
> I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> and unprom0 are just misnamed here?

Somewhat. In a normal dot-product the signs of the multiplication operands are the same as those
of the "unpromoted" types, so after vect_convert_input the two types are the same.

Here, however, the sign changes, and to maintain the semantics of the C code
there is an extra conversion to bring the arguments to the same sign.  That conversion needs to be
stripped before the operands are given to the instruction, which does the conversion internally.

Regards,
Tamar

> 
> Richard.
> 
> > Regards,
> > Tamar
> >
> > >
> > > The tree.def docs say the sum is also possibly widening but I don't
> > > see this covered by the optab so we should eventually remove this
> > > feature from the tree side.  In fact the tree-cfg.c verifier
> > > requires the addition to be not widening - thus only tree.def needs
> adjustment.
> > >
> > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > b/gcc/optabs-tree.h index
> > > >
> > >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > 19
> > > > 90e0548ba08d 100644
> > > > --- a/gcc/optabs-tree.h
> > > > +++ b/gcc/optabs-tree.h
> > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not
> see
> > > >     shift amount vs. machines that take a vector for the shift amount.
> > > > */  enum optab_subtype  {
> > > > -  optab_default,
> > > > -  optab_scalar,
> > > > -  optab_vector
> > > > +  optab_default = 1 << 0,
> > > > +  optab_scalar = 1 << 1,
> > > > +  optab_vector = 1 << 2,
> > > > +  optab_signed_to_unsigned = 1 << 3,  optab_unsigned_to_signed =
> > > > + 1 << 4
> > > >  };
> > > >
> > > > +/* Override the OrEqual-operator so we can use optab_subtype as a
> > > > +bit flag.  */ inline enum optab_subtype& operator |= (enum
> > > optab_subtype&
> > > > +a, enum optab_subtype b) {
> > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > +					  | static_cast<int>(b));
> > > > +}
> > > > +
> > > > +/* Override the Or-operator so we can use optab_subtype as a bit
> > > > +flag.  */ inline enum optab_subtype operator | (enum
> > > > +optab_subtype a, enum optab_subtype b) {
> > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > +				      | static_cast<int>(b)); }
> > > > +
> > > >  /* Return the optab used for computing the given operation on the
> > > > type
> > > given by
> > > >     the second argument.  The third argument distinguishes between
> > > > the
> > > types of
> > > >     vector shifts and rotates.  */ diff --git a/gcc/optabs-tree.c
> > > > b/gcc/optabs-tree.c index
> > > >
> > >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > 1e
> > > > 5c22b7453072 100644
> > > > --- a/gcc/optabs-tree.c
> > > > +++ b/gcc/optabs-tree.c
> > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code,
> > > const_tree type,
> > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > ssum_widen_optab;
> > > >
> > > >      case DOT_PROD_EXPR:
> > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> sdot_prod_optab;
> > > > +      {
> > > > +	gcc_assert (subtype & optab_default
> > > > +		    || subtype & optab_vector
> > > > +		    || subtype & optab_signed_to_unsigned
> > > > +		    || subtype & optab_unsigned_to_signed);
> > > > +
> > > > +	if (subtype & (optab_unsigned_to_signed |
> > > optab_signed_to_unsigned))
> > > > +	  return usdot_prod_optab;
> > > > +
> > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > sdot_prod_optab);
> > > > +      }
> > > >
> > > >      case SAD_EXPR:
> > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; diff
> > > > --git a/gcc/optabs.c b/gcc/optabs.c index
> > > >
> > >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > 67
> > > > 8597c0d00098 100644
> > > > --- a/gcc/optabs.c
> > > > +++ b/gcc/optabs.c
> > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx
> > > > op0,
> > > rtx op1, rtx wide_op,
> > > >    bool sbool = false;
> > > >
> > > >    oprnd0 = ops->op0;
> > > > +  if (nops >= 2)
> > > > +    oprnd1 = ops->op1;
> > > > +  if (nops >= 3)
> > > > +    oprnd2 = ops->op2;
> > > > +
> > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> > > +290,27
> > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> > > wide_op,
> > > >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> > > >        sbool = true;
> > > >      }
> > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > +    {
> > > > +      enum optab_subtype subtype = optab_default;
> > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > +      if (sign1 == sign2)
> > > > +	;
> > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > +	{
> > > > +	  subtype |= optab_signed_to_unsigned;
> > > > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > > > +	  std::swap (op0, op1);
> > > > +	}
> > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > +	subtype |= optab_unsigned_to_signed;
> > > > +      else
> > > > +	gcc_unreachable ();
> > > > +
> > > > +      widen_pattern_optab
> > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > > +    }
> > > >    else
> > > >      widen_pattern_optab
> > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > >
> > > >    if (nops >= 2)
> > > > -    {
> > > > -      oprnd1 = ops->op1;
> > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > -    }
> > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > >    else if (sbool)
> > > >      {
> > > >        nops = 2;
> > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx
> > > > op0,
> > > rtx op1, rtx wide_op,
> > > >      {
> > > >        gcc_assert (tmode1 == tmode0);
> > > >        gcc_assert (op1);
> > > > -      oprnd2 = ops->op2;
> > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > >      }
> > > >
> > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > >
> > >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > b7c
> > > > 18615baae928 100644
> > > > --- a/gcc/optabs.def
> > > > +++ b/gcc/optabs.def
> > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > OPTAB_D
> > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > > (usad_optab,
> > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > >
> > >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > 00
> > > > 808fd2678b42 100644
> > > > --- a/gcc/tree-cfg.c
> > > > +++ b/gcc/tree-cfg.c
> > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> > >
> > > That's not restrictive enough.  I suggest you use
> > >
> > >             && element_precision (rhs1_type) != element_precision
> > > (rhs2_type)
> > >
> > > instead.
> > >
> > > As said, I'm not sure all the changes in this patch are required.
> > >
> > > Please elaborate.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> > > >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> > > diff --git
> > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > >
> > >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > 9f
> > > > ec29ec6e4176 100644
> > > > --- a/gcc/tree-vect-loop.c
> > > > +++ b/gcc/tree-vect-loop.c
> > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code
> code,
> > > tree vop[3], tree mask,
> > > >      }
> > > >  }
> > > >
> > > > +/* Determine the optab_subtype to use for the given CODE and STMT.
> > > For
> > > > +   most CODE this will be optab_vector, however for certain
> > > > + operations
> > > such as
> > > > +   DOT_PROD_EXPR where the operation can different signs for the
> > > operands we
> > > > +   need to be able to pick the right optabs.  */
> > > > +
> > > > +static enum optab_subtype
> > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info
> > > > +stmt_vinfo) {
> > > > +  enum optab_subtype subtype = optab_vector;
> > > > +  switch (code)
> > > > +    {
> > > > +      case DOT_PROD_EXPR:
> > > > +	{
> > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> > > (stmt)));
> > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> > > (stmt)));
> > > > +	  if (rhs1_sign != rhs2_sign)
> > > > +	    subtype |= optab_unsigned_to_signed;
> > > > +	  break;
> > > > +	}
> > > > +      default:
> > > > +	break;
> > > > +    }
> > > > +
> > > > +  return subtype;
> > > > +}
> > > > +
> > > >  /* Function vectorizable_reduction.
> > > >
> > > >     Check if STMT_INFO performs a reduction operation that can be
> > > vectorized.
> > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > loop_vinfo,
> > > >        bool ok = true;
> > > >
> > > >        /* 4.1. check support for the operation in the loop  */
> > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> optab_vector);
> > > > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> > > stmt_info);
> > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > + subtype);
> > > >        if (!optab)
> > > >  	{
> > > >  	  if (dump_enabled_p ())
> > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > index
> > > >
> > >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > a84
> > > > 942316846d5e 100644
> > > > --- a/gcc/tree-vect-patterns.c
> > > > +++ b/gcc/tree-vect-patterns.c
> > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo,
> > > > tree
> > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > *vinfo, tree otype, tree_code code,
> > > >  				 tree itype, tree *vecotype_out,
> > > > -				 tree *vecitype_out = NULL)
> > > > +				 tree *vecitype_out = NULL,
> > > > +				 enum optab_subtype subtype =
> > > optab_default)
> > > >  {
> > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > >    if (!vecitype)
> > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info
> > > > *vinfo,
> > > tree otype, tree_code code,
> > > >    if (!vecotype)
> > > >      return false;
> > > >
> > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > optab_default);
> > > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > >    if (!optab)
> > > >      return false;
> > > >
> > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > > > shift_p, tree op,  }
> > > >
> > > >  /* Return true if the common supertype of NEW_TYPE and
> > > *COMMON_TYPE
> > > > -   is narrower than type, storing the supertype in *COMMON_TYPE if
> so.
> > > */
> > > > +   is narrower than type, storing the supertype in *COMMON_TYPE if
> so.
> > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> *COMMON_TYPE
> > > and NEW_TYPE
> > > > +   may be of different signs but equal precision.   */
> > > >
> > > >  static bool
> > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > *common_type)
> > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > *common_type,
> > > > +			 bool allow_short_sign_mismatch = false)
> > > >  {
> > > >    if (types_compatible_p (*common_type, new_type))
> > > >      return true;
> > > >
> > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > +     allow_short_sign_mismatch then allow it.  */
> > > > +  if (allow_short_sign_mismatch
> > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > +    {
> > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > +      tree eq_type
> > > > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > > +					  sign);
> > > > +
> > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > +	return true;
> > > > +    }
> > > > +
> > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > (*common_type)))
> > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > > new_type, tree *common_type)
> > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > >     (b) can hold all leaf operand values.
> > > >
> > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> > > operands
> > > > +   may differ in signs but not in precision.
> > > > +
> > > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > > >     exists.  */
> > > >
> > > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > >  		      tree_code widened_code, bool shift_p,
> > > >  		      unsigned int max_nops,
> > > > -		      vect_unpromoted_value *unprom, tree *common_type)
> > > > +		      vect_unpromoted_value *unprom, tree *common_type,
> > > > +		      bool allow_short_sign_mismatch = false)
> > > >  {
> > > >    /* Check for an integer operation with the right code.  */
> > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@
> > > > -600,7
> > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> > > stmt_info, tree_code code,
> > > >  		= vinfo->lookup_def (this_unprom->op);
> > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > > >  					   widened_code, shift_p, max_nops,
> > > > -					   this_unprom, common_type);
> > > > +					   this_unprom, common_type,
> > > > +					   allow_short_sign_mismatch);
> > > >  	      if (nops == 0)
> > > >  		return 0;
> > > >
> > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > stmt_vec_info stmt_info, tree_code code,
> > > >  	      if (i == 0)
> > > >  		*common_type = this_unprom->type;
> > > >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > > > -						 common_type))
> > > > +						 common_type,
> > > > +						 allow_short_sign_mismatch))
> > > >  		return 0;
> > > >  	    }
> > > >  	}
> > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info
> > > > *vinfo,
> > > >
> > > >     Try to find the following pattern:
> > > >
> > > > -     type x_t, y_t;
> > > > +     type1a x_t
> > > > +     type1b y_t;
> > > >       TYPE1 prod;
> > > >       TYPE2 sum = init;
> > > >     loop:
> > > >       sum_0 = phi <init, sum_1>
> > > >       S1  x_t = ...
> > > >       S2  y_t = ...
> > > > -     S3  x_T = (TYPE1) x_t;
> > > > -     S4  y_T = (TYPE1) y_t;
> > > > +     S3  x_T = (TYPE3) x_t;
> > > > +     S4  y_T = (TYPE4) y_t;
> > > >       S5  prod = x_T * y_T;
> > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > >       S7  sum_1 = prod + sum_0;
> > > >
> > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is
> the
> > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > +   bigger and must be the same sign. This is a special case of a
> > > > + reduction
> > > >     computation.
> > > >
> > > >     Input:
> > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info
> > > > *vinfo,
> > > >
> > > >    /* Look for the following pattern
> > > >            DX = (TYPE1) X;
> > > > -          DY = (TYPE1) Y;
> > > > +	  DY = (TYPE2) Y;
> > > >            DPROD = DX * DY;
> > > > -          DDPROD = (TYPE2) DPROD;
> > > > +	  DDPROD = (TYPE3) DPROD;
> > > >            sum_1 = DDPROD + sum_0;
> > > >       In which
> > > >       - DX is double the size of X
> > > >       - DY is double the size of Y
> > > >       - DX, DY, DPROD all have the same type but the sign
> > > > -       between DX, DY and DPROD can differ.
> > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > +       is one of the signs of DX or DY.
> > > >       - sum is the same size of DPROD or bigger
> > > >       - sum has been recognized as a reduction variable.
> > > >
> > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info
> *vinfo,
> > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > >    vect_unpromoted_value unprom0[2];
> > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > WIDEN_MULT_EXPR,
> > > > -			     false, 2, unprom0, &half_type))
> > > > +			     false, 2, unprom0, &half_type, true))
> > > >      return NULL;
> > > >
> > > > +  /* Check to see if there is a sign change happening in the
> > > > + operands of
> > > the
> > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > +  enum optab_subtype subtype;
> > > > +  tree rhs_type1 = unprom0[0].type;
> > > > +  tree rhs_type2 = unprom0[1].type;
> > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > +     subtype = optab_default;
> > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > +     subtype = optab_signed_to_unsigned;
> > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > +     subtype = optab_unsigned_to_signed;
> > > > +  else
> > > > +    gcc_unreachable ();
> > > > +
> > > > +  /* If we have a sign changing dot product we need to check that the
> > > > +     promoted type if unsigned has at least the same precision as the
> final
> > > > +     type of the dot-product.  */
> > > > +  if (subtype != optab_default)
> > > > +    {
> > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > +	return NULL;
> > > > +    }
> > > > +
> > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > last_stmt);
> > > >
> > > >    tree half_vectype;
> > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > DOT_PROD_EXPR,
> > > half_type,
> > > > -					type_out, &half_vectype))
> > > > +					type_out, &half_vectype, subtype))
> > > >      return NULL;
> > > >
> > > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8
> > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > >  		       unprom0, half_vectype);
> > > >
> > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > +
> > > > +  /* If we have a sign changing dot-product the dot-product itself does
> any
> > > > +     sign conversions, so consume the type and use the unpromoted
> > > > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > > > + optab_default)
> > > > +    {
> > > > +      mult_arg1 = mult_oprnd[0];
> > > > +      mult_arg2 = mult_oprnd[1];
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      mult_arg1 = unprom0[0].op;
> > > > +      mult_arg2 = unprom0[1].op;
> > > > +    }
> > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > >
> > > >    return pattern_stmt;
> > > >  }
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Richard Biener May 10, 2021, 1:29 p.m. UTC | #5
On Mon, 10 May 2021, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Monday, May 10, 2021 12:40 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> > 
> > On Fri, 7 May 2021, Tamar Christina wrote:
> > 
> > > Hi Richi,
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > > where the sign for the multiplicant changes.
> > > >
> > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch adds support for a dot product where the sign of the
> > > > > multiplication arguments differ. i.e. one is signed and one is
> > > > > unsigned but the precisions are the same.
> > > > >
> > > > > #define N 480
> > > > > #define SIGNEDNESS_1 unsigned
> > > > > #define SIGNEDNESS_2 signed
> > > > > #define SIGNEDNESS_3 signed
> > > > > #define SIGNEDNESS_4 unsigned
> > > > >
> > > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > > SIGNEDNESS_3 char *restrict a,
> > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > {
> > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > >     {
> > > > >       int av = a[i];
> > > > >       int bv = b[i];
> > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > >       res += mult;
> > > > >     }
> > > > >   return res;
> > > > > }
> > > > >
> > > > > The operations are performed as if the operands were extended to a
> > > > > 32-bit
> > > > value.
> > > > > As such this operation isn't valid if there is an intermediate
> > > > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is unsigned.
> > > > >
> > > > > more over if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > flipped the same optab is used but the operands are flipped in the
> > > > > optab
> > > > expansion.
> > > > >
> > > > > To support this the patch extends the dot-product detection to
> > > > > optionally ignore operands with different signs and stores this
> > > > > information in the optab subtype which is now made a bitfield.
> > > > >
> > > > > The subtype can now additionally controls which optab an EXPR can
> > > > > expand
> > > > to.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > 	* doc/md.texi: Document it.
> > > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> > > > optional
> > > > > 	optab subtype.
> > > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> > > > ignore
> > > > > 	mismatch types.
> > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > >
> > > > > --- inline copy of patch --
> > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > > >
> > > >
> > d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > > f2
> > > > > e66bc80d7d23 100644
> > > > > --- a/gcc/doc/md.texi
> > > > > +++ b/gcc/doc/md.texi
> > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but
> > > > takes
> > > > > an additional mask operand  @item @samp{sdot_prod@var{m}}
> > @cindex
> > > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > > @samp{udot_prod@var{m}}
> > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > > > +@samp{usdot_prod@var{m}}
> > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > -Operand 1 and operand 2 are of the same mode. Their product,
> > > > > which is of a -wider mode, is computed and added to operand 3.
> > > > > Operand 3 is of a mode equal or -wider than the mode of the
> > > > > product. The result is placed in operand 0, which -is of the same mode
> > as operand 3.
> > > > > +Operand 1 and operand 2 are of the same mode but may differ in
> > signs.
> > > > > +Their product, which is of a wider mode, is computed and added to
> > > > operand 3.
> > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > +The result is placed in operand 0, which is of the same mode as
> > operand 3.
> > > >
> > > > This doesn't really say what the 's', 'u' and 'us' specify.  Since
> > > > we're doing a widen multiplication and then a non-widening addition
> > > > we only need to know the effective sign of the multiplication so I think
> > the existing 's' and 'u'
> > > > are enough to cover all cases?
> > >
> > > The existing 's' and 'u' enforce that both operands of the
> > > > multiplication are of the same sign.  So for e.g. 'u' both operands must be
> > unsigned.
> > >
> > > In the `us` case one can be signed and one unsigned. Operationally
> > > this does a sign extension to the wider type for the signed value, and
> > > the unsigned value gets zero extended first, and then converts it to
> > > unsigned to perform the unsigned multiplication, conforming to the C
> > promotion rules.
> > >
> > > TL;DR; Without a new optab I can't tell during expansion which
> > > semantic the operation had at the gimple/C level as modes don't carry signs.
> > >
> > > Long version:
> > >
> > > The problem with using the existing patterns, because of their
> > > enforcement of `av` and `bv` being the same sign is that we can't
> > > remove the explicit sign extensions, but the multiplication must be done on
> > the sign/zero extended char input in the same sign.
> > >
> > > > Which means (unless I am mistaken) to get the correct result, you
> > > > can use neither `udot` nor `sdot` as semantically these would zero
> > > > or sign extend both operands from char to int to perform the
> > > > multiplication in the same sign.  Whereas in this case, one parameter
> > > > is zero extended and one parameter is sign extended and the result is
> > > > always an unsigned number.
> > >
> > > So basically
> > >
> > > udot<unsigned c, unsigned a, unsigned b> ==
> > >    c = zero-ext (a) * zero-ext (b)
> > > sdot<signed c, signed a, signed b> ==
> > >    c = sign-ext (a) * sign-ext (b)
> > > usdot<unsigned c, unsigned a, signed b> ==
> > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > >
> > > So semantically the existing optabs won't fit here. udot would
> > > internally promote to unsigned types before the multiplication so the
> > > result of the multiplication would be wrong.  sdot would promote both to
> > signed and do signed multiplication, so the result is also wrong.
> > >
> > > Now if I relax the constraint on the signs of udot and sdot there are two
> > problems:
> > > RTL Modes don't contain signs.  So a target can't tell me how the operands
> > will be promoted.
> > > So:
> > >
> > > 1) I can't really check which semantics the target will adhere to on
> > expansion.
> > > 2) at expand time I have no way to differentiate between the two
> > instructions variants, given just modes
> > >      I can't tell whether I expand to the normal dot-product or the new
> > instruction.
> > 
> > Ah, OK.  Indeed with such a weird instruction the new variant makes sense.
> > Still can you please amend the optab documentation to say which operand is
> > unsigned and which is signed?  Just 'may differ in signs'
> > is bad.
> 
> Sure, will expand on it.
> 
> > 
> > Since the multiplication is commutative I wonder why you need to handle
> > both signed_to_unsigned and unsigned_to_signed - we should just enforce
> > a canonical order (like the optab does). 
> 
> Sure, I thought it would have been better to change the order at expand time,
> but can do so at detection time.
> 
> > I also think it's a particular bad fit for
> > the bad optab_for_tree_code API - would any of that improve when using a
> > direct internal function here? 
> 
> Somewhat, but this has considerable knock on effects, e.g. currently DOT_PROD is
> treated as a widening operation and so is handled by supportable_widening_operation
> which does not support calls. There's a significant number of places which work on the
> tree EXPR (including constant folding) which all need to be changed.
> 
> > In particular all the changes around
> > optab_subtype look like they make a bad API worse ... at least a single
> > optab_vector_mixed_sign should suffice here, no need to make it a flags
> > kind.
> 
> The reason I did so is because depending on where the query is done it does use
> different subtypes currently.  During detection it uses optab_default, and during
> vectorization optab_vector.  For this instruction this difference doesn't seem to be
> used, but did not want to lose this information in case something depended on it.
> 
> But can make it just one.
> 
> > 
> > +  /* If we have a sign changing dot product we need to check that the
> > +     promoted type if unsigned has at least the same precision as the
> > final
> > +     type of the dot-product.  */
> > +  if (subtype != optab_default)
> > +    {
> > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > +       return NULL;
> > +    }
> > 
> > I don't understand this - how do we ever arrive at a result with less precision?
> 
> The user could have manually truncated the results, i.e. in the detection code notice `mult`
> 
>       int av = a[i];
>       int bv = b[i];
>       SIGNEDNESS_2 short mult = av * bv;
>       res += mult;
> 
> which is a short, so it's manually truncating the multiplication which 
> is done as int by the instruction. If `mult` is unsigned then it will 
> truncate the result if the signed input to usdot was negative, unless 
> the intermediate calculation is of the same precision as the
> instruction. i.e. if mult is unsigned int then there's no truncation 
> going on, it's casting from int to unsigned int so it's safe to use then 
> as the instruction does the same thing internally.

It looks to me that we simply should only ever allow sign changes
from multiplication result to the sum.  At least your example
above is not special to mixed sign multiplications, no?
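
To illustrate, here is a rough scalar sketch with both inputs signed (so no
mixed-sign multiplication at all).  The function names are made up for this
illustration and are not part of the patch; fixed-width types are used only
to avoid depending on the target's int size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: signed * signed, with the product narrowed
   to an unsigned 16-bit intermediate before it is accumulated.  */
uint32_t
dot_narrowed_unsigned (uint32_t res, const int8_t *a, const int8_t *b,
		       size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      int32_t av = a[i];
      int32_t bv = b[i];
      uint16_t mult = (uint16_t) (av * bv);	/* Wraps modulo 2^16.  */
      res += mult;				/* Zero-extended to 32 bits.  */
    }
  return res;
}

/* Hypothetical scalar model of a plain signed dot product: the 32-bit
   product is accumulated without any narrowing.  */
uint32_t
dot_full_width (uint32_t res, const int8_t *a, const int8_t *b, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    res += (uint32_t) ((int32_t) a[i] * (int32_t) b[i]);
  return res;
}
```

With a single element pair a = -2, b = 3 the first function returns 65530
and the second 4294967290, so the narrowing already changes the result when
both operands have the same sign.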

> > And why's this not an issue for signed multiplication?
> 
> It is, but in that case it's handled by the type jousting, which doesn't 
> allow the type mismatch. i.e.
> 
> #define SIGNEDNESS_1 unsigned
> #define SIGNEDNESS_2 unsigned
> #define SIGNEDNESS_3 signed
> #define SIGNEDNESS_4 signed
> 
> SIGNEDNESS_1 int __attribute__ ((noipa))
> f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
>    SIGNEDNESS_4 char *restrict b)
> {
>   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
>     {
>       int av = a[i];
>       int bv = b[i];
>       SIGNEDNESS_2 short mult = av * bv;
>       res += mult;
>     }
>   return res;
> }
> 
> Is also not detected as a dot product.  By adding the carve out to the 
> widen multiplication detection it now allows this case through so I 
> handle it in the detection code.  Thinking about it now, it seems more 
> logical to add this case handling inside the type jousting code as I 
> don't think it's ever something you'd want.

Yeah, I think we only need to look through sign changes on the
multiplication result.
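
As a minimal sketch of why a pure sign change at the same precision is
harmless (the helper names are hypothetical, and the values are assumed
small enough that the signed variant does not overflow):

```c
#include <stdint.h>

/* Accumulate the signed 32-bit product into a signed sum.  */
int32_t
acc_signed (int32_t sum, int8_t a, int8_t b)
{
  int32_t prod = (int32_t) a * (int32_t) b;
  return sum + prod;
}

/* Accumulate the same product after a sign change to unsigned at the
   same precision; only the interpretation of the bits differs.  */
uint32_t
acc_unsigned (uint32_t sum, int8_t a, int8_t b)
{
  int32_t prod = (int32_t) a * (int32_t) b;
  return sum + (uint32_t) prod;
}
```

Both variants produce the same bit pattern, e.g. starting from 10 with
a = -2 and b = 3 each yields 4.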

> > Also...
> > 
> > +  /* If we have a sign changing dot-product the dot-product itself does
> > any
> > +     sign conversions, so consume the type and use the unpromoted types.
> > */
> > +  tree mult_arg1, mult_arg2;
> > +  if (subtype == optab_default)
> > +    {
> > +      mult_arg1 = mult_oprnd[0];
> > +      mult_arg2 = mult_oprnd[1];
> > +    }
> > +  else
> > +    {
> > +      mult_arg1 = unprom0[0].op;
> > +      mult_arg2 = unprom0[1].op;
> > +    }
> >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > -                                     mult_oprnd[0], mult_oprnd[1],
> > oprnd1);
> > +                                     mult_arg1, mult_arg2, oprnd1);
> > 
> > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > and unprom0 are just misnamed here?
> 
> Somewhat, in a normal dot-product the sign of the multiplication are the 
> same here as the "unpromoted" types. So after vect_convert_input these 
> two types are the same.
> 
> However because here the sign changes and to maintain the semantics of 
> the C code there's an extra conversion here to get the arguments in the 
> same sign.  That needs to be stripped before given to the instruction 
> which does the conversion internally.

Yes, but then why's that not done by the detection code?  That is,
does it (mis-)handle the (int)short_a * (int)(unsigned short)short_b
where we'd want the mixed-sign handling and not strip the
unsigned short conversion from short_b?
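
For concreteness, a small scalar sketch of that case (function names are
invented for illustration; int16_t/uint16_t stand in for short and
unsigned short):

```c
#include <stdint.h>

/* The source as written: short_b is first converted to unsigned short,
   so it is zero-extended before the multiplication.  */
int32_t
mul_keep_conversion (int16_t short_a, int16_t short_b)
{
  return (int32_t) short_a * (int32_t) (uint16_t) short_b;
}

/* What results if the unsigned short conversion is stripped: short_b is
   sign-extended instead.  */
int32_t
mul_strip_conversion (int16_t short_a, int16_t short_b)
{
  return (int32_t) short_a * (int32_t) short_b;
}
```

For short_a = 2 and short_b = -1 the first gives 131070 and the second -2,
so the two only agree when short_b is non-negative.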

Richard.

> 
> Regards,
> Tamar
> 
> > 
> > Richard.
> > 
> > > Regards,
> > > Tamar
> > >
> > > >
> > > > The tree.def docs say the sum is also possibly widening but I don't
> > > > see this covered by the optab so we should eventually remove this
> > > > feature from the tree side.  In fact the tree-cfg.c verifier
> > > > requires the addition to be not widening - thus only tree.def needs
> > adjustment.
> > > >
> > > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > > b/gcc/optabs-tree.h index
> > > > >
> > > >
> > c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > > 19
> > > > > 90e0548ba08d 100644
> > > > > --- a/gcc/optabs-tree.h
> > > > > +++ b/gcc/optabs-tree.h
> > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not
> > see
> > > > >     shift amount vs. machines that take a vector for the shift amount.
> > > > > */  enum optab_subtype  {
> > > > > -  optab_default,
> > > > > -  optab_scalar,
> > > > > -  optab_vector
> > > > > +  optab_default = 1 << 0,
> > > > > +  optab_scalar = 1 << 1,
> > > > > +  optab_vector = 1 << 2,
> > > > > +  optab_signed_to_unsigned = 1 << 3,  optab_unsigned_to_signed =
> > > > > + 1 << 4
> > > > >  };
> > > > >
> > > > > +/* Override the OrEqual-operator so we can use optab_subtype as a
> > > > > +bit flag.  */ inline enum optab_subtype& operator |= (enum
> > > > optab_subtype&
> > > > > +a, enum optab_subtype b) {
> > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > +					  | static_cast<int>(b));
> > > > > +}
> > > > > +
> > > > > +/* Override the Or-operator so we can use optab_subtype as a bit
> > > > > +flag.  */ inline enum optab_subtype operator | (enum
> > > > > +optab_subtype a, enum optab_subtype b) {
> > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > +				      | static_cast<int>(b)); }
> > > > > +
> > > > >  /* Return the optab used for computing the given operation on the
> > > > > type
> > > > given by
> > > > >     the second argument.  The third argument distinguishes between
> > > > > the
> > > > types of
> > > > >     vector shifts and rotates.  */ diff --git a/gcc/optabs-tree.c
> > > > > b/gcc/optabs-tree.c index
> > > > >
> > > >
> > 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > 1e
> > > > > 5c22b7453072 100644
> > > > > --- a/gcc/optabs-tree.c
> > > > > +++ b/gcc/optabs-tree.c
> > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code,
> > > > const_tree type,
> > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > ssum_widen_optab;
> > > > >
> > > > >      case DOT_PROD_EXPR:
> > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > sdot_prod_optab;
> > > > > +      {
> > > > > +	gcc_assert (subtype & optab_default
> > > > > +		    || subtype & optab_vector
> > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > +
> > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > optab_signed_to_unsigned))
> > > > > +	  return usdot_prod_optab;
> > > > > +
> > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > sdot_prod_optab);
> > > > > +      }
> > > > >
> > > > >      case SAD_EXPR:
> > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; diff
> > > > > --git a/gcc/optabs.c b/gcc/optabs.c index
> > > > >
> > > >
> > f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > 67
> > > > > 8597c0d00098 100644
> > > > > --- a/gcc/optabs.c
> > > > > +++ b/gcc/optabs.c
> > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx
> > > > > op0,
> > > > rtx op1, rtx wide_op,
> > > > >    bool sbool = false;
> > > > >
> > > > >    oprnd0 = ops->op0;
> > > > > +  if (nops >= 2)
> > > > > +    oprnd1 = ops->op1;
> > > > > +  if (nops >= 3)
> > > > > +    oprnd2 = ops->op2;
> > > > > +
> > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -285,6
> > > > +290,27
> > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx
> > > > wide_op,
> > > > >  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
> > > > >        sbool = true;
> > > > >      }
> > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > +    {
> > > > > +      enum optab_subtype subtype = optab_default;
> > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > +      if (sign1 == sign2)
> > > > > +	;
> > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > +	{
> > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > +	  /* Same as optab_unsigned_to_signed but flip the operands.  */
> > > > > +	  std::swap (op0, op1);
> > > > > +	}
> > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > +      else
> > > > > +	gcc_unreachable ();
> > > > > +
> > > > > +      widen_pattern_optab
> > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> > > > > +    }
> > > > >    else
> > > > >      widen_pattern_optab
> > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > > optab_default); @@ -298,10 +324,7 @@ expand_widen_pattern_expr
> > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > >
> > > > >    if (nops >= 2)
> > > > > -    {
> > > > > -      oprnd1 = ops->op1;
> > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > -    }
> > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > >    else if (sbool)
> > > > >      {
> > > > >        nops = 2;
> > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx
> > > > > op0,
> > > > rtx op1, rtx wide_op,
> > > > >      {
> > > > >        gcc_assert (tmode1 == tmode0);
> > > > >        gcc_assert (op1);
> > > > > -      oprnd2 = ops->op2;
> > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > >      }
> > > > >
> > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > >
> > > >
> > b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > b7c
> > > > > 18615baae928 100644
> > > > > --- a/gcc/optabs.def
> > > > > +++ b/gcc/optabs.def
> > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > > OPTAB_D
> > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D (ssum_widen_optab,
> > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > > > (usad_optab,
> > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > >
> > > >
> > 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > 00
> > > > > 808fd2678b42 100644
> > > > > --- a/gcc/tree-cfg.c
> > > > > +++ b/gcc/tree-cfg.c
> > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
> > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
> > > >
> > > > That's not restrictive enough.  I suggest you use
> > > >
> > > >             && element_precision (rhs1_type) != element_precision
> > > > (rhs2_type)
> > > >
> > > > instead.
> > > >
> > > > As said, I'm not sure all the changes in this patch are required.
> > > >
> > > > Please elaborate.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
> > > > >  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> > > > diff --git
> > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > >
> > > >
> > 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > 9f
> > > > > ec29ec6e4176 100644
> > > > > --- a/gcc/tree-vect-loop.c
> > > > > +++ b/gcc/tree-vect-loop.c
> > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code
> > code,
> > > > tree vop[3], tree mask,
> > > > >      }
> > > > >  }
> > > > >
> > > > > +/* Determine the optab_subtype to use for the given CODE and STMT.
> > > > For
> > > > > +   most CODE this will be optab_vector, however for certain
> > > > > + operations
> > > > such as
> > > > > +   DOT_PROD_EXPR where the operation can different signs for the
> > > > operands we
> > > > > +   need to be able to pick the right optabs.  */
> > > > > +
> > > > > +static enum optab_subtype
> > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info
> > > > > +stmt_vinfo) {
> > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > +  switch (code)
> > > > > +    {
> > > > > +      case DOT_PROD_EXPR:
> > > > > +	{
> > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1
> > > > (stmt)));
> > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2
> > > > (stmt)));
> > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > +	  break;
> > > > > +	}
> > > > > +      default:
> > > > > +	break;
> > > > > +    }
> > > > > +
> > > > > +  return subtype;
> > > > > +}
> > > > > +
> > > > >  /* Function vectorizable_reduction.
> > > > >
> > > > >     Check if STMT_INFO performs a reduction operation that can be
> > > > vectorized.
> > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > > loop_vinfo,
> > > > >        bool ok = true;
> > > > >
> > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > optab_vector);
> > > > > +      enum optab_subtype subtype = vect_determine_dot_kind (code,
> > > > stmt_info);
> > > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > + subtype);
> > > > >        if (!optab)
> > > > >  	{
> > > > >  	  if (dump_enabled_p ())
> > > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > > index
> > > > >
> > > >
> > 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > a84
> > > > > 942316846d5e 100644
> > > > > --- a/gcc/tree-vect-patterns.c
> > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo,
> > > > > tree
> > > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > > *vinfo, tree otype, tree_code code,
> > > > >  				 tree itype, tree *vecotype_out,
> > > > > -				 tree *vecitype_out = NULL)
> > > > > +				 tree *vecitype_out = NULL,
> > > > > +				 enum optab_subtype subtype =
> > > > optab_default)
> > > > >  {
> > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > >    if (!vecitype)
> > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info
> > > > > *vinfo,
> > > > tree otype, tree_code code,
> > > > >    if (!vecotype)
> > > > >      return false;
> > > > >
> > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > optab_default);
> > > > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > > >    if (!optab)
> > > > >      return false;
> > > > >
> > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type, bool
> > > > > shift_p, tree op,  }
> > > > >
> > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > *COMMON_TYPE
> > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE if
> > so.
> > > > */
> > > > > +   is narrower than type, storing the supertype in *COMMON_TYPE if
> > so.
> > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > *COMMON_TYPE
> > > > and NEW_TYPE
> > > > > +   may be of different signs but equal precision.   */
> > > > >
> > > > >  static bool
> > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > *common_type)
> > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > *common_type,
> > > > > +			 bool allow_short_sign_mismatch = false)
> > > > >  {
> > > > >    if (types_compatible_p (*common_type, new_type))
> > > > >      return true;
> > > > >
> > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > +  if (allow_short_sign_mismatch
> > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > +    {
> > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > +      tree eq_type
> > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > > > +					  sign);
> > > > > +
> > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > +	return true;
> > > > > +    }
> > > > > +
> > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > (*common_type)))
> > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > > > new_type, tree *common_type)
> > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > >     (b) can hold all leaf operand values.
> > > > >
> > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the
> > > > operands
> > > > > +   may differ in signs but not in precision.
> > > > > +
> > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > > > >     exists.  */
> > > > >
> > > > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > >  		      tree_code widened_code, bool shift_p,
> > > > >  		      unsigned int max_nops,
> > > > > -		      vect_unpromoted_value *unprom, tree *common_type)
> > > > > +		      vect_unpromoted_value *unprom, tree *common_type,
> > > > > +		      bool allow_short_sign_mismatch = false)
> > > > >  {
> > > > >    /* Check for an integer operation with the right code.  */
> > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); @@
> > > > > -600,7
> > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info
> > > > stmt_info, tree_code code,
> > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
> > > > >  					   widened_code, shift_p, max_nops,
> > > > > -					   this_unprom, common_type);
> > > > > +					   this_unprom, common_type,
> > > > > +					   allow_short_sign_mismatch);
> > > > >  	      if (nops == 0)
> > > > >  		return 0;
> > > > >
> > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > stmt_vec_info stmt_info, tree_code code,
> > > > >  	      if (i == 0)
> > > > >  		*common_type = this_unprom->type;
> > > > >  	      else if (!vect_joust_widened_type (type, this_unprom->type,
> > > > > -						 common_type))
> > > > > +						 common_type,
> > > > > +						 allow_short_sign_mismatch))
> > > > >  		return 0;
> > > > >  	    }
> > > > >  	}
> > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info
> > > > > *vinfo,
> > > > >
> > > > >     Try to find the following pattern:
> > > > >
> > > > > -     type x_t, y_t;
> > > > > +     type1a x_t
> > > > > +     type1b y_t;
> > > > >       TYPE1 prod;
> > > > >       TYPE2 sum = init;
> > > > >     loop:
> > > > >       sum_0 = phi <init, sum_1>
> > > > >       S1  x_t = ...
> > > > >       S2  y_t = ...
> > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > +     S4  y_T = (TYPE4) y_t;
> > > > >       S5  prod = x_T * y_T;
> > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > >       S7  sum_1 = prod + sum_0;
> > > > >
> > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is
> > the
> > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > > +   bigger and must be the same sign. This is a special case of a
> > > > > + reduction
> > > > >     computation.
> > > > >
> > > > >     Input:
> > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info
> > > > > *vinfo,
> > > > >
> > > > >    /* Look for the following pattern
> > > > >            DX = (TYPE1) X;
> > > > > -          DY = (TYPE1) Y;
> > > > > +	  DY = (TYPE2) Y;
> > > > >            DPROD = DX * DY;
> > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > >            sum_1 = DDPROD + sum_0;
> > > > >       In which
> > > > >       - DX is double the size of X
> > > > >       - DY is double the size of Y
> > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > -       between DX, DY and DPROD can differ.
> > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > +       is one of the signs of DX or DY.
> > > > >       - sum is the same size of DPROD or bigger
> > > > >       - sum has been recognized as a reduction variable.
> > > > >
> > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info
> > *vinfo,
> > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > >    vect_unpromoted_value unprom0[2];
> > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > > WIDEN_MULT_EXPR,
> > > > > -			     false, 2, unprom0, &half_type))
> > > > > +			     false, 2, unprom0, &half_type, true))
> > > > >      return NULL;
> > > > >
> > > > > +  /* Check to see if there is a sign change happening in the
> > > > > + operands of
> > > > the
> > > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > > +  enum optab_subtype subtype;
> > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > +     subtype = optab_default;
> > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > +     subtype = optab_signed_to_unsigned;
> > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > +     subtype = optab_unsigned_to_signed;
> > > > > +  else
> > > > > +    gcc_unreachable ();
> > > > > +
> > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > +     promoted type if unsigned has at least the same precision as the
> > final
> > > > > +     type of the dot-product.  */
> > > > > +  if (subtype != optab_default)
> > > > > +    {
> > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > +	return NULL;
> > > > > +    }
> > > > > +
> > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > last_stmt);
> > > > >
> > > > >    tree half_vectype;
> > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > DOT_PROD_EXPR,
> > > > half_type,
> > > > > -					type_out, &half_vectype))
> > > > > +					type_out, &half_vectype, subtype))
> > > > >      return NULL;
> > > > >
> > > > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8
> > > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > >  		       unprom0, half_vectype);
> > > > >
> > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > +
> > > > > +  /* If we have a sign changing dot-product the dot-product itself does
> > any
> > > > > +     sign conversions, so consume the type and use the unpromoted
> > > > > + types.  */  tree mult_arg1, mult_arg2;  if (subtype ==
> > > > > + optab_default)
> > > > > +    {
> > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > +    }
> > > > > +  else
> > > > > +    {
> > > > > +      mult_arg1 = unprom0[0].op;
> > > > > +      mult_arg2 = unprom0[1].op;
> > > > > +    }
> > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > >
> > > > >    return pattern_stmt;
> > > > >  }
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
Tamar Christina May 25, 2021, 2:57 p.m. UTC | #6
Hi Richi,

Here's a respun version of the patch.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vect_determine_dot_kind): New.
	(vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
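
For readers following the thread, the difference between the three dot-product
optabs can be sketched as scalar reference loops in C.  This is only an
illustrative model, not GCC code: the names ref_udot, ref_sdot and ref_usdot
are hypothetical, and the sketch assumes 8-bit inputs, a 32-bit accumulator,
and that in the mixed-sign case the first multiplicand is unsigned and the
second is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar models of the three optabs; not GCC APIs.  */

/* udot: both inputs are zero-extended before the multiplication.  */
uint32_t
ref_udot (uint32_t acc, const uint8_t *a, const uint8_t *b, size_t n)
{
  assert (a && b);
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) a[i] * (uint32_t) b[i];
  return acc;
}

/* sdot: both inputs are sign-extended before the multiplication.
   Assumes the running sum stays within the int32_t range.  */
int32_t
ref_sdot (int32_t acc, const int8_t *a, const int8_t *b, size_t n)
{
  assert (a && b);
  for (size_t i = 0; i < n; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* usdot (assumed operand order): A is zero-extended, B is sign-extended,
   the product is formed as a signed 32-bit value and then added to the
   accumulator with wrap-around.  */
uint32_t
ref_usdot (uint32_t acc, const uint8_t *a, const int8_t *b, size_t n)
{
  assert (a && b);
  for (size_t i = 0; i < n; ++i)
    acc += (uint32_t) ((int32_t) a[i] * (int32_t) b[i]);
  return acc;
}
```

Feeding the same bit patterns to ref_udot or ref_sdot gives a different sum as
soon as one input has its top bit set, which is why the mode alone cannot
select the right instruction.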


> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, May 10, 2021 2:29 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Mon, 10 May 2021, Tamar Christina wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Monday, May 10, 2021 12:40 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > where the sign for the multiplicant changes.
> > >
> > > On Fri, 7 May 2021, Tamar Christina wrote:
> > >
> > > > Hi Richi,
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > dot-product where the sign for the multiplicant changes.
> > > > >
> > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > This patch adds support for a dot product where the sign of
> > > > > > the multiplication arguments differ. i.e. one is signed and
> > > > > > one is unsigned but the precisions are the same.
> > > > > >
> > > > > > #define N 480
> > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > #define SIGNEDNESS_2 signed
> > > > > > #define SIGNEDNESS_3 signed
> > > > > > #define SIGNEDNESS_4 unsigned
> > > > > >
> > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > > > {
> > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > >     {
> > > > > >       int av = a[i];
> > > > > >       int bv = b[i];
> > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > >       res += mult;
> > > > > >     }
> > > > > >   return res;
> > > > > > }
> > > > > >
> > > > > > > The operations are performed as if the operands were extended
> > > > > > > to a 32-bit value.
> > > > > > > As such this operation isn't valid if there is an intermediate
> > > > > > > conversion to an unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
> > > > > > >
> > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > > flipped the same optab is used but the operands are flipped in
> > > > > > > the optab expansion.
> > > > > >
> > > > > > To support this the patch extends the dot-product detection to
> > > > > > optionally ignore operands with different signs and stores
> > > > > > this information in the optab subtype which is now made a bitfield.
> > > > > >
> > > > > > > The subtype can now additionally control which optab an EXPR
> > > > > > > can expand to.
> > > > > > >
> > > > > > > Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> > > > > >
> > > > > > Ok for master?
> > > > > >
> > > > > > Thanks,
> > > > > > Tamar
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > 	* doc/md.texi: Document it.
> > > > > > 	* optabs-tree.c (optab_for_tree_code): Support
> usdot_prod_optab.
> > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p):
> > > > > > Take
> > > > > optional
> > > > > > 	optab subtype.
> > > > > > 	(vect_joust_widened_type, vect_widened_op_tree):
> Optionally
> > > > > ignore
> > > > > > 	mismatch types.
> > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > >
> > > > > > --- inline copy of patch --
> > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > > > >
> > > > >
> > >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > > > f2
> > > > > > e66bc80d7d23 100644
> > > > > > --- a/gcc/doc/md.texi
> > > > > > +++ b/gcc/doc/md.texi
> > > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}},
> > > > > > but
> > > > > takes
> > > > > > an additional mask operand  @item @samp{sdot_prod@var{m}}
> > > @cindex
> > > > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > > > @samp{udot_prod@var{m}}
> > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern @itemx
> > > > > > +@samp{usdot_prod@var{m}}
> > > > > >  Compute the sum of the products of two signed/unsigned
> elements.
> > > > > > -Operand 1 and operand 2 are of the same mode. Their product,
> > > > > > which is of a -wider mode, is computed and added to operand 3.
> > > > > > Operand 3 is of a mode equal or -wider than the mode of the
> > > > > > product. The result is placed in operand 0, which -is of the
> > > > > > same mode
> > > as operand 3.
> > > > > > +Operand 1 and operand 2 are of the same mode but may differ
> > > > > > +in
> > > signs.
> > > > > > +Their product, which is of a wider mode, is computed and
> > > > > > +added to
> > > > > operand 3.
> > > > > > +Operand 3 is of a mode equal or wider than the mode of the
> product.
> > > > > > +The result is placed in operand 0, which is of the same mode
> > > > > > +as
> > > operand 3.
> > > > >
> > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > Since we're doing a widening multiplication and then a non-widening
> > > > > > addition we only need to know the effective sign of the
> > > > > > multiplication, so I think the existing 's' and 'u'
> > > > > > are enough to cover all cases?
> > > >
> > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > multiplication are of the same sign.  So for e.g. 'u' both operands
> > > > > must be unsigned.
> > > > >
> > > > > In the `us` case one can be signed and one unsigned. Operationally
> > > > > this does a sign extension to the wider type for the signed value,
> > > > > and the unsigned value gets zero extended first, and then converts
> > > > > it to unsigned to perform the unsigned multiplication, conforming
> > > > > to the C promotion rules.
> > > > >
> > > > > TL;DR: Without a new optab I can't tell during expansion which
> > > > > semantic the operation had at the gimple/C level as modes don't carry
> > > > > signs.
> > > >
> > > > Long version:
> > > >
> > > > > The problem with using the existing patterns, because of their
> > > > > enforcement of `av` and `bv` being the same sign, is that we can't
> > > > > remove the explicit sign extensions, but the multiplication must
> > > > > be done on the sign/zero extended char input in the same sign.
> > > > >
> > > > > Which means (unless I am mistaken) to get the correct result, you
> > > > > can use neither `udot` nor `sdot` as semantically these would
> > > > > zero or sign extend both operands from char to int to perform the
> > > > > multiplication in the same sign.  Whereas in this case, one
> > > > > parameter is zero extended and one parameter is sign extended and
> > > > > the result is always an unsigned number.
> > > >
> > > > So basically
> > > >
> > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > sdot<signed c, signed a, signed b> ==
> > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > >
> > > > > So semantically the existing optabs won't fit here. udot would
> > > > > internally promote to unsigned types before the multiplication so
> > > > > the result of the multiplication would be wrong.  sdot would
> > > > > promote both to signed and do signed multiplication, so the result
> > > > > is also wrong.
> > > >
> > > > > Now if I relax the constraint on the signs of udot and sdot there
> > > > > are two problems.  RTL modes don't contain signs, so a target can't
> > > > > tell me how the operands will be promoted.
> > > > > So:
> > > > >
> > > > > 1) I can't really check which semantics the target will adhere to
> > > > >    on expansion.
> > > > > 2) At expand time I have no way to differentiate between the two
> > > > >    instruction variants; given just modes I can't tell whether I
> > > > >    expand to the normal dot-product or the new instruction.
> > >
> > > > Ah, OK.  Indeed with such a weird instruction the new variant makes
> > > > sense.
> > > > Still can you please amend the optab documentation to say which
> > > > operand is unsigned and which is signed?  Just 'may differ in signs'
> > > > is bad.
> >
> > Sure, will expand on it.
> >
> > >
> > > Since the multiplication is commutative I wonder why you need to
> > > handle both signed_to_unsigned and unsigned_to_signed - we should
> > > just enforce a canonical order (like the optab does).
> >
> > Sure, I thought it would have been better to change the order at
> > expand time, but can do so at detection time.
> >
> > > > I also think it's a particularly bad fit for the bad
> > > > optab_for_tree_code API - would any of that improve when using a
> > > > direct internal function here?
> >
> > > Somewhat, but this has considerable knock-on effects, e.g. currently
> > > DOT_PROD is treated as a widening operation and so is handled by
> > > supportable_widening_operation which does not support calls. There's a
> > > significant number of places which work on the tree EXPR (including
> > > constant folding) which all need to be changed.
> >
> > > In particular all the changes around optab_subtype look like they
> > > make a bad API worse ... at least a single optab_vector_mixed_sign
> > > should suffice here, no need to make it a flags kind.
> >
> > > The reason I did so is because depending on where the query is done it
> > > does use different subtypes currently.  During detection it uses
> > > optab_default, and during vectorization optab_vector.  For this
> > > instruction this difference doesn't seem to be used, but I did not want
> > > to lose this information in case something depended on it.
> >
> > But can make it just one.
> >
> > >
> > > +  /* If we have a sign changing dot product we need to check that the
> > > +     promoted type if unsigned has at least the same precision as
> > > + the
> > > final
> > > +     type of the dot-product.  */
> > > +  if (subtype != optab_default)
> > > +    {
> > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > +       return NULL;
> > > +    }
> > >
> > > > I don't understand this - how do we ever arrive at a result with less
> > > > precision?
> >
> > The user could have manually truncated the results, i.e. in the
> > detection code notice `mult`
> >
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >
> > > which is a short, so it's manually truncating the multiplication which
> > > is done as int by the instruction. If `mult` is unsigned then it will
> > > truncate the result if the signed input to usdot was negative, unless
> > > the intermediate calculation is of the same precision as the
> > > instruction, i.e. if mult is unsigned int then there's no truncation
> > > going on, it's casting from int to unsigned int so it's safe to use
> > then as the instruction does the same thing internally.
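
A small scalar sketch of the truncation issue described above, assuming 8-bit
inputs and a 32-bit unsigned accumulator.  The helper names are hypothetical
and each one models a single loop iteration only:

```c
#include <stdint.h>

/* One iteration with a signed short intermediate: the product of an
   8-bit signed and an 8-bit unsigned value always fits in a short, so
   nothing is lost before the accumulation.  */
uint32_t
step_signed_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;
  int bv = b;
  short mult = (short) (av * bv);
  res += mult;
  return res;
}

/* One iteration with an unsigned short intermediate: a negative product
   is reduced modulo 2^16 and is then zero-extended when it is added.  */
uint32_t
step_unsigned_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;
  int bv = b;
  unsigned short mult = (unsigned short) (av * bv);
  res += mult;
  return res;
}

/* What a fused mixed-sign dot product would compute for one element:
   the full 32-bit product is accumulated directly.  */
uint32_t
step_fused (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;
  int bv = b;
  res += (uint32_t) (av * bv);
  return res;
}
```

With a = -1 and b = 200 the signed-short and fused versions agree, while the
unsigned-short version differs by 65536, so under these assumptions only the
former can be replaced by the instruction.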
> 
> > It looks to me that we simply should only ever allow sign changes from
> > multiplication result to the sum.  At least your example above is not special to
> > mixed sign multiplications, no?
> 
> > > And why's this not an issue for signed multiplication?
> >
> > It is, but in that case it's handled by the type jousting, which
> > doesn't allow the type mismatch. i.e.
> >
> > #define SIGNEDNESS_1 unsigned
> > #define SIGNEDNESS_2 unsigned
> > #define SIGNEDNESS_3 signed
> > #define SIGNEDNESS_4 signed
> >
> > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > >    SIGNEDNESS_4 char *restrict b)
> > > {
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> >     {
> >       int av = a[i];
> >       int bv = b[i];
> >       SIGNEDNESS_2 short mult = av * bv;
> >       res += mult;
> >     }
> >   return res;
> > }
> >
> > Is also not detected as a dot product.  By adding the carve out to the
> > widen multiplication detection it now allows this case through so I
> > handle it in the detection code.  Thinking about it now, it seems more
> > logical to add this case handling inside the type jousting code as I
> > don't think it's ever something you'd want.
> 
> Yeah, I think we only need to look through sign changes on the multiplication
> result.
> 
> > > Also...
> > >
> > > +  /* If we have a sign changing dot-product the dot-product itself
> > > + does
> > > any
> > > +     sign conversions, so consume the type and use the unpromoted
> types.
> > > */
> > > +  tree mult_arg1, mult_arg2;
> > > +  if (subtype == optab_default)
> > > +    {
> > > +      mult_arg1 = mult_oprnd[0];
> > > +      mult_arg2 = mult_oprnd[1];
> > > +    }
> > > +  else
> > > +    {
> > > +      mult_arg1 = unprom0[0].op;
> > > +      mult_arg2 = unprom0[1].op;
> > > +    }
> > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > -                                     mult_oprnd[0], mult_oprnd[1],
> > > oprnd1);
> > > +                                     mult_arg1, mult_arg2, oprnd1);
> > >
> > > > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > > > and unprom0 are just misnamed here?
> >
> > > Somewhat, in a normal dot-product the signs of the multiplication
> > > operands are the same here as the "unpromoted" types. So after
> > > vect_convert_input these two types are the same.
> > >
> > > However because here the sign changes and to maintain the semantics of
> > > the C code there's an extra conversion here to get the arguments in
> > > the same sign.  That needs to be stripped before being given to the
> > > instruction which does the conversion internally.
> 
> Yes, but then why's that not done by the detection code?  That is, does it
> (mis-)handle the (int)short_a * (int)(unsigned short)short_b where we'd
> want the mixed-sign handling and not strip the unsigned short conversion
> from short_b?
> 
> Richard.
> 
> >
> > Regards,
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > Regards,
> > > > Tamar
> > > >
> > > > >
> > > > > > The tree.def docs say the sum is also possibly widening but I
> > > > > > don't see this covered by the optab so we should eventually
> > > > > > remove this feature from the tree side.  In fact the tree-cfg.c
> > > > > > verifier requires the addition to be not widening - thus only
> > > > > > tree.def needs adjustment.
> > > > >
> > > > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > > > b/gcc/optabs-tree.h index
> > > > > >
> > > > >
> > >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > > > 19
> > > > > > 90e0548ba08d 100644
> > > > > > --- a/gcc/optabs-tree.h
> > > > > > +++ b/gcc/optabs-tree.h
> > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If
> > > > > > not
> > > see
> > > > > >     shift amount vs. machines that take a vector for the shift amount.
> > > > > > */  enum optab_subtype  {
> > > > > > -  optab_default,
> > > > > > -  optab_scalar,
> > > > > > -  optab_vector
> > > > > > +  optab_default = 1 << 0,
> > > > > > +  optab_scalar = 1 << 1,
> > > > > > +  optab_vector = 1 << 2,
> > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > + optab_unsigned_to_signed =
> > > > > > + 1 << 4
> > > > > >  };
> > > > > >
> > > > > > +/* Override the OrEqual-operator so we can use optab_subtype
> > > > > > +as a bit flag.  */ inline enum optab_subtype& operator |=
> > > > > > +(enum
> > > > > optab_subtype&
> > > > > > +a, enum optab_subtype b) {
> > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > +					  | static_cast<int>(b)); }
> > > > > > +
> > > > > > +/* Override the Or-operator so we can use optab_subtype as a
> > > > > > +bit flag.  */ inline enum optab_subtype operator | (enum
> > > > > > +optab_subtype a, enum optab_subtype b) {
> > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > +				      | static_cast<int>(b)); }
> > > > > > +
> > > > > >  /* Return the optab used for computing the given operation on
> > > > > > the type
> > > > > given by
> > > > > >     the second argument.  The third argument distinguishes
> > > > > > between the
> > > > > types of
> > > > > >     vector shifts and rotates.  */ diff --git
> > > > > > a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > > > > >
> > > > >
> > >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > > 1e
> > > > > > 5c22b7453072 100644
> > > > > > --- a/gcc/optabs-tree.c
> > > > > > +++ b/gcc/optabs-tree.c
> > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code
> code,
> > > > > const_tree type,
> > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > > ssum_widen_optab;
> > > > > >
> > > > > >      case DOT_PROD_EXPR:
> > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > sdot_prod_optab;
> > > > > > +      {
> > > > > > +	gcc_assert (subtype & optab_default
> > > > > > +		    || subtype & optab_vector
> > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > +
> > > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > > optab_signed_to_unsigned))
> > > > > > +	  return usdot_prod_optab;
> > > > > > +
> > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > sdot_prod_optab);
> > > > > > +      }
> > > > > >
> > > > > >      case SAD_EXPR:
> > > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c index
> > > > > >
> > > > >
> > >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > > 67
> > > > > > 8597c0d00098 100644
> > > > > > --- a/gcc/optabs.c
> > > > > > +++ b/gcc/optabs.c
> > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops,
> > > > > > rtx op0,
> > > > > rtx op1, rtx wide_op,
> > > > > >    bool sbool = false;
> > > > > >
> > > > > >    oprnd0 = ops->op0;
> > > > > > +  if (nops >= 2)
> > > > > > +    oprnd1 = ops->op1;
> > > > > > +  if (nops >= 3)
> > > > > > +    oprnd2 = ops->op2;
> > > > > > +
> > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -
> 285,6
> > > > > +290,27
> > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1,
> > > > > > rtx
> > > > > wide_op,
> > > > > >  	   ? vec_unpacks_sbool_hi_optab :
> vec_unpacks_sbool_lo_optab);
> > > > > >        sbool = true;
> > > > > >      }
> > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > +    {
> > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > +      if (sign1 == sign2)
> > > > > > +	;
> > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > +	{
> > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > +	  /* Same as optab_unsigned_to_signed but flip the
> operands.  */
> > > > > > +	  std::swap (op0, op1);
> > > > > > +	}
> > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > +      else
> > > > > > +	gcc_unreachable ();
> > > > > > +
> > > > > > +      widen_pattern_optab
> > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> subtype);
> > > > > > +    }
> > > > > >    else
> > > > > >      widen_pattern_optab
> > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > > > optab_default); @@ -298,10 +324,7 @@
> expand_widen_pattern_expr
> > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > >
> > > > > >    if (nops >= 2)
> > > > > > -    {
> > > > > > -      oprnd1 = ops->op1;
> > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > -    }
> > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > >    else if (sbool)
> > > > > >      {
> > > > > >        nops = 2;
> > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops,
> rtx
> > > > > > op0,
> > > > > rtx op1, rtx wide_op,
> > > > > >      {
> > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > >        gcc_assert (op1);
> > > > > > -      oprnd2 = ops->op2;
> > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > >      }
> > > > > >
> > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > > >
> > > > >
> > >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > > b7c
> > > > > > 18615baae928 100644
> > > > > > --- a/gcc/optabs.def
> > > > > > +++ b/gcc/optabs.def
> > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > > > OPTAB_D
> > > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D
> (ssum_widen_optab,
> > > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab,
> "udot_prod$I$a")
> > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > > > > (usad_optab,
> > > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > > >
> > > > >
> > >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > > 00
> > > > > > 808fd2678b42 100644
> > > > > > --- a/gcc/tree-cfg.c
> > > > > > +++ b/gcc/tree-cfg.c
> > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign
> *stmt)
> > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN
> (rhs2_type))
> > > > >
> > > > > That's not restrictive enough.  I suggest you use
> > > > >
> > > > > >             && element_precision (rhs1_type) != element_precision (rhs2_type)
> > > > >
> > > > > instead.
> > > > >
> > > > > As said, I'm not sure all the changes in this patch are required.
> > > > >
> > > > > Please elaborate.
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode
> (rhs3_type)),
> > > > > >  			 2 * GET_MODE_SIZE (element_mode
> (rhs1_type))))
> > > > > diff --git
> > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > > >
> > > > >
> > >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > > 9f
> > > > > > ec29ec6e4176 100644
> > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code
> > > code,
> > > > > tree vop[3], tree mask,
> > > > > >      }
> > > > > >  }
> > > > > >
> > > > > > +/* Determine the optab_subtype to use for the given CODE and
> STMT.
> > > > > For
> > > > > > +   most CODE this will be optab_vector, however for certain
> > > > > > + operations
> > > > > such as
> > > > > > +   DOT_PROD_EXPR where the operation can different signs for
> > > > > > + the
> > > > > operands we
> > > > > > +   need to be able to pick the right optabs.  */
> > > > > > +
> > > > > > +static enum optab_subtype
> > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info
> > > > > > +stmt_vinfo) {
> > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > +  switch (code)
> > > > > > +    {
> > > > > > +      case DOT_PROD_EXPR:
> > > > > > +	{
> > > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT
> (stmt_vinfo));
> > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE
> > > > > > +(gimple_assign_rhs1
> > > > > (stmt)));
> > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE
> > > > > > +(gimple_assign_rhs2
> > > > > (stmt)));
> > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > +	  break;
> > > > > > +	}
> > > > > > +      default:
> > > > > > +	break;
> > > > > > +    }
> > > > > > +
> > > > > > +  return subtype;
> > > > > > +}
> > > > > > +
> > > > > >  /* Function vectorizable_reduction.
> > > > > >
> > > > > >     Check if STMT_INFO performs a reduction operation that can
> > > > > > be
> > > > > vectorized.
> > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > > > loop_vinfo,
> > > > > >        bool ok = true;
> > > > > >
> > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > > optab_vector);
> > > > > > +      enum optab_subtype subtype = vect_determine_dot_kind
> > > > > > + (code,
> > > > > stmt_info);
> > > > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > > + subtype);
> > > > > >        if (!optab)
> > > > > >  	{
> > > > > >  	  if (dump_enabled_p ())
> > > > > > diff --git a/gcc/tree-vect-patterns.c
> > > > > > b/gcc/tree-vect-patterns.c index
> > > > > >
> > > > >
> > >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > > a84
> > > > > > 942316846d5e 100644
> > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info
> > > > > > *vinfo, tree
> > > > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > > > *vinfo, tree otype, tree_code code,
> > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > -				 tree *vecitype_out = NULL)
> > > > > > +				 tree *vecitype_out = NULL,
> > > > > > +				 enum optab_subtype subtype =
> > > > > optab_default)
> > > > > >  {
> > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > >    if (!vecitype)
> > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info
> > > > > > *vinfo,
> > > > > tree otype, tree_code code,
> > > > > >    if (!vecotype)
> > > > > >      return false;
> > > > > >
> > > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > optab_default);
> > > > > > +  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > + subtype);
> > > > > >    if (!optab)
> > > > > >      return false;
> > > > > >
> > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type,
> > > > > > bool shift_p, tree op,  }
> > > > > >
> > > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > > *COMMON_TYPE
> > > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE
> if
> > > so.
> > > > > */
> > > > > > +   is narrower than type, storing the supertype in
> > > > > > + *COMMON_TYPE if
> > > so.
> > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > > *COMMON_TYPE
> > > > > and NEW_TYPE
> > > > > > +   may be of different signs but equal precision.   */
> > > > > >
> > > > > >  static bool
> > > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > > *common_type)
> > > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > > *common_type,
> > > > > > +			 bool allow_short_sign_mismatch = false)
> > > > > >  {
> > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > >      return true;
> > > > > >
> > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > +  if (allow_short_sign_mismatch
> > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > +    {
> > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > +      tree eq_type
> > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION
> (new_type),
> > > > > > +					  sign);
> > > > > > +
> > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > +	return true;
> > > > > > +    }
> > > > > > +
> > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION
> (*common_type))
> > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > > (*common_type)))
> > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > > > > new_type, tree *common_type)
> > > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > > >     (b) can hold all leaf operand values.
> > > > > >
> > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of
> > > > > > + the
> > > > > operands
> > > > > > +   may differ in signs but not in precision.
> > > > > > +
> > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such
> COMMON_TYPE
> > > > > >     exists.  */
> > > > > >
> > > > > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > >  		      unsigned int max_nops,
> > > > > > -		      vect_unpromoted_value *unprom, tree
> *common_type)
> > > > > > +		      vect_unpromoted_value *unprom, tree
> *common_type,
> > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > >  {
> > > > > >    /* Check for an integer operation with the right code.  */
> > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > @@
> > > > > > -600,7
> > > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info
> > > > > stmt_info, tree_code code,
> > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info,
> code,
> > > > > >  					   widened_code, shift_p,
> max_nops,
> > > > > > -					   this_unprom,
> common_type);
> > > > > > +					   this_unprom,
> common_type,
> > > > > > +
> allow_short_sign_mismatch);
> > > > > >  	      if (nops == 0)
> > > > > >  		return 0;
> > > > > >
> > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > > stmt_vec_info stmt_info, tree_code code,
> > > > > >  	      if (i == 0)
> > > > > >  		*common_type = this_unprom->type;
> > > > > >  	      else if (!vect_joust_widened_type (type, this_unprom-
> >type,
> > > > > > -						 common_type))
> > > > > > +						 common_type,
> > > > > > +
> allow_short_sign_mismatch))
> > > > > >  		return 0;
> > > > > >  	    }
> > > > > >  	}
> > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info
> > > > > > *vinfo,
> > > > > >
> > > > > >     Try to find the following pattern:
> > > > > >
> > > > > > -     type x_t, y_t;
> > > > > > +     type1a x_t
> > > > > > +     type1b y_t;
> > > > > >       TYPE1 prod;
> > > > > >       TYPE2 sum = init;
> > > > > >     loop:
> > > > > >       sum_0 = phi <init, sum_1>
> > > > > >       S1  x_t = ...
> > > > > >       S2  y_t = ...
> > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > >       S5  prod = x_T * y_T;
> > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > >       S7  sum_1 = prod + sum_0;
> > > > > >
> > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2'
> is
> > > the
> > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and
> 'type1b',
> > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the
> sign of
> > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1'
> or
> > > > > > +   bigger and must be the same sign. This is a special case
> > > > > > + of a reduction
> > > > > >     computation.
> > > > > >
> > > > > >     Input:
> > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info
> > > > > > *vinfo,
> > > > > >
> > > > > >    /* Look for the following pattern
> > > > > >            DX = (TYPE1) X;
> > > > > > -          DY = (TYPE1) Y;
> > > > > > +	  DY = (TYPE2) Y;
> > > > > >            DPROD = DX * DY;
> > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > >            sum_1 = DDPROD + sum_0;
> > > > > >       In which
> > > > > >       - DX is double the size of X
> > > > > >       - DY is double the size of Y
> > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > -       between DX, DY and DPROD can differ.
> > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > +       is one of the signs of DX or DY.
> > > > > >       - sum is the same size of DPROD or bigger
> > > > > >       - sum has been recognized as a reduction variable.
> > > > > >
> > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info
> > > *vinfo,
> > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > >    vect_unpromoted_value unprom0[2];
> > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > > > WIDEN_MULT_EXPR,
> > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > +			     false, 2, unprom0, &half_type, true))
> > > > > >      return NULL;
> > > > > >
> > > > > > +  /* Check to see if there is a sign change happening in the
> > > > > > + operands of
> > > > > the
> > > > > > +     multiplication and pick the appropriate optab subtype.
> > > > > > +*/
> > > > > > +  enum optab_subtype subtype;
> > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > +     subtype = optab_default;
> > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > +  else
> > > > > > +    gcc_unreachable ();
> > > > > > +
> > > > > > +  /* If we have a sign changing dot product we need to check that
> the
> > > > > > +     promoted type if unsigned has at least the same
> > > > > > + precision as the
> > > final
> > > > > > +     type of the dot-product.  */
> > > > > > +  if (subtype != optab_default)
> > > > > > +    {
> > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > +	return NULL;
> > > > > > +    }
> > > > > > +
> > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > > last_stmt);
> > > > > >
> > > > > >    tree half_vectype;
> > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > > DOT_PROD_EXPR,
> > > > > half_type,
> > > > > > -					type_out, &half_vectype))
> > > > > > +					type_out, &half_vectype,
> subtype))
> > > > > >      return NULL;
> > > > > >
> > > > > >    /* Get the inputs in the appropriate types.  */ @@ -1002,8
> > > > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > >  		       unprom0, half_vectype);
> > > > > >
> > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > +
> > > > > > +  /* If we have a sign changing dot-product the dot-product
> > > > > > + itself does
> > > any
> > > > > > +     sign conversions, so consume the type and use the
> > > > > > + unpromoted types.  */  tree mult_arg1, mult_arg2;  if
> > > > > > + (subtype ==
> > > > > > + optab_default)
> > > > > > +    {
> > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > +    }
> > > > > > +  else
> > > > > > +    {
> > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > +    }
> > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > -				      mult_oprnd[0], mult_oprnd[1],
> oprnd1);
> > > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > > >
> > > > > >    return pattern_stmt;
> > > > > >  }
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF:
> > > > > Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Richard Biener May 26, 2021, 8:56 a.m. UTC | #7
On Tue, 25 May 2021, Tamar Christina wrote:

> Hi Richi,
> 
> Here's a respun version of the patch.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
                  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
                 || (!INTEGRAL_TYPE_P (lhs_type)
                     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-           || !types_compatible_p (rhs1_type, rhs2_type)
+           || (!types_compatible_p (rhs1_type, rhs2_type)
+               && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
+               && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))

I think this doesn't capture the constraints - instead please do

-           || !types_compatible_p (rhs1_type, rhs2_type)
+           /* rhs1_type and rhs2_type may differ in sign.  */
+           || !tree_nop_conversion_p (rhs1_type, rhs2_type)
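
One way to see why the respun condition is too permissive is to model an integer type by nothing more than its precision and signedness. The sketch below is a standalone model, not GCC code: `int_type`, `model_types_compatible`, `rejected_by_patch` and `rejected_by_suggestion` are hypothetical names, vector types are ignored, and the suggested check is modelled as a plain precision comparison, which is an assumption about how `tree_nop_conversion_p` treats scalar integer types.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an integer type: only precision and sign.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Model of types_compatible_p: same precision and same sign.  */
static bool
model_types_compatible (struct int_type a, struct int_type b)
{
  return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

/* The condition from the respun patch.  Returns true when the verifier
   would report an error for this operand pair.  */
static bool
rejected_by_patch (struct int_type a, struct int_type b)
{
  return !model_types_compatible (a, b)
	 && a.is_unsigned == b.is_unsigned
	 && a.precision != b.precision;
}

/* The suggested condition, modelled (as an assumption) as
   "the precisions must match, the sign may differ".  */
static bool
rejected_by_suggestion (struct int_type a, struct int_type b)
{
  return a.precision != b.precision;
}
```

In this model a signed 8-bit operand paired with an unsigned 16-bit operand passes the patch's check, because the same-sign clause is false, while the precision-only check rejects it. Both checks accept a signed 8-bit operand paired with an unsigned 8-bit one.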


+/* Determine the optab_subtype to use for the given CODE and STMT.  For
+   most CODE this will be optab_vector, however for certain operations such as
+   DOT_PROD_EXPR where the operation can different signs for the operands we
+   need to be able to pick the right optabs.  */
+
+static enum optab_subtype
+vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)

vect_determine_optab_subkind would be a better name.  'code' is
redundant (or should at least match the code of stmt_vinfo->stmt).  I wonder
whether it would be clearer to compute the subtype where we compute 'code',
where the relation to stmt_info is obvious.  I mean here:

  /* 3. Check the operands of the operation.  The first operands are defined
        inside the loop body. The last operand is the reduction variable,
        which is defined by the loop-header-phi.  */

  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
  STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
  gassign *stmt = as_a <gassign *> (stmt_info->stmt);
  enum tree_code code = gimple_assign_rhs_code (stmt);
  bool lane_reduc_code_p
    = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);

so just add

  enum optab_subtype optab_query_kind = optab_vector;
  if (code == DOT_PROD_EXPR
      && <sign test>)
    optab_query_kind = optab_vector_mixed_sign;

in this place and avoid adding the new function?
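
The `<sign test>` placeholder above is left open in the review. A minimal sketch of the inline selection follows, assuming the test is the same signedness comparison of the first two operands that the patch's `vect_determine_dot_kind` performs (`rhs1_sign != rhs2_sign`). The `model_` enums and the function are hypothetical stand-ins so that the snippet compiles outside GCC.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC enums named in the thread.  */
enum model_tree_code
{
  MODEL_DOT_PROD_EXPR,
  MODEL_WIDEN_SUM_EXPR,
  MODEL_SAD_EXPR
};

enum model_optab_subtype
{
  model_optab_vector,
  model_optab_vector_mixed_sign
};

/* Pick the optab query kind next to where 'code' is computed.  Only a
   dot product whose two multiplication operands differ in sign asks
   for the mixed-sign variant; everything else keeps optab_vector.  */
static enum model_optab_subtype
model_optab_query_kind (enum model_tree_code code,
			bool rhs1_unsigned, bool rhs2_unsigned)
{
  enum model_optab_subtype kind = model_optab_vector;
  if (code == MODEL_DOT_PROD_EXPR && rhs1_unsigned != rhs2_unsigned)
    kind = model_optab_vector_mixed_sign;
  return kind;
}
```

Because the decision depends only on `code` and the operand signs, it needs no separate helper taking both `code` and the statement, which is the redundancy the review points out.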

I'm not too familiar with the pattern recog code, so a second pair of eyes
would be preferred (Richard?), but

+  /* Check if the mismatch is only in the sign and if we have
+     allow_short_sign_mismatch then allow it.  */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
+    {
+      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
+      tree eq_type
+       = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
+                                         sign);
+
+      if (types_compatible_p (*common_type, eq_type))
+       return true;
+    }

looks somewhat complicated - is that equal to

  if (unprom_type
      && tree_nop_conversion_p (*common_type, new_type))
    return true;

?  That is, *common_type and new_type only differ in sign?

@@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
        if (unprom[j].op == unprom[i].op)
          break;
+      bool only_sign = allow_short_sign_mismatch
+                      && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
+                      && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);

this could use the same tree_nop_conversion_p predicate.

Otherwise the patch looks good.

Thanks,
Richard.



> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* optabs.def (usdot_prod_optab): New.
> 	* doc/md.texi: Document it and clarify other dot prod optabs.
> 	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
> 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> 	* optabs.c (expand_widen_pattern_expr): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> 	(vectorizable_reduction): Query dot-product kind.
> 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> 	optab subtype.
> 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> 	mismatch types.
> 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Monday, May 10, 2021 2:29 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> > 
> > On Mon, 10 May 2021, Tamar Christina wrote:
> > 
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Monday, May 10, 2021 12:40 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > > where the sign for the multiplicant changes.
> > > >
> > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > >
> > > > > Hi Richi,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > dot-product where the sign for the multiplicant changes.
> > > > > >
> > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > > This patch adds support for a dot product where the signs of
> > > > > > > > the multiplication arguments differ, i.e. one is signed and
> > > > > > > > one is unsigned but the precisions are the same.
> > > > > > >
> > > > > > > #define N 480
> > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > >
> > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > > > > {
> > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > >     {
> > > > > > >       int av = a[i];
> > > > > > >       int bv = b[i];
> > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > >       res += mult;
> > > > > > >     }
> > > > > > >   return res;
> > > > > > > }
> > > > > > >
> > > > > > > > The operations are performed as if the operands were extended
> > > > > > > > to a 32-bit value.
> > > > > > > > As such this operation isn't valid if there is an intermediate
> > > > > > > > conversion to an unsigned value, i.e. if SIGNEDNESS_2 is unsigned.
> > > > > > > >
> > > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > > > flipped the same optab is used but the operands are flipped in
> > > > > > > > the optab expansion.
> > > > > > > >
> > > > > > > > To support this the patch extends the dot-product detection to
> > > > > > > > optionally ignore operands with different signs and stores
> > > > > > > > this information in the optab subtype which is now made a bitfield.
> > > > > > > >
> > > > > > > > The subtype can now additionally control which optab an EXPR
> > > > > > > > can expand to.
> > > > > > >
> > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > > > >
> > > > > > > Ok for master?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Tamar
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > > 	* doc/md.texi: Document it.
> > > > > > > 	* optabs-tree.c (optab_for_tree_code): Support
> > usdot_prod_optab.
> > > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p):
> > > > > > > Take
> > > > > > optional
> > > > > > > 	optab subtype.
> > > > > > > 	(vect_joust_widened_type, vect_widened_op_tree):
> > Optionally
> > > > > > ignore
> > > > > > > 	mismatch types.
> > > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > > >
> > > > > > > --- inline copy of patch --
> > > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > > > > index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> > > > > > > > --- a/gcc/doc/md.texi
> > > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> > > > > > > >  @item @samp{sdot_prod@var{m}}
> > > > > > > >  @cindex @code{udot_prod@var{m}} instruction pattern
> > > > > > > >  @itemx @samp{udot_prod@var{m}}
> > > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > > > > > > > +@itemx @samp{usdot_prod@var{m}}
> > > > > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > > > > -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> > > > > > > > -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> > > > > > > > -wider than the mode of the product. The result is placed in operand 0, which
> > > > > > > > -is of the same mode as operand 3.
> > > > > > > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > > > > > > +Their product, which is of a wider mode, is computed and added to operand 3.
> > > > > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > > > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > > > > >
> > > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > > Since we're doing a widen multiplication and then a non-widening
> > > > > > > addition we only need to know the effective sign of the
> > > > > > > multiplication so I think the existing 's' and 'u' are enough to
> > > > > > > cover all cases?
> > > > >
> > > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > > multiplication are of the same sign.  So for e.g. 'u' both operands
> > > > > > must be unsigned.
> > > > > >
> > > > > > In the `us` case one can be signed and one unsigned. Operationally
> > > > > > this sign-extends the signed value to the wider type, zero-extends
> > > > > > the unsigned value, and then converts to unsigned to perform the
> > > > > > unsigned multiplication, conforming to the C promotion rules.
> > > > > >
> > > > > > TL;DR: Without a new optab I can't tell during expansion which
> > > > > > semantic the operation had at the gimple/C level as modes don't
> > > > > > carry signs.
> > > > >
> > > > > Long version:
> > > > >
> > > > > > The problem with using the existing patterns is that, because they
> > > > > > enforce `av` and `bv` having the same sign, we can't remove the
> > > > > > explicit sign extensions, yet the multiplication must be done on
> > > > > > the sign/zero extended char inputs in the same sign.
> > > > >
> > > > > > Which means (unless I am mistaken) to get the correct result, you
> > > > > > can use neither `udot` nor `sdot` as semantically these would
> > > > > > zero or sign extend both operands from char to int to perform the
> > > > > > multiplication in the same sign.  Whereas in this case, one
> > > > > > parameter is zero extended and one parameter is sign extended and
> > > > > > the result is always an unsigned number.
> > > > >
> > > > > So basically
> > > > >
> > > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > > sdot<signed c, signed a, signed b> ==
> > > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > > >    c = zero-ext (a) * ((unsigned-conv) sign-ext (b))
> > > > >
> > > > > > So semantically the existing optabs won't fit here. udot would
> > > > > > internally promote to unsigned types before the multiplication so
> > > > > > the result of the multiplication would be wrong.  sdot would
> > > > > > promote both to signed and do signed multiplication, so the result
> > > > > > is also wrong.
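
As an illustration of the point above, here is a minimal scalar sketch in C of one element product under each of the three readings. The helper names (`zext8`, `sext8`, `udot_elem`, `sdot_elem`, `usdot_elem`) are hypothetical and not part of the patch; inputs are passed as raw 8-bit patterns so the same bits can be interpreted either way, and a 32-bit `int` is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extend an 8-bit pattern to 32 bits.  */
uint32_t zext8(uint8_t bits) { return bits; }

/* Sign extend an 8-bit pattern to 32 bits without relying on
   implementation-defined narrowing conversions.  */
int32_t sext8(uint8_t bits) { return (int32_t)(bits ^ 0x80) - 0x80; }

/* udot-style element product: both inputs are treated as unsigned.  */
uint32_t udot_elem(uint8_t a, uint8_t b) { return zext8(a) * zext8(b); }

/* sdot-style element product: both inputs are treated as signed.  */
uint32_t sdot_elem(uint8_t a, uint8_t b)
{
  return (uint32_t)(sext8(a) * sext8(b));
}

/* Mixed-sign element product: one input is zero extended, the other is
   sign extended and then converted to unsigned, as in the C promotion
   rules for unsigned * signed at equal rank.  */
uint32_t usdot_elem(uint8_t u_bits, uint8_t s_bits)
{
  return zext8(u_bits) * (uint32_t)sext8(s_bits);
}

/* The same bit patterns give three different products.  */
void check_dot_variants(void)
{
  uint8_t u = 200;   /* unsigned input, value 200 */
  uint8_t s = 0xFF;  /* signed input, value -1 */
  assert(udot_elem(u, s) == 51000u);          /* 200 * 255 */
  assert(sdot_elem(u, s) == 56u);             /* -56 * -1 */
  assert(usdot_elem(u, s) == (uint32_t)-200); /* 200 * -1, modulo 2^32 */
}
```

Only the mixed-sign reading matches what the C loop in the patch description computes for such inputs, which is why neither existing optab can stand in for it.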
> > > > >
> > > > > > Now if I relax the constraint on the signs of udot and sdot there
> > > > > > are two problems:
> > > > > > RTL modes don't contain signs.  So a target can't tell me how the
> > > > > > operands will be promoted.
> > > > > > So:
> > > > > >
> > > > > > 1) I can't really check which semantics the target will adhere to
> > > > > >    on expansion.
> > > > > > 2) At expand time I have no way to differentiate between the two
> > > > > >    instruction variants; given just modes I can't tell whether I
> > > > > >    expand to the normal dot-product or the new instruction.
> > > >
> > > > > Ah, OK.  Indeed with such a weird instruction the new variant makes
> > > > > sense.  Still can you please amend the optab documentation to say
> > > > > which operand is unsigned and which is signed?  Just 'may differ in
> > > > > signs' is bad.
> > >
> > > Sure, will expand on it.
> > >
> > > >
> > > > Since the multiplication is commutative I wonder why you need to
> > > > handle both signed_to_unsigned and unsigned_to_signed - we should
> > > > just enforce a canonical order (like the optab does).
> > >
> > > Sure, I thought it would have been better to change the order at
> > > expand time, but can do so at detection time.
> > >
> > > > > I also think it's a particularly bad fit for the bad
> > > > > optab_for_tree_code API - would any of that improve when using a
> > > > > direct internal function here?
> > >
> > > > Somewhat, but this has considerable knock-on effects, e.g. currently
> > > > DOT_PROD is treated as a widening operation and so is handled by
> > > > supportable_widening_operation which does not support calls. There's a
> > > > significant number of places which work on the tree EXPR (including
> > > > constant folding) which all need to be changed.
> > >
> > > > In particular all the changes around optab_subtype look like they
> > > > make a bad API worse ... at least a single optab_vector_mixed_sign
> > > > should suffice here, no need to make it a flags kind.
> > >
> > > > The reason I did so is because depending on where the query is done it
> > > > does use different subtypes currently.  During detection it uses
> > > > optab_default, and during vectorization optab_vector.  For this
> > > > instruction this difference doesn't seem to be used, but I did not want
> > > > to lose this information in case something depended on it.
> > >
> > > But can make it just one.
> > >
> > > >
> > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > +     type of the dot-product.  */
> > > > +  if (subtype != optab_default)
> > > > +    {
> > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > +       return NULL;
> > > > +    }
> > > >
> > > > > I don't understand this - how do we ever arrive at a result with less
> > > > > precision?
> > >
> > > The user could have manually truncated the results, i.e. in the
> > > detection code notice `mult`
> > >
> > >       int av = a[i];
> > >       int bv = b[i];
> > >       SIGNEDNESS_2 short mult = av * bv;
> > >       res += mult;
> > >
> > > > which is a short, so it's manually truncating the multiplication which
> > > > is done as int by the instruction. If `mult` is unsigned then it will
> > > > truncate the result if the signed input to usdot was negative, unless
> > > > the intermediate calculation is of the same precision as the
> > > > instruction, i.e. if mult is unsigned int then there's no truncation
> > > > going on, it's casting from int to unsigned int so it's safe to use
> > > > then as the instruction does the same thing internally.
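
A small C sketch of one loop step may make the truncation argument concrete. It is an illustration only: the function names are hypothetical, fixed-width types stand in for `short`/`int`, and a 32-bit `int` is assumed. It compares accumulating the product through a signed 16-bit intermediate, an unsigned 16-bit intermediate, and directly at full width (what the instruction computes).

```c
#include <assert.h>
#include <stdint.h>

/* res += product, with no narrowing: the reference result.  */
uint32_t step_wide(uint32_t res, int8_t a, uint8_t b)
{
  int32_t av = a, bv = b;
  return res + (uint32_t)(av * bv);
}

/* res += (signed short) product.  A signed 8-bit value times an unsigned
   8-bit value always fits in 16 signed bits, so nothing is lost.  */
uint32_t step_signed_short(uint32_t res, int8_t a, uint8_t b)
{
  int32_t av = a, bv = b;
  int16_t mult = (int16_t)(av * bv);
  return res + (uint32_t)mult;
}

/* res += (unsigned short) product.  A negative product wraps to a large
   16-bit value, so the sum differs from the full-width computation.  */
uint32_t step_unsigned_short(uint32_t res, int8_t a, uint8_t b)
{
  int32_t av = a, bv = b;
  uint16_t mult = (uint16_t)(av * bv);
  return res + mult;
}

void check_truncation(void)
{
  /* -1 * 200 = -200 */
  assert(step_wide(1000, -1, 200) == 800u);
  assert(step_signed_short(1000, -1, 200) == 800u);
  assert(step_unsigned_short(1000, -1, 200) == 66336u); /* 1000 + 65336 */
  /* Non-negative products agree in all three forms.  */
  assert(step_unsigned_short(1000, 3, 200) == step_wide(1000, 3, 200));
}
```

The unsigned 16-bit intermediate is the only form that disagrees with the full-width result, and only when the signed input is negative, which is the case the extra precision check rejects.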
> > 
> > It looks to me that we simply should only ever allow sign changes from
> > multiplication result to the sum.  At least your example above is not special to
> > mixed sign multiplications, no?
> > 
> > > > And why's this not an issue for signed multiplication?
> > >
> > > It is, but in that case it's handled by the type jousting, which
> > > doesn't allow the type mismatch. i.e.
> > >
> > > #define SIGNEDNESS_1 unsigned
> > > #define SIGNEDNESS_2 unsigned
> > > #define SIGNEDNESS_3 signed
> > > #define SIGNEDNESS_4 signed
> > >
> > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > >    SIGNEDNESS_4 char *restrict b)
> > > {
> > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > >     {
> > >       int av = a[i];
> > >       int bv = b[i];
> > >       SIGNEDNESS_2 short mult = av * bv;
> > >       res += mult;
> > >     }
> > >   return res;
> > > }
> > >
> > > Is also not detected as a dot product.  By adding the carve out to the
> > > widen multiplication detection it now allows this case through so I
> > > handle it in the detection code.  Thinking about it now, it seems more
> > > logical to add this case handling inside the type jousting code as I
> > > don't think it's ever something you'd want.
> > 
> > Yeah, I think we only need to look through sign changes on the multiplication
> > result.
> > 
> > > > Also...
> > > >
> > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > +  tree mult_arg1, mult_arg2;
> > > > +  if (subtype == optab_default)
> > > > +    {
> > > > +      mult_arg1 = mult_oprnd[0];
> > > > +      mult_arg2 = mult_oprnd[1];
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      mult_arg1 = unprom0[0].op;
> > > > +      mult_arg2 = unprom0[1].op;
> > > > +    }
> > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > -                                     mult_oprnd[0], mult_oprnd[1],
> > > > oprnd1);
> > > > +                                     mult_arg1, mult_arg2, oprnd1);
> > > >
> > > > > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > > > > and unprom0 are just misnamed here?
> > >
> > > > Somewhat, in a normal dot-product the signs of the multiplication
> > > > operands are the same as those of the "unpromoted" types. So after
> > > > vect_convert_input these two types are the same.
> > > >
> > > > However because here the sign changes, and to maintain the semantics of
> > > > the C code, there's an extra conversion here to get the arguments in
> > > > the same sign.  That needs to be stripped before being given to the
> > > > instruction, which does the conversion internally.
> > 
> > Yes, but then why's that not done by the detection code?  That is, does it
> > (mis-)handle the (int)short_a * (int)(unsigned short)short_b where we'd
> > want the mixed-sign handling and not strip the unsigned short conversion
> > from short_b?
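
One reading of the concern above, sketched in C with hypothetical function names and assuming a 32-bit `int`: an explicit `(unsigned short)` conversion on a signed operand changes the value being multiplied, so it cannot be treated as a sign-only conversion and stripped.

```c
#include <assert.h>
#include <stdint.h>

/* (int)short_a * (int)(unsigned short)short_b, as written in the source.  */
int32_t mul_with_conversion(int16_t short_a, int16_t short_b)
{
  return (int32_t)short_a * (int32_t)(uint16_t)short_b;
}

/* The same expression with the unsigned short conversion stripped.  */
int32_t mul_conversion_stripped(int16_t short_a, int16_t short_b)
{
  return (int32_t)short_a * (int32_t)short_b;
}

void check_conversion_matters(void)
{
  /* For a negative short_b the two forms disagree.  */
  assert(mul_with_conversion(2, -1) == 131070);   /* 2 * 65535 */
  assert(mul_conversion_stripped(2, -1) == -2);
  /* For a non-negative short_b they agree.  */
  assert(mul_with_conversion(2, 3) == mul_conversion_stripped(2, 3));
}
```

In the first form short_b effectively acts as an unsigned operand, so the product is a mixed-sign one and the conversion has to be kept as part of the operand rather than discarded.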
> > 
> > Richard.
> > 
> > >
> > > Regards,
> > > Tamar
> > >
> > > >
> > > > Richard.
> > > >
> > > > > Regards,
> > > > > Tamar
> > > > >
> > > > > >
> > > > > > > The tree.def docs say the sum is also possibly widening but I
> > > > > > > don't see this covered by the optab so we should eventually
> > > > > > > remove this feature from the tree side.  In fact the tree-cfg.c
> > > > > > > verifier requires the addition to be not widening - thus only
> > > > > > > tree.def needs adjustment.
> > > > > >
> > > > > > > >  @cindex @code{ssad@var{m}} instruction pattern
> > > > > > > >  @item @samp{ssad@var{m}}
> > > > > > > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > > > > > > index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
> > > > > > > > --- a/gcc/optabs-tree.h
> > > > > > > > +++ b/gcc/optabs-tree.h
> > > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > > >     shift amount vs. machines that take a vector for the shift amount.  */
> > > > > > > >  enum optab_subtype
> > > > > > > >  {
> > > > > > > > -  optab_default,
> > > > > > > > -  optab_scalar,
> > > > > > > > -  optab_vector
> > > > > > > > +  optab_default = 1 << 0,
> > > > > > > > +  optab_scalar = 1 << 1,
> > > > > > > > +  optab_vector = 1 << 2,
> > > > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > > > +  optab_unsigned_to_signed = 1 << 4
> > > > > > > >  };
> > > > > > > >
> > > > > > > > +/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
> > > > > > > > +inline enum optab_subtype&
> > > > > > > > +operator |= (enum optab_subtype& a, enum optab_subtype b)
> > > > > > > > +{
> > > > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > +					  | static_cast<int>(b));
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
> > > > > > > > +inline enum optab_subtype
> > > > > > > > +operator | (enum optab_subtype a, enum optab_subtype b)
> > > > > > > > +{
> > > > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > +				      | static_cast<int>(b));
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* Return the optab used for computing the given operation on the type given by
> > > > > > > >     the second argument.  The third argument distinguishes between the types of
> > > > > > > >     vector shifts and rotates.  */
> > > > > > > > diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> > > > > > > > index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
> > > > > > > > --- a/gcc/optabs-tree.c
> > > > > > > > +++ b/gcc/optabs-tree.c
> > > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code code, const_tree type,
> > > > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
> > > > > > > >
> > > > > > > >      case DOT_PROD_EXPR:
> > > > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> > > > > > > > +      {
> > > > > > > > +	gcc_assert (subtype & optab_default
> > > > > > > > +		    || subtype & optab_vector
> > > > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > > > +
> > > > > > > > +	if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
> > > > > > > > +	  return usdot_prod_optab;
> > > > > > > > +
> > > > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
> > > > > > > > +      }
> > > > > > > >
> > > > > > > >      case SAD_EXPR:
> > > > > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c index
> > > > > > >
> > > > > >
> > > >
> > f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > > > 67
> > > > > > > 8597c0d00098 100644
> > > > > > > --- a/gcc/optabs.c
> > > > > > > +++ b/gcc/optabs.c
> > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops,
> > > > > > > rtx op0,
> > > > > > rtx op1, rtx wide_op,
> > > > > > >    bool sbool = false;
> > > > > > >
> > > > > > >    oprnd0 = ops->op0;
> > > > > > > +  if (nops >= 2)
> > > > > > > +    oprnd1 = ops->op1;
> > > > > > > +  if (nops >= 3)
> > > > > > > +    oprnd2 = ops->op2;
> > > > > > > +
> > > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -
> > 285,6
> > > > > > +290,27
> > > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1,
> > > > > > > rtx
> > > > > > wide_op,
> > > > > > >  	   ? vec_unpacks_sbool_hi_optab :
> > vec_unpacks_sbool_lo_optab);
> > > > > > >        sbool = true;
> > > > > > >      }
> > > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > > +    {
> > > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > > +      if (sign1 == sign2)
> > > > > > > +	;
> > > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > > +	{
> > > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > > +	  /* Same as optab_unsigned_to_signed but flip the
> > operands.  */
> > > > > > > +	  std::swap (op0, op1);
> > > > > > > +	}
> > > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > > +      else
> > > > > > > +	gcc_unreachable ();
> > > > > > > +
> > > > > > > +      widen_pattern_optab
> > > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > subtype);
> > > > > > > +    }
> > > > > > >    else
> > > > > > >      widen_pattern_optab
> > > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > > > > optab_default); @@ -298,10 +324,7 @@
> > expand_widen_pattern_expr
> > > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > > >
> > > > > > >    if (nops >= 2)
> > > > > > > -    {
> > > > > > > -      oprnd1 = ops->op1;
> > > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > -    }
> > > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > >    else if (sbool)
> > > > > > >      {
> > > > > > >        nops = 2;
> > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops,
> > rtx
> > > > > > > op0,
> > > > > > rtx op1, rtx wide_op,
> > > > > > >      {
> > > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > > >        gcc_assert (op1);
> > > > > > > -      oprnd2 = ops->op2;
> > > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > > >      }
> > > > > > >
> > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > > > >
> > > > > >
> > > >
> > b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > > > b7c
> > > > > > > 18615baae928 100644
> > > > > > > --- a/gcc/optabs.def
> > > > > > > +++ b/gcc/optabs.def
> > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> > > > > > OPTAB_D
> > > > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D
> > (ssum_widen_optab,
> > > > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab,
> > "udot_prod$I$a")
> > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")  OPTAB_D
> > > > > > (usad_optab,
> > > > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > > > >
> > > > > >
> > > >
> > 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > > > 00
> > > > > > > 808fd2678b42 100644
> > > > > > > --- a/gcc/tree-cfg.c
> > > > > > > +++ b/gcc/tree-cfg.c
> > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign
> > *stmt)
> > > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN
> > (rhs2_type))
> > > > > >
> > > > > > That's not restrictive enough.  I suggest you use
> > > > > >
> > > > > > >             && element_precision (rhs1_type) != element_precision (rhs2_type)
> > > > > >
> > > > > > instead.
> > > > > >
> > > > > > As said, I'm not sure all the changes in this patch are required.
> > > > > >
> > > > > > Please elaborate.
> > > > > >
> > > > > > Thanks,
> > > > > > Richard.
> > > > > >
> > > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode
> > (rhs3_type)),
> > > > > > >  			 2 * GET_MODE_SIZE (element_mode
> > (rhs1_type))))
> > > > > > diff --git
> > > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > > > >
> > > > > >
> > > >
> > 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > > > 9f
> > > > > > > ec29ec6e4176 100644
> > > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum tree_code
> > > > code,
> > > > > > tree vop[3], tree mask,
> > > > > > >      }
> > > > > > >  }
> > > > > > >
> > > > > > > > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > > > > > > > +   most CODE this will be optab_vector, however for certain operations such as
> > > > > > > > +   DOT_PROD_EXPR where the operation can have different signs for the operands we
> > > > > > > > +   need to be able to pick the right optabs.  */
> > > > > > > > +
> > > > > > > > +static enum optab_subtype
> > > > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> > > > > > > > +{
> > > > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > > > +  switch (code)
> > > > > > > > +    {
> > > > > > > > +      case DOT_PROD_EXPR:
> > > > > > > > +	{
> > > > > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
> > > > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
> > > > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
> > > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > > +	  break;
> > > > > > > +	}
> > > > > > > +      default:
> > > > > > > +	break;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +  return subtype;
> > > > > > > +}
> > > > > > > +
> > > > > > >  /* Function vectorizable_reduction.
> > > > > > >
> > > > > > >     Check if STMT_INFO performs a reduction operation that can
> > > > > > > be
> > > > > > vectorized.
> > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info
> > > > > > loop_vinfo,
> > > > > > >        bool ok = true;
> > > > > > >
> > > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > > > optab_vector);
> > > > > > > +      enum optab_subtype subtype = vect_determine_dot_kind
> > > > > > > + (code,
> > > > > > stmt_info);
> > > > > > > +      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > > > + subtype);
> > > > > > >        if (!optab)
> > > > > > >  	{
> > > > > > >  	  if (dump_enabled_p ())
> > > > > > > diff --git a/gcc/tree-vect-patterns.c
> > > > > > > b/gcc/tree-vect-patterns.c index
> > > > > > >
> > > > > >
> > > >
> > 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > > > a84
> > > > > > > 942316846d5e 100644
> > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info
> > > > > > > *vinfo, tree
> > > > > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > > > > *vinfo, tree otype, tree_code code,
> > > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > > -				 tree *vecitype_out = NULL)
> > > > > > > +				 tree *vecitype_out = NULL,
> > > > > > > +				 enum optab_subtype subtype =
> > > > > > optab_default)
> > > > > > >  {
> > > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > > >    if (!vecitype)
> > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info
> > > > > > > *vinfo,
> > > > > > tree otype, tree_code code,
> > > > > > >    if (!vecotype)
> > > > > > >      return false;
> > > > > > >
> > > > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > optab_default);
> > > > > > > +  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > + subtype);
> > > > > > >    if (!optab)
> > > > > > >      return false;
> > > > > > >
> > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree type,
> > > > > > > bool shift_p, tree op,  }
> > > > > > >
> > > > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > > > *COMMON_TYPE
> > > > > > > -   is narrower than type, storing the supertype in *COMMON_TYPE
> > if
> > > > so.
> > > > > > */
> > > > > > > +   is narrower than type, storing the supertype in
> > > > > > > + *COMMON_TYPE if
> > > > so.
> > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > > > *COMMON_TYPE
> > > > > > and NEW_TYPE
> > > > > > > +   may be of different signs but equal precision.   */
> > > > > > >
> > > > > > >  static bool
> > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > *common_type)
> > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > *common_type,
> > > > > > > +			 bool allow_short_sign_mismatch = false)
> > > > > > >  {
> > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > >      return true;
> > > > > > >
> > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > > +    {
> > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > +      tree eq_type
> > > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION
> > (new_type),
> > > > > > > +					  sign);
> > > > > > > +
> > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > +	return true;
> > > > > > > +    }
> > > > > > > +
> > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION
> > (*common_type))
> > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > > > (*common_type)))
> > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree
> > > > > > new_type, tree *common_type)
> > > > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > > > >     (b) can hold all leaf operand values.
> > > > > > >
> > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of
> > > > > > > + the
> > > > > > operands
> > > > > > > +   may differ in signs but not in precision.
> > > > > > > +
> > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such
> > COMMON_TYPE
> > > > > > >     exists.  */
> > > > > > >
> > > > > > > @@ -539,7 +560,8 @@ static unsigned int  vect_widened_op_tree
> > > > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > > >  		      unsigned int max_nops,
> > > > > > > -		      vect_unpromoted_value *unprom, tree
> > *common_type)
> > > > > > > +		      vect_unpromoted_value *unprom, tree
> > *common_type,
> > > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > > >  {
> > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > > @@
> > > > > > > -600,7
> > > > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > stmt_vec_info
> > > > > > stmt_info, tree_code code,
> > > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info,
> > code,
> > > > > > >  					   widened_code, shift_p,
> > max_nops,
> > > > > > > -					   this_unprom,
> > common_type);
> > > > > > > +					   this_unprom,
> > common_type,
> > > > > > > +
> > allow_short_sign_mismatch);
> > > > > > >  	      if (nops == 0)
> > > > > > >  		return 0;
> > > > > > >
> > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > > > stmt_vec_info stmt_info, tree_code code,
> > > > > > >  	      if (i == 0)
> > > > > > >  		*common_type = this_unprom->type;
> > > > > > >  	      else if (!vect_joust_widened_type (type, this_unprom-
> > >type,
> > > > > > > -						 common_type))
> > > > > > > +						 common_type,
> > > > > > > +
> > allow_short_sign_mismatch))
> > > > > > >  		return 0;
> > > > > > >  	    }
> > > > > > >  	}
> > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p (vec_info
> > > > > > > *vinfo,
> > > > > > >
> > > > > > >     Try to find the following pattern:
> > > > > > >
> > > > > > > -     type x_t, y_t;
> > > > > > > +     type1a x_t
> > > > > > > +     type1b y_t;
> > > > > > >       TYPE1 prod;
> > > > > > >       TYPE2 sum = init;
> > > > > > >     loop:
> > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > >       S1  x_t = ...
> > > > > > >       S2  y_t = ...
> > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > >       S5  prod = x_T * y_T;
> > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > >
> > > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > > > > > +   bigger and must be the same sign. This is a special case of a reduction
> > > > > > > >     computation.
> > > > > > >
> > > > > > >     Input:
> > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern (vec_info
> > > > > > > *vinfo,
> > > > > > >
> > > > > > >    /* Look for the following pattern
> > > > > > >            DX = (TYPE1) X;
> > > > > > > -          DY = (TYPE1) Y;
> > > > > > > +	  DY = (TYPE2) Y;
> > > > > > >            DPROD = DX * DY;
> > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > >       In which
> > > > > > >       - DX is double the size of X
> > > > > > >       - DY is double the size of Y
> > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > +       is one of the signs of DX or DY.
> > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > >
> > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern (vec_info
> > > > *vinfo,
> > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > > > > WIDEN_MULT_EXPR,
> > > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > > +			     false, 2, unprom0, &half_type, true))
> > > > > > >      return NULL;
> > > > > > >
> > > > > > > > +  /* Check to see if there is a sign change happening in the operands of the
> > > > > > > > +     multiplication and pick the appropriate optab subtype.  */
> > > > > > > +  enum optab_subtype subtype;
> > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > +     subtype = optab_default;
> > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > > +  else
> > > > > > > +    gcc_unreachable ();
> > > > > > > +
> > > > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > > > +     type of the dot-product.  */
> > > > > > > +  if (subtype != optab_default)
> > > > > > > +    {
> > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > > +	return NULL;
> > > > > > > +    }
> > > > > > > +
> > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > > > last_stmt);
> > > > > > >
> > > > > > >    tree half_vectype;
> > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > > > DOT_PROD_EXPR,
> > > > > > half_type,
> > > > > > > -					type_out, &half_vectype))
> > > > > > > +					type_out, &half_vectype,
> > subtype))
> > > > > > >      return NULL;
> > > > > > >
> > > > > > > >    /* Get the inputs in the appropriate types.  */
> > > > > > > > @@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > >  		       unprom0, half_vectype);
> > > > > > > >
> > > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > > +
> > > > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > > > +  if (subtype == optab_default)
> > > > > > > +    {
> > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > +    }
> > > > > > > +  else
> > > > > > > +    {
> > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > +    }
> > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > -				      mult_oprnd[0], mult_oprnd[1],
> > oprnd1);
> > > > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > > > >
> > > > > > >    return pattern_stmt;
> > > > > > >  }
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF:
> > > > > > > Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
>
Tamar Christina June 2, 2021, 9:28 a.m. UTC | #8
Ping,

Did you have any comments Richard S?

Otherwise I'll proceed with respinning according to Richi's comments.

Regards,
Tamar

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, May 26, 2021 9:57 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> On Tue, 25 May 2021, Tamar Christina wrote:
> 
> > Hi Richi,
> >
> > Here's a respun version of the patch.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> index
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d
> 2d0b3cd88f0af7c
> 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
>                   && !SCALAR_FLOAT_TYPE_P (rhs1_type))
>                  || (!INTEGRAL_TYPE_P (lhs_type)
>                      && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> -           || !types_compatible_p (rhs1_type, rhs2_type)
> +           || (!types_compatible_p (rhs1_type, rhs2_type)
> +               && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
> +               && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION
> (rhs2_type))
> 
> I think this doesn't capture the constraints - instead please do
> 
> -           || !types_compatible_p (rhs1_type, rhs2_type)
> +           /* rhs1_type and rhs2_type may differ in sign.  */
> +           || !tree_nop_conversion_p (rhs1_type, rhs2_type)
> 
> 
> +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> +   most CODE this will be optab_vector, however for certain operations such as
> +   DOT_PROD_EXPR where the operation can different signs for the operands we
> +   need to be able to pick the right optabs.  */
> +
> +static enum optab_subtype
> +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> 
> vect_determine_optab_subkind would be a better name.  'code' is
> redundant (or should better match stmt_vinfo->stmts code).  I wonder
> if it might be clearer to compute the subtype where we compute 'code'
> and the relation to stmt_info is obvious, I mean here:
> 
>   /* 3. Check the operands of the operation.  The first operands are
> defined
>         inside the loop body. The last operand is the reduction variable,
>         which is defined by the loop-header-phi.  */
> 
>   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
>   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
>   gassign *stmt = as_a <gassign *> (stmt_info->stmt);
>   enum tree_code code = gimple_assign_rhs_code (stmt);
>   bool lane_reduc_code_p
>     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code ==
> SAD_EXPR);
> 
> so just add
> 
>   enum optab_subtype optab_query_kind = optab_vector;
>   if (code == DOT_PROD_EXPR
>       && <sign test>)
>     optab_query_kind = optab_vector_mixed_sign;
> 
> in this place and avoid adding the new function?
> 
> I'm not too familiar with the pattern recog code, a second pair of eyes
> would be preferred (Richard?), but
> 
> +  /* Check if the mismatch is only in the sign and if we have
> +     allow_short_sign_mismatch then allow it.  */
> +  if (unprom_type
> +      && TYPE_SIGN (unprom_type) == SIGNED
> +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> +    {
> +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> +      tree eq_type
> +       = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> +                                         sign);
> +
> +      if (types_compatible_p (*common_type, eq_type))
> +       return true;
> +    }
> 
> looks somewhat complicated - is that equal to
> 
>   if (unprom_type
>       && tree_nop_conversion_p (*common_type, new_type))
>     return true;
> 
> ?  That is, *common_type and new_type only differ in sign?
> 
> @@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo,
> stmt_vec_info
> stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>         if (unprom[j].op == unprom[i].op)
>           break;
> +      bool only_sign = allow_short_sign_mismatch
> +                      && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
> +                      && TYPE_PRECISION (type) == TYPE_PRECISION
> (unprom[i].type);
> 
> this could use the same tree_nop_conversion_p predicate.
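
As a rough illustration of the review comment above (not GCC code): the hand-written `only_sign` test accepts two types only when their precisions match and their signs differ, whereas a check in the spirit of tree_nop_conversion_p accepts any pair of equal precision, identical types included. `toy_int_type` and both predicates below are hypothetical stand-ins invented for this sketch; the real tree_nop_conversion_p works on GCC tree types and checks more than precision.

```c
#include <stdbool.h>

/* Hypothetical model of an integer type: only precision and signedness.  */
struct toy_int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Assumed simplification of a "no-op conversion" test: converting between
   the two types changes no bits, so only the precision has to match.  */
static bool
toy_nop_conversion_p (struct toy_int_type a, struct toy_int_type b)
{
  return a.precision == b.precision;
}

/* The stricter hand-written form from the patch: same precision AND
   different sign.  */
static bool
toy_sign_only_mismatch_p (struct toy_int_type a, struct toy_int_type b)
{
  return a.is_unsigned != b.is_unsigned && a.precision == b.precision;
}
```

The two predicates agree whenever the types really differ in sign; they disagree only for identical types, which the looser predicate also accepts.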
> 
> Otherwise the patch looks good.
> 
> Thanks,
> Richard.
> 
> 
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* optabs.def (usdot_prod_optab): New.
> > 	* doc/md.texi: Document it and clarify other dot prod optabs.
> > 	* optabs-tree.h (enum optab_subtype): Add
> optab_vector_mixed_sign.
> > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > 	(vectorizable_reduction): Query dot-product kind.
> > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> optional
> > 	optab subtype.
> > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally
> ignore
> > 	mismatch types.
> > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> >
> >
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Monday, May 10, 2021 2:29 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > > where the sign for the multiplicant changes.
> > >
> > > On Mon, 10 May 2021, Tamar Christina wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Biener <rguenther@suse.de>
> > > > > Sent: Monday, May 10, 2021 12:40 PM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-
> product
> > > > > where the sign for the multiplicant changes.
> > > > >
> > > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > > >
> > > > > > Hi Richi,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > > dot-product where the sign for the multiplicant changes.
> > > > > > >
> > > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > This patch adds support for a dot product where the sign of
> > > > > > > > the multiplication arguments differ. i.e. one is signed and
> > > > > > > > one is unsigned but the precisions are the same.
> > > > > > > >
> > > > > > > > #define N 480
> > > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > > >
> > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int
> > > > > > > > res,
> > > > > > > > SIGNEDNESS_3 char *restrict a,
> > > > > > > >    SIGNEDNESS_4 char *restrict b) {
> > > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > > >     {
> > > > > > > >       int av = a[i];
> > > > > > > >       int bv = b[i];
> > > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > > >       res += mult;
> > > > > > > >     }
> > > > > > > >   return res;
> > > > > > > > }
> > > > > > > >
> > > > > > > > The operations are performed as if the operands were
> extended
> > > > > > > > to a 32-bit
> > > > > > > value.
> > > > > > > > As such this operation isn't valid if there is an intermediate
> > > > > > > > conversion to an unsigned value. i.e.  if SIGNEDNESS_2 is
> unsigned.
> > > > > > > >
> > > > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > > > > flipped the same optab is used but the operands are flipped in
> > > > > > > > > the optab expansion.
> > > > > > > >
> > > > > > > > To support this the patch extends the dot-product detection to
> > > > > > > > optionally ignore operands with different signs and stores
> > > > > > > > this information in the optab subtype which is now made a
> bitfield.
> > > > > > > >
> > > > > > > > > The subtype can now additionally control which optab an EXPR
> > > > > > > > > can expand to.
> > > > > > > >
> > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no
> issues.
> > > > > > > >
> > > > > > > > Ok for master?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Tamar
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > > > 	* doc/md.texi: Document it.
> > > > > > > > 	* optabs-tree.c (optab_for_tree_code): Support
> > > usdot_prod_optab.
> > > > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p):
> > > > > > > > Take
> > > > > > > optional
> > > > > > > > 	optab subtype.
> > > > > > > > 	(vect_joust_widened_type, vect_widened_op_tree):
> > > Optionally
> > > > > > > ignore
> > > > > > > > 	mismatch types.
> > > > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > > > >
> > > > > > > > --- inline copy of patch --
> > > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fd
> > > > > > > f2
> > > > > > > > e66bc80d7d23 100644
> > > > > > > > --- a/gcc/doc/md.texi
> > > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > > @@ -5440,11 +5440,13 @@ Like
> @samp{fold_left_plus_@var{m}},
> > > > > > > > but
> > > > > > > takes
> > > > > > > > an additional mask operand  @item @samp{sdot_prod@var{m}}
> > > > > @cindex
> > > > > > > > @code{udot_prod@var{m}} instruction pattern  @itemx
> > > > > > > > @samp{udot_prod@var{m}}
> > > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> @itemx
> > > > > > > > +@samp{usdot_prod@var{m}}
> > > > > > > >  Compute the sum of the products of two signed/unsigned
> > > elements.
> > > > > > > > -Operand 1 and operand 2 are of the same mode. Their product,
> > > > > > > > which is of a -wider mode, is computed and added to operand 3.
> > > > > > > > Operand 3 is of a mode equal or -wider than the mode of the
> > > > > > > > product. The result is placed in operand 0, which -is of the
> > > > > > > > same mode
> > > > > as operand 3.
> > > > > > > > +Operand 1 and operand 2 are of the same mode but may differ
> > > > > > > > +in
> > > > > signs.
> > > > > > > > +Their product, which is of a wider mode, is computed and
> > > > > > > > +added to
> > > > > > > operand 3.
> > > > > > > > +Operand 3 is of a mode equal or wider than the mode of the
> > > product.
> > > > > > > > +The result is placed in operand 0, which is of the same mode
> > > > > > > > +as
> > > > > operand 3.
> > > > > > >
> > > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > > Since we're doing a widen multiplication and then a non-widening
> > > > > > > addition we only need to know the effective sign of the
> > > > > > > multiplication so I think
> > > > > the existing 's' and 'u'
> > > > > > > are enough to cover all cases?
> > > > > >
> > > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > > multiplication are of the same sign.  So for e.g. 'u' both operand
> > > > > > must be
> > > > > unsigned.
> > > > > >
> > > > > > In the `us` case one can be signed and one unsigned. Operationally
> > > > > > this does a sign extension to the wider type for the signed value,
> > > > > > and the unsigned value gets zero extended first, and then converts
> > > > > > it to unsigned to perform the unsigned multiplication, conforming
> > > > > > to the C
> > > > > promotion rules.
> > > > > >
> > > > > > TL;DR; Without a new optab I can't tell during expansion which
> > > > > > semantic the operation had at the gimple/C level as modes don't
> carry
> > > signs.
> > > > > >
> > > > > > Long version:
> > > > > >
> > > > > > The problem with using the existing patterns, because of their
> > > > > > enforcement of `av` and `bv` being the same sign is that we can't
> > > > > > remove the explicit sign extensions, but the multiplication must
> > > > > > be done on
> > > > > the sign/zero extended char input in the same sign.
> > > > > >
> > > > > > > Which means (unless I am mistaken) that to get the correct result you
> > > > > > > can use neither `udot` nor `sdot`, as semantically these would
> > > > > > > zero or sign extend both operands from char to int to perform the
> > > > > > > multiplication in the same sign.  Whereas in this case, one
> > > > > > > parameter is zero extended and one parameter is sign extended, and
> > > > > > > the result is always an unsigned number.
> > > > > >
> > > > > > So basically
> > > > > >
> > > > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > > > sdot<signed c, signed a, signed b> ==
> > > > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > > > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > > > >
> > > > > > So semantically the existing optabs won't fit here. udot would
> > > > > > internally promote to unsigned types before the multiplication so
> > > > > > the result of the multiplication would be wrong.  sdot would
> > > > > > promote both to
> > > > > signed and do signed multiplication, so the result is also wrong.
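
A minimal scalar sketch of why neither existing optab fits, assuming 8-bit inputs widened to 32 bits. The `*_lane` helpers are illustrative names, not GCC or target APIs; each takes the raw operand bits and applies the extension that the corresponding dot-product flavour would use for a single lane.

```c
#include <stdint.h>

/* Interpret 8 raw bits as a signed value (sign extension to 32 bits).  */
static int32_t
sign_ext8 (uint8_t bits)
{
  return bits >= 128 ? (int32_t) bits - 256 : (int32_t) bits;
}

/* Interpret 8 raw bits as an unsigned value (zero extension to 32 bits).  */
static int32_t
zero_ext8 (uint8_t bits)
{
  return (int32_t) bits;
}

/* One lane of an unsigned * unsigned dot product.  */
static int32_t
udot_lane (uint8_t a, uint8_t b)
{
  return zero_ext8 (a) * zero_ext8 (b);
}

/* One lane of a signed * signed dot product.  */
static int32_t
sdot_lane (uint8_t a, uint8_t b)
{
  return sign_ext8 (a) * sign_ext8 (b);
}

/* One lane of a mixed-sign dot product: first operand unsigned,
   second operand signed.  */
static int32_t
usdot_lane (uint8_t u, uint8_t s)
{
  return zero_ext8 (u) * sign_ext8 (s);
}
```

With operand bits 0xC8 and 0xFF the three helpers give three different products (200 * 255, -56 * -1 and 200 * -1), so the choice of extension cannot be recovered from the modes alone.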
> > > > > >
> > > > > > Now if I relax the constraint on the signs of udot and sdot there
> > > > > > are two
> > > > > problems:
> > > > > > RTL Modes don't contain signs.  So a target can't tell me how the
> > > > > > operands
> > > > > will be promoted.
> > > > > > So:
> > > > > >
> > > > > > 1) I can't really check which semantics the target will adhere to
> > > > > > on
> > > > > expansion.
> > > > > > 2) at expand time I have no way to differentiate between the two
> > > > > instructions variants, given just modes
> > > > > >      I can't tell whether I expand to the normal dot-product or
> > > > > > the new
> > > > > instruction.
> > > > >
> > > > > Ah, OK.  Indeed with such a weird instruction the new variant makes
> > > sense.
> > > > > Still can you please amend the optab documentation to say which
> > > > > operand is unsigned and which is signed?  Just 'may differ in signs'
> > > > > is bad.
> > > >
> > > > Sure, will expand on it.
> > > >
> > > > >
> > > > > Since the multiplication is commutative I wonder why you need to
> > > > > handle both signed_to_unsigned and unsigned_to_signed - we
> should
> > > > > just enforce a canonical order (like the optab does).
> > > >
> > > > Sure, I thought it would have been better to change the order at
> > > > expand time, but can do so at detection time.
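
The canonical-order idea can be sketched as follows, assuming the convention that the unsigned operand comes first and the signed one second. `toy_operand` and the helpers are hypothetical and only model 8-bit scalar operands; because multiplication is commutative, swapping the operands does not change the product.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand: raw 8-bit value plus its signedness.  */
struct toy_operand
{
  uint8_t bits;
  bool is_unsigned;
};

/* Widen to 32 bits according to the operand's own sign.  */
static int32_t
toy_extend (struct toy_operand op)
{
  if (op.is_unsigned || op.bits < 128)
    return (int32_t) op.bits;
  return (int32_t) op.bits - 256;
}

/* Put a mixed-sign pair into the assumed canonical order
   (unsigned first, signed second); same-sign pairs are left alone.  */
static void
toy_canonicalize (struct toy_operand *op0, struct toy_operand *op1)
{
  if (!op0->is_unsigned && op1->is_unsigned)
    {
      struct toy_operand tmp = *op0;
      *op0 = *op1;
      *op1 = tmp;
    }
}

/* Widening product of the two operands.  */
static int32_t
toy_product (struct toy_operand op0, struct toy_operand op1)
{
  return toy_extend (op0) * toy_extend (op1);
}
```

Doing the swap once at detection time means every later consumer sees a single operand order and a single subtype is enough.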
> > > >
> > > > > I also think it's a particular bad fit for the bad
> > > > > optab_for_tree_code API - would any of that improve when using a
> > > > > direct internal function here?
> > > >
> > > > Somewhat, but this has considerable knock on effects, e.g. currently
> > > > DOT_PROD is treated as a widening operation and so is handled by
> > > > supportable_widening_operation which does not support calls. There's
> a
> > > > significant number of places which work on the tree EXPR (including
> > > constant folding) which all need to be changed.
> > > >
> > > > > In particular all the changes around optab_subtype look like they
> > > > > make a bad API worse ... at least a single optab_vector_mixed_sign
> > > > > should suffice here, no need to make it a flags kind.
> > > >
> > > > The reason I did so is because depending on where the query is done it
> > > > does use different subtypes currently.  During detection it uses
> > > > optab_default, and during vectorization optab_vector.  For this
> > > > instruction this difference doesn't seem to be used, but did not want to
> > > lose this information in case something depended on it.
> > > >
> > > > But can make it just one.
> > > >
> > > > >
> > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > +     type of the dot-product.  */
> > > > > +  if (subtype != optab_default)
> > > > > +    {
> > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > +       return NULL;
> > > > > +    }
> > > > >
> > > > > I don't understand this - how do we ever arrive at a result with less
> > > precision?
> > > >
> > > > The user could have manually truncated the results, i.e. in the
> > > > detection code notice `mult`
> > > >
> > > >       int av = a[i];
> > > >       int bv = b[i];
> > > >       SIGNEDNESS_2 short mult = av * bv;
> > > >       res += mult;
> > > >
> > > > which is a short, so it's manually truncating the multiplication which
> > > > is done as int by the instruction. If `mult` is unsigned then it will
> > > > truncate the result if the signed input to usdot was negative, unless
> > > > the intermediate calculation is of the same precision as the
> > > > instruction. i.e. if mult is unsigned int then there's no truncation
> > > > going on, it's casting from int to unsigned int so it's safe to use
> > > > then as the instruction does the same thing internally.
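
The truncation problem described above can be reproduced with one loop iteration in plain C. This is a sketch under the assumption of a signed 8-bit `a`, an unsigned 8-bit `b` and a 32-bit unsigned accumulator; the function names are made up for illustration.

```c
#include <stdint.h>

/* One iteration with a signed short intermediate.  The product of an
   int8_t and a uint8_t always fits in a short, so nothing is lost.  */
static uint32_t
step_signed_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;
  int bv = b;
  short mult = (short) (av * bv);
  return res + mult;
}

/* One iteration with an unsigned short intermediate.  A negative product
   wraps modulo 65536 and is then zero extended, changing the sum.  */
static uint32_t
step_unsigned_short (uint32_t res, int8_t a, uint8_t b)
{
  int av = a;
  int bv = b;
  unsigned short mult = (unsigned short) (av * bv);
  return res + mult;
}

/* What a mixed-sign dot product computes for the same lane: the full
   32-bit product added to the accumulator.  */
static uint32_t
step_usdot (uint32_t res, int8_t a, uint8_t b)
{
  return res + (uint32_t) ((int32_t) a * (int32_t) b);
}
```

For `a = -1`, `b = 2` and `res = 10`, the signed short variant and the mixed-sign reference agree, while the unsigned short variant adds 65534 instead of -2, which is why the pattern has to be rejected in that case.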
> > >
> > > It looks to me that we simply should only ever allow sign changes from
> > > multiplication result to the sum.  At least your example above is not
> > > special to mixed sign multiplications, no?
> > >
> > > > > And why's this not an issue for signed multiplication?
> > > >
> > > > It is, but in that case it's handled by the type jousting, which
> > > > doesn't allow the type mismatch. i.e.
> > > >
> > > > #define SIGNEDNESS_1 unsigned
> > > > #define SIGNEDNESS_2 unsigned
> > > > #define SIGNEDNESS_3 signed
> > > > #define SIGNEDNESS_4 signed
> > > >
> > > > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > > > SIGNEDNESS_3 char *restrict a,
> > > >    SIGNEDNESS_4 char *restrict b)
> > > > {
> > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > >     {
> > > >       int av = a[i];
> > > >       int bv = b[i];
> > > >       SIGNEDNESS_2 short mult = av * bv;
> > > >       res += mult;
> > > >     }
> > > >   return res;
> > > > }
> > > >
> > > > Is also not detected as a dot product.  By adding the carve out to the
> > > > widen multiplication detection it now allows this case through so I
> > > > handle it in the detection code.  Thinking about it now, it seems more
> > > > logical to add this case handling inside the type jousting code as I
> > > > don't think it's ever something you'd want.
> > >
> > > Yeah, I think we only need to look through sign changes on the
> multiplication
> > > result.
> > >
> > > > > Also...
> > > > >
> > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > +  tree mult_arg1, mult_arg2;
> > > > > +  if (subtype == optab_default)
> > > > > +    {
> > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > +    }
> > > > > +  else
> > > > > +    {
> > > > > +      mult_arg1 = unprom0[0].op;
> > > > > +      mult_arg2 = unprom0[1].op;
> > > > > +    }
> > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > -                                     mult_oprnd[0], mult_oprnd[1],
> > > > > oprnd1);
> > > > > +                                     mult_arg1, mult_arg2, oprnd1);
> > > > >
> > > > > I thought DOT_PROD always performs the promotion.  Maybe
> > > mult_oprnd
> > > > > and unprom0 are just misnamed here?
> > > >
> > > > Somewhat, in a normal dot-product the sign of the multiplication are
> > > > the same here as the "unpromoted" types. So after
> vect_convert_input
> > > > these two types are the same.
> > > >
> > > > However because here the sign changes and to maintain the semantics
> of
> > > > the C code there's an extra conversion here to get the arguments in
> > > > the same sign.  That needs to be stripped before given to the
> > > > instruction which does the conversion internally.
> > >
> > > Yes, but then why's that not done by the detection code?  That is, does it
> > > (mis-)handle the (int)short_a * (int)(unsigned short)short_b where we'd
> > > want the mixed-sign handling and not strip the unsigned short conversion
> > > from short_b?
> > >
> > > Richard.
> > >
> > > >
> > > > Regards,
> > > > Tamar
> > > >
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Regards,
> > > > > > Tamar
> > > > > >
> > > > > > >
> > > > > > > The tree.def docs say the sum is also possibly widening but I
> > > > > > > don't see this covered by the optab so we should eventually
> > > > > > > remove this feature from the tree side.  In fact the tree-cfg.c
> > > > > > > verifier requires the addition to be not widening - thus only
> > > > > > > tree.def needs
> > > > > adjustment.
> > > > > > >
> > > > > > > >  @cindex @code{ssad@var{m}} instruction pattern  @item
> > > > > > > > @samp{ssad@var{m}} diff --git a/gcc/optabs-tree.h
> > > > > > > > b/gcc/optabs-tree.h index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f
> > > > > > > 19
> > > > > > > > 90e0548ba08d 100644
> > > > > > > > --- a/gcc/optabs-tree.h
> > > > > > > > +++ b/gcc/optabs-tree.h
> > > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.
> If
> > > > > > > > not
> > > > > see
> > > > > > > >     shift amount vs. machines that take a vector for the shift
> amount.
> > > > > > > > */  enum optab_subtype  {
> > > > > > > > -  optab_default,
> > > > > > > > -  optab_scalar,
> > > > > > > > -  optab_vector
> > > > > > > > +  optab_default = 1 << 0,
> > > > > > > > +  optab_scalar = 1 << 1,
> > > > > > > > +  optab_vector = 1 << 2,
> > > > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > > > + optab_unsigned_to_signed =
> > > > > > > > + 1 << 4
> > > > > > > >  };
> > > > > > > >
> > > > > > > > +/* Override the OrEqual-operator so we can use
> optab_subtype
> > > > > > > > +as a bit flag.  */ inline enum optab_subtype& operator |=
> > > > > > > > +(enum
> > > > > > > optab_subtype&
> > > > > > > > +a, enum optab_subtype b) {
> > > > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > +					  | static_cast<int>(b)); }
> > > > > > > > +
> > > > > > > > +/* Override the Or-operator so we can use optab_subtype as a
> > > > > > > > +bit flag.  */ inline enum optab_subtype operator | (enum
> > > > > > > > +optab_subtype a, enum optab_subtype b) {
> > > > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > +				      | static_cast<int>(b)); }
> > > > > > > > +
> > > > > > > >  /* Return the optab used for computing the given operation on
> > > > > > > > the type
> > > > > > > given by
> > > > > > > >     the second argument.  The third argument distinguishes
> > > > > > > > between the
> > > > > > > types of
> > > > > > > >     vector shifts and rotates.  */ diff --git
> > > > > > > > a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > > > > 1e
> > > > > > > > 5c22b7453072 100644
> > > > > > > > --- a/gcc/optabs-tree.c
> > > > > > > > +++ b/gcc/optabs-tree.c
> > > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum tree_code
> > > code,
> > > > > > > const_tree type,
> > > > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > > > > ssum_widen_optab;
> > > > > > > >
> > > > > > > >      case DOT_PROD_EXPR:
> > > > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > sdot_prod_optab;
> > > > > > > > +      {
> > > > > > > > +	gcc_assert (subtype & optab_default
> > > > > > > > +		    || subtype & optab_vector
> > > > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > > > +
> > > > > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > > > > optab_signed_to_unsigned))
> > > > > > > > +	  return usdot_prod_optab;
> > > > > > > > +
> > > > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > > > sdot_prod_optab);
> > > > > > > > +      }
> > > > > > > >
> > > > > > > >      case SAD_EXPR:
> > > > > > > >        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> > > > > > > > diff --git a/gcc/optabs.c b/gcc/optabs.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > > > > 67
> > > > > > > > 8597c0d00098 100644
> > > > > > > > --- a/gcc/optabs.c
> > > > > > > > +++ b/gcc/optabs.c
> > > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops
> ops,
> > > > > > > > rtx op0,
> > > > > > > rtx op1, rtx wide_op,
> > > > > > > >    bool sbool = false;
> > > > > > > >
> > > > > > > >    oprnd0 = ops->op0;
> > > > > > > > +  if (nops >= 2)
> > > > > > > > +    oprnd1 = ops->op1;
> > > > > > > > +  if (nops >= 3)
> > > > > > > > +    oprnd2 = ops->op2;
> > > > > > > > +
> > > > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR) @@ -
> > > 285,6
> > > > > > > +290,27
> > > > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1,
> > > > > > > > rtx
> > > > > > > wide_op,
> > > > > > > >  	   ? vec_unpacks_sbool_hi_optab :
> > > vec_unpacks_sbool_lo_optab);
> > > > > > > >        sbool = true;
> > > > > > > >      }
> > > > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > > > +    {
> > > > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > > > +      if (sign1 == sign2)
> > > > > > > > +	;
> > > > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > > > +	{
> > > > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > > > +	  /* Same as optab_unsigned_to_signed but flip the
> > > operands.  */
> > > > > > > > +	  std::swap (op0, op1);
> > > > > > > > +	}
> > > > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > > > +      else
> > > > > > > > +	gcc_unreachable ();
> > > > > > > > +
> > > > > > > > +      widen_pattern_optab
> > > > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > subtype);
> > > > > > > > +    }
> > > > > > > >    else
> > > > > > > >      widen_pattern_optab
> > > > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0),
> > > > > > > > optab_default); @@ -298,10 +324,7 @@
> > > expand_widen_pattern_expr
> > > > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > > > >
> > > > > > > >    if (nops >= 2)
> > > > > > > > -    {
> > > > > > > > -      oprnd1 = ops->op1;
> > > > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > -    }
> > > > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > >    else if (sbool)
> > > > > > > >      {
> > > > > > > >        nops = 2;
> > > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops
> ops,
> > > rtx
> > > > > > > > op0,
> > > > > > > rtx op1, rtx wide_op,
> > > > > > > >      {
> > > > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > > > >        gcc_assert (op1);
> > > > > > > > -      oprnd2 = ops->op2;
> > > > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > > > >      }
> > > > > > > >
> > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > > > > b7c
> > > > > > > > 18615baae928 100644
> > > > > > > > --- a/gcc/optabs.def
> > > > > > > > +++ b/gcc/optabs.def
> > > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab,
> "uavg$a3_ceil")
> > > > > > > OPTAB_D
> > > > > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D
> > > (ssum_widen_optab,
> > > > > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab,
> > > "udot_prod$I$a")
> > > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> OPTAB_D
> > > > > > > (usad_optab,
> > > > > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > > > > 00
> > > > > > > > 808fd2678b42 100644
> > > > > > > > --- a/gcc/tree-cfg.c
> > > > > > > > +++ b/gcc/tree-cfg.c
> > > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign
> > > *stmt)
> > > > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN
> > > (rhs2_type))
> > > > > > >
> > > > > > > That's not restrictive enough.  I suggest you use
> > > > > > >
> > > > > > >             && element_precision (rhs1_type) !=
> > > > > > > element_precision
> > > > > > > (rhs2_type)
> > > > > > >
> > > > > > > instead.
> > > > > > >
> > > > > > > As said, I'm not sure all the changes in this patch are required.
> > > > > > >
> > > > > > > Please elaborate.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode
> > > (rhs3_type)),
> > > > > > > >  			 2 * GET_MODE_SIZE (element_mode
> > > (rhs1_type))))
> > > > > > > diff --git
> > > > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > > > > >
> > > > > > >
> > > > >
> > >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > > > > 9f
> > > > > > > > ec29ec6e4176 100644
> > > > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum
> tree_code
> > > > > code,
> > > > > > > tree vop[3], tree mask,
> > > > > > > >      }
> > > > > > > >  }
> > > > > > > >
> > > > > > > > > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > > > > > > > > +   most CODE this will be optab_vector, however for certain operations such as
> > > > > > > > > +   DOT_PROD_EXPR where the operation can have different signs for the operands we
> > > > > > > > > +   need to be able to pick the right optabs.  */
> > > > > > > > +
> > > > > > > > +static enum optab_subtype
> > > > > > > > > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> > > > > > > > > +{
> > > > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > > > +  switch (code)
> > > > > > > > +    {
> > > > > > > > +      case DOT_PROD_EXPR:
> > > > > > > > +	{
> > > > > > > > +	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT
> > > (stmt_vinfo));
> > > > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE
> > > > > > > > +(gimple_assign_rhs1
> > > > > > > (stmt)));
> > > > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE
> > > > > > > > +(gimple_assign_rhs2
> > > > > > > (stmt)));
> > > > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > > > +	  break;
> > > > > > > > +	}
> > > > > > > > +      default:
> > > > > > > > +	break;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > +  return subtype;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /* Function vectorizable_reduction.
> > > > > > > >
> > > > > > > > >     Check if STMT_INFO performs a reduction operation that can be vectorized.
> > > > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> > > > > > > >        bool ok = true;
> > > > > > > >
> > > > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > > > > -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> > > > > > > > > +      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
> > > > > > > > > +      optab optab = optab_for_tree_code (code, vectype_in, subtype);
> > > > > > > >        if (!optab)
> > > > > > > >  	{
> > > > > > > >  	  if (dump_enabled_p ())
> > > > > > > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > > > > > > index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841fa84942316846d5e 100644
> > > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info
> > > > > > > > *vinfo, tree
> > > > > > > > var)  static bool  vect_supportable_direct_optab_p (vec_info
> > > > > > > > *vinfo, tree otype, tree_code code,
> > > > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > > > -				 tree *vecitype_out = NULL)
> > > > > > > > +				 tree *vecitype_out = NULL,
> > > > > > > > +				 enum optab_subtype subtype =
> > > > > > > optab_default)
> > > > > > > >  {
> > > > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > > > >    if (!vecitype)
> > > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p
> (vec_info
> > > > > > > > *vinfo,
> > > > > > > tree otype, tree_code code,
> > > > > > > >    if (!vecotype)
> > > > > > > >      return false;
> > > > > > > >
> > > > > > > > > -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> > > > > > > > > +  optab optab = optab_for_tree_code (code, vecitype, subtype);
> > > > > > > >    if (!optab)
> > > > > > > >      return false;
> > > > > > > >
> > > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree
> type,
> > > > > > > > bool shift_p, tree op,  }
> > > > > > > >
> > > > > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > > > > *COMMON_TYPE
> > > > > > > > -   is narrower than type, storing the supertype in
> *COMMON_TYPE
> > > if
> > > > > so.
> > > > > > > */
> > > > > > > > +   is narrower than type, storing the supertype in
> > > > > > > > + *COMMON_TYPE if
> > > > > so.
> > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > > > > *COMMON_TYPE
> > > > > > > and NEW_TYPE
> > > > > > > > +   may be of different signs but equal precision.   */
> > > > > > > >
> > > > > > > >  static bool
> > > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > > *common_type)
> > > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > > *common_type,
> > > > > > > > +			 bool allow_short_sign_mismatch = false)
> > > > > > > >  {
> > > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > > >      return true;
> > > > > > > >
> > > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > > > > > > > +    {
> > > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > > > +      tree eq_type
> > > > > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > > > > > > > > +					  sign);
> > > > > > > > +
> > > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > > +	return true;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
> > > > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
> > > > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
> > > > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> > > > > > > > >     to a type that (a) is narrower than the result of STMT_INFO and
> > > > > > > > >     (b) can hold all leaf operand values.
> > > > > > > > >
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the signs of the operands
> > > > > > > > > +   may differ in signs but not in precision.
> > > > > > > > +
> > > > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> > > > > > > >     exists.  */
> > > > > > > >
> > > > > > > > @@ -539,7 +560,8 @@ static unsigned int
> vect_widened_op_tree
> > > > > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > > > >  		      unsigned int max_nops,
> > > > > > > > -		      vect_unpromoted_value *unprom, tree
> > > *common_type)
> > > > > > > > +		      vect_unpromoted_value *unprom, tree
> > > *common_type,
> > > > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > > > >  {
> > > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > > >    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> > > > > > > > @@
> > > > > > > > -600,7
> > > > > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > stmt_vec_info
> > > > > > > stmt_info, tree_code code,
> > > > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > > > >  	      nops = vect_widened_op_tree (vinfo, def_stmt_info,
> > > code,
> > > > > > > >  					   widened_code, shift_p,
> > > max_nops,
> > > > > > > > -					   this_unprom,
> > > common_type);
> > > > > > > > +					   this_unprom,
> > > common_type,
> > > > > > > > +
> > > allow_short_sign_mismatch);
> > > > > > > >  	      if (nops == 0)
> > > > > > > >  		return 0;
> > > > > > > >
> > > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > > > > stmt_vec_info stmt_info, tree_code code,
> > > > > > > >  	      if (i == 0)
> > > > > > > >  		*common_type = this_unprom->type;
> > > > > > > >  	      else if (!vect_joust_widened_type (type, this_unprom-
> > > >type,
> > > > > > > > -						 common_type))
> > > > > > > > +						 common_type,
> > > > > > > > +
> > > allow_short_sign_mismatch))
> > > > > > > >  		return 0;
> > > > > > > >  	    }
> > > > > > > >  	}
> > > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p
> (vec_info
> > > > > > > > *vinfo,
> > > > > > > >
> > > > > > > >     Try to find the following pattern:
> > > > > > > >
> > > > > > > > -     type x_t, y_t;
> > > > > > > > +     type1a x_t
> > > > > > > > +     type1b y_t;
> > > > > > > >       TYPE1 prod;
> > > > > > > >       TYPE2 sum = init;
> > > > > > > >     loop:
> > > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > > >       S1  x_t = ...
> > > > > > > >       S2  y_t = ...
> > > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > > >       S5  prod = x_T * y_T;
> > > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > > >
> > > > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> > > > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> > > > > > > > > +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> > > > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> > > > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> > > > > > > > > +   bigger and must be the same sign. This is a special case of a reduction
> > > > > > > >     computation.
> > > > > > > >
> > > > > > > >     Input:
> > > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern
> (vec_info
> > > > > > > > *vinfo,
> > > > > > > >
> > > > > > > >    /* Look for the following pattern
> > > > > > > >            DX = (TYPE1) X;
> > > > > > > > -          DY = (TYPE1) Y;
> > > > > > > > +	  DY = (TYPE2) Y;
> > > > > > > >            DPROD = DX * DY;
> > > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > > >       In which
> > > > > > > >       - DX is double the size of X
> > > > > > > >       - DY is double the size of Y
> > > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > > +       is one of the signs of DX or DY.
> > > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > > >
> > > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern
> (vec_info
> > > > > *vinfo,
> > > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> > > > > > > WIDEN_MULT_EXPR,
> > > > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > > > +			     false, 2, unprom0, &half_type, true))
> > > > > > > >      return NULL;
> > > > > > > >
> > > > > > > > +  /* Check to see if there is a sign change happening in the
> > > > > > > > + operands of
> > > > > > > the
> > > > > > > > +     multiplication and pick the appropriate optab subtype.
> > > > > > > > +*/
> > > > > > > > +  enum optab_subtype subtype;
> > > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > > +     subtype = optab_default;
> > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > > > +  else
> > > > > > > > +    gcc_unreachable ();
> > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > > > > +     type of the dot-product.  */
> > > > > > > > +  if (subtype != optab_default)
> > > > > > > > +    {
> > > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > > > +	return NULL;
> > > > > > > > +    }
> > > > > > > > +
> > > > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> > > > > > > > >
> > > > > > > > >    tree half_vectype;
> > > > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> > > > > > > > > -					type_out, &half_vectype))
> > > > > > > > > +					type_out, &half_vectype, subtype))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > >    /* Get the inputs in the appropriate types.  */
> > > > > > > > > @@ -1002,8 +1057,22 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> > > > > > > > >  		       unprom0, half_vectype);
> > > > > > > > >
> > > > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > > > > +  if (subtype == optab_default)
> > > > > > > > +    {
> > > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > > +    }
> > > > > > > > +  else
> > > > > > > > +    {
> > > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > > +    }
> > > > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > > > -				      mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > > > > > +				      mult_arg1, mult_arg2, oprnd1);
> > > > > > > >
> > > > > > > >    return pattern_stmt;
> > > > > > > >  }
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de>
> > > > > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > > > > > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > >
> > > > >
> > > > > --
> > > > > Richard Biener <rguenther@suse.de>
> > > > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > > > Nuernberg, Germany; GF: Felix Imend?rffer; HRB 36809 (AG
> Nuernberg)
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > > Nuernberg, Germany; GF: Felix Imend?rffer; HRB 36809 (AG Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg,
> Germany; GF: Felix Imend
Tamar Christina June 4, 2021, 10:12 a.m. UTC | #9
Hi Richi,

Attached is the re-spun patch.  tree_nop_conversion_p was very handy in cleaning up the patch, thanks!
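
As a rough illustration of why tree_nop_conversion_p fits here, the sketch below models the difference between an exact type match and a check that tolerates a sign-only mismatch.  It is a simplified model and an assumption on my part, not GCC code: struct int_type, same_type_p and sign_only_mismatch_ok_p are hypothetical names, and the real predicate looks at more than precision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an integer tree type; not GCC's real tree.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* Models the old verifier check: operand types must match exactly.  */
static bool
same_type_p (struct int_type a, struct int_type b)
{
  return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

/* Models the relaxed check: only the precision has to agree, so a
   signed/unsigned pair of equal width is accepted.  */
static bool
sign_only_mismatch_ok_p (struct int_type a, struct int_type b)
{
  assert (a.precision > 0 && b.precision > 0);
  return a.precision == b.precision;
}
```

With this model a signed char / unsigned char pair fails the exact match but passes the relaxed check, while an unsigned char / unsigned short pair is rejected by both.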

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master if Richard S has no comments?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
	types that mismatch only in sign.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

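For readers following the ChangeLog, the scalar loop below is a minimal sketch of what a mixed-sign dot product computes.  It assumes the unsigned operand is zero-extended and the signed operand is sign-extended before a signed multiply-accumulate, which is how instructions such as AArch64 USDOT behave; the name usdot_ref is hypothetical and does not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for a mixed-sign dot product; not part of
   the patch.  A is unsigned, B is signed, and both are widened to int
   before multiplying, so no intermediate value is truncated.  */
static int
usdot_ref (int acc, const unsigned char *a, const signed char *b, size_t n)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; ++i)
    {
      int av = a[i];	/* Zero-extended: range 0..255.  */
      int bv = b[i];	/* Sign-extended: range -128..127.  */
      acc += av * bv;
    }
  return acc;
}
```

The point of the sketch is that each product needs the wider type: 255 * -1 cannot be represented in either 8-bit input type, which is why the pattern is rejected when the product is first narrowed to an unsigned intermediate.
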

--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..9fad3322b3f1eb2a836833bb390df78f0cd9734b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5438,13 +5438,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..d9b64441d0e0726afee89dc9c937350451e7670d 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..0128891852fcd74fe31cd338614e90a26256b4bd 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..756d2867b678d0d8394202c6adb03d9cd26029e7 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6662,6 +6662,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7189,7 +7195,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If UNPROM_TYPE then accept that *COMMON_TYPE and NEW_TYPE may be of
+   different signs but equal precision and that the resulting
+   multiplication of them be compatible with UNPROM_TYPE.   */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 tree unprom_type = NULL)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
@@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
-  if (precision * 2 > TYPE_PRECISION (type))
+
+  /* Check if the mismatch is only in the sign and if we have
+     UNPROM_TYPE then allow it if there is enough precision to
+     not lose any information during the conversion.  */
+  if (unprom_type
+      && TYPE_SIGN (unprom_type) == SIGNED
+      && tree_nop_conversion_p (*common_type, new_type))
+	return true;
+
+  /* The resulting application is unsigned, check if we have enough
+     precision to perform the operation.  */
+  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
     return false;
 
   *common_type = build_nonstandard_integer_type (precision, false);
@@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If UNPROM_TYPE then allow that the signs of the operands
+   may differ in signs but not in precision and that the resulting type
+   of the operation on the operands is compatible with UNPROM_TYPE.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +559,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      tree unprom_type = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   unprom_type);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type, unprom_type))
 		return 0;
 	    }
 	}
@@ -799,12 +821,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
+   differ by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, bool allow_short_sign_mismatch = false)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -812,8 +837,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
+      else if (allow_short_sign_mismatch
+	       && tree_nop_conversion_p (type, unprom[i].type))
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -888,21 +917,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +971,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,20 +1019,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type,
+			     TREE_TYPE (unprom_mult.op)))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
+    subtype = optab_default;
+  else
+    subtype = optab_vector_mixed_sign;
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, true);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Wednesday, June 2, 2021 10:28 AM
> To: Richard Biener <rguenther@suse.de>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>;
> gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Ping,
> 
> Did you have any comments Richard S?
> 
> Otherwise I'll proceed with respinning according to Richi's comments.
> 
> Regards,
> Tamar
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, May 26, 2021 9:57 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Sandiford
> > <Richard.Sandiford@arm.com>
> > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> > where the sign for the multiplicant changes.
> >
> > On Tue, 25 May 2021, Tamar Christina wrote:
> >
> > > Hi Richi,
> > >
> > > Here's a respun version of the patch.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> >
> > index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..13e405edd765dde704c64348d2d0b3cd88f0af7c 100644
> > --- a/gcc/tree-cfg.c
> > +++ b/gcc/tree-cfg.c
> > @@ -4421,7 +4421,9 @@ verify_gimple_assign_ternary (gassign *stmt)
> >                   && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> >                  || (!INTEGRAL_TYPE_P (lhs_type)
> >                      && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > -           || !types_compatible_p (rhs1_type, rhs2_type)
> > +           || (!types_compatible_p (rhs1_type, rhs2_type)
> > +               && TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type)
> > +               && TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
> >
> > I think this doesn't capture the constraints - instead please do
> >
> > -           || !types_compatible_p (rhs1_type, rhs2_type)
> > +           /* rhs1_type and rhs2_type may differ in sign.  */
> > +           || !tree_nop_conversion_p (rhs1_type, rhs2_type)
> >
> >
> > +/* Determine the optab_subtype to use for the given CODE and STMT.  For
> > +   most CODE this will be optab_vector, however for certain operations such as
> > +   DOT_PROD_EXPR where the operation can have different signs for the operands
> > +   we need to be able to pick the right optabs.  */
> > +
> > +static enum optab_subtype
> > +vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
> >
> > vect_determine_optab_subkind would be a better name.  'code' is
> > redundant (or should better match stmt_vinfo->stmts code).  I wonder
> > if it might be clearer to compute the subtype where we compute 'code'
> > and the relation to stmt_info is obvious, I mean here:
> >
> >   /* 3. Check the operands of the operation.  The first operands are defined
> >         inside the loop body. The last operand is the reduction variable,
> >         which is defined by the loop-header-phi.  */
> >
> >   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> >   STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
> >   gassign *stmt = as_a <gassign *> (stmt_info->stmt);
> >   enum tree_code code = gimple_assign_rhs_code (stmt);
> >   bool lane_reduc_code_p
> >     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
> >
> > so just add
> >
> >   enum optab_subtype optab_query_kind = optab_vector;
> >   if (code == DOT_PROD_EXPR
> >       && <sign test>)
> >     optab_query_kind = optab_vector_mixed_sign;
> >
> > in this place and avoid adding the new function?
> >
> > I'm not too familiar with the pattern recog code, a 2nd eye would be
> > preferred (Richard?), but
> >
> > +  /* Check if the mismatch is only in the sign and if we have
> > +     allow_short_sign_mismatch then allow it.  */
> > +  if (unprom_type
> > +      && TYPE_SIGN (unprom_type) == SIGNED
> > +      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
> > +    {
> > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > +      tree eq_type
> > +       = build_nonstandard_integer_type (TYPE_PRECISION (new_type),
> > +                                         sign);
> > +
> > +      if (types_compatible_p (*common_type, eq_type))
> > +       return true;
> > +    }
> >
> > looks somewhat complicated - is that equal to
> >
> >   if (unprom_type
> >       && tree_nop_conversion_p (*common_type, new_type))
> >     return true;
> >
> > ?  That is, *common_type and new_type only differ in sign?
> >
> > @@ -812,8 +844,13 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
> >        for (j = 0; j < i; ++j)
> >         if (unprom[j].op == unprom[i].op)
> >           break;
> > +      bool only_sign = allow_short_sign_mismatch
> > +                      && TYPE_SIGN (type) != TYPE_SIGN (unprom[i].type)
> > +                      && TYPE_PRECISION (type) == TYPE_PRECISION (unprom[i].type);
> >
> > this could use the same tree_nop_conversion_p predicate.
> >
> > Otherwise the patch looks good.
> >
> > Thanks,
> > Richard.
> >
> >
> >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* optabs.def (usdot_prod_optab): New.
> > > 	* doc/md.texi: Document it and clarify other dot prod optabs.
> > > 	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
> > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > 	(vectorizable_reduction): Query dot-product kind.
> > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > > 	optab subtype.
> > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > > 	mismatch types.
> > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Monday, May 10, 2021 2:29 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for
> > > > dot-product where the sign for the multiplicant changes.
> > > >
> > > > On Mon, 10 May 2021, Tamar Christina wrote:
> > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > Sent: Monday, May 10, 2021 12:40 PM
> > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-
> > product
> > > > > > where the sign for the multiplicant changes.
> > > > > >
> > > > > > On Fri, 7 May 2021, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi Richi,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Richard Biener <rguenther@suse.de>
> > > > > > > > Sent: Friday, May 7, 2021 12:46 PM
> > > > > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> > > > > > > > Subject: Re: [PATCH 1/4]middle-end Vect: Add support for
> > > > > > > > dot-product where the sign for the multiplicant changes.
> > > > > > > >
> > > > > > > > On Wed, 5 May 2021, Tamar Christina wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > This patch adds support for a dot product where the sign
> > > > > > > > > of the multiplication arguments differ. i.e. one is
> > > > > > > > > signed and one is unsigned but the precisions are the same.
> > > > > > > > >
> > > > > > > > > #define N 480
> > > > > > > > > #define SIGNEDNESS_1 unsigned
> > > > > > > > > #define SIGNEDNESS_2 signed
> > > > > > > > > #define SIGNEDNESS_3 signed
> > > > > > > > > #define SIGNEDNESS_4 unsigned
> > > > > > > > >
> > > > > > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > > > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > > > > > {
> > > > > > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > > > > > >     {
> > > > > > > > >       int av = a[i];
> > > > > > > > >       int bv = b[i];
> > > > > > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > > > > > >       res += mult;
> > > > > > > > >     }
> > > > > > > > >   return res;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > The operations are performed as if the operands were extended
> > > > > > > > > to a 32-bit value.  As such this operation isn't valid if there
> > > > > > > > > is an intermediate conversion to an unsigned value, i.e. if
> > > > > > > > > SIGNEDNESS_2 is unsigned.
> > > > > > > > >
> > > > > > > > > Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are
> > > > > > > > > flipped the same optab is used but the operands are flipped in
> > > > > > > > > the optab expansion.
> > > > > > > > >
> > > > > > > > > To support this the patch extends the dot-product detection to
> > > > > > > > > optionally ignore operands with different signs and stores this
> > > > > > > > > information in the optab subtype, which is now made a bitfield.
> > > > > > > > >
> > > > > > > > > The subtype can now additionally control which optab an EXPR
> > > > > > > > > can expand to.
> > > > > > > > >
> > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > > > > > >
> > > > > > > > > Ok for master?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Tamar
> > > > > > > > >
> > > > > > > > > gcc/ChangeLog:
> > > > > > > > >
> > > > > > > > > 	* optabs.def (usdot_prod_optab): New.
> > > > > > > > > 	* doc/md.texi: Document it.
> > > > > > > > > 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> > > > > > > > > 	* optabs-tree.h (enum optab_subtype): Likewise.
> > > > > > > > > 	* optabs.c (expand_widen_pattern_expr): Likewise.
> > > > > > > > > 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> > > > > > > > > 	* tree-vect-loop.c (vect_determine_dot_kind): New.
> > > > > > > > > 	(vectorizable_reduction): Query dot-product kind.
> > > > > > > > > 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
> > > > > > > > > 	optab subtype.
> > > > > > > > > 	(vect_joust_widened_type, vect_widened_op_tree): Optionally ignore
> > > > > > > > > 	mismatch types.
> > > > > > > > > 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> > > > > > > > >
> > > > > > > > > --- inline copy of patch --
> > > > > > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > > > > > > > index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
> > > > > > > > > --- a/gcc/doc/md.texi
> > > > > > > > > +++ b/gcc/doc/md.texi
> > > > > > > > > @@ -5440,11 +5440,13 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
> > > > > > > > >  @item @samp{sdot_prod@var{m}}
> > > > > > > > >  @cindex @code{udot_prod@var{m}} instruction pattern
> > > > > > > > >  @itemx @samp{udot_prod@var{m}}
> > > > > > > > > +@cindex @code{usdot_prod@var{m}} instruction pattern
> > > > > > > > > +@itemx @samp{usdot_prod@var{m}}
> > > > > > > > >  Compute the sum of the products of two signed/unsigned elements.
> > > > > > > > > -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> > > > > > > > > -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> > > > > > > > > -wider than the mode of the product. The result is placed in operand 0, which
> > > > > > > > > -is of the same mode as operand 3.
> > > > > > > > > +Operand 1 and operand 2 are of the same mode but may differ in signs.
> > > > > > > > > +Their product, which is of a wider mode, is computed and added to operand 3.
> > > > > > > > > +Operand 3 is of a mode equal or wider than the mode of the product.
> > > > > > > > > +The result is placed in operand 0, which is of the same mode as operand 3.
> > > > > > > >
> > > > > > > > This doesn't really say what the 's', 'u' and 'us' specify.
> > > > > > > > Since we're doing a widen multiplication and then a
> > > > > > > > non-widening addition we only need to know the effective
> > > > > > > > sign of the multiplication so I think the existing 's' and 'u'
> > > > > > > > are enough to cover all cases?
> > > > > > >
> > > > > > > The existing 's' and 'u' enforce that both operands of the
> > > > > > > multiplication are of the same sign.  So for e.g. 'u' both
> > > > > > > operands must be unsigned.
> > > > > > >
> > > > > > > In the `us` case one can be signed and one unsigned.
> > > > > > > Operationally this does a sign extension to the wider type
> > > > > > > for the signed value, and the unsigned value gets zero
> > > > > > > extended first, and then converts it to unsigned to perform
> > > > > > > the unsigned multiplication, conforming to the C promotion rules.
> > > > > > >
> > > > > > > TL;DR; Without a new optab I can't tell during expansion
> > > > > > > which semantic the operation had at the gimple/C level as
> > > > > > > modes don't carry signs.
> > > > > > >
> > > > > > > Long version:
> > > > > > >
> > > > > > > The problem with using the existing patterns, because of
> > > > > > > their enforcement of `av` and `bv` being the same sign, is
> > > > > > > that we can't remove the explicit sign extensions, but the
> > > > > > > multiplication must be done on the sign/zero extended char
> > > > > > > input in the same sign.
> > > > > > >
> > > > > > > Which means (unless I am mistaken) to get the correct
> > > > > > > result, you can use neither `udot` nor `sdot` as
> > > > > > > semantically these would zero or sign extend both operands
> > > > > > > from char to int to perform the multiplication in the same
> > > > > > > sign.  Whereas in this case, one parameter is zero extended
> > > > > > > and one parameter is sign extended and the result is always an
> > > > > > > unsigned number.
> > > > > > >
> > > > > > > So basically
> > > > > > >
> > > > > > > udot<unsigned c, unsigned a, unsigned b> ==
> > > > > > >    c = zero-ext (a) * zero-ext (b)
> > > > > > > sdot<signed c, signed a, signed b> ==
> > > > > > >    c = sign-ext (a) * sign-ext (b)
> > > > > > > usdot<unsigned c, unsigned a, signed b> ==
> > > > > > >    c = ((unsigned-conv) sign-ext (a)) * zero-ext (b)
> > > > > > >
> > > > > > > So semantically the existing optabs won't fit here. udot
> > > > > > > would internally promote to unsigned types before the
> > > > > > > multiplication so the result of the multiplication would be
> > > > > > > wrong.  sdot would promote both to signed and do signed
> > > > > > > multiplication, so the result is also wrong.
> > > > > > >
> > > > > > > Now if I relax the constraint on the signs of udot and sdot
> > > > > > > there are two problems:
> > > > > > > RTL Modes don't contain signs.  So a target can't tell me
> > > > > > > how the operands will be promoted.
> > > > > > > So:
> > > > > > >
> > > > > > > 1) I can't really check which semantics the target will
> > > > > > >    adhere to on expansion.
> > > > > > > 2) at expand time I have no way to differentiate between the
> > > > > > >    two instruction variants; given just modes
> > > > > > >    I can't tell whether I expand to the normal dot-product
> > > > > > >    or the new instruction.
> > > > > >
> > > > > > Ah, OK.  Indeed with such a weird instruction the new variant
> > > > > > makes sense.
> > > > > > Still can you please amend the optab documentation to say
> > > > > > which operand is unsigned and which is signed?  Just 'may differ in
> > > > > > signs' is bad.
> > > > >
> > > > > Sure, will expand on it.
> > > > >
> > > > > >
> > > > > > Since the multiplication is commutative I wonder why you need
> > > > > > to handle both signed_to_unsigned and unsigned_to_signed - we
> > > > > > should just enforce a canonical order (like the optab does).
> > > > >
> > > > > Sure, I thought it would have been better to change the order at
> > > > > expand time, but can do so at detection time.
> > > > >
> > > > > > I also think it's a particular bad fit for the bad
> > > > > > optab_for_tree_code API - would any of that improve when using
> > > > > > a direct internal function here?
> > > > >
> > > > > Somewhat, but this has considerable knock on effects, e.g.
> > > > > currently DOT_PROD is treated as a widening operation and so is
> > > > > handled by supportable_widening_operation which does not support
> > > > > calls. There's a significant number of places which work on the
> > > > > tree EXPR (including constant folding) which all need to be changed.
> > > > >
> > > > > > In particular all the changes around optab_subtype look like
> > > > > > they make a bad API worse ... at least a single
> > > > > > optab_vector_mixed_sign should suffice here, no need to make it a
> > > > > > flags kind.
> > > > >
> > > > > The reason I did so is because depending on where the query is
> > > > > done it does use different subtypes currently.  During detection
> > > > > it uses optab_default, and during vectorization optab_vector.
> > > > > For this instruction this difference doesn't seem to be used,
> > > > > but did not want to lose this information in case something
> > > > > depended on it.
> > > > >
> > > > > But can make it just one.
> > > > >
> > > > > >
> > > > > > +  /* If we have a sign changing dot product we need to check that the
> > > > > > +     promoted type if unsigned has at least the same precision as the final
> > > > > > +     type of the dot-product.  */
> > > > > > +  if (subtype != optab_default)
> > > > > > +    {
> > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > +         && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
> > > > > > +       return NULL;
> > > > > > +    }
> > > > > >
> > > > > > I don't understand this - how do we ever arrive at a result
> > > > > > with less precision?
> > > > >
> > > > > The user could have manually truncated the results, i.e. in the
> > > > > detection code notice `mult`
> > > > >
> > > > >       int av = a[i];
> > > > >       int bv = b[i];
> > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > >       res += mult;
> > > > >
> > > > > which is a short, so it's manually truncating the multiplication
> > > > > which is done as int by the instruction. If `mult` is unsigned
> > > > > then it will truncate the result if the signed input to usdot
> > > > > was negative, unless the intermediate calculation is of the same
> > > > > precision as the instruction. i.e. if mult is unsigned int then
> > > > > there's no truncation going on, it's casting from int to
> > > > > unsigned int so it's safe to use then as the instruction does the
> > > > > same thing internally.
> > > >
> > > > It looks to me that we simply should only ever allow sign-changes
> > > > from multiplication result to the sum.  At least your example
> > > > above is not special to mixed sign multiplications, no?
> > > >
> > > > > > And why's this not an issue for signed multiplication?
> > > > >
> > > > > It is, but in that case it's handled by the type jousting, which
> > > > > doesn't allow the type mismatch. i.e.
> > > > >
> > > > > #define SIGNEDNESS_1 unsigned
> > > > > #define SIGNEDNESS_2 unsigned
> > > > > #define SIGNEDNESS_3 signed
> > > > > #define SIGNEDNESS_4 signed
> > > > >
> > > > > SIGNEDNESS_1 int __attribute__ ((noipa))
> > > > > f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> > > > >    SIGNEDNESS_4 char *restrict b)
> > > > > {
> > > > >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > > > >     {
> > > > >       int av = a[i];
> > > > >       int bv = b[i];
> > > > >       SIGNEDNESS_2 short mult = av * bv;
> > > > >       res += mult;
> > > > >     }
> > > > >   return res;
> > > > > }
> > > > >
> > > > > Is also not detected as a dot product.  By adding the carve out
> > > > > to the widen multiplication detection it now allows this case
> > > > > through so I handle it in the detection code.  Thinking about it
> > > > > now, it seems more logical to add this case handling inside the
> > > > > type jousting code as I don't think it's ever something you'd want.
> > > >
> > > > Yeah, I think we only need to look through sign changes on the
> > > > multiplication result.
> > > >
> > > > > > Also...
> > > > > >
> > > > > > +  /* If we have a sign changing dot-product the dot-product itself does any
> > > > > > +     sign conversions, so consume the type and use the unpromoted types.  */
> > > > > > +  tree mult_arg1, mult_arg2;
> > > > > > +  if (subtype == optab_default)
> > > > > > +    {
> > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > +    }
> > > > > > +  else
> > > > > > +    {
> > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > +    }
> > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > -                                     mult_oprnd[0], mult_oprnd[1], oprnd1);
> > > > > > +                                     mult_arg1, mult_arg2, oprnd1);
> > > > > >
> > > > > > I thought DOT_PROD always performs the promotion.  Maybe mult_oprnd
> > > > > > and unprom0 are just misnamed here?
> > > > >
> > > > > Somewhat, in a normal dot-product the sign of the multiplication
> > > > > are the same here as the "unpromoted" types. So after vect_convert_input
> > > > > these two types are the same.
> > > > >
> > > > > However because here the sign changes and to maintain the
> > > > > semantics of the C code there's an extra conversion here to get
> > > > > the arguments in the same sign.  That needs to be stripped before
> > > > > given to the instruction which does the conversion internally.
> > > >
> > > > Yes, but then why's that not done by the detection code?  That is,
> > > > does it (mis-)handle the (int)short_a * (int)(unsigned
> > > > short)short_b where we'd want the mixed-sign handling and not
> > > > strip the unsigned short conversion from short_b?
> > > >
> > > > Richard.
> > > >
> > > > >
> > > > > Regards,
> > > > > Tamar
> > > > >
> > > > > >
> > > > > > Richard.
> > > > > >
> > > > > > > Regards,
> > > > > > > Tamar
> > > > > > >
> > > > > > > >
> > > > > > > > The tree.def docs say the sum is also possibly widening
> > > > > > > > but I don't see this covered by the optab so we should
> > > > > > > > eventually remove this feature from the tree side.  In
> > > > > > > > fact the tree-cfg.c verifier requires the addition to be
> > > > > > > > not widening - thus only tree.def needs adjustment.
> > > > > > > >
> > > > > > > > >  @cindex @code{ssad@var{m}} instruction pattern
> > > > > > > > >  @item @samp{ssad@var{m}}
> > > > > > > > > diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> > > > > > > > > index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
> > > > > > > > > --- a/gcc/optabs-tree.h
> > > > > > > > > +++ b/gcc/optabs-tree.h
> > > > > > > > > @@ -27,11 +27,29 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > > > >     shift amount vs. machines that take a vector for the shift amount.  */
> > > > > > > > >  enum optab_subtype
> > > > > > > > >  {
> > > > > > > > > -  optab_default,
> > > > > > > > > -  optab_scalar,
> > > > > > > > > -  optab_vector
> > > > > > > > > +  optab_default = 1 << 0,
> > > > > > > > > +  optab_scalar = 1 << 1,
> > > > > > > > > +  optab_vector = 1 << 2,
> > > > > > > > > +  optab_signed_to_unsigned = 1 << 3,
> > > > > > > > > +  optab_unsigned_to_signed = 1 << 4
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > > +/* Override the OrEqual-operator so we can use
> > optab_subtype
> > > > > > > > > +as a bit flag.  */ inline enum optab_subtype& operator
> > > > > > > > > +|= (enum
> > > > > > > > optab_subtype&
> > > > > > > > > +a, enum optab_subtype b) {
> > > > > > > > > +    return a = static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > > +					  |
> static_cast<int>(b)); }
> > > > > > > > > +
> > > > > > > > > +/* Override the Or-operator so we can use optab_subtype
> > > > > > > > > +as a bit flag.  */ inline enum optab_subtype operator |
> > > > > > > > > +(enum optab_subtype a, enum optab_subtype b) {
> > > > > > > > > +    return static_cast<optab_subtype>(static_cast<int>(a)
> > > > > > > > > +				      | static_cast<int>(b)); }
> > > > > > > > > +
> > > > > > > > >  /* Return the optab used for computing the given
> > > > > > > > > operation on the type
> > > > > > > > given by
> > > > > > > > >     the second argument.  The third argument
> > > > > > > > > distinguishes between the
> > > > > > > > types of
> > > > > > > > >     vector shifts and rotates.  */ diff --git
> > > > > > > > > a/gcc/optabs-tree.c b/gcc/optabs-tree.c index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea
> > > > > > > > 1e
> > > > > > > > > 5c22b7453072 100644
> > > > > > > > > --- a/gcc/optabs-tree.c
> > > > > > > > > +++ b/gcc/optabs-tree.c
> > > > > > > > > @@ -127,7 +127,17 @@ optab_for_tree_code (enum
> tree_code
> > > > code,
> > > > > > > > const_tree type,
> > > > > > > > >        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> > > > > > > > > ssum_widen_optab;
> > > > > > > > >
> > > > > > > > >      case DOT_PROD_EXPR:
> > > > > > > > > -      return TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > > sdot_prod_optab;
> > > > > > > > > +      {
> > > > > > > > > +	gcc_assert (subtype & optab_default
> > > > > > > > > +		    || subtype & optab_vector
> > > > > > > > > +		    || subtype & optab_signed_to_unsigned
> > > > > > > > > +		    || subtype & optab_unsigned_to_signed);
> > > > > > > > > +
> > > > > > > > > +	if (subtype & (optab_unsigned_to_signed |
> > > > > > > > optab_signed_to_unsigned))
> > > > > > > > > +	  return usdot_prod_optab;
> > > > > > > > > +
> > > > > > > > > +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> > > > > > > > sdot_prod_optab);
> > > > > > > > > +      }
> > > > > > > > >
> > > > > > > > >      case SAD_EXPR:
> > > > > > > > >        return TYPE_UNSIGNED (type) ? usad_optab :
> > > > > > > > > ssad_optab; diff --git a/gcc/optabs.c b/gcc/optabs.c
> > > > > > > > > index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac
> > > > > > > > 67
> > > > > > > > > 8597c0d00098 100644
> > > > > > > > > --- a/gcc/optabs.c
> > > > > > > > > +++ b/gcc/optabs.c
> > > > > > > > > @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops
> > ops,
> > > > > > > > > rtx op0,
> > > > > > > > rtx op1, rtx wide_op,
> > > > > > > > >    bool sbool = false;
> > > > > > > > >
> > > > > > > > >    oprnd0 = ops->op0;
> > > > > > > > > +  if (nops >= 2)
> > > > > > > > > +    oprnd1 = ops->op1;
> > > > > > > > > +  if (nops >= 3)
> > > > > > > > > +    oprnd2 = ops->op2;
> > > > > > > > > +
> > > > > > > > >    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
> > > > > > > > >    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
> > > > > > > > >        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> @@
> > > > > > > > > -
> > > > 285,6
> > > > > > > > +290,27
> > > > > > > > > @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx
> > > > > > > > > op1, rtx
> > > > > > > > wide_op,
> > > > > > > > >  	   ? vec_unpacks_sbool_hi_optab :
> > > > vec_unpacks_sbool_lo_optab);
> > > > > > > > >        sbool = true;
> > > > > > > > >      }
> > > > > > > > > +  else if (ops->code == DOT_PROD_EXPR)
> > > > > > > > > +    {
> > > > > > > > > +      enum optab_subtype subtype = optab_default;
> > > > > > > > > +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> > > > > > > > > +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> > > > > > > > > +      if (sign1 == sign2)
> > > > > > > > > +	;
> > > > > > > > > +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> > > > > > > > > +	{
> > > > > > > > > +	  subtype |= optab_signed_to_unsigned;
> > > > > > > > > +	  /* Same as optab_unsigned_to_signed but flip the
> > > > operands.  */
> > > > > > > > > +	  std::swap (op0, op1);
> > > > > > > > > +	}
> > > > > > > > > +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> > > > > > > > > +	subtype |= optab_unsigned_to_signed;
> > > > > > > > > +      else
> > > > > > > > > +	gcc_unreachable ();
> > > > > > > > > +
> > > > > > > > > +      widen_pattern_optab
> > > > > > > > > +	= optab_for_tree_code (ops->code, TREE_TYPE
> (oprnd0),
> > > > subtype);
> > > > > > > > > +    }
> > > > > > > > >    else
> > > > > > > > >      widen_pattern_optab
> > > > > > > > >        = optab_for_tree_code (ops->code, TREE_TYPE
> > > > > > > > > (oprnd0), optab_default); @@ -298,10 +324,7 @@
> > > > expand_widen_pattern_expr
> > > > > > > > (sepops ops, rtx op0, rtx op1, rtx wide_op,
> > > > > > > > >    gcc_assert (icode != CODE_FOR_nothing);
> > > > > > > > >
> > > > > > > > >    if (nops >= 2)
> > > > > > > > > -    {
> > > > > > > > > -      oprnd1 = ops->op1;
> > > > > > > > > -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > > -    }
> > > > > > > > > +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> > > > > > > > >    else if (sbool)
> > > > > > > > >      {
> > > > > > > > >        nops = 2;
> > > > > > > > > @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops
> > ops,
> > > > rtx
> > > > > > > > > op0,
> > > > > > > > rtx op1, rtx wide_op,
> > > > > > > > >      {
> > > > > > > > >        gcc_assert (tmode1 == tmode0);
> > > > > > > > >        gcc_assert (op1);
> > > > > > > > > -      oprnd2 = ops->op2;
> > > > > > > > >        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
> > > > > > > > >      }
> > > > > > > > >
> > > > > > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> > > > > > > > b7c
> > > > > > > > > 18615baae928 100644
> > > > > > > > > --- a/gcc/optabs.def
> > > > > > > > > +++ b/gcc/optabs.def
> > > > > > > > > @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab,
> > "uavg$a3_ceil")
> > > > > > > > OPTAB_D
> > > > > > > > > (sdot_prod_optab, "sdot_prod$I$a")  OPTAB_D
> > > > (ssum_widen_optab,
> > > > > > > > > "widen_ssum$I$a3")  OPTAB_D (udot_prod_optab,
> > > > "udot_prod$I$a")
> > > > > > > > > +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
> > > > > > > > >  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
> > OPTAB_D
> > > > > > > > (usad_optab,
> > > > > > > > > "usad$I$a")  OPTAB_D (ssad_optab, "ssad$I$a") diff --git
> > > > > > > > > a/gcc/tree-cfg.c b/gcc/tree-cfg.c index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb
> > > > > > > > 00
> > > > > > > > > 808fd2678b42 100644
> > > > > > > > > --- a/gcc/tree-cfg.c
> > > > > > > > > +++ b/gcc/tree-cfg.c
> > > > > > > > > @@ -4421,7 +4421,8 @@ verify_gimple_assign_ternary
> > > > > > > > > (gassign
> > > > *stmt)
> > > > > > > > >  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
> > > > > > > > >  		 || (!INTEGRAL_TYPE_P (lhs_type)
> > > > > > > > >  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> > > > > > > > > -	    || !types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > > +	    || (!types_compatible_p (rhs1_type, rhs2_type)
> > > > > > > > > +		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN
> > > > (rhs2_type))
> > > > > > > >
> > > > > > > > That's not restrictive enough.  I suggest you use
> > > > > > > >
> > > > > > > >             && element_precision (rhs1_type) !=
> > > > > > > > element_precision
> > > > > > > > (rhs2_type)
> > > > > > > >
> > > > > > > > instead.
> > > > > > > >
> > > > > > > > As said, I'm not sure all the changes in this patch are required.
> > > > > > > >
> > > > > > > > Please elaborate.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Richard.
> > > > > > > >
> > > > > > > > >  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
> > > > > > > > >  	    || maybe_lt (GET_MODE_SIZE (element_mode
> > > > (rhs3_type)),
> > > > > > > > >  			 2 * GET_MODE_SIZE (element_mode
> > > > (rhs1_type))))
> > > > > > > > diff --git
> > > > > > > > > a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d1
> > > > > > > > 9f
> > > > > > > > > ec29ec6e4176 100644
> > > > > > > > > --- a/gcc/tree-vect-loop.c
> > > > > > > > > +++ b/gcc/tree-vect-loop.c
> > > > > > > > > @@ -6401,6 +6401,33 @@ build_vect_cond_expr (enum
> > tree_code
> > > > > > code,
> > > > > > > > tree vop[3], tree mask,
> > > > > > > > >      }
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +/* Determine the optab_subtype to use for the given
> > > > > > > > > +CODE
> > and
> > > > STMT.
> > > > > > > > For
> > > > > > > > > +   most CODE this will be optab_vector, however for
> > > > > > > > > + certain operations
> > > > > > > > such as
> > > > > > > > > +   DOT_PROD_EXPR where the operation can different
> > > > > > > > > + signs for the
> > > > > > > > operands we
> > > > > > > > > +   need to be able to pick the right optabs.  */
> > > > > > > > > +
> > > > > > > > > +static enum optab_subtype vect_determine_dot_kind
> > > > > > > > > +(tree_code code, stmt_vec_info
> > > > > > > > > +stmt_vinfo) {
> > > > > > > > > +  enum optab_subtype subtype = optab_vector;
> > > > > > > > > +  switch (code)
> > > > > > > > > +    {
> > > > > > > > > +      case DOT_PROD_EXPR:
> > > > > > > > > +	{
> > > > > > > > > +	  gassign *stmt = as_a <gassign *>
> (STMT_VINFO_STMT
> > > > (stmt_vinfo));
> > > > > > > > > +	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE
> > > > > > > > > +(gimple_assign_rhs1
> > > > > > > > (stmt)));
> > > > > > > > > +	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE
> > > > > > > > > +(gimple_assign_rhs2
> > > > > > > > (stmt)));
> > > > > > > > > +	  if (rhs1_sign != rhs2_sign)
> > > > > > > > > +	    subtype |= optab_unsigned_to_signed;
> > > > > > > > > +	  break;
> > > > > > > > > +	}
> > > > > > > > > +      default:
> > > > > > > > > +	break;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > > +  return subtype;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /* Function vectorizable_reduction.
> > > > > > > > >
> > > > > > > > >     Check if STMT_INFO performs a reduction operation
> > > > > > > > > that can be
> > > > > > > > vectorized.
> > > > > > > > > @@ -7189,7 +7216,8 @@ vectorizable_reduction
> > > > > > > > > (loop_vec_info
> > > > > > > > loop_vinfo,
> > > > > > > > >        bool ok = true;
> > > > > > > > >
> > > > > > > > >        /* 4.1. check support for the operation in the loop  */
> > > > > > > > > -      optab optab = optab_for_tree_code (code, vectype_in,
> > > > > > optab_vector);
> > > > > > > > > +      enum optab_subtype subtype =
> > > > > > > > > + vect_determine_dot_kind (code,
> > > > > > > > stmt_info);
> > > > > > > > > +      optab optab = optab_for_tree_code (code,
> > > > > > > > > + vectype_in, subtype);
> > > > > > > > >        if (!optab)
> > > > > > > > >  	{
> > > > > > > > >  	  if (dump_enabled_p ()) diff --git
> > > > > > > > > a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > > > > > > index
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> >
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841f
> > > > > > > > a84
> > > > > > > > > 942316846d5e 100644
> > > > > > > > > --- a/gcc/tree-vect-patterns.c
> > > > > > > > > +++ b/gcc/tree-vect-patterns.c
> > > > > > > > > @@ -201,7 +201,8 @@ vect_get_external_def_edge
> (vec_info
> > > > > > > > > *vinfo, tree
> > > > > > > > > var)  static bool  vect_supportable_direct_optab_p
> > > > > > > > > (vec_info *vinfo, tree otype, tree_code code,
> > > > > > > > >  				 tree itype, tree *vecotype_out,
> > > > > > > > > -				 tree *vecitype_out = NULL)
> > > > > > > > > +				 tree *vecitype_out = NULL,
> > > > > > > > > +				 enum optab_subtype
> subtype =
> > > > > > > > optab_default)
> > > > > > > > >  {
> > > > > > > > >    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
> > > > > > > > >    if (!vecitype)
> > > > > > > > > @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p
> > (vec_info
> > > > > > > > > *vinfo,
> > > > > > > > tree otype, tree_code code,
> > > > > > > > >    if (!vecotype)
> > > > > > > > >      return false;
> > > > > > > > >
> > > > > > > > > -  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > > > optab_default);
> > > > > > > > > +  optab optab = optab_for_tree_code (code, vecitype,
> > > > > > > > > + subtype);
> > > > > > > > >    if (!optab)
> > > > > > > > >      return false;
> > > > > > > > >
> > > > > > > > > @@ -487,14 +488,31 @@ vect_joust_widened_integer (tree
> > type,
> > > > > > > > > bool shift_p, tree op,  }
> > > > > > > > >
> > > > > > > > >  /* Return true if the common supertype of NEW_TYPE and
> > > > > > > > *COMMON_TYPE
> > > > > > > > > -   is narrower than type, storing the supertype in
> > *COMMON_TYPE
> > > > if
> > > > > > so.
> > > > > > > > */
> > > > > > > > > +   is narrower than type, storing the supertype in
> > > > > > > > > + *COMMON_TYPE if
> > > > > > so.
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then accept that
> > > > > > *COMMON_TYPE
> > > > > > > > and NEW_TYPE
> > > > > > > > > +   may be of different signs but equal precision.   */
> > > > > > > > >
> > > > > > > > >  static bool
> > > > > > > > > -vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > > > *common_type)
> > > > > > > > > +vect_joust_widened_type (tree type, tree new_type, tree
> > > > > > > > *common_type,
> > > > > > > > > +			 bool allow_short_sign_mismatch =
> false)
> > > > > > > > >  {
> > > > > > > > >    if (types_compatible_p (*common_type, new_type))
> > > > > > > > >      return true;
> > > > > > > > >
> > > > > > > > > +  /* Check if the mismatch is only in the sign and if we have
> > > > > > > > > +     allow_short_sign_mismatch then allow it.  */
> > > > > > > > > +  if (allow_short_sign_mismatch
> > > > > > > > > +      && TYPE_SIGN (*common_type) != TYPE_SIGN
> (new_type))
> > > > > > > > > +    {
> > > > > > > > > +      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
> > > > > > > > > +      tree eq_type
> > > > > > > > > +	= build_nonstandard_integer_type (TYPE_PRECISION
> > > > (new_type),
> > > > > > > > > +					  sign);
> > > > > > > > > +
> > > > > > > > > +      if (types_compatible_p (*common_type, eq_type))
> > > > > > > > > +	return true;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >    /* See if *COMMON_TYPE can hold all values of NEW_TYPE.
> > */
> > > > > > > > >    if ((TYPE_PRECISION (new_type) < TYPE_PRECISION
> > > > (*common_type))
> > > > > > > > >        && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED
> > > > > > > > (*common_type)))
> > > > > > > > > @@ -532,6 +550,9 @@ vect_joust_widened_type (tree type,
> > tree
> > > > > > > > new_type, tree *common_type)
> > > > > > > > >     to a type that (a) is narrower than the result of
> > > > > > > > > STMT_INFO
> > and
> > > > > > > > >     (b) can hold all leaf operand values.
> > > > > > > > >
> > > > > > > > > +   If ALLOW_SHORT_SIGN_MISMATCH then allow that the
> > > > > > > > > + signs
> > of
> > > > > > > > > + the
> > > > > > > > operands
> > > > > > > > > +   may differ in signs but not in precision.
> > > > > > > > > +
> > > > > > > > >     Return 0 if STMT_INFO isn't such a tree, or if no
> > > > > > > > > such
> > > > COMMON_TYPE
> > > > > > > > >     exists.  */
> > > > > > > > >
> > > > > > > > > @@ -539,7 +560,8 @@ static unsigned int
> > vect_widened_op_tree
> > > > > > > > > (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >  		      tree_code widened_code, bool shift_p,
> > > > > > > > >  		      unsigned int max_nops,
> > > > > > > > > -		      vect_unpromoted_value *unprom, tree
> > > > *common_type)
> > > > > > > > > +		      vect_unpromoted_value *unprom, tree
> > > > *common_type,
> > > > > > > > > +		      bool allow_short_sign_mismatch = false)
> > > > > > > > >  {
> > > > > > > > >    /* Check for an integer operation with the right code.  */
> > > > > > > > >    gassign *assign = dyn_cast <gassign *>
> > > > > > > > > (stmt_info->stmt); @@
> > > > > > > > > -600,7
> > > > > > > > > +622,8 @@ vect_widened_op_tree (vec_info *vinfo,
> > > > stmt_vec_info
> > > > > > > > stmt_info, tree_code code,
> > > > > > > > >  		= vinfo->lookup_def (this_unprom->op);
> > > > > > > > >  	      nops = vect_widened_op_tree (vinfo,
> > > > > > > > > def_stmt_info,
> > > > code,
> > > > > > > > >  					   widened_code, shift_p,
> > > > max_nops,
> > > > > > > > > -					   this_unprom,
> > > > common_type);
> > > > > > > > > +					   this_unprom,
> > > > common_type,
> > > > > > > > > +
> > > > allow_short_sign_mismatch);
> > > > > > > > >  	      if (nops == 0)
> > > > > > > > >  		return 0;
> > > > > > > > >
> > > > > > > > > @@ -617,7 +640,8 @@ vect_widened_op_tree (vec_info
> > > > > > > > > *vinfo,
> > > > > > > > stmt_vec_info stmt_info, tree_code code,
> > > > > > > > >  	      if (i == 0)
> > > > > > > > >  		*common_type = this_unprom->type;
> > > > > > > > >  	      else if (!vect_joust_widened_type (type,
> > > > > > > > > this_unprom-
> > > > >type,
> > > > > > > > > -						 common_type))
> > > > > > > > > +
> common_type,
> > > > > > > > > +
> > > > allow_short_sign_mismatch))
> > > > > > > > >  		return 0;
> > > > > > > > >  	    }
> > > > > > > > >  	}
> > > > > > > > > @@ -888,21 +912,24 @@ vect_reassociating_reduction_p
> > (vec_info
> > > > > > > > > *vinfo,
> > > > > > > > >
> > > > > > > > >     Try to find the following pattern:
> > > > > > > > >
> > > > > > > > > -     type x_t, y_t;
> > > > > > > > > +     type1a x_t
> > > > > > > > > +     type1b y_t;
> > > > > > > > >       TYPE1 prod;
> > > > > > > > >       TYPE2 sum = init;
> > > > > > > > >     loop:
> > > > > > > > >       sum_0 = phi <init, sum_1>
> > > > > > > > >       S1  x_t = ...
> > > > > > > > >       S2  y_t = ...
> > > > > > > > > -     S3  x_T = (TYPE1) x_t;
> > > > > > > > > -     S4  y_T = (TYPE1) y_t;
> > > > > > > > > +     S3  x_T = (TYPE3) x_t;
> > > > > > > > > +     S4  y_T = (TYPE4) y_t;
> > > > > > > > >       S5  prod = x_T * y_T;
> > > > > > > > >       [S6  prod = (TYPE2) prod;  #optional]
> > > > > > > > >       S7  sum_1 = prod + sum_0;
> > > > > > > > >
> > > > > > > > > -   where 'TYPE1' is exactly double the size of type 'type', and
> > 'TYPE2'
> > > > is
> > > > > > the
> > > > > > > > > -   same size of 'TYPE1' or bigger. This is a special case of a
> > reduction
> > > > > > > > > +   where 'TYPE1' is exactly double the size of type
> > > > > > > > > + 'type1a' and
> > > > 'type1b',
> > > > > > > > > +   the sign of 'TYPE1' must be one of 'type1a' or
> > > > > > > > > + 'type1b' but the
> > > > sign of
> > > > > > > > > +   'type1a' and 'type1b' can differ. 'TYPE2' is the
> > > > > > > > > + same size of
> > 'TYPE1'
> > > > or
> > > > > > > > > +   bigger and must be the same sign. This is a special
> > > > > > > > > + case of a reduction
> > > > > > > > >     computation.
> > > > > > > > >
> > > > > > > > >     Input:
> > > > > > > > > @@ -939,15 +966,16 @@ vect_recog_dot_prod_pattern
> > (vec_info
> > > > > > > > > *vinfo,
> > > > > > > > >
> > > > > > > > >    /* Look for the following pattern
> > > > > > > > >            DX = (TYPE1) X;
> > > > > > > > > -          DY = (TYPE1) Y;
> > > > > > > > > +	  DY = (TYPE2) Y;
> > > > > > > > >            DPROD = DX * DY;
> > > > > > > > > -          DDPROD = (TYPE2) DPROD;
> > > > > > > > > +	  DDPROD = (TYPE3) DPROD;
> > > > > > > > >            sum_1 = DDPROD + sum_0;
> > > > > > > > >       In which
> > > > > > > > >       - DX is double the size of X
> > > > > > > > >       - DY is double the size of Y
> > > > > > > > >       - DX, DY, DPROD all have the same type but the sign
> > > > > > > > > -       between DX, DY and DPROD can differ.
> > > > > > > > > +       between DX, DY and DPROD can differ. The sign of DPROD
> > > > > > > > > +       is one of the signs of DX or DY.
> > > > > > > > >       - sum is the same size of DPROD or bigger
> > > > > > > > >       - sum has been recognized as a reduction variable.
> > > > > > > > >
> > > > > > > > > @@ -986,14 +1014,41 @@ vect_recog_dot_prod_pattern
> > (vec_info
> > > > > > *vinfo,
> > > > > > > > >       inside the loop (in case we are analyzing an outer-loop).  */
> > > > > > > > >    vect_unpromoted_value unprom0[2];
> > > > > > > > >    if (!vect_widened_op_tree (vinfo, mult_vinfo,
> > > > > > > > > MULT_EXPR,
> > > > > > > > WIDEN_MULT_EXPR,
> > > > > > > > > -			     false, 2, unprom0, &half_type))
> > > > > > > > > +			     false, 2, unprom0, &half_type,
> true))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > > +  /* Check to see if there is a sign change happening
> > > > > > > > > + in the operands of
> > > > > > > > the
> > > > > > > > > +     multiplication and pick the appropriate optab subtype.
> > > > > > > > > +*/
> > > > > > > > > +  enum optab_subtype subtype;
> > > > > > > > > +  tree rhs_type1 = unprom0[0].type;
> > > > > > > > > +  tree rhs_type2 = unprom0[1].type;
> > > > > > > > > +  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
> > > > > > > > > +     subtype = optab_default;
> > > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == SIGNED
> > > > > > > > > +	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
> > > > > > > > > +     subtype = optab_signed_to_unsigned;
> > > > > > > > > +  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
> > > > > > > > > +	   && TYPE_SIGN (rhs_type2) == SIGNED)
> > > > > > > > > +     subtype = optab_unsigned_to_signed;
> > > > > > > > > +  else
> > > > > > > > > +    gcc_unreachable ();
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot product we need to
> > > > > > > > > + check
> > that
> > > > the
> > > > > > > > > +     promoted type if unsigned has at least the same
> > > > > > > > > + precision as the
> > > > > > final
> > > > > > > > > +     type of the dot-product.  */
> > > > > > > > > +  if (subtype != optab_default)
> > > > > > > > > +    {
> > > > > > > > > +      tree mult_type = TREE_TYPE (unprom_mult.op);
> > > > > > > > > +      if (TYPE_SIGN (mult_type) == UNSIGNED
> > > > > > > > > +	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION
> (type))
> > > > > > > > > +	return NULL;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >    vect_pattern_detected ("vect_recog_dot_prod_pattern",
> > > > > > > > > last_stmt);
> > > > > > > > >
> > > > > > > > >    tree half_vectype;
> > > > > > > > >    if (!vect_supportable_direct_optab_p (vinfo, type,
> > > > > > > > > DOT_PROD_EXPR,
> > > > > > > > half_type,
> > > > > > > > > -					type_out, &half_vectype))
> > > > > > > > > +					type_out,
> &half_vectype,
> > > > subtype))
> > > > > > > > >      return NULL;
> > > > > > > > >
> > > > > > > > >    /* Get the inputs in the appropriate types.  */ @@
> > > > > > > > > -1002,8
> > > > > > > > > +1057,22 @@ vect_recog_dot_prod_pattern (vec_info
> > > > > > > > > +*vinfo,
> > > > > > > > >  		       unprom0, half_vectype);
> > > > > > > > >
> > > > > > > > >    var = vect_recog_temp_ssa_var (type, NULL);
> > > > > > > > > +
> > > > > > > > > +  /* If we have a sign changing dot-product the
> > > > > > > > > + dot-product itself does
> > > > > > any
> > > > > > > > > +     sign conversions, so consume the type and use the
> > > > > > > > > + unpromoted types.  */  tree mult_arg1, mult_arg2;  if
> > > > > > > > > + (subtype ==
> > > > > > > > > + optab_default)
> > > > > > > > > +    {
> > > > > > > > > +      mult_arg1 = mult_oprnd[0];
> > > > > > > > > +      mult_arg2 = mult_oprnd[1];
> > > > > > > > > +    }
> > > > > > > > > +  else
> > > > > > > > > +    {
> > > > > > > > > +      mult_arg1 = unprom0[0].op;
> > > > > > > > > +      mult_arg2 = unprom0[1].op;
> > > > > > > > > +    }
> > > > > > > > >    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
> > > > > > > > > -				      mult_oprnd[0], mult_oprnd[1],
> > > > oprnd1);
> > > > > > > > > +				      mult_arg1, mult_arg2,
> oprnd1);
> > > > > > > > >
> > > > > > > > >    return pattern_stmt;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> > GF:
> > > > > > > > > Felix Imendörffer; HRB 36809 (AG Nuernberg)
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> > > > > > Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF:
> > > > > > Felix Imend?rffer; HRB 36809 (AG
> > Nuernberg)
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de> SUSE Software Solutions
> Germany
> > > > GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix
> > > > Imend?rffer; HRB 36809 (AG Nuernberg)
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imend
Richard Sandiford June 7, 2021, 10:10 a.m. UTC | #10
Sorry for the slow response.

Tamar Christina <Tamar.Christina@arm.com> writes:
> […]
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
>  static bool
>  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>                                  tree itype, tree *vecotype_out,
> -                                tree *vecitype_out = NULL)
> +                                tree *vecitype_out = NULL,
> +                                enum optab_subtype subtype = optab_default)
>  {
>    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>    if (!vecitype)
> @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>    if (!vecotype)
>      return false;
>
> -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>    if (!optab)
>      return false;
>
> @@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, tree op,
>  }
>
>  /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
> -   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
> +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> +   If UNPROM_TYPE then accept that *COMMON_TYPE and NEW_TYPE may be of
> +   different signs but equal precision and that the resulting
> +   multiplication of them be compatible with UNPROM_TYPE.   */
>
>  static bool
> -vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> +vect_joust_widened_type (tree type, tree new_type, tree *common_type,
> +                        tree unprom_type = NULL)
>  {
>    if (types_compatible_p (*common_type, new_type))
>      return true;
> @@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>    unsigned int precision = MAX (TYPE_PRECISION (*common_type),
>                                 TYPE_PRECISION (new_type));
>    precision *= 2;
> -  if (precision * 2 > TYPE_PRECISION (type))
> +
> +  /* Check if the mismatch is only in the sign and if we have
> +     UNPROM_TYPE then allow it if there is enough precision to
> +     not lose any information during the conversion.  */
> +  if (unprom_type
> +      && TYPE_SIGN (unprom_type) == SIGNED
> +      && tree_nop_conversion_p (*common_type, new_type))
> +       return true;
> +
> +  /* The resulting application is unsigned, check if we have enough
> +     precision to perform the operation.  */
> +  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
>      return false;
>
>    *common_type = build_nonstandard_integer_type (precision, false);
> @@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>     to a type that (a) is narrower than the result of STMT_INFO and
>     (b) can hold all leaf operand values.
>
> +   If UNPROM_TYPE then allow that the signs of the operands
> +   may differ in signs but not in precision and that the resulting type
> +   of the operation on the operands is compatible with UNPROM_TYPE.
> +
>     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
>     exists.  */
>
> @@ -539,7 +559,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>                       tree_code widened_code, bool shift_p,
>                       unsigned int max_nops,
> -                     vect_unpromoted_value *unprom, tree *common_type)
> +                     vect_unpromoted_value *unprom, tree *common_type,
> +                     tree unprom_type = NULL)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>                 = vinfo->lookup_def (this_unprom->op);
>               nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>                                            widened_code, shift_p, max_nops,
> -                                          this_unprom, common_type);
> +                                          this_unprom, common_type,
> +                                          unprom_type);
>               if (nops == 0)
>                 return 0;
>
> @@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>               if (i == 0)
>                 *common_type = this_unprom->type;
>               else if (!vect_joust_widened_type (type, this_unprom->type,
> -                                                common_type))
> +                                                common_type, unprom_type))
>                 return 0;
>             }
>         }
> @@ -799,12 +821,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
>  }
>
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If ALLOW_SHORT_SIGN_MISMATCH then don't convert the types if they only
> +   differ by sign.  */
>
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>                      tree *result, tree type, vect_unpromoted_value *unprom,
> -                    tree vectype)
> +                    tree vectype, bool allow_short_sign_mismatch = false)
>  {
>    for (unsigned int i = 0; i < n; ++i)
>      {
> @@ -812,8 +837,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>         if (unprom[j].op == unprom[i].op)
>           break;
> +
>        if (j < i)
>         result[i] = result[j];
> +      else if (allow_short_sign_mismatch
> +              && tree_nop_conversion_p (type, unprom[i].type))
> +       result[i] = unprom[i].op;
>        else
>         result[i] = vect_convert_input (vinfo, stmt_info,
>                                         type, &unprom[i], vectype);
> @@ -888,21 +917,24 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>
>     Try to find the following pattern:
>
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
>       sum_0 = phi <init, sum_1>
>       S1  x_t = ...
>       S2  y_t = ...
> -     S3  x_T = (TYPE1) x_t;
> -     S4  y_T = (TYPE1) y_t;
> +     S3  x_T = (TYPE3) x_t;
> +     S4  y_T = (TYPE4) y_t;
>       S5  prod = x_T * y_T;
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
>
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction
>     computation.

What are TYPE3 and TYPE4 in the above?  AFAICT the x_T and y_T casts
should still be to TYPE1, since the types of x_T and y_T need to agree.

The sign of TYPE2 shouldn't matter, since TYPE2 is only used for
the addition.
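
To illustrate why the sign of the accumulator type is irrelevant, here is a
small standalone C sketch (not GCC code; the function names are made up for
illustration, and it assumes the usual 32-bit two's complement int).  Adding
the product as a signed value and adding it as an unsigned value give the
same low 32 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Add PROD to ACC as a signed value.  The sum is formed exactly in
   64 bits and then truncated to its low 32 bits.  */
static uint32_t
accumulate_signed (int32_t acc, int32_t prod)
{
  int64_t exact = (int64_t) acc + (int64_t) prod;
  return (uint32_t) exact;
}

/* Add PROD to ACC as an unsigned value, wrapping modulo 2^32.  */
static uint32_t
accumulate_unsigned (int32_t acc, int32_t prod)
{
  return (uint32_t) acc + (uint32_t) prod;
}

/* Hypothetical helper: both interpretations must agree bit for bit.  */
static void
check_accumulate (int32_t acc, int32_t prod)
{
  assert (accumulate_signed (acc, prod) == accumulate_unsigned (acc, prod));
}
```

Calling check_accumulate with any pair of values, including ones where the
signed sum would not fit in 32 bits, should pass.  Only the multiplication is
sensitive to the signs of its inputs.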

> @@ -939,15 +971,16 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +         DY = (TYPE2) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +         DDPROD = (TYPE3) DPROD;
>            sum_1 = DDPROD + sum_0;
>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between DX, DY and DPROD can differ. The sign of DPROD
> +       is one of the signs of DX or DY.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.

These changes don't look right: DY has to be the same type as DX.
(What's different with usdot is that X and Y can be different signs.)

> @@ -986,20 +1019,29 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> -                            false, 2, unprom0, &half_type))
> +                            false, 2, unprom0, &half_type,
> +                            TREE_TYPE (unprom_mult.op)))
>      return NULL;
>
> +  /* Check to see if there is a sign change happening in the operands of the
> +     multiplication and pick the appropriate optab subtype.  */
> +  enum optab_subtype subtype;
> +  if (TYPE_SIGN (unprom0[0].type) == TYPE_SIGN (unprom0[1].type))
> +    subtype = optab_default;
> +  else
> +    subtype = optab_vector_mixed_sign;
> +

Doesn't this check the signs of the uncast operands?  What really matters
is the signs of the operands after the (possible) casts to half_type.

E.g.:

   signed short x;
   unsigned char y;
   int z;

   z = (int) x * (int) y + z;

is an sdot operation with half_type signed short, rather than a usdot
operation.
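
A standalone C sketch of that case (illustrative only; the function names are
hypothetical, and it assumes 8-bit char, 16-bit short and 32-bit int).  Every
unsigned char value fits in a signed short, so widening y to the half type
first loses nothing and leaves two signed multiplicands:

```c
#include <assert.h>

/* The expression as written in the example above.  */
static int
mac_as_written (signed short x, unsigned char y, int z)
{
  return (int) x * (int) y + z;
}

/* The same computation with y first converted to the half type.
   Both multiplicands are now signed short, which is the sdot form.  */
static int
mac_via_half_type (signed short x, unsigned char y, int z)
{
  signed short y_half = (signed short) y;  /* 0..255 always fits.  */
  return (int) x * (int) y_half + z;
}

/* Hypothetical helper: the two forms must agree.  */
static void
check_sdot_case (signed short x, unsigned char y, int z)
{
  assert (mac_as_written (x, y, z) == mac_via_half_type (x, y, z));
}
```

Since the two functions agree for all inputs, the mixed signs of the original
x and y say nothing about which dot-product variant is needed.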

How about instead passing an optab_subtype* to vect_widened_op_tree, in
place of the unprom_mult.op type?  When this optab_subtype* is nonnull,
the joust operation is allowed to fail as long as:

  tree_nop_conversion_p (this_unprom->type, common_type)

is true.  vect_widened_op_tree would set the optab_subtype to
optab_vector_mixed_sign to indicate this case.

We should make sure that we handle:

   unsigned short x;
   signed char y;
   int z;

   z = (int) x * (int) y + z;

correctly though: this should be a usdot operation in which y
is cast to signed short.  I'm not sure whether the patch would
insert the needed cast.
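
As a standalone C sketch of why the cast matters (illustrative only; the
function names are hypothetical, and it assumes 8-bit char, 16-bit short and
32-bit int): sign-extending y to signed short preserves the result, whereas
treating y as unsigned and zero-extending it does not.

```c
#include <assert.h>

/* The expression as written in the example above.  */
static int
mac_as_written (unsigned short x, signed char y, int z)
{
  return (int) x * (int) y + z;
}

/* y is sign-extended to signed short first.  The multiplicands are then
   unsigned short and signed short, which is the usdot form.  */
static int
mac_with_cast (unsigned short x, signed char y, int z)
{
  signed short y_half = (signed short) y;
  return (int) x * (int) y_half + z;
}

/* What happens if y is instead reinterpreted as unsigned and
   zero-extended: negative values of y become large positive ones.  */
static int
mac_wrong_zero_extend (unsigned short x, signed char y, int z)
{
  unsigned short y_half = (unsigned char) y;
  return (int) x * (int) y_half + z;
}
```

For x = 2, y = -1, z = 0 the first two functions return -2 while the last one
returns 510, so a missing or wrong extension of y would silently change the
result.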

Thanks,
Richard

>    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
>
>    tree half_vectype;
>    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> -                                       type_out, &half_vectype))
> +                                       type_out, &half_vectype, subtype))
>      return NULL;
>
>    /* Get the inputs in the appropriate types.  */
>    tree mult_oprnd[2];
>    vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
> -                      unprom0, half_vectype);
> +                      unprom0, half_vectype, true);
>
>    var = vect_recog_temp_ssa_var (type, NULL);
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
Tamar Christina June 14, 2021, 12:06 p.m. UTC | #11
Hi Richard,

I've attached a new version of the patch with the changes.
I have also added 7 new tests in the testsuite to check the cases you mentioned.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_widened_op_tree): Optionally ignore mismatched types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

--- inline copy of patch ---

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844ccf8ea289d581839766502d51b9e8d7..1356afb7f903f17c198103562b5cd145ecb9966f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expression performs the multiplication with the following signs:
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expression performs the multiplication with the following signs:
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = zero-ext (a) * ((unsigned-conv) sign-ext (b)) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe912380e177f8f3c54b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 02256580c986be426564adc1105ed2e1c69b0efc..f250f0fe99bec5278a0963e92bc1d2a61d9eee70 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4412,7 +4412,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ee79808472cea88786e5c04756980b456c3f5a02..d2accf3c35ade25e8d2ff4ee88136651e3e87c74 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index c6b6feadb8d8d5cc57ded192cd68dd54b9185aef..77605e55dec7b4f6b0a1e1fdafa6313b987fa12c 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 }
 
 /* Return true if the target supports a vector version of CODE,
-   where CODE is known to map to a direct optab.  ITYPE specifies
-   the type of (some of) the scalar inputs and OTYPE specifies the
-   type of the scalar result.
+   where CODE is known to map to a direct optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
 
    If CODE allows the inputs and outputs to have different type
    (such as for WIDEN_SUM_EXPR), it is the input mode rather
@@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
+
+  /* The resulting application is unsigned, check if we have enough
+     precision to perform the operation.  */
   if (precision * 2 > TYPE_PRECISION (type))
     return false;
 
@@ -539,6 +543,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If SUBTYPE is nonnull, allow the operands to differ in sign
+   but not in precision, and set *SUBTYPE to
+   optab_vector_mixed_sign when they do.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -546,7 +554,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   subtype);
 	      if (nops == 0)
 		return 0;
 
@@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
 						 common_type))
-		return 0;
+		{
+		  if (subtype)
+		    {
+		      tree new_type = *common_type;
+		      /* See if we can sign extend the smaller type.  */
+		      if (TYPE_PRECISION (this_unprom->type) > TYPE_PRECISION (new_type)
+			  && (TYPE_UNSIGNED (this_unprom->type) && !TYPE_UNSIGNED (new_type)))
+			new_type = build_nonstandard_integer_type (TYPE_PRECISION (this_unprom->type), true);
+
+		      if (tree_nop_conversion_p (this_unprom->type, new_type))
+			{
+			  *subtype = optab_vector_mixed_sign;
+			  *common_type = new_type;
+			}
+		    }
+		  else
+		    return 0;
+		}
 	    }
 	}
       next_op += nops;
@@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If SUBTYPE is optab_vector_mixed_sign, don't convert inputs
+   whose types differ from TYPE only by sign.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, enum optab_subtype subtype = optab_default)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
+      else if (subtype == optab_vector_mixed_sign
+	       && tree_nop_conversion_p (type, unprom[i].type))
+	result[i] = unprom[i].op;
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
 					type, &unprom[i], vectype);
@@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t;
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
@@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
+   bigger and must be the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE1) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE2) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between X, Y and DPROD can differ.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
+  enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, &subtype))
+    return NULL;
+
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_UNSIGNED (unprom_mult.type)
+      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
     return NULL;
 
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, subtype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
Tamar Christina June 21, 2021, 8:11 a.m. UTC | #12
Ping

> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Monday, June 14, 2021 1:06 PM
> To: Richard Sandiford <Richard.Sandiford@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Biener
> <rguenther@suse.de>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Hi Richard,
> 
> I've attached a new version of the patch with the changes.
> I have also added 7 new tests in the testsuite to check the cases you
> mentioned.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* optabs.def (usdot_prod_optab): New.
> 	* doc/md.texi: Document it and clarify other dot prod optabs.
> 	* optabs-tree.h (enum optab_subtype): Add
> optab_vector_mixed_sign.
> 	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
> 	* optabs.c (expand_widen_pattern_expr): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
> 	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
> 	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> optional
> 	optab subtype.
> 	(vect_widened_op_tree): Optionally ignore
> 	mismatch types.
> 	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> 00caf3844ccf8ea289d581839766502d51b9e8d7..1356afb7f903f17c198103562b
> 5cd145ecb9966f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes
> an additional mask operand
> 
>  @cindex @code{sdot_prod@var{m}} instruction pattern  @item
> @samp{sdot_prod@var{m}}
> +
> +Compute the sum of the products of two signed elements.
> +Operand 1 and operand 2 are of the same mode. Their product, which is
> +of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the
> +following signs
> +
> +@smallexample
> +sdot<signed c, signed a, signed b> ==
> +   res = sign-ext (a) * sign-ext (b) + c @dots{} @end smallexample
> +
>  @cindex @code{udot_prod@var{m}} instruction pattern -@itemx
> @samp{udot_prod@var{m}} -Compute the sum of the products of two
> signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode
> equal or -wider than the mode of the product. The result is placed in operand
> 0, which -is of the same mode as operand 3.
> +@item @samp{udot_prod@var{m}}
> +
> +Compute the sum of the products of two unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their product, which is
> +of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the
> +following signs
> +
> +@smallexample
> +udot<unsigned c, unsigned a, unsigned b> ==
> +   res = zero-ext (a) * zero-ext (b) + c @dots{} @end smallexample
> +
> +
> +
> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@item @samp{usdot_prod@var{m}}
> +Compute the sum of the products of elements of different signs.
> +Operand 1 must be unsigned and operand 2 signed. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following
> signs
> +
> +@smallexample
> +usdot<unsigned c, unsigned a, signed b> ==
> +   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
> +@dots{}
> +@end smallexample
> 
>  @cindex @code{ssad@var{m}} instruction pattern
>  @item @samp{ssad@var{m}}
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index
> c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b31
> 4830e6b564b37abb 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -29,7 +29,8 @@ enum optab_subtype
>  {
>    optab_default,
>    optab_scalar,
> -  optab_vector
> +  optab_vector,
> +  optab_vector_mixed_sign
>  };
> 
>  /* Return the optab used for computing the given operation on the type
> given by
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index
> 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994
> bc5311e9c010bb 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code,
> const_tree type,
>        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> ssum_widen_optab;
> 
>      case DOT_PROD_EXPR:
> -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> +      {
> +	if (subtype == optab_vector_mixed_sign)
> +	  return usdot_prod_optab;
> +
> +	return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> sdot_prod_optab);
> +      }
> 
>      case SAD_EXPR:
>        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index
> 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe9123
> 80e177f8f3c54b 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>    bool sbool = false;
> 
>    oprnd0 = ops->op0;
> +  if (nops >= 2)
> +    oprnd1 = ops->op1;
> +  if (nops >= 3)
> +    oprnd2 = ops->op2;
> +
>    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
>    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
>        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> @@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>  	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
>        sbool = true;
>      }
> +  else if (ops->code == DOT_PROD_EXPR)
> +    {
> +      enum optab_subtype subtype = optab_default;
> +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> +      if (sign1 == sign2)
> +	;
> +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> +	{
> +	  subtype = optab_vector_mixed_sign;
> +	  /* Same as optab_vector_mixed_sign but flip the operands.  */
> +	  std::swap (op0, op1);
> +	}
> +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> +	subtype = optab_vector_mixed_sign;
> +      else
> +	gcc_unreachable ();
> +
> +      widen_pattern_optab
> +	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +    }
>    else
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> @@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>    gcc_assert (icode != CODE_FOR_nothing);
> 
>    if (nops >= 2)
> -    {
> -      oprnd1 = ops->op1;
> -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> -    }
> +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
>    else if (sbool)
>      {
>        nops = 2;
> @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>      {
>        gcc_assert (tmode1 == tmode0);
>        gcc_assert (op1);
> -      oprnd2 = ops->op2;
>        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
>      }
> 
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> b7c18615baae928 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
>  OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index
> 02256580c986be426564adc1105ed2e1c69b0efc..f250f0fe99bec5278a0963e92b
> c1d2a61d9eee70 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4412,7 +4412,8 @@ verify_gimple_assign_ternary (gassign *stmt)
>  		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
>  		 || (!INTEGRAL_TYPE_P (lhs_type)
>  		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> -	    || !types_compatible_p (rhs1_type, rhs2_type)
> +	    /* rhs1_type and rhs2_type may differ in sign.  */
> +	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
>  	    || !useless_type_conversion_p (lhs_type, rhs3_type)
>  	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
>  			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index
> ee79808472cea88786e5c04756980b456c3f5a02..d2accf3c35ade25e8d2ff4ee88
> 136651e3e87c74 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>    bool lane_reduc_code_p
>      = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code ==
> SAD_EXPR);
>    int op_type = TREE_CODE_LENGTH (code);
> +  enum optab_subtype optab_query_kind = optab_vector;
> +  if (code == DOT_PROD_EXPR
> +      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
> +	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
> +    optab_query_kind = optab_vector_mixed_sign;
> +
> 
>    scalar_dest = gimple_assign_lhs (stmt);
>    scalar_type = TREE_TYPE (scalar_dest);
> @@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        bool ok = true;
> 
>        /* 4.1. check support for the operation in the loop  */
> -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> +      optab optab = optab_for_tree_code (code, vectype_in,
> optab_query_kind);
>        if (!optab)
>  	{
>  	  if (dump_enabled_p ())
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index
> c6b6feadb8d8d5cc57ded192cd68dd54b9185aef..77605e55dec7b4f6b0a1e1fd
> afa6313b987fa12c 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> var)
>  }
> 
>  /* Return true if the target supports a vector version of CODE,
> -   where CODE is known to map to a direct optab.  ITYPE specifies
> -   the type of (some of) the scalar inputs and OTYPE specifies the
> -   type of the scalar result.
> +   where CODE is known to map to a direct optab with the given SUBTYPE.
> +   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
> +   specifies the type of the scalar result.
> 
>     If CODE allows the inputs and outputs to have different type
>     (such as for WIDEN_SUM_EXPR), it is the input mode rather
> @@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> var)
>  static bool
>  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code
> code,
>  				 tree itype, tree *vecotype_out,
> -				 tree *vecitype_out = NULL)
> +				 tree *vecitype_out = NULL,
> +				 enum optab_subtype subtype =
> optab_default)
>  {
>    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>    if (!vecitype)
> @@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree otype, tree_code code,
>    if (!vecotype)
>      return false;
> 
> -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>    if (!optab)
>      return false;
> 
> @@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type,
> tree *common_type)
>    unsigned int precision = MAX (TYPE_PRECISION (*common_type),
>  				TYPE_PRECISION (new_type));
>    precision *= 2;
> +
> +  /* The resulting application is unsigned, check if we have enough
> +     precision to perform the operation.  */
>    if (precision * 2 > TYPE_PRECISION (type))
>      return false;
> 
> @@ -539,6 +543,10 @@ vect_joust_widened_type (tree type, tree
> new_type, tree *common_type)
>     to a type that (a) is narrower than the result of STMT_INFO and
>     (b) can hold all leaf operand values.
> 
> +   If SUBTYPE then allow that the signs of the operands
> +   may differ in signs but not in precision.  SUBTYPE is updated to reflect
> +   this.
> +
>     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
>     exists.  */
> 
> @@ -546,7 +554,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info,
> tree_code code,
>  		      tree_code widened_code, bool shift_p,
>  		      unsigned int max_nops,
> -		      vect_unpromoted_value *unprom, tree *common_type)
> +		      vect_unpromoted_value *unprom, tree *common_type,
> +		      enum optab_subtype *subtype = NULL)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
>  		= vinfo->lookup_def (this_unprom->op);
>  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>  					   widened_code, shift_p, max_nops,
> -					   this_unprom, common_type);
> +					   this_unprom, common_type,
> +					   subtype);
>  	      if (nops == 0)
>  		return 0;
> 
> @@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
>  		*common_type = this_unprom->type;
>  	      else if (!vect_joust_widened_type (type, this_unprom->type,
>  						 common_type))
> -		return 0;
> +		{
> +		  if (subtype)
> +		    {
> +		      tree new_type = *common_type;
> +		      /* See if we can sign extend the smaller type.  */
> +		      if (TYPE_PRECISION (this_unprom->type) >
> TYPE_PRECISION (new_type)
> +			  && (TYPE_UNSIGNED (this_unprom->type)
> && !TYPE_UNSIGNED (new_type)))
> +			new_type = build_nonstandard_integer_type
> (TYPE_PRECISION (this_unprom->type), true);
> +
> +		      if (tree_nop_conversion_p (this_unprom->type,
> new_type))
> +			{
> +			  *subtype = optab_vector_mixed_sign;
> +			  *common_type = new_type;
> +			}
> +		    }
> +		  else
> +		    return 0;
> +		}
>  	    }
>  	}
>        next_op += nops;
> @@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo,
> stmt_vec_info stmt_info, tree type,
>  }
> 
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If SUBTYPE then don't convert the types if they only
> +   differ by sign.  */
> 
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned
> int n,
>  		     tree *result, tree type, vect_unpromoted_value *unprom,
> -		     tree vectype)
> +		     tree vectype, enum optab_subtype subtype =
> optab_default)
>  {
>    for (unsigned int i = 0; i < n; ++i)
>      {
> @@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo,
> stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>  	if (unprom[j].op == unprom[i].op)
>  	  break;
> +
>        if (j < i)
>  	result[i] = result[j];
> +      else if (subtype == optab_vector_mixed_sign
> +	       && tree_nop_conversion_p (type, unprom[i].type))
> +	result[i] = unprom[i].op;
>        else
>  	result[i] = vect_convert_input (vinfo, stmt_info,
>  					type, &unprom[i], vectype);
> @@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> 
>     Try to find the following pattern:
> 
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
> @@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
> 
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction
>     computation.
> 
>     Input:
> @@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> 
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +	  DY = (TYPE1) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +	  DDPROD = (TYPE2) DPROD;
>            sum_1 = DDPROD + sum_0;
>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between X, Y and DPROD can differ.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.
> 
> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
> +  enum optab_subtype subtype = optab_vector;
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> -			     false, 2, unprom0, &half_type))
> +			     false, 2, unprom0, &half_type, &subtype))
> +    return NULL;
> +
> +  if (subtype == optab_vector_mixed_sign
> +      && TYPE_UNSIGNED (unprom_mult.type)
> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
>      return NULL;
> 
>    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> 
>    tree half_vectype;
>    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
> -					type_out, &half_vectype))
> +					type_out, &half_vectype, subtype))
>      return NULL;
> 
>    /* Get the inputs in the appropriate types.  */
>    tree mult_oprnd[2];
>    vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
> -		       unprom0, half_vectype);
> +		       unprom0, half_vectype, subtype);
> 
>    var = vect_recog_temp_ssa_var (type, NULL);
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
Richard Sandiford June 22, 2021, 10:56 a.m. UTC | #13
Sorry for the slow review.

Just concentrating on tree-vect-patterns.c, as before:

Tamar Christina <Tamar.Christina@arm.com> writes:
> @@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
>    unsigned int precision = MAX (TYPE_PRECISION (*common_type),
>  				TYPE_PRECISION (new_type));
>    precision *= 2;
> +
> +  /* The resulting application is unsigned, check if we have enough
> +     precision to perform the operation.  */
>    if (precision * 2 > TYPE_PRECISION (type))
>      return false;
>  

Not sure what the comment means by “application” here, but the common
type we pick is signed rather than unsigned.

> @@ -546,7 +554,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		      tree_code widened_code, bool shift_p,
>  		      unsigned int max_nops,
> -		      vect_unpromoted_value *unprom, tree *common_type)
> +		      vect_unpromoted_value *unprom, tree *common_type,
> +		      enum optab_subtype *subtype = NULL)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		= vinfo->lookup_def (this_unprom->op);
>  	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>  					   widened_code, shift_p, max_nops,
> -					   this_unprom, common_type);
> +					   this_unprom, common_type,
> +					   subtype);
>  	      if (nops == 0)
>  		return 0;
>  
> @@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>  		*common_type = this_unprom->type;
>  	      else if (!vect_joust_widened_type (type, this_unprom->type,
>  						 common_type))
> -		return 0;
> +		{
> +		  if (subtype)
> +		    {

AIUI, if we get here then:

- there must be one unsigned operand (A) of precision P
- there must be one signed operand (B) with precision <= P
- we can't extend to precision 2*P 

A conversion is needed if B's precision is < P.
That conversion should be to a signed type with precision P.

So…

> +		      tree new_type = *common_type;
> +		      /* See if we can sign extend the smaller type.  */
> +		      if (TYPE_PRECISION (this_unprom->type) > TYPE_PRECISION (new_type)
> +			  && (TYPE_UNSIGNED (this_unprom->type) && !TYPE_UNSIGNED (new_type)))

…I think this second line could be an assert and

> +			new_type = build_nonstandard_integer_type (TYPE_PRECISION (this_unprom->type), true);

…picking an unsigned type here looks wrong.  The net effect would
be to convert B (the previous signed operand) to an unsigned type.

> +
> +		      if (tree_nop_conversion_p (this_unprom->type, new_type))
> +			{
> +			  *subtype = optab_vector_mixed_sign;
> +			  *common_type = new_type;
> +			}

IMO the sign of the common type shouldn't matter for optab_vector_mixed_sign:
if we need to convert operands later, it should be to the precision of
the common type but retaining the sign of the original type.
So I think it would be simpler to do:

		      if (TYPE_PRECISION (this_unprom->type)
			  > TYPE_PRECISION (*common_type))
			*common_type = this_unprom->type;
		      *subtype = optab_vector_mixed_sign;

here and adjust the conversion code as described below.

This also has the advantage of coping with > 2 operands, in case that
ever becomes important in future.
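
To illustrate the rule on its own (a standalone C sketch, not GCC code;
struct int_kind and join_mixed_sign are made-up names): the common type
only has to carry the widest operand precision, and the sign mismatch is
recorded separately, so the same step can simply be repeated for each
further operand.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an integer type: precision in bits plus sign.  */
struct int_kind
{
  unsigned int precision;
  bool is_unsigned;
};

/* Fold NEW_KIND into *COMMON in the way sketched above: keep the wider
   of the two precisions and note that the operand signs are mixed.  The
   sign stored in *COMMON is not meaningful afterwards.  */
void
join_mixed_sign (struct int_kind *common, struct int_kind new_kind,
		 bool *mixed_sign)
{
  if (new_kind.precision > common->precision)
    *common = new_kind;
  *mixed_sign = true;
}
```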

> @@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
>  }
>  
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If SUBTYPE then don't convert the types if they only
> +   differ by sign.  */
>  
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>  		     tree *result, tree type, vect_unpromoted_value *unprom,
> -		     tree vectype)
> +		     tree vectype, enum optab_subtype subtype = optab_default)
>  {
>    for (unsigned int i = 0; i < n; ++i)
>      {
> @@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>  	if (unprom[j].op == unprom[i].op)
>  	  break;
> +
>        if (j < i)
>  	result[i] = result[j];
> +      else if (subtype == optab_vector_mixed_sign
> +	       && tree_nop_conversion_p (type, unprom[i].type))
> +	result[i] = unprom[i].op;
>        else
>  	result[i] = vect_convert_input (vinfo, stmt_info,
>  					type, &unprom[i], vectype);

As noted above, I think we want to preserve the sign of the original
type for optab_vector_mixed_sign, even if a conversion is needed.
I think we should avoid the special case above and instead push
subtype down into vect_convert_input.  We can then adjust the
type at the head of that function:

  if (subtype == optab_vector_mixed_sign
      && TYPE_SIGN (type) != TYPE_SIGN (TREE_TYPE (unprom->op)))
    type = build_nonstandard_integer_type (TYPE_PRECISION (type),
					   TYPE_SIGN (unprom->type));
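
In other words, the conversion target takes its precision from the common
type and its sign from the operand.  A standalone C sketch of that
selection (not GCC code; struct int_kind and conversion_target are
made-up names used only for illustration):

```c
#include <stdbool.h>

/* Hypothetical stand-in for an integer type: precision in bits plus sign.  */
struct int_kind
{
  unsigned int precision;
  bool is_unsigned;
};

/* Return the type an operand of type ORIG should be converted to.  The
   precision always comes from COMMON.  For a mixed-sign operation the
   sign comes from ORIG; otherwise COMMON is used unchanged.  */
struct int_kind
conversion_target (struct int_kind common, struct int_kind orig,
		   bool mixed_sign)
{
  struct int_kind target = common;
  if (mixed_sign && common.is_unsigned != orig.is_unsigned)
    target.is_unsigned = orig.is_unsigned;
  return target;
}
```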

> @@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>  
>     Try to find the following pattern:
>  
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
> @@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
>  
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction

This last bit isn't true: TYPE2 is the type of the addition and can be
any sign.
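
The reason is that addition produces the same bits whichever way the
accumulator is read.  A small standalone C check of that (made-up
function names, not GCC code; it assumes the inputs are small enough
that the signed sum doesn't overflow):

```c
#include <stdint.h>

/* Accumulate N signed products into an unsigned 32-bit sum.  */
uint32_t
dot_unsigned_acc (uint32_t sum, const int8_t *a, const int8_t *b, int n)
{
  for (int i = 0; i < n; ++i)
    sum += (uint32_t) ((int32_t) a[i] * b[i]);
  return sum;
}

/* The same reduction with a signed 32-bit sum.  */
int32_t
dot_signed_acc (int32_t sum, const int8_t *a, const int8_t *b, int n)
{
  for (int i = 0; i < n; ++i)
    sum += (int32_t) a[i] * b[i];
  return sum;
}
```

Converting the signed result to uint32_t gives the value returned by the
unsigned version, so only the sign of TYPE1 affects the result.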

>     computation.
>  
>     Input:
> @@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>  
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +	  DY = (TYPE1) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +	  DDPROD = (TYPE2) DPROD;
>            sum_1 = DDPROD + sum_0;

Spurious whitespace changes: would be better to tabify the whole thing
or leave it as-is.

>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between X, Y and DPROD can differ.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.
>  
> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
> +  enum optab_subtype subtype = optab_vector;
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> -			     false, 2, unprom0, &half_type))
> +			     false, 2, unprom0, &half_type, &subtype))
> +    return NULL;
> +
> +  if (subtype == optab_vector_mixed_sign
> +      && TYPE_UNSIGNED (unprom_mult.type)
> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
>      return NULL;

Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
I.e. we need to reject the case in which we multiply a signed and an
unsigned value to get a (logically) signed result, but then zero-extend
it (rather than sign-extend it) to the precision of the addition.

That would make the test:

  if (subtype == optab_vector_mixed_sign
      && TYPE_UNSIGNED (unprom_mult.type)
      && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
    return NULL;    
  
instead.
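
A concrete case of the problem, as a standalone C sketch (made-up
function names, not GCC code; 8-bit inputs, a 16-bit TYPE1 and a 32-bit
TYPE2 are assumed for illustration):

```c
#include <stdint.h>

/* Multiply an unsigned and a signed 8-bit value, hold the product in a
   signed 16-bit TYPE1 and sign-extend it to 32 bits.  */
int32_t
widen_signed_product (uint8_t a, int8_t b)
{
  int16_t prod = (int16_t) (a * b);
  return (int32_t) prod;
}

/* The same multiplication, but the product is held in an unsigned
   16-bit TYPE1 and is therefore zero-extended to 32 bits.  */
int32_t
widen_unsigned_product (uint8_t a, int8_t b)
{
  uint16_t prod = (uint16_t) (a * b);
  return (int32_t) prod;
}
```

With a = 200 and b = -3 the first function gives -600 and the second
gives 64936, so the value added to the sum differs and a mixed-sign dot
product can't be used for the second form.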

Thanks,
Richard
Richard Sandiford June 22, 2021, 11:16 a.m. UTC | #14
Richard Sandiford <richard.sandiford@arm.com> writes:
>> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>>       inside the loop (in case we are analyzing an outer-loop).  */
>>    vect_unpromoted_value unprom0[2];
>> +  enum optab_subtype subtype = optab_vector;
>>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
>> -			     false, 2, unprom0, &half_type))
>> +			     false, 2, unprom0, &half_type, &subtype))
>> +    return NULL;
>> +
>> +  if (subtype == optab_vector_mixed_sign
>> +      && TYPE_UNSIGNED (unprom_mult.type)
>> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
>>      return NULL;
>
> Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> I.e. we need to reject the case in which we multiply a signed and an
> unsigned value to get a (logically) signed result, but then zero-extend
> it (rather than sign-extend it) to the precision of the addition.
>
> That would make the test:
>
>   if (subtype == optab_vector_mixed_sign
>       && TYPE_UNSIGNED (unprom_mult.type)
>       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>     return NULL;    
>   
> instead.

And folding that into the existing test gives:

  /* If there are two widening operations, make sure they agree on the sign
     of the extension.  The result of an optab_vector_mixed_sign operation
     is signed; otherwise, the result has the same sign as the operands.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && (subtype == optab_vector_mixed_sign
	  ? TYPE_UNSIGNED (unprom_mult.type)
	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
    return NULL;
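
Written as a standalone predicate, the folded test reads as follows (a C
sketch with made-up names, not GCC code; struct int_kind stands in for
the tree types, with MULT for unprom_mult.type, SUM for type and HALF
for half_type):

```c
#include <stdbool.h>

/* Hypothetical stand-in for an integer type: precision in bits plus sign.  */
struct int_kind
{
  unsigned int precision;
  bool is_unsigned;
};

/* Return true if the dot-product pattern must be rejected.  Nothing is
   extended when MULT and SUM have the same precision.  Otherwise a
   mixed-sign product is signed, so MULT must not be unsigned; in the
   normal case MULT must have the same sign as the inputs.  */
bool
reject_dot_prod_p (struct int_kind mult, struct int_kind sum,
		   struct int_kind half, bool mixed_sign)
{
  if (mult.precision == sum.precision)
    return false;
  return (mixed_sign
	  ? mult.is_unsigned
	  : mult.is_unsigned != half.is_unsigned);
}
```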

Thanks,
Richard
Tamar Christina July 12, 2021, 9:18 a.m. UTC | #15
Hi,

> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
> >>       inside the loop (in case we are analyzing an outer-loop).  */
> >>    vect_unpromoted_value unprom0[2];
> >> +  enum optab_subtype subtype = optab_vector;
> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> >> -			     false, 2, unprom0, &half_type))
> >> +			     false, 2, unprom0, &half_type, &subtype))
> >> +    return NULL;
> >> +
> >> +  if (subtype == optab_vector_mixed_sign
> >> +      && TYPE_UNSIGNED (unprom_mult.type)
> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
> >>      return NULL;
> >
> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> > I.e. we need to reject the case in which we multiply a signed and an
> > unsigned value to get a (logically) signed result, but then
> > zero-extend it (rather than sign-extend it) to the precision of the addition.
> >
> > That would make the test:
> >
> >   if (subtype == optab_vector_mixed_sign
> >       && TYPE_UNSIGNED (unprom_mult.type)
> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> >     return NULL;
> >
> > instead.
> 
> And folding that into the existing test gives:
> 
>   /* If there are two widening operations, make sure they agree on the sign
>      of the extension.  The result of an optab_vector_mixed_sign operation
>      is signed; otherwise, the result has the same sign as the operands.  */
>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>       && (subtype == optab_vector_mixed_sign
> 	  ? TYPE_UNSIGNED (unprom_mult.type)
> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>     return NULL;
> 

I went with the first one, which doesn't add the extra constraints for the
normal dot-product, as that would make it too restrictive. It's the type of
the multiplication that determines the operation, so dot-product can be used
in more places than it currently is.

This was relaxed in an earlier patch.

Updated patch attached.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* optabs.def (usdot_prod_optab): New.
	* doc/md.texi: Document it and clarify other dot prod optabs.
	* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
	* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
	* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
	* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
	optab subtype.
	(vect_widened_op_tree): Optionally ignore
	mismatch types.
	(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

---- Inline copy of patch ----

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 1b91814433057b1b377283fd1f40cb970dc3d243..323ba8eab78e2b2e582fa0633752930182e83ee5 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot<signed c, signed a, signed b> ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot<unsigned c, unsigned a, unsigned b> ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different signs.
+Operand 1 must be unsigned and operand 2 signed. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+usdot<unsigned c, unsigned a, signed b> ==
+   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
+@dots{}
+@end smallexample
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -29,7 +29,8 @@ enum optab_subtype
 {
   optab_default,
   optab_scalar,
-  optab_vector
+  optab_vector,
+  optab_vector_mixed_sign
 };
 
 /* Return the optab used for computing the given operation on the type given by
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	if (subtype == optab_vector_mixed_sign)
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe912380e177f8f3c54b 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype = optab_vector_mixed_sign;
+	  /* Same as optab_vector_mixed_sign but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype = optab_vector_mixed_sign;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 41ab2598eb6c32c003cbed490796abf25d2ee315..574d355b6b3092cf893f5ab0e8ae0f6d9ffcefbd 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index c73e1cbdda6b9380190b03de66caee48c4e173e3..3750d2881cbb7fd1e71c0eb8c0d4929925fd4152 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4434,7 +4434,8 @@ verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    /* rhs1_type and rhs2_type may differ in sign.  */
+	    || !tree_nop_conversion_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 51a46a6d852fb342278bb9513d013702cff4b868..4e63e84cc70ca60c706c19367ccf256ea3f851b5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool lane_reduc_code_p
     = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR);
   int op_type = TREE_CODE_LENGTH (code);
+  enum optab_subtype optab_query_kind = optab_vector;
+  if (code == DOT_PROD_EXPR
+      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+	   != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
+    optab_query_kind = optab_vector_mixed_sign;
+
 
   scalar_dest = gimple_assign_lhs (stmt);
   scalar_type = TREE_TYPE (scalar_dest);
@@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      optab optab = optab_for_tree_code (code, vectype_in, optab_query_kind);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index d71c8c6180c8711687471060e6c937561dfe5caf..13b435c96ffdd0e7a8adf0c8e63523afb69bd2dc 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 }
 
 /* Return true if the target supports a vector version of CODE,
-   where CODE is known to map to a direct optab.  ITYPE specifies
-   the type of (some of) the scalar inputs and OTYPE specifies the
-   type of the scalar result.
+   where CODE is known to map to a direct optab with the given SUBTYPE.
+   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
+   specifies the type of the scalar result.
 
    If CODE allows the inputs and outputs to have different type
    (such as for WIDEN_SUM_EXPR), it is the input mode rather
@@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -521,6 +522,7 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
   unsigned int precision = MAX (TYPE_PRECISION (*common_type),
 				TYPE_PRECISION (new_type));
   precision *= 2;
+
   if (precision * 2 > TYPE_PRECISION (type))
     return false;
 
@@ -539,6 +541,10 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If SUBTYPE then allow that the signs of the operands
+   may differ in signs but not in precision.  SUBTYPE is updated to reflect
+   this.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -546,7 +552,8 @@ static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      enum optab_subtype *subtype = NULL)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -607,7 +614,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   subtype);
 	      if (nops == 0)
 		return 0;
 
@@ -625,7 +633,18 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
 						 common_type))
-		return 0;
+		{
+		  if (subtype)
+		    {
+		      /* See if we can sign extend the smaller type.  */
+		      if (TYPE_PRECISION (this_unprom->type)
+			  > TYPE_PRECISION (*common_type))
+			*common_type = this_unprom->type;
+		      *subtype = optab_vector_mixed_sign;
+		    }
+		  else
+		    return 0;
+		}
 	    }
 	}
       next_op += nops;
@@ -725,12 +744,22 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 
 /* Convert UNPROM to TYPE and return the result, adding new statements
    to STMT_INFO's pattern definition statements if no better way is
-   available.  VECTYPE is the vector form of TYPE.  */
+   available.  VECTYPE is the vector form of TYPE.
+
+   If SUBTYPE then convert the type based on the subtype.  */
 
 static tree
 vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
-		    vect_unpromoted_value *unprom, tree vectype)
+		    vect_unpromoted_value *unprom, tree vectype,
+		    enum optab_subtype subtype = optab_default)
 {
+
+  /* Update the type if the signs differ.  */
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_SIGN (type) != TYPE_SIGN (TREE_TYPE (unprom->op)))
+    type = build_nonstandard_integer_type (TYPE_PRECISION (type),
+					   TYPE_SIGN (unprom->type));
+
   /* Check for a no-op conversion.  */
   if (types_compatible_p (type, TREE_TYPE (unprom->op)))
     return unprom->op;
@@ -806,12 +835,14 @@ vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
 }
 
 /* Invoke vect_convert_input for N elements of UNPROM and store the
-   result in the corresponding elements of RESULT.  */
+   result in the corresponding elements of RESULT.
+
+   If SUBTYPE then convert the type based on the subtype.  */
 
 static void
 vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
 		     tree *result, tree type, vect_unpromoted_value *unprom,
-		     tree vectype)
+		     tree vectype, enum optab_subtype subtype = optab_default)
 {
   for (unsigned int i = 0; i < n; ++i)
     {
@@ -819,11 +850,12 @@ vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned int n,
       for (j = 0; j < i; ++j)
 	if (unprom[j].op == unprom[i].op)
 	  break;
+
       if (j < i)
 	result[i] = result[j];
       else
 	result[i] = vect_convert_input (vinfo, stmt_info,
-					type, &unprom[i], vectype);
+					type, &unprom[i], vectype, subtype);
     }
 }
 
@@ -895,7 +927,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
@@ -908,9 +941,9 @@ vect_reassociating_reduction_p (vec_info *vinfo,
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
-   computation.
+   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
+   'type1a' and 'type1b' can differ.
 
    Input:
 
@@ -954,7 +987,7 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between X, Y and DPROD can differ.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -992,21 +1025,30 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
   /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
+  enum optab_subtype subtype = optab_vector;
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, &subtype))
+    return NULL;
+
+  /* If there are two widening operations, make sure they agree on the sign
+     of the extension.  The result of an optab_vector_mixed_sign operation
+     is signed.  */
+  if (subtype == optab_vector_mixed_sign
+      && TYPE_UNSIGNED (unprom_mult.type)
+      && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
     return NULL;
 
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
   tree mult_oprnd[2];
   vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
-		       unprom0, half_vectype);
+		       unprom0, half_vectype, subtype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
Richard Sandiford July 12, 2021, 9:39 a.m. UTC | #16
Tamar Christina <Tamar.Christina@arm.com> writes:
> Hi,
>
>> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>> >>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
>> >>       inside the loop (in case we are analyzing an outer-loop).  */
>> >>    vect_unpromoted_value unprom0[2];
>> >> +  enum optab_subtype subtype = optab_vector;
>> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
>> >> -			     false, 2, unprom0, &half_type))
>> >> +			     false, 2, unprom0, &half_type, &subtype))
>> >> +    return NULL;
>> >> +
>> >> +  if (subtype == optab_vector_mixed_sign
>> >> +      && TYPE_UNSIGNED (unprom_mult.type)
>> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> + (unprom_mult.type))
>> >>      return NULL;
>> >
>> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
>> > I.e. we need to reject the case in which we multiply a signed and an
>> > unsigned value to get a (logically) signed result, but then
>> > zero-extend it (rather than sign-extend it) to the precision of the addition.
>> >
>> > That would make the test:
>> >
>> >   if (subtype == optab_vector_mixed_sign
>> >       && TYPE_UNSIGNED (unprom_mult.type)
>> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>> >     return NULL;
>> >
>> > instead.
>> 
>> And folding that into the existing test gives:
>> 
>>   /* If there are two widening operations, make sure they agree on the sign
>>      of the extension.  The result of an optab_vector_mixed_sign operation
>>      is signed; otherwise, the result has the same sign as the operands.  */
>>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>>       && (subtype == optab_vector_mixed_sign
>> 	  ? TYPE_UNSIGNED (unprom_mult.type)
>> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>>     return NULL;
>> 
>
> I went with the first one which doesn't add the extra constraints for the
> normal dotproduct as that makes it too restrictive. It's the type of the
> multiplication that determines the operation so dotproduct can be used
> a bit more than where we currently do.
>
> This was relaxed in an earlier patch.

I didn't mean that we should add extra constraints to the normal case
though.  The existing test I was referring to above was:

  /* If there are two widening operations, make sure they agree on
     the sign of the extension.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
    return NULL;

Although this existing test makes sense for the normal case, IMO testing
TYPE_SIGN (half_type) doesn't make sense for the mixed-sign case.  I think
we should therefore replace the existing test with:

  /* If there are two widening operations, make sure they agree on the sign
     of the extension.  The result of an optab_vector_mixed_sign operation
     is signed; otherwise, the result has the same sign as the operands.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && (subtype == optab_vector_mixed_sign
         ? TYPE_UNSIGNED (unprom_mult.type)
         : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
    return NULL;

rather than add a separate condition for the mixed-sign case.
The behaviour of the normal case is the same both ways.
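
To illustrate that last point, here is a minimal sketch that models both
forms of the test on plain structs.  The struct and function names are
hypothetical stand-ins, not GCC API; only precision and signedness are
modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GCC integer type: precision and sign only.  */
struct int_type
{
  unsigned precision;
  bool is_unsigned;
};

/* The existing test: reject when the multiplication result is widened
   with a different sign from the multiplication operands.  */
static bool
reject_existing (struct int_type mult, struct int_type half,
                 struct int_type type)
{
  return (mult.precision != type.precision
          && mult.is_unsigned != half.is_unsigned);
}

/* The combined test: as above, except that a mixed-sign product is
   treated as signed, so only a zero-extension of it is rejected.  */
static bool
reject_combined (struct int_type mult, struct int_type half,
                 struct int_type type, bool mixed_sign)
{
  return (mult.precision != type.precision
          && (mixed_sign
              ? mult.is_unsigned
              : mult.is_unsigned != half.is_unsigned));
}

/* Assert that both tests agree whenever MIXED_SIGN is false.  */
static void
check_normal_case_unchanged (void)
{
  static const unsigned precisions[] = { 8, 16, 32, 64 };
  for (int m = 0; m < 4; ++m)
    for (int t = 0; t < 4; ++t)
      for (int ms = 0; ms < 2; ++ms)
        for (int hs = 0; hs < 2; ++hs)
          {
            struct int_type mult = { precisions[m], ms != 0 };
            struct int_type half = { 8, hs != 0 };
            struct int_type type = { precisions[t], true };
            assert (reject_combined (mult, half, type, false)
                    == reject_existing (mult, half, type));
          }
}
```

Calling check_normal_case_unchanged () walks every precision/sign
combination in the model and asserts that the two predicates agree when
the subtype is not mixed-sign.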

Thanks,
Richard
Tamar Christina July 12, 2021, 9:56 a.m. UTC | #17
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, July 12, 2021 10:39 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> > Hi,
> >
> >> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
> >> *vinfo,
> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when this
> >> >> stmt in
> >> a phi
> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
> >> >>    vect_unpromoted_value unprom0[2];
> >> >> +  enum optab_subtype subtype = optab_vector;
> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> >> WIDEN_MULT_EXPR,
> >> >> -			     false, 2, unprom0, &half_type))
> >> >> +			     false, 2, unprom0, &half_type, &subtype))
> >> >> +    return NULL;
> >> >> +
> >> >> +  if (subtype == optab_vector_mixed_sign
> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
> >> >> + (unprom_mult.type))
> >> >>      return NULL;
> >> >
> >> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> >> > I.e. we need to reject the case in which we multiply a signed and
> >> > an unsigned value to get a (logically) signed result, but then
> >> > zero-extend it (rather than sign-extend it) to the precision of the
> addition.
> >> >
> >> > That would make the test:
> >> >
> >> >   if (subtype == optab_vector_mixed_sign
> >> >       && TYPE_UNSIGNED (unprom_mult.type)
> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> >> >     return NULL;
> >> >
> >> > instead.
> >>
> >> And folding that into the existing test gives:
> >>
> >>   /* If there are two widening operations, make sure they agree on the
> sign
> >>      of the extension.  The result of an optab_vector_mixed_sign operation
> >>      is signed; otherwise, the result has the same sign as the operands.  */
> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >>       && (subtype == optab_vector_mixed_sign
> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
> >> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
> >>     return NULL;
> >>
> >
> > I went with the first one which doesn't add the extra constraints for
> > the normal dotproduct as that makes it too restrictive. It's the type
> > of the multiplication that determines the operation so dotproduct can
> > be used a bit more than where we currently do.
> >
> > This was relaxed in an earlier patch.
> 
> I didn't mean that we should add extra constraints to the normal case though.
> The existing test I was referring to above was:
> 
>   /* If there are two widening operations, make sure they agree on
>      the sign of the extension.  */
>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
>     return NULL;

But as I mentioned, this restriction is unneeded and has been removed, which is why it's not in my patchset's diff.
It's removed by https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html, whose approval Richi made conditional on
the rest of these patches being approved.

This change needlessly blocks tests vect-reduc-dot-[2,3,6,7].c from being recognised as dot products, for instance.

It's also part of the codegen gap between GCC and Clang: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6

Regards,
Tamar

Richard Sandiford July 12, 2021, 10:25 a.m. UTC | #18
Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, July 12, 2021 10:39 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
>> patches@gcc.gnu.org
>> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> where the sign for the multiplicant changes.
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > Hi,
>> >
>> >> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
>> >> *vinfo,
>> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when this
>> >> >> stmt in
>> >> a phi
>> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
>> >> >>    vect_unpromoted_value unprom0[2];
>> >> >> +  enum optab_subtype subtype = optab_vector;
>> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
>> >> WIDEN_MULT_EXPR,
>> >> >> -			     false, 2, unprom0, &half_type))
>> >> >> +			     false, 2, unprom0, &half_type, &subtype))
>> >> >> +    return NULL;
>> >> >> +
>> >> >> +  if (subtype == optab_vector_mixed_sign
>> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> >> + (unprom_mult.type))
>> >> >>      return NULL;
>> >> >
>> >> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
>> >> > I.e. we need to reject the case in which we multiply a signed and
>> >> > an unsigned value to get a (logically) signed result, but then
>> >> > zero-extend it (rather than sign-extend it) to the precision of the
>> addition.
>> >> >
>> >> > That would make the test:
>> >> >
>> >> >   if (subtype == optab_vector_mixed_sign
>> >> >       && TYPE_UNSIGNED (unprom_mult.type)
>> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>> >> >     return NULL;
>> >> >
>> >> > instead.
>> >>
>> >> And folding that into the existing test gives:
>> >>
>> >>   /* If there are two widening operations, make sure they agree on the
>> sign
>> >>      of the extension.  The result of an optab_vector_mixed_sign operation
>> >>      is signed; otherwise, the result has the same sign as the operands.  */
>> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >>       && (subtype == optab_vector_mixed_sign
>> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
>> >> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>> >>     return NULL;
>> >>
>> >
>> > I went with the first one which doesn't add the extra constraints for
>> > the normal dotproduct as that makes it too restrictive. It's the type
>> > of the multiplication that determines the operation so dotproduct can
>> > be used a bit more than where we currently do.
>> >
>> > This was relaxed in an earlier patch.
>> 
>> I didn't mean that we should add extra constraints to the normal case though.
>> The existing test I was referring to above was:
>> 
>>   /* If there are two widening operations, make sure they agree on
>>      the sign of the extension.  */
>>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
>>     return NULL;
>
> But as I mentioned, this restriction is unneeded and has been removed hence why it's not in my patchset's diff.
> It's removed by https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which Richi conditioned on
> the rest of these patches being approved.
>
> This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from being dotproducts for instance
>
> It's also part of the deficiency between GCC codegen and Clang https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6

Hmm, OK.  Just removing the check regresses:

unsigned long __attribute__ ((noipa))
f (signed short *x, signed short *y)
{
  unsigned long res = 0;
  for (int i = 0; i < 100; ++i)
    res += (unsigned int) x[i] * (unsigned int) y[i];
  return res;
}

int
main (void)
{
  signed short x[100], y[100];
  for (int i = 0; i < 100; ++i)
    {
      x[i] = -1;
      y[i] = 1;
    }
  if (f (x, y) != 0x6400000000ULL - 100)
    __builtin_abort ();
  return 0;
}

on SVE.  We then use SDOT even though the result of the multiplication
is zero- rather than sign-extended to 64 bits.  Does something else
in the series stop that from happening?
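
To make the arithmetic concrete, here is a minimal scalar sketch of the
two interpretations, assuming a 64-bit accumulator and a 32-bit unsigned
int.  The helper names are made up for illustration; the second function
only models what a signed dot product would accumulate and is not the
vectorizer's output.

```c
#include <assert.h>
#include <stdint.h>

/* What the source loop asks for: each product is formed in 32-bit
   unsigned arithmetic and then zero-extended to 64 bits.  */
static uint64_t
sum_zero_extended (const int16_t *x, const int16_t *y, int n)
{
  uint64_t res = 0;
  for (int i = 0; i < n; ++i)
    res += (uint64_t) ((uint32_t) x[i] * (uint32_t) y[i]);
  return res;
}

/* What a signed dot product accumulates: each product is effectively
   sign-extended to the width of the accumulator.  */
static uint64_t
sum_sign_extended (const int16_t *x, const int16_t *y, int n)
{
  int64_t res = 0;
  for (int i = 0; i < n; ++i)
    res += (int64_t) x[i] * (int64_t) y[i];
  return (uint64_t) res;
}

/* Rebuild the inputs from the testcase and compare both sums.  */
static void
check_extension_difference (void)
{
  int16_t x[100], y[100];
  for (int i = 0; i < 100; ++i)
    {
      x[i] = -1;
      y[i] = 1;
    }
  /* 100 * 0xffffffff == 100 * 2^32 - 100.  */
  assert (sum_zero_extended (x, y, 100) == 0x6400000000ULL - 100);
  /* -100 reinterpreted as an unsigned 64-bit value.  */
  assert (sum_sign_extended (x, y, 100) == (uint64_t) -100);
}
```

With x[i] = -1 and y[i] = 1, each zero-extended product is 0xffffffff,
whereas each sign-extended product is -1, so the two sums differ.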

Richard
Tamar Christina July 12, 2021, 12:29 p.m. UTC | #19
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, July 12, 2021 11:26 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
> patches@gcc.gnu.org
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: Monday, July 12, 2021 10:39 AM
> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
> >> patches@gcc.gnu.org
> >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> >> where the sign for the multiplicant changes.
> >>
> >> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> > Hi,
> >> >
> >> >> Richard Sandiford <richard.sandiford@arm.com> writes:
> >> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
> >> >> *vinfo,
> >> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when
> >> >> >> this stmt in
> >> >> a phi
> >> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
> >> >> >>    vect_unpromoted_value unprom0[2];
> >> >> >> +  enum optab_subtype subtype = optab_vector;
> >> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> >> >> WIDEN_MULT_EXPR,
> >> >> >> -			     false, 2, unprom0, &half_type))
> >> >> >> +			     false, 2, unprom0, &half_type, &subtype))
> >> >> >> +    return NULL;
> >> >> >> +
> >> >> >> +  if (subtype == optab_vector_mixed_sign
> >> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
> >> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
> >> >> >> + (unprom_mult.type))
> >> >> >>      return NULL;
> >> >> >
> >> >> > Isn't the final condition here instead that TYPE1 is narrower than
> TYPE2?
> >> >> > I.e. we need to reject the case in which we multiply a signed
> >> >> > and an unsigned value to get a (logically) signed result, but
> >> >> > then zero-extend it (rather than sign-extend it) to the
> >> >> > precision of the
> >> addition.
> >> >> >
> >> >> > That would make the test:
> >> >> >
> >> >> >   if (subtype == optab_vector_mixed_sign
> >> >> >       && TYPE_UNSIGNED (unprom_mult.type)
> >> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION
> (type))
> >> >> >     return NULL;
> >> >> >
> >> >> > instead.
> >> >>
> >> >> And folding that into the existing test gives:
> >> >>
> >> >>   /* If there are two widening operations, make sure they agree on
> >> >> the
> >> sign
> >> >>      of the extension.  The result of an optab_vector_mixed_sign
> operation
> >> >>      is signed; otherwise, the result has the same sign as the operands.
> */
> >> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >> >>       && (subtype == optab_vector_mixed_sign
> >> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
> >> >> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
> >> >>     return NULL;
> >> >>
> >> >
> >> > I went with the first one which doesn't add the extra constraints
> >> > for the normal dotproduct as that makes it too restrictive. It's
> >> > the type of the multiplication that determines the operation so
> >> > dotproduct can be used a bit more than where we currently do.
> >> >
> >> > This was relaxed in an earlier patch.
> >>
> >> I didn't mean that we should add extra constraints to the normal case
> though.
> >> The existing test I was referring to above was:
> >>
> >>   /* If there are two widening operations, make sure they agree on
> >>      the sign of the extension.  */
> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
> >>     return NULL;
> >
> > But as I mentioned, this restriction is unneeded and has been removed
> hence why it's not in my patchset's diff.
> > It's removed by
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which
> Richi conditioned on the rest of these patches being approved.
> >
> > This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from
> > being dotproducts for instance
> >
> > It's also part of the deficiency between GCC codegen and Clang
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6
> 
> Hmm, OK.  Just removing the check regresses:
> 
> unsigned long __attribute__ ((noipa))
> f (signed short *x, signed short *y)
> {
>   unsigned long res = 0;
>   for (int i = 0; i < 100; ++i)
>     res += (unsigned int) x[i] * (unsigned int) y[i];
>   return res;
> }
> 
> int
> main (void)
> {
>   signed short x[100], y[100];
>   for (int i = 0; i < 100; ++i)
>     {
>       x[i] = -1;
>       y[i] = 1;
>     }
>   if (f (x, y) != 0x6400000000ULL - 100)
>     __builtin_abort ();
>   return 0;
> }
> 
> on SVE.  We then use SDOT even though the result of the multiplication is
> zero- rather than sign-extended to 64 bits.  Does something else in the series
> stop that from that happening?

No, and I hadn't noticed it before because it looks like the mid-end execution tests don't turn on dot-product for Arm targets :/

I'll look at that separately; for now I've added the check back in.

Ok for trunk now?

Thanks,
Tamar

Richard Sandiford July 12, 2021, 2:55 p.m. UTC | #20
Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, July 12, 2021 11:26 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
>> patches@gcc.gnu.org
>> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> where the sign for the multiplicant changes.
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> Sent: Monday, July 12, 2021 10:39 AM
>> >> To: Tamar Christina <Tamar.Christina@arm.com>
>> >> Cc: Richard Biener <rguenther@suse.de>; nd <nd@arm.com>; gcc-
>> >> patches@gcc.gnu.org
>> >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> >> where the sign for the multiplicant changes.
>> >>
>> >> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> > Hi,
>> >> >
>> >> >> Richard Sandiford <richard.sandiford@arm.com> writes:
>> >> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
>> >> >> *vinfo,
>> >> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when
>> >> >> >> this stmt in
>> >> >> a phi
>> >> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
>> >> >> >>    vect_unpromoted_value unprom0[2];
>> >> >> >> +  enum optab_subtype subtype = optab_vector;
>> >> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
>> >> >> WIDEN_MULT_EXPR,
>> >> >> >> -			     false, 2, unprom0, &half_type))
>> >> >> >> +			     false, 2, unprom0, &half_type, &subtype))
>> >> >> >> +    return NULL;
>> >> >> >> +
>> >> >> >> +  if (subtype == optab_vector_mixed_sign
>> >> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> >> >> + (unprom_mult.type))
>> >> >> >>      return NULL;
>> >> >> >
>> >> >> > Isn't the final condition here instead that TYPE1 is narrower than
>> TYPE2?
>> >> >> > I.e. we need to reject the case in which we multiply a signed
>> >> >> > and an unsigned value to get a (logically) signed result, but
>> >> >> > then zero-extend it (rather than sign-extend it) to the
>> >> >> > precision of the
>> >> addition.
>> >> >> >
>> >> >> > That would make the test:
>> >> >> >
>> >> >> >   if (subtype == optab_vector_mixed_sign
>> >> >> >       && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION
>> (type))
>> >> >> >     return NULL;
>> >> >> >
>> >> >> > instead.
>> >> >>
>> >> >> And folding that into the existing test gives:
>> >> >>
>> >> >>   /* If there are two widening operations, make sure they agree on
>> >> >> the
>> >> sign
>> >> >>      of the extension.  The result of an optab_vector_mixed_sign
>> operation
>> >> >>      is signed; otherwise, the result has the same sign as the operands.
>> */
>> >> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >> >>       && (subtype == optab_vector_mixed_sign
>> >> >> 	  ? TYPE_UNSIGNED (unprom_mult.type)
>> >> >> 	  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>> >> >>     return NULL;
>> >> >>
>> >> >
>> >> > I went with the first one which doesn't add the extra constraints
>> >> > for the normal dotproduct as that makes it too restrictive. It's
>> >> > the type of the multiplication that determines the operation so
>> >> > dotproduct can be used a bit more than where we currently do.
>> >> >
>> >> > This was relaxed in an earlier patch.
>> >>
>> >> I didn't mean that we should add extra constraints to the normal case
>> though.
>> >> The existing test I was referring to above was:
>> >>
>> >>   /* If there are two widening operations, make sure they agree on
>> >>      the sign of the extension.  */
>> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
>> >>     return NULL;
>> >
>> > But as I mentioned, this restriction is unneeded and has been removed
>> hence why it's not in my patchset's diff.
>> > It's removed by
>> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which
>> Richi conditioned on the rest of these patches being approved.
>> >
>> > This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from
>> > being dotproducts for instance
>> >
>> > It's also part of the deficiency between GCC codegen and Clang
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6
>> 
>> Hmm, OK.  Just removing the check regresses:
>> 
>> unsigned long __attribute__ ((noipa))
>> f (signed short *x, signed short *y)
>> {
>>   unsigned long res = 0;
>>   for (int i = 0; i < 100; ++i)
>>     res += (unsigned int) x[i] * (unsigned int) y[i];
>>   return res;
>> }
>> 
>> int
>> main (void)
>> {
>>   signed short x[100], y[100];
>>   for (int i = 0; i < 100; ++i)
>>     {
>>       x[i] = -1;
>>       y[i] = 1;
>>     }
>>   if (f (x, y) != 0x6400000000ULL - 100)
>>     __builtin_abort ();
>>   return 0;
>> }
>> 
>> on SVE.  We then use SDOT even though the result of the multiplication is
>> zero- rather than sign-extended to 64 bits.  Does something else in the series
>> stop that from that happening?
>
> No, and I hadn't noticed it before because it looks like the mid-end tests that are execution test don't turn on dot-product for arm targets :/ 

Yeah, I was surprised I needed SVE to get an SDOT above, but didn't look
into why…

> I'll look at it separately, for now I've then added the check back in.
>
> Ok for trunk now?

Reviewing the full patch this time: I have a couple of nits about
the documentation, but otherwise it LGTM.

> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>  
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
> +
> +Compute the sum of the products of two signed elements.
> +Operand 1 and operand 2 are of the same mode. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following signs
> +
> +@smallexample
> +sdot<signed c, signed a, signed b> ==
> +   res = sign-ext (a) * sign-ext (b) + c
> +@dots{}
> +@end smallexample

I think putting signed c first in the argument list might be confusing,
since like you say, it corresponds to operand 3 rather than operand 1.
How about calling them op0, op1, op2 and op3 instead of res, a, b and c,
and listing them in that order?

Same for udot_prod.

(Someone who doesn't know the AArch64 instructions might wonder how
the elements of op1 and op2 correspond to elements of op0 and op3.
That's a pre-existing problem though, so no need to fix it here.)

>  @cindex @code{udot_prod@var{m}} instruction pattern
> -@itemx @samp{udot_prod@var{m}}
> -Compute the sum of the products of two signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> -wider than the mode of the product. The result is placed in operand 0, which
> -is of the same mode as operand 3.
> +@item @samp{udot_prod@var{m}}
> +
> +Compute the sum of the products of two unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following signs
> +
> +@smallexample
> +udot<unsigned c, unsigned a, unsigned b> ==
> +   res = zero-ext (a) * zero-ext (b) + c
> +@dots{}
> +@end smallexample
> +
> +
> +

Should just be one blank line here.

> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@item @samp{usdot_prod@var{m}}
> +Compute the sum of the products of elements of different signs.
> +Operand 1 must be unsigned and operand 2 signed. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following signs
> +
> +@smallexample
> +usdot<unsigned c, unsigned a, signed b> ==
> +   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c

It looks like the extensions are the wrong way around.  I think it should be:

usdot<signed c, unsigned a, signed b> ==
   res = ((signed-conv) zero-ext (a)) * sign-ext (b) + c

(before the changes to put c last and use the opN names).

I.e. the unsigned operand is zero-extended and the signed operand is
sign extended.  I think it's easier to understand if we treat the
multiplication and c as signed, since in that case we don't reinterpret
any negative signed value (of b) as an unsigned value.  (Both choices
make sense for “a”, since the zero-ext(a) fits into both a signed wider
int and an unsigned wider int.)
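
As a rough scalar sketch of the difference, assuming (as for the AArch64
instructions) that four adjacent 8-bit products feed each 32-bit
accumulator element; the function names are hypothetical and not part of
the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Intended semantics: zero-extend the unsigned operand, sign-extend the
   signed operand, then multiply and accumulate as signed values.  */
static int32_t
usdot_lane (int32_t acc, const uint8_t a[4], const int8_t b[4])
{
  for (int i = 0; i < 4; ++i)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* The extensions the wrong way around: the bits of A are sign-extended
   and the bits of B are zero-extended.  */
static int32_t
usdot_lane_swapped (int32_t acc, const uint8_t a[4], const int8_t b[4])
{
  for (int i = 0; i < 4; ++i)
    {
      int32_t a_sext = a[i] >= 128 ? (int32_t) a[i] - 256 : (int32_t) a[i];
      int32_t b_zext = b[i] < 0 ? (int32_t) b[i] + 256 : (int32_t) b[i];
      acc += a_sext * b_zext;
    }
  return acc;
}

/* A single large unsigned element is enough to tell the two apart.  */
static void
check_usdot_models (void)
{
  const uint8_t a[4] = { 255, 0, 0, 0 };
  const int8_t b[4] = { 1, 0, 0, 0 };
  assert (usdot_lane (0, a, b) == 255);
  assert (usdot_lane_swapped (0, a, b) == -1);
}
```

The two models agree whenever every element of a is below 128 and every
element of b is non-negative, which is why the mistake is easy to miss.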

OK with those changes, and thanks for your patience through the slow reviews.

Richard
Patch

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d166a0debedf4d8edf55c842bcf4ff4690b3e9ce..baf20416e63745097825fc30fdf2e66bc80d7d23 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5440,11 +5440,13 @@  Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 @item @samp{sdot_prod@var{m}}
 @cindex @code{udot_prod@var{m}} instruction pattern
 @itemx @samp{udot_prod@var{m}}
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@itemx @samp{usdot_prod@var{m}}
 Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+Operand 1 and operand 2 are of the same mode but may differ in signs. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
 
 @cindex @code{ssad@var{m}} instruction pattern
 @item @samp{ssad@var{m}}
diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
index c3aaa1a416991e856d3e24da45968a92ebada82c..ebc23ac86fe99057f375781c2f1990e0548ba08d 100644
--- a/gcc/optabs-tree.h
+++ b/gcc/optabs-tree.h
@@ -27,11 +27,29 @@  along with GCC; see the file COPYING3.  If not see
    shift amount vs. machines that take a vector for the shift amount.  */
 enum optab_subtype
 {
-  optab_default,
-  optab_scalar,
-  optab_vector
+  optab_default = 1 << 0,
+  optab_scalar = 1 << 1,
+  optab_vector = 1 << 2,
+  optab_signed_to_unsigned = 1 << 3,
+  optab_unsigned_to_signed = 1 << 4
 };
 
+/* Override the OrEqual-operator so we can use optab_subtype as a bit flag.  */
+inline enum optab_subtype&
+operator |= (enum optab_subtype& a, enum optab_subtype b)
+{
+    return a = static_cast<optab_subtype>(static_cast<int>(a)
+					  | static_cast<int>(b));
+}
+
+/* Override the Or-operator so we can use optab_subtype as a bit flag.  */
+inline enum optab_subtype
+operator | (enum optab_subtype a, enum optab_subtype b)
+{
+    return static_cast<optab_subtype>(static_cast<int>(a)
+				      | static_cast<int>(b));
+}
+
 /* Return the optab used for computing the given operation on the type given by
    the second argument.  The third argument distinguishes between the types of
    vector shifts and rotates.  */
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 95ffe397c23e80c105afea52e9d47216bf52f55a..2f60004545defc53182e004eea1e5c22b7453072 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -127,7 +127,17 @@  optab_for_tree_code (enum tree_code code, const_tree type,
       return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
 
     case DOT_PROD_EXPR:
-      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
+      {
+	gcc_assert (subtype & optab_default
+		    || subtype & optab_vector
+		    || subtype & optab_signed_to_unsigned
+		    || subtype & optab_unsigned_to_signed);
+
+	if (subtype & (optab_unsigned_to_signed | optab_signed_to_unsigned))
+	  return usdot_prod_optab;
+
+	return (TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab);
+      }
 
     case SAD_EXPR:
       return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
diff --git a/gcc/optabs.c b/gcc/optabs.c
index f4614a394587787293dc8b680a38901f7906f61c..2e18b76de1412eab71971753ac678597c0d00098 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -262,6 +262,11 @@  expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   bool sbool = false;
 
   oprnd0 = ops->op0;
+  if (nops >= 2)
+    oprnd1 = ops->op1;
+  if (nops >= 3)
+    oprnd2 = ops->op2;
+
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
       || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
@@ -285,6 +290,27 @@  expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
 	   ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
       sbool = true;
     }
+  else if (ops->code == DOT_PROD_EXPR)
+    {
+      enum optab_subtype subtype = optab_default;
+      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
+      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
+      if (sign1 == sign2)
+	;
+      else if (sign1 == SIGNED && sign2 == UNSIGNED)
+	{
+	  subtype |= optab_signed_to_unsigned;
+	  /* Same as optab_unsigned_to_signed but flip the operands.  */
+	  std::swap (op0, op1);
+	}
+      else if (sign1 == UNSIGNED && sign2 == SIGNED)
+	subtype |= optab_unsigned_to_signed;
+      else
+	gcc_unreachable ();
+
+      widen_pattern_optab
+	= optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
+    }
   else
     widen_pattern_optab
       = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
@@ -298,10 +324,7 @@  expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
   gcc_assert (icode != CODE_FOR_nothing);
 
   if (nops >= 2)
-    {
-      oprnd1 = ops->op1;
-      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
-    }
+    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
   else if (sbool)
     {
       nops = 2;
@@ -316,7 +339,6 @@  expand_widen_pattern_expr (sepops ops, rtx op0, rtx op1, rtx wide_op,
     {
       gcc_assert (tmode1 == tmode0);
       gcc_assert (op1);
-      oprnd2 = ops->op2;
       wmode = TYPE_MODE (TREE_TYPE (oprnd2));
     }
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbdb7c18615baae928 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@  OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
+OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 7e3aae5f9c28a49feedc7cc66e8ac0d476b9f28a..58b55bb648ad97d514f1fa18bb00808fd2678b42 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4421,7 +4421,8 @@  verify_gimple_assign_ternary (gassign *stmt)
 		  && !SCALAR_FLOAT_TYPE_P (rhs1_type))
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
-	    || !types_compatible_p (rhs1_type, rhs2_type)
+	    || (!types_compatible_p (rhs1_type, rhs2_type)
+		&& TYPE_SIGN (rhs1_type) == TYPE_SIGN (rhs2_type))
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
 	    || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
 			 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 93fa2928e001c154bd4a9a73ac1dbbbf73c456df..cb8f5fbb6abca181c4171194d19fec29ec6e4176 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6401,6 +6401,33 @@  build_vect_cond_expr (enum tree_code code, tree vop[3], tree mask,
     }
 }
 
+/* Determine the optab_subtype to use for the given CODE and STMT_VINFO.  For
+   most CODE this will be optab_vector, however for certain operations such as
+   DOT_PROD_EXPR, where the operands can have different signs, we need to be
+   able to pick the right optab.  */
+
+static enum optab_subtype
+vect_determine_dot_kind (tree_code code, stmt_vec_info stmt_vinfo)
+{
+  enum optab_subtype subtype = optab_vector;
+  switch (code)
+    {
+      case DOT_PROD_EXPR:
+	{
+	  gassign *stmt = as_a <gassign *> (STMT_VINFO_STMT (stmt_vinfo));
+	  signop rhs1_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+	  signop rhs2_sign = TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt)));
+	  if (rhs1_sign != rhs2_sign)
+	    subtype |= optab_unsigned_to_signed;
+	  break;
+	}
+      default:
+	break;
+    }
+
+  return subtype;
+}
+
 /* Function vectorizable_reduction.
 
    Check if STMT_INFO performs a reduction operation that can be vectorized.
@@ -7189,7 +7216,8 @@  vectorizable_reduction (loop_vec_info loop_vinfo,
       bool ok = true;
 
       /* 4.1. check support for the operation in the loop  */
-      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
+      enum optab_subtype subtype = vect_determine_dot_kind (code, stmt_info);
+      optab optab = optab_for_tree_code (code, vectype_in, subtype);
       if (!optab)
 	{
 	  if (dump_enabled_p ())
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 441d6cd28c4eaded7abd756164890dbcffd2f3b8..943c001fb13777b4d1513841fa84942316846d5e 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -201,7 +201,8 @@  vect_get_external_def_edge (vec_info *vinfo, tree var)
 static bool
 vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
 				 tree itype, tree *vecotype_out,
-				 tree *vecitype_out = NULL)
+				 tree *vecitype_out = NULL,
+				 enum optab_subtype subtype = optab_default)
 {
   tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
   if (!vecitype)
@@ -211,7 +212,7 @@  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
   if (!vecotype)
     return false;
 
-  optab optab = optab_for_tree_code (code, vecitype, optab_default);
+  optab optab = optab_for_tree_code (code, vecitype, subtype);
   if (!optab)
     return false;
 
@@ -487,14 +488,31 @@  vect_joust_widened_integer (tree type, bool shift_p, tree op,
 }
 
 /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
-   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
+   is narrower than type, storing the supertype in *COMMON_TYPE if so.
+   If ALLOW_SHORT_SIGN_MISMATCH is true, accept that *COMMON_TYPE and NEW_TYPE
+   may differ in sign as long as their precisions are equal.  */
 
 static bool
-vect_joust_widened_type (tree type, tree new_type, tree *common_type)
+vect_joust_widened_type (tree type, tree new_type, tree *common_type,
+			 bool allow_short_sign_mismatch = false)
 {
   if (types_compatible_p (*common_type, new_type))
     return true;
 
+  /* Check if the mismatch is only in the sign and if we have
+     allow_short_sign_mismatch then allow it.  */
+  if (allow_short_sign_mismatch
+      && TYPE_SIGN (*common_type) != TYPE_SIGN (new_type))
+    {
+      bool sign = TYPE_SIGN (*common_type) == UNSIGNED;
+      tree eq_type
+	= build_nonstandard_integer_type (TYPE_PRECISION (new_type),
+					  sign);
+
+      if (types_compatible_p (*common_type, eq_type))
+	return true;
+    }
+
   /* See if *COMMON_TYPE can hold all values of NEW_TYPE.  */
   if ((TYPE_PRECISION (new_type) < TYPE_PRECISION (*common_type))
       && (TYPE_UNSIGNED (new_type) || !TYPE_UNSIGNED (*common_type)))
@@ -532,6 +550,9 @@  vect_joust_widened_type (tree type, tree new_type, tree *common_type)
    to a type that (a) is narrower than the result of STMT_INFO and
    (b) can hold all leaf operand values.
 
+   If ALLOW_SHORT_SIGN_MISMATCH is true, the operands may differ in sign
+   but not in precision.
+
    Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
    exists.  */
 
@@ -539,7 +560,8 @@  static unsigned int
 vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		      tree_code widened_code, bool shift_p,
 		      unsigned int max_nops,
-		      vect_unpromoted_value *unprom, tree *common_type)
+		      vect_unpromoted_value *unprom, tree *common_type,
+		      bool allow_short_sign_mismatch = false)
 {
   /* Check for an integer operation with the right code.  */
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
@@ -600,7 +622,8 @@  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 		= vinfo->lookup_def (this_unprom->op);
 	      nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
 					   widened_code, shift_p, max_nops,
-					   this_unprom, common_type);
+					   this_unprom, common_type,
+					   allow_short_sign_mismatch);
 	      if (nops == 0)
 		return 0;
 
@@ -617,7 +640,8 @@  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
 	      if (i == 0)
 		*common_type = this_unprom->type;
 	      else if (!vect_joust_widened_type (type, this_unprom->type,
-						 common_type))
+						 common_type,
+						 allow_short_sign_mismatch))
 		return 0;
 	    }
 	}
@@ -888,21 +912,24 @@  vect_reassociating_reduction_p (vec_info *vinfo,
 
    Try to find the following pattern:
 
-     type x_t, y_t;
+     type1a x_t;
+     type1b y_t;
      TYPE1 prod;
      TYPE2 sum = init;
    loop:
      sum_0 = phi <init, sum_1>
      S1  x_t = ...
      S2  y_t = ...
-     S3  x_T = (TYPE1) x_t;
-     S4  y_T = (TYPE1) y_t;
+     S3  x_T = (TYPE3) x_t;
+     S4  y_T = (TYPE4) y_t;
      S5  prod = x_T * y_T;
      [S6  prod = (TYPE2) prod;  #optional]
      S7  sum_1 = prod + sum_0;
 
-   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is the
-   same size of 'TYPE1' or bigger. This is a special case of a reduction
+   where 'TYPE1' is exactly double the size of types 'type1a' and 'type1b',
+   the sign of 'TYPE1' must be that of 'type1a' or 'type1b', but the signs of
+   'type1a' and 'type1b' can differ. 'TYPE2' is the same size as 'TYPE1' or
+   bigger and must have the same sign. This is a special case of a reduction
    computation.
 
    Input:
@@ -939,15 +966,16 @@  vect_recog_dot_prod_pattern (vec_info *vinfo,
 
   /* Look for the following pattern
           DX = (TYPE1) X;
-          DY = (TYPE1) Y;
+	  DY = (TYPE2) Y;
           DPROD = DX * DY;
-          DDPROD = (TYPE2) DPROD;
+	  DDPROD = (TYPE3) DPROD;
           sum_1 = DDPROD + sum_0;
      In which
      - DX is double the size of X
      - DY is double the size of Y
      - DX, DY, DPROD all have the same type but the sign
-       between DX, DY and DPROD can differ.
+       between DX, DY and DPROD can differ. The sign of DPROD
+       is one of the signs of DX or DY.
      - sum is the same size of DPROD or bigger
      - sum has been recognized as a reduction variable.
 
@@ -986,14 +1014,41 @@  vect_recog_dot_prod_pattern (vec_info *vinfo,
      inside the loop (in case we are analyzing an outer-loop).  */
   vect_unpromoted_value unprom0[2];
   if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
-			     false, 2, unprom0, &half_type))
+			     false, 2, unprom0, &half_type, true))
     return NULL;
 
+  /* Check to see if there is a sign change happening in the operands of the
+     multiplication and pick the appropriate optab subtype.  */
+  enum optab_subtype subtype;
+  tree rhs_type1 = unprom0[0].type;
+  tree rhs_type2 = unprom0[1].type;
+  if (TYPE_SIGN (rhs_type1) == TYPE_SIGN (rhs_type2))
+    subtype = optab_default;
+  else if (TYPE_SIGN (rhs_type1) == SIGNED
+	   && TYPE_SIGN (rhs_type2) == UNSIGNED)
+    subtype = optab_signed_to_unsigned;
+  else if (TYPE_SIGN (rhs_type1) == UNSIGNED
+	   && TYPE_SIGN (rhs_type2) == SIGNED)
+    subtype = optab_unsigned_to_signed;
+  else
+    gcc_unreachable ();
+
+  /* If we have a sign-changing dot-product we need to check that the
+     promoted type, if unsigned, has at least the same precision as the final
+     type of the dot-product.  */
+  if (subtype != optab_default)
+    {
+      tree mult_type = TREE_TYPE (unprom_mult.op);
+      if (TYPE_SIGN (mult_type) == UNSIGNED
+	  && TYPE_PRECISION (mult_type) < TYPE_PRECISION (type))
+	return NULL;
+    }
+
   vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
 
   tree half_vectype;
   if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR, half_type,
-					type_out, &half_vectype))
+					type_out, &half_vectype, subtype))
     return NULL;
 
   /* Get the inputs in the appropriate types.  */
@@ -1002,8 +1057,22 @@  vect_recog_dot_prod_pattern (vec_info *vinfo,
 		       unprom0, half_vectype);
 
   var = vect_recog_temp_ssa_var (type, NULL);
+
+  /* If we have a sign-changing dot-product, the dot-product itself performs
+     any sign conversions, so use the unpromoted operands directly.  */
+  tree mult_arg1, mult_arg2;
+  if (subtype == optab_default)
+    {
+      mult_arg1 = mult_oprnd[0];
+      mult_arg2 = mult_oprnd[1];
+    }
+  else
+    {
+      mult_arg1 = unprom0[0].op;
+      mult_arg2 = unprom0[1].op;
+    }
   pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,
-				      mult_oprnd[0], mult_oprnd[1], oprnd1);
+				      mult_arg1, mult_arg2, oprnd1);
 
   return pattern_stmt;
 }