middle-end: __builtin_mul_overflow expansion improvements [PR95862]

Message ID 20201125100511.GY3788@tucnak
State New
Series middle-end: __builtin_mul_overflow expansion improvements [PR95862]

Commit Message

Jakub Jelinek Nov. 25, 2020, 10:05 a.m. UTC
Hi!

The following patch adds some improvements for __builtin_mul_overflow
expansion.
One optimization is for the u1 * u2 -> sr case; as documented, we normally
do:
     u1 * u2 -> sr
        res = (S) (u1 * u2)
        ovf = res < 0 || main_ovf (true)
where main_ovf (true) stands for a jump on unsigned multiplication overflow.
If we know that the most significant bits of both operands are clear (such
as when they are zero extended from something smaller), we can
emit better code by handling it like s1 * s2 -> sr, i.e. just jump on
overflow after signed multiplication.
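
For instance, in f7 from the new testcase both operands are zero extended
from unsigned short, so their most significant bit is known to be clear
after promotion:

  short
  f7 (unsigned short a, unsigned short b)
  {
    int c;
    /* Both a and b are zero extended to int, so the product is at most
       0xffff * 0xffff = 0xfffe0001 and overflows int exactly when the
       signed multiplication overflows; one jump on that suffices.  */
    return __builtin_mul_overflow (a, b, &c);
  }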

The other two cases are s1 * s2 -> ur and s1 * u2 -> ur.  If we know that
the minimum precisions needed to encode all values of the two arguments
sum to at most the destination precision (such as when the two arguments
are sign (or zero) extended from types with half the precision), then
overflow happens iff one argument is negative and the other argument is
positive (not zero): even if both have their maximum possible values, the
maximum is still representable (e.g. for char * char -> unsigned short
0x7f * 0x7f = 0x3f01 and for char * unsigned char -> unsigned short
0x7f * 0xffU = 0x7e81), and as the result is unsigned, all negative
results do overflow, but all of them would be representable if the result
were considered signed, since they all have the MSB set.  So it is more
efficient to just do the normal multiplication in that case and compare
the result, treated as a signed value, against 0; if it is smaller,
overflow happened.
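
The f9 case from the new testcase illustrates this: the minimum precisions
of the two sign-extended arguments sum to 16 bits, exactly the result
precision, so the expansion reduces to a multiplication and a sign test:

  signed char
  f9 (signed char a, signed char b)
  {
    unsigned short c;
    /* a * b is in [-0x3f80, 0x4000] and thus always fits in 16 bits
       when treated as signed; it overflows unsigned short iff it is
       negative, i.e. ovf = (short) (a * b) < 0.  */
    return __builtin_mul_overflow (a, b, &c);
  }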

Finally, the get_min_precision change improves the char-to-short handling.
There the IL contains
  _2 = (int) arg_1(D);
i.e. the C integer promotion of a char or unsigned char argument, and the
caller adds a NOP_EXPR cast to short or unsigned short.  get_min_precision
used to punt on that narrowing cast, as it handled only widening casts,
but narrowing casts can be handled fine too, by recursing on the operand
of the narrowing cast and using the result only if it ends up with a
smaller minimal precision, which guarantees the sign bits (or zero bits)
are duplicated both in the bits above the narrowing conversion and in at
least one bit below it.
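
For the f9 arguments above the relevant chain is thus roughly (a sketch,
not an exact dump):

  _2 = (int) arg_1(D);   /* C promotion of the signed char argument */
  _3 = (short) _2;       /* narrowing NOP_EXPR added by the caller */

and recursing through the narrowing cast to arg_1(D) yields a minimum
precision of 8 for _3, well below the 16 bits of short.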

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-11-25  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/95862
	* internal-fn.c (get_min_precision): For narrowing conversion, recurse
	on the operand and if the operand precision is smaller than the
	current one, return that smaller precision.
	(expand_mul_overflow): For s1 * u2 -> ur and s1 * s2 -> ur cases
	if the sum of minimum precisions of both operands is smaller or equal
	to the result precision, just perform normal multiplication and
	set overflow to the sign bit of the multiplication result.  For
	u1 * u2 -> sr if both arguments have the MSB known zero, use
	normal s1 * s2 -> sr expansion.

	* gcc.dg/builtin-artih-overflow-5.c: New test.


	Jakub

Comments

Richard Biener Nov. 25, 2020, 2:40 p.m. UTC | #1
On Wed, 25 Nov 2020, Jakub Jelinek wrote:

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.


Patch

--- gcc/internal-fn.c.jj	2020-10-13 09:25:28.444909391 +0200
+++ gcc/internal-fn.c	2020-11-24 15:54:32.684762538 +0100
@@ -553,6 +553,16 @@  get_min_precision (tree arg, signop sign
       if (++cnt > 30)
 	return prec + (orig_sign != sign);
     }
+  if (CONVERT_EXPR_P (arg)
+      && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))
+      && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) > prec)
+    {
+      /* We have e.g. (unsigned short) y_2 where int y_2 = (int) x_1(D);
+	 If y_2's min precision is smaller than prec, return that.  */
+      int oprec = get_min_precision (TREE_OPERAND (arg, 0), sign);
+      if (oprec < prec)
+	return oprec + (orig_sign != sign);
+    }
   if (TREE_CODE (arg) != SSA_NAME)
     return prec + (orig_sign != sign);
   wide_int arg_min, arg_max;
@@ -1357,6 +1367,37 @@  expand_mul_overflow (location_t loc, tre
 				   NULL, done_label, profile_probability::very_likely ());
 	  goto do_error_label;
 	case 3:
+	  if (get_min_precision (arg1, UNSIGNED)
+	      + get_min_precision (arg0, SIGNED) <= GET_MODE_PRECISION (mode))
+	    {
+	      /* If the first operand is sign extended from narrower type, the
+		 second operand is zero extended from narrower type and
+		 the sum of the two precisions is smaller or equal to the
+		 result precision: if the first argument is at runtime
+		 non-negative, maximum result will be 0x7e81 or 0x7f..fe80..01
+		 and there will be no overflow, if the first argument is
+		 negative and the second argument zero, the result will be
+		 0 and there will be no overflow, if the first argument is
+		 negative and the second argument positive, the result when
+		 treated as signed will be negative (minimum -0x7f80 or
+		 -0x7f..f80..0) and there will always be overflow.  So, do
+		 res = (U) (s1 * u2)
+		 ovf = (S) res < 0  */
+	      struct separate_ops ops;
+	      ops.code = MULT_EXPR;
+	      ops.type
+		= build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
+						  1);
+	      ops.op0 = make_tree (ops.type, op0);
+	      ops.op1 = make_tree (ops.type, op1);
+	      ops.op2 = NULL_TREE;
+	      ops.location = loc;
+	      res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
+	      do_compare_rtx_and_jump (res, const0_rtx, GE, false,
+				       mode, NULL_RTX, NULL, done_label,
+				       profile_probability::very_likely ());
+	      goto do_error_label;
+	    }
 	  rtx_code_label *do_main_label;
 	  do_main_label = gen_label_rtx ();
 	  do_compare_rtx_and_jump (op0, const0_rtx, GE, false, mode, NULL_RTX,
@@ -1374,7 +1415,16 @@  expand_mul_overflow (location_t loc, tre
   /* u1 * u2 -> sr  */
   if (uns0_p && uns1_p && !unsr_p)
     {
-      uns = true;
+      if ((pos_neg0 | pos_neg1) == 1)
+	{
+	  /* If both arguments are zero extended from narrower types,
+	     the MSB will be clear on both and so we can pretend it is
+	     a normal s1 * s2 -> sr multiplication.  */
+	  uns0_p = false;
+	  uns1_p = false;
+	}
+      else
+	uns = true;
       /* Rest of handling of this case after res is computed.  */
       goto do_main;
     }
@@ -1455,6 +1505,37 @@  expand_mul_overflow (location_t loc, tre
 				       profile_probability::very_likely ());
 	      goto do_error_label;
 	    }
+	  if (get_min_precision (arg0, SIGNED)
+	      + get_min_precision (arg1, SIGNED) <= GET_MODE_PRECISION (mode))
+	    {
+	      /* If both operands are sign extended from narrower types and
+		 the sum of the two precisions is smaller or equal to the
+		 result precision: if both arguments are at runtime
+		 non-negative, maximum result will be 0x3f01 or 0x3f..f0..01
+		 and there will be no overflow, if both arguments are negative,
+		 maximum result will be 0x40..00 and there will be no overflow
+		 either, if one argument is positive and the other argument
+		 negative, the result when treated as signed will be negative
+		 and there will always be overflow, and if one argument is
+		 zero and the other negative the result will be zero and no
+		 overflow.  So, do
+		 res = (U) (s1 * s2)
+		 ovf = (S) res < 0  */
+	      struct separate_ops ops;
+	      ops.code = MULT_EXPR;
+	      ops.type
+		= build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
+						  1);
+	      ops.op0 = make_tree (ops.type, op0);
+	      ops.op1 = make_tree (ops.type, op1);
+	      ops.op2 = NULL_TREE;
+	      ops.location = loc;
+	      res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
+	      do_compare_rtx_and_jump (res, const0_rtx, GE, false,
+				       mode, NULL_RTX, NULL, done_label,
+				       profile_probability::very_likely ());
+	      goto do_error_label;
+	    }
 	  /* The general case, do all the needed comparisons at runtime.  */
 	  rtx_code_label *do_main_label, *after_negate_label;
 	  rtx rop0, rop1;
--- gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c.jj	2020-11-24 16:13:49.482793354 +0100
+++ gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c	2020-11-24 16:13:44.980843796 +0100
@@ -0,0 +1,87 @@ 
+/* PR rtl-optimization/95862 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+f1 (int a, int b)
+{
+  unsigned long long c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+int
+f2 (int a, unsigned b)
+{
+  unsigned long long c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+int
+f3 (unsigned a, unsigned b)
+{
+  long long c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+int
+f4 (int a, unsigned b)
+{
+  long long c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+short
+f5 (short a, short b)
+{
+  unsigned c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+short
+f6 (short a, unsigned short b)
+{
+  unsigned c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+short
+f7 (unsigned short a, unsigned short b)
+{
+  int c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+short
+f8 (short a, unsigned short b)
+{
+  int c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+signed char
+f9 (signed char a, signed char b)
+{
+  unsigned short c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+signed char
+f10 (signed char a, unsigned char b)
+{
+  unsigned short c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+signed char
+f11 (unsigned char a, unsigned char b)
+{
+  short c;
+  return __builtin_mul_overflow (a, b, &c);
+}
+
+signed char
+f12 (signed char a, unsigned char b)
+{
+  short c;
+  return __builtin_mul_overflow (a, b, &c);
+}