FIx endless match.pd recursion on cst1 + cst2 + cst3 (PR tree-optimization/84334)

Message ID 20180213175129.GS5867@tucnak
State New
Headers show
Series
  • FIx endless match.pd recursion on cst1 + cst2 + cst3 (PR tree-optimization/84334)
Related show

Commit Message

Jakub Jelinek Feb. 13, 2018, 5:51 p.m.
Hi!

On the following testcase, we recurse infinitely, because
we have float re-association enabled, but also rounding-math, so
we try to optimize (cst1 + cst2) + cst3 as (cst2 + cst3) + cst1
but (cst2 + cst3) doesn't simplify and we try again and optimize
it as (cst3 + cst1) + cst2 and then (cst1 + cst2) + cst3 and so on
forever.  If @0 is not a CONSTANT_CLASS_P, there is not a problem,
if it is, the code just checks if we can actually simplify the
operation between cst2 and cst3 into a constant.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-02-13  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/84334
	* match.pd ((A +- CST1) +- CST2 -> A + CST3): If A is
	also a CONSTANT_CLASS_P, only optimize if we can fold the
	operation between CST1 and CST2 into a constant.

	* gcc.dg/pr84334.c: New test.


	Jakub

Comments

Richard Biener Feb. 13, 2018, 6:04 p.m. | #1
On February 13, 2018 6:51:29 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>On the following testcase, we recurse infinitely, because
>we have float re-association enabled, but also rounding-math, so
>we try to optimize (cst1 + cst2) + cst3 as (cst2 + cst3) + cst1
>but (cst2 + cst3) doesn't simplify and we try again and optimize
>it as (cst3 + cst1) + cst2 and then (cst1 + cst2) + cst3 and so on
>forever.  If @0 is not a CONSTANT_CLASS_P, there is not a problem,
>if it is, the code just checks if we can actually simplify the
>operation between cst2 and cst3 into a constant.

Is there a reason to try simplifying at all for constant @0?  I'd rather not try to avoid all the complex code. 

Richard. 

>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
>2018-02-13  Jakub Jelinek  <jakub@redhat.com>
>
>	PR tree-optimization/84334
>	* match.pd ((A +- CST1) +- CST2 -> A + CST3): If A is
>	also a CONSTANT_CLASS_P, only optimize if we can fold the
>	operation between CST1 and CST2 into a constant.
>
>	* gcc.dg/pr84334.c: New test.
>
>--- gcc/match.pd.jj	2018-02-13 09:33:31.000000000 +0100
>+++ gcc/match.pd	2018-02-13 12:14:08.108314686 +0100
>@@ -1733,9 +1733,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> 	       CONSTANT_CLASS_P@2)
>      /* If one of the types wraps, use that one.  */
>      (if (!ANY_INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_WRAPS (type))
>-      (if (outer_op == PLUS_EXPR)
>-       (plus (view_convert @0) (inner_op @2 (view_convert @1)))
>-       (minus (view_convert @0) (neg_inner_op @2 (view_convert @1))))
>+      /* If all 3 captures are CONSTANT_CLASS_P, only optimize if we
>+	 can simplify @2 with @1 into a constant, otherwise we might recurse
>+	 forever.  */
>+      (if (CONSTANT_CLASS_P (@0))
>+       (with { tree cst = fold_unary (VIEW_CONVERT_EXPR, type, @1);
>+	       if (cst && CONSTANT_CLASS_P (cst))
>+		 cst = const_binop (outer_op == PLUS_EXPR
>+				    ? inner_op : neg_inner_op, type,
>+				    @2, cst); }
>+	(if (cst)
>+	 (outer_op (view_convert @0) { cst; })))
>+       (if (outer_op == PLUS_EXPR)
>+	(plus (view_convert @0) (inner_op @2 (view_convert @1)))
>+	(minus (view_convert @0) (neg_inner_op @2 (view_convert @1)))))
>       (if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> 	   || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>        (if (outer_op == PLUS_EXPR)
>--- gcc/testsuite/gcc.dg/pr84334.c.jj	2018-02-13 12:18:12.765463667
>+0100
>+++ gcc/testsuite/gcc.dg/pr84334.c	2018-02-13 11:36:56.019632428 +0100
>@@ -0,0 +1,12 @@
>+/* PR tree-optimization/84334 */
>+/* { dg-do compile } */
>+/* { dg-options "-Ofast -frounding-math" } */
>+
>+float
>+foo (void)
>+{
>+  float a = 9.999999974752427078783512115478515625e-7f;
>+  float b = 1.999999994950485415756702423095703125e-6f;
>+  float c = 4.999999873689375817775726318359375e-6f;
>+  return a + b + c;
>+}
>
>	Jakub
Marc Glisse Feb. 13, 2018, 7:28 p.m. | #2
On Tue, 13 Feb 2018, Richard Biener wrote:

> On February 13, 2018 6:51:29 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>> Hi!
>>
>> On the following testcase, we recurse infinitely, because
>> we have float re-association enabled, but also rounding-math, so
>> we try to optimize (cst1 + cst2) + cst3 as (cst2 + cst3) + cst1
>> but (cst2 + cst3) doesn't simplify and we try again and optimize
>> it as (cst3 + cst1) + cst2 and then (cst1 + cst2) + cst3 and so on
>> forever.  If @0 is not a CONSTANT_CLASS_P, there is not a problem,
>> if it is, the code just checks if we can actually simplify the
>> operation between cst2 and cst3 into a constant.
>
> Is there a reason to try simplifying at all for constant @0?

Yes. cst2+cst3 might simplify (the operation happens to be exact and not 
require rounding), which leaves us with only one addition instead of 2.

On the other hand, mixing -frounding-math with reassociation seems strange 
to me, and likely not worth optimizing for.
Jakub Jelinek Feb. 13, 2018, 8:59 p.m. | #3
On Tue, Feb 13, 2018 at 03:28:25PM -0400, Marc Glisse wrote:
> On Tue, 13 Feb 2018, Richard Biener wrote:
> 
> > On February 13, 2018 6:51:29 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
> > > Hi!
> > > 
> > > On the following testcase, we recurse infinitely, because
> > > we have float re-association enabled, but also rounding-math, so
> > > we try to optimize (cst1 + cst2) + cst3 as (cst2 + cst3) + cst1
> > > but (cst2 + cst3) doesn't simplify and we try again and optimize
> > > it as (cst3 + cst1) + cst2 and then (cst1 + cst2) + cst3 and so on
> > > forever.  If @0 is not a CONSTANT_CLASS_P, there is not a problem,
> > > if it is, the code just checks if we can actually simplify the
> > > operation between cst2 and cst3 into a constant.
> > 
> > Is there a reason to try simplifying at all for constant @0?
> 
> Yes. cst2+cst3 might simplify (the operation happens to be exact and not
> require rounding), which leaves us with only one addition instead of 2.

Yeah, exactly, e.g.

/* { dg-do compile } */
/* { dg-options "-Ofast -frounding-math" } */

float
foo (void)
{
  float a = 9.999999974752427078783512115478515625e-7f;
  float b = 1024.0f;
  float c = 2048.0f;
  return a + b + c;
}

would no longer be optimized into a single addition rather than 2.

	Jakub
Richard Biener Feb. 14, 2018, 11:09 a.m. | #4
On Tue, 13 Feb 2018, Marc Glisse wrote:

> On Tue, 13 Feb 2018, Richard Biener wrote:
> 
> > On February 13, 2018 6:51:29 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com>
> > wrote:
> > > Hi!
> > > 
> > > On the following testcase, we recurse infinitely, because
> > > we have float re-association enabled, but also rounding-math, so
> > > we try to optimize (cst1 + cst2) + cst3 as (cst2 + cst3) + cst1
> > > but (cst2 + cst3) doesn't simplify and we try again and optimize
> > > it as (cst3 + cst1) + cst2 and then (cst1 + cst2) + cst3 and so on
> > > forever.  If @0 is not a CONSTANT_CLASS_P, there is not a problem,
> > > if it is, the code just checks if we can actually simplify the
> > > operation between cst2 and cst3 into a constant.
> > 
> > Is there a reason to try simplifying at all for constant @0?
> 
> Yes. cst2+cst3 might simplify (the operation happens to be exact and not
> require rounding), which leaves us with only one addition instead of 2.
> 
> On the other hand, mixing -frounding-math with reassociation seems strange to
> me, and likely not worth optimizing for.

./cc1 -quiet t.c -O -frounding-math -fassociative-math
cc1: warning: -fassociative-math disabled; other options take precedence

So _maybe_ we should disable these patterns for !flag_associative_math
when dealing with FP?

Richard.
Jakub Jelinek Feb. 14, 2018, 12:29 p.m. | #5
On Wed, Feb 14, 2018 at 12:09:57PM +0100, Richard Biener wrote:
> On Tue, 13 Feb 2018, Marc Glisse wrote:
> 
> > On Tue, 13 Feb 2018, Richard Biener wrote:
> > 
> > > On February 13, 2018 6:51:29 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com>
> > > wrote:
> > > > Hi!
> > > > 
> > > > On the following testcase, we recurse infinitely, because
> > > > we have float re-association enabled, but also rounding-math, so
> > > > we try to optimize (cst1 + cst2) + cst3 as (cst2 + cst3) + cst1
> > > > but (cst2 + cst3) doesn't simplify and we try again and optimize
> > > > it as (cst3 + cst1) + cst2 and then (cst1 + cst2) + cst3 and so on
> > > > forever.  If @0 is not a CONSTANT_CLASS_P, there is not a problem,
> > > > if it is, the code just checks if we can actually simplify the
> > > > operation between cst2 and cst3 into a constant.
> > > 
> > > Is there a reason to try simplifying at all for constant @0?
> > 
> > Yes. cst2+cst3 might simplify (the operation happens to be exact and not
> > require rounding), which leaves us with only one addition instead of 2.
> > 
> > On the other hand, mixing -frounding-math with reassociation seems strange to
> > me, and likely not worth optimizing for.
> 
> ./cc1 -quiet t.c -O -frounding-math -fassociative-math
> cc1: warning: -fassociative-math disabled; other options take precedence

You need
./cc1 -quiet t.c -O -fassociative-math -fno-trapping-math -fno-signed-zeros -frounding-math

> So _maybe_ we should disable these patterns for !flag_associative_math
> when dealing with FP?

We do, this is in block with:
 /* We can't reassociate floating-point unless -fassociative-math
    or fixed-point plus or minus because of saturation to +-Inf.  */
 (if ((!FLOAT_TYPE_P (type) || flag_associative_math)
      && !FIXED_POINT_TYPE_P (type))

But that doesn't mean you can't request associative math and rounding math
at the same time.

	Jakub
Marc Glisse Feb. 14, 2018, 1 p.m. | #6
On Wed, 14 Feb 2018, Richard Biener wrote:

> On Tue, 13 Feb 2018, Marc Glisse wrote:
>
>> On Tue, 13 Feb 2018, Richard Biener wrote:
>>
>>> On February 13, 2018 6:51:29 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com>
>>> wrote:
>>>> Hi!
>>>>
>>>> On the following testcase, we recurse infinitely, because
>>>> we have float re-association enabled, but also rounding-math, so
>>>> we try to optimize (cst1 + cst2) + cst3 as (cst2 + cst3) + cst1
>>>> but (cst2 + cst3) doesn't simplify and we try again and optimize
>>>> it as (cst3 + cst1) + cst2 and then (cst1 + cst2) + cst3 and so on
>>>> forever.  If @0 is not a CONSTANT_CLASS_P, there is not a problem,
>>>> if it is, the code just checks if we can actually simplify the
>>>> operation between cst2 and cst3 into a constant.
>>>
>>> Is there a reason to try simplifying at all for constant @0?
>>
>> Yes. cst2+cst3 might simplify (the operation happens to be exact and not
>> require rounding), which leaves us with only one addition instead of 2.
>>
>> On the other hand, mixing -frounding-math with reassociation seems strange to
>> me, and likely not worth optimizing for.
>
> ./cc1 -quiet t.c -O -frounding-math -fassociative-math
> cc1: warning: -fassociative-math disabled; other options take precedence
>
> So _maybe_ we should disable these patterns for !flag_associative_math
> when dealing with FP?

There is

  (if ((!FLOAT_TYPE_P (type) || flag_associative_math)
       && !FIXED_POINT_TYPE_P (type))

above, which I think covers this transformation.

Patch

--- gcc/match.pd.jj	2018-02-13 09:33:31.000000000 +0100
+++ gcc/match.pd	2018-02-13 12:14:08.108314686 +0100
@@ -1733,9 +1733,20 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	       CONSTANT_CLASS_P@2)
      /* If one of the types wraps, use that one.  */
      (if (!ANY_INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_WRAPS (type))
-      (if (outer_op == PLUS_EXPR)
-       (plus (view_convert @0) (inner_op @2 (view_convert @1)))
-       (minus (view_convert @0) (neg_inner_op @2 (view_convert @1))))
+      /* If all 3 captures are CONSTANT_CLASS_P, only optimize if we
+	 can simplify @2 with @1 into a constant, otherwise we might recurse
+	 forever.  */
+      (if (CONSTANT_CLASS_P (@0))
+       (with { tree cst = fold_unary (VIEW_CONVERT_EXPR, type, @1);
+	       if (cst && CONSTANT_CLASS_P (cst))
+		 cst = const_binop (outer_op == PLUS_EXPR
+				    ? inner_op : neg_inner_op, type,
+				    @2, cst); }
+	(if (cst)
+	 (outer_op (view_convert @0) { cst; })))
+       (if (outer_op == PLUS_EXPR)
+	(plus (view_convert @0) (inner_op @2 (view_convert @1)))
+	(minus (view_convert @0) (neg_inner_op @2 (view_convert @1)))))
       (if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
 	   || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
        (if (outer_op == PLUS_EXPR)
--- gcc/testsuite/gcc.dg/pr84334.c.jj	2018-02-13 12:18:12.765463667 +0100
+++ gcc/testsuite/gcc.dg/pr84334.c	2018-02-13 11:36:56.019632428 +0100
@@ -0,0 +1,12 @@ 
+/* PR tree-optimization/84334 */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -frounding-math" } */
+
+float
+foo (void)
+{
+  float a = 9.999999974752427078783512115478515625e-7f;
+  float b = 1.999999994950485415756702423095703125e-6f;
+  float c = 4.999999873689375817775726318359375e-6f;
+  return a + b + c;
+}