diff mbox

Don't fold in save_expr (PR sanitizer/80536, PR sanitizer/80386)

Message ID 20170511134951.GI4910@redhat.com
State New
Headers show

Commit Message

Marek Polacek May 11, 2017, 1:49 p.m. UTC
I had an interesting time coming to grips with these two PRs.  But it
essentially comes down to the fold call in save_expr.  With that, we can call
fold() on an expression whose operands weren't folded yet, but that is what
code in fold_binary_loc assumes.  Since fold() is not recursive, we could end
up there with expressions such as
i * ((unsigned long) -0 + 1) * 2
causing various sorts of problems: turning valid code into invalid, infinite
recursion, etc.

It's not clear why save_expr folds.  I did some archeology but all I could
find was r67189 (that's 2003) which had:

-  tree t = expr;
+  tree t = fold (expr);
   tree inner;
 
-  /* Don't fold a COMPONENT_EXPR: if the operand was a CONSTRUCTOR (the
-     only time it will fold), it can cause problems with PLACEHOLDER_EXPRs
-     in Ada.  Moreover, it isn't at all clear why we fold here at all.  */
-  if (TREE_CODE (t) != COMPONENT_REF)
-    t = fold (t);

so it wasn't clear even 14 years ago.  But now we know the call is harmful.

Now, fold() doesn't recurse, but can it happen that it folds something anyway?
And maybe turn the expression into an invariant so that we don't need to wrap
it in a SAVE_EXPR?  Turns out it can, but in the C FE, which either uses
c_save_expr that c_fully_folds, or calls save_expr when in_late_binary_op (so
the operands have already been folded), it's very rare, and can only be triggered
by code such as

void
f (int i)
{
  int (*a)[i];
  int x[sizeof (*a)];
}

so I'm not very worried about that.  For C++, Jakub suggested to add SAVE_EXPR
handling to cp_fold.

Future work: get rid of c_save_expr, only use save_expr, and teach c_fully_fold
how to fold SAVE_EXPR (once).  Although there's this c_wrap_maybe_const
business...

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-05-11  Marek Polacek  <polacek@redhat.com>

	PR sanitizer/80536
	PR sanitizer/80386
	* cp-gimplify.c (cp_fold): Handle SAVE_EXPR.

	* tree.c (save_expr): Don't fold the expression.

	* c-c++-common/ubsan/pr80536.c: New test.
	* g++.dg/ubsan/pr80386.C: New test.


	Marek

Comments

Richard Biener May 12, 2017, 7:32 a.m. UTC | #1
On Thu, May 11, 2017 at 3:49 PM, Marek Polacek <polacek@redhat.com> wrote:
> I had an interesting time coming to grips with these two PRs.  But it
> essentially comes down to the fold call in save_expr.  With that, we can call
> fold() on an expression whose operands weren't folded yet, but that is what
> code in fold_binary_loc assumes.  Since fold() is not recursive, we could end
> up there with expressions such as
> i * ((unsigned long) -0 + 1) * 2
> causing various sorts of problems: turning valid code into invalid, infinite
> recursion, etc.
>
> It's not clear why save_expr folds.  I did some archeology but all I could
> find was r67189 (that's 2003) which had:
>
> -  tree t = expr;
> +  tree t = fold (expr);
>    tree inner;
>
> -  /* Don't fold a COMPONENT_EXPR: if the operand was a CONSTRUCTOR (the
> -     only time it will fold), it can cause problems with PLACEHOLDER_EXPRs
> -     in Ada.  Moreover, it isn't at all clear why we fold here at all.  */
> -  if (TREE_CODE (t) != COMPONENT_REF)
> -    t = fold (t);
>
> so it wasn't clear even 14 years ago.  But now we know the call is harmful.
>
> Now, fold() doesn't recurse, but can it happen that it folds something anyway?
> And maybe turn the expression into an invariant so that we don't need to wrap
> it in a SAVE_EXPR?  Turns out it can, but in the C FE, which either uses
> c_save_expr that c_fully_folds, or calls save_expr when in_late_binary_op (so
> the operands have already been folded), it's very rare, and can only be triggered
> by code such as
>
> void
> f (int i)
> {
>   int (*a)[i];
>   int x[sizeof (*a)];
> }
>
> so I'm not very worried about that.  For C++, Jakub suggested to add SAVE_EXPR
> handling to cp_fold.
>
> Future work: get rid of c_save_expr, only use save_expr, and teach c_fully_fold
> how to fold SAVE_EXPR (once).  Although there's this c_wrap_maybe_const
> business...
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Interesting - I tried this once (removing fold () from save_expr) but
it failed miserably
(don't remember the details).  Maybe the cp/ part fixes it up.

Did you include Ada in testing?

Otherwise the tree.c change is ok.

(yay, one less to go in the attempt to remove fold ())

Thanks!
Richard.

> 2017-05-11  Marek Polacek  <polacek@redhat.com>
>
>         PR sanitizer/80536
>         PR sanitizer/80386
>         * cp-gimplify.c (cp_fold): Handle SAVE_EXPR.
>
>         * tree.c (save_expr): Don't fold the expression.
>
>         * c-c++-common/ubsan/pr80536.c: New test.
>         * g++.dg/ubsan/pr80386.C: New test.
>
> diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
> index de62414..38a8ed4 100644
> --- gcc/cp/cp-gimplify.c
> +++ gcc/cp/cp-gimplify.c
> @@ -2428,6 +2428,15 @@ cp_fold (tree x)
>        x = fold (x);
>        break;
>
> +    case SAVE_EXPR:
> +      /* A SAVE_EXPR might contain e.g. (0 * i) + (0 * j), which, after
> +        folding, evaluates to an invariant.  In that case no need to wrap
> +        this folded tree with a SAVE_EXPR.  */
> +      r = cp_fold (TREE_OPERAND (x, 0));
> +      if (tree_invariant_p (r))
> +       x = r;
> +      break;
> +
>      default:
>        return org_x;
>      }
> diff --git gcc/testsuite/c-c++-common/ubsan/pr80536.c gcc/testsuite/c-c++-common/ubsan/pr80536.c
> index e69de29..23913ad 100644
> --- gcc/testsuite/c-c++-common/ubsan/pr80536.c
> +++ gcc/testsuite/c-c++-common/ubsan/pr80536.c
> @@ -0,0 +1,9 @@
> +/* PR sanitizer/80536 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined" } */
> +
> +int
> +foo (int i)
> +{
> +  return ((i * (unsigned long long) (-0 + 1UL)) * 2) % 1;
> +}
> diff --git gcc/testsuite/g++.dg/ubsan/pr80386.C gcc/testsuite/g++.dg/ubsan/pr80386.C
> index e69de29..60122da 100644
> --- gcc/testsuite/g++.dg/ubsan/pr80386.C
> +++ gcc/testsuite/g++.dg/ubsan/pr80386.C
> @@ -0,0 +1,13 @@
> +// PR sanitizer/80386
> +// { dg-do run }
> +// { dg-options "-fsanitize=undefined -fno-sanitize-recover" }
> +
> +static unsigned long long int i = 13996271126042720493ULL;
> +
> +int
> +main ()
> +{
> +  int r = (((2921 + 0) - short(i)) + 0x7fffffff) >> 0;
> +  asm volatile ("" : "+g" (r));
> +  return 0;
> +}
> diff --git gcc/tree.c gcc/tree.c
> index b76b521..7506725 100644
> --- gcc/tree.c
> +++ gcc/tree.c
> @@ -3337,7 +3337,6 @@ tree_invariant_p (tree t)
>  tree
>  save_expr (tree expr)
>  {
> -  tree t = fold (expr);
>    tree inner;
>
>    /* If the tree evaluates to a constant, then we don't want to hide that
> @@ -3345,33 +3344,32 @@ save_expr (tree expr)
>       However, a read-only object that has side effects cannot be bypassed.
>       Since it is no problem to reevaluate literals, we just return the
>       literal node.  */
> -  inner = skip_simple_arithmetic (t);
> +  inner = skip_simple_arithmetic (expr);
>    if (TREE_CODE (inner) == ERROR_MARK)
>      return inner;
>
>    if (tree_invariant_p_1 (inner))
> -    return t;
> +    return expr;
>
>    /* If INNER contains a PLACEHOLDER_EXPR, we must evaluate it each time, since
>       it means that the size or offset of some field of an object depends on
>       the value within another field.
>
> -     Note that it must not be the case that T contains both a PLACEHOLDER_EXPR
> +     Note that it must not be the case that EXPR contains both a PLACEHOLDER_EXPR
>       and some variable since it would then need to be both evaluated once and
>       evaluated more than once.  Front-ends must assure this case cannot
>       happen by surrounding any such subexpressions in their own SAVE_EXPR
>       and forcing evaluation at the proper time.  */
>    if (contains_placeholder_p (inner))
> -    return t;
> +    return expr;
>
> -  t = build1 (SAVE_EXPR, TREE_TYPE (expr), t);
> -  SET_EXPR_LOCATION (t, EXPR_LOCATION (expr));
> +  expr = build1_loc (EXPR_LOCATION (expr), SAVE_EXPR, TREE_TYPE (expr), expr);
>
>    /* This expression might be placed ahead of a jump to ensure that the
>       value was computed on both sides of the jump.  So make sure it isn't
>       eliminated as dead.  */
> -  TREE_SIDE_EFFECTS (t) = 1;
> -  return t;
> +  TREE_SIDE_EFFECTS (expr) = 1;
> +  return expr;
>  }
>
>  /* Look inside EXPR into any simple arithmetic operations.  Return the
>
>         Marek
Marek Polacek May 12, 2017, 10:09 a.m. UTC | #2
On Fri, May 12, 2017 at 09:32:48AM +0200, Richard Biener wrote:
> On Thu, May 11, 2017 at 3:49 PM, Marek Polacek <polacek@redhat.com> wrote:
> > I had an interesting time coming to grips with these two PRs.  But it
> > essentially comes down to the fold call in save_expr.  With that, we can call
> > fold() on an expression whose operands weren't folded yet, but that is what
> > code in fold_binary_loc assumes.  Since fold() is not recursive, we could end
> > up there with expressions such as
> > i * ((unsigned long) -0 + 1) * 2
> > causing various sorts of problems: turning valid code into invalid, infinite
> > recursion, etc.
> >
> > It's not clear why save_expr folds.  I did some archeology but all I could
> > find was r67189 (that's 2003) which had:
> >
> > -  tree t = expr;
> > +  tree t = fold (expr);
> >    tree inner;
> >
> > -  /* Don't fold a COMPONENT_EXPR: if the operand was a CONSTRUCTOR (the
> > -     only time it will fold), it can cause problems with PLACEHOLDER_EXPRs
> > -     in Ada.  Moreover, it isn't at all clear why we fold here at all.  */
> > -  if (TREE_CODE (t) != COMPONENT_REF)
> > -    t = fold (t);
> >
> > so it wasn't clear even 14 years ago.  But now we know the call is harmful.

I've come to believe this was purely to discover more invariants.

> > Now, fold() doesn't recurse, but can it happen that it folds something anyway?
> > And maybe turn the expression into an invariant so that we don't need to wrap
> > it in a SAVE_EXPR?  Turns out it can, but in the C FE, which either uses
> > c_save_expr that c_fully_folds, or calls save_expr when in_late_binary_op (so
> > the operands have already been folded), it's very rare, and can only be triggered
> > by code such as
> >
> > void
> > f (int i)
> > {
> >   int (*a)[i];
> >   int x[sizeof (*a)];
> > }
> >
> > so I'm not very worried about that.  For C++, Jakub suggested to add SAVE_EXPR
> > handling to cp_fold.
> >
> > Future work: get rid of c_save_expr, only use save_expr, and teach c_fully_fold
> > how to fold SAVE_EXPR (once).  Although there's this c_wrap_maybe_const
> > business...
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> Interesting - I tried this once (removing fold () from save_expr) but
> it failed miserably
> (don't remember the details).  Maybe the cp/ part fixes it up.

I think that must've been before C++ added all that constexpr bits.  The
cp_fold part is just an optimization that triggers very rarely.

> Did you include Ada in testing?

Not before, so I've tested this now with Ada included -- looks fine still.

> Otherwise the tree.c change is ok.

Thanks.  Jason/Joseph, any comments?

> (yay, one less to go in the attempt to remove fold ())

Do we have any low hanging fruit here?  Suppose not...

	Marek
Richard Biener May 12, 2017, 10:20 a.m. UTC | #3
On Fri, May 12, 2017 at 12:09 PM, Marek Polacek <polacek@redhat.com> wrote:
> On Fri, May 12, 2017 at 09:32:48AM +0200, Richard Biener wrote:
>> On Thu, May 11, 2017 at 3:49 PM, Marek Polacek <polacek@redhat.com> wrote:
>> > I had an interesting time coming to grips with these two PRs.  But it
>> > essentially comes down to the fold call in save_expr.  With that, we can call
>> > fold() on an expression whose operands weren't folded yet, but that is what
>> > code in fold_binary_loc assumes.  Since fold() is not recursive, we could end
>> > up there with expressions such as
>> > i * ((unsigned long) -0 + 1) * 2
>> > causing various sorts of problems: turning valid code into invalid, infinite
>> > recursion, etc.
>> >
>> > It's not clear why save_expr folds.  I did some archeology but all I could
>> > find was r67189 (that's 2003) which had:
>> >
>> > -  tree t = expr;
>> > +  tree t = fold (expr);
>> >    tree inner;
>> >
>> > -  /* Don't fold a COMPONENT_EXPR: if the operand was a CONSTRUCTOR (the
>> > -     only time it will fold), it can cause problems with PLACEHOLDER_EXPRs
>> > -     in Ada.  Moreover, it isn't at all clear why we fold here at all.  */
>> > -  if (TREE_CODE (t) != COMPONENT_REF)
>> > -    t = fold (t);
>> >
>> > so it wasn't clear even 14 years ago.  But now we know the call is harmful.
>
> I've come to believe this was purely to discover more invariants.
>
>> > Now, fold() doesn't recurse, but can it happen that it folds something anyway?
>> > And maybe turn the expression into an invariant so that we don't need to wrap
>> > it in a SAVE_EXPR?  Turns out it can, but in the C FE, which either uses
>> > c_save_expr that c_fully_folds, or calls save_expr when in_late_binary_op (so
>> > the operands have already been folded), it's very rare, and can only be triggered
>> > by code such as
>> >
>> > void
>> > f (int i)
>> > {
>> >   int (*a)[i];
>> >   int x[sizeof (*a)];
>> > }
>> >
>> > so I'm not very worried about that.  For C++, Jakub suggested to add SAVE_EXPR
>> > handling to cp_fold.
>> >
>> > Future work: get rid of c_save_expr, only use save_expr, and teach c_fully_fold
>> > how to fold SAVE_EXPR (once).  Although there's this c_wrap_maybe_const
>> > business...
>> >
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>
>> Interesting - I tried this once (removing fold () from save_expr) but
>> it failed miserably
>> (don't remember the details).  Maybe the cp/ part fixes it up.
>
> I think that must've been before C++ added all that constexpr bits.  The
> cp_fold part is just an optimization that triggers very rarely.
>
>> Did you include Ada in testing?
>
> Not before, so I've tested this now with Ada included -- looks fine still.
>
>> Otherwise the tree.c change is ok.
>
> Thanks.  Jason/Joseph, any comments?
>
>> (yay, one less to go in the attempt to remove fold ())
>
> Do we have any low hanging fruit here?  Suppose not...

Factoring out the quaternary cases to better handle

  fold (build4 (code, ...

that happens in quite a few places.  And then there's

        tree t = build3 (BIT_FIELD_REF, currop->type, genop0, op1, op2);
        REF_REVERSE_STORAGE_ORDER (t) = currop->reverse;
        return fold (t);

for the lack of [fold_]buildN getting the REF_REVERSE_STORAGE_ORDER flag ...
(I fully blame Eric for this ;)).

Ok, now starts to be non-low-hanging.

Richard.

>         Marek
Joseph Myers May 12, 2017, 4:55 p.m. UTC | #4
On Fri, 12 May 2017, Marek Polacek wrote:

> > Otherwise the tree.c change is ok.
> 
> Thanks.  Jason/Joseph, any comments?

I don't have any comments on this.  (c_save_expr folds to avoid 
c_fully_fold needing to go inside SAVE_EXPRs.)
Jason Merrill May 16, 2017, 6:41 p.m. UTC | #5
On Fri, May 12, 2017 at 6:09 AM, Marek Polacek <polacek@redhat.com> wrote:
> On Fri, May 12, 2017 at 09:32:48AM +0200, Richard Biener wrote:
>> On Thu, May 11, 2017 at 3:49 PM, Marek Polacek <polacek@redhat.com> wrote:
>> > I had an interesting time coming to grips with these two PRs.  But it
>> > essentially comes down to the fold call in save_expr.  With that, we can call
>> > fold() on an expression whose operands weren't folded yet, but that is what
>> > code in fold_binary_loc assumes.  Since fold() is not recursive, we could end
>> > up there with expressions such as
>> > i * ((unsigned long) -0 + 1) * 2
>> > causing various sorts of problems: turning valid code into invalid, infinite
>> > recursion, etc.
>> >
>> > It's not clear why save_expr folds.  I did some archeology but all I could
>> > find was r67189 (that's 2003) which had:
>> >
>> > -  tree t = expr;
>> > +  tree t = fold (expr);
>> >    tree inner;
>> >
>> > -  /* Don't fold a COMPONENT_EXPR: if the operand was a CONSTRUCTOR (the
>> > -     only time it will fold), it can cause problems with PLACEHOLDER_EXPRs
>> > -     in Ada.  Moreover, it isn't at all clear why we fold here at all.  */
>> > -  if (TREE_CODE (t) != COMPONENT_REF)
>> > -    t = fold (t);
>> >
>> > so it wasn't clear even 14 years ago.  But now we know the call is harmful.
>
> I've come to believe this was purely to discover more invariants.
>
>> > Now, fold() doesn't recurse, but can it happen that it folds something anyway?
>> > And maybe turn the expression into an invariant so that we don't need to wrap
>> > it in a SAVE_EXPR?  Turns out it can, but in the C FE, which either uses
>> > c_save_expr that c_fully_folds, or calls save_expr when in_late_binary_op (so
>> > the operands have already been folded), it's very rare, and can only be triggered
>> > by code such as
>> >
>> > void
>> > f (int i)
>> > {
>> >   int (*a)[i];
>> >   int x[sizeof (*a)];
>> > }
>> >
>> > so I'm not very worried about that.  For C++, Jakub suggested to add SAVE_EXPR
>> > handling to cp_fold.
>> >
>> > Future work: get rid of c_save_expr, only use save_expr, and teach c_fully_fold
>> > how to fold SAVE_EXPR (once).  Although there's this c_wrap_maybe_const
>> > business...
>> >
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>
>> Interesting - I tried this once (removing fold () from save_expr) but
>> it failed miserably
>> (don't remember the details).  Maybe the cp/ part fixes it up.
>
> I think that must've been before C++ added all that constexpr bits.  The
> cp_fold part is just an optimization that triggers very rarely.
>
>> Did you include Ada in testing?
>
> Not before, so I've tested this now with Ada included -- looks fine still.
>
>> Otherwise the tree.c change is ok.
>
> Thanks.  Jason/Joseph, any comments?

Looks good to me.

Jason
diff mbox

Patch

diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index de62414..38a8ed4 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -2428,6 +2428,15 @@  cp_fold (tree x)
       x = fold (x);
       break;
 
+    case SAVE_EXPR:
+      /* A SAVE_EXPR might contain e.g. (0 * i) + (0 * j), which, after
+	 folding, evaluates to an invariant.  In that case no need to wrap
+	 this folded tree with a SAVE_EXPR.  */
+      r = cp_fold (TREE_OPERAND (x, 0));
+      if (tree_invariant_p (r))
+	x = r;
+      break;
+
     default:
       return org_x;
     }
diff --git gcc/testsuite/c-c++-common/ubsan/pr80536.c gcc/testsuite/c-c++-common/ubsan/pr80536.c
index e69de29..23913ad 100644
--- gcc/testsuite/c-c++-common/ubsan/pr80536.c
+++ gcc/testsuite/c-c++-common/ubsan/pr80536.c
@@ -0,0 +1,9 @@ 
+/* PR sanitizer/80536 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined" } */
+
+int
+foo (int i)
+{
+  return ((i * (unsigned long long) (-0 + 1UL)) * 2) % 1;
+}
diff --git gcc/testsuite/g++.dg/ubsan/pr80386.C gcc/testsuite/g++.dg/ubsan/pr80386.C
index e69de29..60122da 100644
--- gcc/testsuite/g++.dg/ubsan/pr80386.C
+++ gcc/testsuite/g++.dg/ubsan/pr80386.C
@@ -0,0 +1,13 @@ 
+// PR sanitizer/80386
+// { dg-do run }
+// { dg-options "-fsanitize=undefined -fno-sanitize-recover" }
+
+static unsigned long long int i = 13996271126042720493ULL;
+
+int
+main ()
+{
+  int r = (((2921 + 0) - short(i)) + 0x7fffffff) >> 0;
+  asm volatile ("" : "+g" (r));
+  return 0;
+}
diff --git gcc/tree.c gcc/tree.c
index b76b521..7506725 100644
--- gcc/tree.c
+++ gcc/tree.c
@@ -3337,7 +3337,6 @@  tree_invariant_p (tree t)
 tree
 save_expr (tree expr)
 {
-  tree t = fold (expr);
   tree inner;
 
   /* If the tree evaluates to a constant, then we don't want to hide that
@@ -3345,33 +3344,32 @@  save_expr (tree expr)
      However, a read-only object that has side effects cannot be bypassed.
      Since it is no problem to reevaluate literals, we just return the
      literal node.  */
-  inner = skip_simple_arithmetic (t);
+  inner = skip_simple_arithmetic (expr);
   if (TREE_CODE (inner) == ERROR_MARK)
     return inner;
 
   if (tree_invariant_p_1 (inner))
-    return t;
+    return expr;
 
   /* If INNER contains a PLACEHOLDER_EXPR, we must evaluate it each time, since
      it means that the size or offset of some field of an object depends on
      the value within another field.
 
-     Note that it must not be the case that T contains both a PLACEHOLDER_EXPR
+     Note that it must not be the case that EXPR contains both a PLACEHOLDER_EXPR
      and some variable since it would then need to be both evaluated once and
      evaluated more than once.  Front-ends must assure this case cannot
      happen by surrounding any such subexpressions in their own SAVE_EXPR
      and forcing evaluation at the proper time.  */
   if (contains_placeholder_p (inner))
-    return t;
+    return expr;
 
-  t = build1 (SAVE_EXPR, TREE_TYPE (expr), t);
-  SET_EXPR_LOCATION (t, EXPR_LOCATION (expr));
+  expr = build1_loc (EXPR_LOCATION (expr), SAVE_EXPR, TREE_TYPE (expr), expr);
 
   /* This expression might be placed ahead of a jump to ensure that the
      value was computed on both sides of the jump.  So make sure it isn't
      eliminated as dead.  */
-  TREE_SIDE_EFFECTS (t) = 1;
-  return t;
+  TREE_SIDE_EFFECTS (expr) = 1;
+  return expr;
 }
 
 /* Look inside EXPR into any simple arithmetic operations.  Return the