Patchwork Don't always instrument shifts (PR sanitizer/58413)

login
register
mail settings
Submitter Marek Polacek
Date Sept. 13, 2013, 6:01 p.m.
Message ID <20130913180135.GU23899@redhat.com>
Download mbox | patch
Permalink /patch/274833/
State New
Headers show

Comments

Marek Polacek - Sept. 13, 2013, 6:01 p.m.
This is kind of fugly, but don't have anything better at the moment.
The thing is that we were always instrumenting shift expressions, even
when both operands were INTEGER_CSTs and we could prove at compile
time that the expression is well defined.  This causes problems in the
C FE, mainly at places where the integer constant expression are
expected; one glaring example is e.g. the case label value.
So with this patch we don't instrument expression that are safe.
If we cannot prove that the expression does not cause undefined
behavior, we generate COMPOUND_EXPR as before, the FE then errors on
it -- but in this case doing that seems fine.  I'm only concerned about
the error message, perhaps I could do something similar as in the C++
FE, i.e. write a function that gets a COMPOUND_EXPR and says whether
this is an ubsan COMPOUND_EXPR and if it is, we'd instead "case label
does not reduce to an integer constant" issued something along the
lines of "undefined behavior occured".

Ran ubsan testsuite/bootstrap-ubsan on x86_64-linux, ok for trunk?

2013-09-13  Marek Polacek  <polacek@redhat.com>

	PR sanitizer/58413
c-family/
	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
	an expression if we can prove it is correct.

testsuite/
	* c-c++-common/ubsan/shift-4.c: New test.
	* c-c++-common/ubsan/shift-5.c: New test.
	* gcc.dg/ubsan/c-shift-1.c: New test.


	Marek
Joseph S. Myers - Sept. 13, 2013, 7:18 p.m.
On Fri, 13 Sep 2013, Marek Polacek wrote:

> This is kind of fugly, but don't have anything better at the moment.
> 2013-09-13  Marek Polacek  <polacek@redhat.com>
> 
> 	PR sanitizer/58413
> c-family/
> 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> 	an expression if we can prove it is correct.

Shouldn't the conditions used here for an expression being proved correct 
match those for instrumentation, i.e. depend on flag_isoc99 and on 
(cxx_dialect == cxx11 || cxx_dialect == cxx1y)?
Marek Polacek - Sept. 16, 2013, 9:37 a.m.
On Fri, Sep 13, 2013 at 07:18:24PM +0000, Joseph S. Myers wrote:
> On Fri, 13 Sep 2013, Marek Polacek wrote:
> 
> > This is kind of fugly, but don't have anything better at the moment.
> > 2013-09-13  Marek Polacek  <polacek@redhat.com>
> > 
> > 	PR sanitizer/58413
> > c-family/
> > 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> > 	an expression if we can prove it is correct.
> 
> Shouldn't the conditions used here for an expression being proved correct 
> match those for instrumentation, i.e. depend on flag_isoc99 and on 
> (cxx_dialect == cxx11 || cxx_dialect == cxx1y)?

I don't think so: for the unsigned case we could restrict it to C
only, but it doesn't hurt doing it even for C++; in the signed case
we care only about C, but we can't restrict it to flag_isoc99 only,
since we need to prove the correctnes even for ANSI C.

	Marek
Joseph S. Myers - Sept. 16, 2013, 3:59 p.m.
On Mon, 16 Sep 2013, Marek Polacek wrote:

> On Fri, Sep 13, 2013 at 07:18:24PM +0000, Joseph S. Myers wrote:
> > On Fri, 13 Sep 2013, Marek Polacek wrote:
> > 
> > > This is kind of fugly, but don't have anything better at the moment.
> > > 2013-09-13  Marek Polacek  <polacek@redhat.com>
> > > 
> > > 	PR sanitizer/58413
> > > c-family/
> > > 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> > > 	an expression if we can prove it is correct.
> > 
> > Shouldn't the conditions used here for an expression being proved correct 
> > match those for instrumentation, i.e. depend on flag_isoc99 and on 
> > (cxx_dialect == cxx11 || cxx_dialect == cxx1y)?
> 
> I don't think so: for the unsigned case we could restrict it to C
> only, but it doesn't hurt doing it even for C++; in the signed case
> we care only about C, but we can't restrict it to flag_isoc99 only,
> since we need to prove the correctnes even for ANSI C.

I don't understand how this answers my question.

* The following principle applies: for any command-line options, with 
ubsan enabled, if an integer operation with particular (non-constant) 
operands is accepted by the sanitization code at runtime, the same 
operation with the same operand values (and types) as constants should be 
accepted at compile time (and at runtime) in contexts where an integer 
constant expression is required.  Does this patch make the compiler meet 
this principle, for all the different command-line options that vary what 
is accepted at runtime?

* The following principle also applies: for any command-line options, with 
ubsan enabled, if an integer operation with particular (non-constant) 
operands is rejected by the sanitization code at runtime, the same 
operation with the same operand values (and types) as constants should be 
rejected at compile time (or at runtime) in contexts where an integer 
constant expression is required.  Does this patch make the compiler meet 
this principle, for all the different command-line options that vary what 
is accepted at runtime?
Jakub Jelinek - Sept. 16, 2013, 6:35 p.m.
On Fri, Sep 13, 2013 at 08:01:36PM +0200, Marek Polacek wrote:
> 2013-09-13  Marek Polacek  <polacek@redhat.com>
> 
> 	PR sanitizer/58413
> c-family/
> 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> 	an expression if we can prove it is correct.
> 
> testsuite/
> 	* c-c++-common/ubsan/shift-4.c: New test.
> 	* c-c++-common/ubsan/shift-5.c: New test.
> 	* gcc.dg/ubsan/c-shift-1.c: New test.
> 
> --- gcc/c-family/c-ubsan.c.mp3	2013-09-13 19:19:55.410925155 +0200
> +++ gcc/c-family/c-ubsan.c	2013-09-13 19:20:16.766006242 +0200
> @@ -104,6 +104,40 @@ ubsan_instrument_shift (location_t loc,
>    tree uprecm1 = build_int_cst (op1_utype, op0_prec - 1);
>    tree precm1 = build_int_cst (type1, op0_prec - 1);
>  
> +  /* If OP0 is of an unsigned type, we may prove that OP1 is not
> +     greater than UPRECM1 (and not negative); in that case we can
> +     bail out early.  */
> +  if (TYPE_UNSIGNED (type0)
> +      && TREE_CODE (op1) == INTEGER_CST
> +      && tree_int_cst_sgn (op1) != -1
> +      && !tree_int_cst_lt (uprecm1, op1))
> +    return NULL_TREE;
> +
> +  /* Even when OP0 is of a signed type, we may prove that there's
> +     nothing wrong with the shift if both operands are INTEGER_CSTs
> +     and wouldn't trigger UB.  We do this only for C.
> +     XXX Should we treat ANSI C specially wrt 1 << 31?  */
> +  if (TREE_CODE (op0) == INTEGER_CST
> +      && TREE_CODE (op1) == INTEGER_CST
> +      && !TYPE_UNSIGNED (type0)
> +      && tree_int_cst_sgn (op1) != -1
> +      && !tree_int_cst_lt (uprecm1, op1)
> +      && !c_dialect_cxx ())
> +    {
> +      /* If this is a right shift, we can return now.  */
> +      if (code == RSHIFT_EXPR)
> +        return NULL_TREE;
> +
> +      /* Otherwise see comment below.  */
> +      tree x = fold_convert_loc (loc, unsigned_type_for (type0), op0);
> +      x = fold_build2 (RSHIFT_EXPR, TREE_TYPE (x), x,
> +		       fold_build2 (MINUS_EXPR, TREE_TYPE (precm1), precm1,
> +				    op1));
> +      if (integer_zerop (x))
> +        return NULL_TREE;
> +    }
> +
> +  /* Ok, we have to do the instrumentation.  */

I'd say the above is going to be a maintainance nightmare, with all the code
duplication.  And you are certainly going to miss cases that way,
e.g.
void
foo (void)
{
  int A[-2 / -1] = {};
}

I'd say instead of adding all this, you should just at the right spot insert
if (integer_zerop (t)) return NULL_TREE; or similar.

For shift instrumentation, I guess you could add
if (integer_zerop (t) && (tt == NULL_TREE || integer_zerop (tt)))
  return NULL_TREE;
right before:
  t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), op0, t);

> +/* PR sanitizer/58413 */
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=shift -w" } */
> +
> +int x = 7;
> +int
> +main (void)
> +{
> +  /* All of the following should pass.  */
> +  int A[128 >> 5] = {};
> +  int B[128 << 5] = {};
> +
> +  static int e =
> +    ((int)
> +     (0x00000000 | ((31 & ((1 << (4)) - 1)) << (((15) + 6) + 4)) |
> +      ((0) << ((15) + 6)) | ((0) << (15))));

This relies on int32plus, so needs to be /* { dg-do run { target int32plus } } */
> --- gcc/testsuite/c-c++-common/ubsan/shift-5.c.mp3	2013-09-13 18:25:06.195847144 +0200
> +++ gcc/testsuite/c-c++-common/ubsan/shift-5.c	2013-09-13 19:16:38.990211229 +0200
> @@ -0,0 +1,21 @@
> +/* { dg-do compile} */
> +/* { dg-options "-fsanitize=shift -w" } */
> +/* { dg-shouldfail "ubsan" } */
> +
> +int x;
> +int
> +main (void)
> +{
> +  /* None of the following should pass.  */
> +  switch (x)
> +    {
> +    case 1 >> -1:	/* { dg-error "" } */
> +    case -1 >> -1:	/* { dg-error "" } */
> +    case 1 << -1:	/* { dg-error "" } */
> +    case -1 << -1:	/* { dg-error "" } */
> +    case -1 >> 200:	/* { dg-error "" } */
> +    case 1 << 200:	/* { dg-error "" } */

Can't you fill in the error you are expecting, or is the problem
that the wording is very different between C and C++?

	Jakub
Marek Polacek - Sept. 17, 2013, 12:41 p.m.
On Mon, Sep 16, 2013 at 03:59:12PM +0000, Joseph S. Myers wrote:
> On Mon, 16 Sep 2013, Marek Polacek wrote:
> 
> > On Fri, Sep 13, 2013 at 07:18:24PM +0000, Joseph S. Myers wrote:
> > > On Fri, 13 Sep 2013, Marek Polacek wrote:
> > > 
> > > > This is kind of fugly, but don't have anything better at the moment.
> > > > 2013-09-13  Marek Polacek  <polacek@redhat.com>
> > > > 
> > > > 	PR sanitizer/58413
> > > > c-family/
> > > > 	* c-ubsan.c (ubsan_instrument_shift): Don't instrument
> > > > 	an expression if we can prove it is correct.
> > > 
> > > Shouldn't the conditions used here for an expression being proved correct 
> > > match those for instrumentation, i.e. depend on flag_isoc99 and on 
> > > (cxx_dialect == cxx11 || cxx_dialect == cxx1y)?
> > 
> > I don't think so: for the unsigned case we could restrict it to C
> > only, but it doesn't hurt doing it even for C++; in the signed case
> > we care only about C, but we can't restrict it to flag_isoc99 only,
> > since we need to prove the correctnes even for ANSI C.
> 
> I don't understand how this answers my question.

I'm sorry.

Please disregard the original (ugly) patch, the folloing applies to
the new (pretty) patch
<http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01283.html>.

> * The following principle applies: for any command-line options, with 
> ubsan enabled, if an integer operation with particular (non-constant) 
> operands is accepted by the sanitization code at runtime, the same 
> operation with the same operand values (and types) as constants should be 
> accepted at compile time (and at runtime) in contexts where an integer 
> constant expression is required.  Does this patch make the compiler meet 
> this principle, for all the different command-line options that vary what 
> is accepted at runtime?

I believe so.  E.g.

int i = 4, j = 3, k;
k = i << j;

is ok, thus the following is ok as well

case (4 << 3) (for C++/C with various -std=*).
 
> * The following principle also applies: for any command-line options, with 
> ubsan enabled, if an integer operation with particular (non-constant) 
> operands is rejected by the sanitization code at runtime, the same 
> operation with the same operand values (and types) as constants should be 
> rejected at compile time (or at runtime) in contexts where an integer 
> constant expression is required.  Does this patch make the compiler meet 
> this principle, for all the different command-line options that vary what 
> is accepted at runtime?

And I think this applies as well.  At runtime we reject e.g.
int i = 1, j = 120, k;
k = i << j;

and at compile-time we reject

  enum e { 
    red = 0 << 120,
  };

	Marek

Patch

--- gcc/c-family/c-ubsan.c.mp3	2013-09-13 19:19:55.410925155 +0200
+++ gcc/c-family/c-ubsan.c	2013-09-13 19:20:16.766006242 +0200
@@ -104,6 +104,40 @@  ubsan_instrument_shift (location_t loc,
   tree uprecm1 = build_int_cst (op1_utype, op0_prec - 1);
   tree precm1 = build_int_cst (type1, op0_prec - 1);
 
+  /* If OP0 is of an unsigned type, we may prove that OP1 is not
+     greater than UPRECM1 (and not negative); in that case we can
+     bail out early.  */
+  if (TYPE_UNSIGNED (type0)
+      && TREE_CODE (op1) == INTEGER_CST
+      && tree_int_cst_sgn (op1) != -1
+      && !tree_int_cst_lt (uprecm1, op1))
+    return NULL_TREE;
+
+  /* Even when OP0 is of a signed type, we may prove that there's
+     nothing wrong with the shift if both operands are INTEGER_CSTs
+     and wouldn't trigger UB.  We do this only for C.
+     XXX Should we treat ANSI C specially wrt 1 << 31?  */
+  if (TREE_CODE (op0) == INTEGER_CST
+      && TREE_CODE (op1) == INTEGER_CST
+      && !TYPE_UNSIGNED (type0)
+      && tree_int_cst_sgn (op1) != -1
+      && !tree_int_cst_lt (uprecm1, op1)
+      && !c_dialect_cxx ())
+    {
+      /* If this is a right shift, we can return now.  */
+      if (code == RSHIFT_EXPR)
+        return NULL_TREE;
+
+      /* Otherwise see comment below.  */
+      tree x = fold_convert_loc (loc, unsigned_type_for (type0), op0);
+      x = fold_build2 (RSHIFT_EXPR, TREE_TYPE (x), x,
+		       fold_build2 (MINUS_EXPR, TREE_TYPE (precm1), precm1,
+				    op1));
+      if (integer_zerop (x))
+        return NULL_TREE;
+    }
+
+  /* Ok, we have to do the instrumentation.  */
   t = fold_convert_loc (loc, op1_utype, op1);
   t = fold_build2 (GT_EXPR, boolean_type_node, t, uprecm1);
 
@@ -125,7 +159,7 @@  ubsan_instrument_shift (location_t loc,
      x < 0 || ((unsigned) x >> (precm1 - y))
      if > 1, is undefined.  */
   if (code == LSHIFT_EXPR
-      && !TYPE_UNSIGNED (TREE_TYPE (op0))
+      && !TYPE_UNSIGNED (type0)
       && (cxx_dialect == cxx11 || cxx_dialect == cxx1y))
     {
       tree x = fold_build2 (MINUS_EXPR, integer_type_node, precm1, op1);
--- gcc/testsuite/c-c++-common/ubsan/shift-4.c.mp3	2013-09-13 18:25:02.294833062 +0200
+++ gcc/testsuite/c-c++-common/ubsan/shift-4.c	2013-09-13 18:43:56.171046915 +0200
@@ -0,0 +1,30 @@ 
+/* PR sanitizer/58413 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=shift -w" } */
+
+int x = 7;
+int
+main (void)
+{
+  /* All of the following should pass.  */
+  int A[128 >> 5] = {};
+  int B[128 << 5] = {};
+
+  static int e =
+    ((int)
+     (0x00000000 | ((31 & ((1 << (4)) - 1)) << (((15) + 6) + 4)) |
+      ((0) << ((15) + 6)) | ((0) << (15))));
+
+  if (e != 503316480)
+    __builtin_abort ();
+
+  switch (x)
+    {
+    case 1 >> 4:
+    case 1 << 4:
+    case 128 << (4 + 1):
+    case 128 >> (4 + 1):
+      return 1;
+    }
+  return 0;
+}
--- gcc/testsuite/c-c++-common/ubsan/shift-5.c.mp3	2013-09-13 18:25:06.195847144 +0200
+++ gcc/testsuite/c-c++-common/ubsan/shift-5.c	2013-09-13 19:16:38.990211229 +0200
@@ -0,0 +1,21 @@ 
+/* { dg-do compile} */
+/* { dg-options "-fsanitize=shift -w" } */
+/* { dg-shouldfail "ubsan" } */
+
+int x;
+int
+main (void)
+{
+  /* None of the following should pass.  */
+  switch (x)
+    {
+    case 1 >> -1:	/* { dg-error "" } */
+    case -1 >> -1:	/* { dg-error "" } */
+    case 1 << -1:	/* { dg-error "" } */
+    case -1 << -1:	/* { dg-error "" } */
+    case -1 >> 200:	/* { dg-error "" } */
+    case 1 << 200:	/* { dg-error "" } */
+      return 1;
+    }
+  return 0;
+}
--- gcc/testsuite/gcc.dg/ubsan/c-shift-1.c.mp3	2013-09-13 19:01:21.334825808 +0200
+++ gcc/testsuite/gcc.dg/ubsan/c-shift-1.c	2013-09-13 19:01:21.334825808 +0200
@@ -0,0 +1,18 @@ 
+/* { dg-do compile} */
+/* { dg-options "-fsanitize=shift -w" } */
+/* { dg-shouldfail "ubsan" } */
+
+int x;
+int
+main (void)
+{
+  /* None of the following should pass.  */
+  int A[1 >> -1] = {};    /* { dg-error "variable-sized object may not be initialized" } */
+  int B[-1 >> -1] = {};   /* { dg-error "variable-sized object may not be initialized" } */
+  int D[1 << -1] = {};    /* { dg-error "variable-sized object may not be initialized" } */
+  int E[-1 << -1] = {};   /* { dg-error "variable-sized object may not be initialized" } */
+  int F[-1 >> 200] = {};  /* { dg-error "variable-sized object may not be initialized" } */
+  int G[1 << 200] = {};   /* { dg-error "variable-sized object may not be initialized" } */
+
+  return 0;
+}