diff mbox

ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

Message ID 20161104160551.GV5939@redhat.com
State New
Headers show

Commit Message

Marek Polacek Nov. 4, 2016, 4:05 p.m. UTC
This is a similar case to PR sanitizer/70342.  Here, we were generating expression
in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, then 
UBSAN_NULL <SAVE_EXPR <>>, and then COMPOUND_EXPR of these two and so on.

On this testcase we were instrumention CALL_EXPR that is in fact operator<<.  I
think those always return a reference, so it cannot be NULL, so there's no
point in instrumenting those?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-11-04  Marek Polacek  <polacek@redhat.com>

	PR sanitizer/78208
	* cp-gimplify.c (cp_genericize_r): Don't instrument
	CALL_EXPR_OPERATOR_SYNTAX.

	* g++.dg/ubsan/null-8.C: New.


	Marek

Comments

Jakub Jelinek Nov. 4, 2016, 4:16 p.m. UTC | #1
On Fri, Nov 04, 2016 at 05:05:51PM +0100, Marek Polacek wrote:
> This is a similar case to PR sanitizer/70342.  Here, we were generating expression
> in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, then 
> UBSAN_NULL <SAVE_EXPR <>>, and then COMPOUND_EXPR of these two and so on.
> 
> On this testcase we were instrumention CALL_EXPR that is in fact operator<<.  I
> think those always return a reference, so it cannot be NULL, so there's no
> point in instrumenting those?

How do you know what is the return type of a user defined overloaded
operator?
Even when it returns a reference, I thought the point of -fsanitize=null was
for all references to verify their addresses are non-null.

I must say I don't understand the issue, if it is the same SAVE_EXPR in both
lhs of COMPOUND_EXPR and UBSAN_NULL argument, shouldn't cp_genericize_r use
of hash table to avoid walking the same tree multiple times avoid the
exponential compile time/memory?

> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2016-11-04  Marek Polacek  <polacek@redhat.com>
> 
> 	PR sanitizer/78208
> 	* cp-gimplify.c (cp_genericize_r): Don't instrument
> 	CALL_EXPR_OPERATOR_SYNTAX.
> 
> 	* g++.dg/ubsan/null-8.C: New.
> 
> diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
> index 9b9b511..f39e9d5 100644
> --- gcc/cp/cp-gimplify.c
> +++ gcc/cp/cp-gimplify.c
> @@ -1495,7 +1495,8 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
>  		= TREE_CODE (fn) == ADDR_EXPR
>  		  && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL
>  		  && DECL_CONSTRUCTOR_P (TREE_OPERAND (fn, 0));
> -	      if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
> +	      if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT)
> +		  && !CALL_EXPR_OPERATOR_SYNTAX (stmt))
>  		ubsan_maybe_instrument_member_call (stmt, is_ctor);
>  	      if ((flag_sanitize & SANITIZE_VPTR) && !is_ctor)
>  		cp_ubsan_maybe_instrument_member_call (stmt);

	Jakub
Jason Merrill Nov. 4, 2016, 5:03 p.m. UTC | #2
On Fri, Nov 4, 2016 at 12:16 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 04, 2016 at 05:05:51PM +0100, Marek Polacek wrote:
>> This is a similar case to PR sanitizer/70342.  Here, we were generating expression
>> in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, then
>> UBSAN_NULL <SAVE_EXPR <>>, and then COMPOUND_EXPR of these two and so on.
>>
>> On this testcase we were instrumention CALL_EXPR that is in fact operator<<.  I
>> think those always return a reference, so it cannot be NULL, so there's no
>> point in instrumenting those?
>
> How do you know what is the return type of a user defined overloaded
> operator?

Indeed, there are no constraints on the return type of overloaded operator<<.

Jason
Marek Polacek Nov. 17, 2016, 12:29 a.m. UTC | #3
On Fri, Nov 04, 2016 at 05:16:00PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 04, 2016 at 05:05:51PM +0100, Marek Polacek wrote:
> > This is a similar case to PR sanitizer/70342.  Here, we were generating expression
> > in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, then 
> > UBSAN_NULL <SAVE_EXPR <>>, and then COMPOUND_EXPR of these two and so on.
> > 
> > On this testcase we were instrumention CALL_EXPR that is in fact operator<<.  I
> > think those always return a reference, so it cannot be NULL, so there's no
> > point in instrumenting those?
> 
> How do you know what is the return type of a user defined overloaded
> operator?
> Even when it returns a reference, I thought the point of -fsanitize=null was
> for all references to verify their addresses are non-null.
> 
> I must say I don't understand the issue, if it is the same SAVE_EXPR in both
> lhs of COMPOUND_EXPR and UBSAN_NULL argument, shouldn't cp_genericize_r use
> of hash table to avoid walking the same tree multiple times avoid the
> exponential compile time/memory?

Sorry.  So consider the following:

class S
{
  virtual void foo () = 0;
};

struct T {
  T &operator << (const char *s);
};

T t;

void
S::foo ()
{
  t << "a" << "b" << "c";
}

Before
1498               if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
1499                 ubsan_maybe_instrument_member_call (stmt, is_ctor);

the stmt will be

T::operator<< (T::operator<< (T::operator<< (&t, "a"), "b"), "c")

after ubsan_maybe_instrument_member_call it will be

T::operator<< (UBSAN_NULL (SAVE_EXPR <T::operator<< (T::operator<< (&t, "a"), "b")>, 4B, 0);, SAVE_EXPR <T::operator<< (T::operator<< (&t, "a"), "b")>;, "c")

and that's what is saved into the hash table.  Then another stmt will be the
inner

T::operator<< (T::operator<< (&t, "a"), "b")

which we instrument and put into the hash table, and so on.  But those
SAVE_EXPRs aren't the same.  So we have a T::operator<< call that has nested
T::operator<< calls and we kind of recursively instrument all of them.

Not sure if I made this any clearer nor if this can be avoided... :(

	Marek
Jakub Jelinek Nov. 17, 2016, 8:10 a.m. UTC | #4
On Wed, Nov 16, 2016 at 04:29:05PM -0800, Marek Polacek wrote:
> Sorry.  So consider the following:
> 
> class S
> {
>   virtual void foo () = 0;
> };
> 
> struct T {
>   T &operator << (const char *s);
> };
> 
> T t;
> 
> void
> S::foo ()
> {
>   t << "a" << "b" << "c";
> }
> 
> Before
> 1498               if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
> 1499                 ubsan_maybe_instrument_member_call (stmt, is_ctor);
> 
> the stmt will be
> 
> T::operator<< (T::operator<< (T::operator<< (&t, "a"), "b"), "c")
> 
> after ubsan_maybe_instrument_member_call it will be
> 
> T::operator<< (UBSAN_NULL (SAVE_EXPR <T::operator<< (T::operator<< (&t, "a"), "b")>, 4B, 0);, SAVE_EXPR <T::operator<< (T::operator<< (&t, "a"), "b")>;, "c")
> 
> and that's what is saved into the hash table.  Then another stmt will be the
> inner
> 
> T::operator<< (T::operator<< (&t, "a"), "b")
> 
> which we instrument and put into the hash table, and so on.  But those
> SAVE_EXPRs aren't the same.  So we have a T::operator<< call that has nested
> T::operator<< calls and we kind of recursively instrument all of them.

But those SAVE_EXPRs are the same.  If you put a breakpoint on the:
489	  if (op)
490	    CALL_EXPR_ARG (stmt, 0) = op;
line 490 in c-ubsan.c, on the above short testcase is called 2 times
(the T<<operator<< (&t, "a") is not instrumented, as the argument is known
not to be NULL), rather than 3 times.
When trying larger testcase like below I count ~ 120 calls (counted by hand,
so it is possible it was the right 119 times).
If this has not been linear, the testcase would be compiling for years.
Yes, there will be 119 SAVE_EXPRs, and when you -fdump-tree-original it,
it will be just insanely huge, but each SAVE_EXPR appears exactly twice
in its containing SAVE_EXPR and the second time cp_genericize_r sees
the SAVE_EXPR, it will do the
1138	  /* Other than invisiref parms, don't walk the same tree twice.  */
1139	  if (p_set->contains (stmt))
1140	    {
1141	      *walk_subtrees = 0;
1142	      return NULL_TREE;
1143	    }
early exit.

class S
{
  virtual void foo () = 0;
};

struct T {
  T &operator << (const char *s);
};

T t;

void
S::foo ()
{
  t << "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
    << "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
    << "s" << "t" << "u" << "v" << "w" << "z"
    << "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
    << "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
    << "s" << "t" << "u" << "v" << "w" << "z"
    << "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
    << "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
    << "s" << "t" << "u" << "v" << "w" << "z"
    << "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
    << "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
    << "s" << "t" << "u" << "v" << "w" << "z"
    << "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
    << "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
    << "s" << "t" << "u" << "v" << "w" << "z";
}

	Jakub
Marek Polacek Nov. 18, 2016, 1:39 a.m. UTC | #5
On Thu, Nov 17, 2016 at 09:10:51AM +0100, Jakub Jelinek wrote:
> On Wed, Nov 16, 2016 at 04:29:05PM -0800, Marek Polacek wrote:
> > Sorry.  So consider the following:
> > 
> > class S
> > {
> >   virtual void foo () = 0;
> > };
> > 
> > struct T {
> >   T &operator << (const char *s);
> > };
> > 
> > T t;
> > 
> > void
> > S::foo ()
> > {
> >   t << "a" << "b" << "c";
> > }
> > 
> > Before
> > 1498               if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
> > 1499                 ubsan_maybe_instrument_member_call (stmt, is_ctor);
> > 
> > the stmt will be
> > 
> > T::operator<< (T::operator<< (T::operator<< (&t, "a"), "b"), "c")
> > 
> > after ubsan_maybe_instrument_member_call it will be
> > 
> > T::operator<< (UBSAN_NULL (SAVE_EXPR <T::operator<< (T::operator<< (&t, "a"), "b")>, 4B, 0);, SAVE_EXPR <T::operator<< (T::operator<< (&t, "a"), "b")>;, "c")
> > 
> > and that's what is saved into the hash table.  Then another stmt will be the
> > inner
> > 
> > T::operator<< (T::operator<< (&t, "a"), "b")
> > 
> > which we instrument and put into the hash table, and so on.  But those
> > SAVE_EXPRs aren't the same.  So we have a T::operator<< call that has nested
> > T::operator<< calls and we kind of recursively instrument all of them.
> 
> But those SAVE_EXPRs are the same.  If you put a breakpoint on the:
> 489	  if (op)
> 490	    CALL_EXPR_ARG (stmt, 0) = op;
> line 490 in c-ubsan.c, on the above short testcase is called 2 times
> (the T<<operator<< (&t, "a") is not instrumented, as the argument is known
> not to be NULL), rather than 3 times.
> When trying larger testcase like below I count ~ 120 calls (counted by hand,
> so it is possible it was the right 119 times).
> If this has not been linear, the testcase would be compiling for years.
> Yes, there will be 119 SAVE_EXPRs, and when you -fdump-tree-original it,
> it will be just insanely huge, but each SAVE_EXPR appears exactly twice
> in its containing SAVE_EXPR and the second time cp_genericize_r sees
> the SAVE_EXPR, it will do the
> 1138	  /* Other than invisiref parms, don't walk the same tree twice.  */
> 1139	  if (p_set->contains (stmt))
> 1140	    {
> 1141	      *walk_subtrees = 0;
> 1142	      return NULL_TREE;
> 1143	    }
> early exit.
 
Clearly I misread things here.  I blame the cold I've caught!

In any case, I'm sorry for wasting your time and I'll just close the PR.

	Marek
Jakub Jelinek Nov. 18, 2016, 8:06 a.m. UTC | #6
On Thu, Nov 17, 2016 at 05:39:57PM -0800, Marek Polacek wrote:
> > Yes, there will be 119 SAVE_EXPRs, and when you -fdump-tree-original it,
> > it will be just insanely huge, but each SAVE_EXPR appears exactly twice
> > in its containing SAVE_EXPR and the second time cp_genericize_r sees
> > the SAVE_EXPR, it will do the
> > 1138	  /* Other than invisiref parms, don't walk the same tree twice.  */
> > 1139	  if (p_set->contains (stmt))
> > 1140	    {
> > 1141	      *walk_subtrees = 0;
> > 1142	      return NULL_TREE;
> > 1143	    }
> > early exit.
>  
> Clearly I misread things here.  I blame the cold I've caught!
> 
> In any case, I'm sorry for wasting your time and I'll just close the PR.

I bet it is a compile-time hog with -fsanitize=undefined
-fdump-tree-original.  So, if anything, we'd need some hack in the SAVE_EXPR
generic printing code, say track the depth of SAVE_EXPRs being printed
concurrently and if we go above a certain level of them (say 10), change
into a different mode where we print the content of SAVE_EXPR just the
first time we encounter it + e.g. the SAVE_EXPR address and then when
encountering the same SAVE_EXPR again, just print the address.  When
the SAVE_EXPR nesting level shrinks below 10, clear the pointer set
of SAVE_EXPRs and resume normal behavior.
-fsanitize=undefined -fdump-tree-gimple is fast.

	Jakub
diff mbox

Patch

diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index 9b9b511..f39e9d5 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -1495,7 +1495,8 @@  cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
 		= TREE_CODE (fn) == ADDR_EXPR
 		  && TREE_CODE (TREE_OPERAND (fn, 0)) == FUNCTION_DECL
 		  && DECL_CONSTRUCTOR_P (TREE_OPERAND (fn, 0));
-	      if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
+	      if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT)
+		  && !CALL_EXPR_OPERATOR_SYNTAX (stmt))
 		ubsan_maybe_instrument_member_call (stmt, is_ctor);
 	      if ((flag_sanitize & SANITIZE_VPTR) && !is_ctor)
 		cp_ubsan_maybe_instrument_member_call (stmt);
diff --git gcc/testsuite/g++.dg/ubsan/null-8.C gcc/testsuite/g++.dg/ubsan/null-8.C
index e69de29..0600b93 100644
--- gcc/testsuite/g++.dg/ubsan/null-8.C
+++ gcc/testsuite/g++.dg/ubsan/null-8.C
@@ -0,0 +1,22 @@ 
+// PR sanitizer/78208
+// { dg-do compile }
+// { dg-options "-fsanitize=null" }
+
+class S
+{
+  virtual void foo () = 0;
+};
+
+struct T {
+  T &operator << (const char *s);
+};
+
+T t;
+
+void
+S::foo ()
+{
+  t << "a" << "b" << "c" << "d" << "e" << "f" << "g" << "h" << "i"
+    << "j" << "k" << "l" << "m" << "n" << "o" << "p" << "q" << "r"
+    << "s" << "t" << "u" << "v" << "w" << "z";
+}