
[RFA,PR,tree-optimization/72785] Avoid duplicating blocks with b_c_p or b_o_s calls.

Message ID f406207b-7ae1-eb4c-3ba5-28ffe8e90fc0@redhat.com
State New

Commit Message

Jeff Law Oct. 21, 2016, 9:45 p.m. UTC
As noted in the BZ, jump threading is isolating a path which contains a 
b_c_p call.  The result of the isolation is that on the original path, 
b_c_p continues to return 0 (not-constant), but on the isolated path it 
returns 1 (because the feeding PHI argument is constant).

That in turn causes the isolated path to issue a call to a function that 
would not have been called by the original code.  That violates the 
as-if rule that governs so many of our transformations/optimizations.
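
To make the pattern concrete, here is essentially the reduced testcase
from the BZ (the same one the patch below adds to the testsuite), with
comments describing the failure; note the GNU a ?: c shorthand:

int a, b;
extern int link_error (void);	/* deliberately never defined */

void
by (void)
{
  int c = 1;
  b = a ?: c;
  /* In the original CFG the argument of b_c_p comes from a PHI and is
     not constant, so the whole statement folds away and link_error is
     never referenced.  Once jump threading isolates the a == 0 path,
     b is the constant 1 there, b_c_p folds to 1, and the call to
     link_error survives into the optimized dump.  */
  __builtin_constant_p (b) ? (b ? link_error () : 0) : 0;
}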

I've come to the conclusion that a block with a b_c_p/b_o_s call can not 
be duplicated unless we can prove that the original and the duplicate 
continue to produce the same result -- including after any edge 
redirections to wire in the duplicate block.

This patch addresses the block duplication problem by disallowing 
duplication if the block contains a b_c_p or b_o_s call and by making 
sure that we test can_duplicate_block_p in the old threader (the 
backward threader already has that test).

That's sufficient to resolve this particular BZ.

However, I suspect that we have a deeper problem, namely that any 
transformation which deletes an edge from the CFG can cause a b_c_p 
result to change from non-constant to constant.  It's left as an 
exercise to the reader to produce such a case.

I'm certainly open to someone redefining the semantics of b_c_p or b_o_s 
so that they're not subject to these issues.  But I think the bar for 
acceptance of such a change is fairly high, particularly when this kind 
of change in semantics would allow the optimizers to change code in an 
observable way.


Anyway, attached is the patch which restricts block duplication when the 
block has a b_c_p or b_o_s call.   It's been bootstrapped and regression 
tested on x86_64-linux-gnu.

OK for the trunk?

Jeff

Comments

Richard Biener Oct. 24, 2016, 7:55 a.m. UTC | #1
On Fri, Oct 21, 2016 at 11:45 PM, Jeff Law <law@redhat.com> wrote:
>
> As noted in the BZ, jump threading is isolating a path which contains a
> b_c_p call.  The result of the isolation is that on the original path, b_c_p
> continues to return 0 (not-constant), but on the isolated path it returns 1
> (because the feeding PHI argument is constant).
>
> That in turn causes the isolated path to issue a call to a function that
> not have been called by the original code.  That violates the as-if rule
> that governs so many of our transformations/optimizations.
>
> I've come to the conclusion that a block with a b_c_p/b_o_s call can not
> be duplicated unless we can prove that the original and the duplicate
> continue to produce the same result -- including after any edge
> redirections to wire in the duplicate block.
>
> This patch addresses the block duplication problem by disallowing
> duplication if the block contains a b_c_p or b_o_s call and by making sure
> that we test can_duplicate_block_p in the old threader (the backward
> threader already has that test).
>
> That's sufficient to resolve this particular BZ.
>
> However, I suspect that we have a deeper problem, namely that any
> transformation which deletes an edge from the CFG can cause a b_c_p result
> to change from non-constant to constant.  It's left as an exercise to the
> reader to produce such a case.
>
> I'm certainly open to someone redefining the semantics of b_c_p or b_o_s so
> that they're not subject to these issues.  But I think the bar for
> acceptance of such a change is fairly high, particularly when this kind of
> change in semantics would allow the optimizers to change code in an
> observable way.
>
>
> Anyway, attached is the patch which restricts block duplication when the
> block has a b_c_p or b_o_s call.   It's been bootstrapped and regression
> tested on x86_64-linux-gnu.
>
> OK for the trunk?

Hum.  This means we can no longer, for example, apply loop versioning or
peeling to loops containing either of these calls.  I've not yet tracked
down whether we can still inline functions containing such calls.
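
For instance, a sketch of the kind of loop the changed hook would now
refuse to duplicate (a hypothetical example, not from the PR):

/* With the proposed gimple_can_duplicate_bb_p change, the body of this
   loop contains a b_c_p call and so can no longer be duplicated, which
   blocks versioning, peeling and unrolling of the loop even though
   duplicating the call would be harmless here.  */
void
clear (char *p, int n)
{
  for (int i = 0; i < n; i++)
    p[i] = __builtin_constant_p (n) ? 0 : 1;
}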

That said, we _can_ duplicate blocks with such calls, and I don't think
that for b_c_p we guarantee anything about it (inlining into two different
functions may produce true in one case and false in the other).  Your
testcase may show an interesting "odd" case, but I don't think it's
any different from what we regularly see or _want_ to see.
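
A minimal sketch of that inlining case (hypothetical names):

static inline int
is_const (int n)
{
  return __builtin_constant_p (n);
}

int f (void)  { return is_const (42); }	/* may fold to 1 */
int g (int n) { return is_const (n); }	/* folds to 0 */

Both calls are copies of the same b_c_p, yet they legitimately produce
different answers.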

Same for b_o_s of course.

So I think the bug is no bug.

Thanks,
Richard.

> Jeff
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index fd28129..9d00980 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2016-10-21  Jeff Law  <law@redhat.com>
> +
> +       PR tree-optimization/72785
> +       * tree-cfg.c (gimple_can_duplicate_bb_p): Do not allow duplication
> of
> +       blocks with builtin_constant_p or builtin_object_size calls.
> +       * tree-ssa-threadedge.c: Include cfghooks.h.
> +       (thread_through_normal_block): Test can_duplicate_block_p before
> +       threading through the block.
> +
>  2016-10-21  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>         * ipa-prop.c (ipa_compute_jump_functions_for_edge): Create nonzero
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 09db0f8..79c637f 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,5 +1,8 @@
>  2016-10-21  Jeff Law  <law@redhat.com>
>
> +       PR tree-optimization/72785
> +       * gcc.dg/tree-ssa/pr72785.c: New test
> +
>         * PR tree-optimization/71947
>         * gcc.dg/tree-ssa/pr71947-4.c: Avoid x86 opcode.
>         * gcc.dg/tree-ssa/pr71947-5.c: Likewise.
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c
> new file mode 100644
> index 0000000..17cda22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +
> +
> +int a, b;
> +extern int link_error(void);
> +void by(void) {
> +  int c = 1;
> +  b = a ?: c;
> +  __builtin_constant_p(b) ? b ?link_error() : 0 : 0;
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
> +
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index dfa82aa..f6c7da4 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -5922,8 +5922,25 @@ gimple_split_block_before_cond_jump (basic_block bb)
>  /* Return true if basic_block can be duplicated.  */
>
>  static bool
> -gimple_can_duplicate_bb_p (const_basic_block bb ATTRIBUTE_UNUSED)
> +gimple_can_duplicate_bb_p (const_basic_block bb)
>  {
> +  /* We can not duplicate a block with a call to builtin_constant_p.  */
> +  for (gimple_stmt_iterator gsi = gsi_start_bb (const_cast<basic_block>
> (bb));
> +       !gsi_end_p (gsi);
> +       gsi_next (&gsi))
> +    {
> +      gimple *stmt = gsi_stmt (gsi);
> +
> +      if (gimple_code (stmt) == GIMPLE_CALL)
> +       {
> +         tree callee = gimple_call_fndecl (stmt);
> +         if (callee
> +             && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL
> +             && (DECL_FUNCTION_CODE (callee) == BUILT_IN_CONSTANT_P
> +                 || DECL_FUNCTION_CODE (callee) == BUILT_IN_OBJECT_SIZE))
> +           return false;
> +       }
> +    }
>    return true;
>  }
>
> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
> index 170e456..fee8035 100644
> --- a/gcc/tree-ssa-threadedge.c
> +++ b/gcc/tree-ssa-threadedge.c
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-dom.h"
>  #include "gimple-fold.h"
>  #include "cfganal.h"
> +#include "cfghooks.h"
>
>  /* To avoid code explosion due to jump threading, we limit the
>     number of statements we are going to copy.  This variable
> @@ -1030,6 +1031,11 @@ thread_through_normal_block (edge e,
>                              vec<jump_thread_edge *> *path,
>                              bitmap visited)
>  {
> +  /* If we can't duplicate the block for some reason, then we can't
> +     thread through it.  */
> +  if (!can_duplicate_block_p (e->dest))
> +    return -1;
> +
>    /* We want to record any equivalences created by traversing E.  */
>    if (!handle_dominating_asserts)
>      record_temporary_equivalences (e, const_and_copies, avail_exprs_stack);
>
Jakub Jelinek Oct. 24, 2016, 8:05 a.m. UTC | #2
On Mon, Oct 24, 2016 at 09:55:47AM +0200, Richard Biener wrote:
> Hum.  This means we can no longer, for example, apply loop versioning or
> peeling to loops containing either of these calls.  I've not yet tracked
> down whether we can still inline functions containing such calls.

I agree gimple_can_duplicate_bb_p is too strong for this.

> That said, we _can_ duplicate blocks with such calls, and I don't think
> that for b_c_p we guarantee anything about it (inlining into two different
> functions may produce true in one case and false in the other).  Your
> testcase may show an interesting "odd" case, but I don't think it's
> any different from what we regularly see or _want_ to see.
> 
> Same for b_o_s of course.
> 
> So I think the bug is no bug.

The problem is that it breaks something fairly common.  The Linux kernel
has:
#define ilog2(n)                                \
(                                               \
        __builtin_constant_p(n) ? (             \
                (n) < 1 ? ____ilog2_NaN() :     \
                (n) & (1ULL << 63) ? 63 :       \
                (n) & (1ULL << 62) ? 62 :       \
...
                (n) & (1ULL <<  4) ?  4 :       \
                (n) & (1ULL <<  3) ?  3 :       \
                (n) & (1ULL <<  2) ?  2 :       \
                (n) & (1ULL <<  1) ?  1 :       \
                (n) & (1ULL <<  0) ?  0 :       \
                ____ilog2_NaN()                 \
                                   ) :          \
        (sizeof(n) <= 4) ?                      \
        __ilog2_u32(n) :                        \
        __ilog2_u64(n)                          \
 )

and the assumption that, in the b_c_p-guarded branch (or block), the
corresponding variable really is constant is just too common.
I think what we want is either not to thread if b_c_p appears there, or
to fold it to 0 during the threading.
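
For instance, a caller along these lines (a hypothetical wrapper, with
made-up prototypes for the kernel helpers, assuming the ilog2 macro
above) must never end up referencing ____ilog2_NaN:

/* ____ilog2_NaN is deliberately left undefined: any call that survives
   optimization breaks the link.  The programmer assumes it can only be
   selected when ilog2 is applied to a literal constant.  If jump
   threading isolates a path on which n happens to be the constant 0,
   b_c_p folds to 1 there, the (n) < 1 arm is selected, and the build
   fails to link even though this call site never passes a constant.  */
extern int ____ilog2_NaN (void);
extern unsigned int __ilog2_u32 (unsigned int);
extern unsigned int __ilog2_u64 (unsigned long long);

unsigned int
order (unsigned int n)
{
  return ilog2 (n);	/* expected to lower to __ilog2_u32 (n) */
}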

	Jakub
Richard Biener Oct. 24, 2016, 8:13 a.m. UTC | #3
On Mon, Oct 24, 2016 at 10:05 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Oct 24, 2016 at 09:55:47AM +0200, Richard Biener wrote:
>> Hum.  This means we can no longer, for example, apply loop versioning or
>> peeling to loops containing either of these calls.  I've not yet tracked
>> down whether we can still inline functions containing such calls.
>
> I agree gimple_can_duplicate_bb_p is too strong for this.
>
>> That said, we _can_ duplicate blocks with such calls, and I don't think
>> that for b_c_p we guarantee anything about it (inlining into two different
>> functions may produce true in one case and false in the other).  Your
>> testcase may show an interesting "odd" case, but I don't think it's
>> any different from what we regularly see or _want_ to see.
>>
>> Same for b_o_s of course.
>>
>> So I think the bug is no bug.
>
> The problem is that it breaks something fairly common.  The Linux kernel
> has:
> #define ilog2(n)                                \
> (                                               \
>         __builtin_constant_p(n) ? (             \
>                 (n) < 1 ? ____ilog2_NaN() :     \
>                 (n) & (1ULL << 63) ? 63 :       \
>                 (n) & (1ULL << 62) ? 62 :       \
> ...
>                 (n) & (1ULL <<  4) ?  4 :       \
>                 (n) & (1ULL <<  3) ?  3 :       \
>                 (n) & (1ULL <<  2) ?  2 :       \
>                 (n) & (1ULL <<  1) ?  1 :       \
>                 (n) & (1ULL <<  0) ?  0 :       \
>                 ____ilog2_NaN()                 \
>                                    ) :          \
>         (sizeof(n) <= 4) ?                      \
>         __ilog2_u32(n) :                        \
>         __ilog2_u64(n)                          \
>  )
>
> and the assumption that, in the b_c_p-guarded branch (or block), the
> corresponding variable really is constant is just too common.
> I think what we want is either not to thread if b_c_p appears there, or
> to fold it to 0 during the threading.

I agree we should avoid the situation for the above.  The reduced
testcase is too obfuscated to be worth fixing, though, and for the above
it looks rather like a missed optimization to me ... (do we thread too
small a part of the path?)

Richard.

>         Jakub

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fd28129..9d00980 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@ 
+2016-10-21  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/72785
+	* tree-cfg.c (gimple_can_duplicate_bb_p): Do not allow duplication of
+	blocks with builtin_constant_p or builtin_object_size calls.
+	* tree-ssa-threadedge.c: Include cfghooks.h.
+	(thread_through_normal_block): Test can_duplicate_block_p before
+	threading through the block.
+
 2016-10-21  Kugan Vivekanandarajah  <kuganv@linaro.org>
 
 	* ipa-prop.c (ipa_compute_jump_functions_for_edge): Create nonzero
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 09db0f8..79c637f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@ 
 2016-10-21  Jeff Law  <law@redhat.com>
 
+	PR tree-optimization/72785
+	* gcc.dg/tree-ssa/pr72785.c: New test
+
 	* PR tree-optimization/71947
 	* gcc.dg/tree-ssa/pr71947-4.c: Avoid x86 opcode.
 	* gcc.dg/tree-ssa/pr71947-5.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c b/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c
new file mode 100644
index 0000000..17cda22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+
+int a, b;
+extern int link_error(void);
+void by(void) {
+  int c = 1;
+  b = a ?: c;
+  __builtin_constant_p(b) ? b ?link_error() : 0 : 0;
+}
+
+
+/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
+
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index dfa82aa..f6c7da4 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -5922,8 +5922,25 @@  gimple_split_block_before_cond_jump (basic_block bb)
 /* Return true if basic_block can be duplicated.  */
 
 static bool
-gimple_can_duplicate_bb_p (const_basic_block bb ATTRIBUTE_UNUSED)
+gimple_can_duplicate_bb_p (const_basic_block bb)
 {
+  /* We can not duplicate a block with a call to builtin_constant_p.  */
+  for (gimple_stmt_iterator gsi = gsi_start_bb (const_cast<basic_block> (bb));
+       !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      if (gimple_code (stmt) == GIMPLE_CALL)
+	{
+	  tree callee = gimple_call_fndecl (stmt);
+	  if (callee
+	      && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL
+	      && (DECL_FUNCTION_CODE (callee) == BUILT_IN_CONSTANT_P
+		  || DECL_FUNCTION_CODE (callee) == BUILT_IN_OBJECT_SIZE))
+	    return false;
+	}
+    }
   return true;
 }
 
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 170e456..fee8035 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -37,6 +37,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dom.h"
 #include "gimple-fold.h"
 #include "cfganal.h"
+#include "cfghooks.h"
 
 /* To avoid code explosion due to jump threading, we limit the
    number of statements we are going to copy.  This variable
@@ -1030,6 +1031,11 @@  thread_through_normal_block (edge e,
 			     vec<jump_thread_edge *> *path,
 			     bitmap visited)
 {
+  /* If we can't duplicate the block for some reason, then we can't
+     thread through it.  */
+  if (!can_duplicate_block_p (e->dest))
+    return -1;
+
   /* We want to record any equivalences created by traversing E.  */
   if (!handle_dominating_asserts)
     record_temporary_equivalences (e, const_and_copies, avail_exprs_stack);