diff mbox

Control dependence vs. builtin_unreachable

Message ID CABu31nNHjWr92Pm8HrX_fO7_VFj2W=8XkrLiWgWbuN5aoUZZVg@mail.gmail.com
State New
Headers show

Commit Message

Steven Bosscher Jan. 5, 2013, 8:10 p.m. UTC
On Thu, Jan 3, 2013 at 9:51 PM, Jeff Law wrote:
> On 01/03/2013 12:01 PM, Steven Bosscher wrote:
>>
>> Hello,
>>
>> Consider the following test case:
>>
>> void bar (void);
>> int foo (int b, int c, int d)
>> {
>>    int r = 0;
>>    if (b)
>>      res = b * 2 + 4;
>>    if (c)
>>      {
>>        if (d)
>>          r = res;
>>        else
>>          __builtin_unreachable ();
>>      }
>>    return r;
>> }
>>
>> This is typical for code in GCC itself in places where
>> gcc_unreachable() is used.
>
>
>>
>> The corresponding CFG looks like this:
>>
>>              +-----+
>>              | bb0 |
>>              +-----+
>>                |
>>                |
>>                v
>>              +-----+
>>              | bb2 | -+
>>              +-----+  |
>>                |      |
>>                |      |
>>                v      |
>>              +-----+  |
>>              | bb3 |  |
>>              +-----+  |
>>                |      |
>>                |      |
>>                v      |
>> +-----+     +-----+  |
>> | bb8 | <-- | bb4 | <+
>> +-----+     +-----+
>>    |           |
>>    |           |
>>    |           v
>>    |         +-----+     +-----+
>>    |         | bb5 | --> | bb7 |
>>    |         +-----+     +-----+
>>    |           |
>>    |           |
>>    |           v
>>    |         +-----+
>>    |         | bb6 |
>>    |         +-----+
>>    |           |
>>    |           |
>>    |           v
>>    |         +-----+
>>    +-------> | bb9 |
>>              +-----+
>>                |
>>                |
>>                v
>>              +-----+
>>              | bb1 |
>>              +-----+
>
> Presumably BB7 was created in response to the builtin_unreachable?

Yes. The block only contains the BB_UNREACHABLE call. It is cleaned up
at the end of the GIMPLE passes pipeline, in the fold-all-builtins
pass (most __builtin_unreachable calls are, but not all).


> One
> could argue that an empty dead-end basic block should just be removed and
> the CFG appropriately simplified.

The problem with this, is that __builtin_unreachable actually exposes
optimization opportunities: more const/copy props of "implicit sets"
in the predicate guarding the __builtin_unreachable call, more
optimistic value numbering, etc. It also helps improve maybe-unused
warnings accuracy. So simply removing these "dead ends" in the CFG is
probably not a good idea.


> You might want to look  at a discussion from Oct/Nov 2011 "New pass to
> delete unexecutable paths in the CFG" which touches on some of this stuff.

That's a really interesting discussion! I must have missed it at the time :-)


> It's not 100% the same, but the concept of eliminating edges from the CFG
> which we can never traverse in a conforming program applies to both your
> example and the stuff I was playing with.

I think there is one important difference: In the thread you referred
to, you're removing paths in the CFG that are implicitly not
executable (for some languages anyway), whereas a
__builtin_unreachable call is an explicit marker for "this can never
happen". I think this difference is important:

* The explicit marker may have been put there on purpose (e.g. to get
rid of a false-positive warning). The compiler should respect that. An
implicit unreachable path can be optimized away without regard for the
user's intent.

* The explicit marker should not inhibit optimizations. For an
implicit unreachable path the compiler should be conservative. But for
a __builtin_unreachable call that is the only statement in a basic
block, the compiler should be allowed to work as if the block really
is never reached.

The attached patch implements these ideas. During a tree-CFG cleanup,
basic blocks containing only a __builtin_unreachable call are marked
with a new flag BB_NEVER_REACHED. The flag is used in post-dominance:
A marked block is not considered in the computations of the immediate
post-dominator of the predecessor blocks. The result is a more
optimistic post-dominance tree: Without the patch all predecessors of
these BB_NEVER_REACHED blocks have the EXIT block as their
post-dominator, but with the patch this non-executable path in the CFG
is ignored and the post-dominators are those that would result if the
BB_NEVER_REACHED blocks are not there at all (the BB_NEVER_REACHED
blocks themselves are only post-dominated by the EXIT block).

I've also added a control dependence calculation function. It's not
currently used, but it shows how the BB_NEVER_REACHED flag is used in
this case to avoid the "false" control dependences that I showed in
the graphs in http://gcc.gnu.org/ml/gcc/2013-01/msg00021.html.

Bootstrapped&tested on powerpc64-unknown-linux-gnu.
What do you think of this approach?

Ciao!
Steven
* cfg-flags.def (BB_NEVER_REACHED): New flag on basic_block.
	(BB_SUPERBLOCK): Add FIXME, document that this flag should be removed.
	* tree-cfgcleanup.c (is_only_builtin_unreachable_block): New function.
	(mark_builtin_unreachable_blocks): New function.
	(cleanup_tree_cfg_1): Call mark_builtin_unreachable_blocks at the end
	to set BB_NEVER_REACHED on basic blocks containing only a call to
	__builtin_unreachable.
	* cfgexpand.c (expand_gimple_basic_block): Clear BB_NEVER_REACHED.
	* dominance.c (calc_idoms): Do not mess with edge_iterator internals.
	If a basic block has BB_NEVER_REACHED set, and post-dominators are
	being computed, ignore the block for the computation of the immediate
	post-dominator of its predecessors.

	* basic-block.h (compute_control_dependences): New prototype.
	* cfganal.c (compute_dominance_frontiers_1): Implement dominance
	frontiers of the reverse CFG, aka control dependence.
	(compute_dominance_frontiers): Update for above change.
	(compute_control_dependences): New function.

Comments

Jeff Law Jan. 7, 2013, 7:45 p.m. UTC | #1
On 01/05/2013 01:10 PM, Steven Bosscher wrote:
>>
>> Presumably BB7 was created in response to the builtin_unreachable?
>
> Yes. The block only contains the BB_UNREACHABLE call. It is cleaned up
> at the end of the GIMPLE passes pipeline, in the fold-all-builtins
> pass (most __builtin_unreachable calls are, but not all).
I think if you eliminate the block and the cleanup the CFG 
appropriately, the right thing will just happen.

> The problem with this, is that __builtin_unreachable actually exposes
> optimization opportunities: more const/copy props of "implicit sets"
> in the predicate guarding the __builtin_unreachable call, more
> optimistic value numbering, etc. It also helps improve maybe-unused
> warnings accuracy. So simply removing these "dead ends" in the CFG is
> probably not a good idea.
?!?  By removing the empty unreachable block that's precisely what you 
enable.  The block itself goes away and the branch leading to the block 
is simplified appropriately.  That in turn will create larger basic 
blocks, enabling the const/copy propagations and similar optimizations.

Finally removing unreachable paths was insired by the desire to 
eliminate false positives from maybe-{unused,uninitialized} and similar 
warnings.

I'd be very curious to see the conditions under which removing the empty 
unreachable and appropriate cleanup of the CFG  (including the 
underlying control statement) results in less optimizations and less 
precision in our may-warnings.

>
> I think there is one important difference: In the thread you referred
> to, you're removing paths in the CFG that are implicitly not
> executable (for some languages anyway), whereas a
> __builtin_unreachable call is an explicit marker for "this can never
> happen". I think this difference is important:
I don't see this as an important distinction.  The difference is in the 
implicit "can't happen" case, eliminating the "can't happen" path may 
eliminate a user visible side effect (for example a segfault).

>
> * The explicit marker may have been put there on purpose (e.g. to get
> rid of a false-positive warning). The compiler should respect that. An
> implicit unreachable path can be optimized away without regard for the
> user's intent.
Right, but if done right, eliminating the unreachable path correctly 
should eliminate false positive warnings.

>
> * The explicit marker should not inhibit optimizations. For an
> implicit unreachable path the compiler should be conservative. But for
> a __builtin_unreachable call that is the only statement in a basic
> block, the compiler should be allowed to work as if the block really
> is never reached.
Right.

>
> The attached patch implements these ideas. During a tree-CFG cleanup,
> basic blocks containing only a __builtin_unreachable call are marked
> with a new flag BB_NEVER_REACHED. The flag is used in post-dominance:
> A marked block is not considered in the computations of the immediate
> post-dominator of the predecessor blocks. The result is a more
> optimistic post-dominance tree: Without the patch all predecessors of
> these BB_NEVER_REACHED blocks have the EXIT block as their
> post-dominator, but with the patch this non-executable path in the CFG
> is ignored and the post-dominators are those that would result if the
> BB_NEVER_REACHED blocks are not there at all (the BB_NEVER_REACHED
> blocks themselves are only post-dominated by the EXIT block).
>
> I've also added a control dependence calculation function. It's not
> currently used, but it shows how the BB_NEVER_REACHED flag is used in
> this case to avoid the "false" control dependences that I showed in
> the graphs in http://gcc.gnu.org/ml/gcc/2013-01/msg00021.html.
>
> Bootstrapped&tested on powerpc64-unknown-linux-gnu.
> What do you think of this approach?
Before diving into the patch I think we should figure out why we see 
such different effects of eliminating these paths from the CFG.  Your 
assertion is eliminating them will result in more false positives and 
less optimization while my assertion is the opposite.

Jeff
Steven Bosscher Jan. 7, 2013, 8:42 p.m. UTC | #2
On Mon, Jan 7, 2013 at 8:45 PM, Jeff Law <law@redhat.com> wrote:
> Before diving into the patch I think we should figure out why we see such
> different effects of eliminating these paths from the CFG.  Your assertion
> is eliminating them will result in more false positives and less
> optimization while my assertion is the opposite.

Warnings:

$ cat q.c
#if 0
#define UNREACHABLE() __builtin_unreachable()
#else
#define UNREACHABLE() break
#endif

typedef enum fruits { banana = 0, apple = 1, pear = 2, orange = 3 } fruit;

extern void price_fruit_of_the_day (int);

void
discount_of_the_day (fruit f)
{
  int p, c = (int) f;
  switch (f)
  {
    case banana:
      UNREACHABLE ();
    case apple:
      p = 3 * c + 4;
      break;
    case pear:
    case orange:
      p = 7;
      break;
  }
  price_fruit_of_the_day (p);
}
$ # Compiled with UNREACHABLE defined as break:
$ ./cc1 -quiet -W -Wall -Wextra q.c -fdump-tree-all -O1
q.c: In function 'discount_of_the_day':
q.c:28:26: warning: 'p' may be used uninitialized in this function
[-Wmaybe-uninitialized]
   price_fruit_of_the_day (p);
                          ^
$ # Compiled with UNREACHABLE defined as builtin_unreachable:
$ ./cc1 -quiet -W -Wall -Wextra q.c -fdump-tree-all -O1
$


For optimizations it's a bit more difficult. The situations I'm
looking at, are all sinking related, and I suspect that most missed
optimizations will be of this form:

int foo (int b, int c, int d)
{
  int res, r = 0;
  res = 5 * d + b;
  if (d)
    __builtin_unreachable ();
  if (c)
    r = res;
  return r;
}

The "res =" statement can be sunk into the conditional arm of "if(c)"
and tree-ssa-sink.c is able to do so. But secondary effects are missed
(and test cases for this are hard to reduce to mailing list acceptable
size ;-) because tree-ssa-sink walks the post-dominator tree and, with
the right amount of bad luck, it visits the "res=" statement _before_
the part of the (forward-)dominator tree below "if(d)" because the
basic block containing "res=" has EXIT as its immediate
post-dominator, because that's where __builtin_unreachable() leads to.
The "optimistic" post-dominator of "res=" is the "if(c)" block, of
course, because the builtin_unreachable is a kind-of assert, not
something that can really ever happen.

I already showed how the control dependence graph looks Wrong with
__builtin_unreachable calls in place. Any transformation using the CDG
will also be less effective because of this.

Ciao!
Steven
Richard Biener Jan. 8, 2013, 11:26 a.m. UTC | #3
On Mon, Jan 7, 2013 at 8:45 PM, Jeff Law <law@redhat.com> wrote:
> On 01/05/2013 01:10 PM, Steven Bosscher wrote:
>>>
>>>
>>> Presumably BB7 was created in response to the builtin_unreachable?
>>
>>
>> Yes. The block only contains the BB_UNREACHABLE call. It is cleaned up
>> at the end of the GIMPLE passes pipeline, in the fold-all-builtins
>> pass (most __builtin_unreachable calls are, but not all).
>
> I think if you eliminate the block and the cleanup the CFG appropriately,
> the right thing will just happen.
>
>
>> The problem with this, is that __builtin_unreachable actually exposes
>> optimization opportunities: more const/copy props of "implicit sets"
>> in the predicate guarding the __builtin_unreachable call, more
>> optimistic value numbering, etc. It also helps improve maybe-unused
>> warnings accuracy. So simply removing these "dead ends" in the CFG is
>> probably not a good idea.
>
> ?!?  By removing the empty unreachable block that's precisely what you
> enable.  The block itself goes away and the branch leading to the block is
> simplified appropriately.  That in turn will create larger basic blocks,
> enabling the const/copy propagations and similar optimizations.
>
> Finally removing unreachable paths was insired by the desire to eliminate
> false positives from maybe-{unused,uninitialized} and similar warnings.
>
> I'd be very curious to see the conditions under which removing the empty
> unreachable and appropriate cleanup of the CFG  (including the underlying
> control statement) results in less optimizations and less precision in our
> may-warnings.

The issue is VRP - when you remove unreachable blocks you lose the
conditional statement as it is no longer necessary and thus the predicate
you can derive value-ranges from.

Maybe there are more examples.  I can imagine false positive may-be
warnings then become must-be false positives.

There are clearly (as seen here) missed optimizations because we do
keep the unreachable blocks around.

Note that the whole point of the unreachable () excercise was to be
able to implement an assert () that even with -DNDEBUG leaves the
compiler with the same optimization opportunities (as of value-range
analysis) as with -UNDEBUG.

Richard.
Richard Biener Jan. 8, 2013, 11:32 a.m. UTC | #4
On Sat, Jan 5, 2013 at 9:10 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Thu, Jan 3, 2013 at 9:51 PM, Jeff Law wrote:
>> On 01/03/2013 12:01 PM, Steven Bosscher wrote:
>>>
>>> Hello,
>>>
>>> Consider the following test case:
>>>
>>> void bar (void);
>>> int foo (int b, int c, int d)
>>> {
>>>    int r = 0;
>>>    if (b)
>>>      res = b * 2 + 4;
>>>    if (c)
>>>      {
>>>        if (d)
>>>          r = res;
>>>        else
>>>          __builtin_unreachable ();
>>>      }
>>>    return r;
>>> }
>>>
>>> This is typical for code in GCC itself in places where
>>> gcc_unreachable() is used.
>>
>>
>>>
>>> The corresponding CFG looks like this:
>>>
>>>              +-----+
>>>              | bb0 |
>>>              +-----+
>>>                |
>>>                |
>>>                v
>>>              +-----+
>>>              | bb2 | -+
>>>              +-----+  |
>>>                |      |
>>>                |      |
>>>                v      |
>>>              +-----+  |
>>>              | bb3 |  |
>>>              +-----+  |
>>>                |      |
>>>                |      |
>>>                v      |
>>> +-----+     +-----+  |
>>> | bb8 | <-- | bb4 | <+
>>> +-----+     +-----+
>>>    |           |
>>>    |           |
>>>    |           v
>>>    |         +-----+     +-----+
>>>    |         | bb5 | --> | bb7 |
>>>    |         +-----+     +-----+
>>>    |           |
>>>    |           |
>>>    |           v
>>>    |         +-----+
>>>    |         | bb6 |
>>>    |         +-----+
>>>    |           |
>>>    |           |
>>>    |           v
>>>    |         +-----+
>>>    +-------> | bb9 |
>>>              +-----+
>>>                |
>>>                |
>>>                v
>>>              +-----+
>>>              | bb1 |
>>>              +-----+
>>
>> Presumably BB7 was created in response to the builtin_unreachable?
>
> Yes. The block only contains the BB_UNREACHABLE call. It is cleaned up
> at the end of the GIMPLE passes pipeline, in the fold-all-builtins
> pass (most __builtin_unreachable calls are, but not all).
>
>
>> One
>> could argue that an empty dead-end basic block should just be removed and
>> the CFG appropriately simplified.
>
> The problem with this, is that __builtin_unreachable actually exposes
> optimization opportunities: more const/copy props of "implicit sets"
> in the predicate guarding the __builtin_unreachable call, more
> optimistic value numbering, etc. It also helps improve maybe-unused
> warnings accuracy. So simply removing these "dead ends" in the CFG is
> probably not a good idea.
>
>
>> You might want to look  at a discussion from Oct/Nov 2011 "New pass to
>> delete unexecutable paths in the CFG" which touches on some of this stuff.
>
> That's a really interesting discussion! I must have missed it at the time :-)
>
>
>> It's not 100% the same, but the concept of eliminating edges from the CFG
>> which we can never traverse in a conforming program applies to both your
>> example and the stuff I was playing with.
>
> I think there is one important difference: In the thread you referred
> to, you're removing paths in the CFG that are implicitly not
> executable (for some languages anyway), whereas a
> __builtin_unreachable call is an explicit marker for "this can never
> happen". I think this difference is important:
>
> * The explicit marker may have been put there on purpose (e.g. to get
> rid of a false-positive warning). The compiler should respect that. An
> implicit unreachable path can be optimized away without regard for the
> user's intent.
>
> * The explicit marker should not inhibit optimizations. For an
> implicit unreachable path the compiler should be conservative. But for
> a __builtin_unreachable call that is the only statement in a basic
> block, the compiler should be allowed to work as if the block really
> is never reached.
>
> The attached patch implements these ideas. During a tree-CFG cleanup,
> basic blocks containing only a __builtin_unreachable call are marked
> with a new flag BB_NEVER_REACHED. The flag is used in post-dominance:
> A marked block is not considered in the computations of the immediate
> post-dominator of the predecessor blocks. The result is a more
> optimistic post-dominance tree: Without the patch all predecessors of
> these BB_NEVER_REACHED blocks have the EXIT block as their
> post-dominator, but with the patch this non-executable path in the CFG
> is ignored and the post-dominators are those that would result if the
> BB_NEVER_REACHED blocks are not there at all (the BB_NEVER_REACHED
> blocks themselves are only post-dominated by the EXIT block).
>
> I've also added a control dependence calculation function. It's not
> currently used, but it shows how the BB_NEVER_REACHED flag is used in
> this case to avoid the "false" control dependences that I showed in
> the graphs in http://gcc.gnu.org/ml/gcc/2013-01/msg00021.html.
>
> Bootstrapped&tested on powerpc64-unknown-linux-gnu.
> What do you think of this approach?

Does it handle side-effects on the builtin-unreachable path correctly?

int b;
int a;
extern void foo ();
int main()
{
  if (!a)
   {
     if (!b)
       foo ();
     __builtin_unreachable ();
   }
}

---

void foo () { puts("Hello"); exit(0); }

?  Users are not _required_ to annotate noreturn functions.  The BB with
__builtin_unreachable () is still empty otherwise.

Thanks,
Richard.

> Ciao!
> Steven
Jeff Law Jan. 8, 2013, 4:16 p.m. UTC | #5
On 01/08/2013 04:26 AM, Richard Biener wrote:

>
> The issue is VRP - when you remove unreachable blocks you lose the
> conditional statement as it is no longer necessary and thus the predicate
> you can derive value-ranges from.
Understood.  Perhaps we could eliminate them after the first VRP pass, 
but before the second.  That ought to give us the majority of the 
benefit of seeing the conditional and propagating based on the 
conditional, but also give us the benefit of eliminating the branch 
generating straight-line code.

Clearly it needs more investigation, but I think it's worth exploring.

Jeff
Steven Bosscher Jan. 8, 2013, 5:52 p.m. UTC | #6
On Tue, Jan 8, 2013 at 12:32 PM, Richard Biener wrote:
> Does it handle side-effects on the builtin-unreachable path correctly?
>
> int b;
> int a;
> extern void foo ();
> int main()
> {
>   if (!a)
>    {
>      if (!b)
>        foo ();
>      __builtin_unreachable ();
>    }
> }
>
> ---
>
> void foo () { puts("Hello"); exit(0); }
>
> ?  Users are not _required_ to annotate noreturn functions.  The BB with
> __builtin_unreachable () is still empty otherwise.

Not sure what you're expecting in this case, so guessing...

There is a warning for "reaching end of non-void" blah, but that's
correct on the a!=0 path.
With that if(!a) commented out, I don't have that warning of course.

The code is identical with and without patch. It's another example of
how __builtin_unreachable() helps hide unnecessary warnings.

Ciao!
Steven
Richard Biener Jan. 9, 2013, 8:45 a.m. UTC | #7
On Tue, Jan 8, 2013 at 5:16 PM, Jeff Law <law@redhat.com> wrote:
> On 01/08/2013 04:26 AM, Richard Biener wrote:
>
>>
>> The issue is VRP - when you remove unreachable blocks you lose the
>> conditional statement as it is no longer necessary and thus the predicate
>> you can derive value-ranges from.
>
> Understood.  Perhaps we could eliminate them after the first VRP pass, but
> before the second.  That ought to give us the majority of the benefit of
> seeing the conditional and propagating based on the conditional, but also
> give us the benefit of eliminating the branch generating straight-line code.
>
> Clearly it needs more investigation, but I think it's worth exploring.

Sure - especially if we eventually move to preserve value-range information
across passes.  __builtin_constant_p is a similar thing - it guards one
reachable and one unreachable path but we forcefully remove it only
during the late fold-all-builtins pass.

Richard.

> Jeff
Jeff Law Jan. 9, 2013, 8:28 p.m. UTC | #8
On 01/07/2013 01:42 PM, Steven Bosscher wrote:
> typedef enum fruits { banana = 0, apple = 1, pear = 2, orange = 3 } fruit;
>
> extern void price_fruit_of_the_day (int);
>
> void
> discount_of_the_day (fruit f)
> {
>    int p, c = (int) f;
>    switch (f)
>    {
>      case banana:
>        UNREACHABLE ();
>      case apple:
>        p = 3 * c + 4;
>        break;
>      case pear:
>      case orange:
>        p = 7;
>        break;
>    }
>    price_fruit_of_the_day (p);
> }
So if we look at this (I'll analyze the other cases separately).

In reassoc2 we have:


   # BLOCK 2 freq:10000
   # PRED: ENTRY [100.0%]  (fallthru,exec)
   c_3 = (int) f_2(D);
   switch (f_2(D)) <default: <L4>, case 0: <L0>, case 1: <L1>, case 2 
... 3: <L2>>
   # SUCC: 6 [25.0%]  (exec) 3 [25.0%]  (exec) 4 [25.0%]  (exec) 5 
[25.0%]  (exec)

   # BLOCK 3 freq:2500
   # PRED: 2 [25.0%]  (exec)
<L0>:
   # VUSE <.MEM_8(D)>
   __builtin_unreachable ();
   # SUCC:

   # BLOCK 4 freq:2500
   # PRED: 2 [25.0%]  (exec)
<L1>:
   D.1724_5 = c_3 * 3;
   p_6 = D.1724_5 + 4;
   goto <bb 6> (<L4>);
   # SUCC: 6 [100.0%]  (fallthru,exec)

   # BLOCK 5 freq:2500
   # PRED: 2 [25.0%]  (exec)
<L2>:
   # SUCC: 6 [100.0%]  (fallthru,exec)

   # BLOCK 6 freq:7500
   # PRED: 2 [25.0%]  (exec) 4 [100.0%]  (fallthru,exec) 5 [100.0%] 
(fallthru,exec)
   # p_1 = PHI <p_4(D)(2), p_6(4), 7(5)>
<L4>:
   # .MEM_9 = VDEF <.MEM_8(D)>
   price_fruit_of_the_day (p_1);
   # VUSE <.MEM_9>
   return;
   # SUCC: EXIT [100.0%]

}




What I'm suggesting is not simply removing the directive at the source 
level, but at the appropriate time using the knowledge that the block is 
unreachable to simplify the CFG.  In this specific case we would 
eliminate block #3 and the edge from 2->3.

If we do that VRP will come along and do its thing.  With the block/edge 
removals, it will discover that p_6 has the value 7, p_4 is undefined 
and ultimately it'll collapse the PHI into p_1 = 7.  The net result (in 
this case) will be the same as using builtin_unreachable.

Contrast that to simply defining away the directive per your test.  In 
that case we'd still have an edge from 2->3.  And when that's the case, 
VRP won't find a constant for p_6, which ultimately makes p_1 varying 
and triggers the warning because of the use of p_4 to compute p_1.

Jeff
Jeff Law Jan. 9, 2013, 8:37 p.m. UTC | #9
On 01/07/2013 01:42 PM, Steven Bosscher wrote:
> For optimizations it's a bit more difficult.
Somewhat.  For what I'm suggesting, the way we'd miss an optimization 
would be if by removing the unreachable block and simplifying the CFG we 
ultimately remove a conditional which produced a result which we could 
use later.

Something like this (tweaking your 2nd test):


int foo (int b, int c, int d)
{
   int res, r = 0;
   res = 5 * d + b;
   if (d)
     __builtin_unreachable ();
   if (c)
     r = res;
   return r + d;
}


What I'm suggesting would effectively remove the conditional and 
__builtin_unreacahble, so at the source level it'd look like:


int foo (int b, int c, int d)
{
   int res, r = 0;
   res = 5 * d + b;
   if (c)
     r = res;
   return r + d;
}

By removing the conditional, we've lost the fact that to reach the 
return statement, "d" must be zero, which is useful in optimizing the 
return statement.





  The situations I'm
> looking at, are all sinking related, and I suspect that most missed
> optimizations will be of this form:
>
> int foo (int b, int c, int d)
> {
>    int res, r = 0;
>    res = 5 * d + b;
>    if (d)
>      __builtin_unreachable ();
>    if (c)
>      r = res;
>    return r;
> }
I don't see how my proposal would inhibit sinking in this kind of code. 
  In fact, it'd made it easier.

>
> I already showed how the control dependence graph looks Wrong with
> __builtin_unreachable calls in place. Any transformation using the CDG
> will also be less effective because of this.
And what I'm suggesting would actually fix the post dominator tree, 
among other things.  Where we potentially lose is the tweaked case 
above.  However, if we do the optimization at the right time, we can get 
the benefits we're both looking for, I believe.

jeff
diff mbox

Patch

Index: cfg-flags.def
===================================================================
--- cfg-flags.def	(revision 194924)
+++ cfg-flags.def	(working copy)
@@ -51,9 +51,26 @@  DEF_BASIC_BLOCK_FLAG(REACHABLE, 1)
 /* Set for blocks in an irreducible loop by loop analysis.  */
 DEF_BASIC_BLOCK_FLAG(IRREDUCIBLE_LOOP, 2)
 
-/* Set on blocks that may actually not be single-entry single-exit block.  */
+/* Set on blocks that may actually not be single-entry single-exit block.
+   Only valid in RTL.  The value of this flag is overloaded by the
+   BB_NEVER_REACHED flag.
+   FIXME: This is only used by the SLJL exception mechanism.  Should clean
+   this up for GCC 4.9.  */
 DEF_BASIC_BLOCK_FLAG(SUPERBLOCK, 3)
 
+/* Set on blocks that contain only a __builtin_unreachable call.  Such blocks
+   are not really reachable, their only purpose is to inform the compiler that
+   some path through the control flow graph can never be taken.  This flag is
+   used, for instance, to ignore marked basic blocks in the computation of
+   post-dominance.  This flag is only used in GIMPLE, __builtin_unreachable
+   calls are either removed during builtin-folding or expanded as BARRIERs.
+   Note that the meaning of this flag is *not* the complement of BB_REACHABLE.
+   The latter is set on blocks that are reachable from the CFG entry block.
+   The BB_NEVER_REACHED flag is only set on basic blocks that have been
+   marked expelicitly as unreachable using __builtin_unreachable, but such
+   blocks are usually reachable from ENTRY.  */
+DEF_BASIC_BLOCK_FLAG(NEVER_REACHED, 3)
+
 /* Set on basic blocks that the scheduler should not touch.  This is used
    by SMS to prevent other schedulers from messing with the loop schedule.  */
 DEF_BASIC_BLOCK_FLAG(DISABLE_SCHEDULE, 4)
Index: tree-cfgcleanup.c
===================================================================
--- tree-cfgcleanup.c	(revision 194924)
+++ tree-cfgcleanup.c	(working copy)
@@ -577,6 +577,55 @@  split_bbs_on_noreturn_calls (void)
   return changed;
 }
 
+/* Return true if B is empty except for a __builtin_unreachable call and
+   maybe debug statements and/or removable labels.  */
+
+static bool
+is_only_builtin_unreachable_block (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+
+  if (EDGE_COUNT (bb->succs) > 1
+      || (EDGE_COUNT (bb->succs) == 1
+	  && !(single_succ_edge (bb)->flags & EDGE_FAKE)))
+    return false;
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+      if (is_gimple_debug (stmt))
+	continue;
+      if (gimple_code (stmt) == GIMPLE_LABEL)
+	{
+	  if (FORCED_LABEL (gimple_label_label (stmt)))
+	    return false;
+	  continue;
+	}
+      if (! is_gimple_builtin_call (stmt)
+	  || DECL_FUNCTION_CODE (gimple_call_fndecl (stmt))
+	     != BUILT_IN_UNREACHABLE)
+	return false;
+    }
+  return true;
+}
+
+/* Mark basic blocks that only contain a __builtin_unreachabel call as
+   BB_NEVER_REACHED.  Such blocks may appear reachable but, by definition,
+   really are not reachable.  Not marking these blocks is conservative but
+   may inhibit optimizations (e.g. post-dominance and control dependences
+   are pessimized if unmarked blocks persist).  */
+
+static void
+mark_builtin_unreachable_blocks (void)
+{
+  basic_block bb;
+  FOR_EACH_BB (bb)
+    {
+      bb->flags &= ~BB_NEVER_REACHED;
+      if (is_only_builtin_unreachable_block (bb))
+	bb->flags |= BB_NEVER_REACHED;
+    }
+}
+
 /* Tries to cleanup cfg in basic block BB.  Returns true if anything
    changes.  */
 
@@ -652,6 +701,7 @@  cleanup_tree_cfg_1 (void)
 
   end_recording_case_labels ();
   BITMAP_FREE (cfgcleanup_altered_bbs);
+  mark_builtin_unreachable_blocks ();
   return retval;
 }
 
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 194924)
+++ cfgexpand.c	(working copy)
@@ -3776,6 +3776,9 @@  expand_gimple_basic_block (basic_block bb, bool di
     fprintf (dump_file, "\n;; Generating RTL for gimple basic block %d\n",
 	     bb->index);
 
+  /* This flag has no purpose in RTL land (and clashes with BB_SUPERBLOCK).  */
+  bb->flags &= ~BB_NEVER_REACHED;
+
   /* Note that since we are now transitioning from GIMPLE to RTL, we
      cannot use the gsi_*_bb() routines because they expect the basic
      block to be in GIMPLE, instead of RTL.  Therefore, we need to
Index: dominance.c
===================================================================
--- dominance.c	(revision 194924)
+++ dominance.c	(working copy)
@@ -524,7 +524,6 @@  calc_idoms (struct dom_info *di, bool reverse)
 	  if (bitmap_bit_p (di->fake_exit_edge, bb->index))
 	    {
 	      einext = ei;
-	      einext.index = 0;
 	      goto do_fake_exit_edge;
 	    }
 	}
@@ -543,6 +542,17 @@  calc_idoms (struct dom_info *di, bool reverse)
 	  einext = ei;
 	  ei_next (&einext);
 
+	  if (reverse)
+	    {
+	      /* If this is a __builtin_unreachable block, skip it so that
+		 predecessors don't get the EXIT block as post-dominator.  */
+	      if (b->flags & BB_NEVER_REACHED)
+		{
+		  ei = einext;
+		  continue;
+		}
+	    }
+
 	  if (b == en_block)
 	    {
 	    do_fake_exit_edge:
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 194924)
+++ basic-block.h	(working copy)
@@ -790,6 +790,7 @@  extern int dfs_enumerate_from (basic_block, int,
 			       basic_block *, int, const void *);
 extern void compute_dominance_frontiers (struct bitmap_head_def *);
 extern bitmap compute_idf (bitmap, struct bitmap_head_def *);
+extern void compute_control_dependences (struct bitmap_head_def *);
 
 /* In cfgrtl.c  */
 extern rtx block_label (basic_block);
Index: cfganal.c
===================================================================
--- cfganal.c	(revision 194924)
+++ cfganal.c	(working copy)
@@ -1079,34 +1079,46 @@  dfs_enumerate_from (basic_block bb, int reverse,
 
    The number of nodes touched by this algorithm is equal to the size
    of the dominance frontiers, no more, no less.
+
+   If FORWARD is false, run on the reverse CFG so that control dependences
+   are computed instead.
 */
 
 
 static void
-compute_dominance_frontiers_1 (bitmap_head *frontiers)
+compute_dominance_frontiers_1 (bitmap_head *frontiers, bool forward)
 {
   edge p;
   edge_iterator ei;
   basic_block b;
+  enum cdi_direction dir = forward ? CDI_DOMINATORS : CDI_POST_DOMINATORS;
+  basic_block entrybb = forward ? ENTRY_BLOCK_PTR : EXIT_BLOCK_PTR;
+
   FOR_EACH_BB (b)
     {
-      if (EDGE_COUNT (b->preds) >= 2)
+      vec<edge, va_gc> *preds = forward ? b->preds : b->succs;
+      if (EDGE_COUNT (preds) >= 2)
 	{
-	  FOR_EACH_EDGE (p, ei, b->preds)
+	  FOR_EACH_EDGE (p, ei, preds)
 	    {
-	      basic_block runner = p->src;
+	      basic_block runner = forward ? p->src : p->dest;
 	      basic_block domsb;
-	      if (runner == ENTRY_BLOCK_PTR)
+
+	      if (runner == entrybb)
 		continue;
+	      if (!forward && (runner->flags & BB_NEVER_REACHED))
+		{
+		  bitmap_set_bit (&frontiers[runner->index], b->index);
+		  continue;
+		}
 
-	      domsb = get_immediate_dominator (CDI_DOMINATORS, b);
+	      domsb = get_immediate_dominator (dir, b);
 	      while (runner != domsb)
 		{
 		  if (!bitmap_set_bit (&frontiers[runner->index],
 				       b->index))
 		    break;
-		  runner = get_immediate_dominator (CDI_DOMINATORS,
-						    runner);
+		  runner = get_immediate_dominator (dir, runner);
 		}
 	    }
 	}
@@ -1119,11 +1131,21 @@  compute_dominance_frontiers (bitmap_head *frontier
 {
   timevar_push (TV_DOM_FRONTIERS);
 
-  compute_dominance_frontiers_1 (frontiers);
+  compute_dominance_frontiers_1 (frontiers, /*forward=*/true);
 
   timevar_pop (TV_DOM_FRONTIERS);
 }
 
+void
+compute_control_dependences (bitmap_head *control_deps)
+{
+  timevar_push (TV_CONTROL_DEPENDENCES);
+
+  compute_dominance_frontiers_1 (control_deps, /*forward=*/false);
+
+  timevar_pop (TV_CONTROL_DEPENDENCES);
+}
+
 /* Given a set of blocks with variable definitions (DEF_BLOCKS),
    return a bitmap with all the blocks in the iterated dominance
    frontier of the blocks in DEF_BLOCKS.  DFS contains dominance