diff mbox

[RFA,PR,middle-end/59285] BARRIERS and merged blocks

Message ID CABu31nPcksGH6ovMJzhChPmxeaJRhD+Ja37Y8bstFzzsui7X3w@mail.gmail.com
State New
Headers show

Commit Message

Steven Bosscher Dec. 24, 2013, 8:19 p.m. UTC
On Fri, Dec 20, 2013 at 7:30 PM, Jeff Law wrote:
>
> So here's an alternate approach to fixing 59285.  I still think attacking
> this in rtl_merge_blocks is better, but with nobody else chiming in to break
> the deadlock Steven and myself are in, I'll go with Steven's preferred
> solution (fix the callers in ifcvt.c).

I didn't intend to cause a deadlock, I only really want us to respect
the rules of the CFG, one of which is that you can't merge two basic
blocks that are not connected by an edge. I think this is a really
important invariant because it avoids accidental basic block merges
that are not correct.


> If we were to return to a "fix rtl_merge_blocks" approach, I would revamp
> that patch to utilize the ideas in this one.  Namely that it's not just
> barriers between the merged blocks that are a problem.  In fact, that's a
> symptom of the problem.  Things have already gone wrong by that point.

What has gone wrong at that point, is that we'd be trying to merge two
basic blocks that have no control flow connection. The case of
builtin_unreachable (the only legitimate case for an empty basic block
without successors) is a special case. (This is the reason why I would
like us to have a special instruction or some kind of other marker for
builtin_unreachable...)


> Given blocks A & B that will be merged.  If A has > 1 successor and B has no
> successors, the combined block will always have at least 1 successor.
> However, the combined block will be followed by a BARRIER that must be
> removed.

Note this would happen automatically if there as an edge connecting
the blocks and a JUMP_INSN ending block B.

I propose we just punt on optimizing this case for now. For GCC 4.10
we should define what behavior should result from builtin_unreachable
(Should it trap? Can it be optimized away after a while to avoid these
unnecessary conditional jumps? ...) but for the moment it seems wrong
IMHO to only optimize this in the cond_exec case and to do so against
the rules of the control flow graph.

Something like the patch below, tested with a cross-compiler for arm-eabi.
What do you think of this approach?


        PR middle-end/59285
        * ifcvt. (cond_exec_find_if_block): Do not try to if-convert an empty
        basic block without successors due to builtin_unreachable.

Comments

Jeff Law Jan. 7, 2014, 6:32 a.m. UTC | #1
On 12/24/13 13:19, Steven Bosscher wrote:
> On Fri, Dec 20, 2013 at 7:30 PM, Jeff Law wrote:
>>
>> So here's an alternate approach to fixing 59285.  I still think attacking
>> this in rtl_merge_blocks is better, but with nobody else chiming in to break
>> the deadlock Steven and myself are in, I'll go with Steven's preferred
>> solution (fix the callers in ifcvt.c).
>
> I didn't intend to cause a deadlock, I only really want us to respect
> the rules of the CFG, one of which is that you can't merge two basic
> blocks that are not connected by an edge. I think this is a really
> important invariant because it avoids accidental basic block merges
> that are not correct.
I certainly don't think you're trying to cause a deadlock, I think it's 
just in the nature of how many maintainers work.  Once someone reviews a 
patch, the other maintainers tend to step back and let the submitter and 
reviewer iterate.  That's not always the case, but it happens enough to 
be noticeable, IMHO.  And it's not necessarily bad.  Anyway...

Note carefully that we're not merging blocks without edges between them. 
  That's a misunderstanding, most likely due to my initial messages on 
this issue.  I thought I explained things better in my most recent 
message, I'll try again :-)





>
>
>> If we were to return to a "fix rtl_merge_blocks" approach, I would revamp
>> that patch to utilize the ideas in this one.  Namely that it's not just
>> barriers between the merged blocks that are a problem.  In fact, that's a
>> symptom of the problem.  Things have already gone wrong by that point.
>
> What has gone wrong at that point, is that we'd be trying to merge two
> basic blocks that have no control flow connection. The case of
> builtin_unreachable (the only legitimate case for an empty basic block
> without successors) is a special case. (This is the reason why I would
> like us to have a special instruction or some kind of other marker for
> builtin_unreachable...)
No.  That's a symptom of the problem.  We were both barking up the wrong 
tree earlier, in large part I'm sure due to my early conclusions.


>
>
>> Given blocks A & B that will be merged.  If A has > 1 successor and B has no
>> successors, the combined block will always have at least 1 successor.
>> However, the combined block will be followed by a BARRIER that must be
>> removed.
>
> Note this would happen automatically if there as an edge connecting
> the blocks and a JUMP_INSN ending block B.
Umm, no.  Read that paragraph again.  Perhaps the RTL will help too.

Given:

(note 5 1 67 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
[ ... ]

(jump_insn 17 16 19 2 (set (pc)
         (if_then_else (ne (reg:CC 100 cc)
                 (const_int 0 [0]))
             (label_ref 25)
             (pc))) j.c:14 236 {arm_cond_branch}
      (expr_list:REG_DEAD (reg:CC 100 cc)
         (int_list:REG_BR_PROB 5000 (nil)))
  -> 25)

(note 19 17 18 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(note/s 18 19 20 3 ("lab2") NOTE_INSN_DELETED_LABEL 3)

(note/s 20 18 21 ("lab") NOTE_INSN_DELETED_LABEL 4)

(barrier 21 20 25)

[ ... ]

We call merge blocks to merge blocks #2 and #3.  Note well that block #2 
reaches block #3 (and block #4).  Block #3 has no successors.  There's 
no reason I can see that we can not request those blocks be merged.  And 
if they are merged, we *must* remove the BARRIER after block #3.  If we 
do that properly (per my most recent patch), all the other issues just 
go away.




>
> I propose we just punt on optimizing this case for now.
I disagree.  There is no good reason why we can't handle this in a 
sensible manner now.

> For GCC 4.10
> we should define what behavior should result from builtin_unreachable
> (Should it trap? Can it be optimized away after a while to avoid these
> unnecessary conditional jumps? ...) but for the moment it seems wrong
> IMHO to only optimize this in the cond_exec case and to do so against
> the rules of the control flow graph.
This problem can (and should) be addressed independently of a review of 
the __builtin_unreachable semantics.

>
> Something like the patch below, tested with a cross-compiler for arm-eabi.
> What do you think of this approach?
>
>
>          PR middle-end/59285
>          * ifcvt. (cond_exec_find_if_block): Do not try to if-convert an empty
>          basic block without successors due to builtin_unreachable.
I still prefer my most recent patch.

jeff
Jeff Law Jan. 8, 2014, 5:54 a.m. UTC | #2
On 01/07/14 01:08, Steven Bosscher wrote:
> On Tuesday, January 7, 2014, Jeff Law wrote:
>
>     On 12/24/13 13:19, Steven Bosscher wrote:
>
>         For GCC 4.10
>
>         we should define what behavior should result from
>         builtin_unreachable
>         (Should it trap? Can it be optimized away after a while to avoid
>         these
>         unnecessary conditional jumps? ...) but for the moment it seems
>         wrong
>         IMHO to only optimize this in the cond_exec case and to do so
>         against
>         the rules of the control flow graph.
>
>     This problem can (and should) be addressed independently of a review
>     of the __builtin_unreachable semantics.
>
>
> Agreed.
:-)

>
>
>
>         Something like the patch below, tested with a cross-compiler for
>         arm-eabi.
>         What do you think of this approach?
>
>
>                   PR middle-end/59285
>                   * ifcvt. (cond_exec_find_if_block): Do not try to
>         if-convert an empty
>                   basic block without successors due to builtin_unreachable.
>
>     I still prefer my most recent patch.
>
>     jeff
>
>
>
> Ok!
Thanks.  One more P1 off the list :-)

jeff
diff mbox

Patch

Index: ifcvt.c
===================================================================
--- ifcvt.c     (revision 206195)
+++ ifcvt.c     (working copy)
@@ -3495,6 +3495,13 @@  cond_exec_find_if_block (struct ce_if_block * ce_i
      Check for the last insn of the THEN block being an indirect jump, which
      is listed as not having any successors, but confuses the rest of the CE
      code processing.  ??? we should fix this in the future.  */
+  /* To make things worse: A block that ended in builtin_unreachable is
+     usually empty.  Perhaps we should optimize these away, but the semantics
+     of builtin_unreachable are not really clear about this, and if we do
+     optimize builtin_unreachable here (i.e. in the cond_exec path) we have
+     a strange difference of semantics of builtin_unreachable on cond_exec
+     and non-cond_exec targets.  Therefore, at least for now, don't merge
+     away a builtin_unreachable block.  */
   if (EDGE_COUNT (then_bb->succs) == 0)
     {
       if (single_pred_p (else_bb) && else_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
@@ -3511,6 +3518,12 @@  cond_exec_find_if_block (struct ce_if_block * ce_i
              && ! simplejump_p (last_insn))
            return FALSE;

+         /* Empty block (no non-note insns anyway) only happens with
+            builtin_unreachable.  merge_if_blocks isn't prepared for
+            that.  See PR59285.  */
+         if (last_insn == BB_HEAD (then_bb))
+           return FALSE;
+
          join_bb = else_bb;
          else_bb = NULL_BLOCK;
        }