Patchwork Preserve loops from CFG build until after RTL loop opts

login
register
mail settings
Submitter Tom de Vries
Date April 26, 2013, 2:27 p.m.
Message ID <517A8EE8.8010805@mentor.com>
Download mbox | patch
Permalink /patch/239883/
State New
Headers show

Comments

Tom de Vries - April 26, 2013, 2:27 p.m.
On 25/04/13 16:19, Richard Biener wrote:
<SNIP>
> and compared to the previous patch changed the tree-ssa-tailmerge.c
> part to deal with merging of loop latch and loop preheader (even
> if that's a really bad idea) to not regress gcc.dg/pr50763.c.
> Any suggestion on how to improve that part welcome.
<SNIP>
> 	* tree-ssa-tail-merge.c: Include cfgloop.h.
> 	(replace_block_by): When merging loop latches mark loops for fixup.
<SNIP>
> Index: trunk/gcc/tree-ssa-tail-merge.c
> ===================================================================
> *** trunk.orig/gcc/tree-ssa-tail-merge.c	2013-04-25 11:31:14.000000000 +0200
> --- trunk/gcc/tree-ssa-tail-merge.c	2013-04-25 12:39:00.236390580 +0200
> *************** along with GCC; see the file COPYING3.
> *** 197,202 ****
> --- 197,203 ----
>   #include "gimple-pretty-print.h"
>   #include "tree-ssa-sccvn.h"
>   #include "tree-dump.h"
> + #include "cfgloop.h"
>   
>   /* ??? This currently runs as part of tree-ssa-pre.  Why is this not
>      a stand-alone GIMPLE pass?  */
> *************** replace_block_by (basic_block bb1, basic
> *** 1459,1464 ****
> --- 1460,1476 ----
>     /* Mark the basic block as deleted.  */
>     mark_basic_block_deleted (bb1);
>   
> +   /* ???  If we merge the loop preheader with the loop latch we are creating
> +      additional entries into the loop, eventually rotating it.
> +      Mark loops for fixup in this case.
> +      ???  This is a completely unwanted transform and will wreck most
> +      loops at this point - but with just not considering loop latches as
> +      merge candidates we fail to commonize the two loops in gcc.dg/pr50763.c.
> +      A better fix to avoid that regression is needed.  */
> +   if (current_loops
> +       && bb2->loop_father->latch == bb2)
> +     loops_state_set (LOOPS_NEED_FIXUP);
> + 
>     /* Redirect the incoming edges of bb1 to bb2.  */
>     for (i = EDGE_COUNT (bb1->preds); i > 0 ; --i)
>       {
<SNIP>

Richard,

I'm not sure if I get your comment about the two loops in pr50763.c. There is
just one loop, both before and after tail-merge.

BEFORE:
        2(e)
       / \
      *   *
     3     9
      \   /
       * *
        4
       / \
      *   *
     5     10
     |     |
     *     *
 +--*7     8(x)
 |  / \    *
 \ *   *  /
  6     11

TAIL-MERGE:
1. merges empty blocks 10 and 11
2. merges empty block 5 and 6
3. merges block 4 and 7, which are empty except for testing the conditional,
The transformations in steps 2 and 3 affect the loop.

AFTER:
        2(e)
       / \
      *   *
     3     9
      \   /
       * *
    +-*4,7
    |  / \
    \ *   *
     5,6   10,11
                \
                 *
                  8(x)

Although step 2 and 3 reduce the amount of BBs, which could make sense for
compile-for-size, I wonder whether this transformation works in general.

Step 3 only works if the same conditional is tested, which means an eternal loop.

Step 2 works if the loop pre-header and the loop latch are empty. This will be
the case quite often since loop_optimizer_init is called with
LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES before pass_pre. OTOH, loops
will typically have a non-virtual phi, with different values coming from the
loop pre-header and the loop latch, which prevents the optimization.

So I think this is really a cornercase, and we should disregard it if that makes
things simpler.

Rather than fixing up the loop structure, we could prevent tail-merge in these
cases.

The current fix tests for current_loops == NULL, and I'm not sure that can still
happen there, given that we have PROP_loops.

It's not evident to me that the test bb2->loop_father->latch == bb2 is
sufficient. Before calling tail_merge_optimize, we call loop_optimizer_finalize
in which we assert that LOOPS_MAY_HAVE_MULTIPLE_LATCHES from there on, so in
theory we might miss some latches.

But I guess that pre (having started out with simple latches) maintains simple
latches throughout, and that tail-merge does the same.

Tentative patch attached. I'll try build & test.

[ Btw, it would be nice if restricting the optimization also means that we can
  simplify dominator handling in the pass. ]

Thanks,
- Tom

2013-04-26  Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-tail-merge.c (find_same_succ_bb): Skip loop latch bbs.
	(replace_block_by): Don't set LOOPS_NEED_FIXUP.

	* gcc.dg/pr50763.c: Update test.

Patch

diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index f2ab7444..e49c3e7 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -689,7 +689,8 @@  find_same_succ_bb (basic_block bb, same_succ *same_p)
   edge_iterator ei;
   edge e;
 
-  if (bb == NULL)
+  if (bb == NULL
+      || bb->loop_father->latch == bb)
     return;
   bitmap_set_bit (same->bbs, bb->index);
   FOR_EACH_EDGE (e, ei, bb->succs)
@@ -1460,17 +1461,6 @@  replace_block_by (basic_block bb1, basic_block bb2)
   /* Mark the basic block as deleted.  */
   mark_basic_block_deleted (bb1);
 
-  /* ???  If we merge the loop preheader with the loop latch we are creating
-     additional entries into the loop, eventually rotating it.
-     Mark loops for fixup in this case.
-     ???  This is a completely unwanted transform and will wreck most
-     loops at this point - but with just not considering loop latches as
-     merge candidates we fail to commonize the two loops in gcc.dg/pr50763.c.
-     A better fix to avoid that regression is needed.  */
-  if (current_loops
-      && bb2->loop_father->latch == bb2)
-    loops_state_set (LOOPS_NEED_FIXUP);
-
   /* Redirect the incoming edges of bb1 to bb2.  */
   for (i = EDGE_COUNT (bb1->preds); i > 0 ; --i)
     {
diff --git a/gcc/testsuite/gcc.dg/pr50763.c b/gcc/testsuite/gcc.dg/pr50763.c
index 60025e3..695b61c 100644
--- a/gcc/testsuite/gcc.dg/pr50763.c
+++ b/gcc/testsuite/gcc.dg/pr50763.c
@@ -12,5 +12,5 @@  foo (int c, int d)
   while (c == d);
 }
 
-/* { dg-final { scan-tree-dump-times "== 33" 1 "pre"} } */
+/* { dg-final { scan-tree-dump-times "== 33" 2 "pre"} } */
 /* { dg-final { cleanup-tree-dump "pre" } } */