Patchwork Fix CDDCE miscompilation (PR tree-optimization/55018)

login
register
mail settings
Submitter Steven Bosscher
Date Oct. 22, 2012, 10:49 p.m.
Message ID <CABu31nP5OD0Wk4V=b04yK6CKWqJ_+zygyuJ2K_HLE=n9D-6nwQ@mail.gmail.com>
Download mbox | patch
Permalink /patch/193299/
State New
Headers show

Comments

Steven Bosscher - Oct. 22, 2012, 10:49 p.m.
On Mon, Oct 22, 2012 at 11:09 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Wouldn't it be way cheaper to just export dfs_find_deadend from cfganal.c
> and call it in calc_dfs_tree on each unconnected bb?
> I.e. (untested with the exception of the testcase):

FWIW, dfs_find_deadend looks broken to me for this usage case. It
could return a self-loop block with more than one successor. For a
pre-order search like dominance.c needs, you'd have to look as deep as
possible, something like this:



(And the (EDGE_COUNT(bb->succs) == 0) is unnecessary for
inverted_post_order_compute because it already puts all such blocks on
the initial work list :-)

Ciao!
Steven
Jakub Jelinek - Oct. 23, 2012, 6:57 a.m.
On Tue, Oct 23, 2012 at 12:49:44AM +0200, Steven Bosscher wrote:
> On Mon, Oct 22, 2012 at 11:09 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > Wouldn't it be way cheaper to just export dfs_find_deadend from cfganal.c
> > and call it in calc_dfs_tree on each unconnected bb?
> > I.e. (untested with the exception of the testcase):
> 
> FWIW, dfs_find_deadend looks broken to me for this usage case. It
> could return a self-loop block with more than one successor. For a
> pre-order search like dominance.c needs, you'd have to look as deep as
> possible, something like this:

I've bootstrapped overnight the patch I posted (without your
dfs_find_deadend change (next_bb is unused var there btw)), and there is a
new FAIL with it - ssa-dce-3.c (and your dfs_find_deadend change doesn't
change anything on it).

Before cddce1 we have:

  <bb 2>:
  goto <bb 6>;

  <bb 3>:
  j_8 = j_3 + 501;
  goto <bb 5>;

  <bb 4>:
  j_9 = j_3 + 499;

  <bb 5>:
  # j_2 = PHI <j_8(3), j_9(4)>
  i_10 = i_1 + 2;

  <bb 6>:
  # i_1 = PHI <1(2), i_10(5)>
  # j_3 = PHI <0(2), j_2(5)>
  j_6 = j_3 + 500;
  _7 = j_6 % 7;
  if (_7 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

and before the dominance.c change bb6 has fake edge to exit,
bb3 and bb4 are immediately post-dominated by bb5 and bb5 is immediately
post-dominated by bb6, thus when mark_control_dependent_edges_necessary
is called on the latch (bb5) of the infinite loop, it marks the _7 != 0
statement as necessary and the j_N and _7 assignments stay,
as 6->3 and 6->4 edges are recorded as control parents for bb3, bb4 and bb5.

With the patch bb5 is instead the bb with fake edge to exit,
and bb6, bb3 and bb4 are all immediate post-dominated by bb5 and
edge 5->6 is control parent of bb5.  So with the patch this is optimized
into just:
  <bb 2>:

  <bb 3>:
  goto <bb 3>;

I guess it is fine that way and the testcase needs adjustment, just wanted
to point to the differences.

> (And the (EDGE_COUNT(bb->succs) == 0) is unnecessary for
> inverted_post_order_compute because it already puts all such blocks on
> the initial work list :-)

And so does dominance.c:
      FOR_EACH_BB_REVERSE (b)
        {
          if (EDGE_COUNT (b->succs) > 0)
            {
              if (di->dfs_order[b->index] == 0)
                saw_unconnected = true;
              continue;
            }
          bitmap_set_bit (di->fake_exit_edge, b->index);

	Jakub

Patch

Index: cfganal.c
===================================================================
--- cfganal.c   (revision 192696)
+++ cfganal.c   (working copy)
@@ -598,18 +598,26 @@  dfs_find_deadend (basic_block bb)
 {
   sbitmap visited = sbitmap_alloc (last_basic_block);
   sbitmap_zero (visited);
+  basic_block next_bb = NULL;
+  edge_iterator ei;
+  edge e;

   for (;;)
     {
       SET_BIT (visited, bb->index);
-      if (EDGE_COUNT (bb->succs) == 0
-          || TEST_BIT (visited, EDGE_SUCC (bb, 0)->dest->index))
+      /* Look for any not yet visited successors.
+        If all successors have been visited then
+        this is the dead end we're looking for.  */
+      FOR_EACH_EDGE (e, ei, bb->succs)
+       if (! TEST_BIT (visited, e->dest->index))
+         break;
+      if (e == NULL)
         {
           sbitmap_free (visited);
           return bb;
         }

-      bb = EDGE_SUCC (bb, 0)->dest;
+      bb = e->dest;
     }

   gcc_unreachable ();