diff mbox

[v4] Fix shrink-wrapping bug (PR67778, PR68634)

Message ID f1e54655c02ae9c5f5a013f0ecc4b27c5a351fa7.1449693269.git.segher@kernel.crashing.org
State New
Headers show

Commit Message

Segher Boessenkool Dec. 9, 2015, 8:47 p.m. UTC
After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.

Unfortunately a candidate block we select can be inside a loop, and we
will still allow it (because the loop always exits via our previously
chosen block).  We can do that just fine if we make a duplicate of the
block, but we do not want to here.

So we need to detect this situation.  We can place the prologue at a
previous block PRE only if PRE dominates every block reachable from
it, because then we will never need to duplicate that block (it will
always be executed with prologue).

v4: Fixed all the stupid mistakes you noticed.  Also, the previous
version stopped looking when the previous try didn't work out.  This
version doesn't: it is simpler, more in line with the rest of the
algorithm, potentially useful, and doesn't really cost more.

Tested on the two testcases from the PRs.  Also regression checked
on powerpc64-linux.

Is this okay for trunk?


Segher


2015-12-09  Segher Boessenkool  <segher@kernel.crashing.org>

	PR rtl-optimization/67778
	PR rtl-optimization/68634
	* shrink-wrap.c (try_shrink_wrapping): Add a comment about why we want
	to put the prologue earlier.  When determining if an earlier block is
	suitable, make sure it dominates every block reachable from it.

---
 gcc/shrink-wrap.c | 42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)


---
 gcc/shrink-wrap.c | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

Comments

Bernd Schmidt Dec. 10, 2015, 11:40 a.m. UTC | #1
On 12/09/2015 09:47 PM, Segher Boessenkool wrote:
> After shrink-wrapping has found the "tightest fit" for where to place
> the prologue, it tries move it earlier (so that frame saves are run
> earlier) -- but without copying any more basic blocks.
>
> Unfortunately a candidate block we select can be inside a loop, and we
> will still allow it (because the loop always exits via our previously
> chosen block).  We can do that just fine if we make a duplicate of the
> block, but we do not want to here.
>
> So we need to detect this situation.  We can place the prologue at a
> previous block PRE only if PRE dominates every block reachable from
> it, because then we will never need to duplicate that block (it will
> always be executed with prologue).
>
> v4: Fixed all the stupid mistakes you noticed.  Also, the previous
> version stopped looking when the previous try didn't work out.  This
> version doesn't: it is simpler, more in line with the rest of the
> algorithm, potentially useful, and doesn't really cost more.
>
> Tested on the two testcases from the PRs.  Also regression checked
> on powerpc64-linux.
>
> Is this okay for trunk?

Ok. This seems like a safe way to test things. For gcc-7 it might be 
worthwhile to try to have loop information available and use that for 
efficiency.


Bernd
diff mbox

Patch

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 3a1df84..f65b0c3 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -744,36 +744,66 @@  try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
 	  vec.quick_push (e->dest);
     }
 
-  vec.release ();
-
   if (dump_file)
     fprintf (dump_file, "Avoiding non-duplicatable blocks, PRO is now %d\n",
 	     pro->index);
 
   /* If we can move PRO back without having to duplicate more blocks, do so.
+     We do this because putting the prologue earlier is better for scheduling.
      We can move back to a block PRE if every path from PRE will eventually
-     need a prologue, that is, PRO is a post-dominator of PRE.  */
+     need a prologue, that is, PRO is a post-dominator of PRE.  PRE needs
+     to dominate every block reachable from itself.  */
 
   if (pro != entry)
     {
       calculate_dominance_info (CDI_POST_DOMINATORS);
 
+      bitmap bb_tmp = BITMAP_ALLOC (NULL);
+      bitmap_copy (bb_tmp, bb_with);
       basic_block last_ok = pro;
+      vec.truncate (0);
+
       while (pro != entry)
 	{
 	  basic_block pre = get_immediate_dominator (CDI_DOMINATORS, pro);
 	  if (!dominated_by_p (CDI_POST_DOMINATORS, pre, pro))
 	    break;
 
+	  if (bitmap_set_bit (bb_tmp, pre->index))
+	    vec.quick_push (pre);
+
+	  bool ok = true;
+	  while (!vec.is_empty ())
+	    {
+	      basic_block bb = vec.pop ();
+	      bitmap_set_bit (bb_tmp, pre->index);
+
+	      if (!dominated_by_p (CDI_DOMINATORS, bb, pre))
+		{
+		  ok = false;
+		  break;
+		}
+
+	      FOR_EACH_EDGE (e, ei, bb->succs)
+		if (!bitmap_bit_p (bb_with, e->dest->index)
+		    && bitmap_set_bit (bb_tmp, e->dest->index))
+		  vec.quick_push (e->dest);
+	    }
+
+	  if (ok && can_get_prologue (pre, prologue_clobbered))
+	    last_ok = pre;
+
 	  pro = pre;
-	  if (can_get_prologue (pro, prologue_clobbered))
-	    last_ok = pro;
 	}
+
       pro = last_ok;
 
+      BITMAP_FREE (bb_tmp);
       free_dominance_info (CDI_POST_DOMINATORS);
     }
 
+  vec.release ();
+
   if (dump_file)
     fprintf (dump_file, "Bumping back to anticipatable blocks, PRO is now %d\n",
 	     pro->index);