diff mbox

Fix shrink-wrap bug with anticipating into loops (PR67778, PR68634)

Message ID ebfaeba0a9c98ddf1f8ea795efc2fd3ba082fdfe.1449080153.git.segher@kernel.crashing.org
State New
Headers show

Commit Message

Segher Boessenkool Dec. 2, 2015, 6:21 p.m. UTC
After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.

Unfortunately a candidate block we select can be inside a loop, and we
will still allow it (because the loop always exits via our previously
chosen block).  We can do that just fine if we make a duplicate of the
block, but we do not want to here.

So we need to detect this situation.  We can place the prologue at a
previous block PRE only if PRE dominates every block reachable from
it.  This is a bit hard / expensive to compute, so instead this patch
allows a block PRE only if PRE does not post-dominate any of its
successors (other than itself).

Tested on the two testcases from the PRs.  Also regression checked
on powerpc64-linux.

Is this okay for trunk?


Segher


2015-12-02  Segher Boessenkool  <segher@kernel.crashing.org>

	* shrink-wrap.c (try_shrink_wrapping): Disallow moving the prologue
	back into a loop.

---
 gcc/shrink-wrap.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Comments

Jakub Jelinek Dec. 2, 2015, 7:19 p.m. UTC | #1
On Wed, Dec 02, 2015 at 06:21:47PM +0000, Segher Boessenkool wrote:
> --- a/gcc/shrink-wrap.c
> +++ b/gcc/shrink-wrap.c
> @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
>  
>    /* If we can move PRO back without having to duplicate more blocks, do so.
>       We can move back to a block PRE if every path from PRE will eventually
> -     need a prologue, that is, PRO is a post-dominator of PRE.  */
> +     need a prologue, that is, PRO is a post-dominator of PRE.  We might
> +     need to duplicate PRE if there is any path from a successor of PRE back
> +     to PRE, so don't allow that either (but self-loops are fine, as are any
> +     other loops entirely dominated by PRE; this in general seems too
> +     expensive to check for, for such an uncommon case).  */

So, what will happen if PRE self-loops?  It would be nice to have it covered
by a testcase.

> +	  bool ok = true;
> +
> +	  if (!can_get_prologue (pre, prologue_clobbered))
> +	    ok = false;
> +
> +	  FOR_EACH_EDGE (e, ei, pre->succs)
> +	    if (e->dest != pre
> +		&& dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
> +	      ok = false;

I wonder if it wouldn't be better to:

	if (!can_get_prologue (pre, prologue_clobbered))
	  ok = false;
	else
	  FOR_EACH_EDGE (e, ei, pre->succs)
	    if (e->dest != pre
		&& dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
	      {
		ok = false;
		break;
	      }

so that it doesn't walk or continue walking the edges if not needed.

Anyway, that are my comments, I'll defer the rest of the review to somebody
else.

	Jakub
Segher Boessenkool Dec. 2, 2015, 10:45 p.m. UTC | #2
On Wed, Dec 02, 2015 at 08:19:05PM +0100, Jakub Jelinek wrote:
> On Wed, Dec 02, 2015 at 06:21:47PM +0000, Segher Boessenkool wrote:
> > --- a/gcc/shrink-wrap.c
> > +++ b/gcc/shrink-wrap.c
> > @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
> >  
> >    /* If we can move PRO back without having to duplicate more blocks, do so.
> >       We can move back to a block PRE if every path from PRE will eventually
> > -     need a prologue, that is, PRO is a post-dominator of PRE.  */
> > +     need a prologue, that is, PRO is a post-dominator of PRE.  We might
> > +     need to duplicate PRE if there is any path from a successor of PRE back
> > +     to PRE, so don't allow that either (but self-loops are fine, as are any
> > +     other loops entirely dominated by PRE; this in general seems too
> > +     expensive to check for, for such an uncommon case).  */
> 
> So, what will happen if PRE self-loops?

The prologue is put in a new block before the chosen one (as always).

> It would be nice to have it covered by a testcase.

If I knew how to prepare one, that stayed stable for more than about
two weeks, yes :-/

> > +	  bool ok = true;
> > +
> > +	  if (!can_get_prologue (pre, prologue_clobbered))
> > +	    ok = false;
> > +
> > +	  FOR_EACH_EDGE (e, ei, pre->succs)
> > +	    if (e->dest != pre
> > +		&& dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
> > +	      ok = false;
> 
> I wonder if it wouldn't be better to:
> 
> 	if (!can_get_prologue (pre, prologue_clobbered))
> 	  ok = false;
> 	else
> 	  FOR_EACH_EDGE (e, ei, pre->succs)
> 	    if (e->dest != pre
> 		&& dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
> 	      {
> 		ok = false;
> 		break;
> 	      }
> 
> so that it doesn't walk or continue walking the edges if not needed.

If the compiler is any good, neither does my code, right?  :-)

I think it is more important to have this code readable than a teeny
tiny bit faster.  It is all linear (assuming dominator lookups are O(1)),
which isn't too hard to ascertain (yeah, famous last words).


Segher
Bernd Schmidt Dec. 3, 2015, 11:31 a.m. UTC | #3
On 12/02/2015 07:21 PM, Segher Boessenkool wrote:
> After shrink-wrapping has found the "tightest fit" for where to place
> the prologue, it tries move it earlier (so that frame saves are run
> earlier) -- but without copying any more basic blocks.
>
> Unfortunately a candidate block we select can be inside a loop, and we
> will still allow it (because the loop always exits via our previously
> chosen block).

> So we need to detect this situation.  We can place the prologue at a
> previous block PRE only if PRE dominates every block reachable from
> it.  This is a bit hard / expensive to compute, so instead this patch
> allows a block PRE only if PRE does not post-dominate any of its
> successors (other than itself).

Are the two conditions equivalent though? I'm not fully convinced. Let's 
say the loop has multiple exits, then none of these exit blocks 
postdominate the loop entry block, right?

I think I agree with Jakub that we don't want to do unnecessary work in 
this piece of code.

>     /* If we can move PRO back without having to duplicate more blocks, do so.
>        We can move back to a block PRE if every path from PRE will eventually
> -     need a prologue, that is, PRO is a post-dominator of PRE.  */
> +     need a prologue, that is, PRO is a post-dominator of PRE.  We might
> +     need to duplicate PRE if there is any path from a successor of PRE back
> +     to PRE, so don't allow that either (but self-loops are fine, as are any
> +     other loops entirely dominated by PRE; this in general seems too
> +     expensive to check for, for such an uncommon case).  */

The last comment is unclear and I don't know what it wants to tell me.


Bernd
Bernd Schmidt Dec. 3, 2015, 11:35 a.m. UTC | #4
On 12/02/2015 07:21 PM, Segher Boessenkool wrote:
> After shrink-wrapping has found the "tightest fit" for where to place
> the prologue, it tries move it earlier (so that frame saves are run
> earlier) -- but without copying any more basic blocks.

Another question would be - is there really a good reason to do this at all?


Bernd
Richard Sandiford Dec. 3, 2015, 3:20 p.m. UTC | #5
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Wed, Dec 02, 2015 at 08:19:05PM +0100, Jakub Jelinek wrote:
>> On Wed, Dec 02, 2015 at 06:21:47PM +0000, Segher Boessenkool wrote:
>> > --- a/gcc/shrink-wrap.c
>> > +++ b/gcc/shrink-wrap.c
>> > @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
>> >  
>> >    /* If we can move PRO back without having to duplicate more blocks, do so.
>> >       We can move back to a block PRE if every path from PRE will eventually
>> > -     need a prologue, that is, PRO is a post-dominator of PRE.  */
>> > +     need a prologue, that is, PRO is a post-dominator of PRE.  We might
>> > +     need to duplicate PRE if there is any path from a successor of PRE back
>> > +     to PRE, so don't allow that either (but self-loops are fine, as are any
>> > +     other loops entirely dominated by PRE; this in general seems too
>> > +     expensive to check for, for such an uncommon case).  */
>> 
>> So, what will happen if PRE self-loops?
>
> The prologue is put in a new block before the chosen one (as always).
>
>> It would be nice to have it covered by a testcase.
>
> If I knew how to prepare one, that stayed stable for more than about
> two weeks, yes :-/
>
>> > +	  bool ok = true;
>> > +
>> > +	  if (!can_get_prologue (pre, prologue_clobbered))
>> > +	    ok = false;
>> > +
>> > +	  FOR_EACH_EDGE (e, ei, pre->succs)
>> > +	    if (e->dest != pre
>> > +		&& dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
>> > +	      ok = false;
>> 
>> I wonder if it wouldn't be better to:
>> 
>> 	if (!can_get_prologue (pre, prologue_clobbered))
>> 	  ok = false;
>> 	else
>> 	  FOR_EACH_EDGE (e, ei, pre->succs)
>> 	    if (e->dest != pre
>> 		&& dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
>> 	      {
>> 		ok = false;
>> 		break;
>> 	      }
>> 
>> so that it doesn't walk or continue walking the edges if not needed.
>
> If the compiler is any good, neither does my code, right?  :-)
>
> I think it is more important to have this code readable than a teeny
> tiny bit faster.  It is all linear (assuming dominator lookups are O(1)),
> which isn't too hard to ascertain (yeah, famous last words).

Maybe the clearest thing is to split it out into a function that returns
false as soon as it finds a reason why the transform is not OK.
The "decent compiler" ought to inline that function.

Thanks,
Richard
Segher Boessenkool Dec. 3, 2015, 7:46 p.m. UTC | #6
On Thu, Dec 03, 2015 at 12:35:51PM +0100, Bernd Schmidt wrote:
> On 12/02/2015 07:21 PM, Segher Boessenkool wrote:
> >After shrink-wrapping has found the "tightest fit" for where to place
> >the prologue, it tries move it earlier (so that frame saves are run
> >earlier) -- but without copying any more basic blocks.
> 
> Another question would be - is there really a good reason to do this at all?

I haven't actually benchmarked it to see if it in fact matters for
performance.  The original code did something similar, but perhaps not
for the same reasons.  The goal is to put the prologue as early as
possible while only putting it on paths that need it (the code before
here puts it as *late* as possible instead).

Moving the prologue earlier gives more free registers (the ones it saved)
in the blocks "skipped", so that late passes have more to work with.
More importantly, moving the prologue and the epilogue further apart
avoids some execution hazards.


Segher
Segher Boessenkool Dec. 3, 2015, 8:14 p.m. UTC | #7
On Thu, Dec 03, 2015 at 12:31:53PM +0100, Bernd Schmidt wrote:
> On 12/02/2015 07:21 PM, Segher Boessenkool wrote:
> >After shrink-wrapping has found the "tightest fit" for where to place
> >the prologue, it tries move it earlier (so that frame saves are run
> >earlier) -- but without copying any more basic blocks.
> >
> >Unfortunately a candidate block we select can be inside a loop, and we
> >will still allow it (because the loop always exits via our previously
> >chosen block).
> 
> >So we need to detect this situation.  We can place the prologue at a
> >previous block PRE only if PRE dominates every block reachable from
> >it.  This is a bit hard / expensive to compute, so instead this patch
> >allows a block PRE only if PRE does not post-dominate any of its
> >successors (other than itself).
> 
> Are the two conditions equivalent though?

They are not, one is a subset of the other.  By construction, the block
PRE (the new candidate for getting the prologue) dominates PRO (the
original block to get the prologue), and PRO post-dominates PRE.  Now,
PRE is only suitable if it dominates every block reachable from it,
since otherwise putting the prologue on PRE instead of on PRO requires
duplicating more blocks.

Hrm.  A successor block of PRE could loop back to PRE conditionally,
and go to PRO otherwise.  Rats, what was I thinking.  Thanks for catching
it; I'll have to think of something better.  A bit more factoring will
probably help, we'll see.

> I think I agree with Jakub that we don't want to do unnecessary work in 
> this piece of code.

I agree as well.

> >    /* If we can move PRO back without having to duplicate more blocks, do 
> >    so.
> >       We can move back to a block PRE if every path from PRE will 
> >       eventually
> >-     need a prologue, that is, PRO is a post-dominator of PRE.  */
> >+     need a prologue, that is, PRO is a post-dominator of PRE.  We might
> >+     need to duplicate PRE if there is any path from a successor of PRE 
> >back
> >+     to PRE, so don't allow that either (but self-loops are fine, as are 
> >any
> >+     other loops entirely dominated by PRE; this in general seems too
> >+     expensive to check for, for such an uncommon case).  */
> 
> The last comment is unclear and I don't know what it wants to tell me.

Yeah, sorry.  Writing text is hard :-)


Segher
diff mbox

Patch

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 3a1df84..fe33652 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -752,7 +752,11 @@  try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
 
   /* If we can move PRO back without having to duplicate more blocks, do so.
      We can move back to a block PRE if every path from PRE will eventually
-     need a prologue, that is, PRO is a post-dominator of PRE.  */
+     need a prologue, that is, PRO is a post-dominator of PRE.  We might
+     need to duplicate PRE if there is any path from a successor of PRE back
+     to PRE, so don't allow that either (but self-loops are fine, as are any
+     other loops entirely dominated by PRE; this in general seems too
+     expensive to check for, for such an uncommon case).  */
 
   if (pro != entry)
     {
@@ -765,9 +769,20 @@  try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with,
 	  if (!dominated_by_p (CDI_POST_DOMINATORS, pre, pro))
 	    break;
 
+	  bool ok = true;
+
+	  if (!can_get_prologue (pre, prologue_clobbered))
+	    ok = false;
+
+	  FOR_EACH_EDGE (e, ei, pre->succs)
+	    if (e->dest != pre
+		&& dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre))
+	      ok = false;
+
+	  if (ok)
+	    last_ok = pre;
+
 	  pro = pre;
-	  if (can_get_prologue (pro, prologue_clobbered))
-	    last_ok = pro;
 	}
       pro = last_ok;