Message ID | ebfaeba0a9c98ddf1f8ea795efc2fd3ba082fdfe.1449080153.git.segher@kernel.crashing.org |
---|---|
State | New |
Headers | show |
On Wed, Dec 02, 2015 at 06:21:47PM +0000, Segher Boessenkool wrote: > --- a/gcc/shrink-wrap.c > +++ b/gcc/shrink-wrap.c > @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with, > > /* If we can move PRO back without having to duplicate more blocks, do so. > We can move back to a block PRE if every path from PRE will eventually > - need a prologue, that is, PRO is a post-dominator of PRE. */ > + need a prologue, that is, PRO is a post-dominator of PRE. We might > + need to duplicate PRE if there is any path from a successor of PRE back > + to PRE, so don't allow that either (but self-loops are fine, as are any > + other loops entirely dominated by PRE; this in general seems too > + expensive to check for, for such an uncommon case). */ So, what will happen if PRE self-loops? It would be nice to have it covered by a testcase. > + bool ok = true; > + > + if (!can_get_prologue (pre, prologue_clobbered)) > + ok = false; > + > + FOR_EACH_EDGE (e, ei, pre->succs) > + if (e->dest != pre > + && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre)) > + ok = false; I wonder if it wouldn't be better to: if (!can_get_prologue (pre, prologue_clobbered)) ok = false; else FOR_EACH_EDGE (e, ei, pre->succs) if (e->dest != pre && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre)) { ok = false; break; } so that it doesn't walk or continue walking the edges if not needed. Anyway, that are my comments, I'll defer the rest of the review to somebody else. Jakub
On Wed, Dec 02, 2015 at 08:19:05PM +0100, Jakub Jelinek wrote: > On Wed, Dec 02, 2015 at 06:21:47PM +0000, Segher Boessenkool wrote: > > --- a/gcc/shrink-wrap.c > > +++ b/gcc/shrink-wrap.c > > @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with, > > > > /* If we can move PRO back without having to duplicate more blocks, do so. > > We can move back to a block PRE if every path from PRE will eventually > > - need a prologue, that is, PRO is a post-dominator of PRE. */ > > + need a prologue, that is, PRO is a post-dominator of PRE. We might > > + need to duplicate PRE if there is any path from a successor of PRE back > > + to PRE, so don't allow that either (but self-loops are fine, as are any > > + other loops entirely dominated by PRE; this in general seems too > > + expensive to check for, for such an uncommon case). */ > > So, what will happen if PRE self-loops? The prologue is put in a new block before the chosen one (as always). > It would be nice to have it covered by a testcase. If I knew how to prepare one, that stayed stable for more than about two weeks, yes :-/ > > + bool ok = true; > > + > > + if (!can_get_prologue (pre, prologue_clobbered)) > > + ok = false; > > + > > + FOR_EACH_EDGE (e, ei, pre->succs) > > + if (e->dest != pre > > + && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre)) > > + ok = false; > > I wonder if it wouldn't be better to: > > if (!can_get_prologue (pre, prologue_clobbered)) > ok = false; > else > FOR_EACH_EDGE (e, ei, pre->succs) > if (e->dest != pre > && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre)) > { > ok = false; > break; > } > > so that it doesn't walk or continue walking the edges if not needed. If the compiler is any good, neither does my code, right? :-) I think it is more important to have this code readable than a teeny tiny bit faster. It is all linear (assuming dominator lookups are O(1)), which isn't too hard to ascertain (yeah, famous last words). Segher
On 12/02/2015 07:21 PM, Segher Boessenkool wrote: > After shrink-wrapping has found the "tightest fit" for where to place > the prologue, it tries move it earlier (so that frame saves are run > earlier) -- but without copying any more basic blocks. > > Unfortunately a candidate block we select can be inside a loop, and we > will still allow it (because the loop always exits via our previously > chosen block). > So we need to detect this situation. We can place the prologue at a > previous block PRE only if PRE dominates every block reachable from > it. This is a bit hard / expensive to compute, so instead this patch > allows a block PRE only if PRE does not post-dominate any of its > successors (other than itself). Are the two conditions equivalent though? I'm not fully convinced. Let's say the loop has multiple exits, then none of these exit blocks postdominate the loop entry block, right? I think I agree with Jakub that we don't want to do unnecessary work in this piece of code. > /* If we can move PRO back without having to duplicate more blocks, do so. > We can move back to a block PRE if every path from PRE will eventually > - need a prologue, that is, PRO is a post-dominator of PRE. */ > + need a prologue, that is, PRO is a post-dominator of PRE. We might > + need to duplicate PRE if there is any path from a successor of PRE back > + to PRE, so don't allow that either (but self-loops are fine, as are any > + other loops entirely dominated by PRE; this in general seems too > + expensive to check for, for such an uncommon case). */ The last comment is unclear and I don't know what it wants to tell me. Bernd
On 12/02/2015 07:21 PM, Segher Boessenkool wrote: > After shrink-wrapping has found the "tightest fit" for where to place > the prologue, it tries move it earlier (so that frame saves are run > earlier) -- but without copying any more basic blocks. Another question would be - is there really a good reason to do this at all? Bernd
Segher Boessenkool <segher@kernel.crashing.org> writes: > On Wed, Dec 02, 2015 at 08:19:05PM +0100, Jakub Jelinek wrote: >> On Wed, Dec 02, 2015 at 06:21:47PM +0000, Segher Boessenkool wrote: >> > --- a/gcc/shrink-wrap.c >> > +++ b/gcc/shrink-wrap.c >> > @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with, >> > >> > /* If we can move PRO back without having to duplicate more blocks, do so. >> > We can move back to a block PRE if every path from PRE will eventually >> > - need a prologue, that is, PRO is a post-dominator of PRE. */ >> > + need a prologue, that is, PRO is a post-dominator of PRE. We might >> > + need to duplicate PRE if there is any path from a successor of PRE back >> > + to PRE, so don't allow that either (but self-loops are fine, as are any >> > + other loops entirely dominated by PRE; this in general seems too >> > + expensive to check for, for such an uncommon case). */ >> >> So, what will happen if PRE self-loops? > > The prologue is put in a new block before the chosen one (as always). > >> It would be nice to have it covered by a testcase. > > If I knew how to prepare one, that stayed stable for more than about > two weeks, yes :-/ > >> > + bool ok = true; >> > + >> > + if (!can_get_prologue (pre, prologue_clobbered)) >> > + ok = false; >> > + >> > + FOR_EACH_EDGE (e, ei, pre->succs) >> > + if (e->dest != pre >> > + && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre)) >> > + ok = false; >> >> I wonder if it wouldn't be better to: >> >> if (!can_get_prologue (pre, prologue_clobbered)) >> ok = false; >> else >> FOR_EACH_EDGE (e, ei, pre->succs) >> if (e->dest != pre >> && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre)) >> { >> ok = false; >> break; >> } >> >> so that it doesn't walk or continue walking the edges if not needed. > > If the compiler is any good, neither does my code, right? :-) > > I think it is more important to have this code readable than a teeny > tiny bit faster. It is all linear (assuming dominator lookups are O(1)), > which isn't too hard to ascertain (yeah, famous last words). Maybe the clearest thing is to split it out into a function that returns false as soon as it finds a reason why the transform is not OK. The "decent compiler" ought to inline that function. Thanks, Richard
On Thu, Dec 03, 2015 at 12:35:51PM +0100, Bernd Schmidt wrote: > On 12/02/2015 07:21 PM, Segher Boessenkool wrote: > >After shrink-wrapping has found the "tightest fit" for where to place > >the prologue, it tries move it earlier (so that frame saves are run > >earlier) -- but without copying any more basic blocks. > > Another question would be - is there really a good reason to do this at all? I haven't actually benchmarked it to see if it in fact matters for performance. The original code did something similar, but perhaps not for the same reasons. The goal is to put the prologue as early as possible while only putting it on paths that need it (the code before here puts it as *late* as possible instead). Moving the prologue earlier gives more free registers (the ones it saved) in the blocks "skipped", so that late passes have more to work with. More importantly, moving the prologue and the epilogue further apart avoids some execution hazards. Segher
On Thu, Dec 03, 2015 at 12:31:53PM +0100, Bernd Schmidt wrote: > On 12/02/2015 07:21 PM, Segher Boessenkool wrote: > >After shrink-wrapping has found the "tightest fit" for where to place > >the prologue, it tries move it earlier (so that frame saves are run > >earlier) -- but without copying any more basic blocks. > > > >Unfortunately a candidate block we select can be inside a loop, and we > >will still allow it (because the loop always exits via our previously > >chosen block). > > >So we need to detect this situation. We can place the prologue at a > >previous block PRE only if PRE dominates every block reachable from > >it. This is a bit hard / expensive to compute, so instead this patch > >allows a block PRE only if PRE does not post-dominate any of its > >successors (other than itself). > > Are the two conditions equivalent though? They are not, one is a subset of the other. By construction, the block PRE (the new candidate for getting the prologue) dominates PRO (the original block to get the prologue), and PRO post-dominates PRE. Now, PRE is only suitable if it dominates every block reachable from it, since otherwise putting the prologue on PRE instead of on PRO requires duplicating more blocks. Hrm. A successor block of PRE could loop back to PRE conditionally, and go to PRO otherwise. Rats, what was I thinking. Thanks for catching it; I'll have to think of something better. A bit more factoring will probably help, we'll see. > I think I agree with Jakub that we don't want to do unnecessary work in > this piece of code. I agree as well. > > /* If we can move PRO back without having to duplicate more blocks, do > > so. > > We can move back to a block PRE if every path from PRE will > > eventually > >- need a prologue, that is, PRO is a post-dominator of PRE. */ > >+ need a prologue, that is, PRO is a post-dominator of PRE. We might > >+ need to duplicate PRE if there is any path from a successor of PRE > >back > >+ to PRE, so don't allow that either (but self-loops are fine, as are > >any > >+ other loops entirely dominated by PRE; this in general seems too > >+ expensive to check for, for such an uncommon case). */ > > The last comment is unclear and I don't know what it wants to tell me. Yeah, sorry. Writing text is hard :-) Segher
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index 3a1df84..fe33652 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -752,7 +752,11 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with, /* If we can move PRO back without having to duplicate more blocks, do so. We can move back to a block PRE if every path from PRE will eventually - need a prologue, that is, PRO is a post-dominator of PRE. */ + need a prologue, that is, PRO is a post-dominator of PRE. We might + need to duplicate PRE if there is any path from a successor of PRE back + to PRE, so don't allow that either (but self-loops are fine, as are any + other loops entirely dominated by PRE; this in general seems too + expensive to check for, for such an uncommon case). */ if (pro != entry) { @@ -765,9 +769,20 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head *bb_with, if (!dominated_by_p (CDI_POST_DOMINATORS, pre, pro)) break; + bool ok = true; + + if (!can_get_prologue (pre, prologue_clobbered)) + ok = false; + + FOR_EACH_EDGE (e, ei, pre->succs) + if (e->dest != pre + && dominated_by_p (CDI_POST_DOMINATORS, e->dest, pre)) + ok = false; + + if (ok) + last_ok = pre; + pro = pre; - if (can_get_prologue (pro, prologue_clobbered)) - last_ok = pro; } pro = last_ok;