Message ID | ed130964e46e1941122dd75e381cca907ec28bd5.1596833936.git.segher@kernel.crashing.org |
---|---|
State | New |
Headers | show |
Series | bb-reorder: Remove a misfiring micro-optimization (PR96475) | expand |
Ping (added some Cc:s). Thanks in advance, Segher On Fri, Aug 07, 2020 at 09:51:04PM +0000, Segher Boessenkool wrote: > When the compgotos pass copies the tail of blocks ending in an indirect > jump, there is a micro-optimization to not copy the last one, since the > original block will then just be deleted. This does not work properly > if cleanup_cfg does not merge all pairs of blocks we expect it to. > > > v2: This also deletes the other use of single_pred_p, which has the same > problem in principle, I just never have triggered it so far. > > Tested on powerpc64-linux {-m32,-m64} like before. Is this okay for > trunk? > > > Segher > > > 2020-08-07 Segher Boessenkool <segher@kernel.crashing.org> > > PR rtl-optimization/96475 > * bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p > micro-optimization. > --- > gcc/bb-reorder.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c > index c635010..76e56b5 100644 > --- a/gcc/bb-reorder.c > +++ b/gcc/bb-reorder.c > @@ -2680,9 +2680,6 @@ make_pass_reorder_blocks (gcc::context *ctxt) > static bool > maybe_duplicate_computed_goto (basic_block bb, int max_size) > { > - if (single_pred_p (bb)) > - return false; > - > /* Make sure that the block is small enough. */ > rtx_insn *insn; > FOR_BB_INSNS (bb, insn) > @@ -2700,10 +2697,9 @@ maybe_duplicate_computed_goto (basic_block bb, int max_size) > { > basic_block pred = e->src; > > - /* Do not duplicate BB into PRED if that is the last predecessor, or if > - we cannot merge a copy of BB with PRED. */ > - if (single_pred_p (bb) > - || !single_succ_p (pred) > + /* Do not duplicate BB into PRED if we cannot merge a copy of BB > + with PRED. */ > + if (!single_succ_p (pred) > || e->flags & EDGE_COMPLEX > || pred->index < NUM_FIXED_BLOCKS > || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred))) > -- > 1.8.3.1
Segher Boessenkool <segher@kernel.crashing.org> writes: > When the compgotos pass copies the tail of blocks ending in an indirect > jump, there is a micro-optimization to not copy the last one, since the > original block will then just be deleted. This does not work properly > if cleanup_cfg does not merge all pairs of blocks we expect it to. > > > v2: This also deletes the other use of single_pred_p, which has the same > problem in principle, I just never have triggered it so far. Could you go into more details? I thought the reason you wanted to remove the test in the loop was that each successful iteration of the loop removes one incoming edge, meaning that if all previous iterations succeed, the final iteration is bound to see a single predecessor. And there's no guarantee in that case that the jump in the final block will be removed. I.e. the loop is unnecessarily leaving cfgcleanup to finish off the last step for it. So I agree that the loop part looks good. (I assume that was v1, but I missed it, sorry.) But it looks like the point of the initial test is to avoid “duplicating” the block if there would still only be one copy of the code. That test still seems reasonable since the case it rejects should be handled by other cfg cleanups. If for some reason it isn't, there doesn't seem to be any reason to apply the max_size limit for a single predecessor, since no code duplication will take place. So ISTM that removing the first condition is really a separate thing that should be its own patch. Thanks, Richard > > Tested on powerpc64-linux {-m32,-m64} like before. Is this okay for > trunk? > > > Segher > > > 2020-08-07 Segher Boessenkool <segher@kernel.crashing.org> > > PR rtl-optimization/96475 > * bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p > micro-optimization. > --- > gcc/bb-reorder.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c > index c635010..76e56b5 100644 > --- a/gcc/bb-reorder.c > +++ b/gcc/bb-reorder.c > @@ -2680,9 +2680,6 @@ make_pass_reorder_blocks (gcc::context *ctxt) > static bool > maybe_duplicate_computed_goto (basic_block bb, int max_size) > { > - if (single_pred_p (bb)) > - return false; > - > /* Make sure that the block is small enough. */ > rtx_insn *insn; > FOR_BB_INSNS (bb, insn) > @@ -2700,10 +2697,9 @@ maybe_duplicate_computed_goto (basic_block bb, int max_size) > { > basic_block pred = e->src; > > - /* Do not duplicate BB into PRED if that is the last predecessor, or if > - we cannot merge a copy of BB with PRED. */ > - if (single_pred_p (bb) > - || !single_succ_p (pred) > + /* Do not duplicate BB into PRED if we cannot merge a copy of BB > + with PRED. */ > + if (!single_succ_p (pred) > || e->flags & EDGE_COMPLEX > || pred->index < NUM_FIXED_BLOCKS > || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
On Wed, Aug 19, 2020 at 01:10:36PM +0100, Richard Sandiford wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > > When the compgotos pass copies the tail of blocks ending in an indirect > > jump, there is a micro-optimization to not copy the last one, since the > > original block will then just be deleted. This does not work properly > > if cleanup_cfg does not merge all pairs of blocks we expect it to. > > > > > > v2: This also deletes the other use of single_pred_p, which has the same > > problem in principle, I just never have triggered it so far. > > Could you go into more details? I thought the reason you wanted to > remove the test in the loop was that each successful iteration of the > loop removes one incoming edge, meaning that if all previous iterations > succeed, the final iteration is bound to see a single predecessor. > And there's no guarantee in that case that the jump in the final > block will be removed. I.e. the loop is unnecessarily leaving > cfgcleanup to finish off the last step for it. So I agree that the > loop part looks good. (I assume that was v1, but I missed it, sorry.) I didn't Cc: you, I should have :-) Yes, that is part of the problem. The case where it was noticed however was some release of GCC (some 9 release iirc) that did something wrong with cfg_cleanup apparenetly, so not everything was threaded optimally, making duplicate_computed_gotos run into jumps to jumps and stopping. Since it is such an important pass (for performance, for things with bigger jump tables or with threaded FSMs etc.), it seemed a good idea to make it more robust instead of slightly faster. > But it looks like the point of the initial test is to avoid “duplicating” > the block if there would still only be one copy of the code. That test > still seems reasonable since the case it rejects should be handled by > other cfg cleanups. If for some reason it isn't, there doesn't seem > to be any reason to apply the max_size limit for a single predecessor, > since no code duplication will take place. So ISTM that removing the > first condition is really a separate thing that should be its own patch. I have never seen the second case misfiring in practice, only the first one! The original patch is at https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551560.html btw. I'll prepare a testcase for that (it doesn't trigger with recent GCC versions though :-/ ) Thanks, Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > On Wed, Aug 19, 2020 at 01:10:36PM +0100, Richard Sandiford wrote: >> Segher Boessenkool <segher@kernel.crashing.org> writes: >> > When the compgotos pass copies the tail of blocks ending in an indirect >> > jump, there is a micro-optimization to not copy the last one, since the >> > original block will then just be deleted. This does not work properly >> > if cleanup_cfg does not merge all pairs of blocks we expect it to. >> > >> > >> > v2: This also deletes the other use of single_pred_p, which has the same >> > problem in principle, I just never have triggered it so far. >> >> Could you go into more details? I thought the reason you wanted to >> remove the test in the loop was that each successful iteration of the >> loop removes one incoming edge, meaning that if all previous iterations >> succeed, the final iteration is bound to see a single predecessor. >> And there's no guarantee in that case that the jump in the final >> block will be removed. I.e. the loop is unnecessarily leaving >> cfgcleanup to finish off the last step for it. So I agree that the >> loop part looks good. (I assume that was v1, but I missed it, sorry.) > > I didn't Cc: you, I should have :-) > > Yes, that is part of the problem. The case where it was noticed however > was some release of GCC (some 9 release iirc) that did something wrong > with cfg_cleanup apparenetly, so not everything was threaded optimally, > making duplicate_computed_gotos run into jumps to jumps and stopping. > Since it is such an important pass (for performance, for things with > bigger jump tables or with threaded FSMs etc.), it seemed a good idea to > make it more robust instead of slightly faster. > >> But it looks like the point of the initial test is to avoid “duplicating” >> the block if there would still only be one copy of the code. That test >> still seems reasonable since the case it rejects should be handled by >> other cfg cleanups. If for some reason it isn't, there doesn't seem >> to be any reason to apply the max_size limit for a single predecessor, >> since no code duplication will take place. So ISTM that removing the >> first condition is really a separate thing that should be its own patch. > > I have never seen the second case misfiring in practice, only the first > one! Shucks, I guessed the wrong way round :-) I'd argue that the first check isn't a micro-optimisation though. It's testing whether the duplication really is a duplication (rather than a move). The second test does look like a micro-optimisation. > The original patch is at > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551560.html > btw. I'll prepare a testcase for that (it doesn't trigger with recent > GCC versions though :-/ ) Hmm, OK. If the first check is the important one, then I think it'd be better to make the max_size stuff conditional on !single_pred_p rather than drop the test entirely. Thanks, Richard
Hi! On Wed, Aug 19, 2020 at 07:18:02PM +0100, Richard Sandiford wrote: > > I have never seen the second case misfiring in practice, only the first > > one! > > Shucks, I guessed the wrong way round :-) > > I'd argue that the first check isn't a micro-optimisation though. > It's testing whether the duplication really is a duplication > (rather than a move). Yes, by removing it the code is duplicated like in all other cases, but then the original is removed later (as dead code). With the check in place we have it appended to the predecessor block (by insn chain grafting instead of copying, done by cleanup_cfg). That is what I call a micro-optimisation. Of course it isn't correct in all cases (maybe those cases didn't happen before, maybe no one noticed; in any case, this patch makes it more robust, while still keeping it O(N)). > > The original patch is at > > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551560.html > > btw. I'll prepare a testcase for that (it doesn't trigger with recent > > GCC versions though :-/ ) > > Hmm, OK. If the first check is the important one, then I think it'd > be better to make the max_size stuff conditional on !single_pred_p > rather than drop the test entirely. Okay, I'll look at that. Thanks, Segher
On Fri, 2020-08-07 at 21:51 +0000, Segher Boessenkool wrote: > When the compgotos pass copies the tail of blocks ending in an indirect > jump, there is a micro-optimization to not copy the last one, since the > original block will then just be deleted. This does not work properly > if cleanup_cfg does not merge all pairs of blocks we expect it to. > > > v2: This also deletes the other use of single_pred_p, which has the same > problem in principle, I just never have triggered it so far. > > Tested on powerpc64-linux {-m32,-m64} like before. Is this okay for > trunk? > > > Segher > > > 2020-08-07 Segher Boessenkool <segher@kernel.crashing.org> > > PR rtl-optimization/96475 > * bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p > micro-optimization. So this may have already been answered, but why didn't we merge the single pred with its single succ? Though I guess your patch wouldn't hurt, so OK. jeff
Hi! Sorry this took so long to come back to... On Tue, Aug 25, 2020 at 02:39:56PM -0600, Jeff Law wrote: > On Fri, 2020-08-07 at 21:51 +0000, Segher Boessenkool wrote: > > When the compgotos pass copies the tail of blocks ending in an indirect > > jump, there is a micro-optimization to not copy the last one, since the > > original block will then just be deleted. This does not work properly > > if cleanup_cfg does not merge all pairs of blocks we expect it to. > > > > > > v2: This also deletes the other use of single_pred_p, which has the same > > problem in principle, I just never have triggered it so far. > > > > Tested on powerpc64-linux {-m32,-m64} like before. Is this okay for > > trunk? > > > > > > Segher > > > > > > 2020-08-07 Segher Boessenkool <segher@kernel.crashing.org> > > > > PR rtl-optimization/96475 > > * bb-reorder.c (maybe_duplicate_computed_goto): Remove single_pred_p > > micro-optimization. > So this may have already been answered, but why didn't we merge the single pred > with its single succ? The patch makes the compgotos pass more robust, and also makes it work in cases where the unmodified code would not (for example, with the unmodified code, the original indirect jump is *not* copied to the predecessors whenever possible). I don't know if we hit this, or we hit a snag with some older GCC releases that did not jump thread properly in all cases. I suspect a micture of the two :-/ > Though I guess your patch wouldn't hurt, so OK. Thanks! Segher
diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index c635010..76e56b5 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -2680,9 +2680,6 @@ make_pass_reorder_blocks (gcc::context *ctxt) static bool maybe_duplicate_computed_goto (basic_block bb, int max_size) { - if (single_pred_p (bb)) - return false; - /* Make sure that the block is small enough. */ rtx_insn *insn; FOR_BB_INSNS (bb, insn) @@ -2700,10 +2697,9 @@ maybe_duplicate_computed_goto (basic_block bb, int max_size) { basic_block pred = e->src; - /* Do not duplicate BB into PRED if that is the last predecessor, or if - we cannot merge a copy of BB with PRED. */ - if (single_pred_p (bb) - || !single_succ_p (pred) + /* Do not duplicate BB into PRED if we cannot merge a copy of BB + with PRED. */ + if (!single_succ_p (pred) || e->flags & EDGE_COMPLEX || pred->index < NUM_FIXED_BLOCKS || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))