Message ID | 1931947.uvSAzT6pjY@polaris |
---|---|
State | New |
Headers | show |
On Wed, Jan 15, 2014 at 10:24 AM, Eric Botcazou <ebotcazou@adacore.com> wrote: > Hi, > > we have run into optimization regressions in Ada caused by the interaction > between the GIMPLE clobbers and SRA: on the one hand GIMPLE clobbers create > artificial EH handlers for aggregate variables, on the other hand SRA refuses > to scalarize objects that appear in a statement that can throw internally. > The result is that GIMPLE clobbers block the scalarization of variables that > used to be possible up to the 4.6 series. Testcase attached, compile p.adb on > x86-64 with -O2 -fdump-tree-ealias -fdump-tree-esra. > > This can be solved by moving the ehcleanup pass to before early SRA in > the pipeline. It has a small but measurable positive effect on some of our > benchmarks (with a 4.7-based compiler). Tested on x86-64/Linux. > > Thoughts? What you want is tree-eh.c:optimize_clobbers, right? Can't we do this optimization during EH refactoring / lowering as well? Also why does SRA refuse to scalarize here? It will end up scalarizing during regular opts as <L1>: result = decls_support.get_private (100); result$i_12 = MEM[(struct decls_support__t_private *)&result]; where it can do the scalarization just fine if it inserts on all non-EH successor edges (so not doing the scalarization only if there is more than one non-EH successor edge would make sense). The current placement of EH cleanup is to make it catch all opportunities before early inlining sees the function as callee. And I guess both FRE and DCE will expose quite some EH cleanup opportunities especially with non-call-exceptions. Richard. > > 2014-01-15 Eric Botcazou <ebotcazou@adacore.com> > > * passes.def (pass_early_local_passes): Run first EH cleanup pass early. > > > -- > Eric Botcazou
> What you want is tree-eh.c:optimize_clobbers, right? Can't we > do this optimization during EH refactoring / lowering as well? We need the SSA form for sink_clobbers. > Also why does SRA refuse to scalarize here? Because of the EH handler and the following predicate: /* Disqualify LHS and RHS for scalarization if STMT must end its basic block in modes in which it matters, return true iff they have been disqualified. RHS may be NULL, in that case ignore it. If we scalarize an aggregate in intra-SRA we may need to add statements after each statement. This is not possible if a statement unconditionally has to end the basic block. */ static bool disqualify_ops_if_throwing_stmt (gimple stmt, tree lhs, tree rhs) { if ((sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA) && (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt))) { disqualify_base_of_expr (lhs, "LHS of a throwing stmt."); if (rhs) disqualify_base_of_expr (rhs, "RHS of a throwing stmt."); return true; } return false; } > It will end up scalarizing during regular opts as > > <L1>: > result = decls_support.get_private (100); > result$i_12 = MEM[(struct decls_support__t_private *)&result]; Yes, it's scalarized during SRA but not ESRA because there is an ehcleanup pass in-between. It used to be scalarized during ESRA up to 4.6.x. > The current placement of EH cleanup is to make it catch > all opportunities before early inlining sees the function as > callee. And I guess both FRE and DCE will expose quite > some EH cleanup opportunities especially with non-call-exceptions. Early inlining or regular inlining (I'm always slightly confused here)? p.adb.003t.original p.adb.004t.gimple p.adb.005t.nested p.adb.006t.omplower p.adb.007t.lower p.adb.009t.ehopt p.adb.010t.eh p.adb.011t.cfg p.adb.015t.ssa p.adb.017t.inline_param1 p.adb.018t.einline p.adb.019t.early_optimizations p.adb.020t.copyrename1 p.adb.021t.ccp1 p.adb.022t.forwprop1 p.adb.023t.ealias p.adb.024t.esra p.adb.025t.fre1 p.adb.026t.copyprop1 p.adb.027t.mergephi1 p.adb.028t.cddce1 p.adb.029t.eipa_sra p.adb.030t.tailr1 p.adb.031t.switchconv p.adb.032t.ehcleanup1 p.adb.033t.profile_estimate p.adb.034t.local-pure-const1 p.adb.035t.fnsplit p.adb.036t.release_ssa p.adb.037t.inline_param2 p.adb.054t.copyrename2 Then we could add another ehcleanup pass before early inlining, i.e. between 015t.ssa and 017t.inline_param1, to achieve the same goal.
On Wed, Jan 15, 2014 at 1:06 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> What you want is tree-eh.c:optimize_clobbers, right? Can't we >> do this optimization during EH refactoring / lowering as well? > > We need the SSA form for sink_clobbers. I know, I'm saying it may be possible to implement an equivalent optimization without SSA form. >> Also why does SRA refuse to scalarize here? > > Because of the EH handler and the following predicate: > > /* Disqualify LHS and RHS for scalarization if STMT must end its basic block > in modes in which it matters, return true iff they have been disqualified. > RHS may be NULL, in that case ignore it. If we scalarize an aggregate in > intra-SRA we may need to add statements after each statement. This is not > possible if a statement unconditionally has to end the basic block. */ > static bool > disqualify_ops_if_throwing_stmt (gimple stmt, tree lhs, tree rhs) > { > if ((sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA) > && (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt))) > { > disqualify_base_of_expr (lhs, "LHS of a throwing stmt."); > if (rhs) > disqualify_base_of_expr (rhs, "RHS of a throwing stmt."); > return true; > } > return false; > } > >> It will end up scalarizing during regular opts as >> >> <L1>: >> result = decls_support.get_private (100); >> result$i_12 = MEM[(struct decls_support__t_private *)&result]; > > Yes, it's scalarized during SRA but not ESRA because there is an ehcleanup > pass in-between. It used to be scalarized during ESRA up to 4.6.x. I'm saying even ESRA should be able to scalarize it just fine. It just needs to be careful where to insert the loads from the aggregate result (on the single outgoing non-EH edge). Wouldn't that fix your issue? >> The current placement of EH cleanup is to make it catch >> all opportunities before early inlining sees the function as >> callee. And I guess both FRE and DCE will expose quite >> some EH cleanup opportunities especially with non-call-exceptions. > > Early inlining or regular inlining (I'm always slightly confused here)? Early inlining. > p.adb.003t.original > p.adb.004t.gimple > p.adb.005t.nested > p.adb.006t.omplower > p.adb.007t.lower > p.adb.009t.ehopt > p.adb.010t.eh > p.adb.011t.cfg > p.adb.015t.ssa > p.adb.017t.inline_param1 > p.adb.018t.einline > p.adb.019t.early_optimizations > p.adb.020t.copyrename1 > p.adb.021t.ccp1 > p.adb.022t.forwprop1 > p.adb.023t.ealias > p.adb.024t.esra > p.adb.025t.fre1 > p.adb.026t.copyprop1 > p.adb.027t.mergephi1 > p.adb.028t.cddce1 > p.adb.029t.eipa_sra > p.adb.030t.tailr1 > p.adb.031t.switchconv > p.adb.032t.ehcleanup1 > p.adb.033t.profile_estimate > p.adb.034t.local-pure-const1 > p.adb.035t.fnsplit > p.adb.036t.release_ssa > p.adb.037t.inline_param2 > p.adb.054t.copyrename2 > > Then we could add another ehcleanup pass before early inlining, i.e. between > 015t.ssa and 017t.inline_param1, to achieve the same goal. > > -- > Eric Botcazou
> I know, I'm saying it may be possible to implement an equivalent > optimization without SSA form. Sure, but what would be the point of this duplication exactly? > I'm saying even ESRA should be able to scalarize it just fine. It just > needs to be careful where to insert the loads from the aggregate result > (on the single outgoing non-EH edge). > > Wouldn't that fix your issue? Very likely, yes, but there might be other passes with the same problem before the first EH cleanup pass (e.g. FRE, DCE, etc). > Early inlining. Are you sure? I don't see how 032t.ehcleanup1 can have any effects on p.adb.018t.einline (unless something mysterious happens here). > > p.adb.003t.original > > p.adb.004t.gimple > > p.adb.005t.nested > > p.adb.006t.omplower > > p.adb.007t.lower > > p.adb.009t.ehopt > > p.adb.010t.eh > > p.adb.011t.cfg > > p.adb.015t.ssa > > p.adb.017t.inline_param1 > > p.adb.018t.einline > > p.adb.019t.early_optimizations > > p.adb.020t.copyrename1 > > p.adb.021t.ccp1 > > p.adb.022t.forwprop1 > > p.adb.023t.ealias > > p.adb.024t.esra > > p.adb.025t.fre1 > > p.adb.026t.copyprop1 > > p.adb.027t.mergephi1 > > p.adb.028t.cddce1 > > p.adb.029t.eipa_sra > > p.adb.030t.tailr1 > > p.adb.031t.switchconv > > p.adb.032t.ehcleanup1 > > p.adb.033t.profile_estimate > > p.adb.034t.local-pure-const1 > > p.adb.035t.fnsplit > > p.adb.036t.release_ssa > > p.adb.037t.inline_param2 > > p.adb.054t.copyrename2
Hi, On Wed, Jan 15, 2014 at 06:37:30PM +0100, Eric Botcazou wrote: > > I know, I'm saying it may be possible to implement an equivalent > > optimization without SSA form. > > Sure, but what would be the point of this duplication exactly? > > > I'm saying even ESRA should be able to scalarize it just fine. It just > > needs to be careful where to insert the loads from the aggregate result > > (on the single outgoing non-EH edge). > > > > Wouldn't that fix your issue? > > Very likely, yes, but there might be other passes with the same problem before > the first EH cleanup pass (e.g. FRE, DCE, etc). > > > Early inlining. > > Are you sure? I don't see how 032t.ehcleanup1 can have any effects on > p.adb.018t.einline (unless something mysterious happens here). We early-optimize functions in topological order so generally (except for SCCs), so when 018t.einline runs all the callees have already been early-optimized, and it is really the callees it looks at most. (But it of course prepares the function for ipa inlining analysis too since the standalone functions will be untouched.) Martin > > > > p.adb.003t.original > > > p.adb.004t.gimple > > > p.adb.005t.nested > > > p.adb.006t.omplower > > > p.adb.007t.lower > > > p.adb.009t.ehopt > > > p.adb.010t.eh > > > p.adb.011t.cfg > > > p.adb.015t.ssa > > > p.adb.017t.inline_param1 > > > p.adb.018t.einline > > > p.adb.019t.early_optimizations > > > p.adb.020t.copyrename1 > > > p.adb.021t.ccp1 > > > p.adb.022t.forwprop1 > > > p.adb.023t.ealias > > > p.adb.024t.esra > > > p.adb.025t.fre1 > > > p.adb.026t.copyprop1 > > > p.adb.027t.mergephi1 > > > p.adb.028t.cddce1 > > > p.adb.029t.eipa_sra > > > p.adb.030t.tailr1 > > > p.adb.031t.switchconv > > > p.adb.032t.ehcleanup1 > > > p.adb.033t.profile_estimate > > > p.adb.034t.local-pure-const1 > > > p.adb.035t.fnsplit > > > p.adb.036t.release_ssa > > > p.adb.037t.inline_param2 > > > p.adb.054t.copyrename2 > > -- > Eric Botcazou
On Wed, Jan 15, 2014 at 6:37 PM, Eric Botcazou <ebotcazou@adacore.com> wrote: >> I know, I'm saying it may be possible to implement an equivalent >> optimization without SSA form. > > Sure, but what would be the point of this duplication exactly? To avoid ... >> I'm saying even ESRA should be able to scalarize it just fine. It just >> needs to be careful where to insert the loads from the aggregate result >> (on the single outgoing non-EH edge). >> >> Wouldn't that fix your issue? > > Very likely, yes, but there might be other passes with the same problem before > the first EH cleanup pass (e.g. FRE, DCE, etc). ... these and avoid adding another EH cleanup pass. >> Early inlining. > > Are you sure? I don't see how 032t.ehcleanup1 can have any effects on > p.adb.018t.einline (unless something mysterious happens here). > >> > p.adb.003t.original >> > p.adb.004t.gimple >> > p.adb.005t.nested >> > p.adb.006t.omplower >> > p.adb.007t.lower >> > p.adb.009t.ehopt >> > p.adb.010t.eh >> > p.adb.011t.cfg >> > p.adb.015t.ssa >> > p.adb.017t.inline_param1 >> > p.adb.018t.einline >> > p.adb.019t.early_optimizations >> > p.adb.020t.copyrename1 >> > p.adb.021t.ccp1 >> > p.adb.022t.forwprop1 >> > p.adb.023t.ealias >> > p.adb.024t.esra >> > p.adb.025t.fre1 >> > p.adb.026t.copyprop1 >> > p.adb.027t.mergephi1 >> > p.adb.028t.cddce1 >> > p.adb.029t.eipa_sra >> > p.adb.030t.tailr1 >> > p.adb.031t.switchconv >> > p.adb.032t.ehcleanup1 >> > p.adb.033t.profile_estimate >> > p.adb.034t.local-pure-const1 >> > p.adb.035t.fnsplit >> > p.adb.036t.release_ssa >> > p.adb.037t.inline_param2 >> > p.adb.054t.copyrename2 > > -- > Eric Botcazou
Index: passes.def =================================================================== --- passes.def (revision 206563) +++ passes.def (working copy) @@ -71,6 +71,10 @@ along with GCC; see the file COPYING3. /* pass_build_ealias is a dummy pass that ensures that we execute TODO_rebuild_alias at this point. */ NEXT_PASS (pass_build_ealias); + /* SRA refuses to scalarize objects that appear in a statement + that can throw internally so we need to cleanup the EH tree + early to remove handlers that contain only clobbers. */ + NEXT_PASS (pass_cleanup_eh); NEXT_PASS (pass_sra_early); NEXT_PASS (pass_fre); NEXT_PASS (pass_copy_prop); @@ -79,7 +83,6 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_early_ipa_sra); NEXT_PASS (pass_tail_recursion); NEXT_PASS (pass_convert_switch); - NEXT_PASS (pass_cleanup_eh); NEXT_PASS (pass_profile); NEXT_PASS (pass_local_pure_const); /* Split functions creates parts that are not run through