diff mbox

[RFC] Move ehcleanup pass to before early SRA

Message ID 1931947.uvSAzT6pjY@polaris
State New
Headers show

Commit Message

Eric Botcazou Jan. 15, 2014, 9:24 a.m. UTC
Hi,

we have run into optimization regressions in Ada caused by the interaction 
between the GIMPLE clobbers and SRA: on the one hand GIMPLE clobbers create 
artificial EH handlers for aggregate variables, on the other hand SRA refuses 
to scalarize objects that appear in a statement that can throw internally.  
The result is that GIMPLE clobbers block the scalarization of variables that 
used to be possible up to the 4.6 series.  Testcase attached, compile p.adb on 
x86-64 with -O2 -fdump-tree-ealias -fdump-tree-esra.

This can be solved by moving the ehcleanup pass to before early SRA in 
the pipeline.  It has a small but measurable positive effect on some of our 
benchmarks (with a 4.7-based compiler).  Tested on x86-64/Linux.

Thoughts?


2014-01-15  Eric Botcazou  <ebotcazou@adacore.com>

	* passes.def (pass_early_local_passes): Run first EH cleanup pass early.

Comments

Richard Biener Jan. 15, 2014, 11:23 a.m. UTC | #1
On Wed, Jan 15, 2014 at 10:24 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> we have run into optimization regressions in Ada caused by the interaction
> between the GIMPLE clobbers and SRA: on the one hand GIMPLE clobbers create
> artificial EH handlers for aggregate variables, on the other hand SRA refuses
> to scalarize objects that appear in a statement that can throw internally.
> The result is that GIMPLE clobbers block the scalarization of variables that
> used to be possible up to the 4.6 series.  Testcase attached, compile p.adb on
> x86-64 with -O2 -fdump-tree-ealias -fdump-tree-esra.
>
> This can be solved by moving the ehcleanup pass to before early SRA in
> the pipeline.  It has a small but measurable positive effect on some of our
> benchmarks (with a 4.7-based compiler).  Tested on x86-64/Linux.
>
> Thoughts?

What you want is tree-eh.c:optimize_clobbers, right?  Can't we
do this optimization during EH refactoring / lowering as well?

Also why does SRA refuse to scalarize here?  It will end up
scalarizing during regular opts as

<L1>:
  result = decls_support.get_private (100);
  result$i_12 = MEM[(struct decls_support__t_private *)&result];

where it can do the scalarization just fine if it inserts on all
non-EH successor edges (so not doing the scalarization only
if there is more than one non-EH successor edge would make
sense).

The current placement of EH cleanup is to make it catch
all opportunities before early inlining sees the function as
callee.  And I guess both FRE and DCE will expose quite
some EH cleanup opportunities especially with non-call-exceptions.

Richard.

>
> 2014-01-15  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * passes.def (pass_early_local_passes): Run first EH cleanup pass early.
>
>
> --
> Eric Botcazou
Eric Botcazou Jan. 15, 2014, 12:06 p.m. UTC | #2
> What you want is tree-eh.c:optimize_clobbers, right?  Can't we
> do this optimization during EH refactoring / lowering as well?

We need the SSA form for sink_clobbers.

> Also why does SRA refuse to scalarize here?

Because of the EH handler and the following predicate:

/* Disqualify LHS and RHS for scalarization if STMT must end its basic block
   in modes in which it matters, return true iff they have been disqualified.
   RHS may be NULL, in that case ignore it.  If we scalarize an aggregate in
   intra-SRA we may need to add statements after each statement.  This is not
   possible if a statement unconditionally has to end the basic block.  */
static bool
disqualify_ops_if_throwing_stmt (gimple stmt, tree lhs, tree rhs)
{
  if ((sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA)
      && (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt)))
    {
      disqualify_base_of_expr (lhs, "LHS of a throwing stmt.");
      if (rhs)
	disqualify_base_of_expr (rhs, "RHS of a throwing stmt.");
      return true;
    }
  return false;
}

> It will end up scalarizing during regular opts as
> 
> <L1>:
>   result = decls_support.get_private (100);
>   result$i_12 = MEM[(struct decls_support__t_private *)&result];

Yes, it's scalarized during SRA but not ESRA because there is an ehcleanup 
pass in-between.  It used to be scalarized during ESRA up to 4.6.x.

> The current placement of EH cleanup is to make it catch
> all opportunities before early inlining sees the function as
> callee.  And I guess both FRE and DCE will expose quite
> some EH cleanup opportunities especially with non-call-exceptions.

Early inlining or regular inlining (I'm always slightly confused here)?

p.adb.003t.original
p.adb.004t.gimple
p.adb.005t.nested
p.adb.006t.omplower
p.adb.007t.lower
p.adb.009t.ehopt
p.adb.010t.eh
p.adb.011t.cfg
p.adb.015t.ssa
p.adb.017t.inline_param1
p.adb.018t.einline
p.adb.019t.early_optimizations
p.adb.020t.copyrename1
p.adb.021t.ccp1
p.adb.022t.forwprop1
p.adb.023t.ealias
p.adb.024t.esra
p.adb.025t.fre1
p.adb.026t.copyprop1
p.adb.027t.mergephi1
p.adb.028t.cddce1
p.adb.029t.eipa_sra
p.adb.030t.tailr1
p.adb.031t.switchconv
p.adb.032t.ehcleanup1
p.adb.033t.profile_estimate
p.adb.034t.local-pure-const1
p.adb.035t.fnsplit
p.adb.036t.release_ssa
p.adb.037t.inline_param2
p.adb.054t.copyrename2

Then we could add another ehcleanup pass before early inlining, i.e. between 
015t.ssa and 017t.inline_param1, to achieve the same goal.
Richard Biener Jan. 15, 2014, 2:39 p.m. UTC | #3
On Wed, Jan 15, 2014 at 1:06 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> What you want is tree-eh.c:optimize_clobbers, right?  Can't we
>> do this optimization during EH refactoring / lowering as well?
>
> We need the SSA form for sink_clobbers.

I know, I'm saying it may be possible to implement an equivalent
optimization without SSA form.

>> Also why does SRA refuse to scalarize here?
>
> Because of the EH handler and the following predicate:
>
> /* Disqualify LHS and RHS for scalarization if STMT must end its basic block
>    in modes in which it matters, return true iff they have been disqualified.
>    RHS may be NULL, in that case ignore it.  If we scalarize an aggregate in
>    intra-SRA we may need to add statements after each statement.  This is not
>    possible if a statement unconditionally has to end the basic block.  */
> static bool
> disqualify_ops_if_throwing_stmt (gimple stmt, tree lhs, tree rhs)
> {
>   if ((sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA)
>       && (stmt_can_throw_internal (stmt) || stmt_ends_bb_p (stmt)))
>     {
>       disqualify_base_of_expr (lhs, "LHS of a throwing stmt.");
>       if (rhs)
>         disqualify_base_of_expr (rhs, "RHS of a throwing stmt.");
>       return true;
>     }
>   return false;
> }
>
>> It will end up scalarizing during regular opts as
>>
>> <L1>:
>>   result = decls_support.get_private (100);
>>   result$i_12 = MEM[(struct decls_support__t_private *)&result];
>
> Yes, it's scalarized during SRA but not ESRA because there is an ehcleanup
> pass in-between.  It used to be scalarized during ESRA up to 4.6.x.

I'm saying even ESRA should be able to scalarize it just fine.  It just
needs to be careful where to insert the loads from the aggregate result
(on the single outgoing non-EH edge).

Wouldn't that fix your issue?

>> The current placement of EH cleanup is to make it catch
>> all opportunities before early inlining sees the function as
>> callee.  And I guess both FRE and DCE will expose quite
>> some EH cleanup opportunities especially with non-call-exceptions.
>
> Early inlining or regular inlining (I'm always slightly confused here)?

Early inlining.

> p.adb.003t.original
> p.adb.004t.gimple
> p.adb.005t.nested
> p.adb.006t.omplower
> p.adb.007t.lower
> p.adb.009t.ehopt
> p.adb.010t.eh
> p.adb.011t.cfg
> p.adb.015t.ssa
> p.adb.017t.inline_param1
> p.adb.018t.einline
> p.adb.019t.early_optimizations
> p.adb.020t.copyrename1
> p.adb.021t.ccp1
> p.adb.022t.forwprop1
> p.adb.023t.ealias
> p.adb.024t.esra
> p.adb.025t.fre1
> p.adb.026t.copyprop1
> p.adb.027t.mergephi1
> p.adb.028t.cddce1
> p.adb.029t.eipa_sra
> p.adb.030t.tailr1
> p.adb.031t.switchconv
> p.adb.032t.ehcleanup1
> p.adb.033t.profile_estimate
> p.adb.034t.local-pure-const1
> p.adb.035t.fnsplit
> p.adb.036t.release_ssa
> p.adb.037t.inline_param2
> p.adb.054t.copyrename2
>
> Then we could add another ehcleanup pass before early inlining, i.e. between
> 015t.ssa and 017t.inline_param1, to achieve the same goal.
>
> --
> Eric Botcazou
Eric Botcazou Jan. 15, 2014, 5:37 p.m. UTC | #4
> I know, I'm saying it may be possible to implement an equivalent
> optimization without SSA form.

Sure, but what would be the point of this duplication exactly?

> I'm saying even ESRA should be able to scalarize it just fine.  It just
> needs to be careful where to insert the loads from the aggregate result
> (on the single outgoing non-EH edge).
> 
> Wouldn't that fix your issue?

Very likely, yes, but there might be other passes with the same problem before 
the first EH cleanup pass (e.g. FRE, DCE, etc).

> Early inlining.

Are you sure?  I don't see how 032t.ehcleanup1 can have any effects on 
p.adb.018t.einline (unless something mysterious happens here).

> > p.adb.003t.original
> > p.adb.004t.gimple
> > p.adb.005t.nested
> > p.adb.006t.omplower
> > p.adb.007t.lower
> > p.adb.009t.ehopt
> > p.adb.010t.eh
> > p.adb.011t.cfg
> > p.adb.015t.ssa
> > p.adb.017t.inline_param1
> > p.adb.018t.einline
> > p.adb.019t.early_optimizations
> > p.adb.020t.copyrename1
> > p.adb.021t.ccp1
> > p.adb.022t.forwprop1
> > p.adb.023t.ealias
> > p.adb.024t.esra
> > p.adb.025t.fre1
> > p.adb.026t.copyprop1
> > p.adb.027t.mergephi1
> > p.adb.028t.cddce1
> > p.adb.029t.eipa_sra
> > p.adb.030t.tailr1
> > p.adb.031t.switchconv
> > p.adb.032t.ehcleanup1
> > p.adb.033t.profile_estimate
> > p.adb.034t.local-pure-const1
> > p.adb.035t.fnsplit
> > p.adb.036t.release_ssa
> > p.adb.037t.inline_param2
> > p.adb.054t.copyrename2
Martin Jambor Jan. 15, 2014, 6:04 p.m. UTC | #5
Hi,

On Wed, Jan 15, 2014 at 06:37:30PM +0100, Eric Botcazou wrote:
> > I know, I'm saying it may be possible to implement an equivalent
> > optimization without SSA form.
> 
> Sure, but what would be the point of this duplication exactly?
> 
> > I'm saying even ESRA should be able to scalarize it just fine.  It just
> > needs to be careful where to insert the loads from the aggregate result
> > (on the single outgoing non-EH edge).
> > 
> > Wouldn't that fix your issue?
> 
> Very likely, yes, but there might be other passes with the same problem before 
> the first EH cleanup pass (e.g. FRE, DCE, etc).
> 
> > Early inlining.
> 
> Are you sure?  I don't see how 032t.ehcleanup1 can have any effects on 
> p.adb.018t.einline (unless something mysterious happens here).

We early-optimize functions in topological order so generally (except
for SCCs), so when 018t.einline runs all the callees have already been
early-optimized, and it is really the callees it looks at most.

(But it of course prepares the function for ipa inlining analysis too
since the standalone functions will be untouched.)

Martin

> 
> > > p.adb.003t.original
> > > p.adb.004t.gimple
> > > p.adb.005t.nested
> > > p.adb.006t.omplower
> > > p.adb.007t.lower
> > > p.adb.009t.ehopt
> > > p.adb.010t.eh
> > > p.adb.011t.cfg
> > > p.adb.015t.ssa
> > > p.adb.017t.inline_param1
> > > p.adb.018t.einline
> > > p.adb.019t.early_optimizations
> > > p.adb.020t.copyrename1
> > > p.adb.021t.ccp1
> > > p.adb.022t.forwprop1
> > > p.adb.023t.ealias
> > > p.adb.024t.esra
> > > p.adb.025t.fre1
> > > p.adb.026t.copyprop1
> > > p.adb.027t.mergephi1
> > > p.adb.028t.cddce1
> > > p.adb.029t.eipa_sra
> > > p.adb.030t.tailr1
> > > p.adb.031t.switchconv
> > > p.adb.032t.ehcleanup1
> > > p.adb.033t.profile_estimate
> > > p.adb.034t.local-pure-const1
> > > p.adb.035t.fnsplit
> > > p.adb.036t.release_ssa
> > > p.adb.037t.inline_param2
> > > p.adb.054t.copyrename2
> 
> -- 
> Eric Botcazou
Richard Biener Jan. 16, 2014, 11:30 a.m. UTC | #6
On Wed, Jan 15, 2014 at 6:37 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> I know, I'm saying it may be possible to implement an equivalent
>> optimization without SSA form.
>
> Sure, but what would be the point of this duplication exactly?

To avoid ...

>> I'm saying even ESRA should be able to scalarize it just fine.  It just
>> needs to be careful where to insert the loads from the aggregate result
>> (on the single outgoing non-EH edge).
>>
>> Wouldn't that fix your issue?
>
> Very likely, yes, but there might be other passes with the same problem before
> the first EH cleanup pass (e.g. FRE, DCE, etc).

... these and avoid adding another EH cleanup pass.

>> Early inlining.
>
> Are you sure?  I don't see how 032t.ehcleanup1 can have any effects on
> p.adb.018t.einline (unless something mysterious happens here).
>
>> > p.adb.003t.original
>> > p.adb.004t.gimple
>> > p.adb.005t.nested
>> > p.adb.006t.omplower
>> > p.adb.007t.lower
>> > p.adb.009t.ehopt
>> > p.adb.010t.eh
>> > p.adb.011t.cfg
>> > p.adb.015t.ssa
>> > p.adb.017t.inline_param1
>> > p.adb.018t.einline
>> > p.adb.019t.early_optimizations
>> > p.adb.020t.copyrename1
>> > p.adb.021t.ccp1
>> > p.adb.022t.forwprop1
>> > p.adb.023t.ealias
>> > p.adb.024t.esra
>> > p.adb.025t.fre1
>> > p.adb.026t.copyprop1
>> > p.adb.027t.mergephi1
>> > p.adb.028t.cddce1
>> > p.adb.029t.eipa_sra
>> > p.adb.030t.tailr1
>> > p.adb.031t.switchconv
>> > p.adb.032t.ehcleanup1
>> > p.adb.033t.profile_estimate
>> > p.adb.034t.local-pure-const1
>> > p.adb.035t.fnsplit
>> > p.adb.036t.release_ssa
>> > p.adb.037t.inline_param2
>> > p.adb.054t.copyrename2
>
> --
> Eric Botcazou
diff mbox

Patch

Index: passes.def
===================================================================
--- passes.def	(revision 206563)
+++ passes.def	(working copy)
@@ -71,6 +71,10 @@  along with GCC; see the file COPYING3.
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  /* SRA refuses to scalarize objects that appear in a statement
+	     that can throw internally so we need to cleanup the EH tree
+	     early to remove handlers that contain only clobbers.  */
+	  NEXT_PASS (pass_cleanup_eh);
 	  NEXT_PASS (pass_sra_early);
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_copy_prop);
@@ -79,7 +83,6 @@  along with GCC; see the file COPYING3.
 	  NEXT_PASS (pass_early_ipa_sra);
 	  NEXT_PASS (pass_tail_recursion);
 	  NEXT_PASS (pass_convert_switch);
-          NEXT_PASS (pass_cleanup_eh);
           NEXT_PASS (pass_profile);
           NEXT_PASS (pass_local_pure_const);
 	  /* Split functions creates parts that are not run through