Message ID: 6476732.lMFodJZTVz@polaris
State: New
On Wed, Apr 29, 2015 at 11:50 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Historically the pragma Inline_Always of GNAT had been implemented in the FE
> because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
> powerful enough to inline some constructs.  But this approach had drawbacks,
> especially wrt debug info.  These restrictions were gradually lifted and now
> the pragma is entirely piggybacked on the Tree inliner.
>
> This went mostly OK, except for a few cases where intrinsic operations that
> used to be reasonably handled at -O0 now generate awful code, the typical
> example being a modulus or division intrinsic by a power-of-2 generating a
> fully-fledged modulus or division instruction instead of a simple shift.
>
> Therefore the attached patch implements anonymous constant propagation in the
> inliner to fix the code quality regression.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Hmm, special-casing this in the inliner looks odd to me.  ISTR the inliner
already propagates constant parameters to immediate uses, so I guess you run
into the casting case you handle specially.

But then, if any constant propagation is OK at -O0, why not perform
full-fledged constant propagation (with !DECL_IGNORED_P vars as a propagation
barrier) as a regular SSA pass?  That is, if you had an Inline_Always function
like

int foo (int i)
{
  return (i + 1) / i;
}

you'd not get the desired result, as the i + 1 stmt wouldn't be folded and
thus the (i + 1) / i stmt wouldn't either.

That said - why does RTL optimization not save you here anyway?  After all, it
is responsible for turning divisions into shifts.

So - if you use the existing CCP pass, make surely_varying_stmt_p return true
for defs of !DECL_IGNORED_P stmts at -O0 (strictly needed to make setting vars
in gdb possible), and somehow arrange for CCP to run at -O0...  It might help
other obscure cases with inline asms to behave consistently with -O0 vs. -On
as well.

Richard.

> 2015-04-29  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * tree-inline.c (remap_gimple_op_r): Do anonymous constant
>         propagation.
>         (copy_bb): Fold conversions of constants immediately.
>
>
> 2015-04-29  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gnat.dg/inline12.adb: New test.
>
>
> --
> Eric Botcazou
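For readers without the Ada testcase at hand, here is a minimal C analogue of
the case under discussion; the function, its names and the always_inline
attribute are illustrative assumptions, the real trigger being a GNAT
Inline_Always intrinsic whose divisor is a power of 2:

  /* Hypothetical C analogue of the regression: the constant 8 can reach the
     inlined body (the parameter is const, which is what lets the inliner
     substitute the literal even at -O0, per the setup_one_parameter code
     quoted later in the thread), but unless something folds the modulus the
     -O0 output keeps a full division/modulus sequence instead of a mask.  */
  static inline __attribute__ ((always_inline)) unsigned long
  mod_pow2 (unsigned long left, const unsigned long right)
  {
    return left % right;
  }

  unsigned long
  offset_in_block (unsigned long addr)
  {
    return mod_pow2 (addr, 8);    /* ideally addr & 7, even at -O0 */
  }

The thread is essentially debating where the constant should be substituted
so that the modulus by 8 expands as a mask: in the inliner (Eric's patch), in
a CCP-like pass, or at RTL expansion time.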
> Historically the pragma Inline_Always of GNAT had been implemented in the FE
> because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
> powerful enough to inline some constructs.  But this approach had drawbacks,
> especially wrt debug info.  These restrictions were gradually lifted and now
> the pragma is entirely piggybacked on the Tree inliner.
>
> This went mostly OK, except for a few cases where intrinsic operations that
> used to be reasonably handled at -O0 now generate awful code, the typical
> example being a modulus or division intrinsic by a power-of-2 generating a
> fully-fledged modulus or division instruction instead of a simple shift.
>
> Therefore the attached patch implements anonymous constant propagation in the
> inliner to fix the code quality regression.
>
> Tested on x86_64-suse-linux, OK for the mainline?
>
>    if (TREE_CODE (*tp) == SSA_NAME)
>      {
> -      *tp = remap_ssa_name (*tp, id);
> +      tree t = remap_ssa_name (*tp, id);
> +      /* Perform anonymous constant propagation, this makes it possible to
> +         generate reasonable code even at -O0 for operators implemented as
> +         inline functions.  */
> +      if (TREE_CODE (t) == SSA_NAME
> +          && SSA_NAME_DEF_STMT (t)
> +          && (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t)))
> +          && gimple_assign_copy_p (SSA_NAME_DEF_STMT (t))
> +          && is_gimple_min_invariant
> +             (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t))))
> +        *tp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t));
> +      else
> +        *tp = t;

This looks like a good idea to me (though I can't approve it).  We may want to
lift the (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t))) condition
when optimize is set - the amount of garbage the inliner produces is large and
killing it early is better than killing it later.  This has a chance to help
early opts, where the ordering between ccp and einline is quite hard.

>       *walk_subtrees = 0;
>       return NULL;
>     }
> @@ -1965,7 +1977,7 @@ copy_bb (copy_body_data *id, basic_block
>
>    /* Statements produced by inlining can be unfolded, especially
>       when we constant propagated some operands.  We can't fold
> -     them right now for two reasons:
> +     them right now in the general case for two reasons:
>       1) folding require SSA_NAME_DEF_STMTs to be correct
>       2) we can't change function calls to builtins.
>       So we just mark statement for later folding.  We mark
> @@ -1974,7 +1986,10 @@ copy_bb (copy_body_data *id, basic_block
>       foldable indirectly are updated.  If this turns out to be
>       expensive, copy_body can be told to watch for nontrivial
>       changes.  */
> -      if (id->statements_to_fold)
> +      if (gimple_assign_cast_p (stmt)
> +          && is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
> +        fold_stmt (&copy_gsi);
> +      else if (id->statements_to_fold)

Why is this needed?  Is it just an optimization because we know that folding
of casts will not do crazy stuff like SSA graph walking (that was the original
reason for delaying the folding), or just an optimization to reduce the size
of the list?

Honza

>       id->statements_to_fold->add (stmt);
>
>    /* We're duplicating a CALL_EXPR.  Find any corresponding
On Wed, Apr 29, 2015 at 3:23 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Historically the pragma Inline_Always of GNAT had been implemented in the FE
>> because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
>> powerful enough to inline some constructs.  But this approach had drawbacks,
>> especially wrt debug info.  These restrictions were gradually lifted and now
>> the pragma is entirely piggybacked on the Tree inliner.
>>
>> This went mostly OK, except for a few cases where intrinsic operations that
>> used to be reasonably handled at -O0 now generate awful code, the typical
>> example being a modulus or division intrinsic by a power-of-2 generating a
>> fully-fledged modulus or division instruction instead of a simple shift.
>>
>> Therefore the attached patch implements anonymous constant propagation in the
>> inliner to fix the code quality regression.
>>
>> Tested on x86_64-suse-linux, OK for the mainline?
>>
>>    if (TREE_CODE (*tp) == SSA_NAME)
>>      {
>> -      *tp = remap_ssa_name (*tp, id);
>> +      tree t = remap_ssa_name (*tp, id);
>> +      /* Perform anonymous constant propagation, this makes it possible to
>> +         generate reasonable code even at -O0 for operators implemented as
>> +         inline functions.  */
>> +      if (TREE_CODE (t) == SSA_NAME
>> +          && SSA_NAME_DEF_STMT (t)
>> +          && (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t)))
>> +          && gimple_assign_copy_p (SSA_NAME_DEF_STMT (t))
>> +          && is_gimple_min_invariant
>> +             (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t))))
>> +        *tp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t));
>> +      else
>> +        *tp = t;
>
> This looks like a good idea to me (though I can't approve it).  We may want to
> lift the (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t))) condition
> when optimize is set - the amount of garbage the inliner produces is large and
> killing it early is better than killing it later.  This has a chance to help
> early opts, where the ordering between ccp and einline is quite hard.

Early opts run CCP as pretty much the first pass, so I don't see what you are
referring to here.

>>       *walk_subtrees = 0;
>>       return NULL;
>>     }
>> @@ -1965,7 +1977,7 @@ copy_bb (copy_body_data *id, basic_block
>>
>>    /* Statements produced by inlining can be unfolded, especially
>>       when we constant propagated some operands.  We can't fold
>> -     them right now for two reasons:
>> +     them right now in the general case for two reasons:
>>       1) folding require SSA_NAME_DEF_STMTs to be correct
>>       2) we can't change function calls to builtins.
>>       So we just mark statement for later folding.  We mark
>> @@ -1974,7 +1986,10 @@ copy_bb (copy_body_data *id, basic_block
>>       foldable indirectly are updated.  If this turns out to be
>>       expensive, copy_body can be told to watch for nontrivial
>>       changes.  */
>> -      if (id->statements_to_fold)
>> +      if (gimple_assign_cast_p (stmt)
>> +          && is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
>> +        fold_stmt (&copy_gsi);
>> +      else if (id->statements_to_fold)
>
> Why is this needed?  Is it just an optimization because we know that folding
> of casts will not do crazy stuff like SSA graph walking (that was the original
> reason for delaying the folding), or just an optimization to reduce the size
> of the list?

Because this folding turns the cast into sth the above hunk handles...  It's
basically a hack ;)  A better approach might be to simply fold stmts we copy
during the stmt copy itself by using gimple_build instead of gimple_copy +
fold_stmt.  Note that the match-and-simplify machinery does not perform
constant propagation but relies on a valueization hook.

Richard.

> Honza
>>       id->statements_to_fold->add (stmt);
>>
>>    /* We're duplicating a CALL_EXPR.  Find any corresponding
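Richard's closing remark about the valueization hook can be made concrete with
a small sketch.  The helper name and its use from the inliner are hypothetical;
the only assumption is fold_stmt's variant that takes a valueization callback,
which is how the match-and-simplify machinery is normally fed values:

  /* Hypothetical callback: when folding a statement, let the simplifier see
     the constant an anonymous SSA name is a copy of, without rewriting the
     statement's operands up front.  */
  static tree
  inline_valueize (tree name)
  {
    gimple def;
    if (TREE_CODE (name) == SSA_NAME
        && (def = SSA_NAME_DEF_STMT (name)) != NULL
        && gimple_assign_copy_p (def)
        && is_gimple_min_invariant (gimple_assign_rhs1 (def)))
      return gimple_assign_rhs1 (def);
    return name;
  }

  /* ... fold_stmt (&copy_gsi, inline_valueize); ... */

With such a hook, the constant propagation happens as part of folding the
copied statements rather than as a separate rewrite of their operands.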
> On Wed, Apr 29, 2015 at 3:23 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> Historically the pragma Inline_Always of GNAT had been implemented in the FE
> >> because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
> >> powerful enough to inline some constructs.  But this approach had drawbacks,
> >> especially wrt debug info.  These restrictions were gradually lifted and now
> >> the pragma is entirely piggybacked on the Tree inliner.
> >>
> >> This went mostly OK, except for a few cases where intrinsic operations that
> >> used to be reasonably handled at -O0 now generate awful code, the typical
> >> example being a modulus or division intrinsic by a power-of-2 generating a
> >> fully-fledged modulus or division instruction instead of a simple shift.
> >>
> >> Therefore the attached patch implements anonymous constant propagation in the
> >> inliner to fix the code quality regression.
> >>
> >> Tested on x86_64-suse-linux, OK for the mainline?
> >>
> >>    if (TREE_CODE (*tp) == SSA_NAME)
> >>      {
> >> -      *tp = remap_ssa_name (*tp, id);
> >> +      tree t = remap_ssa_name (*tp, id);
> >> +      /* Perform anonymous constant propagation, this makes it possible to
> >> +         generate reasonable code even at -O0 for operators implemented as
> >> +         inline functions.  */
> >> +      if (TREE_CODE (t) == SSA_NAME
> >> +          && SSA_NAME_DEF_STMT (t)
> >> +          && (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t)))
> >> +          && gimple_assign_copy_p (SSA_NAME_DEF_STMT (t))
> >> +          && is_gimple_min_invariant
> >> +             (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t))))
> >> +        *tp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t));
> >> +      else
> >> +        *tp = t;
> >
> > This looks like a good idea to me (though I can't approve it).  We may want to
> > lift the (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t))) condition
> > when optimize is set - the amount of garbage the inliner produces is large and
> > killing it early is better than killing it later.  This has a chance to help
> > early opts, where the ordering between ccp and einline is quite hard.
>
> Early opts run CCP as pretty much the first pass, so I don't see what you are
> referring to here.

Hmm, you are right.  I remember playing with a similar patch, but that was
before we turned off iteration in the early inliner and it was motivated by
doing more indirect call promotion.  Since ipa-prop is no longer a complete
joke at propagating devirtualization info, perhaps this is no longer too
important.

Honza
> Hmm, special-casing this in the inliner looks odd to me.  ISTR the inliner
> already propagates constant parameters to immediate uses, so I guess you run
> into the casting case you handle specially.

Right on both counts, the original GIMPLE looks like:

  right.3 = (system__storage_elements__integer_address) right;
  D.4203 = D.4201 % right.3;
  D.4200 = (system__storage_elements__storage_offset) D.4203;
  return D.4200;

and setup_one_parameter has:

  /* If there is no setup required and we are in SSA, take the easy route
     replacing all SSA names representing the function parameter by the
     SSA name passed to function.

     We need to construct map for the variable anyway as it might be used
     in different SSA names when parameter is set in function.

     Do replacement at -O0 for const arguments replaced by constant.
     This is important for builtin_constant_p and other construct requiring
     constant argument to be visible in inlined function body.  */
  if (gimple_in_ssa_p (cfun) && rhs && def && is_gimple_reg (p)
      && (optimize
          || (TREE_READONLY (p)
              && is_gimple_min_invariant (rhs)))
      && (TREE_CODE (rhs) == SSA_NAME
          || is_gimple_min_invariant (rhs))
      && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (def))
    {
      insert_decl_map (id, def, rhs);
      return insert_init_debug_bind (id, bb, var, rhs, NULL);
    }

and here is the GIMPLE after inlining:

  _17 = _16;
  _18 = _17;
  right.3_19 = 8;
  _20 = _18 % right.3_19;
  _21 = (system__storage_elements__storage_offset) _20;

so the constant replacement is done for right.3_19 by the above block.

> But then, if any constant propagation is OK at -O0, why not perform
> full-fledged constant propagation (with !DECL_IGNORED_P vars as a propagation
> barrier) as a regular SSA pass?  That is, if you had an Inline_Always function
> like
>
> int foo (int i)
> {
>   return (i + 1) / i;
> }
>
> you'd not get the desired result, as the i + 1 stmt wouldn't be folded and
> thus the (i + 1) / i stmt wouldn't either.

That's OK, only the trivial cases need to be dealt with since it's -O0, so
running a fully-fledged constant propagation seems overkill.

> That said - why does RTL optimization not save you here anyway?  After all,
> it is responsible for turning divisions into shifts.

You mean the RTL expander?  Because the SSA name isn't replaced with the RHS
of its defining statement in:

  /* For EXPAND_INITIALIZER try harder to get something simpler.  */
  if (g == NULL
      && modifier == EXPAND_INITIALIZER
      && !SSA_NAME_IS_DEFAULT_DEF (exp)
      && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
      && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
    g = SSA_NAME_DEF_STMT (exp);

since modifier is EXPAND_NORMAL.  Do you think it would be OK to be a little
more aggressive here?  Something like:

  /* If we aren't on the LHS, look into the defining statement.  */
  if (g == NULL
      && modifier != EXPAND_WRITE
      && !SSA_NAME_IS_DEFAULT_DEF (exp)
      && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
      && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
    {
      g = SSA_NAME_DEF_STMT (exp);
      /* For EXPAND_INITIALIZER substitute in all cases, otherwise do
         it only for constants.  */
      if (modifier != EXPAND_INITIALIZER
          && (!gimple_assign_copy_p (g)
              || !is_gimple_min_invariant (gimple_assign_rhs1 (g))))
        g = NULL;
    }

That's certainly sufficient here.
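One detail worth spelling out from the guards quoted above is the role of
DECL_IGNORED_P at -O0: only anonymous, compiler-generated temporaries may be
looked through, because user variables must keep their own location so they
stay inspectable and settable in gdb.  A hedged C illustration (hypothetical
function, not taken from the thread):

  unsigned int
  user_variable_barrier (unsigned int x)
  {
    /* "d" is a user variable, hence !DECL_IGNORED_P: at -O0 none of the
       schemes discussed here substitutes it, so x % d stays a real modulus
       and d can still be changed from the debugger before the division.  */
    unsigned int d = 8;
    return x % d;
  }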
On May 1, 2015 12:27:17 PM GMT+02:00, Eric Botcazou <ebotcazou@adacore.com> wrote:
> You mean the RTL expander?  Because the SSA name isn't replaced with the RHS
> of its defining statement [...] since modifier is EXPAND_NORMAL.  Do you
> think it would be OK to be a little more aggressive here?
> [...]
> That's certainly sufficient here.

Yeah, I think that's a way better place for the hack.

Richard.
> Yeah, I think that's a way better place for the hack.
OK, how aggressive then? We could as well do the substitution for all copies:
  /* For EXPAND_INITIALIZER try harder to get something simpler.
     Otherwise, substitute copies on the RHS, this can propagate
     constants at -O0 and thus simplify arithmetic operations.  */
  if (g == NULL
      && !SSA_NAME_IS_DEFAULT_DEF (exp)
      && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
      && (modifier == EXPAND_INITIALIZER
          || (modifier != EXPAND_WRITE
              && gimple_assign_copy_p (SSA_NAME_DEF_STMT (exp))))
      && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
    g = SSA_NAME_DEF_STMT (exp);
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 222439)
+++ tree-inline.c	(working copy)
@@ -898,7 +898,19 @@ remap_gimple_op_r (tree *tp, int *walk_s
   if (TREE_CODE (*tp) == SSA_NAME)
     {
-      *tp = remap_ssa_name (*tp, id);
+      tree t = remap_ssa_name (*tp, id);
+      /* Perform anonymous constant propagation, this makes it possible to
+         generate reasonable code even at -O0 for operators implemented as
+         inline functions.  */
+      if (TREE_CODE (t) == SSA_NAME
+          && SSA_NAME_DEF_STMT (t)
+          && (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t)))
+          && gimple_assign_copy_p (SSA_NAME_DEF_STMT (t))
+          && is_gimple_min_invariant
+             (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t))))
+        *tp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t));
+      else
+        *tp = t;
       *walk_subtrees = 0;
       return NULL;
     }
@@ -1965,7 +1977,7 @@ copy_bb (copy_body_data *id, basic_block
 
   /* Statements produced by inlining can be unfolded, especially
      when we constant propagated some operands.  We can't fold
-     them right now for two reasons:
+     them right now in the general case for two reasons:
      1) folding require SSA_NAME_DEF_STMTs to be correct
      2) we can't change function calls to builtins.
      So we just mark statement for later folding.  We mark
@@ -1974,7 +1986,10 @@ copy_bb (copy_body_data *id, basic_block
      foldable indirectly are updated.  If this turns out to be
      expensive, copy_body can be told to watch for nontrivial
      changes.  */
-  if (id->statements_to_fold)
+  if (gimple_assign_cast_p (stmt)
+      && is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
+    fold_stmt (&copy_gsi);
+  else if (id->statements_to_fold)
     id->statements_to_fold->add (stmt);
 
   /* We're duplicating a CALL_EXPR.  Find any corresponding