Message ID | 60956477-d556-98f6-8973-62d215dcf023@redhat.com |
---|---|
State | New |
Headers | show |
On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote: > --- a/gcc/gimple-range-gori.cc > +++ b/gcc/gimple-range-gori.cc > @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see > #include "gimple-pretty-print.h" > #include "gimple-range.h" > > +// Limit the nested depth thru logical expressions which GORI will build > +// def chains. > +#define LOGICAL_LIMIT 6 Such limits should be in params.def so that users can override them. > +// Return TRUE if GS is a logical && or || expression. > + > +static inline bool > +is_gimple_logical_p (const gimple *gs) > +{ > + // Look for boolean and/or condition. > + if (gimple_code (gs) == GIMPLE_ASSIGN) if (is_gimple_assign (gs)) is the normal spelling of this check. But more importantly, isn't 6 too low for logicals, and wouldn't it be better to put a cap not on the number of seen logicals, but on how many SSA_NAMEs are handled in a query? Because it is easy to construct testcases where the logical limit would not trigger but the operands of comparisons used in logicals would be thousands of arithmetic/bitwise statements etc. And it isn't just artificial, we have various examples of generated tests in bugzilla that typically can't be compiled in reasonable time at -O2 and need to use -O1 and have huge basic blocks with very deep chains. One thing is evrp/vrp passes where we want to compute ranges of all SSA_NAMEs and if caching is done, compute range of everything at most once (or with some bounded iteration counts for loops or whatever ranger uses). And another thing is where passes ask for ranges, perhaps just those should use the limit or should use lower limit. Jakub
On 4/19/21 3:40 AM, Jakub Jelinek wrote: > On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote: >> --- a/gcc/gimple-range-gori.cc >> +++ b/gcc/gimple-range-gori.cc >> @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see >> #include "gimple-pretty-print.h" >> #include "gimple-range.h" >> >> +// Limit the nested depth thru logical expressions which GORI will build >> +// def chains. >> +#define LOGICAL_LIMIT 6 > Such limits should be in params.def so that users can override them. > >> +// Return TRUE if GS is a logical && or || expression. >> + >> +static inline bool >> +is_gimple_logical_p (const gimple *gs) >> +{ >> + // Look for boolean and/or condition. >> + if (gimple_code (gs) == GIMPLE_ASSIGN) > if (is_gimple_assign (gs)) > > is the normal spelling of this check. > > But more importantly, isn't 6 too low for logicals, and wouldn't it be > better to put a cap not on the number of seen logicals, but on how many > SSA_NAMEs are handled in a query? Because it is easy to construct > testcases where the logical limit would not trigger but the operands > of comparisons used in logicals would be thousands of arithmetic/bitwise > statements etc. And it isn't just artificial, we have various examples > of generated tests in bugzilla that typically can't be compiled in > reasonable time at -O2 and need to use -O1 and have huge basic blocks with > very deep chains. The chains aren't typically the issue, its a linear calculation.. I gave a small example in the PR of why the logical expressions cause an exponential growth. Most values are only ever calculated once and are cached. the only time we do this backward calculation is when a range is changed by the condition at the bottom of a block, and used outside the block and therefore the edge will have a different value than the range in the block. I dont think 6 logicals is too low... in fact 4 is probably plenty 99.9999% of the time. This is not during a forward walk... we will walk thru every logical forward and calculate ranges as appropriate.. This is only when trying to determine a refined range on an edge. so for example <bb 2> [local count: 1073741823]: _1 = x_20(D) == 10; _2 = y_21(D) == 10; _3 = _1 & _2; _4 = x_20(D) == 20; _5 = y_21(D) == 20; _6 = _4 & _5; _7 = x_20(D) == 30; _8 = y_21(D) == 30; _9 = _7 & _8; _10 = x_20(D) == 40; _11 = y_21(D) == 40; _12 = _10 & _11; _13 = x_20(D) == 50; _14 = y_21(D) == 50; _15 = _13 & _14; _16 = x_20(D) == 60; _17 = y_21(D) == 60; _18 = _16 & _17; _19 = _3 | _6 // depth 3 _20 = _9 | _12 // depth 3 _21 = _15 | _18 // depth 2 _22 = _19 | _20 // depth 2 _23 = _21 | _22 // depth 1 if (_23 != 0) goto <bb 8>; [50.00%] else goto <bb 9>; [50.00%] We can still look back thru these conditions and could find a range for x_20 or y_21 of [10,10][20,20][30,30][40,40],[50,50][60,60] because the logical nesting depth never exceeds 3 when walking from back from the branch to the use of x_20 or y_21. we still evaluate everything walking forward.. ie we get global ranges for everything in the block and evaluate the final condition based on the block values. If we added a whole bunch more logicals, we'd still do that forward processing in EVRP. This change would then say, you know, its not worth trying to calculate back thru that many logical expressions, so we'll just say x_20 and y_21 are whatever their range on entry to the block was rather than trying to refine it on an outgoing edge. so we would get less rpecise ranges for x_20 and y_21 on these edges then. Sure, a similar argument can be made that if we go through a long list of ssa_names in a straight calculation, the odds of getting a refined range decrease... but for example: a_2 = a_1 + 1 a_3 = a_2 + 1 <...> a_44 = a_43 + 1 if (a_44 > 100) goto <bb 7> <bb 7> if (a_1 > 50) asking for the range of a_1 in bb_7 causes a lookup to say "a_1 is an export and can be calculated on the edge 2->7", what is it? We will walk back from a_44 > 100, evaluating those names based on the edge value of a_44 [101, +INF] until we get to a_2 = a_1 + 1... a that point, a_2 will have an edge range of [58, +INF - 42] or something to that effect, and it will return the evaluated a_2 range on the edge as [57, +INF - 43] , and we can fold the stmt. We only do that once. bb_7 will have the value of a_1 cached since it was explicitly requested. These really long back calculations don't really happen that often, so I havent seen the need to throttle them. We also don't cache the intervening values (a_3 thru a_44) ... Odds are they don't get asked for outside the block, so it would just be a memory waste.. we only ever cache whats actually asked for. So a lot depends on whats actually requested.. most of the time, these really long blocks do not have many, if any, uses outside the block.. in which case we will never even do a wind back range request. IF we find such cases, then maybe a limit is appropriate.. It may also be that we can find good heuristics for giving up early if we can detect that the calculation isnt going anything... a topic of research for stage 1. > One thing is evrp/vrp passes where we want to compute ranges of all > SSA_NAMEs and if caching is done, compute range of everything at most once > (or with some bounded iteration counts for loops or whatever ranger uses). Everything is only ever calculated once, EXCEPT when a condition at the bottom affects the value on and edge, and we only go back and re-calculate those when requested from other blocks. in the above case, if we didnt ask for the range of a_1 in bb7... we wouldn't calculate the range of anything on those edges.. we'd just use the global ranges. We only do work which is actually required... and the logical expression evaluation is the only thing in ranger which is currently not linear. I thought the logical cache would be sufficient, but apparently not in sufficiently complex cases. > changes > > And another thing is where passes ask for ranges, perhaps just those should > use the limit or should use lower limit. > > so again in summary, this limit doesn't affect the calculation of global ranges at all, it only limits how far we will look back thru logical expressions to try to refine values on edges when they can be recalculated... just to eliminate the exponential behaviour. and I would daresay every one of them we get this way is a range we wouldn't have gotten any other way anyway. I may not have explained this great... its the most complex part of how ranger works and is at the heart of the on-demand system. Andrew
On 4/19/2021 1:40 AM, Jakub Jelinek via Gcc-patches wrote: > On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote: >> --- a/gcc/gimple-range-gori.cc >> +++ b/gcc/gimple-range-gori.cc >> @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see >> #include "gimple-pretty-print.h" >> #include "gimple-range.h" >> >> +// Limit the nested depth thru logical expressions which GORI will build >> +// def chains. >> +#define LOGICAL_LIMIT 6 > Such limits should be in params.def so that users can override them. > >> +// Return TRUE if GS is a logical && or || expression. >> + >> +static inline bool >> +is_gimple_logical_p (const gimple *gs) >> +{ >> + // Look for boolean and/or condition. >> + if (gimple_code (gs) == GIMPLE_ASSIGN) > if (is_gimple_assign (gs)) > > is the normal spelling of this check. > > But more importantly, isn't 6 too low for logicals, and wouldn't it be > better to put a cap not on the number of seen logicals, but on how many > SSA_NAMEs are handled in a query? Because it is easy to construct > testcases where the logical limit would not trigger but the operands > of comparisons used in logicals would be thousands of arithmetic/bitwise > statements etc. And it isn't just artificial, we have various examples > of generated tests in bugzilla that typically can't be compiled in > reasonable time at -O2 and need to use -O1 and have huge basic blocks with > very deep chains. FWIW, the DOM code which tries to do similar things has a 4 level recursion limit which seemed to catch the vast majority of cases. That translates into ~8 operands most of the time. So Andrew's check seems to be in the right ballpark (it's doing something slightly different, but I think it's close enough to be comparable). jeff
commit a36a130a1305bc2c103e47955354c0f0edda47f4 Author: Andrew MacLeod <amacleod@redhat.com> Date: Fri Apr 16 17:08:51 2021 -0400 Limit depth of logical expression windback. Limit how many logical expressions GORI will look back through when evaluating outgoing edge range. PR tree-optimization/100081 * gimple_range_gori.cc (LOGICAL_LIMIT): New local definition. (is_gimple_logical_p): Move to top of file. (range_def_chain::m_logical_depth): New member. (range_def_chain::range_def_chain): Initialize m_logical_depth. (range_def_chain::get_def_chain): Don't build defchains through more than LOGICAL_LIMIT logical expressions. diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc index 7f7f3dc0d69..4be49369bbd 100644 --- a/gcc/gimple-range-gori.cc +++ b/gcc/gimple-range-gori.cc @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see #include "gimple-pretty-print.h" #include "gimple-range.h" +// Limit the nested depth thru logical expressions which GORI will build +// def chains. +#define LOGICAL_LIMIT 6 + +// Return TRUE if GS is a logical && or || expression. + +static inline bool +is_gimple_logical_p (const gimple *gs) +{ + // Look for boolean and/or condition. + if (gimple_code (gs) == GIMPLE_ASSIGN) + switch (gimple_expr_code (gs)) + { + case TRUTH_AND_EXPR: + case TRUTH_OR_EXPR: + return true; + + case BIT_AND_EXPR: + case BIT_IOR_EXPR: + // Bitwise operations on single bits are logical too. + if (types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (gs)), + boolean_type_node)) + return true; + break; + + default: + break; + } + return false; +} /* RANGE_DEF_CHAIN is used to determine what SSA names in a block can have range information calculated for them, and what the @@ -76,6 +106,7 @@ public: private: vec<bitmap> m_def_chain; // SSA_NAME : def chain components. void build_def_chain (tree name, bitmap result, basic_block bb); + int m_logical_depth; }; @@ -85,6 +116,7 @@ range_def_chain::range_def_chain () { m_def_chain.create (0); m_def_chain.safe_grow_cleared (num_ssa_names); + m_logical_depth = 0; } // Destruct a range_def_chain. @@ -157,6 +189,7 @@ range_def_chain::get_def_chain (tree name) { tree ssa1, ssa2, ssa3; unsigned v = SSA_NAME_VERSION (name); + bool is_logical = false; // If it has already been processed, just return the cached value. if (has_def_chain (name)) @@ -169,6 +202,15 @@ range_def_chain::get_def_chain (tree name) gimple *stmt = SSA_NAME_DEF_STMT (name); if (gimple_range_handler (stmt)) { + is_logical = is_gimple_logical_p (stmt); + // Terminate the def chains if we see too many cascading logical stmts. + if (is_logical) + { + if (m_logical_depth == LOGICAL_LIMIT) + return NULL; + m_logical_depth++; + } + ssa1 = gimple_range_ssa_p (gimple_range_operand1 (stmt)); ssa2 = gimple_range_ssa_p (gimple_range_operand2 (stmt)); ssa3 = NULL_TREE; @@ -195,6 +237,9 @@ range_def_chain::get_def_chain (tree name) if (ssa3) build_def_chain (ssa3, m_def_chain[v], bb); + if (is_logical) + m_logical_depth--; + // If we run into pathological cases where the defintion chains are // huge (ie huge basic block fully unrolled) we might be able to limit // this by deciding here that if some criteria is satisfied, we change the @@ -562,32 +607,6 @@ gori_compute::compute_operand_range_switch (irange &r, gswitch *s, return false; } -// Return TRUE if GS is a logical && or || expression. - -static inline bool -is_gimple_logical_p (const gimple *gs) -{ - // Look for boolean and/or condition. - if (gimple_code (gs) == GIMPLE_ASSIGN) - switch (gimple_expr_code (gs)) - { - case TRUTH_AND_EXPR: - case TRUTH_OR_EXPR: - return true; - - case BIT_AND_EXPR: - case BIT_IOR_EXPR: - // Bitwise operations on single bits are logical too. - if (types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (gs)), - boolean_type_node)) - return true; - break; - - default: - break; - } - return false; -} // Return an evaluation for NAME as it would appear in STMT when the // statement's lhs evaluates to LHS. If successful, return TRUE and