diff mbox

PR tree-optimization/100081 - Limit depth of logical expression windback.

Message ID 60956477-d556-98f6-8973-62d215dcf023@redhat.com
State New
Headers show

Commit Message

Andrew MacLeod April 19, 2021, 12:11 a.m. UTC
The code presented at the time wrestrict was invoking ranger has a basic 
block with about 7200 statements, all of them calculations which fed 
into 2614 logical ORs and 1480 logical ANDs.

the GORI component which calculates outgoing ranges starts at the exit 
of the block and works it way back  thru the basic block to the ssa_name 
requested, solving the equations along the way.

Logical expressions require a TRUE and FALSE results to be calculated 
for the ssa_name for each of two condition register operands on the 
logical expression.  Sometimes we can shortcut it, but frequently it its 
requires all 4 to properly evaluate.

When logical operations feed each other, this can become exponential.. 
and that was the situation here.

We had introduced a logical cache to attempt to reduce the number of 
calculations, but it was insufficient in this case.  We had previously 
discussed among ourselves that limiting the depth was probably 
sufficient, since the odds of being able to extract a meaningful range 
thru a large series of logical operations becomes highly unlikely.

This patch implements the depth limiter for logical statements and sets 
it at  6.  If there are more than 6 logical statements feeding each 
other we decide its not worth looking back any further...  and say this 
edge does not compute a range for that SSA_NAME.

This is different than the original patch I proposed in the PR, but 
accomplishes the same thing.  The original patch did the depth limit 
check at the time of evaluating an outgoing range.  This patch moves the 
check earlier when building the def chains, so the names never even 
appear as possible exports. The previous patch was still building these 
def chains a few thousand SSA names deep, but never using them due to 
the cutoff.   This patch will also therefore reduce the size of those 
dependency bitmaps which are not going to be used in either case.

This passes bootstrap and regression testing on x86_64-pc-linux-gnu.  I 
also ran it against the unlimited depth version on a full set of GCC 
source files, and there was 0 difference between this and the original 
in the cases ranger gets.

OK for trunk?

Andrew


PS. Second try with a new mailer system...  hopefully this time...

Comments

Jakub Jelinek April 19, 2021, 7:40 a.m. UTC | #1
On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote:
> --- a/gcc/gimple-range-gori.cc
> +++ b/gcc/gimple-range-gori.cc
> @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-pretty-print.h"
>  #include "gimple-range.h"
>  
> +// Limit the nested depth thru logical expressions which GORI will build
> +// def chains.
> +#define LOGICAL_LIMIT	6

Such limits should be in params.def so that users can override them.

> +// Return TRUE if GS is a logical && or || expression.
> +
> +static inline bool
> +is_gimple_logical_p (const gimple *gs)
> +{
> +  // Look for boolean and/or condition.
> +  if (gimple_code (gs) == GIMPLE_ASSIGN)

  if (is_gimple_assign (gs))

is the normal spelling of this check.

But more importantly, isn't 6 too low for logicals, and wouldn't it be
better to put a cap not on the number of seen logicals, but on how many
SSA_NAMEs are handled in a query?  Because it is easy to construct
testcases where the logical limit would not trigger but the operands
of comparisons used in logicals would be thousands of arithmetic/bitwise
statements etc.  And it isn't just artificial, we have various examples
of generated tests in bugzilla that typically can't be compiled in
reasonable time at -O2 and need to use -O1 and have huge basic blocks with
very deep chains.

One thing is evrp/vrp passes where we want to compute ranges of all
SSA_NAMEs and if caching is done, compute range of everything at most once
(or with some bounded iteration counts for loops or whatever ranger uses).

And another thing is where passes ask for ranges, perhaps just those should
use the limit or should use lower limit.

	Jakub
Andrew MacLeod April 19, 2021, 4:11 p.m. UTC | #2
On 4/19/21 3:40 AM, Jakub Jelinek wrote:
> On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote:
>> --- a/gcc/gimple-range-gori.cc
>> +++ b/gcc/gimple-range-gori.cc
>> @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "gimple-pretty-print.h"
>>   #include "gimple-range.h"
>>   
>> +// Limit the nested depth thru logical expressions which GORI will build
>> +// def chains.
>> +#define LOGICAL_LIMIT	6
> Such limits should be in params.def so that users can override them.
>
>> +// Return TRUE if GS is a logical && or || expression.
>> +
>> +static inline bool
>> +is_gimple_logical_p (const gimple *gs)
>> +{
>> +  // Look for boolean and/or condition.
>> +  if (gimple_code (gs) == GIMPLE_ASSIGN)
>    if (is_gimple_assign (gs))
>
> is the normal spelling of this check.
>
> But more importantly, isn't 6 too low for logicals, and wouldn't it be
> better to put a cap not on the number of seen logicals, but on how many
> SSA_NAMEs are handled in a query?  Because it is easy to construct
> testcases where the logical limit would not trigger but the operands
> of comparisons used in logicals would be thousands of arithmetic/bitwise
> statements etc.  And it isn't just artificial, we have various examples
> of generated tests in bugzilla that typically can't be compiled in
> reasonable time at -O2 and need to use -O1 and have huge basic blocks with
> very deep chains.

The chains aren't typically  the issue, its a linear calculation.. I 
gave a small example in the PR of why the logical expressions cause an 
exponential growth.

Most values are only ever calculated once and are cached.  the only time 
we do this backward calculation is when a range is changed by the 
condition at the bottom of a block, and used outside the block and 
therefore the edge will have a different value than the range in the block.

I dont think 6 logicals is too low... in fact 4 is probably plenty 
99.9999% of the time.  This is not during a forward walk...  we will 
walk thru every logical forward and calculate ranges as appropriate.. 
This is only when trying to determine a refined range on an edge. so for 
example

<bb 2> [local count: 1073741823]:
   _1 = x_20(D) == 10;
   _2 = y_21(D) == 10;
   _3 = _1 & _2;
   _4 = x_20(D) == 20;
   _5 = y_21(D) == 20;
   _6 = _4 & _5;
   _7 = x_20(D) == 30;
   _8 = y_21(D) == 30;
   _9 = _7 & _8;
   _10 = x_20(D) == 40;
   _11 = y_21(D) == 40;
   _12 = _10 & _11;
   _13 = x_20(D) == 50;
   _14 = y_21(D) == 50;
   _15 = _13 & _14;
   _16 = x_20(D) == 60;
   _17 = y_21(D) == 60;
   _18 = _16 & _17;
   _19 = _3 | _6       // depth 3
   _20 = _9 | _12     // depth 3
   _21 = _15 | _18    // depth 2
   _22 = _19 | _20    // depth 2
   _23 = _21 | _22    // depth 1

   if (_23 != 0)
     goto <bb 8>; [50.00%]
   else
     goto <bb 9>; [50.00%]

We can still look back thru these conditions and could find a range for 
x_20 or y_21 of [10,10][20,20][30,30][40,40],[50,50][60,60] because the 
logical nesting depth never exceeds 3 when walking from back from the 
branch to the use of x_20 or y_21.

we still evaluate everything walking forward.. ie we get global ranges 
for everything in the block and evaluate the final condition based on 
the block values.

If we added a whole bunch more logicals, we'd still do that forward 
processing in EVRP.  This change would then say, you know, its not worth 
trying to calculate back thru that many logical expressions, so we'll 
just say x_20 and y_21 are whatever their range on entry to the block 
was  rather than trying to refine it on an outgoing edge. so we would 
get less rpecise ranges for x_20 and y_21 on these edges then.

Sure, a similar argument can be made that if we go through a long list 
of ssa_names in a straight calculation, the odds of getting a refined 
range decrease... but for example:

a_2 = a_1 + 1
a_3 = a_2 + 1
<...>
a_44 = a_43 + 1
if (a_44 > 100)
   goto <bb 7>
<bb 7>
   if (a_1 > 50)

asking for the range of a_1 in bb_7 causes a lookup to say "a_1 is an 
export and can be calculated on the edge 2->7", what is it?  We will 
walk back from a_44 > 100, evaluating those names based on the edge 
value of a_44  [101, +INF]  until we get to  a_2 = a_1 + 1...  a that 
point, a_2 will have an edge range of [58, +INF - 42]  or something to 
that effect, and it will return the evaluated a_2 range on the edge as 
[57, +INF - 43] , and we can fold the stmt.

We only do that once. bb_7 will have the value of a_1 cached since it 
was explicitly requested.    These really long back calculations don't 
really happen that often, so I havent seen the need to throttle them.  
We also don't cache the intervening values (a_3 thru a_44) ... Odds are 
they don't get asked for outside the block, so it would just be a memory 
waste.. we only ever cache whats actually asked for.

So a lot depends on whats actually requested.. most of the time, these 
really long blocks do not have many, if any, uses outside the block.. in 
which case we will never even do a wind back range request.  IF we find 
such cases, then maybe a limit is appropriate.. It may also be that we 
can find good heuristics for giving up early if we can detect that the 
calculation isnt going anything... a topic of research for stage 1.


> One thing is evrp/vrp passes where we want to compute ranges of all
> SSA_NAMEs and if caching is done, compute range of everything at most once
> (or with some bounded iteration counts for loops or whatever ranger uses).

Everything is only ever calculated once, EXCEPT when a condition at the 
bottom affects the value on and edge, and we only go back and 
re-calculate those when requested from other blocks.

in the above case, if we didnt ask for the range of a_1 in bb7...  we 
wouldn't calculate the range of anything on those edges.. we'd just use 
the global ranges. We only do work which is actually required... and the 
logical expression evaluation is the only thing in ranger which is 
currently not linear.  I thought the logical cache would be sufficient, 
but apparently not in sufficiently complex cases.


>   changes
>
> And another thing is where passes ask for ranges, perhaps just those should
> use the limit or should use lower limit.
>
> 	

so again in summary, this limit doesn't affect the calculation of global 
ranges at all, it only limits how far we will look back thru logical 
expressions to try to refine values on edges when they can be 
recalculated... just to eliminate the exponential behaviour. and I would 
daresay every one of them we get this way is a range we wouldn't have 
gotten any other way anyway.

I may not have explained this great...  its the most complex part of how 
ranger works and is at the heart of the on-demand system.

Andrew
Jeff Law April 20, 2021, 11:14 p.m. UTC | #3
On 4/19/2021 1:40 AM, Jakub Jelinek via Gcc-patches wrote:
> On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote:
>> --- a/gcc/gimple-range-gori.cc
>> +++ b/gcc/gimple-range-gori.cc
>> @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "gimple-pretty-print.h"
>>   #include "gimple-range.h"
>>   
>> +// Limit the nested depth thru logical expressions which GORI will build
>> +// def chains.
>> +#define LOGICAL_LIMIT	6
> Such limits should be in params.def so that users can override them.
>
>> +// Return TRUE if GS is a logical && or || expression.
>> +
>> +static inline bool
>> +is_gimple_logical_p (const gimple *gs)
>> +{
>> +  // Look for boolean and/or condition.
>> +  if (gimple_code (gs) == GIMPLE_ASSIGN)
>    if (is_gimple_assign (gs))
>
> is the normal spelling of this check.
>
> But more importantly, isn't 6 too low for logicals, and wouldn't it be
> better to put a cap not on the number of seen logicals, but on how many
> SSA_NAMEs are handled in a query?  Because it is easy to construct
> testcases where the logical limit would not trigger but the operands
> of comparisons used in logicals would be thousands of arithmetic/bitwise
> statements etc.  And it isn't just artificial, we have various examples
> of generated tests in bugzilla that typically can't be compiled in
> reasonable time at -O2 and need to use -O1 and have huge basic blocks with
> very deep chains.

FWIW, the DOM code which tries to do similar things has a 4 level 
recursion limit which seemed to catch the vast majority of cases. That 
translates into ~8 operands most of the time.    So Andrew's check seems 
to be in the right ballpark (it's doing something slightly different, 
but I think it's close enough to be comparable).


jeff
diff mbox

Patch

commit a36a130a1305bc2c103e47955354c0f0edda47f4
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Fri Apr 16 17:08:51 2021 -0400

    Limit depth of logical expression windback.
    
    Limit how many logical expressions GORI will look back through when
    evaluating outgoing edge range.
    
            PR tree-optimization/100081
            * gimple_range_gori.cc (LOGICAL_LIMIT): New local definition.
            (is_gimple_logical_p): Move to top of file.
            (range_def_chain::m_logical_depth): New member.
            (range_def_chain::range_def_chain): Initialize m_logical_depth.
            (range_def_chain::get_def_chain): Don't build defchains through more
            than LOGICAL_LIMIT logical expressions.

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 7f7f3dc0d69..4be49369bbd 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -29,6 +29,36 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "gimple-range.h"
 
+// Limit the nested depth thru logical expressions which GORI will build
+// def chains.
+#define LOGICAL_LIMIT	6
+
+// Return TRUE if GS is a logical && or || expression.
+
+static inline bool
+is_gimple_logical_p (const gimple *gs)
+{
+  // Look for boolean and/or condition.
+  if (gimple_code (gs) == GIMPLE_ASSIGN)
+    switch (gimple_expr_code (gs))
+      {
+	case TRUTH_AND_EXPR:
+	case TRUTH_OR_EXPR:
+	  return true;
+
+	case BIT_AND_EXPR:
+	case BIT_IOR_EXPR:
+	  // Bitwise operations on single bits are logical too.
+	  if (types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (gs)),
+				  boolean_type_node))
+	    return true;
+	  break;
+
+	default:
+	  break;
+      }
+  return false;
+}
 
 /* RANGE_DEF_CHAIN is used to determine what SSA names in a block can
    have range information calculated for them, and what the
@@ -76,6 +106,7 @@  public:
 private:
   vec<bitmap> m_def_chain;	// SSA_NAME : def chain components.
   void build_def_chain (tree name, bitmap result, basic_block bb);
+  int m_logical_depth;
 };
 
 
@@ -85,6 +116,7 @@  range_def_chain::range_def_chain ()
 {
   m_def_chain.create (0);
   m_def_chain.safe_grow_cleared (num_ssa_names);
+  m_logical_depth = 0;
 }
 
 // Destruct a range_def_chain.
@@ -157,6 +189,7 @@  range_def_chain::get_def_chain (tree name)
 {
   tree ssa1, ssa2, ssa3;
   unsigned v = SSA_NAME_VERSION (name);
+  bool is_logical = false;
 
   // If it has already been processed, just return the cached value.
   if (has_def_chain (name))
@@ -169,6 +202,15 @@  range_def_chain::get_def_chain (tree name)
   gimple *stmt = SSA_NAME_DEF_STMT (name);
   if (gimple_range_handler (stmt))
     {
+      is_logical = is_gimple_logical_p (stmt);
+      // Terminate the def chains if we see too many cascading logical stmts.
+      if (is_logical)
+	{
+	  if (m_logical_depth == LOGICAL_LIMIT)
+	    return NULL;
+	  m_logical_depth++;
+	}
+
       ssa1 = gimple_range_ssa_p (gimple_range_operand1 (stmt));
       ssa2 = gimple_range_ssa_p (gimple_range_operand2 (stmt));
       ssa3 = NULL_TREE;
@@ -195,6 +237,9 @@  range_def_chain::get_def_chain (tree name)
   if (ssa3)
     build_def_chain (ssa3, m_def_chain[v], bb);
 
+  if (is_logical)
+    m_logical_depth--;
+
   // If we run into pathological cases where the defintion chains are
   // huge (ie  huge basic block fully unrolled) we might be able to limit
   // this by deciding here that if some criteria is satisfied, we change the
@@ -562,32 +607,6 @@  gori_compute::compute_operand_range_switch (irange &r, gswitch *s,
   return false;
 }
 
-// Return TRUE if GS is a logical && or || expression.
-
-static inline bool
-is_gimple_logical_p (const gimple *gs)
-{
-  // Look for boolean and/or condition.
-  if (gimple_code (gs) == GIMPLE_ASSIGN)
-    switch (gimple_expr_code (gs))
-      {
-	case TRUTH_AND_EXPR:
-	case TRUTH_OR_EXPR:
-	  return true;
-
-	case BIT_AND_EXPR:
-	case BIT_IOR_EXPR:
-	  // Bitwise operations on single bits are logical too.
-	  if (types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (gs)),
-				  boolean_type_node))
-	    return true;
-	  break;
-
-	default:
-	  break;
-      }
-  return false;
-}
 
 // Return an evaluation for NAME as it would appear in STMT when the
 // statement's lhs evaluates to LHS.  If successful, return TRUE and