Patchwork [tree-optimization] : Improve handling of conditional-branches on targets with high branch costs

login
register
mail settings
Submitter Kai Tietz
Date Oct. 10, 2011, 4:05 p.m.
Message ID <CAEwic4bt2VW9kaTg+ZegPq-9TLO2iQ5GAsgqD=Wts+TVNTpNEA@mail.gmail.com>
Download mbox | patch
Permalink /patch/118771/
State New
Headers show

Comments

Kai Tietz - Oct. 10, 2011, 4:05 p.m.
2011/10/10 Richard Guenther <richard.guenther@gmail.com>:
> On Mon, Oct 10, 2011 at 5:07 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
>> 2011/10/10 Richard Guenther <richard.guenther@gmail.com>:
>>> On Mon, Oct 10, 2011 at 4:06 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
>>>> 2011/10/10 Richard Guenther <richard.guenther@gmail.com>:
>>>>> On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
>>>>>> Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF.  It
>>>>>> has to be checked that the LHS code is same as outer CODE, as
>>>>>> otherwise we wouldn't apply different TRUTH-IF only on inner RHS of
>>>>>> LHS, which is of course wrong.
>>>>>>
>>>>>> Index: gcc/gcc/fold-const.c
>>>>>> ===================================================================
>>>>>> --- gcc.orig/gcc/fold-const.c
>>>>>> +++ gcc/gcc/fold-const.c
>>>>>> @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
>>>>>>                                    tree *, tree *);
>>>>>>  static int all_ones_mask_p (const_tree, int);
>>>>>>  static tree sign_bit_p (tree, const_tree);
>>>>>> -static int simple_operand_p (const_tree);
>>>>>> +static int simple_operand_p (tree);
>>>>>>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>>>>>>  static tree range_predecessor (tree);
>>>>>>  static tree range_successor (tree);
>>>>>>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>>>>>>  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
>>>>>> tree, tree);
>>>>>>  static tree unextend (tree, int, int, tree);
>>>>>> -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
>>>>>>  static tree optimize_minmax_comparison (location_t, enum tree_code,
>>>>>>                                        tree, tree, tree);
>>>>>>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
>>>>>> @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
>>>>>>   return lhs;
>>>>>>  }
>>>>>>
>>>>>> -/* Subroutine for fold_truthop: decode a field reference.
>>>>>> +/* Subroutine for fold_truth_andor_1: decode a field reference.
>>>>>>
>>>>>>    If EXP is a comparison reference, we return the innermost reference.
>>>>>>
>>>>>> @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val)
>>>>>>   return NULL_TREE;
>>>>>>  }
>>>>>>
>>>>>> -/* Subroutine for fold_truthop: determine if an operand is simple enough
>>>>>> +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough
>>>>>>    to be evaluated unconditionally.  */
>>>>>>
>>>>>>  static int
>>>>>> -simple_operand_p (const_tree exp)
>>>>>> +simple_operand_p (tree exp)
>>>>>>  {
>>>>>> +  enum tree_code code;
>>>>>>   /* Strip any conversions that don't change the machine mode.  */
>>>>>>   STRIP_NOPS (exp);
>>>>>>
>>>>>> +  code = TREE_CODE (exp);
>>>>>> +
>>>>>> +  /* Handle some trivials   */
>>>>>> +  if (TREE_CODE_CLASS (code) == tcc_comparison)
>>>>>> +    return (tree_could_trap_p (exp)
>>>>>> +           && simple_operand_p (TREE_OPERAND (exp, 0))
>>>>>> +           && simple_operand_p (TREE_OPERAND (exp, 1)));
>>>>>
>>>>> And that's still wrong.
>>>>>
>>>>> Stopped reading here.
>>>>>
>>>>> Richard.
>>>>
>>>> Oh, there is a not missing.  I didn't spot that, sorry.
>>>>
>>>> To the point why we need to handle comparisons within simple_operand_p.
>>>>
>>>> If we reject comparisons and logical not here, we won't have any
>>>> branching optimization anymore, as this the patch moves into
>>>> fold_truthandor.
>>>>
>>>> The result with rejecting in simple_operand_p compares and logic-not
>>>> provides for the following example:
>>>
>>> But you change what simple_operand_p accepts and thus change what
>>> fold_truthop accepts as operands to its simplifications.
>>>
>>> Richard.
>>
>> Well, not really.  I assume you mean fold_truth_andor_1 (aka fold_truthop).
>>
>> It checks for
>> ...
>>  if (TREE_CODE_CLASS (lcode) != tcc_comparison
>>      || TREE_CODE_CLASS (rcode) != tcc_comparison)
>>    return 0;
>> ...
>> before checking for simple_operand_p.  So there is actual no change.
>> It might be of some interest here to add in a different patch support
>> for logic-not, but well, this is would be material for a different
>> patch.
>> So, it won't operate on anything else then comparisons as before.
>
> Sure, because simple_operand_p is checked on the comparison
> operands, not the comparison itself.
>
> Richard.

Right,  we would allow by this things like (a != b) < (c != d) etc.
Here it seems that only some cases for comparison in comparison are
folded.  (eg (a == b) == (c == d) -> (a == b) ^ (c == d) ). Well,
other material for a different patch.

To ensure that we use simple_operand_p in all cases, beside for
branching AND/OR chains, in same way as before, I added to this
function an additional argument, by which
the looking into comparisons can be activated.

Regards,
Kai

 static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
@@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
   return lhs;
 }
 
-/* Subroutine for fold_truthop: decode a field reference.
+/* Subroutine for fold_truth_andor_1: decode a field reference.

    If EXP is a comparison reference, we return the innermost reference.

@@ -3668,17 +3667,48 @@ sign_bit_p (tree exp, const_tree val)
   return NULL_TREE;
 }

-/* Subroutine for fold_truthop: determine if an operand is simple enough
-   to be evaluated unconditionally.  */
+/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough
+   to be evaluated unconditionally.
+   If IN_COMPARES is TRUE, then we assume comparisons and logic-not
+   operations are simple, if their operands are simple.  */

-static int
-simple_operand_p (const_tree exp)
+static bool
+simple_operand_p (tree exp, bool in_compares)
 {
+  enum tree_code code;
   /* Strip any conversions that don't change the machine mode.  */
   STRIP_NOPS (exp);

+  code = TREE_CODE (exp);
+
+  if (in_compares)
+    {
+      if (TREE_CODE_CLASS (code) == tcc_comparison)
+	return (!tree_could_trap_p (exp)
+		&& simple_operand_p (TREE_OPERAND (exp, 0), true)
+		&& simple_operand_p (TREE_OPERAND (exp, 1), true));
+
+      if (FLOAT_TYPE_P (TREE_TYPE (exp))
+	  && tree_could_trap_p (exp))
+	return false;
+
+      switch (code)
+	{
+	case SSA_NAME:
+	  return true;
+	case TRUTH_NOT_EXPR:
+	  return simple_operand_p (TREE_OPERAND (exp, 0), true);
+	case BIT_NOT_EXPR:
+	  if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE)
+	    return false;
+	  return simple_operand_p (TREE_OPERAND (exp, 0), true);
+	default:
+	  break;
+	}
+    }
+
   return (CONSTANT_CLASS_P (exp)
-	  || TREE_CODE (exp) == SSA_NAME
+  	  || code == SSA_NAME
 	  || (DECL_P (exp)
 	      && ! TREE_ADDRESSABLE (exp)
 	      && ! TREE_THIS_VOLATILE (exp)
@@ -4858,7 +4888,7 @@ fold_range_test (location_t loc, enum tr
       /* If simple enough, just rewrite.  Otherwise, make a SAVE_EXPR
 	 unless we are at top level or LHS contains a PLACEHOLDER_EXPR, in
 	 which cases we can't do this.  */
-      if (simple_operand_p (lhs))
+      if (simple_operand_p (lhs, false))
 	return build2_loc (loc, code == TRUTH_ANDIF_EXPR
 			   ? TRUTH_AND_EXPR : TRUTH_OR_EXPR,
 			   type, op0, op1);
@@ -4888,7 +4918,7 @@ fold_range_test (location_t loc, enum tr
   return 0;
 }
 
-/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P
+/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
    bit value.  Arrange things so the extra bits will be set to zero if and
    only if C is signed-extended to its full width.  If MASK is nonzero,
    it is an INTEGER_CST that should be AND'ed with the extra bits.  */
@@ -5025,8 +5055,8 @@ merge_truthop_with_opposite_arm (locatio
    We return the simplified tree or 0 if no optimization is possible.  */

 static tree
-fold_truthop (location_t loc, enum tree_code code, tree truth_type,
-	      tree lhs, tree rhs)
+fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
+		    tree lhs, tree rhs)
 {
   /* If this is the "or" of two comparisons, we can do something if
      the comparisons are NE_EXPR.  If this is the "and", we can do something
@@ -5054,8 +5084,6 @@ fold_truthop (location_t loc, enum tree_
   tree lntype, rntype, result;
   HOST_WIDE_INT first_bit, end_bit;
   int volatilep;
-  tree orig_lhs = lhs, orig_rhs = rhs;
-  enum tree_code orig_code = code;

   /* Start by getting the comparison codes.  Fail if anything is volatile.
      If one operand is a BIT_AND_EXPR with the constant one, treat it as if
@@ -5091,8 +5119,8 @@ fold_truthop (location_t loc, enum tree_
   rr_arg = TREE_OPERAND (rhs, 1);

   /* Simplify (x<y) && (x==y) into (x<=y) and related optimizations.  */
-  if (simple_operand_p (ll_arg)
-      && simple_operand_p (lr_arg))
+  if (simple_operand_p (ll_arg, false)
+      && simple_operand_p (lr_arg, false))
     {
       if (operand_equal_p (ll_arg, rl_arg, 0)
           && operand_equal_p (lr_arg, rr_arg, 0))
@@ -5119,14 +5147,13 @@ fold_truthop (location_t loc, enum tree_
   /* If the RHS can be evaluated unconditionally and its operands are
      simple, it wins to evaluate the RHS unconditionally on machines
      with expensive branches.  In this case, this isn't a comparison
-     that can be merged.  Avoid doing this if the RHS is a floating-point
-     comparison since those can trap.  */
+     that can be merged.  */

   if (BRANCH_COST (optimize_function_for_speed_p (cfun),
 		   false) >= 2
       && ! FLOAT_TYPE_P (TREE_TYPE (rl_arg))
-      && simple_operand_p (rl_arg)
-      && simple_operand_p (rr_arg))
+      && simple_operand_p (rl_arg, false)
+      && simple_operand_p (rr_arg, false))
     {
       /* Convert (a != 0) || (b != 0) into (a | b) != 0.  */
       if (code == TRUTH_OR_EXPR
@@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
 			   build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
 				   ll_arg, rl_arg),
 			   build_int_cst (TREE_TYPE (ll_arg), 0));
-
-      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
-	{
-	  if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
-	    return build2_loc (loc, code, truth_type, lhs, rhs);
-	  return NULL_TREE;
-	}
     }

   /* See if the comparisons can be merged.  Then get all the parameters for
@@ -8380,13 +8400,46 @@ fold_truth_andor (location_t loc, enum t
      lhs is another similar operation, try to merge its rhs with our
      rhs.  Then try to merge our lhs and rhs.  */
   if (TREE_CODE (arg0) == code
-      && 0 != (tem = fold_truthop (loc, code, type,
-				   TREE_OPERAND (arg0, 1), arg1)))
+      && 0 != (tem = fold_truth_andor_1 (loc, code, type,
+					 TREE_OPERAND (arg0, 1), arg1)))
     return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem);

-  if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0)
+  if ((tem = fold_truth_andor_1 (loc, code, type, arg0, arg1)) != 0)
     return tem;

+  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
+      && (BRANCH_COST (optimize_function_for_speed_p (cfun),
+		       false) >= 2)
+      && !TREE_SIDE_EFFECTS (arg1)
+      && LOGICAL_OP_NON_SHORT_CIRCUIT
+      && simple_operand_p (arg1, true))
+    {
+      enum tree_code ncode = (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
+						       : TRUTH_OR_EXPR);
+
+      /* We don't want to pack more then two leafs to an non-IF
+         If tree-code of left-hand operand isn't an AND/OR-IF code and not
+         equal to CODE, then we don't want to add right-hand operand.
+         If the inner right-hand side of left-hand operand has side-effects,
+         or isn't simple, then we can't add to it, as otherwise we might
+         destroy if-sequence.  */
+      if (TREE_CODE (arg0) == code
+      	  /* Needed for sequence points to handle trappings, and
+      	     side-effects.  */
+      	  && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
+      	  && simple_operand_p (TREE_OPERAND (arg0, 1), true))
+       {
+         tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1),
+				arg1);
+         return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
+				 tem);
+       }
+     /* Needed for sequence points to handle trappings, and side-effects.  */
+     else if (!TREE_SIDE_EFFECTS (arg0)
+	      && simple_operand_p (arg0, true))
+       return fold_build2_loc (loc, ncode, type, arg0, arg1);
+    }
+
   return NULL_TREE;
 }
Michael Matz - Oct. 11, 2011, 12:06 p.m.
Hi,

On Mon, 10 Oct 2011, Kai Tietz wrote:

> To ensure that we use simple_operand_p in all cases, beside for 
> branching AND/OR chains, in same way as before, I added to this function 
> an additional argument, by which the looking into comparisons can be 
> activated.

Better make it a separate function the first tests your new conditions, 
and then calls simple_operand_p.

> +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
> +		    tree lhs, tree rhs)
>  {
>    /* If this is the "or" of two comparisons, we can do something if
>       the comparisons are NE_EXPR.  If this is the "and", we can do something
> @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
>  			   build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
>  				   ll_arg, rl_arg),
>  			   build_int_cst (TREE_TYPE (ll_arg), 0));
> -
> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
> -	{
> -	  if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
> -	    return build2_loc (loc, code, truth_type, lhs, rhs);
> -	  return NULL_TREE;
> -	}

Why do you remove this hunk?  Shouldn't you instead move the hunk you 
added to fold_truth_andor() here.  I realize this needs some TLC to 
fold_truth_andor_1, because right now it early-outs for non-comparisons, 
but it seems the better place.  I.e. somehow move the below code into the 
above branch, with the associated diddling on fold_truth_andor_1 that it 
gets called.

> +  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
> +      && (BRANCH_COST (optimize_function_for_speed_p (cfun),
> +		       false) >= 2)
> +      && !TREE_SIDE_EFFECTS (arg1)
> +      && LOGICAL_OP_NON_SHORT_CIRCUIT
> +      && simple_operand_p (arg1, true))
> +    {
> +      enum tree_code ncode = (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
> +						       : TRUTH_OR_EXPR);
> +
> +      /* We don't want to pack more then two leafs to an non-IF

Missing continuation of the sentence?

> +         If tree-code of left-hand operand isn't an AND/OR-IF code and not
> +         equal to CODE, then we don't want to add right-hand operand.
> +         If the inner right-hand side of left-hand operand has side-effects,
> +         or isn't simple, then we can't add to it, as otherwise we might
> +         destroy if-sequence.  */


> +      if (TREE_CODE (arg0) == code
> +      	  /* Needed for sequence points to handle trappings, and
> +      	     side-effects.  */
> +      	  && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
> +      	  && simple_operand_p (TREE_OPERAND (arg0, 1), true))
> +       {
> +         tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1),
> +				arg1);
> +         return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
> +				 tem);
> +       }
> +     /* Needed for sequence points to handle trappings, and side-effects.  */
> +     else if (!TREE_SIDE_EFFECTS (arg0)
> +	      && simple_operand_p (arg0, true))
> +       return fold_build2_loc (loc, ncode, type, arg0, arg1);
> +    }
> +


Ciao,
Michael.
Kai Tietz - Oct. 11, 2011, 12:46 p.m.
2011/10/11 Michael Matz <matz@suse.de>:
> Hi,
>
> On Mon, 10 Oct 2011, Kai Tietz wrote:
>
>> To ensure that we use simple_operand_p in all cases, beside for
>> branching AND/OR chains, in same way as before, I added to this function
>> an additional argument, by which the looking into comparisons can be
>> activated.
>
> Better make it a separate function the first tests your new conditions,
> and then calls simple_operand_p.

Well, either I make it a new function and call it instead of
simple_operand_p, or I need to modify old simple_operand_p.  The logic
of new and old version is incompatible and has not to be called on
same tree, as otherwise new check is vain.

>> +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>> +                 tree lhs, tree rhs)
>>  {
>>    /* If this is the "or" of two comparisons, we can do something if
>>       the comparisons are NE_EXPR.  If this is the "and", we can do something
>> @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
>>                          build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
>>                                  ll_arg, rl_arg),
>>                          build_int_cst (TREE_TYPE (ll_arg), 0));
>> -
>> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
>> -     {
>> -       if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
>> -         return build2_loc (loc, code, truth_type, lhs, rhs);
>> -       return NULL_TREE;
>> -     }
>
> Why do you remove this hunk?  Shouldn't you instead move the hunk you
> added to fold_truth_andor() here.  I realize this needs some TLC to
> fold_truth_andor_1, because right now it early-outs for non-comparisons,
> but it seems the better place.  I.e. somehow move the below code into the
> above branch, with the associated diddling on fold_truth_andor_1 that it
> gets called.

This hunk is removed, as it is vain to do here.  Btw richi asked for
it, and I agree that new TRUTH-AND/OR packing is better done at a
single place in fold_truth_andor only.

>> +  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
>> +      && (BRANCH_COST (optimize_function_for_speed_p (cfun),
>> +                    false) >= 2)
>> +      && !TREE_SIDE_EFFECTS (arg1)
>> +      && LOGICAL_OP_NON_SHORT_CIRCUIT
>> +      && simple_operand_p (arg1, true))
>> +    {
>> +      enum tree_code ncode = (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
>> +                                                    : TRUTH_OR_EXPR);
>> +
>> +      /* We don't want to pack more then two leafs to an non-IF
>
> Missing continuation of the sentence?

Well, here is a colon missing.

>> +         If tree-code of left-hand operand isn't an AND/OR-IF code and not
>> +         equal to CODE, then we don't want to add right-hand operand.
>> +         If the inner right-hand side of left-hand operand has side-effects,
>> +         or isn't simple, then we can't add to it, as otherwise we might
>> +         destroy if-sequence.  */
>
>
>> +      if (TREE_CODE (arg0) == code
>> +               /* Needed for sequence points to handle trappings, and
>> +                  side-effects.  */
>> +               && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
>> +               && simple_operand_p (TREE_OPERAND (arg0, 1), true))
>> +       {
>> +         tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1),
>> +                             arg1);
>> +         return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
>> +                              tem);
>> +       }
>> +     /* Needed for sequence points to handle trappings, and side-effects.  */
>> +     else if (!TREE_SIDE_EFFECTS (arg0)
>> +           && simple_operand_p (arg0, true))
>> +       return fold_build2_loc (loc, ncode, type, arg0, arg1);
>> +    }
>> +
>
>
> Ciao,
> Michael.
>

Kai
Michael Matz - Oct. 11, 2011, 2:44 p.m.
Hi,

On Tue, 11 Oct 2011, Kai Tietz wrote:

> > Better make it a separate function the first tests your new 
> > conditions, and then calls simple_operand_p.
> 
> Well, either I make it a new function and call it instead of 
> simple_operand_p,

That's what I meant, yes.

> >> @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
> >>                          build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
> >>                                  ll_arg, rl_arg),
> >>                          build_int_cst (TREE_TYPE (ll_arg), 0));
> >> -
> >> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
> >> -     {
> >> -       if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
> >> -         return build2_loc (loc, code, truth_type, lhs, rhs);
> >> -       return NULL_TREE;
> >> -     }
> >
> > Why do you remove this hunk?  Shouldn't you instead move the hunk you
> > added to fold_truth_andor() here.  I realize this needs some TLC to
> > fold_truth_andor_1, because right now it early-outs for non-comparisons,
> > but it seems the better place.  I.e. somehow move the below code into the
> > above branch, with the associated diddling on fold_truth_andor_1 that it
> > gets called.
> 
> This hunk is removed, as it is vain to do here.

There is a fallthrough now, that wasn't there before.  I don't know if 
it's harmless, I just wanted to mention it.

> Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is 
> better done at a single place in fold_truth_andor only.

As fold_truthop is called twice by fold_truth_andor, the latter might 
indeed be the better place.


Ciao,
Michael.
Kai Tietz - Oct. 11, 2011, 2:57 p.m.
2011/10/11 Michael Matz <matz@suse.de>:
> Hi,
>
> On Tue, 11 Oct 2011, Kai Tietz wrote:
>
>> > Better make it a separate function the first tests your new
>> > conditions, and then calls simple_operand_p.
>>
>> Well, either I make it a new function and call it instead of
>> simple_operand_p,
>
> That's what I meant, yes.
>
>> >> @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
>> >>                          build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
>> >>                                  ll_arg, rl_arg),
>> >>                          build_int_cst (TREE_TYPE (ll_arg), 0));
>> >> -
>> >> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
>> >> -     {
>> >> -       if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
>> >> -         return build2_loc (loc, code, truth_type, lhs, rhs);
>> >> -       return NULL_TREE;
>> >> -     }
>> >
>> > Why do you remove this hunk?  Shouldn't you instead move the hunk you
>> > added to fold_truth_andor() here.  I realize this needs some TLC to
>> > fold_truth_andor_1, because right now it early-outs for non-comparisons,
>> > but it seems the better place.  I.e. somehow move the below code into the
>> > above branch, with the associated diddling on fold_truth_andor_1 that it
>> > gets called.
>>
>> This hunk is removed, as it is vain to do here.
>
> There is a fallthrough now, that wasn't there before.  I don't know if
> it's harmless, I just wanted to mention it.

It is.  Before we changed expression here and recurse here with the
non-IF AND/OR expression later.  So there is no need to do this
recursion.

>> Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is
>> better done at a single place in fold_truth_andor only.
>
> As fold_truthop is called twice by fold_truth_andor, the latter might
> indeed be the better place.
>
>
> Ciao,
> Michael.

Kai
Jiangning Liu - Oct. 26, 2011, 8:49 a.m.
> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Michael Matz
> Sent: Tuesday, October 11, 2011 10:45 PM
> To: Kai Tietz
> Cc: Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard
> Henderson
> Subject: Re: [patch tree-optimization]: Improve handling of
> conditional-branches on targets with high branch costs
> 
> Hi,
> 
> On Tue, 11 Oct 2011, Kai Tietz wrote:
> 
> > > Better make it a separate function the first tests your new
> > > conditions, and then calls simple_operand_p.
> >
> > Well, either I make it a new function and call it instead of
> > simple_operand_p,
> 
> That's what I meant, yes.
> 
> > >> @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
> > >>                          build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
> > >>                                  ll_arg, rl_arg),
> > >>                          build_int_cst (TREE_TYPE (ll_arg), 0));
> > >> -
> > >> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
> > >> -     {
> > >> -       if (code != orig_code || lhs != orig_lhs || rhs !=
> orig_rhs)
> > >> -         return build2_loc (loc, code, truth_type, lhs, rhs);
> > >> -       return NULL_TREE;
> > >> -     }
> > >
> > > Why do you remove this hunk?  Shouldn't you instead move the hunk
> you
> > > added to fold_truth_andor() here.  I realize this needs some TLC to
> > > fold_truth_andor_1, because right now it early-outs for non-
> comparisons,
> > > but it seems the better place.  I.e. somehow move the below code
> into the
> > > above branch, with the associated diddling on fold_truth_andor_1
> that it
> > > gets called.
> >
> > This hunk is removed, as it is vain to do here.
> 
> There is a fallthrough now, that wasn't there before.  I don't know if
> it's harmless, I just wanted to mention it.
> 

Yes, this part introduced different behavior for this small case,

int f(char *i, int j)
{
        if (*i && j!=2)
                return *i;
        else
                return j;
}

Before the fix, we have

  D.4710 = *i;
  D.4711 = D.4710 != 0;
  D.4712 = j != 2;
  D.4713 = D.4711 & D.4712;
  if (D.4713 != 0) goto <D.4714>; else goto <D.4715>;
  <D.4714>:
  D.4710 = *i;
  D.4716 = (int) D.4710;
  return D.4716;
  <D.4715>:
  D.4716 = j;
  return D.4716;

After the fix, we have

  D.4711 = *i;
  if (D.4711 != 0) goto <D.4712>; else goto <D.4710>;
  <D.4712>:
  if (j != 2) goto <D.4713>; else goto <D.4710>;
  <D.4713>:
  D.4711 = *i;
  D.4714 = (int) D.4711;
  return D.4714;
  <D.4710>:
  D.4714 = j;
  return D.4714;

Does this meet the original expectation? 

I find the code below in function fold_truth_andor makes difference,

      /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B)	 into (A OR B).
	 For sequence point consistancy, we need to check for trapping,
	 and side-effects.  */
      else if (code == icode && simple_operand_p_2 (arg0)
               && simple_operand_p_2 (arg1))
         return fold_build2_loc (loc, ncode, type, arg0, arg1);

for "*i != 0" simple_operand_p(*i) returns false. Originally this is not checked by the code. I don't see the patch originally wanted to cover this. Can this be explained reasonably?

I'm not arguing this patch did worse thing, but only want to understand the rationale behind this and justify this patch doesn't hurt anything else. Actually on the contrary, I measured and this change accidently made some benchmarks significantly improved due to some other reasons.

Thanks,
-Jiangning

> > Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is
> > better done at a single place in fold_truth_andor only.
> 
> As fold_truthop is called twice by fold_truth_andor, the latter might
> indeed be the better place.
> 
> 
> Ciao,
> Michael.
Kai Tietz - Oct. 26, 2011, 9:45 a.m.
2011/10/26 Jiangning Liu <jiangning.liu@arm.com>:
>
>
>> -----Original Message-----
>> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
>> owner@gcc.gnu.org] On Behalf Of Michael Matz
>> Sent: Tuesday, October 11, 2011 10:45 PM
>> To: Kai Tietz
>> Cc: Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org; Richard
>> Henderson
>> Subject: Re: [patch tree-optimization]: Improve handling of
>> conditional-branches on targets with high branch costs
>>
>> Hi,
>>
>> On Tue, 11 Oct 2011, Kai Tietz wrote:
>>
>> > > Better make it a separate function the first tests your new
>> > > conditions, and then calls simple_operand_p.
>> >
>> > Well, either I make it a new function and call it instead of
>> > simple_operand_p,
>>
>> That's what I meant, yes.
>>
>> > >> @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
>> > >>                          build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
>> > >>                                  ll_arg, rl_arg),
>> > >>                          build_int_cst (TREE_TYPE (ll_arg), 0));
>> > >> -
>> > >> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
>> > >> -     {
>> > >> -       if (code != orig_code || lhs != orig_lhs || rhs !=
>> orig_rhs)
>> > >> -         return build2_loc (loc, code, truth_type, lhs, rhs);
>> > >> -       return NULL_TREE;
>> > >> -     }
>> > >
>> > > Why do you remove this hunk?  Shouldn't you instead move the hunk
>> you
>> > > added to fold_truth_andor() here.  I realize this needs some TLC to
>> > > fold_truth_andor_1, because right now it early-outs for non-
>> comparisons,
>> > > but it seems the better place.  I.e. somehow move the below code
>> into the
>> > > above branch, with the associated diddling on fold_truth_andor_1
>> that it
>> > > gets called.
>> >
>> > This hunk is removed, as it is vain to do here.
>>
>> There is a fallthrough now, that wasn't there before.  I don't know if
>> it's harmless, I just wanted to mention it.
>>
>
> Yes, this part introduced different behavior for this small case,
>
> int f(char *i, int j)
> {
>        if (*i && j!=2)
>                return *i;
>        else
>                return j;
> }
>
> Before the fix, we have
>
>  D.4710 = *i;
>  D.4711 = D.4710 != 0;
>  D.4712 = j != 2;
>  D.4713 = D.4711 & D.4712;
>  if (D.4713 != 0) goto <D.4714>; else goto <D.4715>;
>  <D.4714>:
>  D.4710 = *i;
>  D.4716 = (int) D.4710;
>  return D.4716;
>  <D.4715>:
>  D.4716 = j;
>  return D.4716;
>
> After the fix, we have
>
>  D.4711 = *i;
>  if (D.4711 != 0) goto <D.4712>; else goto <D.4710>;
>  <D.4712>:
>  if (j != 2) goto <D.4713>; else goto <D.4710>;
>  <D.4713>:
>  D.4711 = *i;
>  D.4714 = (int) D.4711;
>  return D.4714;
>  <D.4710>:
>  D.4714 = j;
>  return D.4714;
>
> Does this meet the original expectation?
>
> I find the code below in function fold_truth_andor makes difference,
>
>      /* Transform (A AND-IF B) into (A AND B), or (A OR-IF B)   into (A OR B).
>         For sequence point consistancy, we need to check for trapping,
>         and side-effects.  */
>      else if (code == icode && simple_operand_p_2 (arg0)
>               && simple_operand_p_2 (arg1))
>         return fold_build2_loc (loc, ncode, type, arg0, arg1);
>
> for "*i != 0" simple_operand_p(*i) returns false. Originally this is not checked by the code. I don't see the patch originally wanted to cover this. Can this be explained reasonably?
>
> I'm not arguing this patch did worse thing, but only want to understand the rationale behind this and justify this patch doesn't hurt anything else. Actually on the contrary, I measured and this change accidently made some benchmarks significantly improved due to some other reasons.
>
> Thanks,
> -Jiangning
>
>> > Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is
>> > better done at a single place in fold_truth_andor only.
>>
>> As fold_truthop is called twice by fold_truth_andor, the latter might
>> indeed be the better place.
>>
>>
>> Ciao,
>> Michael.

Well, as far as I understand C specification and sequence points, it
makes indeed a difference to do branch-cost optimization on
instructions might cause a signal traps. As signal could be handled in
specifc handlers. You need to consider here that short-circuit
optimization assumes associative law on operands.  So right-hand side
might be evaluaded before the left-hand-side.  And this indeed breaks
IMHO the sequences as mentioned in C specification about conditions.
 So a memory de-referencing (same as a floating-point trap) can never
be treated as simple, as far as I understood this.  So this patch -
beside improving logic for branch-cost merging - fixes this latent
issue.

Regards,
Kai
Michael Matz - Oct. 26, 2011, 11:53 a.m.
Hi,

On Wed, 26 Oct 2011, Jiangning Liu wrote:

> > > >> -
> > > >> -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
> > > >> -     {
> > > >> -       if (code != orig_code || lhs != orig_lhs || rhs !=
> > orig_rhs)
> > > >> -         return build2_loc (loc, code, truth_type, lhs, rhs);
> > > >> -       return NULL_TREE;
> > > >> -     }
> > > >
> > > > Why do you remove this hunk?  Shouldn't you instead move the hunk
> > you
> > > > added to fold_truth_andor() here.  I realize this needs some TLC to
> > > > fold_truth_andor_1, because right now it early-outs for non-
> > comparisons,
> > > > but it seems the better place.  I.e. somehow move the below code
> > into the
> > > > above branch, with the associated diddling on fold_truth_andor_1
> > that it
> > > > gets called.
> > >
> > > This hunk is removed, as it is vain to do here.
> > 
> > There is a fallthrough now, that wasn't there before.  I don't know if
> > it's harmless, I just wanted to mention it.
> > 
> 
> Yes, this part introduced different behavior for this small case,
> 
>   D.4710 = *i;
>   D.4711 = D.4710 != 0;
>   D.4712 = j != 2;
>   D.4713 = D.4711 & D.4712;
>   if (D.4713 != 0) goto <D.4714>; else goto <D.4715>;
> 
> After the fix, we have
> 
>   D.4711 = *i;
>   if (D.4711 != 0) goto <D.4712>; else goto <D.4710>;
>   <D.4712>:
>   if (j != 2) goto <D.4713>; else goto <D.4710>;

So, we have one more jump than originally, when the point of the patch 
was to emit less on targets with high branch costs.  So, as speculated, 
the hunk was not useless.  (It's nice that it caused a benchmark to 
improve significantly, but that should be done via a proper analysis and 
patch, not as a side effect of a supposed non-change).


Ciao,
Michael.
Michael Matz - Oct. 26, 2011, 12:13 p.m.
Hi,

On Wed, 26 Oct 2011, Kai Tietz wrote:

> > Yes, this part introduced different behavior for this small case,
> >
> > int f(char *i, int j)
> > {
> >        if (*i && j!=2)
> >                return *i;
> >        else
> >                return j;
> > }
> 
> Well, as far as I understand C specification and sequence points, it 
> makes indeed a difference to do branch-cost optimization on instructions 
> might cause a signal traps. As signal could be handled in specifc 
> handlers. You need to consider here that short-circuit optimization 
> assumes associative law on operands.  So right-hand side might be 
> evaluaded before the left-hand-side.  And this indeed breaks IMHO the 
> sequences as mentioned in C specification about conditions.

I'm not sure in this specific case.  If it had been:

if (*a && *b) ...

the you'd be right.  The side-effect of dereferencing a must happen before 
*b, and hence transforming this into (*a!=0)&(*b!=0) would be wrong.  But 
in the case at hand we only have one potentially problematic (aka 
detectable) side-effect (*i), so I don't think you could construct a 
program that detects the difference.  It would entail detecting that 
"j!=2" was already evaluated, when (say) the segfault happens, but you 
can't as the variable is non-volatile.  It also can't happen that the 
side-effect *i does not occur (which also would be a problem).  So, I 
think there wasn't an actual problem before, and it had fewer jumps.


Ciao,
Michael.
Kai Tietz - Oct. 26, 2011, 12:30 p.m.
2011/10/26 Michael Matz <matz@suse.de>:
> Hi,
>
> On Wed, 26 Oct 2011, Kai Tietz wrote:
>
>> > Yes, this part introduced different behavior for this small case,
>> >
>> > int f(char *i, int j)
>> > {
>> >        if (*i && j!=2)
>> >                return *i;
>> >        else
>> >                return j;
>> > }
>>
>> Well, as far as I understand C specification and sequence points, it
>> makes indeed a difference to do branch-cost optimization on instructions
>> might cause a signal traps. As signal could be handled in specifc
>> handlers. You need to consider here that short-circuit optimization
>> assumes associative law on operands.  So right-hand side might be
>> evaluaded before the left-hand-side.  And this indeed breaks IMHO the
>> sequences as mentioned in C specification about conditions.
>
> I'm not sure in this specific case.  If it had been:
>
> if (*a && *b) ...
>
> the you'd be right.  The side-effect of dereferencing a must happen before
> *b, and hence transforming this into (*a!=0)&(*b!=0) would be wrong.  But
> in the case at hand we only have one potentially problematic (aka
> detectable) side-effect (*i), so I don't think you could construct a
> program that detects the difference.  It would entail detecting that
> "j!=2" was already evaluated, when (say) the segfault happens, but you
> can't as the variable is non-volatile.  It also can't happen that the
> side-effect *i does not occur (which also would be a problem).  So, I
> think there wasn't an actual problem before, and it had fewer jumps.
>
>
> Ciao,
> Michael.

the case can be produced quite easily.

Eg:

extern int global = 0;

....
  if (*a && global) ...
...

the issue is that by C-specification see here 5.1.2.2.3 about
program-termination.The important point is here::

Evaluation of an expression may produce side effects. At certain
specified points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete *** and no side
effects of subsequent evaluations shall have taken place ***

Annex C describes sequence-points as

1 The following are the sequence points described in 5.1.2.3:
— The call to a function, after the arguments have been evaluated (6.5.2.2).
— The end of the first operand of the following operators: logical AND
&& (6.5.13); logical OR || (6.5.14); conditional ? (6.5.15); comma ,
(6.5.17).
...

Regards,
Kai
Kai Tietz - Oct. 26, 2011, 12:45 p.m.
I describe the sample more closely here

extern int global = 0;
extern int *a = NULL;

void catchSigSegV( int sig )
{
  a = &global;
 }

int foo (int j)
{
 signal (SIGSEGV, catchSigSegV);
 if (*a && global) return 2;
 return 0;
}

I admit that in most cases such a scenario is not common.  This sample
seems to be a valid C program.  So the conditions in IF shall be
evaluted strict in order of sequence-points, as first argument might
trap.
It doesn't matter if second argument have side-effects or none.  The
point is the first and so it has to be separated from other
conditions.

Regards,
Kai
Michael Matz - Oct. 26, 2011, 1:12 p.m.
Hi,

On Wed, 26 Oct 2011, Kai Tietz wrote:

> >> > int f(char *i, int j)
> >> > {
> >> >        if (*i && j!=2)
> >> >                return *i;
> >> >        else
> >> >                return j;
> >> > }
> >>
> 
> the case can be produced quite easily.
> 
> extern int global = 0;
> 
> ....
>   if (*a && global) ...

See?  You had to change the program to prove the transformation to be 
invalid.  But my point was that the function we discuss about was exactly 
as above.  It didn't have globals, or two loads, or a volatile, or 
anything else you can come up with where the transformation would be 
detectable (and hence invalid).  I claim that you can't construct a 
program that can distinguish between this function:

int f(char *i, int j)
{
  if (*i && j!=2)
    return *i;
  else
    return j;
}

and this one:

int f(char *i, int j)
{
  if (*i & j!=2)
    return *i;
  else
    return j;
}

And if you can't construct such a program, then the initial transformation 
before the fold-const.c change _for this specific situation_ was correct.


Ciao,
Michael.
Kai Tietz - Oct. 26, 2011, 1:42 p.m.
2011/10/26 Michael Matz <matz@suse.de>:
> Hi,
>
> On Wed, 26 Oct 2011, Kai Tietz wrote:
>
>> >> > int f(char *i, int j)
>> >> > {
>> >> >        if (*i && j!=2)
>> >> >                return *i;
>> >> >        else
>> >> >                return j;
>> >> > }
>> >>
>>
>> the case can be produced quite easily.
>>
>> extern int global = 0;
>>
>> ....
>>   if (*a && global) ...
>
> See?  You had to change the program to prove the transformation to be
> invalid.  But my point was that the function we discuss about was exactly
> as above.  It didn't have globals, or two loads, or a volatile, or
> anything else you can come up with where the transformation would be
> detectable (and hence invalid).  I claim that you can't construct a
> program that can distinguish between this function:
>
> int f(char *i, int j)
> {
>   if (*i && j!=2)
>     return *i;
>   else
>     return j;
> }
>
> and this one:
>
> int f(char *i, int j)
> {
>   if (*i & j!=2)
>     return *i;
>   else
>     return j;
> }
>
> And if you can't construct such a program, then the initial transformation
> before the fold-const.c change _for this specific situation_ was correct.
>
>
> Ciao,
> Michael.

well, if such a function is used as inline and we know for it that j
has value != 2, then we have here a big difference.  For your first
example, we still have to do the memory access to *i, even if we are
not interested in result.  See here point 4 of 5.1.2.3 of C-spec.
For your second sample we don't need to do that, as the & itself is no
sequence-point and so we can eliminate the *i access without breaking
anything.

Regards,
Kai
Michael Matz - Oct. 26, 2011, 2:25 p.m.
Hi,

On Wed, 26 Oct 2011, Kai Tietz wrote:

> well, if such a function is used as inline and we know for it that j has 
> value != 2, then we have here a big difference.  For your first example, 
> we still have to do the memory access to *i, even if we are not 
> interested in result.

Actually we don't have to preserve memory accesses.  The interesting case 
is if the pointer has an invalid value.  The behaviour of the access then 
is undefined, and it's okay to not do it at all.  In case the pointer does 
point to an object the access (if it's value isn't needed) also isn't 
necessary.  IOW: in "void f(int *p) { int i = *p; }" we can always remove 
the pointer read.  So, I still maintain that the transformation on the 
original example was okay.


Ciao,
Michael.
Kai Tietz - Oct. 26, 2011, 3:20 p.m.
2011/10/26 Michael Matz <matz@suse.de>:
> Hi,
>
> On Wed, 26 Oct 2011, Kai Tietz wrote:
>
>> well, if such a function is used as inline and we know for it that j has
>> value != 2, then we have here a big difference.  For your first example,
>> we still have to do the memory access to *i, even if we are not
>> interested in result.
>
> Actually we don't have to preserve memory accesses.  The interesting case
> is if the pointer has an invalid value.  The behaviour of the access then
> is undefined, and it's okay to not do it at all.  In case the pointer does
> point to an object the access (if it's value isn't needed) also isn't
> necessary.  IOW: in "void f(int *p) { int i = *p; }" we can always remove
> the pointer read.  So, I still maintain that the transformation on the
> original example was okay.
>
>
> Ciao,
> Michael.

So you would mean that memory dereferencing shouldn't be considered as
side-effect at all?

So we would happily cause by code 'if (i && *i != 0) an crash, as
memory-dereference has for you no side-effect?

In you special case it might be valid that, if first (and C-fold-const
doesn't know if the side-effect condition is really the first, as it
might be a sub-sequence of a condition) condition might trap or not,
to combine it.  But branching has to cover the general cases.  If you
find a way to determine that left-hand operand in fold_const's
branching code is really the left-most condition in chain, then we can
add such a special case, but I don't see here an easy way to determine
it.

Regards,
Kai

Hmm, not sure
Michael Matz - Oct. 26, 2011, 3:47 p.m.
Hi,

On Wed, 26 Oct 2011, Kai Tietz wrote:

> So you would mean that memory dereferencing shouldn't be considered as 
> side-effect at all?

No.  I haven't said this at all.  Of course it's a side-effect, but we're 
allowed to remove existing ones (under some circumstances).  We're not 
allowed to introduce new ones, which means that this ...

> So we would happily cause by code 'if (i && *i != 0) an crash, as 
> memory-dereference has for you no side-effect?

... is not allowed.  But in the original example the memread was on the 
left side, hence occured always, therefore we can move it to the right 
side, even though it might occur less often.

> In you special case it might be valid that, if first (and C-fold-const 
> doesn't know if the side-effect condition is really the first, as it 
> might be a sub-sequence of a condition) condition might trap or not, to 
> combine it.  But branching has to cover the general cases.  If you find 
> a way to determine that left-hand operand in fold_const's branching code 
> is really the left-most condition in chain, then we can add such a 
> special case, but I don't see here an easy way to determine it.

Hmm?  I don't see why it's necessary to check if it's the left-most 
condition in a chain.  If the left hand of '&&' is a memread it can always 
be moved to the right side (or the operator transformed into '&' which can 
have the same effect), of course only if the original rhs is free of side 
effects, but then independed if the && was part of a larger expression.  
The memread will possibly be done fewer times than originally, but as 
said, that's okay.


Ciao,
Michael.
Jiangning Liu - Oct. 27, 2011, 8:58 a.m.
> -----Original Message-----
> From: Michael Matz [mailto:matz@suse.de]
> Sent: Wednesday, October 26, 2011 11:47 PM
> To: Kai Tietz
> Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org;
> Richard Henderson
> Subject: Re: [patch tree-optimization]: Improve handling of
> conditional-branches on targets with high branch costs
> 
> Hi,
> 
> On Wed, 26 Oct 2011, Kai Tietz wrote:
> 
> > So you would mean that memory dereferencing shouldn't be considered
> as
> > side-effect at all?
> 
> No.  I haven't said this at all.  Of course it's a side-effect, but
> we're
> allowed to remove existing ones (under some circumstances).  We're not
> allowed to introduce new ones, which means that this ...
> 
> > So we would happily cause by code 'if (i && *i != 0) an crash, as
> > memory-dereference has for you no side-effect?
> 
> ... is not allowed.  But in the original example the memread was on the
> left side, hence occured always, therefore we can move it to the right
> side, even though it might occur less often.
> 
> > In you special case it might be valid that, if first (and C-fold-
> const
> > doesn't know if the side-effect condition is really the first, as it
> > might be a sub-sequence of a condition) condition might trap or not,
> to
> > combine it.  But branching has to cover the general cases.  If you
> find
> > a way to determine that left-hand operand in fold_const's branching
> code
> > is really the left-most condition in chain, then we can add such a
> > special case, but I don't see here an easy way to determine it.
> 
> Hmm?  I don't see why it's necessary to check if it's the left-most
> condition in a chain.  If the left hand of '&&' is a memread it can
> always
> be moved to the right side (or the operator transformed into '&' which
> can
> have the same effect), of course only if the original rhs is free of
> side
> effects, but then independed if the && was part of a larger expression.
> The memread will possibly be done fewer times than originally, but as
> said, that's okay.
> 

Agree. The point is for the small case I gave RHS doesn't have side effect
at all, so the optimization of changing it to AND doesn't violate C
specification. We need to recover something for this case, although it did
improve a lot for some particular benchmarks.

Thanks,
-Jiangning

> 
> Ciao,
> Michael.
Kai Tietz - Oct. 27, 2011, 9:36 a.m.
2011/10/27 Jiangning Liu <jiangning.liu@arm.com>:
>
>
>> -----Original Message-----
>> From: Michael Matz [mailto:matz@suse.de]
>> Sent: Wednesday, October 26, 2011 11:47 PM
>> To: Kai Tietz
>> Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org;
>> Richard Henderson
>> Subject: Re: [patch tree-optimization]: Improve handling of
>> conditional-branches on targets with high branch costs
>>
>> Hi,
>>
>> On Wed, 26 Oct 2011, Kai Tietz wrote:
>>
>> > So you would mean that memory dereferencing shouldn't be considered
>> as
>> > side-effect at all?
>>
>> No.  I haven't said this at all.  Of course it's a side-effect, but
>> we're
>> allowed to remove existing ones (under some circumstances).  We're not
>> allowed to introduce new ones, which means that this ...
>>
>> > So we would happily cause by code 'if (i && *i != 0) an crash, as
>> > memory-dereference has for you no side-effect?
>>
>> ... is not allowed.  But in the original example the memread was on the
>> left side, hence occured always, therefore we can move it to the right
>> side, even though it might occur less often.
>>
>> > In you special case it might be valid that, if first (and C-fold-
>> const
>> > doesn't know if the side-effect condition is really the first, as it
>> > might be a sub-sequence of a condition) condition might trap or not,
>> to
>> > combine it.  But branching has to cover the general cases.  If you
>> find
>> > a way to determine that left-hand operand in fold_const's branching
>> code
>> > is really the left-most condition in chain, then we can add such a
>> > special case, but I don't see here an easy way to determine it.
>>
>> Hmm?  I don't see why it's necessary to check if it's the left-most
>> condition in a chain.  If the left hand of '&&' is a memread it can
>> always
>> be moved to the right side (or the operator transformed into '&' which
>> can
>> have the same effect), of course only if the original rhs is free of
>> side
>> effects, but then independed if the && was part of a larger expression.
>> The memread will possibly be done fewer times than originally, but as
>> said, that's okay.
>>
>
> Agree. The point is for the small case I gave RHS doesn't have side effect
> at all, so the optimization of changing it to AND doesn't violate C
> specification. We need to recover something for this case, although it did
> improve a lot for some particular benchmarks.
>
> Thanks,
> -Jiangning
>
>>
>> Ciao,
>> Michael.

Hmm, so we can allow merging to AND, if the left-hand-side might trap
but has no-side-effects and rhs has neither trapping nor side-effects.
As for the case that left-hand side has side-effects but right-hand
not, we aren't allowed to do this AND/OR merge.  For example 'if ((f =
foo ()) != 0 && f < 24)' we aren't allowed to make this
transformation.

This shouldn't be that hard.  We need to provide to simple_operand_p_2
an additional argument for checking trapping or not.

Regards,
Kai
Jiangning Liu - Oct. 31, 2011, 7:57 a.m.
> -----Original Message-----
> From: Kai Tietz [mailto:ktietz70@googlemail.com]
> Sent: Thursday, October 27, 2011 5:36 PM
> To: Jiangning Liu
> Cc: Michael Matz; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org;
> Richard Henderson
> Subject: Re: [patch tree-optimization]: Improve handling of
> conditional-branches on targets with high branch costs
> 
> 2011/10/27 Jiangning Liu <jiangning.liu@arm.com>:
> >
> >
> >> -----Original Message-----
> >> From: Michael Matz [mailto:matz@suse.de]
> >> Sent: Wednesday, October 26, 2011 11:47 PM
> >> To: Kai Tietz
> >> Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-
> patches@gcc.gnu.org;
> >> Richard Henderson
> >> Subject: Re: [patch tree-optimization]: Improve handling of
> >> conditional-branches on targets with high branch costs
> >>
> >> Hi,
> >>
> >> On Wed, 26 Oct 2011, Kai Tietz wrote:
> >>
> >> > So you would mean that memory dereferencing shouldn't be
> considered
> >> as
> >> > side-effect at all?
> >>
> >> No.  I haven't said this at all.  Of course it's a side-effect, but
> >> we're
> >> allowed to remove existing ones (under some circumstances).  We're
> not
> >> allowed to introduce new ones, which means that this ...
> >>
> >> > So we would happily cause by code 'if (i && *i != 0) an crash, as
> >> > memory-dereference has for you no side-effect?
> >>
> >> ... is not allowed.  But in the original example the memread was on
> the
> >> left side, hence occured always, therefore we can move it to the
> right
> >> side, even though it might occur less often.
> >>
> >> > In you special case it might be valid that, if first (and C-fold-
> >> const
> >> > doesn't know if the side-effect condition is really the first, as
> it
> >> > might be a sub-sequence of a condition) condition might trap or
> not,
> >> to
> >> > combine it.  But branching has to cover the general cases.  If you
> >> find
> >> > a way to determine that left-hand operand in fold_const's
> branching
> >> code
> >> > is really the left-most condition in chain, then we can add such a
> >> > special case, but I don't see here an easy way to determine it.
> >>
> >> Hmm?  I don't see why it's necessary to check if it's the left-most
> >> condition in a chain.  If the left hand of '&&' is a memread it can
> >> always
> >> be moved to the right side (or the operator transformed into '&'
> which
> >> can
> >> have the same effect), of course only if the original rhs is free of
> >> side
> >> effects, but then independed if the && was part of a larger
> expression.
> >> The memread will possibly be done fewer times than originally, but
> as
> >> said, that's okay.
> >>
> >
> > Agree. The point is for the small case I gave RHS doesn't have side
> effect
> > at all, so the optimization of changing it to AND doesn't violate C
> > specification. We need to recover something for this case, although
> it did
> > improve a lot for some particular benchmarks.
> >
> > Thanks,
> > -Jiangning
> >
> >>
> >> Ciao,
> >> Michael.
> 
> Hmm, so we can allow merging to AND, if the left-hand-side might trap
> but has no-side-effects and rhs has neither trapping nor side-effects.
> As for the case that left-hand side has side-effects but right-hand
> not, we aren't allowed to do this AND/OR merge.  For example 'if ((f =
> foo ()) != 0 && f < 24)' we aren't allowed to make this
> transformation.
> 
> This shouldn't be that hard.  We need to provide to simple_operand_p_2
> an additional argument for checking trapping or not.
> 

Would it be OK if I file a tracker in bugzilla against this?

> Regards,
> Kai

Patch

Index: gcc/gcc/fold-const.c
===================================================================
--- gcc.orig/gcc/fold-const.c
+++ gcc/gcc/fold-const.c
@@ -111,14 +111,13 @@  static tree decode_field_reference (loca
 				    tree *, tree *);
 static int all_ones_mask_p (const_tree, int);
 static tree sign_bit_p (tree, const_tree);
-static int simple_operand_p (const_tree);
+static bool simple_operand_p (tree, bool);
 static tree range_binop (enum tree_code, tree, tree, int, tree, int);
 static tree range_predecessor (tree);
 static tree range_successor (tree);
 static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
 static tree fold_cond_expr_with_comparison (location_t, tree, tree,
tree, tree);
 static tree unextend (tree, int, int, tree);
-static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
 static tree optimize_minmax_comparison (location_t, enum tree_code,
 					tree, tree, tree);