Tree-level fix for PR 69526
diff mbox

Message ID CAFiYyc0gnMe7j+DgTeudnpH-zD22nK7Cuv4a2zj4Va-anApTBA@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Nov. 28, 2016, 11:13 a.m. UTC
On Fri, Nov 25, 2016 at 2:46 PM, Richard Biener
<richard.guenther@gmail.com> wrote> On Wed, Nov 16, 2016 at 4:54 PM,
Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> Found some time to look into this again.
>>
>>> Index: tree-ssa-propagate.c
>>> ===================================================================
>>> --- tree-ssa-propagate.c        (revision 240133)
>>> +++ tree-ssa-propagate.c        (working copy)
>>> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
>>>        /* Replace real uses in the statement.  */
>>>        did_replace |= replace_uses_in (stmt, get_value_fn);
>>>
>>> -      /* If we made a replacement, fold the statement.  */
>>> -      if (did_replace)
>>> +      /* Fold the statement.  */
>>> +      if (fold_stmt (&i, follow_single_use_edges))
>>>         {
>>> -         fold_stmt (&i, follow_single_use_edges);
>>> +         did_replace = true;
>>>           stmt = gsi_stmt (i);
>>>         }
>>>
>>> this would need compile-time cost evaluation (and avoid the tree-vrp.c
>>> folding part
>>> of your patch).
>>
>> This causes an ICE and bootstrap errors with newer revisions. I tested
>> my patch on r240691 where it still works. How can this be done properly now?
>
> Not sure, I'd have to investigate.  It should still just work
> (throwing off a bootstrap
> with just that change over the weekend).


works fine for me.

>>> OTOH given that you use this to decide whether you can use a combined constant
>>> that doesn't look necessary (if the constant is correct, that is) --
>>> what you need
>>> to make sure is that the final operation, (T)(A) +- CST, does not overflow
>>> if ! TYPE_OVERFLOW_WRAPS and there wasn't any overflow in the
>>> original operation.  I think that always holds, thus the combine_ovf check looks
>>> superfluous to me.
>>
>> Removed the check and addressed the other remarks.
>>
>>> So now we know that for (T)(X + CST1) + CST2, (T)CST1 + CST2
>>> does not overflow.  But we do not really care for that, we want to know
>>> whether (T)X + CST' might invoke undefined behavior when the original
>>> expression did not.  This involves range information on X.  I don't
>>> see how we ensure this here.
>>
>> I guess I'm still missing an undefined behavior case. In which case can
>> (T)X + CST' trigger undefined behavior where the original statement did
>> not? I see the need for checking in the second pattern ((T)(X) + CST' ->
>> (T)(X + CST')), of course.
>
> Looking at
>
> +  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
> +#if GIMPLE
> +   (for outer_op (plus minus)
> +     (for inner_op (plus minus)
> +       (simplify
> +        (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
> +          (if (TREE_CODE (type) == INTEGER_TYPE &&
> +               TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3)))
>
> so the conversion (T) is widening or sign-changing.  (&& go to the next line)
>
> If A + CST overflows we cannot do the transform (you check that
> with extract_range_from_binary_expr setting 'range_split').
>
> If A + CST does not overflow but is unsigned and we are just changing sign
> (precision ==) then (T)A + (CST + CST) might overflow.  Consider
> (int)(INT_MAX + 1) + 1 -> INT_MAX + 2.  I think here the important part
> is whether A + CST fits in T for the case where we just change the type
> to a type with !TYPE_OVERFLOW_WRAPS.  Certainly restricting to
> widenings would avoid the issue.
>
>>> But that's "easily fixable" by computing it in unsigned arithmetic, that is
>>> doing
>>>
>>>   (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2))
>>
>> Does this also work if (unsigned long)a + (LONG_MAX + 2) does not fit
>> into [0,LONG_MAX]? IIRC (unsigned long)(LONG_MAX + 2) is
>> implementation-defined and not undefined so it should work?
>
> Yes, implementation-defined beavior is fine.
>
>> Revised patch version attached. One thing I'm still not sure about is
>> the handling of sth. like (unsigned long)(a + UINT_MAX) + 1 for a = 0.
>> In the current patch version I always do a sign-extend of the first
>> constant (UINT_MAX here) which seems to cause no problems in the
>> testsuite and some custom tests still worked.
>
> +             /* Sign-extend @1 to TYPE.  */
> +             w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);
>
> not sure why you do always sign-extend.  If the inner op is unsigned
> and we widen then that's certainly bogus considering your UINT_MAX
> example above.  Does
>
>                w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
>
> not work for some reason?
>
> +             /* Combine in outer, larger type.  */
> +             bool combine_ovf = true;
> +             combined_cst = wi::add (w1, w2, SIGNED, &combine_ovf);
>
> as you ignore combine_ovf you can simply use
>
>   combined_cst = wi::add (w1, w2);
>
> +             /* Convert combined constant to tree of outer type if
> +                there was no value range split in the original operation.  */
> +             if (!range_split)
> +               {
>
> I'd say you want to condition on range_split early, like with
>
>     bool range_split;
>     if (TYPE_OVERFLOW_UNDEFINED (inner_type)
>         || ! (extract_range_from_binary_expr (...., &range_split), range_split))
>       {
> ...
>       }
>
> and avoid all the work if you throw it away anyway.
>
>> Can UINT_MAX, -1 and
>> similar cases be disambiguated (and correctly converted to the outer
>> type) when done in unsigned arithmetic?
>
> see above how I expect it to "just work".
>
>>
>> Thinking about the second pattern, on s390x it introduces more casts
>> than just using the first one (e.g. in cases where the value will be
>> sign-extended after the operation which wouldn't have happened when
>> performing the operation in the larger type. Can we catch this
>> with a cost function?
>
> Not sure, if it's really too bad we can try merging the two patterns again,
> thus have (T)(A +- CST) +- CST -> (T)(A +- CST) again.  Now that both
> patterns look simple enough that should be possible without too much hassle.
>
> +       (if (cst)
> +        (outer_op (convert { @0; }) { cst; }))
>
> you can write this as
>
>     (if (cst)
>      (outer_op (convert @0) { cst}))
>
> no need for the {}s around @0.
>
> +  /* ((T)(A)) +- CST -> (T)(A +- CST)  */
> +#if GIMPLE
> +   (for outer_op (plus minus)
> +    (simplify
> +     (outer_op (convert @0) INTEGER_CST@2)
> +      (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +          && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> +          && TREE_CODE (type) == INTEGER_TYPE)
> +       /* Perform binary operation inside the cast if the constant fits
> +         and there is no overflow.  */
> +       (with
> +       {
> +         tree cst_inner;
> +         bool range_split = true;
> +
> +         wide_int cst = @2;
> +         cst_inner = wide_int_to_tree (TREE_TYPE (@0), cst);
>
> not sure if that really does what you want (you're truncating the constant
> to the smaller type).  I think you want to check
>
>      int_fits_type_p (TREE_TYPE (@0), @2)
>
> and then simply use
>
>          tree cst_inner = fold_convert (TREE_TYPE (@0), @2);
>
> as you do not allow a simple sign-change here if you merge the patterns
> disallowing it in the above one would simplify things as well.
>
> +         value_range vr = VR_INITIALIZER;
> +         extract_range_from_binary_expr (&vr, outer_op, TREE_TYPE (@0),
> +                                         @0, cst_inner, &range_split);
>
>
>>
>> On a side note: Can/should VRP infer ranges assuming no undefined
>> behavior will take place when -fstrict-overflow is in use? I.e.
>> inferring ~[INT_MIN,INT_MIN] for (long)(a - 1)? Would this even make sense?
>
> It could do that but it does not at the moment.
>
> Richard.
>
>> Regards
>>  Robin

Comments

Robin Dapp Nov. 28, 2016, 1:26 p.m. UTC | #1
>> +             /* Sign-extend @1 to TYPE.  */
>> +             w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);
>>
>> not sure why you do always sign-extend.  If the inner op is unsigned
>> and we widen then that's certainly bogus considering your UINT_MAX
>> example above.  Does
>>
>>                w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
>>
>> not work for some reason?

With TYPE_SIGN (inner_type), I encountered situations like this in the
testsuite (mostly Fortran but also 20000422-1.c):

  Folding statement: _1 = _9 + 1;
  Applying pattern match.pd:1214, gimple-match.c:8719
  gimple_simplified to _93 = (sizetype) _8;
  _1 = _93 + 4294967296;
  Folded into: _1 = _93 + 4294967296;

in
  _8 = (unsigned int) i_73;
  _5 = _8 + 4294967295;
  _9 = (sizetype) _5;
  _93 = (sizetype) _8;
  _1 = _93 + 4294967296;

TYPE_SIGN (inner_type) is PLUS here, although it should be MINUS in
order to combine -1 and +1 to 0. Perhaps this can be handled differently
with keeping TYPE_SIGN (inner_type)?

On the other hand, using SIGNED instead of TYPE_SIGN (inner_type) worked
for all cases I looked at, like
  if (a > 1 && a < 10)
    return (unsigned long)(a + UINT_MAX) + 1;
For
  if (a > 0 && a < 10)
extract_range...() would not find a non-VR_VARYING range although we
should probably still be able to simplify this.

  if (a > 0)
Here, vrp figures out [0,4294967294] and the simplification takes place.

For
  if (a <= 10)
    return (unsigned long)(a + (UINT_MAX - 10)) + 1;
003t.original already reads
    return (long unsigned int) a + 4294967286

From this I hand-wavingly deduced, we'd only see instances where a
sign-extension of the constant is fine (test suites on s390x and x86
affirm this for what it's woth) but I don't have a cogent reason hence
my doubts :)

I'm ok with omitting the sign-changing case (I hadn't though of it
anyway) and adapted the patch. I haven't attached the updated version
though, because I'm still unsure about the issue above (despite the
clean test suite).

Regards
 Robin
Robin Dapp Dec. 5, 2016, 7:57 a.m. UTC | #2
Ping. Any idea how to tackle this?
Richard Biener Dec. 6, 2016, 1:03 p.m. UTC | #3
On Mon, Nov 28, 2016 at 2:26 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>>> +             /* Sign-extend @1 to TYPE.  */
>>> +             w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);
>>>
>>> not sure why you do always sign-extend.  If the inner op is unsigned
>>> and we widen then that's certainly bogus considering your UINT_MAX
>>> example above.  Does
>>>
>>>                w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
>>>
>>> not work for some reason?
>
> With TYPE_SIGN (inner_type), I encountered situations like this in the
> testsuite (mostly Fortran but also 20000422-1.c):
>
>   Folding statement: _1 = _9 + 1;
>   Applying pattern match.pd:1214, gimple-match.c:8719
>   gimple_simplified to _93 = (sizetype) _8;
>   _1 = _93 + 4294967296;
>   Folded into: _1 = _93 + 4294967296;
>
> in
>   _8 = (unsigned int) i_73;
>   _5 = _8 + 4294967295;
>   _9 = (sizetype) _5;
>   _93 = (sizetype) _8;
>   _1 = _93 + 4294967296;
>
> TYPE_SIGN (inner_type) is PLUS here, although it should be MINUS in
> order to combine -1 and +1 to 0. Perhaps this can be handled differently
> with keeping TYPE_SIGN (inner_type)?

So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)
produces (uint64_t)uint32 + -1U + 1.  This simply means that we cannot ignore
overflow of the inner operation and for some reason your change
to extract_range_from_binary_expr didn't catch this.  That is _8 + 4294967295
overflows but we ignored that.

> On the other hand, using SIGNED instead of TYPE_SIGN (inner_type) worked
> for all cases I looked at, like
>   if (a > 1 && a < 10)
>     return (unsigned long)(a + UINT_MAX) + 1;
> For
>   if (a > 0 && a < 10)
> extract_range...() would not find a non-VR_VARYING range although we
> should probably still be able to simplify this.
>
>   if (a > 0)
> Here, vrp figures out [0,4294967294] and the simplification takes place.
>
> For
>   if (a <= 10)
>     return (unsigned long)(a + (UINT_MAX - 10)) + 1;
> 003t.original already reads
>     return (long unsigned int) a + 4294967286
>
> From this I hand-wavingly deduced, we'd only see instances where a
> sign-extension of the constant is fine (test suites on s390x and x86
> affirm this for what it's woth) but I don't have a cogent reason hence
> my doubts :)

Well, even if sign-extending you can probably construct a testcase where
that's still wrong with respect to overflow.

Richard.

> I'm ok with omitting the sign-changing case (I hadn't though of it
> anyway) and adapted the patch. I haven't attached the updated version
> though, because I'm still unsure about the issue above (despite the
> clean test suite).
>
> Regards
>  Robin
>

Patch
diff mbox

Index: gcc/tree-ssa-propagate.c
===================================================================
--- gcc/tree-ssa-propagate.c    (revision 242875)
+++ gcc/tree-ssa-propagate.c    (working copy)
@@ -1065,13 +1065,15 @@  substitute_and_fold_dom_walker::before_d

       /* Replace real uses in the statement.  */
       did_replace |= replace_uses_in (stmt, get_value_fn);
+      if (did_replace)
+         gimple_set_modified (stmt, true);

       /* If we made a replacement, fold the statement.  */
-      if (did_replace)
+      if (fold_stmt (&i, follow_single_use_edges))
        {
-         fold_stmt (&i, follow_single_use_edges);
          stmt = gsi_stmt (i);
          gimple_set_modified (stmt, true);
+         did_replace = true;
        }

       /* Some statements may be simplified using propagator