diff mbox series

Simplify more EXACT_DIV_EXPR comparisons

Message ID alpine.DEB.2.02.1905191752230.13168@grove.saclay.inria.fr
State New
Headers show
Series Simplify more EXACT_DIV_EXPR comparisons | expand

Commit Message

Marc Glisse May 19, 2019, 4:16 p.m. UTC
Hello,

2 pieces:

- the first one handles the case where the denominator is negative. It 
doesn't happen often with exact_div, so I don't handle it everywhere, but 
this one looked trivial

- handle the case where a pointer difference is cast to an unsigned type 
before being compared to a constant (I hit this in std::vector). With some 
range info we could probably handle some non-constant cases as well...

The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)

void f (void*);
void g (int *p, int *q)
{
   __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
   if (n < 100)
     f (__builtin_alloca (n));
}

At the time of walloca2, we have

   _1 = p_5(D) - q_6(D);
   # RANGE [-2305843009213693952, 2305843009213693951]
   _2 = _1 /[ex] 4;
   # RANGE ~[2305843009213693952, 16140901064495857663]
   n_7 = (long unsigned intD.10) _2;
   _11 = (long unsigned intD.10) _1;
   if (_11 <= 396)
[...]
   _3 = allocaD.1059 (n_7);

and warn. However, DOM3 later produces

   _1 = p_5(D) - q_6(D);
   _11 = (long unsigned intD.10) _1;
   if (_11 <= 396)
[...]
   # RANGE [0, 99] NONZERO 127
   _2 = _1 /[ex] 4;
   # RANGE [0, 99] NONZERO 127
   n_7 = (long unsigned intD.10) _2;
   _3 = allocaD.1059 (n_7);

so I am tempted to say that the walloca2 pass is too early, xfail the 
testcase and file an issue...

A single_use restriction would also probably fix this testcase, but I 
don't think that's a good idea, the new code is better because the 
division is now in the branch.

2019-05-20  Marc Glisse  <marc.glisse@inria.fr>

gcc/
 	* match.pd (X/[ex]D<Y/[ex]D): Handle negative denominator.
 	((size_t)(A /[ex] B) CMP C): New transformation.

gcc/testsuite/
 	* gcc.dg/tree-ssa/cmpexactdiv-3.c: New file.
 	* gcc.dg/tree-ssa/cmpexactdiv-4.c: New file.

Comments

Richard Biener May 20, 2019, 8:04 a.m. UTC | #1
On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> Hello,
>
> 2 pieces:
>
> - the first one handles the case where the denominator is negative. It
> doesn't happen often with exact_div, so I don't handle it everywhere, but
> this one looked trivial
>
> - handle the case where a pointer difference is cast to an unsigned type
> before being compared to a constant (I hit this in std::vector). With some
> range info we could probably handle some non-constant cases as well...
>
> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>
> void f (void*);
> void g (int *p, int *q)
> {
>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>    if (n < 100)
>      f (__builtin_alloca (n));
> }
>
> At the time of walloca2, we have
>
>    _1 = p_5(D) - q_6(D);
>    # RANGE [-2305843009213693952, 2305843009213693951]
>    _2 = _1 /[ex] 4;
>    # RANGE ~[2305843009213693952, 16140901064495857663]
>    n_7 = (long unsigned intD.10) _2;
>    _11 = (long unsigned intD.10) _1;
>    if (_11 <= 396)
> [...]
>    _3 = allocaD.1059 (n_7);
>
> and warn.

That's indeed to complicated relation of _11 to n_7 for
VRP predicate discovery.

> However, DOM3 later produces
>
>    _1 = p_5(D) - q_6(D);
>    _11 = (long unsigned intD.10) _1;
>    if (_11 <= 396)

while _11 vs. _1 works fine.

> [...]
>    # RANGE [0, 99] NONZERO 127
>    _2 = _1 /[ex] 4;
>    # RANGE [0, 99] NONZERO 127
>    n_7 = (long unsigned intD.10) _2;
>    _3 = allocaD.1059 (n_7);
>
> so I am tempted to say that the walloca2 pass is too early, xfail the
> testcase and file an issue...

Hmm, there's a DOM pass before walloca2 already and moving
walloca2 after loop opts doesn't look like the best thing to do?
I suppose it's not DOM but sinking that does the important transform
here?  That is,

Index: gcc/passes.def
===================================================================
--- gcc/passes.def      (revision 271395)
+++ gcc/passes.def      (working copy)
@@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_laddress);
       NEXT_PASS (pass_lim);
-      NEXT_PASS (pass_walloca, false);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
+      NEXT_PASS (pass_walloca, false);
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
       NEXT_PASS (pass_tsan);

fixes it?

> A single_use restriction would also probably fix this testcase, but I
> don't think that's a good idea, the new code is better because the
> division is now in the branch.

OK.

Can you test the pass motion above?

Thanks,
Richard.

> 2019-05-20  Marc Glisse  <marc.glisse@inria.fr>
>
> gcc/
>         * match.pd (X/[ex]D<Y/[ex]D): Handle negative denominator.
>         ((size_t)(A /[ex] B) CMP C): New transformation.
>
> gcc/testsuite/
>         * gcc.dg/tree-ssa/cmpexactdiv-3.c: New file.
>         * gcc.dg/tree-ssa/cmpexactdiv-4.c: New file.
>
> --
> Marc Glisse
Marc Glisse May 20, 2019, 8:16 a.m. UTC | #2
On Mon, 20 May 2019, Richard Biener wrote:

> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> Hello,
>>
>> 2 pieces:
>>
>> - the first one handles the case where the denominator is negative. It
>> doesn't happen often with exact_div, so I don't handle it everywhere, but
>> this one looked trivial
>>
>> - handle the case where a pointer difference is cast to an unsigned type
>> before being compared to a constant (I hit this in std::vector). With some
>> range info we could probably handle some non-constant cases as well...
>>
>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>
>> void f (void*);
>> void g (int *p, int *q)
>> {
>>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>    if (n < 100)
>>      f (__builtin_alloca (n));
>> }
>>
>> At the time of walloca2, we have
>>
>>    _1 = p_5(D) - q_6(D);
>>    # RANGE [-2305843009213693952, 2305843009213693951]
>>    _2 = _1 /[ex] 4;
>>    # RANGE ~[2305843009213693952, 16140901064495857663]
>>    n_7 = (long unsigned intD.10) _2;
>>    _11 = (long unsigned intD.10) _1;
>>    if (_11 <= 396)
>> [...]
>>    _3 = allocaD.1059 (n_7);
>>
>> and warn.
>
> That's indeed to complicated relation of _11 to n_7 for
> VRP predicate discovery.
>
>> However, DOM3 later produces
>>
>>    _1 = p_5(D) - q_6(D);
>>    _11 = (long unsigned intD.10) _1;
>>    if (_11 <= 396)
>
> while _11 vs. _1 works fine.
>
>> [...]
>>    # RANGE [0, 99] NONZERO 127
>>    _2 = _1 /[ex] 4;
>>    # RANGE [0, 99] NONZERO 127
>>    n_7 = (long unsigned intD.10) _2;
>>    _3 = allocaD.1059 (n_7);
>>
>> so I am tempted to say that the walloca2 pass is too early, xfail the
>> testcase and file an issue...
>
> Hmm, there's a DOM pass before walloca2 already and moving
> walloca2 after loop opts doesn't look like the best thing to do?
> I suppose it's not DOM but sinking that does the important transform
> here?  That is,
>
> Index: gcc/passes.def
> ===================================================================
> --- gcc/passes.def      (revision 271395)
> +++ gcc/passes.def      (working copy)
> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
>       NEXT_PASS (pass_optimize_bswap);
>       NEXT_PASS (pass_laddress);
>       NEXT_PASS (pass_lim);
> -      NEXT_PASS (pass_walloca, false);
>       NEXT_PASS (pass_pre);
>       NEXT_PASS (pass_sink_code);
> +      NEXT_PASS (pass_walloca, false);
>       NEXT_PASS (pass_sancov);
>       NEXT_PASS (pass_asan);
>       NEXT_PASS (pass_tsan);
>
> fixes it?

I will check, but I don't think walloca uses any kind of on-demand VRP, so 
we still need some pass to update the ranges after sinking, which doesn't 
seem to happen until the next DOM pass.
Richard Biener May 20, 2019, 9:16 a.m. UTC | #3
On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Mon, 20 May 2019, Richard Biener wrote:
>
> > On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>
> >> Hello,
> >>
> >> 2 pieces:
> >>
> >> - the first one handles the case where the denominator is negative. It
> >> doesn't happen often with exact_div, so I don't handle it everywhere, but
> >> this one looked trivial
> >>
> >> - handle the case where a pointer difference is cast to an unsigned type
> >> before being compared to a constant (I hit this in std::vector). With some
> >> range info we could probably handle some non-constant cases as well...
> >>
> >> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
> >>
> >> void f (void*);
> >> void g (int *p, int *q)
> >> {
> >>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
> >>    if (n < 100)
> >>      f (__builtin_alloca (n));
> >> }
> >>
> >> At the time of walloca2, we have
> >>
> >>    _1 = p_5(D) - q_6(D);
> >>    # RANGE [-2305843009213693952, 2305843009213693951]
> >>    _2 = _1 /[ex] 4;
> >>    # RANGE ~[2305843009213693952, 16140901064495857663]
> >>    n_7 = (long unsigned intD.10) _2;
> >>    _11 = (long unsigned intD.10) _1;
> >>    if (_11 <= 396)
> >> [...]
> >>    _3 = allocaD.1059 (n_7);
> >>
> >> and warn.
> >
> > That's indeed to complicated relation of _11 to n_7 for
> > VRP predicate discovery.
> >
> >> However, DOM3 later produces
> >>
> >>    _1 = p_5(D) - q_6(D);
> >>    _11 = (long unsigned intD.10) _1;
> >>    if (_11 <= 396)
> >
> > while _11 vs. _1 works fine.
> >
> >> [...]
> >>    # RANGE [0, 99] NONZERO 127
> >>    _2 = _1 /[ex] 4;
> >>    # RANGE [0, 99] NONZERO 127
> >>    n_7 = (long unsigned intD.10) _2;
> >>    _3 = allocaD.1059 (n_7);
> >>
> >> so I am tempted to say that the walloca2 pass is too early, xfail the
> >> testcase and file an issue...
> >
> > Hmm, there's a DOM pass before walloca2 already and moving
> > walloca2 after loop opts doesn't look like the best thing to do?
> > I suppose it's not DOM but sinking that does the important transform
> > here?  That is,
> >
> > Index: gcc/passes.def
> > ===================================================================
> > --- gcc/passes.def      (revision 271395)
> > +++ gcc/passes.def      (working copy)
> > @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
> >       NEXT_PASS (pass_optimize_bswap);
> >       NEXT_PASS (pass_laddress);
> >       NEXT_PASS (pass_lim);
> > -      NEXT_PASS (pass_walloca, false);
> >       NEXT_PASS (pass_pre);
> >       NEXT_PASS (pass_sink_code);
> > +      NEXT_PASS (pass_walloca, false);
> >       NEXT_PASS (pass_sancov);
> >       NEXT_PASS (pass_asan);
> >       NEXT_PASS (pass_tsan);
> >
> > fixes it?
>
> I will check, but I don't think walloca uses any kind of on-demand VRP, so
> we still need some pass to update the ranges after sinking, which doesn't
> seem to happen until the next DOM pass.

Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
other warnigns are emitted from RTL expansion?  So maybe we can
indeed move the pass towards warn_restrict or late_warn_uninit.
I also see that the Og pass pipeline misses the second walloca pass
completely (and also the warn_restrict pass).

Given code sinkings obvious effects on SSA value-range representation
it may make sense to add another instance of that pass earlier.



>
> --
> Marc Glisse
Jeff Law May 20, 2019, 5:28 p.m. UTC | #4
On 5/20/19 3:16 AM, Richard Biener wrote:
> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> On Mon, 20 May 2019, Richard Biener wrote:
>>
>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>
>>>> Hello,
>>>>
>>>> 2 pieces:
>>>>
>>>> - the first one handles the case where the denominator is negative. It
>>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
>>>> this one looked trivial
>>>>
>>>> - handle the case where a pointer difference is cast to an unsigned type
>>>> before being compared to a constant (I hit this in std::vector). With some
>>>> range info we could probably handle some non-constant cases as well...
>>>>
>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>>>
>>>> void f (void*);
>>>> void g (int *p, int *q)
>>>> {
>>>>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>>>    if (n < 100)
>>>>      f (__builtin_alloca (n));
>>>> }
>>>>
>>>> At the time of walloca2, we have
>>>>
>>>>    _1 = p_5(D) - q_6(D);
>>>>    # RANGE [-2305843009213693952, 2305843009213693951]
>>>>    _2 = _1 /[ex] 4;
>>>>    # RANGE ~[2305843009213693952, 16140901064495857663]
>>>>    n_7 = (long unsigned intD.10) _2;
>>>>    _11 = (long unsigned intD.10) _1;
>>>>    if (_11 <= 396)
>>>> [...]
>>>>    _3 = allocaD.1059 (n_7);
>>>>
>>>> and warn.
>>>
>>> That's indeed to complicated relation of _11 to n_7 for
>>> VRP predicate discovery.
>>>
>>>> However, DOM3 later produces
>>>>
>>>>    _1 = p_5(D) - q_6(D);
>>>>    _11 = (long unsigned intD.10) _1;
>>>>    if (_11 <= 396)
>>>
>>> while _11 vs. _1 works fine.
>>>
>>>> [...]
>>>>    # RANGE [0, 99] NONZERO 127
>>>>    _2 = _1 /[ex] 4;
>>>>    # RANGE [0, 99] NONZERO 127
>>>>    n_7 = (long unsigned intD.10) _2;
>>>>    _3 = allocaD.1059 (n_7);
>>>>
>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
>>>> testcase and file an issue...
>>>
>>> Hmm, there's a DOM pass before walloca2 already and moving
>>> walloca2 after loop opts doesn't look like the best thing to do?
>>> I suppose it's not DOM but sinking that does the important transform
>>> here?  That is,
>>>
>>> Index: gcc/passes.def
>>> ===================================================================
>>> --- gcc/passes.def      (revision 271395)
>>> +++ gcc/passes.def      (working copy)
>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
>>>       NEXT_PASS (pass_optimize_bswap);
>>>       NEXT_PASS (pass_laddress);
>>>       NEXT_PASS (pass_lim);
>>> -      NEXT_PASS (pass_walloca, false);
>>>       NEXT_PASS (pass_pre);
>>>       NEXT_PASS (pass_sink_code);
>>> +      NEXT_PASS (pass_walloca, false);
>>>       NEXT_PASS (pass_sancov);
>>>       NEXT_PASS (pass_asan);
>>>       NEXT_PASS (pass_tsan);
>>>
>>> fixes it?
>>
>> I will check, but I don't think walloca uses any kind of on-demand VRP, so
>> we still need some pass to update the ranges after sinking, which doesn't
>> seem to happen until the next DOM pass.
> 
> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
> other warnigns are emitted from RTL expansion?  So maybe we can
> indeed move the pass towards warn_restrict or late_warn_uninit.
> I also see that the Og pass pipeline misses the second walloca pass
> completely (and also the warn_restrict pass).
I'd tend to agree that pushing it down would be a good thing.

It's probably a separate pass because I suggested it not be embedded
within tree-vrp.c which I felt had way too much stuff in it already.




> 
> Given code sinkings obvious effects on SSA value-range representation
> it may make sense to add another instance of that pass earlier.
There is an early pass already.

jeff
Jeff Law May 20, 2019, 5:32 p.m. UTC | #5
On 5/20/19 2:04 AM, Richard Biener wrote:
> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> Hello,
>>
>> 2 pieces:
>>
>> - the first one handles the case where the denominator is negative. It
>> doesn't happen often with exact_div, so I don't handle it everywhere, but
>> this one looked trivial
>>
>> - handle the case where a pointer difference is cast to an unsigned type
>> before being compared to a constant (I hit this in std::vector). With some
>> range info we could probably handle some non-constant cases as well...
>>
>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>
>> void f (void*);
>> void g (int *p, int *q)
>> {
>>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>    if (n < 100)
>>      f (__builtin_alloca (n));
>> }
>>
>> At the time of walloca2, we have
>>
>>    _1 = p_5(D) - q_6(D);
>>    # RANGE [-2305843009213693952, 2305843009213693951]
>>    _2 = _1 /[ex] 4;
>>    # RANGE ~[2305843009213693952, 16140901064495857663]
>>    n_7 = (long unsigned intD.10) _2;
>>    _11 = (long unsigned intD.10) _1;
>>    if (_11 <= 396)
>> [...]
>>    _3 = allocaD.1059 (n_7);
>>
>> and warn.
> 
> That's indeed to complicated relation of _11 to n_7 for
> VRP predicate discovery.
> 
>> However, DOM3 later produces
>>
>>    _1 = p_5(D) - q_6(D);
>>    _11 = (long unsigned intD.10) _1;
>>    if (_11 <= 396)
> 
> while _11 vs. _1 works fine.
> 
>> [...]
>>    # RANGE [0, 99] NONZERO 127
>>    _2 = _1 /[ex] 4;
>>    # RANGE [0, 99] NONZERO 127
>>    n_7 = (long unsigned intD.10) _2;
>>    _3 = allocaD.1059 (n_7);
>>
>> so I am tempted to say that the walloca2 pass is too early, xfail the
>> testcase and file an issue...
> 
> Hmm, there's a DOM pass before walloca2 already and moving
> walloca2 after loop opts doesn't look like the best thing to do?
> I suppose it's not DOM but sinking that does the important transform
> here?  That is,
It could well be sinking that's critical here.  It certainly was in
cases I looked at in our <vec> code and how we needed to transform the
pointer subtraction and subsequent testing to discover infeasible paths.

Jeff
Marc Glisse May 20, 2019, 5:37 p.m. UTC | #6
On Mon, 20 May 2019, Jeff Law wrote:

> On 5/20/19 3:16 AM, Richard Biener wrote:
>> Given code sinkings obvious effects on SSA value-range representation
>> it may make sense to add another instance of that pass earlier.
> There is an early pass already.

IIUC Richard was talking about adding an early 'sink' pass and you are
saying that there is already an early 'walloca' pass? Or are you talking 
about a different pass not called 'sink' that does something similar?
Jeff Law May 20, 2019, 5:41 p.m. UTC | #7
On 5/20/19 11:37 AM, Marc Glisse wrote:
> On Mon, 20 May 2019, Jeff Law wrote:
> 
>> On 5/20/19 3:16 AM, Richard Biener wrote:
>>> Given code sinkings obvious effects on SSA value-range representation
>>> it may make sense to add another instance of that pass earlier.
>> There is an early pass already.
> 
> IIUC Richard was talking about adding an early 'sink' pass and you are
> saying that there is already an early 'walloca' pass? Or are you talking
> about a different pass not called 'sink' that does something similar?
Oh, I misunderstood.  I meant there was an early Walloca pass.

jeff
Martin Sebor May 21, 2019, 2:13 a.m. UTC | #8
On 5/20/19 3:16 AM, Richard Biener wrote:
> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> On Mon, 20 May 2019, Richard Biener wrote:
>>
>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>
>>>> Hello,
>>>>
>>>> 2 pieces:
>>>>
>>>> - the first one handles the case where the denominator is negative. It
>>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
>>>> this one looked trivial
>>>>
>>>> - handle the case where a pointer difference is cast to an unsigned type
>>>> before being compared to a constant (I hit this in std::vector). With some
>>>> range info we could probably handle some non-constant cases as well...
>>>>
>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>>>
>>>> void f (void*);
>>>> void g (int *p, int *q)
>>>> {
>>>>     __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>>>     if (n < 100)
>>>>       f (__builtin_alloca (n));
>>>> }
>>>>
>>>> At the time of walloca2, we have
>>>>
>>>>     _1 = p_5(D) - q_6(D);
>>>>     # RANGE [-2305843009213693952, 2305843009213693951]
>>>>     _2 = _1 /[ex] 4;
>>>>     # RANGE ~[2305843009213693952, 16140901064495857663]
>>>>     n_7 = (long unsigned intD.10) _2;
>>>>     _11 = (long unsigned intD.10) _1;
>>>>     if (_11 <= 396)
>>>> [...]
>>>>     _3 = allocaD.1059 (n_7);
>>>>
>>>> and warn.
>>>
>>> That's indeed to complicated relation of _11 to n_7 for
>>> VRP predicate discovery.
>>>
>>>> However, DOM3 later produces
>>>>
>>>>     _1 = p_5(D) - q_6(D);
>>>>     _11 = (long unsigned intD.10) _1;
>>>>     if (_11 <= 396)
>>>
>>> while _11 vs. _1 works fine.
>>>
>>>> [...]
>>>>     # RANGE [0, 99] NONZERO 127
>>>>     _2 = _1 /[ex] 4;
>>>>     # RANGE [0, 99] NONZERO 127
>>>>     n_7 = (long unsigned intD.10) _2;
>>>>     _3 = allocaD.1059 (n_7);
>>>>
>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
>>>> testcase and file an issue...
>>>
>>> Hmm, there's a DOM pass before walloca2 already and moving
>>> walloca2 after loop opts doesn't look like the best thing to do?
>>> I suppose it's not DOM but sinking that does the important transform
>>> here?  That is,
>>>
>>> Index: gcc/passes.def
>>> ===================================================================
>>> --- gcc/passes.def      (revision 271395)
>>> +++ gcc/passes.def      (working copy)
>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
>>>        NEXT_PASS (pass_optimize_bswap);
>>>        NEXT_PASS (pass_laddress);
>>>        NEXT_PASS (pass_lim);
>>> -      NEXT_PASS (pass_walloca, false);
>>>        NEXT_PASS (pass_pre);
>>>        NEXT_PASS (pass_sink_code);
>>> +      NEXT_PASS (pass_walloca, false);
>>>        NEXT_PASS (pass_sancov);
>>>        NEXT_PASS (pass_asan);
>>>        NEXT_PASS (pass_tsan);
>>>
>>> fixes it?
>>
>> I will check, but I don't think walloca uses any kind of on-demand VRP, so
>> we still need some pass to update the ranges after sinking, which doesn't
>> seem to happen until the next DOM pass.
> 
> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
> other warnigns are emitted from RTL expansion?  So maybe we can
> indeed move the pass towards warn_restrict or late_warn_uninit.

I thought there was a preference to add new middle-end warnings
into passes of their own rather than into existing passes.  Is
that not so (either in general or in this specific case)?

 From my POV, the main (only?) benefit of putting warnings in their
own passes is modularity.  Are there any others?

The biggest drawback I see is that it makes it hard to then share
data across multiple passes.  The sharing can help not just
warnings (reduce both false positive and false negative rates) but
also optimization.  That's why I'm merging the strlen and sprintf
passes, and want to eventually also look into merging
the -Wstringop-overflow warnings there (also emitted just before
RTL expansion.  Did I miss any downsides?

I don't know if there's the -Walloca pass would benefit from merging
with any of the others or vice versa, but superficially it seems like
it might be worth thinking about integrating the -Walloc-larger-than
warnings into the -Walloca pass, if only to keep similar functionality
in the same place.

> I also see that the Og pass pipeline misses the second walloca pass
> completely (and also the warn_restrict pass).

That's worth fixing.

Martin

> 
> Given code sinkings obvious effects on SSA value-range representation
> it may make sense to add another instance of that pass earlier.
> 
> 
> 
>>
>> --
>> Marc Glisse
Richard Biener May 21, 2019, 9:53 a.m. UTC | #9
On Tue, May 21, 2019 at 4:13 AM Martin Sebor <msebor@gmail.com> wrote:
>
> On 5/20/19 3:16 AM, Richard Biener wrote:
> > On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>
> >> On Mon, 20 May 2019, Richard Biener wrote:
> >>
> >>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> 2 pieces:
> >>>>
> >>>> - the first one handles the case where the denominator is negative. It
> >>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
> >>>> this one looked trivial
> >>>>
> >>>> - handle the case where a pointer difference is cast to an unsigned type
> >>>> before being compared to a constant (I hit this in std::vector). With some
> >>>> range info we could probably handle some non-constant cases as well...
> >>>>
> >>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
> >>>>
> >>>> void f (void*);
> >>>> void g (int *p, int *q)
> >>>> {
> >>>>     __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
> >>>>     if (n < 100)
> >>>>       f (__builtin_alloca (n));
> >>>> }
> >>>>
> >>>> At the time of walloca2, we have
> >>>>
> >>>>     _1 = p_5(D) - q_6(D);
> >>>>     # RANGE [-2305843009213693952, 2305843009213693951]
> >>>>     _2 = _1 /[ex] 4;
> >>>>     # RANGE ~[2305843009213693952, 16140901064495857663]
> >>>>     n_7 = (long unsigned intD.10) _2;
> >>>>     _11 = (long unsigned intD.10) _1;
> >>>>     if (_11 <= 396)
> >>>> [...]
> >>>>     _3 = allocaD.1059 (n_7);
> >>>>
> >>>> and warn.
> >>>
> >>> That's indeed to complicated relation of _11 to n_7 for
> >>> VRP predicate discovery.
> >>>
> >>>> However, DOM3 later produces
> >>>>
> >>>>     _1 = p_5(D) - q_6(D);
> >>>>     _11 = (long unsigned intD.10) _1;
> >>>>     if (_11 <= 396)
> >>>
> >>> while _11 vs. _1 works fine.
> >>>
> >>>> [...]
> >>>>     # RANGE [0, 99] NONZERO 127
> >>>>     _2 = _1 /[ex] 4;
> >>>>     # RANGE [0, 99] NONZERO 127
> >>>>     n_7 = (long unsigned intD.10) _2;
> >>>>     _3 = allocaD.1059 (n_7);
> >>>>
> >>>> so I am tempted to say that the walloca2 pass is too early, xfail the
> >>>> testcase and file an issue...
> >>>
> >>> Hmm, there's a DOM pass before walloca2 already and moving
> >>> walloca2 after loop opts doesn't look like the best thing to do?
> >>> I suppose it's not DOM but sinking that does the important transform
> >>> here?  That is,
> >>>
> >>> Index: gcc/passes.def
> >>> ===================================================================
> >>> --- gcc/passes.def      (revision 271395)
> >>> +++ gcc/passes.def      (working copy)
> >>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
> >>>        NEXT_PASS (pass_optimize_bswap);
> >>>        NEXT_PASS (pass_laddress);
> >>>        NEXT_PASS (pass_lim);
> >>> -      NEXT_PASS (pass_walloca, false);
> >>>        NEXT_PASS (pass_pre);
> >>>        NEXT_PASS (pass_sink_code);
> >>> +      NEXT_PASS (pass_walloca, false);
> >>>        NEXT_PASS (pass_sancov);
> >>>        NEXT_PASS (pass_asan);
> >>>        NEXT_PASS (pass_tsan);
> >>>
> >>> fixes it?
> >>
> >> I will check, but I don't think walloca uses any kind of on-demand VRP, so
> >> we still need some pass to update the ranges after sinking, which doesn't
> >> seem to happen until the next DOM pass.
> >
> > Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
> > other warnigns are emitted from RTL expansion?  So maybe we can
> > indeed move the pass towards warn_restrict or late_warn_uninit.
>
> I thought there was a preference to add new middle-end warnings
> into passes of their own rather than into existing passes.  Is
> that not so (either in general or in this specific case)?

The preference was to add them not into optimization passes.  But
of course having 10+ warning passes, each going over the whole IL
is excessive.  Also each of the locally computing ranges or so.

Given the simplicity of Walloca I wonder why it's not part of another
warning pass - since it's about tracking "sizes" again there are plenty
that fit ;)

>  From my POV, the main (only?) benefit of putting warnings in their
> own passes is modularity.  Are there any others?
>
> The biggest drawback I see is that it makes it hard to then share
> data across multiple passes.  The sharing can help not just
> warnings (reduce both false positive and false negative rates) but
> also optimization.  That's why I'm merging the strlen and sprintf
> passes, and want to eventually also look into merging
> the -Wstringop-overflow warnings there (also emitted just before
> RTL expansion.  Did I miss any downsides?

When things fit together they are fine to merge obviously.

One may not like -Warray-bounds inside VRP but it really "fits".

OTOH making a warning part of an optimization pass naturally
limits its effect to when the specific optimization is enabled.
In theory it's possible to do -Warray-bounds at -O0 - we are in
SSA form after all - but of course you don't want to enable VRP at -O0.

> I don't know if there's the -Walloca pass would benefit from merging
> with any of the others or vice versa, but superficially it seems like
> it might be worth thinking about integrating the -Walloc-larger-than
> warnings into the -Walloca pass, if only to keep similar functionality
> in the same place.

-Wrestrict and all the format warning stuff seems related as well.

> > I also see that the Og pass pipeline misses the second walloca pass
> > completely (and also the warn_restrict pass).
>
> That's worth fixing.
>
> Martin
>
> >
> > Given code sinkings obvious effects on SSA value-range representation
> > it may make sense to add another instance of that pass earlier.
> >
> >
> >
> >>
> >> --
> >> Marc Glisse
>
Aldy Hernandez May 27, 2019, 1:09 p.m. UTC | #10
On 5/21/19 5:53 AM, Richard Biener wrote:
> On Tue, May 21, 2019 at 4:13 AM Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 5/20/19 3:16 AM, Richard Biener wrote:
>>> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>
>>>> On Mon, 20 May 2019, Richard Biener wrote:
>>>>
>>>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> 2 pieces:
>>>>>>
>>>>>> - the first one handles the case where the denominator is negative. It
>>>>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
>>>>>> this one looked trivial
>>>>>>
>>>>>> - handle the case where a pointer difference is cast to an unsigned type
>>>>>> before being compared to a constant (I hit this in std::vector). With some
>>>>>> range info we could probably handle some non-constant cases as well...
>>>>>>
>>>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>>>>>
>>>>>> void f (void*);
>>>>>> void g (int *p, int *q)
>>>>>> {
>>>>>>      __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>>>>>      if (n < 100)
>>>>>>        f (__builtin_alloca (n));
>>>>>> }
>>>>>>
>>>>>> At the time of walloca2, we have
>>>>>>
>>>>>>      _1 = p_5(D) - q_6(D);
>>>>>>      # RANGE [-2305843009213693952, 2305843009213693951]
>>>>>>      _2 = _1 /[ex] 4;
>>>>>>      # RANGE ~[2305843009213693952, 16140901064495857663]
>>>>>>      n_7 = (long unsigned intD.10) _2;
>>>>>>      _11 = (long unsigned intD.10) _1;
>>>>>>      if (_11 <= 396)
>>>>>> [...]
>>>>>>      _3 = allocaD.1059 (n_7);
>>>>>>
>>>>>> and warn.
>>>>>
>>>>> That's indeed to complicated relation of _11 to n_7 for
>>>>> VRP predicate discovery.
>>>>>
>>>>>> However, DOM3 later produces
>>>>>>
>>>>>>      _1 = p_5(D) - q_6(D);
>>>>>>      _11 = (long unsigned intD.10) _1;
>>>>>>      if (_11 <= 396)
>>>>>
>>>>> while _11 vs. _1 works fine.
>>>>>
>>>>>> [...]
>>>>>>      # RANGE [0, 99] NONZERO 127
>>>>>>      _2 = _1 /[ex] 4;
>>>>>>      # RANGE [0, 99] NONZERO 127
>>>>>>      n_7 = (long unsigned intD.10) _2;
>>>>>>      _3 = allocaD.1059 (n_7);
>>>>>>
>>>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
>>>>>> testcase and file an issue...
>>>>>
>>>>> Hmm, there's a DOM pass before walloca2 already and moving
>>>>> walloca2 after loop opts doesn't look like the best thing to do?
>>>>> I suppose it's not DOM but sinking that does the important transform
>>>>> here?  That is,
>>>>>
>>>>> Index: gcc/passes.def
>>>>> ===================================================================
>>>>> --- gcc/passes.def      (revision 271395)
>>>>> +++ gcc/passes.def      (working copy)
>>>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
>>>>>         NEXT_PASS (pass_optimize_bswap);
>>>>>         NEXT_PASS (pass_laddress);
>>>>>         NEXT_PASS (pass_lim);
>>>>> -      NEXT_PASS (pass_walloca, false);
>>>>>         NEXT_PASS (pass_pre);
>>>>>         NEXT_PASS (pass_sink_code);
>>>>> +      NEXT_PASS (pass_walloca, false);
>>>>>         NEXT_PASS (pass_sancov);
>>>>>         NEXT_PASS (pass_asan);
>>>>>         NEXT_PASS (pass_tsan);
>>>>>
>>>>> fixes it?
>>>>
>>>> I will check, but I don't think walloca uses any kind of on-demand VRP, so
>>>> we still need some pass to update the ranges after sinking, which doesn't
>>>> seem to happen until the next DOM pass.
>>>
>>> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
>>> other warnigns are emitted from RTL expansion?  So maybe we can
>>> indeed move the pass towards warn_restrict or late_warn_uninit.
>>
>> I thought there was a preference to add new middle-end warnings
>> into passes of their own rather than into existing passes.  Is
>> that not so (either in general or in this specific case)?
> 
> The preference was to add them not into optimization passes.  But
> of course having 10+ warning passes, each going over the whole IL
> is excessive.  Also each of the locally computing ranges or so.
> 
> Given the simplicity of Walloca I wonder why it's not part of another
> warning pass - since it's about tracking "sizes" again there are plenty
> that fit ;)
> 
>>   From my POV, the main (only?) benefit of putting warnings in their
>> own passes is modularity.  Are there any others?
>>
>> The biggest drawback I see is that it makes it hard to then share
>> data across multiple passes.  The sharing can help not just
>> warnings (reduce both false positive and false negative rates) but
>> also optimization.  That's why I'm merging the strlen and sprintf
>> passes, and want to eventually also look into merging
>> the -Wstringop-overflow warnings there (also emitted just before
>> RTL expansion.  Did I miss any downsides?
> 
> When things fit together they are fine to merge obviously.
> 
> One may not like -Warray-bounds inside VRP but it really "fits".
> 
> OTOH making a warning part of an optimization pass naturally
> limits its effect to when the specific optimization is enabled.
> In theory it's possible to do -Warray-bounds at -O0 - we are in
> SSA form after all - but of course you don't want to enable VRP at -O0.
> 
>> I don't know if there's the -Walloca pass would benefit from merging
>> with any of the others or vice versa, but superficially it seems like
>> it might be worth thinking about integrating the -Walloc-larger-than
>> warnings into the -Walloca pass, if only to keep similar functionality
>> in the same place.

My original code was in the VRP pass and I can't remember whether it was 
you or Jeff that suggested a separate pass so we'd stop polluting VRP 
with everything but the kitchen sink.

Aldy
Richard Biener May 27, 2019, 1:57 p.m. UTC | #11
On Mon, May 27, 2019 at 3:09 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
>
>
> On 5/21/19 5:53 AM, Richard Biener wrote:
> > On Tue, May 21, 2019 at 4:13 AM Martin Sebor <msebor@gmail.com> wrote:
> >>
> >> On 5/20/19 3:16 AM, Richard Biener wrote:
> >>> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>>>
> >>>> On Mon, 20 May 2019, Richard Biener wrote:
> >>>>
> >>>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> 2 pieces:
> >>>>>>
> >>>>>> - the first one handles the case where the denominator is negative. It
> >>>>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
> >>>>>> this one looked trivial
> >>>>>>
> >>>>>> - handle the case where a pointer difference is cast to an unsigned type
> >>>>>> before being compared to a constant (I hit this in std::vector). With some
> >>>>>> range info we could probably handle some non-constant cases as well...
> >>>>>>
> >>>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
> >>>>>>
> >>>>>> void f (void*);
> >>>>>> void g (int *p, int *q)
> >>>>>> {
> >>>>>>      __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
> >>>>>>      if (n < 100)
> >>>>>>        f (__builtin_alloca (n));
> >>>>>> }
> >>>>>>
> >>>>>> At the time of walloca2, we have
> >>>>>>
> >>>>>>      _1 = p_5(D) - q_6(D);
> >>>>>>      # RANGE [-2305843009213693952, 2305843009213693951]
> >>>>>>      _2 = _1 /[ex] 4;
> >>>>>>      # RANGE ~[2305843009213693952, 16140901064495857663]
> >>>>>>      n_7 = (long unsigned intD.10) _2;
> >>>>>>      _11 = (long unsigned intD.10) _1;
> >>>>>>      if (_11 <= 396)
> >>>>>> [...]
> >>>>>>      _3 = allocaD.1059 (n_7);
> >>>>>>
> >>>>>> and warn.
> >>>>>
> >>>>> That's indeed to complicated relation of _11 to n_7 for
> >>>>> VRP predicate discovery.
> >>>>>
> >>>>>> However, DOM3 later produces
> >>>>>>
> >>>>>>      _1 = p_5(D) - q_6(D);
> >>>>>>      _11 = (long unsigned intD.10) _1;
> >>>>>>      if (_11 <= 396)
> >>>>>
> >>>>> while _11 vs. _1 works fine.
> >>>>>
> >>>>>> [...]
> >>>>>>      # RANGE [0, 99] NONZERO 127
> >>>>>>      _2 = _1 /[ex] 4;
> >>>>>>      # RANGE [0, 99] NONZERO 127
> >>>>>>      n_7 = (long unsigned intD.10) _2;
> >>>>>>      _3 = allocaD.1059 (n_7);
> >>>>>>
> >>>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
> >>>>>> testcase and file an issue...
> >>>>>
> >>>>> Hmm, there's a DOM pass before walloca2 already and moving
> >>>>> walloca2 after loop opts doesn't look like the best thing to do?
> >>>>> I suppose it's not DOM but sinking that does the important transform
> >>>>> here?  That is,
> >>>>>
> >>>>> Index: gcc/passes.def
> >>>>> ===================================================================
> >>>>> --- gcc/passes.def      (revision 271395)
> >>>>> +++ gcc/passes.def      (working copy)
> >>>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
> >>>>>         NEXT_PASS (pass_optimize_bswap);
> >>>>>         NEXT_PASS (pass_laddress);
> >>>>>         NEXT_PASS (pass_lim);
> >>>>> -      NEXT_PASS (pass_walloca, false);
> >>>>>         NEXT_PASS (pass_pre);
> >>>>>         NEXT_PASS (pass_sink_code);
> >>>>> +      NEXT_PASS (pass_walloca, false);
> >>>>>         NEXT_PASS (pass_sancov);
> >>>>>         NEXT_PASS (pass_asan);
> >>>>>         NEXT_PASS (pass_tsan);
> >>>>>
> >>>>> fixes it?
> >>>>
> >>>> I will check, but I don't think walloca uses any kind of on-demand VRP, so
> >>>> we still need some pass to update the ranges after sinking, which doesn't
> >>>> seem to happen until the next DOM pass.
> >>>
> >>> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
> >>> other warnigns are emitted from RTL expansion?  So maybe we can
> >>> indeed move the pass towards warn_restrict or late_warn_uninit.
> >>
> >> I thought there was a preference to add new middle-end warnings
> >> into passes of their own rather than into existing passes.  Is
> >> that not so (either in general or in this specific case)?
> >
> > The preference was to add them not into optimization passes.  But
> > of course having 10+ warning passes, each going over the whole IL
> > is excessive.  Also each of the locally computing ranges or so.
> >
> > Given the simplicity of Walloca I wonder why it's not part of another
> > warning pass - since it's about tracking "sizes" again there are plenty
> > that fit ;)
> >
> >>   From my POV, the main (only?) benefit of putting warnings in their
> >> own passes is modularity.  Are there any others?
> >>
> >> The biggest drawback I see is that it makes it hard to then share
> >> data across multiple passes.  The sharing can help not just
> >> warnings (reduce both false positive and false negative rates) but
> >> also optimization.  That's why I'm merging the strlen and sprintf
> >> passes, and want to eventually also look into merging
> >> the -Wstringop-overflow warnings there (also emitted just before
> >> RTL expansion.  Did I miss any downsides?
> >
> > When things fit together they are fine to merge obviously.
> >
> > One may not like -Warray-bounds inside VRP but it really "fits".
> >
> > OTOH making a warning part of an optimization pass naturally
> > limits its effect to when the specific optimization is enabled.
> > In theory it's possible to do -Warray-bounds at -O0 - we are in
> > SSA form after all - but of course you don't want to enable VRP at -O0.
> >
> >> I don't know if there's the -Walloca pass would benefit from merging
> >> with any of the others or vice versa, but superficially it seems like
> >> it might be worth thinking about integrating the -Walloc-larger-than
> >> warnings into the -Walloca pass, if only to keep similar functionality
> >> in the same place.
>
> My original code was in the VRP pass and I can't remember whether it was
> you or Jeff that suggested a separate pass so we'd stop polluting VRP
> with everything but the kitchen sink.

Specifically for VRP the reason is I'd like to see it go away...

Richard.

>
> Aldy
Aldy Hernandez May 27, 2019, 2:20 p.m. UTC | #12
>>>> I don't know if there's the -Walloca pass would benefit from merging
>>>> with any of the others or vice versa, but superficially it seems like
>>>> it might be worth thinking about integrating the -Walloc-larger-than
>>>> warnings into the -Walloca pass, if only to keep similar functionality
>>>> in the same place.
>>
>> My original code was in the VRP pass and I can't remember whether it was
>> you or Jeff that suggested a separate pass so we'd stop polluting VRP
>> with everything but the kitchen sink.
> 
> Specifically for VRP the reason is I'd like to see it go away...

You're preaching to the choir here :).

Aldy
Martin Sebor May 28, 2019, 3:34 p.m. UTC | #13
On 5/21/19 3:53 AM, Richard Biener wrote:
> On Tue, May 21, 2019 at 4:13 AM Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 5/20/19 3:16 AM, Richard Biener wrote:
>>> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>
>>>> On Mon, 20 May 2019, Richard Biener wrote:
>>>>
>>>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> 2 pieces:
>>>>>>
>>>>>> - the first one handles the case where the denominator is negative. It
>>>>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
>>>>>> this one looked trivial
>>>>>>
>>>>>> - handle the case where a pointer difference is cast to an unsigned type
>>>>>> before being compared to a constant (I hit this in std::vector). With some
>>>>>> range info we could probably handle some non-constant cases as well...
>>>>>>
>>>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>>>>>
>>>>>> void f (void*);
>>>>>> void g (int *p, int *q)
>>>>>> {
>>>>>>      __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>>>>>      if (n < 100)
>>>>>>        f (__builtin_alloca (n));
>>>>>> }
>>>>>>
>>>>>> At the time of walloca2, we have
>>>>>>
>>>>>>      _1 = p_5(D) - q_6(D);
>>>>>>      # RANGE [-2305843009213693952, 2305843009213693951]
>>>>>>      _2 = _1 /[ex] 4;
>>>>>>      # RANGE ~[2305843009213693952, 16140901064495857663]
>>>>>>      n_7 = (long unsigned intD.10) _2;
>>>>>>      _11 = (long unsigned intD.10) _1;
>>>>>>      if (_11 <= 396)
>>>>>> [...]
>>>>>>      _3 = allocaD.1059 (n_7);
>>>>>>
>>>>>> and warn.
>>>>>
>>>>> That's indeed to complicated relation of _11 to n_7 for
>>>>> VRP predicate discovery.
>>>>>
>>>>>> However, DOM3 later produces
>>>>>>
>>>>>>      _1 = p_5(D) - q_6(D);
>>>>>>      _11 = (long unsigned intD.10) _1;
>>>>>>      if (_11 <= 396)
>>>>>
>>>>> while _11 vs. _1 works fine.
>>>>>
>>>>>> [...]
>>>>>>      # RANGE [0, 99] NONZERO 127
>>>>>>      _2 = _1 /[ex] 4;
>>>>>>      # RANGE [0, 99] NONZERO 127
>>>>>>      n_7 = (long unsigned intD.10) _2;
>>>>>>      _3 = allocaD.1059 (n_7);
>>>>>>
>>>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
>>>>>> testcase and file an issue...
>>>>>
>>>>> Hmm, there's a DOM pass before walloca2 already and moving
>>>>> walloca2 after loop opts doesn't look like the best thing to do?
>>>>> I suppose it's not DOM but sinking that does the important transform
>>>>> here?  That is,
>>>>>
>>>>> Index: gcc/passes.def
>>>>> ===================================================================
>>>>> --- gcc/passes.def      (revision 271395)
>>>>> +++ gcc/passes.def      (working copy)
>>>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
>>>>>         NEXT_PASS (pass_optimize_bswap);
>>>>>         NEXT_PASS (pass_laddress);
>>>>>         NEXT_PASS (pass_lim);
>>>>> -      NEXT_PASS (pass_walloca, false);
>>>>>         NEXT_PASS (pass_pre);
>>>>>         NEXT_PASS (pass_sink_code);
>>>>> +      NEXT_PASS (pass_walloca, false);
>>>>>         NEXT_PASS (pass_sancov);
>>>>>         NEXT_PASS (pass_asan);
>>>>>         NEXT_PASS (pass_tsan);
>>>>>
>>>>> fixes it?
>>>>
>>>> I will check, but I don't think walloca uses any kind of on-demand VRP, so
>>>> we still need some pass to update the ranges after sinking, which doesn't
>>>> seem to happen until the next DOM pass.
>>>
>>> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
>>> other warnigns are emitted from RTL expansion?  So maybe we can
>>> indeed move the pass towards warn_restrict or late_warn_uninit.
>>
>> I thought there was a preference to add new middle-end warnings
>> into passes of their own rather than into existing passes.  Is
>> that not so (either in general or in this specific case)?
> 
> The preference was to add them not into optimization passes.  But
> of course having 10+ warning passes, each going over the whole IL
> is excessive.  Also each of the locally computing ranges or so.
> 
> Given the simplicity of Walloca I wonder why it's not part of another
> warning pass - since it's about tracking "sizes" again there are plenty
> that fit ;)

-Walloca doesn't need to track object sizes in the same sense
as objsize and strlen do.  It just examines calls to allocation
functions, same as -Walloc-larger-than.  It would make sense to
merge the implementation of two warnings.  They don't need to
run as a pass of their own.

>>   From my POV, the main (only?) benefit of putting warnings in their
>> own passes is modularity.  Are there any others?
>>
>> The biggest drawback I see is that it makes it hard to then share
>> data across multiple passes.  The sharing can help not just
>> warnings (reduce both false positive and false negative rates) but
>> also optimization.  That's why I'm merging the strlen and sprintf
>> passes, and want to eventually also look into merging
>> the -Wstringop-overflow warnings there (also emitted just before
>> RTL expansion.  Did I miss any downsides?
> 
> When things fit together they are fine to merge obviously.
> 
> One may not like -Warray-bounds inside VRP but it really "fits".

It fits because it has access to data that's not (yet) available
outside VRP, right?  It's not an ideal fit because by being in VRP
it doesn't have access to allocated object sizes so out-of-bounds
access to dynamically allocated objects aren't detected.  To detect
those, I think the only pass it could go in is strlen.  So again
an optimization pass.  It doesn't fit there very well at all, but
there is no other pass that tracks sizes of all objects (objsize
does, though only in response to calls to __builtin_object_size).

I think just like a general purpose API to query value ranges that
cam be called from any pass, another API to query object sizes, and
another to query string lengths, would be useful.

> OTOH making a warning part of an optimization pass naturally
> limits its effect to when the specific optimization is enabled.
> In theory it's possible to do -Warray-bounds at -O0 - we are in
> SSA form after all - but of course you don't want to enable VRP at -O0.

True.  Better warnings at -O0 would be great.  The drawback would
be that the warnings would then have to implement all the same
analyses as the optimization passes they rely on.  We'd end up
having to implement a static analyzer on top of basic GIMPLE.

>> I don't know if there's the -Walloca pass would benefit from merging
>> with any of the others or vice versa, but superficially it seems like
>> it might be worth thinking about integrating the -Walloc-larger-than
>> warnings into the -Walloca pass, if only to keep similar functionality
>> in the same place.
> 
> -Wrestrict and all the format warning stuff seems related as well.

I think so.  All of strlen, sprintf, Wrestrict, -Wstringop-overflow
and -Walloca/-Walloc-larger-than, and even objsize, would benefit
from sharing the same IL traversal and the same data.  As would
-Warray-bounds to detect out-of-bounds accesses to dynamically
allocated objects.  There probably are other warnings and likely
also optimizations that would benefit from closer coupling.

Martin


> 
>>> I also see that the Og pass pipeline misses the second walloca pass
>>> completely (and also the warn_restrict pass).
>>
>> That's worth fixing.
>>
>> Martin
>>
>>>
>>> Given code sinkings obvious effects on SSA value-range representation
>>> it may make sense to add another instance of that pass earlier.
>>>
>>>
>>>
>>>>
>>>> --
>>>> Marc Glisse
>>
Richard Biener May 29, 2019, 7:21 a.m. UTC | #14
On Tue, May 28, 2019 at 5:34 PM Martin Sebor <msebor@gmail.com> wrote:
>
> On 5/21/19 3:53 AM, Richard Biener wrote:
> > On Tue, May 21, 2019 at 4:13 AM Martin Sebor <msebor@gmail.com> wrote:
> >>
> >> On 5/20/19 3:16 AM, Richard Biener wrote:
> >>> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>>>
> >>>> On Mon, 20 May 2019, Richard Biener wrote:
> >>>>
> >>>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> 2 pieces:
> >>>>>>
> >>>>>> - the first one handles the case where the denominator is negative. It
> >>>>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
> >>>>>> this one looked trivial
> >>>>>>
> >>>>>> - handle the case where a pointer difference is cast to an unsigned type
> >>>>>> before being compared to a constant (I hit this in std::vector). With some
> >>>>>> range info we could probably handle some non-constant cases as well...
> >>>>>>
> >>>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
> >>>>>>
> >>>>>> void f (void*);
> >>>>>> void g (int *p, int *q)
> >>>>>> {
> >>>>>>      __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
> >>>>>>      if (n < 100)
> >>>>>>        f (__builtin_alloca (n));
> >>>>>> }
> >>>>>>
> >>>>>> At the time of walloca2, we have
> >>>>>>
> >>>>>>      _1 = p_5(D) - q_6(D);
> >>>>>>      # RANGE [-2305843009213693952, 2305843009213693951]
> >>>>>>      _2 = _1 /[ex] 4;
> >>>>>>      # RANGE ~[2305843009213693952, 16140901064495857663]
> >>>>>>      n_7 = (long unsigned intD.10) _2;
> >>>>>>      _11 = (long unsigned intD.10) _1;
> >>>>>>      if (_11 <= 396)
> >>>>>> [...]
> >>>>>>      _3 = allocaD.1059 (n_7);
> >>>>>>
> >>>>>> and warn.
> >>>>>
> >>>>> That's indeed to complicated relation of _11 to n_7 for
> >>>>> VRP predicate discovery.
> >>>>>
> >>>>>> However, DOM3 later produces
> >>>>>>
> >>>>>>      _1 = p_5(D) - q_6(D);
> >>>>>>      _11 = (long unsigned intD.10) _1;
> >>>>>>      if (_11 <= 396)
> >>>>>
> >>>>> while _11 vs. _1 works fine.
> >>>>>
> >>>>>> [...]
> >>>>>>      # RANGE [0, 99] NONZERO 127
> >>>>>>      _2 = _1 /[ex] 4;
> >>>>>>      # RANGE [0, 99] NONZERO 127
> >>>>>>      n_7 = (long unsigned intD.10) _2;
> >>>>>>      _3 = allocaD.1059 (n_7);
> >>>>>>
> >>>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
> >>>>>> testcase and file an issue...
> >>>>>
> >>>>> Hmm, there's a DOM pass before walloca2 already and moving
> >>>>> walloca2 after loop opts doesn't look like the best thing to do?
> >>>>> I suppose it's not DOM but sinking that does the important transform
> >>>>> here?  That is,
> >>>>>
> >>>>> Index: gcc/passes.def
> >>>>> ===================================================================
> >>>>> --- gcc/passes.def      (revision 271395)
> >>>>> +++ gcc/passes.def      (working copy)
> >>>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
> >>>>>         NEXT_PASS (pass_optimize_bswap);
> >>>>>         NEXT_PASS (pass_laddress);
> >>>>>         NEXT_PASS (pass_lim);
> >>>>> -      NEXT_PASS (pass_walloca, false);
> >>>>>         NEXT_PASS (pass_pre);
> >>>>>         NEXT_PASS (pass_sink_code);
> >>>>> +      NEXT_PASS (pass_walloca, false);
> >>>>>         NEXT_PASS (pass_sancov);
> >>>>>         NEXT_PASS (pass_asan);
> >>>>>         NEXT_PASS (pass_tsan);
> >>>>>
> >>>>> fixes it?
> >>>>
> >>>> I will check, but I don't think walloca uses any kind of on-demand VRP, so
> >>>> we still need some pass to update the ranges after sinking, which doesn't
> >>>> seem to happen until the next DOM pass.
> >>>
> >>> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
> >>> other warnigns are emitted from RTL expansion?  So maybe we can
> >>> indeed move the pass towards warn_restrict or late_warn_uninit.
> >>
> >> I thought there was a preference to add new middle-end warnings
> >> into passes of their own rather than into existing passes.  Is
> >> that not so (either in general or in this specific case)?
> >
> > The preference was to add them not into optimization passes.  But
> > of course having 10+ warning passes, each going over the whole IL
> > is excessive.  Also each of the locally computing ranges or so.
> >
> > Given the simplicity of Walloca I wonder why it's not part of another
> > warning pass - since it's about tracking "sizes" again there are plenty
> > that fit ;)
>
> -Walloca doesn't need to track object sizes in the same sense
> as objsize and strlen do.  It just examines calls to allocation
> functions, same as -Walloc-larger-than.  It would make sense to
> merge the implementation of two warnings.  They don't need to
> run as a pass of their own.
>
> >>   From my POV, the main (only?) benefit of putting warnings in their
> >> own passes is modularity.  Are there any others?
> >>
> >> The biggest drawback I see is that it makes it hard to then share
> >> data across multiple passes.  The sharing can help not just
> >> warnings (reduce both false positive and false negative rates) but
> >> also optimization.  That's why I'm merging the strlen and sprintf
> >> passes, and want to eventually also look into merging
> >> the -Wstringop-overflow warnings there (also emitted just before
> >> RTL expansion.  Did I miss any downsides?
> >
> > When things fit together they are fine to merge obviously.
> >
> > One may not like -Warray-bounds inside VRP but it really "fits".
>
> It fits because it has access to data that's not (yet) available
> outside VRP, right?  It's not an ideal fit because by being in VRP
> it doesn't have access to allocated object sizes so out-of-bounds
> access to dynamically allocated objects aren't detected.  To detect
> those, I think the only pass it could go in is strlen.  So again
> an optimization pass.  It doesn't fit there very well at all, but
> there is no other pass that tracks sizes of all objects (objsize
> does, though only in response to calls to __builtin_object_size).

All true - now that we have warning passes computing range-information
it definitely fits better there.  At the time we added the warning there
wasn't even per-SSA name range information.

So - if you want to move it I'd applaud such effort!

> I think just like a general purpose API to query value ranges that
> cam be called from any pass, another API to query object sizes, and
> another to query string lengths, would be useful.
>
> > OTOH making a warning part of an optimization pass naturally
> > limits its effect to when the specific optimization is enabled.
> > In theory it's possible to do -Warray-bounds at -O0 - we are in
> > SSA form after all - but of course you don't want to enable VRP at -O0.
>
> True.  Better warnings at -O0 would be great.  The drawback would
> be that the warnings would then have to implement all the same
> analyses as the optimization passes they rely on.  We'd end up
> having to implement a static analyzer on top of basic GIMPLE.
>
> >> I don't know if there's the -Walloca pass would benefit from merging
> >> with any of the others or vice versa, but superficially it seems like
> >> it might be worth thinking about integrating the -Walloc-larger-than
> >> warnings into the -Walloca pass, if only to keep similar functionality
> >> in the same place.
> >
> > -Wrestrict and all the format warning stuff seems related as well.
>
> I think so.  All of strlen, sprintf, Wrestrict, -Wstringop-overflow
> and -Walloca/-Walloc-larger-than, and even objsize, would benefit
> from sharing the same IL traversal and the same data.  As would
> -Warray-bounds to detect out-of-bounds accesses to dynamically
> allocated objects.  There probably are other warnings and likely
> also optimizations that would benefit from closer coupling.
>
> Martin
>
>
> >
> >>> I also see that the Og pass pipeline misses the second walloca pass
> >>> completely (and also the warn_restrict pass).
> >>
> >> That's worth fixing.
> >>
> >> Martin
> >>
> >>>
> >>> Given code sinkings obvious effects on SSA value-range representation
> >>> it may make sense to add another instance of that pass earlier.
> >>>
> >>>
> >>>
> >>>>
> >>>> --
> >>>> Marc Glisse
> >>
>
Jeff Law May 29, 2019, 4:31 p.m. UTC | #15
On 5/29/19 1:21 AM, Richard Biener wrote:
> On Tue, May 28, 2019 at 5:34 PM Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 5/21/19 3:53 AM, Richard Biener wrote:
>>> On Tue, May 21, 2019 at 4:13 AM Martin Sebor <msebor@gmail.com> wrote:
>>>>
>>>> On 5/20/19 3:16 AM, Richard Biener wrote:
>>>>> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>>>
>>>>>> On Mon, 20 May 2019, Richard Biener wrote:
>>>>>>
>>>>>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> 2 pieces:
>>>>>>>>
>>>>>>>> - the first one handles the case where the denominator is negative. It
>>>>>>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
>>>>>>>> this one looked trivial
>>>>>>>>
>>>>>>>> - handle the case where a pointer difference is cast to an unsigned type
>>>>>>>> before being compared to a constant (I hit this in std::vector). With some
>>>>>>>> range info we could probably handle some non-constant cases as well...
>>>>>>>>
>>>>>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>>>>>>>
>>>>>>>> void f (void*);
>>>>>>>> void g (int *p, int *q)
>>>>>>>> {
>>>>>>>>      __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>>>>>>>      if (n < 100)
>>>>>>>>        f (__builtin_alloca (n));
>>>>>>>> }
>>>>>>>>
>>>>>>>> At the time of walloca2, we have
>>>>>>>>
>>>>>>>>      _1 = p_5(D) - q_6(D);
>>>>>>>>      # RANGE [-2305843009213693952, 2305843009213693951]
>>>>>>>>      _2 = _1 /[ex] 4;
>>>>>>>>      # RANGE ~[2305843009213693952, 16140901064495857663]
>>>>>>>>      n_7 = (long unsigned intD.10) _2;
>>>>>>>>      _11 = (long unsigned intD.10) _1;
>>>>>>>>      if (_11 <= 396)
>>>>>>>> [...]
>>>>>>>>      _3 = allocaD.1059 (n_7);
>>>>>>>>
>>>>>>>> and warn.
>>>>>>>
>>>>>>> That's indeed to complicated relation of _11 to n_7 for
>>>>>>> VRP predicate discovery.
>>>>>>>
>>>>>>>> However, DOM3 later produces
>>>>>>>>
>>>>>>>>      _1 = p_5(D) - q_6(D);
>>>>>>>>      _11 = (long unsigned intD.10) _1;
>>>>>>>>      if (_11 <= 396)
>>>>>>>
>>>>>>> while _11 vs. _1 works fine.
>>>>>>>
>>>>>>>> [...]
>>>>>>>>      # RANGE [0, 99] NONZERO 127
>>>>>>>>      _2 = _1 /[ex] 4;
>>>>>>>>      # RANGE [0, 99] NONZERO 127
>>>>>>>>      n_7 = (long unsigned intD.10) _2;
>>>>>>>>      _3 = allocaD.1059 (n_7);
>>>>>>>>
>>>>>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
>>>>>>>> testcase and file an issue...
>>>>>>>
>>>>>>> Hmm, there's a DOM pass before walloca2 already and moving
>>>>>>> walloca2 after loop opts doesn't look like the best thing to do?
>>>>>>> I suppose it's not DOM but sinking that does the important transform
>>>>>>> here?  That is,
>>>>>>>
>>>>>>> Index: gcc/passes.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/passes.def      (revision 271395)
>>>>>>> +++ gcc/passes.def      (working copy)
>>>>>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
>>>>>>>         NEXT_PASS (pass_optimize_bswap);
>>>>>>>         NEXT_PASS (pass_laddress);
>>>>>>>         NEXT_PASS (pass_lim);
>>>>>>> -      NEXT_PASS (pass_walloca, false);
>>>>>>>         NEXT_PASS (pass_pre);
>>>>>>>         NEXT_PASS (pass_sink_code);
>>>>>>> +      NEXT_PASS (pass_walloca, false);
>>>>>>>         NEXT_PASS (pass_sancov);
>>>>>>>         NEXT_PASS (pass_asan);
>>>>>>>         NEXT_PASS (pass_tsan);
>>>>>>>
>>>>>>> fixes it?
>>>>>>
>>>>>> I will check, but I don't think walloca uses any kind of on-demand VRP, so
>>>>>> we still need some pass to update the ranges after sinking, which doesn't
>>>>>> seem to happen until the next DOM pass.
>>>>>
>>>>> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
>>>>> other warnigns are emitted from RTL expansion?  So maybe we can
>>>>> indeed move the pass towards warn_restrict or late_warn_uninit.
>>>>
>>>> I thought there was a preference to add new middle-end warnings
>>>> into passes of their own rather than into existing passes.  Is
>>>> that not so (either in general or in this specific case)?
>>>
>>> The preference was to add them not into optimization passes.  But
>>> of course having 10+ warning passes, each going over the whole IL
>>> is excessive.  Also each of the locally computing ranges or so.
>>>
>>> Given the simplicity of Walloca I wonder why it's not part of another
>>> warning pass - since it's about tracking "sizes" again there are plenty
>>> that fit ;)
>>
>> -Walloca doesn't need to track object sizes in the same sense
>> as objsize and strlen do.  It just examines calls to allocation
>> functions, same as -Walloc-larger-than.  It would make sense to
>> merge the implementation of two warnings.  They don't need to
>> run as a pass of their own.
>>
>>>>   From my POV, the main (only?) benefit of putting warnings in their
>>>> own passes is modularity.  Are there any others?
>>>>
>>>> The biggest drawback I see is that it makes it hard to then share
>>>> data across multiple passes.  The sharing can help not just
>>>> warnings (reduce both false positive and false negative rates) but
>>>> also optimization.  That's why I'm merging the strlen and sprintf
>>>> passes, and want to eventually also look into merging
>>>> the -Wstringop-overflow warnings there (also emitted just before
>>>> RTL expansion.  Did I miss any downsides?
>>>
>>> When things fit together they are fine to merge obviously.
>>>
>>> One may not like -Warray-bounds inside VRP but it really "fits".
>>
>> It fits because it has access to data that's not (yet) available
>> outside VRP, right?  It's not an ideal fit because by being in VRP
>> it doesn't have access to allocated object sizes so out-of-bounds
>> access to dynamically allocated objects aren't detected.  To detect
>> those, I think the only pass it could go in is strlen.  So again
>> an optimization pass.  It doesn't fit there very well at all, but
>> there is no other pass that tracks sizes of all objects (objsize
>> does, though only in response to calls to __builtin_object_size).
> 
> All true - now that we have warning passes computing range-information
> it definitely fits better there.  At the time we added the warning there
> wasn't even per-SSA name range information.
Right.  Global ranges attached to SSA_NAMEs is a relatively recent
capability and one that I believe we'll continue to find uses for.

WRT out of bounds array indexing.  What we do right now in VRP is so
lame it makes me want to cry.  Essentially we only warn if the entire
range of the index is out of bounds.  We can and should be able to do
better.

Global ranges aren't the solution for this problem.  They're too
imprecise, particularly after join points.  Living inside VRP is
problematical, not just because it's gross, but because I think we
should be turning out of bounds array references into traps (after
warning of course).  In fact, it was problems in this space that were
the prime motivation behind the ranger project :-)

A common idiom I've seen during analysis is to use -1 to denote an error
case.  It's relatively common as a result to end up with stuff like this:

index = PHI (1, 2, 3, -1)
y = array[index]

I think we want a warning for this kind of stuff.  Furthermore, I think
we want to isolate the path which results in -1 being assigned to index.
 Once that path is isolated, we can insert a trap on the isolated path
and cleanup the CFG.

Jeff
Marc Glisse May 29, 2019, 9:27 p.m. UTC | #16
On Mon, 20 May 2019, Richard Biener wrote:

> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> On Mon, 20 May 2019, Richard Biener wrote:
>>
>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>
>>>> Hello,
>>>>
>>>> 2 pieces:
>>>>
>>>> - the first one handles the case where the denominator is negative. It
>>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
>>>> this one looked trivial
>>>>
>>>> - handle the case where a pointer difference is cast to an unsigned type
>>>> before being compared to a constant (I hit this in std::vector). With some
>>>> range info we could probably handle some non-constant cases as well...
>>>>
>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>>>
>>>> void f (void*);
>>>> void g (int *p, int *q)
>>>> {
>>>>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>>>    if (n < 100)
>>>>      f (__builtin_alloca (n));
>>>> }
>>>>
>>>> At the time of walloca2, we have
>>>>
>>>>    _1 = p_5(D) - q_6(D);
>>>>    # RANGE [-2305843009213693952, 2305843009213693951]
>>>>    _2 = _1 /[ex] 4;
>>>>    # RANGE ~[2305843009213693952, 16140901064495857663]
>>>>    n_7 = (long unsigned intD.10) _2;
>>>>    _11 = (long unsigned intD.10) _1;
>>>>    if (_11 <= 396)
>>>> [...]
>>>>    _3 = allocaD.1059 (n_7);
>>>>
>>>> and warn.
>>>
>>> That's indeed to complicated relation of _11 to n_7 for
>>> VRP predicate discovery.
>>>
>>>> However, DOM3 later produces
>>>>
>>>>    _1 = p_5(D) - q_6(D);
>>>>    _11 = (long unsigned intD.10) _1;
>>>>    if (_11 <= 396)
>>>
>>> while _11 vs. _1 works fine.
>>>
>>>> [...]
>>>>    # RANGE [0, 99] NONZERO 127
>>>>    _2 = _1 /[ex] 4;
>>>>    # RANGE [0, 99] NONZERO 127
>>>>    n_7 = (long unsigned intD.10) _2;
>>>>    _3 = allocaD.1059 (n_7);
>>>>
>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
>>>> testcase and file an issue...
>>>
>>> Hmm, there's a DOM pass before walloca2 already and moving
>>> walloca2 after loop opts doesn't look like the best thing to do?
>>> I suppose it's not DOM but sinking that does the important transform
>>> here?  That is,
>>>
>>> Index: gcc/passes.def
>>> ===================================================================
>>> --- gcc/passes.def      (revision 271395)
>>> +++ gcc/passes.def      (working copy)
>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
>>>       NEXT_PASS (pass_optimize_bswap);
>>>       NEXT_PASS (pass_laddress);
>>>       NEXT_PASS (pass_lim);
>>> -      NEXT_PASS (pass_walloca, false);
>>>       NEXT_PASS (pass_pre);
>>>       NEXT_PASS (pass_sink_code);
>>> +      NEXT_PASS (pass_walloca, false);
>>>       NEXT_PASS (pass_sancov);
>>>       NEXT_PASS (pass_asan);
>>>       NEXT_PASS (pass_tsan);
>>>
>>> fixes it?
>>
>> I will check, but I don't think walloca uses any kind of on-demand VRP, so
>> we still need some pass to update the ranges after sinking, which doesn't
>> seem to happen until the next DOM pass.
>
> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
> other warnigns are emitted from RTL expansion?  So maybe we can
> indeed move the pass towards warn_restrict or late_warn_uninit.

I tried moving it after 'sink' and that didn't help. Moving it next to 
warn_restrict works for this test but breaks 2 others that currently 
"work" by accident (+ one where the message changes between "unbounded" 
and "too large", it isn't clear what the difference is between those 
messages).

My suggestion, in addition to the original patch, is

 	* gcc.dg/Walloca-13.c: Xfail.

--- Walloca-13.c	(revision 271742)
+++ Walloca-13.c	(working copy)
@@ -1,12 +1,12 @@
  /* { dg-do compile } */
  /* { dg-require-effective-target alloca } */
  /* { dg-options "-Walloca-larger-than=100 -O2" } */

  void f (void*);

  void g (int *p, int *q)
  {
    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
    if (n < 100)
-    f (__builtin_alloca (n));
+    f (__builtin_alloca (n)); // { dg-bogus "may be too large due to conversion" "" { xfail { *-*-* } } }
  }

Is that ok?

> I also see that the Og pass pipeline misses the second walloca pass
> completely (and also the warn_restrict pass).
>
> Given code sinkings obvious effects on SSA value-range representation
> it may make sense to add another instance of that pass earlier.
Jeff Law May 30, 2019, 4:26 p.m. UTC | #17
On 5/29/19 3:27 PM, Marc Glisse wrote:
> On Mon, 20 May 2019, Richard Biener wrote:
> 
>> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr>
>> wrote:
>>>
>>> On Mon, 20 May 2019, Richard Biener wrote:
>>>
>>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> 2 pieces:
>>>>>
>>>>> - the first one handles the case where the denominator is negative. It
>>>>> doesn't happen often with exact_div, so I don't handle it
>>>>> everywhere, but
>>>>> this one looked trivial
>>>>>
>>>>> - handle the case where a pointer difference is cast to an unsigned
>>>>> type
>>>>> before being compared to a constant (I hit this in std::vector).
>>>>> With some
>>>>> range info we could probably handle some non-constant cases as well...
>>>>>
>>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
>>>>>
>>>>> void f (void*);
>>>>> void g (int *p, int *q)
>>>>> {
>>>>>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>>>>>    if (n < 100)
>>>>>      f (__builtin_alloca (n));
>>>>> }
>>>>>
>>>>> At the time of walloca2, we have
>>>>>
>>>>>    _1 = p_5(D) - q_6(D);
>>>>>    # RANGE [-2305843009213693952, 2305843009213693951]
>>>>>    _2 = _1 /[ex] 4;
>>>>>    # RANGE ~[2305843009213693952, 16140901064495857663]
>>>>>    n_7 = (long unsigned intD.10) _2;
>>>>>    _11 = (long unsigned intD.10) _1;
>>>>>    if (_11 <= 396)
>>>>> [...]
>>>>>    _3 = allocaD.1059 (n_7);
>>>>>
>>>>> and warn.
>>>>
>>>> That's indeed to complicated relation of _11 to n_7 for
>>>> VRP predicate discovery.
>>>>
>>>>> However, DOM3 later produces
>>>>>
>>>>>    _1 = p_5(D) - q_6(D);
>>>>>    _11 = (long unsigned intD.10) _1;
>>>>>    if (_11 <= 396)
>>>>
>>>> while _11 vs. _1 works fine.
>>>>
>>>>> [...]
>>>>>    # RANGE [0, 99] NONZERO 127
>>>>>    _2 = _1 /[ex] 4;
>>>>>    # RANGE [0, 99] NONZERO 127
>>>>>    n_7 = (long unsigned intD.10) _2;
>>>>>    _3 = allocaD.1059 (n_7);
>>>>>
>>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
>>>>> testcase and file an issue...
>>>>
>>>> Hmm, there's a DOM pass before walloca2 already and moving
>>>> walloca2 after loop opts doesn't look like the best thing to do?
>>>> I suppose it's not DOM but sinking that does the important transform
>>>> here?  That is,
>>>>
>>>> Index: gcc/passes.def
>>>> ===================================================================
>>>> --- gcc/passes.def      (revision 271395)
>>>> +++ gcc/passes.def      (working copy)
>>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
>>>>       NEXT_PASS (pass_optimize_bswap);
>>>>       NEXT_PASS (pass_laddress);
>>>>       NEXT_PASS (pass_lim);
>>>> -      NEXT_PASS (pass_walloca, false);
>>>>       NEXT_PASS (pass_pre);
>>>>       NEXT_PASS (pass_sink_code);
>>>> +      NEXT_PASS (pass_walloca, false);
>>>>       NEXT_PASS (pass_sancov);
>>>>       NEXT_PASS (pass_asan);
>>>>       NEXT_PASS (pass_tsan);
>>>>
>>>> fixes it?
>>>
>>> I will check, but I don't think walloca uses any kind of on-demand
>>> VRP, so
>>> we still need some pass to update the ranges after sinking, which
>>> doesn't
>>> seem to happen until the next DOM pass.
>>
>> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
>> other warnigns are emitted from RTL expansion?  So maybe we can
>> indeed move the pass towards warn_restrict or late_warn_uninit.
> 
> I tried moving it after 'sink' and that didn't help. Moving it next to
> warn_restrict works for this test but breaks 2 others that currently
> "work" by accident (+ one where the message changes between "unbounded"
> and "too large", it isn't clear what the difference is between those
> messages).
So "unbounded" means there was no controlling predicate that would allow
an upper bound on the size of the alloca to be determined.

"too large" breaks down into two cases.  One where we know it's too
large, the other when it may be too large.  The latter can happen due to
casts and perhaps other things (controlling predicate with a dynamic
upper bound maybe?)


> 
> My suggestion, in addition to the original patch, is
> 
>     * gcc.dg/Walloca-13.c: Xfail.
> 
> --- Walloca-13.c    (revision 271742)
> +++ Walloca-13.c    (working copy)
> @@ -1,12 +1,12 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target alloca } */
>  /* { dg-options "-Walloca-larger-than=100 -O2" } */
> 
>  void f (void*);
> 
>  void g (int *p, int *q)
>  {
>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>    if (n < 100)
> -    f (__builtin_alloca (n));
> +    f (__builtin_alloca (n)); // { dg-bogus "may be too large due to
> conversion" "" { xfail { *-*-* } } }
>  }
> 
> Is that ok?
Aldy would need to chime in -- at first glance it looks like we've lost
precision here.

jeff
Richard Biener May 31, 2019, 8:45 a.m. UTC | #18
On Wed, May 29, 2019 at 11:28 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Mon, 20 May 2019, Richard Biener wrote:
>
> > On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>
> >> On Mon, 20 May 2019, Richard Biener wrote:
> >>
> >>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> 2 pieces:
> >>>>
> >>>> - the first one handles the case where the denominator is negative. It
> >>>> doesn't happen often with exact_div, so I don't handle it everywhere, but
> >>>> this one looked trivial
> >>>>
> >>>> - handle the case where a pointer difference is cast to an unsigned type
> >>>> before being compared to a constant (I hit this in std::vector). With some
> >>>> range info we could probably handle some non-constant cases as well...
> >>>>
> >>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
> >>>>
> >>>> void f (void*);
> >>>> void g (int *p, int *q)
> >>>> {
> >>>>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
> >>>>    if (n < 100)
> >>>>      f (__builtin_alloca (n));
> >>>> }
> >>>>
> >>>> At the time of walloca2, we have
> >>>>
> >>>>    _1 = p_5(D) - q_6(D);
> >>>>    # RANGE [-2305843009213693952, 2305843009213693951]
> >>>>    _2 = _1 /[ex] 4;
> >>>>    # RANGE ~[2305843009213693952, 16140901064495857663]
> >>>>    n_7 = (long unsigned intD.10) _2;
> >>>>    _11 = (long unsigned intD.10) _1;
> >>>>    if (_11 <= 396)
> >>>> [...]
> >>>>    _3 = allocaD.1059 (n_7);
> >>>>
> >>>> and warn.
> >>>
> >>> That's indeed to complicated relation of _11 to n_7 for
> >>> VRP predicate discovery.
> >>>
> >>>> However, DOM3 later produces
> >>>>
> >>>>    _1 = p_5(D) - q_6(D);
> >>>>    _11 = (long unsigned intD.10) _1;
> >>>>    if (_11 <= 396)
> >>>
> >>> while _11 vs. _1 works fine.
> >>>
> >>>> [...]
> >>>>    # RANGE [0, 99] NONZERO 127
> >>>>    _2 = _1 /[ex] 4;
> >>>>    # RANGE [0, 99] NONZERO 127
> >>>>    n_7 = (long unsigned intD.10) _2;
> >>>>    _3 = allocaD.1059 (n_7);
> >>>>
> >>>> so I am tempted to say that the walloca2 pass is too early, xfail the
> >>>> testcase and file an issue...
> >>>
> >>> Hmm, there's a DOM pass before walloca2 already and moving
> >>> walloca2 after loop opts doesn't look like the best thing to do?
> >>> I suppose it's not DOM but sinking that does the important transform
> >>> here?  That is,
> >>>
> >>> Index: gcc/passes.def
> >>> ===================================================================
> >>> --- gcc/passes.def      (revision 271395)
> >>> +++ gcc/passes.def      (working copy)
> >>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
> >>>       NEXT_PASS (pass_optimize_bswap);
> >>>       NEXT_PASS (pass_laddress);
> >>>       NEXT_PASS (pass_lim);
> >>> -      NEXT_PASS (pass_walloca, false);
> >>>       NEXT_PASS (pass_pre);
> >>>       NEXT_PASS (pass_sink_code);
> >>> +      NEXT_PASS (pass_walloca, false);
> >>>       NEXT_PASS (pass_sancov);
> >>>       NEXT_PASS (pass_asan);
> >>>       NEXT_PASS (pass_tsan);
> >>>
> >>> fixes it?
> >>
> >> I will check, but I don't think walloca uses any kind of on-demand VRP, so
> >> we still need some pass to update the ranges after sinking, which doesn't
> >> seem to happen until the next DOM pass.
> >
> > Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
> > other warnigns are emitted from RTL expansion?  So maybe we can
> > indeed move the pass towards warn_restrict or late_warn_uninit.
>
> I tried moving it after 'sink' and that didn't help. Moving it next to
> warn_restrict works for this test but breaks 2 others that currently
> "work" by accident (+ one where the message changes between "unbounded"
> and "too large", it isn't clear what the difference is between those
> messages).
>
> My suggestion, in addition to the original patch, is
>
>         * gcc.dg/Walloca-13.c: Xfail.
>
> --- Walloca-13.c        (revision 271742)
> +++ Walloca-13.c        (working copy)
> @@ -1,12 +1,12 @@
>   /* { dg-do compile } */
>   /* { dg-require-effective-target alloca } */
>   /* { dg-options "-Walloca-larger-than=100 -O2" } */
>
>   void f (void*);
>
>   void g (int *p, int *q)
>   {
>     __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
>     if (n < 100)
> -    f (__builtin_alloca (n));
> +    f (__builtin_alloca (n)); // { dg-bogus "may be too large due to conversion" "" { xfail { *-*-* } } }
>   }
>
> Is that ok?

It works for me.  Any pass ordering changes should probably be done
as followup anyways since they might expose other issues (as you've seen...)

Richard.

> > I also see that the Og pass pipeline misses the second walloca pass
> > completely (and also the warn_restrict pass).
> >
> > Given code sinkings obvious effects on SSA value-range representation
> > it may make sense to add another instance of that pass earlier.
>
> --
> Marc Glisse
Aldy Hernandez May 31, 2019, 9:01 a.m. UTC | #19
I've never been too happy with the too large due to cast warnings. For that
matter, it seems like a lot of the unbounded alloca warning variants were
artifacts of the way we couldn't get precise ranges after vrp asserts had
disappeared and we were trying to guess at what the actual range in the
original code was. It's fragile at best.

I haven't been paying too much attention to walloca because the ranger gets
considerably better context ranges in the ranger walloca version, and we
are getting correct warnings for a variety of things we couldn't before. So
I was hoping to ignore this until we all agreed on what range, vrp etc will
look like going forward.

That being said, I could take a closer look at this xfail on Monday if
y'all would like. But I don't currently have strong opinions either way. I
guess it'll all change in the next few months.

Aldy

On Thu, May 30, 2019, 18:26 Jeff Law <law@redhat.com> wrote:

> On 5/29/19 3:27 PM, Marc Glisse wrote:
> > On Mon, 20 May 2019, Richard Biener wrote:
> >
> >> On Mon, May 20, 2019 at 10:16 AM Marc Glisse <marc.glisse@inria.fr>
> >> wrote:
> >>>
> >>> On Mon, 20 May 2019, Richard Biener wrote:
> >>>
> >>>> On Sun, May 19, 2019 at 6:16 PM Marc Glisse <marc.glisse@inria.fr>
> >>>> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> 2 pieces:
> >>>>>
> >>>>> - the first one handles the case where the denominator is negative.
> It
> >>>>> doesn't happen often with exact_div, so I don't handle it
> >>>>> everywhere, but
> >>>>> this one looked trivial
> >>>>>
> >>>>> - handle the case where a pointer difference is cast to an unsigned
> >>>>> type
> >>>>> before being compared to a constant (I hit this in std::vector).
> >>>>> With some
> >>>>> range info we could probably handle some non-constant cases as
> well...
> >>>>>
> >>>>> The second piece breaks Walloca-13.c (-Walloca-larger-than=100 -O2)
> >>>>>
> >>>>> void f (void*);
> >>>>> void g (int *p, int *q)
> >>>>> {
> >>>>>    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
> >>>>>    if (n < 100)
> >>>>>      f (__builtin_alloca (n));
> >>>>> }
> >>>>>
> >>>>> At the time of walloca2, we have
> >>>>>
> >>>>>    _1 = p_5(D) - q_6(D);
> >>>>>    # RANGE [-2305843009213693952, 2305843009213693951]
> >>>>>    _2 = _1 /[ex] 4;
> >>>>>    # RANGE ~[2305843009213693952, 16140901064495857663]
> >>>>>    n_7 = (long unsigned intD.10) _2;
> >>>>>    _11 = (long unsigned intD.10) _1;
> >>>>>    if (_11 <= 396)
> >>>>> [...]
> >>>>>    _3 = allocaD.1059 (n_7);
> >>>>>
> >>>>> and warn.
> >>>>
> >>>> That's indeed to complicated relation of _11 to n_7 for
> >>>> VRP predicate discovery.
> >>>>
> >>>>> However, DOM3 later produces
> >>>>>
> >>>>>    _1 = p_5(D) - q_6(D);
> >>>>>    _11 = (long unsigned intD.10) _1;
> >>>>>    if (_11 <= 396)
> >>>>
> >>>> while _11 vs. _1 works fine.
> >>>>
> >>>>> [...]
> >>>>>    # RANGE [0, 99] NONZERO 127
> >>>>>    _2 = _1 /[ex] 4;
> >>>>>    # RANGE [0, 99] NONZERO 127
> >>>>>    n_7 = (long unsigned intD.10) _2;
> >>>>>    _3 = allocaD.1059 (n_7);
> >>>>>
> >>>>> so I am tempted to say that the walloca2 pass is too early, xfail the
> >>>>> testcase and file an issue...
> >>>>
> >>>> Hmm, there's a DOM pass before walloca2 already and moving
> >>>> walloca2 after loop opts doesn't look like the best thing to do?
> >>>> I suppose it's not DOM but sinking that does the important transform
> >>>> here?  That is,
> >>>>
> >>>> Index: gcc/passes.def
> >>>> ===================================================================
> >>>> --- gcc/passes.def      (revision 271395)
> >>>> +++ gcc/passes.def      (working copy)
> >>>> @@ -241,9 +241,9 @@ along with GCC; see the file COPYING3.
> >>>>       NEXT_PASS (pass_optimize_bswap);
> >>>>       NEXT_PASS (pass_laddress);
> >>>>       NEXT_PASS (pass_lim);
> >>>> -      NEXT_PASS (pass_walloca, false);
> >>>>       NEXT_PASS (pass_pre);
> >>>>       NEXT_PASS (pass_sink_code);
> >>>> +      NEXT_PASS (pass_walloca, false);
> >>>>       NEXT_PASS (pass_sancov);
> >>>>       NEXT_PASS (pass_asan);
> >>>>       NEXT_PASS (pass_tsan);
> >>>>
> >>>> fixes it?
> >>>
> >>> I will check, but I don't think walloca uses any kind of on-demand
> >>> VRP, so
> >>> we still need some pass to update the ranges after sinking, which
> >>> doesn't
> >>> seem to happen until the next DOM pass.
> >>
> >> Oh, ok...  Aldy, why's this a separate pass anyways?  I think similar
> >> other warnigns are emitted from RTL expansion?  So maybe we can
> >> indeed move the pass towards warn_restrict or late_warn_uninit.
> >
> > I tried moving it after 'sink' and that didn't help. Moving it next to
> > warn_restrict works for this test but breaks 2 others that currently
> > "work" by accident (+ one where the message changes between "unbounded"
> > and "too large", it isn't clear what the difference is between those
> > messages).
> So "unbounded" means there was no controlling predicate that would allow
> an upper bound on the size of the alloca to be determined.
>
> "too large" breaks down into two cases.  One where we know it's too
> large, the other when it may be too large.  The latter can happen due to
> casts and perhaps other things (controlling predicate with a dynamic
> upper bound maybe?)
>
>
> >
> > My suggestion, in addition to the original patch, is
> >
> >     * gcc.dg/Walloca-13.c: Xfail.
> >
> > --- Walloca-13.c    (revision 271742)
> > +++ Walloca-13.c    (working copy)
> > @@ -1,12 +1,12 @@
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target alloca } */
> >  /* { dg-options "-Walloca-larger-than=100 -O2" } */
> >
> >  void f (void*);
> >
> >  void g (int *p, int *q)
> >  {
> >    __SIZE_TYPE__ n = (__SIZE_TYPE__)(p - q);
> >    if (n < 100)
> > -    f (__builtin_alloca (n));
> > +    f (__builtin_alloca (n)); // { dg-bogus "may be too large due to
> > conversion" "" { xfail { *-*-* } } }
> >  }
> >
> > Is that ok?
> Aldy would need to chime in -- at first glance it looks like we've lost
> precision here.
>
> jeff
>
>
Marc Glisse May 31, 2019, 3:47 p.m. UTC | #20
On Fri, 31 May 2019, Aldy Hernandez wrote:

> I've never been too happy with the too large due to cast warnings. For that
> matter, it seems like a lot of the unbounded alloca warning variants were
> artifacts of the way we couldn't get precise ranges after vrp asserts had
> disappeared and we were trying to guess at what the actual range in the
> original code was. It's fragile at best.

Yes, very fragile.

> I haven't been paying too much attention to walloca because the ranger gets
> considerably better context ranges in the ranger walloca version, and we
> are getting correct warnings for a variety of things we couldn't before. So
> I was hoping to ignore this until we all agreed on what range, vrp etc will
> look like going forward.

Seems sensible.

> That being said, I could take a closer look at this xfail on Monday if
> y'all would like. But I don't currently have strong opinions either way. I
> guess it'll all change in the next few months.

As long as you are ok with one Walloca testcase being xfailed until the 
VRP work lands, I don't think there is a need to spend time on it now.
Aldy Hernandez May 31, 2019, 3:50 p.m. UTC | #21
Sure. No problem. Thanks for looking at this.

Aldy

On Fri, May 31, 2019, 17:48 Marc Glisse <marc.glisse@inria.fr> wrote:

> On Fri, 31 May 2019, Aldy Hernandez wrote:
>
> > I've never been too happy with the too large due to cast warnings. For
> that
> > matter, it seems like a lot of the unbounded alloca warning variants were
> > artifacts of the way we couldn't get precise ranges after vrp asserts had
> > disappeared and we were trying to guess at what the actual range in the
> > original code was. It's fragile at best.
>
> Yes, very fragile.
>
> > I haven't been paying too much attention to walloca because the ranger
> gets
> > considerably better context ranges in the ranger walloca version, and we
> > are getting correct warnings for a variety of things we couldn't before.
> So
> > I was hoping to ignore this until we all agreed on what range, vrp etc
> will
> > look like going forward.
>
> Seems sensible.
>
> > That being said, I could take a closer look at this xfail on Monday if
> > y'all would like. But I don't currently have strong opinions either way.
> I
> > guess it'll all change in the next few months.
>
> As long as you are ok with one Walloca testcase being xfailed until the
> VRP work lands, I don't think there is a need to spend time on it now.
>
> --
> Marc Glisse
>
diff mbox series

Patch

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 271376)
+++ gcc/match.pd	(working copy)
@@ -1490,21 +1490,23 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	&& (wi::to_wide (@2)
 	    == wi::max_value (TYPE_PRECISION (TREE_TYPE (@0)), SIGNED) - 1))
     (with { tree stype = signed_type_for (TREE_TYPE (@0)); }
      (icmp (convert:stype @0) { build_int_cst (stype, 0); })))))
 
 /* X / 4 < Y / 4 iff X < Y when the division is known to be exact.  */
 (for cmp (simple_comparison)
  (simplify
   (cmp (exact_div @0 INTEGER_CST@2) (exact_div @1 @2))
   (if (wi::gt_p (wi::to_wide (@2), 0, TYPE_SIGN (TREE_TYPE (@2))))
-   (cmp @0 @1))))
+   (cmp @0 @1)
+   (if (wi::lt_p (wi::to_wide (@2), 0, TYPE_SIGN (TREE_TYPE (@2))))
+    (cmp @1 @0)))))
 
 /* X / C1 op C2 into a simple range test.  */
 (for cmp (simple_comparison)
  (simplify
   (cmp (trunc_div:s @0 INTEGER_CST@1) INTEGER_CST@2)
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
        && integer_nonzerop (@1)
        && !TREE_OVERFLOW (@1)
        && !TREE_OVERFLOW (@2))
    (with { tree lo, hi; bool neg_overflow;
@@ -3626,20 +3634,47 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       wi::overflow_type ovf;
       wide_int prod = wi::mul (wi::to_wide (@2), wi::to_wide (@1),
 			       TYPE_SIGN (TREE_TYPE (@1)), &ovf);
     }
     (if (ovf)
      { constant_boolean_node (wi::lt_p (wi::to_wide (@2), 0,
 					TYPE_SIGN (TREE_TYPE (@2)))
 			      != (cmp == LT_EXPR || cmp == LE_EXPR), type); }
      (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), prod); }))))))
 
+/* Fold (size_t)(A /[ex] B) CMP C to (size_t)A CMP (size_t)B * C or A CMP' 0.
+
+   For small C (less than max/B), this is (size_t)A CMP (size_t)B * C.
+   For large C (more than min/B+2^size), this is also true, with the
+   multiplication computed modulo 2^size.
+   For intermediate C, this just tests the sign of A.  */
+(for cmp  (lt le gt ge)
+     cmp2 (ge ge lt lt)
+ (simplify
+  (cmp (convert (exact_div @0 INTEGER_CST@1)) INTEGER_CST@2)
+  (if (tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE (@2))
+       && TYPE_UNSIGNED (TREE_TYPE (@2)) && !TYPE_UNSIGNED (TREE_TYPE (@0))
+       && wi::gt_p (wi::to_wide (@1), 0, TYPE_SIGN (TREE_TYPE (@1))))
+   (with
+    {
+      tree utype = TREE_TYPE (@2);
+      wide_int denom = wi::to_wide (@1);
+      wide_int right = wi::to_wide (@2);
+      wide_int smax = wi::sdiv_trunc (wi::max_value (TREE_TYPE (@0)), denom);
+      wide_int smin = wi::sdiv_trunc (wi::min_value (TREE_TYPE (@0)), denom);
+      bool small = wi::leu_p (right, smax);
+      bool large = wi::geu_p (right, smin);
+    }
+    (if (small || large)
+     (cmp (convert:utype @0) (mult @2 (convert @1)))
+     (cmp2 @0 { build_zero_cst (TREE_TYPE (@0)); }))))))
+
 /* Unordered tests if either argument is a NaN.  */
 (simplify
  (bit_ior (unordered @0 @0) (unordered @1 @1))
  (if (types_match (@0, @1))
   (unordered @0 @1)))
 (simplify
  (bit_and (ordered @0 @0) (ordered @1 @1))
  (if (types_match (@0, @1))
   (ordered @0 @1)))
 (simplify
Index: gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-3.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-3.c	(working copy)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized-raw" } */
+
+int f(int*a,int*b){
+  if(sizeof(__SIZE_TYPE__)!=sizeof(__PTRDIFF_TYPE__)) return -1;
+  __SIZE_TYPE__ d = b - a;
+  return d >= 5;
+}
+
+/* { dg-final { scan-tree-dump-not "exact_div_expr" "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-4.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/cmpexactdiv-4.c	(working copy)
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized-raw" } */
+
+int f(int*a,int*b,int*c){
+  __PTRDIFF_TYPE__ x = -(b - a);
+  __PTRDIFF_TYPE__ y = -(c - a);
+  return x < y;
+}
+
+/* { dg-final { scan-tree-dump-not "exact_div_expr" "optimized" } } */