diff mbox

fold x ^ y to 0 if x == y

Message ID CAAgBjM=iAt5QswqoKYj_TQxV4B__3kf-eYrGewQjftkuPhZVZQ@mail.gmail.com
State New
Headers show

Commit Message

Prathamesh Kulkarni July 8, 2016, 10:10 a.m. UTC
Hi Richard,
For the following test-case:

int f(int x, int y)
{
   int ret;

   if (x == y)
     ret = x ^ y;
   else
     ret = 1;

   return ret;
}

I was wondering if x ^ y should be folded to 0 since
it's guarded by condition x == y ?

optimized dump shows:
f (int x, int y)
{
  int iftmp.0_1;
  int iftmp.0_4;

  <bb 2>:
  if (x_2(D) == y_3(D))
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  iftmp.0_4 = x_2(D) ^ y_3(D);

  <bb 4>:
  # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
  return iftmp.0_1;

}

The attached patch tries to fold for above case.
I am checking if op0 and op1 are equal using:
if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
   && operand_equal_p (vr1->min, vr1->max)
   && operand_equal_p (vr2->min, vr2->max))
  { /* equal /* }

I suppose intersection would check if op0 and op1 have equivalent ranges,
and added operand_equal_p check to ensure that there is only one
element within the range. Does that look correct ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh

Comments

Richard Biener July 8, 2016, 11:17 a.m. UTC | #1
On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:

> Hi Richard,
> For the following test-case:
> 
> int f(int x, int y)
> {
>    int ret;
> 
>    if (x == y)
>      ret = x ^ y;
>    else
>      ret = 1;
> 
>    return ret;
> }
> 
> I was wondering if x ^ y should be folded to 0 since
> it's guarded by condition x == y ?
> 
> optimized dump shows:
> f (int x, int y)
> {
>   int iftmp.0_1;
>   int iftmp.0_4;
> 
>   <bb 2>:
>   if (x_2(D) == y_3(D))
>     goto <bb 3>;
>   else
>     goto <bb 4>;
> 
>   <bb 3>:
>   iftmp.0_4 = x_2(D) ^ y_3(D);
> 
>   <bb 4>:
>   # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
>   return iftmp.0_1;
> 
> }
> 
> The attached patch tries to fold for above case.
> I am checking if op0 and op1 are equal using:
> if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
>    && operand_equal_p (vr1->min, vr1->max)
>    && operand_equal_p (vr2->min, vr2->max))
>   { /* equal /* }
> 
> I suppose intersection would check if op0 and op1 have equivalent ranges,
> and added operand_equal_p check to ensure that there is only one
> element within the range. Does that look correct ?
> Bootstrap+test in progress on x86_64-unknown-linux-gnu.

I think VRP is the wrong place to catch this and DOM should have but it
does

Optimizing block #3

1>>> STMT 1 = x_2(D) le_expr y_3(D)
1>>> STMT 1 = x_2(D) ge_expr y_3(D)
1>>> STMT 1 = x_2(D) eq_expr y_3(D)
1>>> STMT 0 = x_2(D) ne_expr y_3(D)
0>>> COPY x_2(D) = y_3(D)
0>>> COPY y_3(D) = x_2(D)
Optimizing statement ret_4 = x_2(D) ^ y_3(D);
  Replaced 'x_2(D)' with variable 'y_3(D)'
  Replaced 'y_3(D)' with variable 'x_2(D)'
  Folded to: ret_4 = x_2(D) ^ y_3(D);
LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)

heh, registering both reqivalencies is obviously not going to help...

The 2nd equivalence is from doing

      /* We already recorded that LHS = RHS, with canonicalization,
         value chain following, etc.

         We also want to record RHS = LHS, but without any 
canonicalization
         or value chain following.  */
      if (TREE_CODE (rhs) == SSA_NAME)
        const_and_copies->record_const_or_copy_raw (rhs, lhs,
                                                    SSA_NAME_VALUE (rhs));

generally recording both is not helpful.  Jeff?  This seems to be
r233207 (fix for PR65917) which must have regressed this testcase.

Richard.
Richard Biener July 8, 2016, 11:29 a.m. UTC | #2
On Fri, 8 Jul 2016, Richard Biener wrote:

> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
> 
> > Hi Richard,
> > For the following test-case:
> > 
> > int f(int x, int y)
> > {
> >    int ret;
> > 
> >    if (x == y)
> >      ret = x ^ y;
> >    else
> >      ret = 1;
> > 
> >    return ret;
> > }
> > 
> > I was wondering if x ^ y should be folded to 0 since
> > it's guarded by condition x == y ?
> > 
> > optimized dump shows:
> > f (int x, int y)
> > {
> >   int iftmp.0_1;
> >   int iftmp.0_4;
> > 
> >   <bb 2>:
> >   if (x_2(D) == y_3(D))
> >     goto <bb 3>;
> >   else
> >     goto <bb 4>;
> > 
> >   <bb 3>:
> >   iftmp.0_4 = x_2(D) ^ y_3(D);
> > 
> >   <bb 4>:
> >   # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
> >   return iftmp.0_1;
> > 
> > }
> > 
> > The attached patch tries to fold for above case.
> > I am checking if op0 and op1 are equal using:
> > if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
> >    && operand_equal_p (vr1->min, vr1->max)
> >    && operand_equal_p (vr2->min, vr2->max))
> >   { /* equal /* }
> > 
> > I suppose intersection would check if op0 and op1 have equivalent ranges,
> > and added operand_equal_p check to ensure that there is only one
> > element within the range. Does that look correct ?
> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> 
> I think VRP is the wrong place to catch this and DOM should have but it
> does
> 
> Optimizing block #3
> 
> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> 0>>> COPY x_2(D) = y_3(D)
> 0>>> COPY y_3(D) = x_2(D)
> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
>   Replaced 'x_2(D)' with variable 'y_3(D)'
>   Replaced 'y_3(D)' with variable 'x_2(D)'
>   Folded to: ret_4 = x_2(D) ^ y_3(D);
> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
> 
> heh, registering both reqivalencies is obviously not going to help...
> 
> The 2nd equivalence is from doing
> 
>       /* We already recorded that LHS = RHS, with canonicalization,
>          value chain following, etc.
> 
>          We also want to record RHS = LHS, but without any 
> canonicalization
>          or value chain following.  */
>       if (TREE_CODE (rhs) == SSA_NAME)
>         const_and_copies->record_const_or_copy_raw (rhs, lhs,
>                                                     SSA_NAME_VALUE (rhs));
> 
> generally recording both is not helpful.  Jeff?  This seems to be
> r233207 (fix for PR65917) which must have regressed this testcase.

Just verified it works fine on the GCC 5 branch:

Optimizing block #3

0>>> COPY y_3(D) = x_2(D)
1>>> STMT 1 = x_2(D) le_expr y_3(D)
1>>> STMT 1 = x_2(D) ge_expr y_3(D)
1>>> STMT 1 = x_2(D) eq_expr y_3(D)
1>>> STMT 0 = x_2(D) ne_expr y_3(D)
Optimizing statement ret_4 = x_2(D) ^ y_3(D);
  Replaced 'y_3(D)' with variable 'x_2(D)'
Applying pattern match.pd:240, gimple-match.c:11346
gimple_simplified to ret_4 = 0;
  Folded to: ret_4 = 0;

Richard.
Prathamesh Kulkarni July 20, 2016, 12:59 p.m. UTC | #3
On 8 July 2016 at 12:29, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 8 Jul 2016, Richard Biener wrote:
>
>> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
>>
>> > Hi Richard,
>> > For the following test-case:
>> >
>> > int f(int x, int y)
>> > {
>> >    int ret;
>> >
>> >    if (x == y)
>> >      ret = x ^ y;
>> >    else
>> >      ret = 1;
>> >
>> >    return ret;
>> > }
>> >
>> > I was wondering if x ^ y should be folded to 0 since
>> > it's guarded by condition x == y ?
>> >
>> > optimized dump shows:
>> > f (int x, int y)
>> > {
>> >   int iftmp.0_1;
>> >   int iftmp.0_4;
>> >
>> >   <bb 2>:
>> >   if (x_2(D) == y_3(D))
>> >     goto <bb 3>;
>> >   else
>> >     goto <bb 4>;
>> >
>> >   <bb 3>:
>> >   iftmp.0_4 = x_2(D) ^ y_3(D);
>> >
>> >   <bb 4>:
>> >   # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
>> >   return iftmp.0_1;
>> >
>> > }
>> >
>> > The attached patch tries to fold for above case.
>> > I am checking if op0 and op1 are equal using:
>> > if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
>> >    && operand_equal_p (vr1->min, vr1->max)
>> >    && operand_equal_p (vr2->min, vr2->max))
>> >   { /* equal /* }
>> >
>> > I suppose intersection would check if op0 and op1 have equivalent ranges,
>> > and added operand_equal_p check to ensure that there is only one
>> > element within the range. Does that look correct ?
>> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>>
>> I think VRP is the wrong place to catch this and DOM should have but it
>> does
>>
>> Optimizing block #3
>>
>> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
>> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
>> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
>> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
>> 0>>> COPY x_2(D) = y_3(D)
>> 0>>> COPY y_3(D) = x_2(D)
>> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
>>   Replaced 'x_2(D)' with variable 'y_3(D)'
>>   Replaced 'y_3(D)' with variable 'x_2(D)'
>>   Folded to: ret_4 = x_2(D) ^ y_3(D);
>> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
>>
>> heh, registering both reqivalencies is obviously not going to help...
>>
>> The 2nd equivalence is from doing
>>
>>       /* We already recorded that LHS = RHS, with canonicalization,
>>          value chain following, etc.
>>
>>          We also want to record RHS = LHS, but without any
>> canonicalization
>>          or value chain following.  */
>>       if (TREE_CODE (rhs) == SSA_NAME)
>>         const_and_copies->record_const_or_copy_raw (rhs, lhs,
>>                                                     SSA_NAME_VALUE (rhs));
>>
>> generally recording both is not helpful.  Jeff?  This seems to be
>> r233207 (fix for PR65917) which must have regressed this testcase.
>
> Just verified it works fine on the GCC 5 branch:
>
> Optimizing block #3
>
> 0>>> COPY y_3(D) = x_2(D)
> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
>   Replaced 'y_3(D)' with variable 'x_2(D)'
> Applying pattern match.pd:240, gimple-match.c:11346
> gimple_simplified to ret_4 = 0;
>   Folded to: ret_4 = 0;
I have reported it as PR71947.
Could you help me point out how to fix this ?

Thanks,
Prathamesh
>
> Richard.
Richard Biener July 20, 2016, 3:35 p.m. UTC | #4
On Wed, 20 Jul 2016, Prathamesh Kulkarni wrote:

> On 8 July 2016 at 12:29, Richard Biener <rguenther@suse.de> wrote:
> > On Fri, 8 Jul 2016, Richard Biener wrote:
> >
> >> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
> >>
> >> > Hi Richard,
> >> > For the following test-case:
> >> >
> >> > int f(int x, int y)
> >> > {
> >> >    int ret;
> >> >
> >> >    if (x == y)
> >> >      ret = x ^ y;
> >> >    else
> >> >      ret = 1;
> >> >
> >> >    return ret;
> >> > }
> >> >
> >> > I was wondering if x ^ y should be folded to 0 since
> >> > it's guarded by condition x == y ?
> >> >
> >> > optimized dump shows:
> >> > f (int x, int y)
> >> > {
> >> >   int iftmp.0_1;
> >> >   int iftmp.0_4;
> >> >
> >> >   <bb 2>:
> >> >   if (x_2(D) == y_3(D))
> >> >     goto <bb 3>;
> >> >   else
> >> >     goto <bb 4>;
> >> >
> >> >   <bb 3>:
> >> >   iftmp.0_4 = x_2(D) ^ y_3(D);
> >> >
> >> >   <bb 4>:
> >> >   # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
> >> >   return iftmp.0_1;
> >> >
> >> > }
> >> >
> >> > The attached patch tries to fold for above case.
> >> > I am checking if op0 and op1 are equal using:
> >> > if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
> >> >    && operand_equal_p (vr1->min, vr1->max)
> >> >    && operand_equal_p (vr2->min, vr2->max))
> >> >   { /* equal /* }
> >> >
> >> > I suppose intersection would check if op0 and op1 have equivalent ranges,
> >> > and added operand_equal_p check to ensure that there is only one
> >> > element within the range. Does that look correct ?
> >> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> >>
> >> I think VRP is the wrong place to catch this and DOM should have but it
> >> does
> >>
> >> Optimizing block #3
> >>
> >> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> >> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> >> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> >> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> >> 0>>> COPY x_2(D) = y_3(D)
> >> 0>>> COPY y_3(D) = x_2(D)
> >> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
> >>   Replaced 'x_2(D)' with variable 'y_3(D)'
> >>   Replaced 'y_3(D)' with variable 'x_2(D)'
> >>   Folded to: ret_4 = x_2(D) ^ y_3(D);
> >> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
> >>
> >> heh, registering both reqivalencies is obviously not going to help...
> >>
> >> The 2nd equivalence is from doing
> >>
> >>       /* We already recorded that LHS = RHS, with canonicalization,
> >>          value chain following, etc.
> >>
> >>          We also want to record RHS = LHS, but without any
> >> canonicalization
> >>          or value chain following.  */
> >>       if (TREE_CODE (rhs) == SSA_NAME)
> >>         const_and_copies->record_const_or_copy_raw (rhs, lhs,
> >>                                                     SSA_NAME_VALUE (rhs));
> >>
> >> generally recording both is not helpful.  Jeff?  This seems to be
> >> r233207 (fix for PR65917) which must have regressed this testcase.
> >
> > Just verified it works fine on the GCC 5 branch:
> >
> > Optimizing block #3
> >
> > 0>>> COPY y_3(D) = x_2(D)
> > 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> > 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> > 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> > 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> > Optimizing statement ret_4 = x_2(D) ^ y_3(D);
> >   Replaced 'y_3(D)' with variable 'x_2(D)'
> > Applying pattern match.pd:240, gimple-match.c:11346
> > gimple_simplified to ret_4 = 0;
> >   Folded to: ret_4 = 0;
> I have reported it as PR71947.
> Could you help me point out how to fix this ?

Not record both equivalences.  This might break the testcase it was
introduced for (obviously).  Which is why I CCed Jeff for his opinion.

Richard.
Jeff Law July 20, 2016, 3:40 p.m. UTC | #5
On 07/20/2016 09:35 AM, Richard Biener wrote:
>> I have reported it as PR71947.
>> Could you help me point out how to fix this ?
>
> Not record both equivalences.  This might break the testcase it was
> introduced for (obviously).  Which is why I CCed Jeff for his opinion.
It's on my todo list.  I'm still catching up from my PTO last month.

It'll certainly regress the testcase that was introduced when we 
recorded both equivalences.

jeff
Jeff Law Oct. 5, 2016, 7:06 p.m. UTC | #6
On 07/08/2016 05:17 AM, Richard Biener wrote:
> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
>
>> Hi Richard,
>> For the following test-case:
>>
>> int f(int x, int y)
>> {
>>    int ret;
>>
>>    if (x == y)
>>      ret = x ^ y;
>>    else
>>      ret = 1;
>>
>>    return ret;
>> }
>>
>> I was wondering if x ^ y should be folded to 0 since
>> it's guarded by condition x == y ?
>>
>> optimized dump shows:
>> f (int x, int y)
>> {
>>   int iftmp.0_1;
>>   int iftmp.0_4;
>>
>>   <bb 2>:
>>   if (x_2(D) == y_3(D))
>>     goto <bb 3>;
>>   else
>>     goto <bb 4>;
>>
>>   <bb 3>:
>>   iftmp.0_4 = x_2(D) ^ y_3(D);
>>
>>   <bb 4>:
>>   # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
>>   return iftmp.0_1;
>>
>> }
>>
>> The attached patch tries to fold for above case.
>> I am checking if op0 and op1 are equal using:
>> if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
>>    && operand_equal_p (vr1->min, vr1->max)
>>    && operand_equal_p (vr2->min, vr2->max))
>>   { /* equal /* }
>>
>> I suppose intersection would check if op0 and op1 have equivalent ranges,
>> and added operand_equal_p check to ensure that there is only one
>> element within the range. Does that look correct ?
>> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>
> I think VRP is the wrong place to catch this and DOM should have but it
> does
>
> Optimizing block #3
>
> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> 0>>> COPY x_2(D) = y_3(D)
> 0>>> COPY y_3(D) = x_2(D)
> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
>   Replaced 'x_2(D)' with variable 'y_3(D)'
>   Replaced 'y_3(D)' with variable 'x_2(D)'
>   Folded to: ret_4 = x_2(D) ^ y_3(D);
> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
>
> heh, registering both reqivalencies is obviously not going to help...
>
> The 2nd equivalence is from doing
>
>       /* We already recorded that LHS = RHS, with canonicalization,
>          value chain following, etc.
>
>          We also want to record RHS = LHS, but without any
> canonicalization
>          or value chain following.  */
>       if (TREE_CODE (rhs) == SSA_NAME)
>         const_and_copies->record_const_or_copy_raw (rhs, lhs,
>                                                     SSA_NAME_VALUE (rhs));
>
> generally recording both is not helpful.  Jeff?  This seems to be
> r233207 (fix for PR65917) which must have regressed this testcase.
The primary purpose of the const/copies table is to enable simple 
const/copy propagation.  So given a = b, we record that "a" can be 
replaced with "b" and from then on we replace occurrences of "a" with 
"b" -- ie, copy propagation.

We have a much lower priority goal of recording equivalences that arise 
from conditionals such as if (a == b) and using those in const/copy 
propagation.   Given the structure of the table, we can record that a = 
b or b = a.  The problem is we've never come up with any good heuristics 
for which of those two styles to record.  Which you got was mostly a 
function of canonicalization using SSA_NAME_VERSIONs.

As seen in 65917, sometimes that decision is wrong and can lead to 
performance regressions.  But it's really just luck of the draw whether 
it gets optimized or not.  And the test

65917 allows us to record both equivalences in the table.  We do that by 
bypassing all the checks/canonicalizations with a raw insertion method. 
During DOM we can now lookup either form which allows us to fix the 
regression in 65917.

It ought to be easy to fold x ^ y to zero when x == y (famous last 
words).  I'm sure I'll regret saying that when I go to look at how to 
twiddle DOM appropriately.


jeff
Richard Biener Oct. 6, 2016, 7:18 a.m. UTC | #7
On Wed, 5 Oct 2016, Jeff Law wrote:

> On 07/08/2016 05:17 AM, Richard Biener wrote:
> > On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
> > 
> > > Hi Richard,
> > > For the following test-case:
> > > 
> > > int f(int x, int y)
> > > {
> > >    int ret;
> > > 
> > >    if (x == y)
> > >      ret = x ^ y;
> > >    else
> > >      ret = 1;
> > > 
> > >    return ret;
> > > }
> > > 
> > > I was wondering if x ^ y should be folded to 0 since
> > > it's guarded by condition x == y ?
> > > 
> > > optimized dump shows:
> > > f (int x, int y)
> > > {
> > >   int iftmp.0_1;
> > >   int iftmp.0_4;
> > > 
> > >   <bb 2>:
> > >   if (x_2(D) == y_3(D))
> > >     goto <bb 3>;
> > >   else
> > >     goto <bb 4>;
> > > 
> > >   <bb 3>:
> > >   iftmp.0_4 = x_2(D) ^ y_3(D);
> > > 
> > >   <bb 4>:
> > >   # iftmp.0_1 = PHI <iftmp.0_4(3), 1(2)>
> > >   return iftmp.0_1;
> > > 
> > > }
> > > 
> > > The attached patch tries to fold for above case.
> > > I am checking if op0 and op1 are equal using:
> > > if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
> > >    && operand_equal_p (vr1->min, vr1->max)
> > >    && operand_equal_p (vr2->min, vr2->max))
> > >   { /* equal /* }
> > > 
> > > I suppose intersection would check if op0 and op1 have equivalent ranges,
> > > and added operand_equal_p check to ensure that there is only one
> > > element within the range. Does that look correct ?
> > > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> > 
> > I think VRP is the wrong place to catch this and DOM should have but it
> > does
> > 
> > Optimizing block #3
> > 
> > 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> > 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> > 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> > 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> > 0>>> COPY x_2(D) = y_3(D)
> > 0>>> COPY y_3(D) = x_2(D)
> > Optimizing statement ret_4 = x_2(D) ^ y_3(D);
> >   Replaced 'x_2(D)' with variable 'y_3(D)'
> >   Replaced 'y_3(D)' with variable 'x_2(D)'
> >   Folded to: ret_4 = x_2(D) ^ y_3(D);
> > LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
> > 
> > heh, registering both reqivalencies is obviously not going to help...
> > 
> > The 2nd equivalence is from doing
> > 
> >       /* We already recorded that LHS = RHS, with canonicalization,
> >          value chain following, etc.
> > 
> >          We also want to record RHS = LHS, but without any
> > canonicalization
> >          or value chain following.  */
> >       if (TREE_CODE (rhs) == SSA_NAME)
> >         const_and_copies->record_const_or_copy_raw (rhs, lhs,
> >                                                     SSA_NAME_VALUE (rhs));
> > 
> > generally recording both is not helpful.  Jeff?  This seems to be
> > r233207 (fix for PR65917) which must have regressed this testcase.
> The primary purpose of the const/copies table is to enable simple const/copy
> propagation.  So given a = b, we record that "a" can be replaced with "b" and
> from then on we replace occurrences of "a" with "b" -- ie, copy propagation.
> 
> We have a much lower priority goal of recording equivalences that arise from
> conditionals such as if (a == b) and using those in const/copy propagation.
> Given the structure of the table, we can record that a = b or b = a.  The
> problem is we've never come up with any good heuristics for which of those two
> styles to record.  Which you got was mostly a function of canonicalization
> using SSA_NAME_VERSIONs.
> 
> As seen in 65917, sometimes that decision is wrong and can lead to performance
> regressions.  But it's really just luck of the draw whether it gets optimized
> or not.  And the test
> 
> 65917 allows us to record both equivalences in the table.  We do that by
> bypassing all the checks/canonicalizations with a raw insertion method. During
> DOM we can now lookup either form which allows us to fix the regression in
> 65917.
> 
> It ought to be easy to fold x ^ y to zero when x == y (famous last words).
> I'm sure I'll regret saying that when I go to look at how to twiddle DOM
> appropriately.

Interesting idea.  Though it get's (theoretically) interesting for
ternary ops where you'd need to try 6 variants then ... thus in the
end it's a hack ...

That said, having SSA_NAME_VALUE (SSA_NAME_VALUE (t)) == t looks like
a bug in the lattice to me.

Richard.
Jeff Law Oct. 6, 2016, 2:56 p.m. UTC | #8
On 10/06/2016 01:18 AM, Richard Biener wrote:
>>
>> It ought to be easy to fold x ^ y to zero when x == y (famous last words).
>> I'm sure I'll regret saying that when I go to look at how to twiddle DOM
>> appropriately.
>
> Interesting idea.  Though it get's (theoretically) interesting for
> ternary ops where you'd need to try 6 variants then ... thus in the
> end it's a hack ...
I was actually pondering folding as changes are made rather than at the 
end of processing the current statement.  So given x ^ y and an x == y 
equivalence in the tables.  We'll replace one operand resulting in x ^ x 
or y ^ y, and immediately fold it to zero.

My suspicion is that most of time cprop doesn't change the current 
statement and that multiple propagations into the current statement are 
even rarer.  So for the two most common cases (0 or 1 operand can be 
propagated in the current statement) there would be zero additional cost.


>
> That said, having SSA_NAME_VALUE (SSA_NAME_VALUE (t)) == t looks like
> a bug in the lattice to me.
Not really.  It's just a convenient place to record the two way 
equivalence.  Even if we had a different, cleaner way to record the 
two-way equivalence the core issues around this BZ would remain.

jeff
diff mbox

Patch

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 4333d60..787d068 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -6965,6 +6965,59 @@  vrp_valueize_1 (tree name)
   return name;
 }
 
+/* Try to fold op0 xor op1 == 0 if op0 == op1.  */ 
+static tree
+maybe_fold_xor (gassign *stmt)
+{
+  if (!stmt)
+    return NULL_TREE;
+
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  if (code != BIT_XOR_EXPR)
+    return NULL_TREE;
+
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (op0) != SSA_NAME
+      || TREE_CODE (op1) != SSA_NAME)
+    return NULL_TREE;
+
+  value_range *vr1 = get_value_range (op0);
+  value_range *vr2 = get_value_range (op1);
+
+  if (vr1 == NULL || vr2 == NULL)
+    return NULL_TREE;
+
+  if (vr1->type != VR_RANGE || vr2->type != VR_RANGE)
+    return NULL_TREE;
+
+  if (! (symbolic_range_p (vr1) && symbolic_range_p (vr2)))
+    return NULL_TREE;
+
+  if (! (TREE_CODE (vr1->min) == SSA_NAME && TREE_CODE (vr1->max) == SSA_NAME
+ 	 && TREE_CODE (vr2->min) == SSA_NAME && TREE_CODE (vr2->max) == SSA_NAME))
+    return NULL_TREE;
+
+  if (! (vr1->equiv && vr2->equiv))
+    return NULL_TREE;
+
+  /* check if op0 == op1.  */
+  if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
+      && operand_equal_p (vr1->min, vr1->max, 0)
+      && operand_equal_p (vr2->min, vr2->max, 0)
+      && code == BIT_XOR_EXPR)
+    {
+      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, integer_zero_node); 
+      update_stmt (stmt);
+      return integer_zero_node;
+    }
+
+  return NULL_TREE;
+}
+
+
 /* Visit assignment STMT.  If it produces an interesting range, record
    the SSA name in *OUTPUT_P.  */
 
@@ -6990,8 +7043,11 @@  vrp_visit_assignment_or_call (gimple *stmt, tree *output_p)
       /* Try folding the statement to a constant first.  */
       tree tem = gimple_fold_stmt_to_constant_1 (stmt, vrp_valueize,
 						 vrp_valueize_1);
+      if (!tem)
+	tem = maybe_fold_xor (dyn_cast<gassign *> (stmt));
       if (tem && is_gimple_min_invariant (tem))
 	set_value_range_to_value (&new_vr, tem, NULL);
+
       /* Then dispatch to value-range extracting functions.  */
       else if (code == GIMPLE_CALL)
 	extract_range_basic (&new_vr, stmt);