diff mbox

[match.pd] Fix for PR35691

Message ID CAAgBjMnP8EK=wgJ9kwZFOPM-Cmmsb7iYTBKDNOy88-0t3T6Ovw@mail.gmail.com
State New
Headers show

Commit Message

Prathamesh Kulkarni Nov. 7, 2016, 5:36 p.m. UTC
On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
>> >
>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
>> >>
>> >> > > > > The transform would also work for vectors (element_precision for
>> >> > > > > the test but also a value-matching zero which should ensure the
>> >> > > > > same number of elements).
>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a
>> >> > > > matching zero.
>> >> > > > Could you please elaborate on that ?
>> >> > >
>> >> > > He may have meant something like:
>> >> > >
>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>> >> >
>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the
>> >> > point of the pattern.
>> >>
>> >> Oups, that's what I had written first, and then I somehow managed to confuse
>> >> myself enough to remove it so as to remove the call to types_match :-(
>> >>
>> >> > > So the last operand is checked with operand_equal_p instead of
>> >> > > integer_zerop. But the fact that we could compute bit_ior on the
>> >> > > comparison results should already imply that the number of elements is the
>> >> > > same.
>> >> >
>> >> > Though for equality compares we also allow scalar results IIRC.
>> >>
>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one
>> >> for a testcase, at least until the GIMPLE FE lands...
>> >>
>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some
>> >> > > vector mode on s390?), it would be cool to do the same for floats (most
>> >> > > likely at the RTL level).
>> >> >
>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab
>> >> > just allow bit-and/or on floats as much as we allow them on pointers.
>> >>
>> >> Would that generate sensible code on targets that do not have logic insns for
>> >> floats? Actually, even on x86_64 that generates inefficient code, so there
>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).
>> >>
>> >> I am also a bit wary of doing those obfuscating optimizations too early...
>> >> a==0 is something that other optimizations might use. long
>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
>> >>
>> >> (and I am assuming that signaling NaNs don't make the whole transformation
>> >> impossible, which might be wrong)
>> >
>> > Yeah.  I also think it's not so much important - I just wanted to mention
>> > vectors...
>> >
>> > Btw, I still think we need a more sensible infrastructure for passes
>> > to gather, analyze and modify complex conditions.  (I'm always pointing
>> > to tree-affine.c as an, albeit not very good, example for handling
>> > a similar problem)
>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
>> this match.pd feature.
>> The current patch keeps it restricted to only bitwise operators on integers.
>> Bootstrap+test running on x86_64-unknown-linux-gnu.
>> OK to commit if passes ?
>
> +/* PR35691: Transform
> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
> +
>
> Please omit the vertical space
>
> +(for bitop (bit_and bit_ior)
> +     cmp (eq ne)
> + (simplify
> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
>
> if you capture the first integer_zerop as @2 then you can re-use it...
>
> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
> (@1)))
> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
>
> ... here inplace of the { build_zero_cst ... }.
>
> Ok with that changes.
Thanks, committed the attached version as r241915.

>
> Richard.
2016-11-07  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	PR middle-end/35691
	* match.pd: Add following two patterns:
	(x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
	(x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.

testsuite/
	* gcc.dg/pr35691-1.c: New test-case.
	* gcc.dg/pr35691-4.c: Likewise.

Comments

Prathamesh Kulkarni Nov. 7, 2016, 5:40 p.m. UTC | #1
On 7 November 2016 at 23:06, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:
>> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
>>
>>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:
>>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
>>> >
>>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
>>> >>
>>> >> > > > > The transform would also work for vectors (element_precision for
>>> >> > > > > the test but also a value-matching zero which should ensure the
>>> >> > > > > same number of elements).
>>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a
>>> >> > > > matching zero.
>>> >> > > > Could you please elaborate on that ?
>>> >> > >
>>> >> > > He may have meant something like:
>>> >> > >
>>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>>> >> >
>>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the
>>> >> > point of the pattern.
>>> >>
>>> >> Oups, that's what I had written first, and then I somehow managed to confuse
>>> >> myself enough to remove it so as to remove the call to types_match :-(
>>> >>
>>> >> > > So the last operand is checked with operand_equal_p instead of
>>> >> > > integer_zerop. But the fact that we could compute bit_ior on the
>>> >> > > comparison results should already imply that the number of elements is the
>>> >> > > same.
>>> >> >
>>> >> > Though for equality compares we also allow scalar results IIRC.
>>> >>
>>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one
>>> >> for a testcase, at least until the GIMPLE FE lands...
>>> >>
>>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some
>>> >> > > vector mode on s390?), it would be cool to do the same for floats (most
>>> >> > > likely at the RTL level).
>>> >> >
>>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab
>>> >> > just allow bit-and/or on floats as much as we allow them on pointers.
>>> >>
>>> >> Would that generate sensible code on targets that do not have logic insns for
>>> >> floats? Actually, even on x86_64 that generates inefficient code, so there
>>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).
>>> >>
>>> >> I am also a bit wary of doing those obfuscating optimizations too early...
>>> >> a==0 is something that other optimizations might use. long
>>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
>>> >>
>>> >> (and I am assuming that signaling NaNs don't make the whole transformation
>>> >> impossible, which might be wrong)
>>> >
>>> > Yeah.  I also think it's not so much important - I just wanted to mention
>>> > vectors...
>>> >
>>> > Btw, I still think we need a more sensible infrastructure for passes
>>> > to gather, analyze and modify complex conditions.  (I'm always pointing
>>> > to tree-affine.c as an, albeit not very good, example for handling
>>> > a similar problem)
>>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
>>> this match.pd feature.
>>> The current patch keeps it restricted to only bitwise operators on integers.
>>> Bootstrap+test running on x86_64-unknown-linux-gnu.
>>> OK to commit if passes ?
>>
>> +/* PR35691: Transform
>> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
>> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
>> +
>>
>> Please omit the vertical space
>>
>> +(for bitop (bit_and bit_ior)
>> +     cmp (eq ne)
>> + (simplify
>> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
>>
>> if you capture the first integer_zerop as @2 then you can re-use it...
>>
>> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>> (@1)))
>> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
>>
>> ... here inplace of the { build_zero_cst ... }.
>>
>> Ok with that changes.
> Thanks, committed the attached version as r241915.
ugh, the svn commit message has:

testsuite/
* gcc.dg/pr35691-1.c: New test-case.
* gcc.dg/pr35691-4.c: Likewise.

pr35691-4.c was a typo, should be pr35691-2.c :/
However testsuite/ChangeLog correctly has entry for pr35691-2.c
Is it possible to edit the commit message for r241915 ?
Sorry about this.

Regards,
Prathamesh
>
>>
>> Richard.
Richard Biener Nov. 8, 2016, 7:53 a.m. UTC | #2
On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:

> On 7 November 2016 at 23:06, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:
> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
> >>
> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:
> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
> >>> >
> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
> >>> >>
> >>> >> > > > > The transform would also work for vectors (element_precision for
> >>> >> > > > > the test but also a value-matching zero which should ensure the
> >>> >> > > > > same number of elements).
> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a
> >>> >> > > > matching zero.
> >>> >> > > > Could you please elaborate on that ?
> >>> >> > >
> >>> >> > > He may have meant something like:
> >>> >> > >
> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
> >>> >> >
> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the
> >>> >> > point of the pattern.
> >>> >>
> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse
> >>> >> myself enough to remove it so as to remove the call to types_match :-(
> >>> >>
> >>> >> > > So the last operand is checked with operand_equal_p instead of
> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the
> >>> >> > > comparison results should already imply that the number of elements is the
> >>> >> > > same.
> >>> >> >
> >>> >> > Though for equality compares we also allow scalar results IIRC.
> >>> >>
> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one
> >>> >> for a testcase, at least until the GIMPLE FE lands...
> >>> >>
> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some
> >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most
> >>> >> > > likely at the RTL level).
> >>> >> >
> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab
> >>> >> > just allow bit-and/or on floats as much as we allow them on pointers.
> >>> >>
> >>> >> Would that generate sensible code on targets that do not have logic insns for
> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there
> >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).
> >>> >>
> >>> >> I am also a bit wary of doing those obfuscating optimizations too early...
> >>> >> a==0 is something that other optimizations might use. long
> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
> >>> >>
> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation
> >>> >> impossible, which might be wrong)
> >>> >
> >>> > Yeah.  I also think it's not so much important - I just wanted to mention
> >>> > vectors...
> >>> >
> >>> > Btw, I still think we need a more sensible infrastructure for passes
> >>> > to gather, analyze and modify complex conditions.  (I'm always pointing
> >>> > to tree-affine.c as an, albeit not very good, example for handling
> >>> > a similar problem)
> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
> >>> this match.pd feature.
> >>> The current patch keeps it restricted to only bitwise operators on integers.
> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.
> >>> OK to commit if passes ?
> >>
> >> +/* PR35691: Transform
> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
> >> +
> >>
> >> Please omit the vertical space
> >>
> >> +(for bitop (bit_and bit_ior)
> >> +     cmp (eq ne)
> >> + (simplify
> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
> >>
> >> if you capture the first integer_zerop as @2 then you can re-use it...
> >>
> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> >> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> >> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
> >> (@1)))
> >> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
> >>
> >> ... here inplace of the { build_zero_cst ... }.
> >>
> >> Ok with that changes.
> > Thanks, committed the attached version as r241915.
> ugh, the svn commit message has:
> 
> testsuite/
> * gcc.dg/pr35691-1.c: New test-case.
> * gcc.dg/pr35691-4.c: Likewise.
> 
> pr35691-4.c was a typo, should be pr35691-2.c :/
> However testsuite/ChangeLog correctly has entry for pr35691-2.c
> Is it possible to edit the commit message for r241915 ?
> Sorry about this.

No, just leave it as-is.

Richard.

> Regards,
> Prathamesh
> >
> >>
> >> Richard.
> 
>
Prathamesh Kulkarni Nov. 8, 2016, 11:12 a.m. UTC | #3
On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote:
> On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 7 November 2016 at 23:06, Prathamesh Kulkarni
>> <prathamesh.kulkarni@linaro.org> wrote:
>> > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:
>> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
>> >>
>> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:
>> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
>> >>> >
>> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
>> >>> >>
>> >>> >> > > > > The transform would also work for vectors (element_precision for
>> >>> >> > > > > the test but also a value-matching zero which should ensure the
>> >>> >> > > > > same number of elements).
>> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a
>> >>> >> > > > matching zero.
>> >>> >> > > > Could you please elaborate on that ?
>> >>> >> > >
>> >>> >> > > He may have meant something like:
>> >>> >> > >
>> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>> >>> >> >
>> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the
>> >>> >> > point of the pattern.
>> >>> >>
>> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse
>> >>> >> myself enough to remove it so as to remove the call to types_match :-(
>> >>> >>
>> >>> >> > > So the last operand is checked with operand_equal_p instead of
>> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the
>> >>> >> > > comparison results should already imply that the number of elements is the
>> >>> >> > > same.
>> >>> >> >
>> >>> >> > Though for equality compares we also allow scalar results IIRC.
>> >>> >>
>> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one
>> >>> >> for a testcase, at least until the GIMPLE FE lands...
>> >>> >>
>> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some
>> >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most
>> >>> >> > > likely at the RTL level).
>> >>> >> >
>> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab
>> >>> >> > just allow bit-and/or on floats as much as we allow them on pointers.
>> >>> >>
>> >>> >> Would that generate sensible code on targets that do not have logic insns for
>> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there
>> >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).
>> >>> >>
>> >>> >> I am also a bit wary of doing those obfuscating optimizations too early...
>> >>> >> a==0 is something that other optimizations might use. long
>> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
>> >>> >>
>> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation
>> >>> >> impossible, which might be wrong)
>> >>> >
>> >>> > Yeah.  I also think it's not so much important - I just wanted to mention
>> >>> > vectors...
>> >>> >
>> >>> > Btw, I still think we need a more sensible infrastructure for passes
>> >>> > to gather, analyze and modify complex conditions.  (I'm always pointing
>> >>> > to tree-affine.c as an, albeit not very good, example for handling
>> >>> > a similar problem)
>> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
>> >>> this match.pd feature.
>> >>> The current patch keeps it restricted to only bitwise operators on integers.
>> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.
>> >>> OK to commit if passes ?
>> >>
>> >> +/* PR35691: Transform
>> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
>> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
>> >> +
>> >>
>> >> Please omit the vertical space
>> >>
>> >> +(for bitop (bit_and bit_ior)
>> >> +     cmp (eq ne)
>> >> + (simplify
>> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
>> >>
>> >> if you capture the first integer_zerop as @2 then you can re-use it...
>> >>
>> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>> >> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>> >> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>> >> (@1)))
>> >> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
>> >>
>> >> ... here inplace of the { build_zero_cst ... }.
>> >>
>> >> Ok with that changes.
>> > Thanks, committed the attached version as r241915.
>> ugh, the svn commit message has:
>>
>> testsuite/
>> * gcc.dg/pr35691-1.c: New test-case.
>> * gcc.dg/pr35691-4.c: Likewise.
>>
>> pr35691-4.c was a typo, should be pr35691-2.c :/
>> However testsuite/ChangeLog correctly has entry for pr35691-2.c
>> Is it possible to edit the commit message for r241915 ?
>> Sorry about this.
>
> No, just leave it as-is.
Hi,
Chritstophe reported to me that the commit caused test-cases
pr35691-1.c and pr35691-2.c (which were added by the commit)
to FAIL for cortex-a5:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt

It seems truth_andif_expr is not simplified to bit_and_expr on
cortex-a5 as for x86_64 (and other arm variants).
The differences in dumps start from 004t.gimple for pr35691-1.c:

x86_64 gimple dump:
foo (int z0, unsigned int z1)
{
  int D.1800;
  int t0;
  int t1;
  int t2;

  _1 = z0 == 0;
  t0 = (int) _1;
  _2 = z1 == 0;
  t1 = (int) _2;
  _3 = t0 != 0;
  _4 = t1 != 0;
  _5 = _3 & _4;
  t2 = (int) _5;
  D.1800 = t2;
  return D.1800;
}

cortex-a5 gimple dump:
foo (int z0, unsigned int z1)
{
  int iftmp.0;
  int D.4176;
  int t0;
  int t1;
  int t2;

  _1 = z0 == 0;
  t0 = (int) _1;
  _2 = z1 == 0;
  t1 = (int) _2;
  if (t0 != 0) goto <D.4174>; else goto <D.4172>;
  <D.4174>:
  if (t1 != 0) goto <D.4175>; else goto <D.4172>;
  <D.4175>:
  iftmp.0 = 1;
  goto <D.4173>;
  <D.4172>:
  iftmp.0 = 0;
  <D.4173>:
  t2 = iftmp.0;
  D.4176 = t2;
  return D.4176;
}

Since the pattern expects truth_andif_expr to be converted to bit_and_expr,
it fails to match for cortex-a5.
This seems to happen only for cortex-a5 (the other variants a9, a15,
a57 are OK).

Is my assumption that truth_andif_expr would be always converted to bit_and_expr
for above case incorrect ?

Thanks,
Prathamesh
>
> Richard.
>
>> Regards,
>> Prathamesh
>> >
>> >>
>> >> Richard.
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Nov. 8, 2016, 11:16 a.m. UTC | #4
On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:

> On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote:
> > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:
> >
> >> On 7 November 2016 at 23:06, Prathamesh Kulkarni
> >> <prathamesh.kulkarni@linaro.org> wrote:
> >> > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:
> >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
> >> >>
> >> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:
> >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:
> >> >>> >
> >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:
> >> >>> >>
> >> >>> >> > > > > The transform would also work for vectors (element_precision for
> >> >>> >> > > > > the test but also a value-matching zero which should ensure the
> >> >>> >> > > > > same number of elements).
> >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a
> >> >>> >> > > > matching zero.
> >> >>> >> > > > Could you please elaborate on that ?
> >> >>> >> > >
> >> >>> >> > > He may have meant something like:
> >> >>> >> > >
> >> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))
> >> >>> >> >
> >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the
> >> >>> >> > point of the pattern.
> >> >>> >>
> >> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse
> >> >>> >> myself enough to remove it so as to remove the call to types_match :-(
> >> >>> >>
> >> >>> >> > > So the last operand is checked with operand_equal_p instead of
> >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the
> >> >>> >> > > comparison results should already imply that the number of elements is the
> >> >>> >> > > same.
> >> >>> >> >
> >> >>> >> > Though for equality compares we also allow scalar results IIRC.
> >> >>> >>
> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one
> >> >>> >> for a testcase, at least until the GIMPLE FE lands...
> >> >>> >>
> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some
> >> >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most
> >> >>> >> > > likely at the RTL level).
> >> >>> >> >
> >> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab
> >> >>> >> > just allow bit-and/or on floats as much as we allow them on pointers.
> >> >>> >>
> >> >>> >> Would that generate sensible code on targets that do not have logic insns for
> >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there
> >> >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).
> >> >>> >>
> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too early...
> >> >>> >> a==0 is something that other optimizations might use. long
> >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...
> >> >>> >>
> >> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation
> >> >>> >> impossible, which might be wrong)
> >> >>> >
> >> >>> > Yeah.  I also think it's not so much important - I just wanted to mention
> >> >>> > vectors...
> >> >>> >
> >> >>> > Btw, I still think we need a more sensible infrastructure for passes
> >> >>> > to gather, analyze and modify complex conditions.  (I'm always pointing
> >> >>> > to tree-affine.c as an, albeit not very good, example for handling
> >> >>> > a similar problem)
> >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
> >> >>> this match.pd feature.
> >> >>> The current patch keeps it restricted to only bitwise operators on integers.
> >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.
> >> >>> OK to commit if passes ?
> >> >>
> >> >> +/* PR35691: Transform
> >> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
> >> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
> >> >> +
> >> >>
> >> >> Please omit the vertical space
> >> >>
> >> >> +(for bitop (bit_and bit_ior)
> >> >> +     cmp (eq ne)
> >> >> + (simplify
> >> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
> >> >>
> >> >> if you capture the first integer_zerop as @2 then you can re-use it...
> >> >>
> >> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> >> >> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> >> >> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
> >> >> (@1)))
> >> >> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
> >> >>
> >> >> ... here inplace of the { build_zero_cst ... }.
> >> >>
> >> >> Ok with that changes.
> >> > Thanks, committed the attached version as r241915.
> >> ugh, the svn commit message has:
> >>
> >> testsuite/
> >> * gcc.dg/pr35691-1.c: New test-case.
> >> * gcc.dg/pr35691-4.c: Likewise.
> >>
> >> pr35691-4.c was a typo, should be pr35691-2.c :/
> >> However testsuite/ChangeLog correctly has entry for pr35691-2.c
> >> Is it possible to edit the commit message for r241915 ?
> >> Sorry about this.
> >
> > No, just leave it as-is.
> Hi,
> Chritstophe reported to me that the commit caused test-cases
> pr35691-1.c and pr35691-2.c (which were added by the commit)
> to FAIL for cortex-a5:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt
> 
> It seems truth_andif_expr is not simplified to bit_and_expr on
> cortex-a5 as for x86_64 (and other arm variants).
> The differences in dumps start from 004t.gimple for pr35691-1.c:
> 
> x86_64 gimple dump:
> foo (int z0, unsigned int z1)
> {
>   int D.1800;
>   int t0;
>   int t1;
>   int t2;
> 
>   _1 = z0 == 0;
>   t0 = (int) _1;
>   _2 = z1 == 0;
>   t1 = (int) _2;
>   _3 = t0 != 0;
>   _4 = t1 != 0;
>   _5 = _3 & _4;
>   t2 = (int) _5;
>   D.1800 = t2;
>   return D.1800;
> }
> 
> cortex-a5 gimple dump:
> foo (int z0, unsigned int z1)
> {
>   int iftmp.0;
>   int D.4176;
>   int t0;
>   int t1;
>   int t2;
> 
>   _1 = z0 == 0;
>   t0 = (int) _1;
>   _2 = z1 == 0;
>   t1 = (int) _2;
>   if (t0 != 0) goto <D.4174>; else goto <D.4172>;
>   <D.4174>:
>   if (t1 != 0) goto <D.4175>; else goto <D.4172>;
>   <D.4175>:
>   iftmp.0 = 1;
>   goto <D.4173>;
>   <D.4172>:
>   iftmp.0 = 0;
>   <D.4173>:
>   t2 = iftmp.0;
>   D.4176 = t2;
>   return D.4176;
> }
> 
> Since the pattern expects truth_andif_expr to be converted to bit_and_expr,
> it fails to match for cortex-a5.
> This seems to happen only for cortex-a5 (the other variants a9, a15,
> a57 are OK).
> 
> Is my assumption that truth_andif_expr would be always converted to bit_and_expr
> for above case incorrect ?

Yes, it depends on LOGICAL_OP_SHORT_CIRCUIT.

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> >> Regards,
> >> Prathamesh
> >> >
> >> >>
> >> >> Richard.
> >>
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
>
Kyrill Tkachov Nov. 8, 2016, 11:19 a.m. UTC | #5
On 08/11/16 11:16, Richard Biener wrote:
> On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:
>
>> On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote:
>>> On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:
>>>
>>>> On 7 November 2016 at 23:06, Prathamesh Kulkarni
>>>> <prathamesh.kulkarni@linaro.org> wrote:
>>>>> On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:
>>>>>> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:
>>>>>>
>>>>>>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:
>>>>>>>> On Thu, 3 Nov 2016, Marc Glisse wrote:
>>>>>>>>
>>>>>>>>> On Thu, 3 Nov 2016, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>>>>>> The transform would also work for vectors (element_precision for
>>>>>>>>>>>>> the test but also a value-matching zero which should ensure the
>>>>>>>>>>>>> same number of elements).
>>>>>>>>>>>> Um sorry, I didn't get how to check vectors to be of equal length by a
>>>>>>>>>>>> matching zero.
>>>>>>>>>>>> Could you please elaborate on that ?
>>>>>>>>>>> He may have meant something like:
>>>>>>>>>>>
>>>>>>>>>>>    (op (cmp @0 integer_zerop@2) (cmp @1 @2))
>>>>>>>>>> I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the
>>>>>>>>>> point of the pattern.
>>>>>>>>> Oups, that's what I had written first, and then I somehow managed to confuse
>>>>>>>>> myself enough to remove it so as to remove the call to types_match :-(
>>>>>>>>>
>>>>>>>>>>> So the last operand is checked with operand_equal_p instead of
>>>>>>>>>>> integer_zerop. But the fact that we could compute bit_ior on the
>>>>>>>>>>> comparison results should already imply that the number of elements is the
>>>>>>>>>>> same.
>>>>>>>>>> Though for equality compares we also allow scalar results IIRC.
>>>>>>>>> Oh, right, I keep forgetting that :-( And I have no idea how to generate one
>>>>>>>>> for a testcase, at least until the GIMPLE FE lands...
>>>>>>>>>
>>>>>>>>>>> On platforms that have IOR on floats (at least x86 with SSE, maybe some
>>>>>>>>>>> vector mode on s390?), it would be cool to do the same for floats (most
>>>>>>>>>>> likely at the RTL level).
>>>>>>>>>> On GIMPLE view-converts could come to the rescue here as well.  Or we cab
>>>>>>>>>> just allow bit-and/or on floats as much as we allow them on pointers.
>>>>>>>>> Would that generate sensible code on targets that do not have logic insns for
>>>>>>>>> floats? Actually, even on x86_64 that generates inefficient code, so there
>>>>>>>>> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).
>>>>>>>>>
>>>>>>>>> I am also a bit wary of doing those obfuscating optimizations too early...
>>>>>>>>> a==0 is something that other optimizations might use. long
>>>>>>>>> c=(long&)a|(long&)b; (double&)c==0; less so...
>>>>>>>>>
>>>>>>>>> (and I am assuming that signaling NaNs don't make the whole transformation
>>>>>>>>> impossible, which might be wrong)
>>>>>>>> Yeah.  I also think it's not so much important - I just wanted to mention
>>>>>>>> vectors...
>>>>>>>>
>>>>>>>> Btw, I still think we need a more sensible infrastructure for passes
>>>>>>>> to gather, analyze and modify complex conditions.  (I'm always pointing
>>>>>>>> to tree-affine.c as an, albeit not very good, example for handling
>>>>>>>> a similar problem)
>>>>>>> Thanks for mentioning the value-matching capture @@, I wasn't aware of
>>>>>>> this match.pd feature.
>>>>>>> The current patch keeps it restricted to only bitwise operators on integers.
>>>>>>> Bootstrap+test running on x86_64-unknown-linux-gnu.
>>>>>>> OK to commit if passes ?
>>>>>> +/* PR35691: Transform
>>>>>> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
>>>>>> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
>>>>>> +
>>>>>>
>>>>>> Please omit the vertical space
>>>>>>
>>>>>> +(for bitop (bit_and bit_ior)
>>>>>> +     cmp (eq ne)
>>>>>> + (simplify
>>>>>> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))
>>>>>>
>>>>>> if you capture the first integer_zerop as @2 then you can re-use it...
>>>>>>
>>>>>> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>>>>>> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>>>>>> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>>>>>> (@1)))
>>>>>> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));
>>>>>>
>>>>>> ... here inplace of the { build_zero_cst ... }.
>>>>>>
>>>>>> Ok with that changes.
>>>>> Thanks, committed the attached version as r241915.
>>>> ugh, the svn commit message has:
>>>>
>>>> testsuite/
>>>> * gcc.dg/pr35691-1.c: New test-case.
>>>> * gcc.dg/pr35691-4.c: Likewise.
>>>>
>>>> pr35691-4.c was a typo, should be pr35691-2.c :/
>>>> However testsuite/ChangeLog correctly has entry for pr35691-2.c
>>>> Is it possible to edit the commit message for r241915 ?
>>>> Sorry about this.
>>> No, just leave it as-is.
>> Hi,
>> Chritstophe reported to me that the commit caused test-cases
>> pr35691-1.c and pr35691-2.c (which were added by the commit)
>> to FAIL for cortex-a5:
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt
>>
>> It seems truth_andif_expr is not simplified to bit_and_expr on
>> cortex-a5 as for x86_64 (and other arm variants).
>> The differences in dumps start from 004t.gimple for pr35691-1.c:
>>
>> x86_64 gimple dump:
>> foo (int z0, unsigned int z1)
>> {
>>    int D.1800;
>>    int t0;
>>    int t1;
>>    int t2;
>>
>>    _1 = z0 == 0;
>>    t0 = (int) _1;
>>    _2 = z1 == 0;
>>    t1 = (int) _2;
>>    _3 = t0 != 0;
>>    _4 = t1 != 0;
>>    _5 = _3 & _4;
>>    t2 = (int) _5;
>>    D.1800 = t2;
>>    return D.1800;
>> }
>>
>> cortex-a5 gimple dump:
>> foo (int z0, unsigned int z1)
>> {
>>    int iftmp.0;
>>    int D.4176;
>>    int t0;
>>    int t1;
>>    int t2;
>>
>>    _1 = z0 == 0;
>>    t0 = (int) _1;
>>    _2 = z1 == 0;
>>    t1 = (int) _2;
>>    if (t0 != 0) goto <D.4174>; else goto <D.4172>;
>>    <D.4174>:
>>    if (t1 != 0) goto <D.4175>; else goto <D.4172>;
>>    <D.4175>:
>>    iftmp.0 = 1;
>>    goto <D.4173>;
>>    <D.4172>:
>>    iftmp.0 = 0;
>>    <D.4173>:
>>    t2 = iftmp.0;
>>    D.4176 = t2;
>>    return D.4176;
>> }
>>
>> Since the pattern expects truth_andif_expr to be converted to bit_and_expr,
>> it fails to match for cortex-a5.
>> This seems to happen only for cortex-a5 (the other variants a9, a15,
>> a57 are OK).
>>
>> Is my assumption that truth_andif_expr would be always converted to bit_and_expr
>> for above case incorrect ?
> Yes, it depends on LOGICAL_OP_SHORT_CIRCUIT.

Right, and that varies by core in the arm backend.
looking at the tuning structures in arm.c I'd expect exynosm1 to also exhibit this

Kyrill

> Richard.
>
>> Thanks,
>> Prathamesh
>>> Richard.
>>>
>>>> Regards,
>>>> Prathamesh
>>>>>> Richard.
>>>>
>>> --
>>> Richard Biener <rguenther@suse.de>
>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
Martin Sebor Nov. 8, 2016, 4:48 p.m. UTC | #6
> Chritstophe reported to me that the commit caused test-cases
> pr35691-1.c and pr35691-2.c (which were added by the commit)
> to FAIL for cortex-a5:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt

I also see the test fail on powerpc64le.  The forwprop1 tree dump
is attached in case it helps.

Martin
;; Function foo (foo, funcdef_no=0, decl_uid=2506, cgraph_uid=0, symbol_order=0)

Applying pattern match.pd:2422, gimple-match.c:1704
Applying pattern match.pd:913, gimple-match.c:621
Applying pattern match.pd:901, gimple-match.c:164
Applying pattern match.pd:2685, gimple-match.c:59999
gimple_simplified to if (_1 != 0)
Applying pattern match.pd:913, generic-match.c:429
Applying pattern match.pd:901, generic-match.c:136
Applying pattern match.pd:2685, generic-match.c:30968
  Replaced '_1 != 0' with 'z0_4(D) == 0'
Applying pattern match.pd:2422, gimple-match.c:1704
Applying pattern match.pd:913, gimple-match.c:621
Applying pattern match.pd:901, gimple-match.c:164
Applying pattern match.pd:2685, gimple-match.c:59999
gimple_simplified to if (_2 != 0)
Applying pattern match.pd:913, generic-match.c:429
Applying pattern match.pd:901, generic-match.c:136
Applying pattern match.pd:2685, generic-match.c:30968
  Replaced '_2 != 0' with 'z1_6(D) == 0'
gimple_simplified to _11 = iftmp.0_3;
foo (int z0, unsigned int z1)
{
  int t2;
  int t1;
  int t0;
  _Bool _1;
  _Bool _2;
  int iftmp.0_3;
  int _11;

  <bb 2>:
  _1 = z0_4(D) == 0;
  t0_5 = (int) _1;
  _2 = z1_6(D) == 0;
  t1_7 = (int) _2;
  if (z0_4(D) == 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  if (z1_6(D) == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:

  <bb 5>:
  # iftmp.0_3 = PHI <1(3), 0(4)>
  t2_10 = iftmp.0_3;
  _11 = iftmp.0_3;
  return _11;

}
diff mbox

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 48f7351..29ddcd8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -519,6 +519,18 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (TYPE_UNSIGNED (type))
     (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1)))))
 
+/* PR35691: Transform
+   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
+   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
+(for bitop (bit_and bit_ior)
+     cmp (eq ne)
+ (simplify
+  (bitop (cmp @0 integer_zerop@2) (cmp @1 integer_zerop))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+    (cmp (bit_ior @0 (convert @1)) @2))))
+
 /* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
 (simplify
  (minus (bit_and:cs @0 (bit_not @1)) (bit_and:cs @0 @1))
diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c b/gcc/testsuite/gcc.dg/pr35691-1.c
new file mode 100644
index 0000000..5211f815
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-1.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 == 0);
+  int t1 = (z1 == 0);
+  int t2 = (t0 && t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-2.c b/gcc/testsuite/gcc.dg/pr35691-2.c
new file mode 100644
index 0000000..90cbf6d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-2.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 != 0);
+  int t1 = (z1 != 0);
+  int t2 = (t0 || t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */