Message ID | 013901d65e96$68255230$386ff690$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | middle-end: Fold popcount(x&4) to (x>>2)&1 and friends. | expand |
On Mon, Jul 20, 2020 at 3:06 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch complements one from June 12th which is still awaiting > review: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547937.html > > This patch optimizes popcount and parity of an argument known to have > at most a single bit set, to be that single bit. Hence, popcount(x&8) > is simplified to (x>>3)&1. This generalizes the existing optimization > of popcount(x&1) being simplified to x&1, which is moved with this > patch to avoid a duplicate pattern warning in match.pd. > > This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" > and "make -k check" with no new failures. If this is approved after > (or at the same time) as the patch above, I'm happy to resolve the > conflicts and retest before committing. Given you know the constant bit position of the possibly nonzero bit you can elide the conversion to unsigned for all but the case of a possibly negative input (IIRC GCC doesn't yet take advantage of negative right shift undefinedness - but maybe sanitizers complain). Also the shift amount doesn't need to be in the same type as the shifted amount so using either size_int() or integer_type_node for that argument should reduce INTEGER_CST waste. Any reason you are not tackling IFN_POPCOUNT/PARITY? You could use (for pfun (POPCOUNT PARITY) ... and automagically get all builtins and the IFN. Thanks, Richard. > > 2020-07-20 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > * match.pd (popcount(x) -> x>>C): New simplification. > > gcc/testsuite > * gcc.dg/fold-popcount-5.c: New test. > * gcc.dg/fold-parity-5.c: Likewise. > > > Ok for mainline? > Thanks in advance, > Roger > -- > Roger Sayle > NextMove Software > Cambridge, UK >
* Richard Biener via Gcc-patches: > Given you know the constant bit position of the possibly nonzero > bit you can elide the conversion to unsigned for all but the case > of a possibly negative input (IIRC GCC doesn't yet take advantage > of negative right shift undefinedness - but maybe sanitizers complain). It's not undefined: “Signed '>>' acts on negative numbers by sign extension.” Thanks, Florian
Hi Richard, Many thanks for the peer review and feedback. I completely agree that POPCOUNT and PARITY iterators simplifies things and handle the IFN_ variants. Likewise, using integer_type_node as the type of shift constants also matches the idioms used elsewhere in match.pd and fold. The following (combined) patch implements those suggestions, for both submitted patches and the existing POPCOUNT simplifications, cleaning up this logic and making this part of match.pd more consistent. [I hadn't appreciated using POPCOUNT/PARITY avoids the need for an explicit for]. I've kept the shiftrt unsigned which is both a requirement for the transformation (when the single bit is the sign bit), but this also matches the (canonicalization) preference in the middle-end that unsigned logical shifts are preferred over arithmetic shifts when the distinction isn't important [lshrdi3 is sometimes cheaper than ashrdi3]. This revised patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and "make -k check" with no new failures. Ok for mainline? 2020-07-22 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog * match.pd (popcount(x)&1 -> parity(x)): New simplification. (parity(~x) -> parity(x)): New simplification. (parity(x)^parity(y) -> parity(x^y)): New simplification. (parity(x&1) -> x&1): New simplification. (popcount(x) -> x>>C): New simplification. gcc/testsuite/ChangeLog * gcc.dg/fold-popcount-5.c: New test. * gcc.dg/fold-parity-1.c: Likewise. * gcc.dg/fold-parity-2.c: Likewise. * gcc.dg/fold-parity-3.c: Likewise. * gcc.dg/fold-parity-4.c: Likewise. * gcc.dg/fold-parity-5.c: Likewise. Thanks in advance, Roger -- Roger Sayle NextMove Software Cambridge, UK -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: 20 July 2020 14:40 To: Roger Sayle <roger@nextmovesoftware.com> Cc: GCC Patches <gcc-patches@gcc.gnu.org> Subject: Re: [PATCH] middle-end: Fold popcount(x&4) to (x>>2)&1 and friends. On Mon, Jul 20, 2020 at 3:06 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > This patch complements one from June 12th which is still awaiting review: > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547937.html > > This patch optimizes popcount and parity of an argument known to have > at most a single bit set, to be that single bit. Hence, popcount(x&8) > is simplified to (x>>3)&1. This generalizes the existing optimization > of popcount(x&1) being simplified to x&1, which is moved with this > patch to avoid a duplicate pattern warning in match.pd. > > This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" > and "make -k check" with no new failures. If this is approved after > (or at the same time) as the patch above, I'm happy to resolve the > conflicts and retest before committing. Given you know the constant bit position of the possibly nonzero bit you can elide the conversion to unsigned for all but the case of a possibly negative input (IIRC GCC doesn't yet take advantage of negative right shift undefinedness - but maybe sanitizers complain). Also the shift amount doesn't need to be in the same type as the shifted amount so using either size_int() or integer_type_node for that argument should reduce INTEGER_CST waste. Any reason you are not tackling IFN_POPCOUNT/PARITY? You could use (for pfun (POPCOUNT PARITY) ... and automagically get all builtins and the IFN. Thanks, Richard. diff --git a/gcc/match.pd b/gcc/match.pd index c6ae7a7..a096a17 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -5964,25 +5964,55 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (IFN_FMA @0 @1 @2)))) /* POPCOUNT simplifications. */ -(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL - BUILT_IN_POPCOUNTIMAX) - /* popcount(X&1) is nop_expr(X&1). */ - (simplify - (popcount @0) - (if (tree_nonzero_bits (@0) == 1) - (convert @0))) - /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero. */ - (simplify - (plus (popcount:s @0) (popcount:s @1)) - (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0) - (popcount (bit_ior @0 @1)))) - /* popcount(X) == 0 is X == 0, and related (in)equalities. */ +/* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero. */ +(simplify + (plus (POPCOUNT:s @0) (POPCOUNT:s @1)) + (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0) + (POPCOUNT (bit_ior @0 @1)))) + +/* popcount(X) == 0 is X == 0, and related (in)equalities. */ +(for popcount (POPCOUNT) (for cmp (le eq ne gt) rep (eq eq ne ne) (simplify (cmp (popcount @0) integer_zerop) (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))) +/* Canonicalize POPCOUNT(x)&1 as PARITY(X). */ +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL + BUILT_IN_POPCOUNTIMAX) + parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL + BUILT_IN_PARITYIMAX) + (simplify + (bit_and (popcount @0) integer_onep) + (parity @0))) + +/* PARITY simplifications. */ +/* parity(~X) is parity(X). */ +(simplify + (PARITY (bit_not @0)) + (PARITY @0)) + +/* parity(X)^parity(Y) is parity(X^Y). */ +(simplify + (bit_xor (PARITY:s @0) (PARITY:s @1)) + (PARITY (bit_xor @0 @1))) + +/* Common POPCOUNT/PARITY simplifications. */ +/* popcount(X&C1) is (X>>C2)&1 when C1 == 1<<C2. Same for parity(X&C1). */ +(for pfun (POPCOUNT PARITY) + (simplify + (pfun @0) + (with { wide_int nz = tree_nonzero_bits (@0); } + (switch + (if (nz == 1) + (convert @0)) + (if (wi::popcount (nz) == 1) + (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); } + (convert (rshift:utype (convert:utype @0) + { build_int_cst (integer_type_node, + wi::ctz (nz)); })))))))) + #if GIMPLE /* 64- and 32-bits branchless implementations of popcount are detected: diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c b/gcc/testsuite/gcc.dg/fold-popcount-5.c new file mode 100644 index 0000000..943726f --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int test_and4(unsigned int a) +{ + return __builtin_popcount(a&4); +} + +int test_and4l(unsigned long b) +{ + return __builtin_popcountl(b&4); +} + +int test_and4ll(unsigned long long c) +{ + return __builtin_popcountll(c&4); +} + +int test_shift(unsigned int d) +{ + int bits = 8*sizeof(unsigned int)-1; + return __builtin_popcount(d<<31); +} + +int test_shiftl(unsigned long e) +{ + int bits = 8*sizeof(unsigned long)-1; + return __builtin_popcountl(e<<bits); +} + +int test_shiftll(unsigned long long f) +{ + int bits = 8*sizeof(unsigned long long)-1; + return __builtin_popcountll(f<<bits); +} + +/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */ + diff --git a/gcc/testsuite/gcc.dg/fold-parity-1.c b/gcc/testsuite/gcc.dg/fold-parity-1.c new file mode 100644 index 0000000..3ba56c7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-parity-1.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-original" } */ + +int foo(unsigned int x) +{ + return __builtin_popcount(x) & 1; +} + +int fool(unsigned long x) +{ + return __builtin_popcountl(x) & 1; +} + +int fooll(unsigned long long x) +{ + return __builtin_popcountll(x) & 1; +} + +/* { dg-final { scan-tree-dump-times "__builtin_popcount" 0 "original" } } */ +/* { dg-final { scan-tree-dump-times "__builtin_parity" 3 "original" } } */ + diff --git a/gcc/testsuite/gcc.dg/fold-parity-2.c b/gcc/testsuite/gcc.dg/fold-parity-2.c new file mode 100644 index 0000000..8c7acbf --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-parity-2.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int foo(unsigned int x) +{ + return __builtin_parity(~x); +} + +int fool(unsigned long x) +{ + return __builtin_parityl(~x); +} + +int fooll(unsigned long long x) +{ + return __builtin_parityll(~x); +} + +/* { dg-final { scan-tree-dump-times "~" 0 "optimized" } } */ + diff --git a/gcc/testsuite/gcc.dg/fold-parity-3.c b/gcc/testsuite/gcc.dg/fold-parity-3.c new file mode 100644 index 0000000..e0355cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-parity-3.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int foo(unsigned int x) +{ + return __builtin_parity(x&1); +} + +int fool(unsigned long x) +{ + return __builtin_parityl(x&1); +} + +int fooll(unsigned long long x) +{ + return __builtin_parityll(x&1); +} + +/* { dg-final { scan-tree-dump-times "__builtin_parity" 0 "optimized" } } */ + diff --git a/gcc/testsuite/gcc.dg/fold-parity-4.c b/gcc/testsuite/gcc.dg/fold-parity-4.c new file mode 100644 index 0000000..5dfedab --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-parity-4.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int foo(unsigned int x, unsigned int y) +{ + return __builtin_parity(x) ^ __builtin_parity(y); +} + +int fool(unsigned long x, unsigned long y) +{ + return __builtin_parityl(x) ^ __builtin_parityl(y); +} + +int fooll(unsigned long long x, unsigned long long y) +{ + return __builtin_parityll(x) ^ __builtin_parityll(y); +} + +/* { dg-final { scan-tree-dump-times "__builtin_parity" 3 "optimized" } } */ + diff --git a/gcc/testsuite/gcc.dg/fold-parity-5.c b/gcc/testsuite/gcc.dg/fold-parity-5.c new file mode 100644 index 0000000..69d3a6a --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-parity-5.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int test_and4(unsigned int a) +{ + return __builtin_parity(a&4); +} + +int test_and4l(unsigned long b) +{ + return __builtin_parityl(b&4); +} + +int test_and4ll(unsigned long long c) +{ + return __builtin_parityll(c&4); +} + +int test_shift(unsigned int d) +{ + int bits = 8*sizeof(unsigned int)-1; + return __builtin_parity(d<<31); +} + +int test_shiftl(unsigned long e) +{ + int bits = 8*sizeof(unsigned long)-1; + return __builtin_parityl(e<<bits); +} + +int test_shiftll(unsigned long long f) +{ + int bits = 8*sizeof(unsigned long long)-1; + return __builtin_parityll(f<<bits); +} + +/* { dg-final { scan-tree-dump-times "parity" 0 "optimized" } } */ +
On Wed, 22 Jul 2020, Roger Sayle wrote: > Many thanks for the peer review and feedback. I completely agree that POPCOUNT > and PARITY iterators simplifies things and handle the IFN_ variants. Is there a reason why the iterators cannot be used for this one? +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL + BUILT_IN_POPCOUNTIMAX) + parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL + BUILT_IN_PARITYIMAX) + (simplify + (bit_and (popcount @0) integer_onep) + (parity @0)))
On Thu, Jul 23, 2020 at 10:15 AM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Wed, 22 Jul 2020, Roger Sayle wrote: > > > Many thanks for the peer review and feedback. I completely agree that POPCOUNT > > and PARITY iterators simplifies things and handle the IFN_ variants. > > Is there a reason why the iterators cannot be used for this one? > > +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL > + BUILT_IN_POPCOUNTIMAX) > + parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL > + BUILT_IN_PARITYIMAX) > + (simplify > + (bit_and (popcount @0) integer_onep) > + (parity @0))) I should indeed work fine to even do (simplify (bit_and (POPCOUNT @0) integer_onep) (PARITY @0)) Likewise for the existing +/* popcount(X) == 0 is X == 0, and related (in)equalities. */ +(for popcount (POPCOUNT) (for cmp (le eq ne gt) rep (eq eq ne ne) (simplify (cmp (popcount @0) integer_zerop) (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))) you can elide the (for ...) OK with those changes. Richard. > > -- > Marc Glisse
On Thu, 23 Jul 2020, Marc Glisse wrote: > On Wed, 22 Jul 2020, Roger Sayle wrote: > >> Many thanks for the peer review and feedback. I completely agree that >> POPCOUNT >> and PARITY iterators simplifies things and handle the IFN_ variants. > > Is there a reason why the iterators cannot be used for this one? > > +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL > + BUILT_IN_POPCOUNTIMAX) > + parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL > + BUILT_IN_PARITYIMAX) > + (simplify > + (bit_and (popcount @0) integer_onep) > + (parity @0))) Ah, maybe because we may have platforms/modes where the optab for popcount is supported but not the one for parity, and we are not allowed to create internal calls if the optab is not supported? I think expand_parity tries to expand parity as popcount&1, so it should be fine, but I didn't actually try it...
On Thu, Jul 23, 2020 at 11:11 AM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Thu, 23 Jul 2020, Marc Glisse wrote: > > > On Wed, 22 Jul 2020, Roger Sayle wrote: > > > >> Many thanks for the peer review and feedback. I completely agree that > >> POPCOUNT > >> and PARITY iterators simplifies things and handle the IFN_ variants. > > > > Is there a reason why the iterators cannot be used for this one? > > > > +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL > > + BUILT_IN_POPCOUNTIMAX) > > + parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL > > + BUILT_IN_PARITYIMAX) > > + (simplify > > + (bit_and (popcount @0) integer_onep) > > + (parity @0))) > > Ah, maybe because we may have platforms/modes where the optab for popcount > is supported but not the one for parity, and we are not allowed to create > internal calls if the optab is not supported? I think expand_parity tries > to expand parity as popcount&1, so it should be fine, but I didn't > actually try it... Hmm, that might indeed be a problem. An explicit direct_internal_fn_supported_p guard would help here, like maybe (if (PARITY != IFN_PARITY || direct_internal_fn_supported_p (IFN_PARITY, type, OPTIMIZE_FOR_BOTH)) (PARITY @0))) magic that eventually could be auto-generated by genmatch ... (but only if iterating, so it would be a bit iffy) Richard. > -- > Marc Glisse
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On Thu, Jul 23, 2020 at 11:11 AM Marc Glisse <marc.glisse@inria.fr> wrote: >> >> On Thu, 23 Jul 2020, Marc Glisse wrote: >> >> > On Wed, 22 Jul 2020, Roger Sayle wrote: >> > >> >> Many thanks for the peer review and feedback. I completely agree that >> >> POPCOUNT >> >> and PARITY iterators simplifies things and handle the IFN_ variants. >> > >> > Is there a reason why the iterators cannot be used for this one? >> > >> > +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL >> > + BUILT_IN_POPCOUNTIMAX) >> > + parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL >> > + BUILT_IN_PARITYIMAX) >> > + (simplify >> > + (bit_and (popcount @0) integer_onep) >> > + (parity @0))) >> >> Ah, maybe because we may have platforms/modes where the optab for popcount >> is supported but not the one for parity, and we are not allowed to create >> internal calls if the optab is not supported? I think expand_parity tries >> to expand parity as popcount&1, so it should be fine, but I didn't >> actually try it... > > Hmm, that might indeed be a problem. An explicit > direct_internal_fn_supported_p guard would help here, like > maybe > > (if (PARITY != IFN_PARITY > || direct_internal_fn_supported_p (IFN_PARITY, type, OPTIMIZE_FOR_BOTH)) > (PARITY @0))) > > magic that eventually could be auto-generated by genmatch ... (but only if > iterating, so it would be a bit iffy) The matching code should already reject folds to IFN_PARITY calls that aren't supported by the target (gimple-match-head.c:build_call_internal). Thanks, Richard
On Thu, Jul 23, 2020 at 12:58 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Thu, Jul 23, 2020 at 11:11 AM Marc Glisse <marc.glisse@inria.fr> wrote: > >> > >> On Thu, 23 Jul 2020, Marc Glisse wrote: > >> > >> > On Wed, 22 Jul 2020, Roger Sayle wrote: > >> > > >> >> Many thanks for the peer review and feedback. I completely agree that > >> >> POPCOUNT > >> >> and PARITY iterators simplifies things and handle the IFN_ variants. > >> > > >> > Is there a reason why the iterators cannot be used for this one? > >> > > >> > +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL > >> > + BUILT_IN_POPCOUNTIMAX) > >> > + parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL > >> > + BUILT_IN_PARITYIMAX) > >> > + (simplify > >> > + (bit_and (popcount @0) integer_onep) > >> > + (parity @0))) > >> > >> Ah, maybe because we may have platforms/modes where the optab for popcount > >> is supported but not the one for parity, and we are not allowed to create > >> internal calls if the optab is not supported? I think expand_parity tries > >> to expand parity as popcount&1, so it should be fine, but I didn't > >> actually try it... > > > > Hmm, that might indeed be a problem. An explicit > > direct_internal_fn_supported_p guard would help here, like > > maybe > > > > (if (PARITY != IFN_PARITY > > || direct_internal_fn_supported_p (IFN_PARITY, type, OPTIMIZE_FOR_BOTH)) > > (PARITY @0))) > > > > magic that eventually could be auto-generated by genmatch ... (but only if > > iterating, so it would be a bit iffy) > > The matching code should already reject folds to IFN_PARITY calls that > aren't supported by the target (gimple-match-head.c:build_call_internal). Ah, indeed, so it should work unchanged even. Richard. > Thanks, > Richard
On Thu, Jul 23, 2020 at 10:02 Richard Biener wrote: > Likewise for the existing > >+/* popcount(X) == 0 is X == 0, and related (in)equalities. */ (for >+(for popcount (POPCOUNT) > (for cmp (le eq ne gt) > rep (eq eq ne ne) > (simplify > (cmp (popcount @0) integer_zerop) > (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))) > > you can elide the (for ...) Unfortunately, I tried that myself when preparing this patch: Attempt #1: (for cmp (le eq ne gt) rep (eq eq ne ne) (simplify (cmp (POPCOUNT @0) integer_zerop) (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))) results in: build/genmatch --gimple ../../patcha3/gcc/match.pd \ > tmp-gimple-match.c ../../patcha3/gcc/match.pd:5977:11 error: operator-list POPCOUNT cannot be expanded inside 'for' (cmp (POPCOUNT @0) integer_zerop) ^ Attempt #2: (for popcount (POPCOUNT) cmp (le eq ne gt) rep (eq eq ne ne) (simplify (cmp (popcount @0) integer_zerop) (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))) results in: > tmp-gimple-match.c ../../patcha3/gcc/match.pd:5975:22 error: All user-defined identifiers must have a multiple number of operator substitutions of the smallest number of substitutions cmp (le eq ne gt) ^ I'll leave it the way it is with the nested FORs, and investigate your other suggestions. > OK with those changes. > > Richard. Roger --
On Thu, Jul 23, 2020 at 2:01 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > On Thu, Jul 23, 2020 at 10:02 Richard Biener wrote: > > Likewise for the existing > > > >+/* popcount(X) == 0 is X == 0, and related (in)equalities. */ (for > >+(for popcount (POPCOUNT) > > (for cmp (le eq ne gt) > > rep (eq eq ne ne) > > (simplify > > (cmp (popcount @0) integer_zerop) > > (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))) > > > > you can elide the (for ...) > > Unfortunately, I tried that myself when preparing this patch: > > Attempt #1: > > (for cmp (le eq ne gt) > rep (eq eq ne ne) > (simplify > (cmp (POPCOUNT @0) integer_zerop) > (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))) > > results in: > build/genmatch --gimple ../../patcha3/gcc/match.pd \ > > tmp-gimple-match.c > ../../patcha3/gcc/match.pd:5977:11 error: operator-list POPCOUNT cannot be expanded inside 'for' > (cmp (POPCOUNT @0) integer_zerop) > ^ > Attempt #2: > > (for popcount (POPCOUNT) > cmp (le eq ne gt) > rep (eq eq ne ne) > (simplify > (cmp (popcount @0) integer_zerop) > (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))) > > results in: > > tmp-gimple-match.c > ../../patcha3/gcc/match.pd:5975:22 error: All user-defined identifiers must have a multiple number of operator substitutions of the smallest number of substitutions > cmp (le eq ne gt) > ^ Oh yeah - I failed to spot the outer (for ..), indeed it doesn't work then. We're not magically creating a nest from the first vairant which we could make work - I erred on the side of "too much implicitness" caution here. > I'll leave it the way it is with the nested FORs, and investigate your other suggestions. Yeah, fine. > > OK with those changes. > > > > Richard. > > > Roger > -- > >
diff --git a/gcc/match.pd b/gcc/match.pd index c6ae7a7..0b3b626 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -5966,11 +5966,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* POPCOUNT simplifications. */ (for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL BUILT_IN_POPCOUNTIMAX) - /* popcount(X&1) is nop_expr(X&1). */ - (simplify - (popcount @0) - (if (tree_nonzero_bits (@0) == 1) - (convert @0))) /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero. */ (simplify (plus (popcount:s @0) (popcount:s @1)) @@ -5983,6 +5978,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (cmp (popcount @0) integer_zerop) (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))) +/* popcount(X&C1) is (X>>C2)&1 when C1 == 1<<C2. Same for parity(X&C1). */ +(for pfun (BUILT_IN_POPCOUNT BUILT_IN_PARITY + BUILT_IN_POPCOUNTL BUILT_IN_PARITYL + BUILT_IN_POPCOUNTLL BUILT_IN_PARITYLL + BUILT_IN_POPCOUNTIMAX BUILT_IN_PARITYIMAX) + (simplify + (pfun @0) + (with { wide_int nz = tree_nonzero_bits (@0); } + (switch + (if (nz == 1) + (convert @0)) + (if (wi::popcount (nz) == 1) + (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); + int bits = wi::ctz (nz); } + (convert (rshift:utype (convert:utype @0) + { build_int_cst (utype, bits); })))))))) + #if GIMPLE /* 64- and 32-bits branchless implementations of popcount are detected: diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c b/gcc/testsuite/gcc.dg/fold-popcount-5.c new file mode 100644 index 0000000..943726f --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int test_and4(unsigned int a) +{ + return __builtin_popcount(a&4); +} + +int test_and4l(unsigned long b) +{ + return __builtin_popcountl(b&4); +} + +int test_and4ll(unsigned long long c) +{ + return __builtin_popcountll(c&4); +} + +int test_shift(unsigned int d) +{ + int bits = 8*sizeof(unsigned int)-1; + return __builtin_popcount(d<<31); +} + +int test_shiftl(unsigned long e) +{ + int bits = 8*sizeof(unsigned long)-1; + return __builtin_popcountl(e<<bits); +} + +int test_shiftll(unsigned long long f) +{ + int bits = 8*sizeof(unsigned long long)-1; + return __builtin_popcountll(f<<bits); +} + +/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */ + diff --git a/gcc/testsuite/gcc.dg/fold-parity-5.c b/gcc/testsuite/gcc.dg/fold-parity-5.c new file mode 100644 index 0000000..69d3a6a --- /dev/null +++ b/gcc/testsuite/gcc.dg/fold-parity-5.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +int test_and4(unsigned int a) +{ + return __builtin_parity(a&4); +} + +int test_and4l(unsigned long b) +{ + return __builtin_parityl(b&4); +} + +int test_and4ll(unsigned long long c) +{ + return __builtin_parityll(c&4); +} + +int test_shift(unsigned int d) +{ + int bits = 8*sizeof(unsigned int)-1; + return __builtin_parity(d<<31); +} + +int test_shiftl(unsigned long e) +{ + int bits = 8*sizeof(unsigned long)-1; + return __builtin_parityl(e<<bits); +} + +int test_shiftll(unsigned long long f) +{ + int bits = 8*sizeof(unsigned long long)-1; + return __builtin_parityll(f<<bits); +} + +/* { dg-final { scan-tree-dump-times "parity" 0 "optimized" } } */ +