Message ID: 003f01d3a1a3$66b893b0$3429bb10$@nextmovesoftware.com

On 02/09/2018 05:42 AM, Roger Sayle wrote:
>
> The following patch implements a number of __builtin_popcount related
> optimizations.
> (i) popcount(x) == 0 can be simplified to x==0, and popcount(x) != 0 to
> x!=0.
> (ii) popcount(x&1) can be simplified to x&1, and for unsigned x,
> popcount(x>>31) to x>>31.
> (iii) popcount (x&6) + popcount(y&16) can be simplified to
> popcount((x&6)|(y&16))
>
> These may seem obscure transformations, but performing these types of
> POPCOUNT operations are often the performance-critical steps in some
> cheminformatics applications.
>
> To implement the above transformations I've introduced the
> tree_nonzero_bits function, which is a tree-level version of rtlanal's
> nonzero_bits used by the RTL optimizers.
>
> The following patch has been tested on x86_64-pc-linux-gnu with a
> "make bootstrap" and "make check" with no regressions, and passes for
> the four new gcc.dg test cases.
>
> Many thanks in advance.  Best regards,
>
> Roger
> --
> Roger Sayle, PhD.
> NextMove Software Limited
> Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY
>
> 2018-02-09  Roger Sayle  <roger@nextmovesoftware.com>
>
> 	* fold-const.c (tree_nonzero_bits): New function.
> 	* fold-const.h (tree_nonzero_bits): Likewise.
> 	* match.pd (POPCOUNT): New patterns to fold BUILTIN_POPCOUNT and
> 	friends.  POPCOUNT(x&1) => x&1, POPCOUNT(x)==0 => x==0, etc.
>
> 2018-02-09  Roger Sayle  <roger@nextmovesoftware.com>
>
> 	* gcc.dg/fold-popcount-1.c: New testcase.
> 	* gcc.dg/fold-popcount-2.c: New testcase.
> 	* gcc.dg/fold-popcount-3.c: New testcase.
> 	* gcc.dg/fold-popcount-4.c: New testcase.

Queued for stage1.  Thanks Roger.  I hope things are going well.

jeff
On 02/09/2018 05:42 AM, Roger Sayle wrote:
> The following patch implements a number of __builtin_popcount related
> optimizations.
> [...]
> 	* gcc.dg/fold-popcount-4.c: New testcase.
>
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c	(revision 257227)
> +++ gcc/fold-const.c	(working copy)
> @@ -14580,6 +14580,75 @@
>  	return string + offset;
>  }
>
> +/* Given a tree T, compute which bits in T may be nonzero.  */
> +
> +wide_int
> +tree_nonzero_bits (const_tree t)
> +{
> +  switch (TREE_CODE (t))
> +    {
> +    case BIT_IOR_EXPR:
> +    case BIT_XOR_EXPR:
> +      return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 0)),
> +			 tree_nonzero_bits (TREE_OPERAND (t, 1)));

Hmm.  I think this will potentially have too many bits set in the
BIT_XOR case.  Is there some reason you didn't use wi::bit_xor for that
case?

We can probably go ahead and ACK this once that question is resolved.

Thanks,
jeff
(I am not a reviewer, just commenting)

On Fri, 9 Feb 2018, Roger Sayle wrote:

> The following patch implements a number of __builtin_popcount related
> optimizations.
> (i) popcount(x) == 0 can be simplified to x==0, and popcount(x) != 0 to
> x!=0.
> (ii) popcount(x&1) can be simplified to x&1, and for unsigned x,
> popcount(x>>31) to x>>31.
> (iii) popcount (x&6) + popcount(y&16) can be simplified to
> popcount((x&6)|(y&16))
>
> These may seem obscure transformations, but performing these types of
> POPCOUNT operations are often the performance-critical steps in some
> cheminformatics applications.
>
> To implement the above transformations I've introduced the
> tree_nonzero_bits function, which is a tree-level version of rtlanal's
> nonzero_bits used by the RTL optimizers.

I am wondering if this brings much, compared to just using
get_nonzero_bits (works on INTEGER_CST / SSA_NAME, matched by
with_possible_nonzero_bits).  If we do decide to introduce this
function, we probably want to use it in other places that currently use
get_nonzero_bits as well...

> +    case NOP_EXPR:
> +    case CONVERT_EXPR:

We have CASE_CONVERT: for this pair.

> +    case LSHIFT_EXPR:
> +      if (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST)

Maybe check INTEGRAL_TYPE_P as well, like you did for PLUS_EXPR?  Or was
that also unnecessary for PLUS_EXPR?

> +	  return wi::neg_p (arg1)
> +		 ? wi::rshift (nzbits, arg1, TYPE_SIGN (type))
> +		 : wi::lshift (nzbits, arg1);

I can see that fold-const.c already does something like that.  I am
surprised the sanitizer guys haven't asked that we just punt on negative
values instead.

> --- gcc/match.pd	(revision 257227)
> +++ gcc/match.pd	(working copy)
> @@ -4648,3 +4648,24 @@
>      wi::geu_p (wi::to_wide (@rpos),
> 	       wi::to_wide (@ipos) + isize))
>     (BIT_FIELD_REF @0 @rsize @rpos)))))
> +
> +/* POPCOUNT simplifications.  */
> +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
> +	       BUILT_IN_POPCOUNTIMAX)
> +  /* popcount(X&1) is nop_expr(X&1).  */
> +  (simplify
> +    (popcount @0)
> +    (if (tree_nonzero_bits (@0) == 1)
> +      (convert @0)))

Good thing we can't call popcount on signed 1-bit types ;-)

> +  /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.  */
> +  (simplify
> +    (plus (popcount @0) (popcount @1))

We probably want :s on both popcount: if they are used in other places
than just this addition, it is likely cheaper not to introduce a new
call to popcount.

> +    (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0)
> +      (popcount (bit_ior @0 @1))))

We'll have to be careful if we ever turn popcount into something
generic, but the current situation indeed should safely guarantee that
@0 and @1 have the same type.

> +  /* pocount(X) == 0 is X == 0, and related (in)equalities.  */

po+p+count

> +  (for cmp (le eq ne gt)
> +       rep (eq eq ne ne)
> +    (simplify
> +      (cmp (popcount @0) zerop)

Might as well use integer_zerop when we know we are dealing with
integers.

> +      (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))

Nice patch :-)
On Mon, 30 Apr 2018, Jeff Law wrote:

> On 02/09/2018 05:42 AM, Roger Sayle wrote:
>> The following patch implements a number of __builtin_popcount related
>> optimizations.
>> [...]
>>
>> +/* Given a tree T, compute which bits in T may be nonzero.  */
>> +
>> +wide_int
>> +tree_nonzero_bits (const_tree t)
>> +{
>> +  switch (TREE_CODE (t))
>> +    {
>> +    case BIT_IOR_EXPR:
>> +    case BIT_XOR_EXPR:
>> +      return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 0)),
>> +			 tree_nonzero_bits (TREE_OPERAND (t, 1)));
>
> Hmm.  I think this will potentially have too many bits set in the
> BIT_XOR case.  Is there some reason you didn't use wi::bit_xor for that
> case?

You cannot use bit_xor because the bits are only *possibly* set (same as
you can't say anything about BIT_NOT_EXPR).  We would need to also track
the bits *certainly* set to start doing clever stuff, and for
BIT_XOR_EXPR that would still be inconvenient.
On 05/01/2018 02:48 AM, Marc Glisse wrote:
> On Mon, 30 Apr 2018, Jeff Law wrote:
>
>> On 02/09/2018 05:42 AM, Roger Sayle wrote:
>>> The following patch implements a number of __builtin_popcount related
>>> optimizations.
>>> [...]
>>>
>>> +    case BIT_IOR_EXPR:
>>> +    case BIT_XOR_EXPR:
>>> +      return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 0)),
>>> +			 tree_nonzero_bits (TREE_OPERAND (t, 1)));
>> Hmm.  I think this will potentially have too many bits set in the
>> BIT_XOR case.  Is there some reason you didn't use wi::bit_xor for
>> that case?
>
> You cannot use bit_xor because the bits are only *possibly* set (same as
> you can't say anything about BIT_NOT_EXPR).  We would need to also track
> the bits *certainly* set to start doing clever stuff, and for
> BIT_XOR_EXPR that would still be inconvenient.

Yea, I realized it was a may vs must issue after I signed off for the
night.

jeff
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 257227)
+++ gcc/fold-const.c	(working copy)
@@ -14580,6 +14580,75 @@
 	return string + offset;
 }
 
+/* Given a tree T, compute which bits in T may be nonzero.  */
+
+wide_int
+tree_nonzero_bits (const_tree t)
+{
+  switch (TREE_CODE (t))
+    {
+    case INTEGER_CST:
+      return wi::to_wide (t);
+    case SSA_NAME:
+      return get_nonzero_bits (t);
+    case NON_LVALUE_EXPR:
+    case SAVE_EXPR:
+      return tree_nonzero_bits (TREE_OPERAND (t, 0));
+    case BIT_AND_EXPR:
+      return wi::bit_and (tree_nonzero_bits (TREE_OPERAND (t, 0)),
+			  tree_nonzero_bits (TREE_OPERAND (t, 1)));
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 0)),
+			 tree_nonzero_bits (TREE_OPERAND (t, 1)));
+    case COND_EXPR:
+      return wi::bit_or (tree_nonzero_bits (TREE_OPERAND (t, 1)),
+			 tree_nonzero_bits (TREE_OPERAND (t, 2)));
+    case NOP_EXPR:
+    case CONVERT_EXPR:
+      return wide_int::from (tree_nonzero_bits (TREE_OPERAND (t, 0)),
+			     TYPE_PRECISION (TREE_TYPE (t)),
+			     TYPE_SIGN (TREE_TYPE (TREE_OPERAND (t, 0))));
+    case PLUS_EXPR:
+      if (INTEGRAL_TYPE_P (TREE_TYPE (t)))
+	{
+	  wide_int nzbits1 = tree_nonzero_bits (TREE_OPERAND (t, 0));
+	  wide_int nzbits2 = tree_nonzero_bits (TREE_OPERAND (t, 1));
+	  if (wi::bit_and (nzbits1, nzbits2) == 0)
+	    return wi::bit_or (nzbits1, nzbits2);
+	}
+      break;
+    case LSHIFT_EXPR:
+      if (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST)
+	{
+	  tree type = TREE_TYPE (t);
+	  wide_int nzbits = tree_nonzero_bits (TREE_OPERAND (t, 0));
+	  wide_int arg1 = wi::to_wide (TREE_OPERAND (t, 1),
+				       TYPE_PRECISION (type));
+	  return wi::neg_p (arg1)
+		 ? wi::rshift (nzbits, -arg1, TYPE_SIGN (type))
+		 : wi::lshift (nzbits, arg1);
+	}
+      break;
+    case RSHIFT_EXPR:
+      if (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST)
+	{
+	  tree type = TREE_TYPE (t);
+	  wide_int nzbits = tree_nonzero_bits (TREE_OPERAND (t, 0));
+	  wide_int arg1 = wi::to_wide (TREE_OPERAND (t, 1),
+				       TYPE_PRECISION (type));
+	  return wi::neg_p (arg1)
+		 ? wi::lshift (nzbits, -arg1)
+		 : wi::rshift (nzbits, arg1, TYPE_SIGN (type));
+	}
+      break;
+    default:
+      break;
+    }
+
+  return wi::shwi (-1, TYPE_PRECISION (TREE_TYPE (t)));
+}
+
 #if CHECKING_P
 
 namespace selftest {
Index: gcc/fold-const.h
===================================================================
--- gcc/fold-const.h	(revision 257227)
+++ gcc/fold-const.h	(working copy)
@@ -181,6 +181,7 @@
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
 extern const char *c_getstr (tree, unsigned HOST_WIDE_INT *strlen = NULL);
+extern wide_int tree_nonzero_bits (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
    POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 257227)
+++ gcc/match.pd	(working copy)
@@ -4648,3 +4648,24 @@
 	   wi::geu_p (wi::to_wide (@rpos),
 		      wi::to_wide (@ipos) + isize))
     (BIT_FIELD_REF @0 @rsize @rpos)))))
+
+/* POPCOUNT simplifications.  */
+(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
+	       BUILT_IN_POPCOUNTIMAX)
+  /* popcount(X&1) is nop_expr(X&1).  */
+  (simplify
+    (popcount @0)
+    (if (tree_nonzero_bits (@0) == 1)
+      (convert @0)))
+  /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.  */
+  (simplify
+    (plus (popcount @0) (popcount @1))
+    (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0)
+      (popcount (bit_ior @0 @1))))
+  /* pocount(X) == 0 is X == 0, and related (in)equalities.  */
+  (for cmp (le eq ne gt)
+       rep (eq eq ne ne)
+    (simplify
+      (cmp (popcount @0) zerop)
+      (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))