Message ID | 1421837394-7619-5-git-send-email-rv@rasmusvillemoes.dk |
---|---|
State | New |
On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes <rv@rasmusvillemoes.dk> wrote:
> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
> a 2^k by adding the negative of x modulo 2^k. But it is fewer
> instructions, and presumably requires fewer registers, to do the more
> common (x+m)&~m where m=2^k-1.
>
> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
> ---
>  gcc/match.pd | 9 ++++++
>  gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>
> diff --git gcc/match.pd gcc/match.pd
> index 47865f1..93c2298 100644
> --- gcc/match.pd
> +++ gcc/match.pd
> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3. If not see
>    (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>     (bit_ior @0 (bit_not @1))))
>
> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1. */
> +(simplify
> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))

I think you want to restrict this to INTEGER_CST@1

> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
> +                                 @1, build_one_cst (TREE_TYPE (@1))); }

We shouldn't dispatch to fold_binary in patterns. int_const_binop would
be the appropriate function to use - but what happens for @1 == INT_MAX
where @1 + 1 overflows? Similar, is this also valid for negative @1
and thus signed mask types? IMHO we should check whether @1
is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
false, TYPE_PRECISION (TREE_TYPE (@1)).

As with the other patch a ChangeLog entry is missing as well as stating
how you tested the patch.

Thanks,
Richard.

> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
> +       && cst && integer_pow2p (cst))
> +   (bit_and (plus @0 @1) (bit_not @1)))))
> +
>  (simplify
>   (abs (negate @0))
>   (abs @0))
> diff --git gcc/testsuite/gcc.dg/20150120-4.c gcc/testsuite/gcc.dg/20150120-4.c
> new file mode 100644
> index 0000000..c3552bf
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/20150120-4.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-original" } */
> +
> +/* x + ((-x) & m) -> (x + m) & ~m for m one less than a pow2. */
> +int
> +fn1 (int x)
> +{
> +  return x + ((-x) & 7);
> +}
> +int
> +fn2 (int x)
> +{
> +  return ((-x) & 7) + x;
> +}
> +unsigned int
> +fn3 (unsigned int x)
> +{
> +  return x + ((-x) & 7);
> +}
> +unsigned int
> +fn4 (unsigned int x)
> +{
> +  return ((-x) & 7) + x;
> +}
> +unsigned int
> +fn5 (unsigned int x)
> +{
> +  return x + ((-x) % 8);
> +}
> +unsigned int
> +fn6 (unsigned int x)
> +{
> +  return ((-x) % 8) + x;
> +}
> +int
> +fn7 (int x)
> +{
> +  return x + ((-x) & 9);
> +}
> +int
> +fn8 (int x)
> +{
> +  return ((-x) & 9) + x;
> +}
> +unsigned int
> +fn9 (unsigned int x)
> +{
> +  return x + ((-x) & ~0U);
> +}
> +unsigned int
> +fn10 (unsigned int x)
> +{
> +  return ((-x) & ~0U) + x;
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "x \\+ 7" 6 "original" } } */
> +/* { dg-final { scan-tree-dump-times "-x & 9" 2 "original" } } */
> +/* { dg-final { scan-tree-dump-times "return 0" 2 "original" } } */
> --
> 2.1.3
>
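Richard's suggested condition is that @1 equal wi::mask (prec - wi::clz (@1), false, prec). In other words, the constant must consist only of low-order one bits. Below is a minimal plain-C sketch of that test. It assumes the constant fits in a uint64_t with a precision of 1 to 64 bits. clz_prec, low_mask and is_low_bit_mask are hypothetical helpers standing in for the wide-int routines; they are not GCC APIs. The test never forms @1 + 1, so the INT_MAX case raised above cannot overflow. Among constants with the top bit set, it accepts only the all-ones value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for wi::clz: leading zero bits of V viewed as a
   PREC-bit value.  Returns PREC when V is zero.  */
static unsigned
clz_prec (uint64_t v, unsigned prec)
{
  unsigned n = 0;
  for (int bit = (int) prec - 1; bit >= 0 && !((v >> bit) & 1u); bit--)
    n++;
  return n;
}

/* Hypothetical stand-in for wi::mask (width, false, prec): a value with
   the low WIDTH bits set (WIDTH in 0..64).  */
static uint64_t
low_mask (unsigned width)
{
  return width >= 64 ? UINT64_MAX : (((uint64_t) 1 << width) - 1);
}

/* True if M, viewed as a PREC-bit constant, equals
   low_mask (prec - clz_prec (m, prec)), i.e. M == 2^k - 1 for some k.
   No M + 1 is computed, so nothing can overflow.  */
static bool
is_low_bit_mask (uint64_t m, unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  m &= low_mask (prec);   /* Ignore bits above the precision.  */
  return m == low_mask (prec - clz_prec (m, prec));
}
```

With a precision of 32, the sketch accepts 7 and 0x7fffffff (INT_MAX). It rejects 9 and 0xfffffff8 (-8 as a 32-bit pattern).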
On Thu, 30 Apr 2015, Richard Biener wrote:

> On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
> <rv@rasmusvillemoes.dk> wrote:
>> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
>> a 2^k by adding the negative of x modulo 2^k. But it is fewer
>> instructions, and presumably requires fewer registers, to do the more
>> common (x+m)&~m where m=2^k-1.
>>
>> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
>> ---
>>  gcc/match.pd | 9 ++++++
>>  gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 68 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>>
>> diff --git gcc/match.pd gcc/match.pd
>> index 47865f1..93c2298 100644
>> --- gcc/match.pd
>> +++ gcc/match.pd
>> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3. If not see
>>    (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>     (bit_ior @0 (bit_not @1))))
>>
>> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1. */
>> +(simplify
>> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
>
> I think you want to restrict this to INTEGER_CST@1

Is this only to make the following test easier (a good enough reason for me)
or is there some fundamental reason why this transformation would be wrong
for vectors?

>> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
>> +                                 @1, build_one_cst (TREE_TYPE (@1))); }
>
> We shouldn't dispatch to fold_binary in patterns. int_const_binop would
> be the appropriate function to use - but what happens for @1 == INT_MAX
> where @1 + 1 overflows? Similar, is this also valid for negative @1
> and thus signed mask types? IMHO we should check whether @1
> is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
> false, TYPE_PRECISION (TREE_TYPE (@1)).
>
> As with the other patch a ChangeLog entry is missing as well as stating
> how you tested the patch.
>
> Thanks,
> Richard.
>
>> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>> +       && cst && integer_pow2p (cst))
>> +   (bit_and (plus @0 @1) (bit_not @1)))))
On Thu, Apr 30, 2015 at 1:44 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Thu, 30 Apr 2015, Richard Biener wrote:
>
>> On Wed, Jan 21, 2015 at 11:49 AM, Rasmus Villemoes
>> <rv@rasmusvillemoes.dk> wrote:
>>>
>>> Generalizing the x+(x&1) pattern, one can round up x to a multiple of
>>> a 2^k by adding the negative of x modulo 2^k. But it is fewer
>>> instructions, and presumably requires fewer registers, to do the more
>>> common (x+m)&~m where m=2^k-1.
>>>
>>> Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
>>> ---
>>>  gcc/match.pd | 9 ++++++
>>>  gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 68 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
>>>
>>> diff --git gcc/match.pd gcc/match.pd
>>> index 47865f1..93c2298 100644
>>> --- gcc/match.pd
>>> +++ gcc/match.pd
>>> @@ -273,6 +273,15 @@ along with GCC; see the file COPYING3. If not see
>>>    (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>>     (bit_ior @0 (bit_not @1))))
>>>
>>> +/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1. */
>>> +(simplify
>>> + (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
>>
>>
>> I think you want to restrict this to INTEGER_CST@1
>
>
> Is this only to make the following test easier (a good enough reason for me)
> or is there some fundamental reason why this transformation would be wrong
> for vectors?

Good question - I suppose it also works for vectors (well, the predicates
don't). For non-integers or complex ints we shouldn't arrive here as we can't
have bit_and for them. For pointers we can't have plus on them.
So yes, it makes the following tests easier. A TODO comment for vectors
might be appropriate (we'd simply need a predicate that can test for all
elements being 2^k-1).

Richard.

>
>>> + (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
>>> +                                 @1, build_one_cst (TREE_TYPE (@1))); }
>>
>>
>> We shouldn't dispatch to fold_binary in patterns. int_const_binop would
>> be the appropriate function to use - but what happens for @1 == INT_MAX
>> where @1 + 1 overflows? Similar, is this also valid for negative @1
>> and thus signed mask types? IMHO we should check whether @1
>> is equal to wi::mask (TYPE_PRECISION (TREE_TYPE (@1)) - wi::clz (@1),
>> false, TYPE_PRECISION (TREE_TYPE (@1)).
>>
>> As with the other patch a ChangeLog entry is missing as well as stating
>> how you tested the patch.
>>
>> Thanks,
>> Richard.
>>
>>> +  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
>>> +       && cst && integer_pow2p (cst))
>>> +   (bit_and (plus @0 @1) (bit_not @1)))))
>
>
> --
> Marc Glisse
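Richard suggests a TODO for vectors, which would need a predicate testing that every element is 2^k-1. Below is a minimal C sketch of such a per-lane check. It assumes the vector constant is available as an array of 32-bit lane values. lane_is_low_mask and all_lanes_low_mask are hypothetical names, not GCC predicates. In place of the wi::mask comparison, it uses the common bit trick (m & (m + 1)) == 0. That trick is safe here because unsigned m + 1 wraps to 0 for an all-ones lane instead of overflowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if M == 2^k - 1 for some k in 0..32.  Unsigned M + 1 wraps to 0
   for the all-ones lane, so the test never overflows.  */
static bool
lane_is_low_mask (uint32_t m)
{
  return (m & (uint32_t) (m + 1u)) == 0;
}

/* Hypothetical vector-constant predicate: every one of the N lanes must
   be of the form 2^k - 1.  k may differ between lanes.  */
static bool
all_lanes_low_mask (const uint32_t *lanes, size_t n)
{
  assert (lanes != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!lane_is_low_mask (lanes[i]))
      return false;
  return true;
}
```

The sketch lets each lane use its own k, since (x + m) & ~m is evaluated independently per lane. A real predicate might instead require one uniform mask across all lanes.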
diff --git gcc/match.pd gcc/match.pd
index 47865f1..93c2298 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -273,6 +273,15 @@ along with GCC; see the file COPYING3. If not see
   (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
    (bit_ior @0 (bit_not @1))))
 
+/* x + ((-x) & m) -> (x + m) & ~m when m == 2^k-1. */
+(simplify
+ (plus:c @0 (bit_and@2 (negate @0) CONSTANT_CLASS_P@1))
+ (with { tree cst = fold_binary (PLUS_EXPR, TREE_TYPE (@1),
+                                 @1, build_one_cst (TREE_TYPE (@1))); }
+  (if ((TREE_CODE (@2) != SSA_NAME || has_single_use (@2))
+       && cst && integer_pow2p (cst))
+   (bit_and (plus @0 @1) (bit_not @1)))))
+
 (simplify
  (abs (negate @0))
  (abs @0))
diff --git gcc/testsuite/gcc.dg/20150120-4.c gcc/testsuite/gcc.dg/20150120-4.c
new file mode 100644
index 0000000..c3552bf
--- /dev/null
+++ gcc/testsuite/gcc.dg/20150120-4.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+/* x + ((-x) & m) -> (x + m) & ~m for m one less than a pow2. */
+int
+fn1 (int x)
+{
+  return x + ((-x) & 7);
+}
+int
+fn2 (int x)
+{
+  return ((-x) & 7) + x;
+}
+unsigned int
+fn3 (unsigned int x)
+{
+  return x + ((-x) & 7);
+}
+unsigned int
+fn4 (unsigned int x)
+{
+  return ((-x) & 7) + x;
+}
+unsigned int
+fn5 (unsigned int x)
+{
+  return x + ((-x) % 8);
+}
+unsigned int
+fn6 (unsigned int x)
+{
+  return ((-x) % 8) + x;
+}
+int
+fn7 (int x)
+{
+  return x + ((-x) & 9);
+}
+int
+fn8 (int x)
+{
+  return ((-x) & 9) + x;
+}
+unsigned int
+fn9 (unsigned int x)
+{
+  return x + ((-x) & ~0U);
+}
+unsigned int
+fn10 (unsigned int x)
+{
+  return ((-x) & ~0U) + x;
+}
+
+
+/* { dg-final { scan-tree-dump-times "x \\+ 7" 6 "original" } } */
+/* { dg-final { scan-tree-dump-times "-x & 9" 2 "original" } } */
+/* { dg-final { scan-tree-dump-times "return 0" 2 "original" } } */
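The testcase above only scans the -fdump-tree-original output. As a complementary runtime sanity check, the unsigned variants can be executed directly. This is a sketch and is not part of the patch. It checks three things:
- (-x) % 8 agrees with (-x) & 7 and with the folded (x + 7) & ~7.
- The all-ones mask of fn9 always yields 0.
- A non-mask constant such as 9 (fn7/fn8) must be left alone, because the rewritten form gives a different result.

The helper names are hypothetical. fn7's int arithmetic is replaced by unsigned int here to avoid signed overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runnable mirrors of the unsigned test functions.  */
static unsigned int mod_form (unsigned int x)    { return x + ((-x) % 8); }   /* fn5 */
static unsigned int and_form (unsigned int x)    { return x + ((-x) & 7); }   /* fn3 */
static unsigned int folded_form (unsigned int x) { return (x + 7) & ~7U; }    /* folded form */
static unsigned int all_ones (unsigned int x)    { return x + ((-x) & ~0U); } /* fn9 */
static unsigned int nine_form (unsigned int x)   { return x + ((-x) & 9); }   /* fn7, unsigned */
static unsigned int nine_wrong (unsigned int x)  { return (x + 9) & ~9U; }    /* invalid rewrite */

/* Check what the dump scans expect on a few values, including ones where
   the additions wrap around.  */
static bool
test_file_expectations_hold (void)
{
  static const unsigned int samples[] = {
    0u, 1u, 7u, 8u, 9u, 1000u, UINT_MAX / 2u, UINT_MAX - 6u, UINT_MAX
  };
  /* 9 is not of the form 2^k - 1, so the rewrite changes the result.  */
  assert (nine_form (1u) != nine_wrong (1u));
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      unsigned int x = samples[i];
      if (mod_form (x) != and_form (x)       /* % 8 is & 7 for unsigned */
          || and_form (x) != folded_form (x)
          || all_ones (x) != 0u)
        return false;
    }
  return true;
}
```

For x = 1, the fn7-style expression yields 10. The invalid (x + 9) & ~9 rewrite would yield 2.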
Generalizing the x+(x&1) pattern, one can round up x to a multiple of
a 2^k by adding the negative of x modulo 2^k. But it is fewer
instructions, and presumably requires fewer registers, to do the more
common (x+m)&~m where m=2^k-1.

Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
---
 gcc/match.pd | 9 ++++++
 gcc/testsuite/gcc.dg/20150120-4.c | 59 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/20150120-4.c
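The rounding identity in the commit message can be checked exhaustively for a narrow type. The sketch below assumes 16-bit unsigned wrap-around arithmetic, emulated with explicit casts. It compares x + ((-x) & m) with (x + m) & ~m for every 16-bit x and every m = 2^k-1 with 0 <= k <= 16. round_up_neg, round_up_mask and rounding_identity_holds_u16 are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x + ((-x) & m), reduced modulo 2^16 by the casts.  */
static uint16_t
round_up_neg (uint16_t x, uint16_t m)
{
  uint16_t neg = (uint16_t) (0u - x);   /* -x mod 2^16 */
  return (uint16_t) (x + (neg & m));
}

/* (x + m) & ~m, reduced modulo 2^16 by the casts.  */
static uint16_t
round_up_mask (uint16_t x, uint16_t m)
{
  return (uint16_t) ((x + m) & (uint16_t) ~m);
}

/* Exhaustively compare both forms for every 16-bit x and every
   m = 2^k - 1, 0 <= k <= 16.  */
static bool
rounding_identity_holds_u16 (void)
{
  for (unsigned k = 0; k <= 16; k++)
    {
      uint16_t m = (uint16_t) (((uint32_t) 1 << k) - 1u);
      assert ((m & (m + 1u)) == 0);   /* m really is 2^k - 1 */
      for (uint32_t x = 0; x <= UINT16_MAX; x++)
        if (round_up_neg ((uint16_t) x, m) != round_up_mask ((uint16_t) x, m))
          return false;
    }
  return true;
}
```

For example, with m = 7 both forms round 13 up to 16. The check covers unsigned wrap-around only. It does not settle the signed-type questions raised in review, where overflow is undefined.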