Message ID | 400461904.779176.1309161973367.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com |
---|---|
State | New |
Headers | show |
On Mon, Jun 27, 2011 at 10:06 AM, Kai Tietz <ktietz@redhat.com> wrote: > Hello, > > this patch improves the bswap (32/64) detection and fixes an issue about shifted bit values out of type-size precision. > > ChangeLog > > 2011-06-27 Kai Tietz <ktietz@redhat.com> > > * tree-ssa-math-opts.c (do_shift_rotate): Zero bits > out of type precision after operation. > (find_bswap): Take for limit value the integer auto- > promotion into account. > > ChangeLog > > 2011-06-27 Kai Tietz <ktietz@redhat.com> > > * gcc.dg/optimize-bswapdi-2.c: New test. > > Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? The do_shift_rotate hunk is ok. Where did you get the testcase from? I can easily construct testcases that go via an intermediate SImode step and the still won't be recongized. Thus, is the testcase practically relevant? If so we should add 1 + log2 (byte-size) instead of 3 to also cover the SImode -> HImode intermediate case. Thanks, Richard. > Regards, > Kai >
2011/6/27 Richard Guenther <richard.guenther@gmail.com>: > On Mon, Jun 27, 2011 at 10:06 AM, Kai Tietz <ktietz@redhat.com> wrote: >> Hello, >> >> this patch improves the bswap (32/64) detection and fixes an issue about shifted bit values out of type-size precision. >> >> ChangeLog >> >> 2011-06-27 Kai Tietz <ktietz@redhat.com> >> >> * tree-ssa-math-opts.c (do_shift_rotate): Zero bits >> out of type precision after operation. >> (find_bswap): Take for limit value the integer auto- >> promotion into account. >> >> ChangeLog >> >> 2011-06-27 Kai Tietz <ktietz@redhat.com> >> >> * gcc.dg/optimize-bswapdi-2.c: New test. >> >> Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? > > The do_shift_rotate hunk is ok. Where did you get the testcase from? > I can easily construct testcases that go via an intermediate SImode step > and the still won't be recongized. Thus, is the testcase practically > relevant? If so we should add 1 + log2 (byte-size) instead of 3 to also > cover the SImode -> HImode intermediate case. > > Thanks, > Richard. Ok, I can adjust it to use here 1 + 3 (for byte-size covering the SImode -> HImode intermediate case). I was seen a regression for bswap detection by doing in forward-propagate a type-sinking on (type) X op (type) Y (for X and Y with compatible type), and (type) X op CST (for ((type) (type-x) CST) == CST case. I will sent soon a patch for this optimization related thing. By this I detected that bswap algorithm didn't zero'ed upper bits out of size-range. Ok with that change? Regards, Kai
On Mon, Jun 27, 2011 at 11:50 AM, Kai Tietz <ktietz70@googlemail.com> wrote: > 2011/6/27 Richard Guenther <richard.guenther@gmail.com>: >> On Mon, Jun 27, 2011 at 10:06 AM, Kai Tietz <ktietz@redhat.com> wrote: >>> Hello, >>> >>> this patch improves the bswap (32/64) detection and fixes an issue about shifted bit values out of type-size precision. >>> >>> ChangeLog >>> >>> 2011-06-27 Kai Tietz <ktietz@redhat.com> >>> >>> * tree-ssa-math-opts.c (do_shift_rotate): Zero bits >>> out of type precision after operation. >>> (find_bswap): Take for limit value the integer auto- >>> promotion into account. >>> >>> ChangeLog >>> >>> 2011-06-27 Kai Tietz <ktietz@redhat.com> >>> >>> * gcc.dg/optimize-bswapdi-2.c: New test. >>> >>> Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? >> >> The do_shift_rotate hunk is ok. Where did you get the testcase from? >> I can easily construct testcases that go via an intermediate SImode step >> and the still won't be recongized. Thus, is the testcase practically >> relevant? If so we should add 1 + log2 (byte-size) instead of 3 to also >> cover the SImode -> HImode intermediate case. >> >> Thanks, >> Richard. > > Ok, I can adjust it to use here 1 + 3 (for byte-size covering the > SImode -> HImode intermediate case). I was seen a regression for > bswap detection by doing in forward-propagate a type-sinking on (type) > X op (type) Y (for X and Y with compatible type), and (type) X op CST > (for ((type) (type-x) CST) == CST case. I will sent soon a patch for > this optimization related thing. By this I detected that bswap > algorithm didn't zero'ed upper bits out of size-range. > > Ok with that change? Ok with using 1 + ceil_log2 (TREE_INT_CST_LOW (TYPE_SIZE_UNIT (...))), CSE that TREE_INT_CST_LOW. Thanks, Richard. > Regards, > Kai >
Index: gcc-head/gcc/testsuite/gcc.dg/optimize-bswapdi-2.c =================================================================== --- /dev/null +++ gcc-head/gcc/testsuite/gcc.dg/optimize-bswapdi-2.c @@ -0,0 +1,26 @@ +/* { dg-do compile { target arm*-*-* alpha*-*-* ia64*-*-* x86_64-*-* s390x-*-* powerpc*-*-* rs6000-*-* } } */ +/* { dg-require-effective-target stdint_types } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -fdump-tree-bswap" } */ + +#include <stdint.h> + +/* A variant via unsigned short. */ + +uint64_t +swap64_c (uint64_t x) +{ + uint16_t a0 = x >> 48; + uint16_t a1 = x >> 32; + uint16_t a2 = x >> 16; + uint16_t a3 = x; + + return ((uint64_t) (((a0 >> 8) & 0xff) | ((a0 << 8) & 0xff00))) + | ((uint64_t) (((a1 >> 8) & 0xff) | ((a1 << 8) & 0xff00)) << 16) + | ((uint64_t) (((a2 >> 8) & 0xff) | ((a2 << 8) & 0xff00)) << 32) + | ((uint64_t) (((a3 >> 8) & 0xff) | ((a3 << 8) & 0xff00)) << 48); +} + + +/* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 1 "bswap" } } */ +/* { dg-final { cleanup-tree-dump "bswap" } } */ Index: gcc-head/gcc/tree-ssa-math-opts.c =================================================================== --- gcc-head.orig/gcc/tree-ssa-math-opts.c +++ gcc-head/gcc/tree-ssa-math-opts.c @@ -1543,6 +1543,9 @@ do_shift_rotate (enum tree_code code, default: return false; } + /* Zero unused bits for size. */ + if (n->size < (int)sizeof (HOST_WIDEST_INT)) + n->n &= ((unsigned HOST_WIDEST_INT)1 << (n->size * BITS_PER_UNIT)) - 1; return true; } @@ -1743,12 +1746,12 @@ find_bswap (gimple stmt) /* The last parameter determines the depth search limit. It usually correlates directly to the number of bytes to be touched. We - increase that number by one here in order to also cover signed -> - unsigned conversions of the src operand as can be seen in - libgcc. */ + increase that number by three here in order to also + cover signed -> unsigned converions of the src operand as can be seen + in libgcc, and for initial shift/and operation of the src operand. */ source_expr = find_bswap_1 (stmt, &n, TREE_INT_CST_LOW ( - TYPE_SIZE_UNIT (gimple_expr_type (stmt))) + 1); + TYPE_SIZE_UNIT (gimple_expr_type (stmt))) + 3); if (!source_expr) return NULL_TREE;