Message ID | 20160126212616.GK3017@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On Jan 26, 2016, at 1:26 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> will do cc1plus size comparison afterwards.
We know the dynamic check is larger. You can’t tell the advantage of speed from size. Better would be to time compiling any random large translation unit.
Nice to see that only 14 calls remain, that’s way better than the 34 I thought.
On Tue, Jan 26, 2016 at 01:55:41PM -0800, Mike Stump wrote: > On Jan 26, 2016, at 1:26 PM, Jakub Jelinek <jakub@redhat.com> wrote: > > will do cc1plus size comparison afterwards. > > We know the dynamic check is larger. You can’t tell the advantage of > speed from size. Better would be to time compiling any random large > translation unit. > > Nice to see that only 14 calls remain, that’s way better than the 34 I > thought. So, it seems probably the PR65656 changes made things actually significantly worse, while I see a (small) difference in the generated code between the two patches if I compile say tree-ssa-ccp.c with g++ 5.x, in the bootstrapped compiler there is no difference at all, the compilers with either patch have identical objdump -dr cc1plus. Already at *.gimple time all the relevant __builtin_constant_p are resolved and it seems all to 0. So please agree on one of the two patches (don't care which), and I'll try to distill a testcase to look at for PR65656. Jakub
On Wed, Jan 27, 2016 at 12:41 AM, Jakub Jelinek <jakub@redhat.com> wrote: > On Tue, Jan 26, 2016 at 01:55:41PM -0800, Mike Stump wrote: >> On Jan 26, 2016, at 1:26 PM, Jakub Jelinek <jakub@redhat.com> wrote: >> > will do cc1plus size comparison afterwards. >> >> We know the dynamic check is larger. You can’t tell the advantage of >> speed from size. Better would be to time compiling any random large >> translation unit. >> >> Nice to see that only 14 calls remain, that’s way better than the 34 I >> thought. > > So, it seems probably the PR65656 changes made things actually significantly > worse, while I see a (small) difference in the generated code between the two > patches if I compile say tree-ssa-ccp.c with g++ 5.x, in the bootstrapped > compiler there is no difference at all, the compilers with either patch > have identical objdump -dr cc1plus. Already at *.gimple time all the > relevant __builtin_constant_p are resolved and it seems all to 0. > > So please agree on one of the two patches (don't care which), and I'll try > to distill a testcase to look at for PR65656. I don't care if the wi::lshift by LOG2_BITS_PER_UNIT done by get_ref_base_and_extent are compiled to as good quality as before (I suspect it doesn't matter in this case as the shift amount is constant, but maybe due to PR65656 the non-STATIC_CONSTANT_P variant is better). Richard. > Jakub
On Wed, Jan 27, 2016 at 11:38:33AM +0100, Richard Biener wrote: > On Wed, Jan 27, 2016 at 12:41 AM, Jakub Jelinek <jakub@redhat.com> wrote: > > On Tue, Jan 26, 2016 at 01:55:41PM -0800, Mike Stump wrote: > >> On Jan 26, 2016, at 1:26 PM, Jakub Jelinek <jakub@redhat.com> wrote: > >> > will do cc1plus size comparison afterwards. > >> > >> We know the dynamic check is larger. You can’t tell the advantage of > >> speed from size. Better would be to time compiling any random large > >> translation unit. > >> > >> Nice to see that only 14 calls remain, that’s way better than the 34 I > >> thought. > > > > So, it seems probably the PR65656 changes made things actually significantly > > worse, while I see a (small) difference in the generated code between the two > > patches if I compile say tree-ssa-ccp.c with g++ 5.x, in the bootstrapped > > compiler there is no difference at all, the compilers with either patch > > have identical objdump -dr cc1plus. Already at *.gimple time all the > > relevant __builtin_constant_p are resolved and it seems all to 0. > > > > So please agree on one of the two patches (don't care which), and I'll try > > to distill a testcase to look at for PR65656. > > I don't care if the wi::lshift by LOG2_BITS_PER_UNIT done by > get_ref_base_and_extent > are compiled to as good quality as before (I suspect it doesn't matter > in this case > as the shift amount is constant, but maybe due to PR65656 the > non-STATIC_CONSTANT_P > variant is better). Ok, I'll commit the non-STATIC_CONSTANT_P variant for now, we can revisit it later. I have distilled a testcase for Jason in PR65656. It seems the problematic STATIC_CONSTANT_P is only if it is inside of the condition of ?: expression. So we could work around it by using something like: - if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT) - ? (STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT - 1) - && xi.len == 1 - && xi.val[0] <= (HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) - HOST_WIDE_INT_MAX >> shift)) - : precision <= HOST_BITS_PER_WIDE_INT) + bool fast_path = false; + if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT) + { + if (STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT - 1) + && xi.len == 1 + && xi.val[0] <= (HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) + HOST_WIDE_INT_MAX >> shift)) + fast_path = true; + } + else if (precision <= HOST_BITS_PER_WIDE_INT) + fast_path = true; + if (fast_path) { val[0] = xi.ulow () << shift; result.set_len (1); } else result.set_len (lshift_large (val, xi.val, xi.len, precision, shift)); and similarly in lrshift. Jakub
--- gcc/wide-int.h.jj 2016-01-04 14:55:50.000000000 +0100 +++ gcc/wide-int.h 2016-01-26 19:05:20.715269366 +0100 @@ -2909,7 +2909,9 @@ wi::lrshift (const T1 &x, const T2 &y) For variable-precision integers like wide_int, handle HWI and sub-HWI integers inline. */ if (STATIC_CONSTANT_P (xi.precision > HOST_BITS_PER_WIDE_INT) - ? xi.len == 1 && xi.val[0] >= 0 + ? (STATIC_CONSTANT_P (shift < HOST_BITS_PER_WIDE_INT) + && xi.len == 1 + && xi.val[0] >= 0) : xi.precision <= HOST_BITS_PER_WIDE_INT) { val[0] = xi.to_uhwi () >> shift; --- gcc/testsuite/gcc.dg/torture/pr69399.c.jj 2016-01-26 21:39:45.732927748 +0100 +++ gcc/testsuite/gcc.dg/torture/pr69399.c 2016-01-26 21:41:09.081753051 +0100 @@ -0,0 +1,18 @@ +/* { dg-do run { target int128 } } */ + +static unsigned __attribute__((noinline, noclone)) +foo (unsigned long long u) +{ + unsigned __int128 v = u | 0xffffff81U; + v >>= 64; + return v; +} + +int +main () +{ + unsigned x = foo (27); + if (x != 0) + __builtin_abort (); + return 0; +}