Message ID | ZDbjI34T20ewQ2qs@tucnak |
---|---|
State | New |
Series | combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040] |

On 4/12/23 10:58, Jakub Jelinek wrote:
> On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek via Gcc-patches wrote:
>> I've tried the pr108947.c testcase, but I see no differences in the assembly
>> before/after the patch (but dunno if I'm using the right options).
>> The pr109040.c testcase from the patch I don't see the expected zero
>> extension without the patch and do see it with it.
>
> Seems my cross defaulted to 32-bit compilation, reproduced it with
> additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
> So, let's include that test in the patch too:
>
> 2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
>             Jakub Jelinek  <jakub@redhat.com>
>
>     PR target/108947
>     PR target/109040
>     * combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
>     word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
>     smaller than word_mode.
>     * simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
>     <case AND>: Likewise.
>
>     * gcc.dg/pr108947.c: New test.
>     * gcc.c-torture/execute/pr109040.c: New test.

Bootstrap of the v3 patch has completed.  Regression testing is still
spinning.  It should be done and waiting for me when I wake up in the
morning.

jeff

On Wed, Apr 12, 2023 at 10:05:08PM -0600, Jeff Law wrote:
> On 4/12/23 10:58, Jakub Jelinek wrote:
> >Seems my cross defaulted to 32-bit compilation, reproduced it with
> >additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
> >So, let's include that test in the patch too:
> >
> >2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
> >            Jakub Jelinek  <jakub@redhat.com>
> >
> >    PR target/108947
> >    PR target/109040
> >    * combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> >    word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> >    smaller than word_mode.
> >    * simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> >    <case AND>: Likewise.
> >
> >    * gcc.dg/pr108947.c: New test.
> >    * gcc.c-torture/execute/pr109040.c: New test.
> Bootstrap of the v3 patch has completed.  Regression testing is still
> spinning.  It should be done and waiting for me when I wake up in the
> morning.

It's still okay for trunk (of course) if the bootstrap doesn't fail (of
course).  Thanks guys!

Segher

On 4/13/23 04:57, Segher Boessenkool wrote:
> On Wed, Apr 12, 2023 at 10:05:08PM -0600, Jeff Law wrote:
>> On 4/12/23 10:58, Jakub Jelinek wrote:
>>> Seems my cross defaulted to 32-bit compilation, reproduced it with
>>> additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
>>> So, let's include that test in the patch too:
>>>
>>> 2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
>>>             Jakub Jelinek  <jakub@redhat.com>
>>>
>>>     PR target/108947
>>>     PR target/109040
>>>     * combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
>>>     word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
>>>     smaller than word_mode.
>>>     * simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
>>>     <case AND>: Likewise.
>>>
>>>     * gcc.dg/pr108947.c: New test.
>>>     * gcc.c-torture/execute/pr109040.c: New test.
>> Bootstrap of the v3 patch has completed.  Regression testing is still
>> spinning.  It should be done and waiting for me when I wake up in the
>> morning.
>
> It's still okay for trunk (of course) if the bootstrap doesn't fail (of
> course).  Thanks guys!

Bootstrap was successful with v3, but there's hundreds of testsuite
failures due to the simplify-rtx hunk.  compile/20070520-1.c for example
when compiled with:

  -O3 -funroll-loops -march=rv64gc -mabi=lp64d

Thursdays are my hell day.  It's unlikely I'd be able to look at this at
all today.

typedef unsigned char uint8_t;
extern uint8_t ff_cropTbl[256 + 2 * 1024];

void ff_pred8x8_plane_c(uint8_t *src, int stride){
  int j, k;
  int a;
  uint8_t *cm = ff_cropTbl + 1024;
  const uint8_t * const src0 = src+3-stride;
  const uint8_t *src1 = src+4*stride-1;
  const uint8_t *src2 = src1-2*stride;
  int H = src0[1] - src0[-1];
  int V = src1[0] - src2[ 0];
  for(k=2; k<=4; ++k) {
    src1 += stride; src2 -= stride;
    H += k*(src0[k] - src0[-k]);
    V += k*(src1[0] - src2[ 0]);
  }
  H = ( 17*H+16 ) >> 5;
  V = ( 17*V+16 ) >> 5;

  a = 16*(src1[0] + src2[8]+1) - 3*(V+H);
  for(j=8; j>0; --j) {
    int b = a;
    a += V;
    src[0] = cm[ (b ) >> 5 ];
    src[1] = cm[ (b+ H) >> 5 ];
    src[2] = cm[ (b+2*H) >> 5 ];
    src[3] = cm[ (b+3*H) >> 5 ];
    src[4] = cm[ (b+4*H) >> 5 ];
    src[5] = cm[ (b+5*H) >> 5 ];
    src[6] = cm[ (b+6*H) >> 5 ];
    src[7] = cm[ (b+7*H) >> 5 ];
    src += stride;
  }
}

Jeff

--- gcc/combine.cc.jj	2023-04-07 16:02:06.668051629 +0200
+++ gcc/combine.cc	2023-04-12 11:24:18.458240028 +0200
@@ -10055,9 +10055,12 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* See what bits may be nonzero in VAROP.  Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
-     MODE.  */
+     MODE unless WORD_REGISTER_OPERATIONS is true.  */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  scalar_int_mode tmode = mode;
+  if (WORD_REGISTER_OPERATIONS && GET_MODE_BITSIZE (mode) < BITS_PER_WORD)
+    tmode = word_mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
      Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
@@ -10071,7 +10074,7 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* If VAROP is a NEG of something known to be zero or 1 and CONSTOP is
      a power of two, we can replace this with an ASHIFT.  */
-  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), mode) == 1
+  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), tmode) == 1
       && (i = exact_log2 (constop)) >= 0)
     return simplify_shift_const (NULL_RTX, ASHIFT, mode, XEXP (varop, 0), i);
 
--- gcc/simplify-rtx.cc.jj	2023-03-02 19:09:45.459594212 +0100
+++ gcc/simplify-rtx.cc	2023-04-12 11:26:26.027400305 +0200
@@ -3752,7 +3752,13 @@ simplify_context::simplify_binary_operat
         return op0;
       if (HWI_COMPUTABLE_MODE_P (mode))
         {
-          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+          /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+             nonzero bits in WORD_MODE rather than MODE.  */
+          scalar_int_mode tmode = as_a <scalar_int_mode> (mode);
+          if (WORD_REGISTER_OPERATIONS
+              && GET_MODE_BITSIZE (tmode) < BITS_PER_WORD)
+            tmode = word_mode;
+          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, tmode);
           HOST_WIDE_INT nzop1;
           if (CONST_INT_P (trueop1))
             {
--- gcc/testsuite/gcc.dg/pr108947.c.jj	2023-04-12 18:54:13.115630365 +0200
+++ gcc/testsuite/gcc.dg/pr108947.c	2023-04-12 18:53:21.166372386 +0200
@@ -0,0 +1,21 @@
+/* PR target/108947 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-forward-propagate -Wno-psabi" } */
+
+typedef unsigned short __attribute__((__vector_size__ (2 * sizeof (short)))) V;
+
+__attribute__((__noipa__)) V
+foo (V v)
+{
+  V w = 3 > (v & 3992);
+  return w;
+}
+
+int
+main ()
+{
+  V w = foo ((V) { 0, 9 });
+  if (w[0] != 0xffff || w[1] != 0)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr109040.c.jj	2023-04-12 11:11:56.728938344 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr109040.c	2023-04-12 11:11:56.728938344 +0200
@@ -0,0 +1,23 @@
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
+  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+    __builtin_abort ();
+  return 0;
+}