Message ID: 20180320221023.9712-1-jimw@sifive.com
State: New
Series: [PR84660] Fix combine bug with SHIFT_COUNT_TRUNCATED.
On Tue, Mar 20, 2018 at 11:10 PM, Jim Wilson <jimw@sifive.com> wrote:
> This fixes a wrong-code issue on RISC-V, but in theory could be a problem for
> any SHIFT_COUNT_TRUNCATED target.
>
> The testcase computes 46 or 47 (0x2e or 0x2f), then ANDs the value with 0xf,
> and then SHIFTs the value.  On a SHIFT_COUNT_TRUNCATED target, the AND can get
> optimized away because for a 32-bit shift we can use the 46/47 directly and
> still get the correct result.  Combine then tries to convert the 32-bit shift
> into a 64-bit shift, and then we have a problem, as the AND 0xf is no longer
> redundant.  So we must prevent combine from converting a 32-bit shift to a
> 64-bit shift on a SHIFT_COUNT_TRUNCATED target when there are non-zero bits
> in the shift count that matter for the larger shift mode.
>
> Combine already has code to handle the case where shifts are being narrowed
> and this accidentally changes the shift amount.  This patch adds additional
> code to handle the case where shifts are being widened.
>
> This was tested with a cross riscv64-linux toolchain build and make check.
> The new testcase fails without the patch, and works with the patch.  This was
> also tested with an x86_64-linux build and make check.  There were no
> regressions.
>
> OK?

IMHO the real issue is that SHIFT_COUNT_TRUNCATED is used for
optimizing away the and early.  We then rely on the compiler not
assuming anything on the value.  Like if you'd do that on GIMPLE,
VRP would come along and second-guess you because the shift
value may not be too large.  This combine issue sounds very similar.

So I'm suggesting again to instead provide patterns that
match (shft A (and B cst))

Richard.

> Jim
>
> gcc/
> 	PR rtl-optimization/84660
> 	* combine.c (force_int_to_mode) <ASHIFT>: If SHIFT_COUNT_TRUNCATED,
> 	call nonzero_bits and compare against xmode precision.
>
> gcc/testsuite/
> 	PR rtl-optimization/84660
> 	* gcc.dg/pr84660.c: New.
> ---
>  gcc/combine.c                  | 10 ++++++++--
>  gcc/testsuite/gcc.dg/pr84660.c | 17 +++++++++++++++++
>  2 files changed, 25 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr84660.c
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index ff672ad2adb..4ed59eb88c8 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -8897,14 +8897,20 @@ force_int_to_mode (rtx x, scalar_int_mode mode, scalar_int_mode xmode,
> 	 However, we cannot do anything with shifts where we cannot
> 	 guarantee that the counts are smaller than the size of the mode
> 	 because such a count will have a different meaning in a
> -	 wider mode.  */
> +	 different mode.  If we are narrowing the mode, the shift count must
> +	 be compared against mode.  If we are widening the mode, and shift
> +	 counts are truncated, then the shift count must be compared against
> +	 xmode.  */
>
>        if (! (CONST_INT_P (XEXP (x, 1))
> 	     && INTVAL (XEXP (x, 1)) >= 0
> 	     && INTVAL (XEXP (x, 1)) < GET_MODE_PRECISION (mode))
> 	  && ! (GET_MODE (XEXP (x, 1)) != VOIDmode
> 		&& (nonzero_bits (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
> -		    < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (mode))))
> +		    < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (mode))
> +		&& (! SHIFT_COUNT_TRUNCATED
> +		    || (nonzero_bits (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
> +			< (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (xmode)))))
> 	break;
>
>       /* If the shift count is a constant and we can do arithmetic in
> diff --git a/gcc/testsuite/gcc.dg/pr84660.c b/gcc/testsuite/gcc.dg/pr84660.c
> new file mode 100644
> index 00000000000..a87fa0a914d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr84660.c
> @@ -0,0 +1,17 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +unsigned int __attribute__ ((noinline, noclone))
> +foo(unsigned int i) {
> +
> +  return 0xFFFF & (0xd066 << (((i & 0x1) ^ 0x2f) & 0xf));
> +}
> +
> +int main() {
> +  if (foo (1) != 0x8000)
> +    abort ();
> +  exit (0);
> +}
> --
> 2.14.1
On Wed, Mar 21, 2018 at 2:45 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Mar 20, 2018 at 11:10 PM, Jim Wilson <jimw@sifive.com> wrote:
>> This fixes a wrong-code issue on RISC-V, but in theory could be a problem for
>> any SHIFT_COUNT_TRUNCATED target.
>
> IMHO the real issue is that SHIFT_COUNT_TRUNCATED is used for
> optimizing away the and early.  We then rely on the compiler not
> assuming anything on the value.  Like if you'd do that on GIMPLE
> VRP would come along and second-guess you because the shift
> value may not be too large.  This combine issue sounds very similar.
>
> So I'm suggesting again to instead provide patterns that
> match (shft A (and B cst))

I found a message from 2013 that talks about this.
  https://gcc.gnu.org/ml/gcc-patches/2013-11/msg00169.html
I can try looking at this.

I'd question the timing though.  I don't think that trying to rewrite
shift patterns in the RISC-V port at this point in the release process
is a good idea.  I think we should fix the combine bug now with a
simple patch, and try reworking the shift support after the gcc-8
release.

Jim
On Thu, Mar 22, 2018 at 6:58 PM, Jim Wilson <jimw@sifive.com> wrote:
> On Wed, Mar 21, 2018 at 2:45 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Mar 20, 2018 at 11:10 PM, Jim Wilson <jimw@sifive.com> wrote:
>>> This fixes a wrong-code issue on RISC-V, but in theory could be a problem for
>>> any SHIFT_COUNT_TRUNCATED target.
>
>> IMHO the real issue is that SHIFT_COUNT_TRUNCATED is used for
>> optimizing away the and early.  We then rely on the compiler not
>> assuming anything on the value.  Like if you'd do that on GIMPLE
>> VRP would come along and second-guess you because the shift
>> value may not be too large.  This combine issue sounds very similar.
>>
>> So I'm suggesting again to instead provide patterns that
>> match (shft A (and B cst))
>
> I found a message from 2013 that talks about this.
>   https://gcc.gnu.org/ml/gcc-patches/2013-11/msg00169.html
> I can try looking at this.
>
> I'd question the timing though.  I don't think that trying to rewrite
> shift patterns in the RISC-V port at this point in the release process
> is a good idea.  I think we should fix the combine bug now with a
> simple patch, and try reworking the shift support after the gcc-8
> release.

I'm leaving the "simple" combiner patch to review by others
but for RISC-V you could simply #define SHIFT_COUNT_TRUNCATED to zero
to fix the issue.  Then add patterns if it turns out to be required
to avoid regressions.  For example x86 has

;; Avoid useless masking of count operand.
(define_insn_and_split "*<shift_insn><mode>3_mask"
  [(set (match_operand:SWI48 0 "nonimmediate_operand")
	(any_shiftrt:SWI48
	  (match_operand:SWI48 1 "nonimmediate_operand")
	  (subreg:QI
	    (and:SI
	      (match_operand:SI 2 "register_operand" "c,r")
	      (match_operand:SI 3 "const_int_operand")) 0)))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
      == GET_MODE_BITSIZE (<MODE>mode)-1
   && can_create_pseudo_p ()"
  "#"
  "&& 1"
  [(parallel
     [(set (match_dup 0)
	   (any_shiftrt:SWI48 (match_dup 1) (match_dup 2)))
      (clobber (reg:CC FLAGS_REG))])]
  "operands[2] = gen_lowpart (QImode, operands[2]);"
  [(set_attr "isa" "*,bmi2")])

not sure why it uses define_insn_and_split here.  The splitter exactly
re-introduces the issue you hit with SHIFT_COUNT_TRUNCATED
simplification...  I suppose shift expanders of the x86 backend ensure
QImode shift counts here to more reliably match the above.

Richard.

> Jim
On Fri, Mar 23, 2018 at 5:33 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> I'm leaving the "simple" combiner patch to review by others
> but for RISC-V you could simply #define SHIFT_COUNT_TRUNCATED to zero
> to fix the issue.  Then add patterns if it turns out to be required
> to avoid regressions.  For example x86 has

I checked in a patch to do this.  It took me quite a few iterations to
work out how to do it without regressions, but it looks OK in the end.
6 additional patterns and changing the shift count mode to QImode in
all existing shift patterns, and I get effectively the same code as
before with SHIFT_COUNT_TRUNCATED turned off.  With DImode shift
counts, I ran into problems with unnecessary sign/zero extensions from
SImode for shift counts.  With SImode shift counts, I ran into
problems with unnecessary truncations from DImode.  Using QImode shift
counts avoided both of those problems.

That leaves the other targets still using SHIFT_COUNT_TRUNCATED as
potentially broken.  I count 16 of them.  Though this particular
testcase might not be triggerable on most of them, since it requires
shift patterns for two different modes, and most 32-bit targets
probably only have one mode that shift is supported for.

Jim
On Tue, Apr 3, 2018 at 12:48 AM, Jim Wilson <jimw@sifive.com> wrote:
> On Fri, Mar 23, 2018 at 5:33 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> I'm leaving the "simple" combiner patch to review by others
>> but for RISC-V you could simply #define SHIFT_COUNT_TRUNCATED to zero
>> to fix the issue.  Then add patterns if it turns out to be required
>> to avoid regressions.  For example x86 has
>
> I checked in a patch to do this.  It took me quite a few iterations to
> work out how to do it without regressions, but it looks OK in the end.
> 6 additional patterns and changing the shift count mode to QImode in
> all existing shift patterns and I get effectively the same code as
> before with SHIFT_COUNT_TRUNCATED turned off.  With DImode shift
> counts, I ran into problems with unnecessary sign/zero extensions from
> SImode for shift counts.  With SImode shift counts, I ran into
> problems with unnecessary truncations from DImode.  Using QImode shift
> counts avoided both of those problems.
>
> That leaves the other targets still using SHIFT_COUNT_TRUNCATED as
> potentially broken.  I count 16 of them.  Though this particular
> testcase might not be triggerable on most of them, since it requires
> shift patterns for two different modes, and most 32-bit targets
> probably only have one mode that shift is supported for.

Thanks a lot for going this route for fixing.  I only count sparc as an
important platform unconditionally defining it to 1; the rest are weird
embedded targets or targets with at least one mode where it is set to
zero.  OK, maybe add mips to the list of targets.

So I'd be happy to just kill it off early during GCC 9 development...
But I guess I proposed this before and people weren't entirely happy.

Richard.

> Jim
diff --git a/gcc/combine.c b/gcc/combine.c
index ff672ad2adb..4ed59eb88c8 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8897,14 +8897,20 @@ force_int_to_mode (rtx x, scalar_int_mode mode, scalar_int_mode xmode,
 	 However, we cannot do anything with shifts where we cannot
 	 guarantee that the counts are smaller than the size of the mode
 	 because such a count will have a different meaning in a
-	 wider mode.  */
+	 different mode.  If we are narrowing the mode, the shift count must
+	 be compared against mode.  If we are widening the mode, and shift
+	 counts are truncated, then the shift count must be compared against
+	 xmode.  */
 
       if (! (CONST_INT_P (XEXP (x, 1))
 	     && INTVAL (XEXP (x, 1)) >= 0
 	     && INTVAL (XEXP (x, 1)) < GET_MODE_PRECISION (mode))
 	  && ! (GET_MODE (XEXP (x, 1)) != VOIDmode
 		&& (nonzero_bits (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
-		    < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (mode))))
+		    < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (mode))
+		&& (! SHIFT_COUNT_TRUNCATED
+		    || (nonzero_bits (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
+			< (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (xmode)))))
 	break;
 
       /* If the shift count is a constant and we can do arithmetic in
diff --git a/gcc/testsuite/gcc.dg/pr84660.c b/gcc/testsuite/gcc.dg/pr84660.c
new file mode 100644
index 00000000000..a87fa0a914d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr84660.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+extern void abort (void);
+extern void exit (int);
+
+unsigned int __attribute__ ((noinline, noclone))
+foo(unsigned int i) {
+
+  return 0xFFFF & (0xd066 << (((i & 0x1) ^ 0x2f) & 0xf));
+}
+
+int main() {
+  if (foo (1) != 0x8000)
+    abort ();
+  exit (0);
+}