Message ID | 20190228214315.GH7611@tucnak |
---|---|
State | New |
Headers | show |
Series | ARM cmpsi2_addneg fixes (PR target/89506) | expand |
Hi Jakub, On 2/28/19 9:43 PM, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs on ARM, because the backend creates CONST_INTs > that aren't valid for SImode, in which they are used (0x80000000 rather than > the canonical -0x80000000). This is fixed by the 3 gen_int_mode calls > instead of just GEN_INT. > Once that is fixed, we ICE because if both the CONST_INTs are -0x80000000, > then INTVAL (operands[2]) == -INTVAL (operands[3]) is no longer true and > thus cmpsi2_addneg doesn't match and no other pattern does. > Fixing that is pretty obvious thing to do as well, similarly to the recent > *subsi3_carryin_compare_const fix. > > The next problem is that the result doesn't bootstrap. The problem is that > the instruction emitted for that -0x80000000 immediate - adds reg, reg, #-0x80000000 > actually doesn't do what the RTL pattern for it says. > The pattern is like subsi3_compare, except that the MINUS with > constant second argument is canonicalized as RTL normally does into PLUS of > the negated constant. The mode on the condition code register is CCmode, > so all Z, N, C and V bits should be valid, and the pattern is like that of > a cmp instruction, so the behavior vs. condition codes should be like cmp > instruction, which is what the subs instruction does. The adds r1, r2, #N > and subs r1, r2, #-N instructions actually behave the same most of the time. > The result is the same, Z and N flags are always the same as well. And, > it seems that for all N except for 0 and 0x80000000 also the C and V bits > are the same (I've proved that by using the valgrind subs condition code > algorithm (which is the same as cmp) vs. valgrind adds condition code > algorithm (which is the same as cmn) and even emulated it using 8-bit x > 8-bit exhaustive check). For 0 and 0x80000000 it is different and as can be > seen by the bootstrap failure, it is significant. > Previously before the above change we got by by the pattern never triggering > by the combiner for 0x80000000 (because the combiner would always try > canonical CONST_INTs for both and those don't have INTVAL == -INTVAL), but > it worked for the *compare_scc splitter/*negscc/*thumb2_negscc patterns > (which created invalid RTL and used adds rather than subs, but didn't care, > as it only tested the Z bit using ne condition). > > My next attempt was just to switch the two alternatives, so that if > operands[2] matches both "I" and "L" constraints and we can choose, we'd > use subs. That cured the bootstrap failure, but introduced > +FAIL: gcc.target/arm/pr43920-2.c object-size text <= 54 > +FAIL: gcc.target/arm/thumb2-cmpneg2add-2.c scan-assembler adds > regressions, in those 2 testcases neither the problematic 0 nor INT_MIN > is used, so both adds and subs are equivalent, but > - adds r2, r4, #1 > + subs r2, r4, #-1 > results in larger code. > Note, I've seen the patch creating smaller code in some cases too, > when the combiner matched the cmpsi2_addneg pattern with INT_MIN and > previously emitted loading the constant into some register and performing > subs with 3 registers instead of 2 and immediate. > > So, this is another alternative I'm proposing, just make sure we use > subs for #0 and #-2147483648 (where it is essential for correctness) > and for others keep previous behavior. > > Ok for trunk? Ok. > > Or is there an easy way to estimate if a constant satisfies both "I" and "L" > constraints at the same time and is not one of 0 and INT_MIN, which > of the two (positive or negated) would have smaller encoding and decide > based on that? Don't think there's a cleaner way of doing that. There are some helper functions in arm.c that can estimate the number instructions neeedd to synthesise a constant, but nothing that takes size into account. The encoding size can only change for TARGET_THUMB2 (not TARGET_ARM) but it's not worth gating that decision on TARGET_THUMB2 as well IMO. Thanks, Kyrill > > 2019-02-28 Jakub Jelinek <jakub@redhat.com> > > PR target/89506 > * config/arm/arm.md (cmpsi2_addneg): Use > trunc_int_for_mode (-INTVAL (...), SImode) instead of -INTVAL (...). > If operands[2] is 0 or INT_MIN, force use of subs. > (*compare_scc splitter): Use gen_int_mode. > (*negscc): Likewise. > * config/arm/thumb2.md (*thumb2_negscc): Likewise. > > * gcc.dg/pr89506.c: New test. > > --- gcc/config/arm/arm.md.jj 2019-02-28 14:13:08.368267536 +0100 > +++ gcc/config/arm/arm.md 2019-02-28 22:09:03.089588596 +0100 > @@ -867,10 +867,21 @@ (define_insn "cmpsi2_addneg" > (set (match_operand:SI 0 "s_register_operand" "=r,r") > (plus:SI (match_dup 1) > (match_operand:SI 3 "arm_addimm_operand" "I,L")))] > - "TARGET_32BIT && INTVAL (operands[2]) == -INTVAL (operands[3])" > - "@ > - adds%?\\t%0, %1, %3 > - subs%?\\t%0, %1, #%n3" > + "TARGET_32BIT > + && (INTVAL (operands[2]) > + == trunc_int_for_mode (-INTVAL (operands[3]), SImode))" > +{ > + /* For 0 and INT_MIN it is essential that we use subs, as adds > + will result in different condition codes (like cmn rather than > + like cmp). For other immediates, we should choose whatever > + will have smaller encoding. */ > + if (operands[2] == const0_rtx > + || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000) > + || which_alternative == 1) > + return "subs%?\\t%0, %1, #%n3"; > + else > + return "adds%?\\t%0, %1, %3"; > +} > [(set_attr "conds" "set") > (set_attr "type" "alus_sreg")] > ) > @@ -9302,7 +9313,7 @@ (define_split > (cond_exec (ne:CC (reg:CC CC_REGNUM) (const_int 0)) > (set (match_dup 0) (const_int 1)))] > { > - operands[3] = GEN_INT (-INTVAL (operands[2])); > + operands[3] = gen_int_mode (-INTVAL (operands[2]), SImode); > }) > > (define_split > @@ -10082,7 +10093,8 @@ (define_insn_and_split "*negscc" > /* Emit subs\\t%0, %1, %2\;mvnne\\t%0, #0 */ > if (CONST_INT_P (operands[2])) > emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2], > - GEN_INT (- INTVAL (operands[2])))); > + gen_int_mode (-INTVAL (operands[2]), > + SImode))); > else > emit_insn (gen_subsi3_compare (operands[0], operands[1], operands[2])); > > --- gcc/config/arm/thumb2.md.jj 2019-02-28 08:14:55.970600405 +0100 > +++ gcc/config/arm/thumb2.md 2019-02-28 21:51:20.387902673 +0100 > @@ -913,7 +913,8 @@ (define_insn_and_split "*thumb2_negscc" > /* Emit subs\\t%0, %1, %2\;it\\tne\;mvnne\\t%0, #0 */ > if (CONST_INT_P (operands[2])) > emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2], > - GEN_INT (- INTVAL (operands[2])))); > + gen_int_mode (-INTVAL (operands[2]), > + SImode))); > else > emit_insn (gen_subsi3_compare (operands[0], operands[1], operands[2])); > > --- gcc/testsuite/gcc.dg/pr89506.c.jj 2019-02-28 21:51:20.387902673 +0100 > +++ gcc/testsuite/gcc.dg/pr89506.c 2019-02-28 21:51:20.387902673 +0100 > @@ -0,0 +1,14 @@ > +/* PR target/89506 */ > +/* { dg-do compile } */ > +/* { dg-options "-Og -g -w" } */ > + > +long long a; > +int c; > + > +int > +foo (long long d, short e) > +{ > + __builtin_sub_overflow (0xffffffff, c, &a); > + e >>= ~2147483647 != (int) a; > + return d + e; > +} > > Jakub
On Fri, Mar 01, 2019 at 09:21:16AM +0000, Kyrill Tkachov wrote: > > Ok for trunk? > > Ok. Thanks. I'll wait for my regtest, previously I've regtested it only with an older version of this patch. > > Or is there an easy way to estimate if a constant satisfies both "I" and "L" > > constraints at the same time and is not one of 0 and INT_MIN, which > > of the two (positive or negated) would have smaller encoding and decide > > based on that? > > Don't think there's a cleaner way of doing that. There are some helper > functions in arm.c that can estimate the number instructions neeedd to > synthesise a constant, but nothing that takes size into account. The > encoding size can only change for TARGET_THUMB2 (not TARGET_ARM) but it's > not worth gating that decision on TARGET_THUMB2 as well IMO. Well, with the patch the decision which insn is chosen is done in C code. So it could look like: if (operands[2] == const0_rtx || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000) subs; // mandatory else if (TARGET_THUMB2 && arm_immediate_operand (operands[2], SImode) && arm_neg_immediate_operand (operands[2], SImode)) { // Both adds and subs can do the job here, and unlike // non-thumb mode, the instructions can have different // sizes. Pick the shorter one. } else if (which_alternative == 1) subs; else adds; This can be done incrementally, I just have no idea what the rules for thumb2 constant encoding are. Jakub
On 3/1/19 9:36 AM, Jakub Jelinek wrote: > On Fri, Mar 01, 2019 at 09:21:16AM +0000, Kyrill Tkachov wrote: >>> Ok for trunk? >> Ok. > Thanks. I'll wait for my regtest, previously I've regtested it only with > an older version of this patch. > >>> Or is there an easy way to estimate if a constant satisfies both "I" and "L" >>> constraints at the same time and is not one of 0 and INT_MIN, which >>> of the two (positive or negated) would have smaller encoding and decide >>> based on that? >> Don't think there's a cleaner way of doing that. There are some helper >> functions in arm.c that can estimate the number instructions neeedd to >> synthesise a constant, but nothing that takes size into account. The >> encoding size can only change for TARGET_THUMB2 (not TARGET_ARM) but it's >> not worth gating that decision on TARGET_THUMB2 as well IMO. > Well, with the patch the decision which insn is chosen is done in C code. > So it could look like: > if (operands[2] == const0_rtx > || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000) > subs; // mandatory > else if (TARGET_THUMB2 > && arm_immediate_operand (operands[2], SImode) > && arm_neg_immediate_operand (operands[2], SImode)) > { > // Both adds and subs can do the job here, and unlike > // non-thumb mode, the instructions can have different > // sizes. Pick the shorter one. > } > else if (which_alternative == 1) > subs; > else > adds; > > This can be done incrementally, I just have no idea what the rules > for thumb2 constant encoding are. Yeah, I had considered this. But I don't think it's worth the extra code. I'm not aware of any implementation where adds and subs differ in performance and I'd be surprised so when there's no size advantage I wouldn't be picky. Thanks, Kyrill > > Jakub
On 3/1/19 9:42 AM, Kyrill Tkachov wrote: > > On 3/1/19 9:36 AM, Jakub Jelinek wrote: > > On Fri, Mar 01, 2019 at 09:21:16AM +0000, Kyrill Tkachov wrote: > >>> Ok for trunk? > >> Ok. > > Thanks. I'll wait for my regtest, previously I've regtested it only > with > > an older version of this patch. > > > >>> Or is there an easy way to estimate if a constant satisfies both > "I" and "L" > >>> constraints at the same time and is not one of 0 and INT_MIN, which > >>> of the two (positive or negated) would have smaller encoding and > decide > >>> based on that? > >> Don't think there's a cleaner way of doing that. There are some helper > >> functions in arm.c that can estimate the number instructions neeedd to > >> synthesise a constant, but nothing that takes size into account. The > >> encoding size can only change for TARGET_THUMB2 (not TARGET_ARM) > but it's > >> not worth gating that decision on TARGET_THUMB2 as well IMO. > > Well, with the patch the decision which insn is chosen is done in C > code. > > So it could look like: > > if (operands[2] == const0_rtx > > || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000) > > subs; // mandatory > > else if (TARGET_THUMB2 > > && arm_immediate_operand (operands[2], SImode) > > && arm_neg_immediate_operand (operands[2], SImode)) > > { > > // Both adds and subs can do the job here, and unlike > > // non-thumb mode, the instructions can have different > > // sizes. Pick the shorter one. > > } > > else if (which_alternative == 1) > > subs; > > else > > adds; > > > > This can be done incrementally, I just have no idea what the rules > > for thumb2 constant encoding are. > > > Yeah, I had considered this. But I don't think it's worth the extra > code. I'm not aware of any implementation where adds and subs differ in > performance and I'd be surprised so when there's no size advantage I > wouldn't be picky. Two thoughts collided in my head on the last sentence. I meant to say: "I'm not aware of any implementation where adds and subs differ in performance so when there's no size advantage I wouldn't be picky." Thanks, Kyrill > > Thanks, > > Kyrill > > > > > > Jakub
--- gcc/config/arm/arm.md.jj 2019-02-28 14:13:08.368267536 +0100 +++ gcc/config/arm/arm.md 2019-02-28 22:09:03.089588596 +0100 @@ -867,10 +867,21 @@ (define_insn "cmpsi2_addneg" (set (match_operand:SI 0 "s_register_operand" "=r,r") (plus:SI (match_dup 1) (match_operand:SI 3 "arm_addimm_operand" "I,L")))] - "TARGET_32BIT && INTVAL (operands[2]) == -INTVAL (operands[3])" - "@ - adds%?\\t%0, %1, %3 - subs%?\\t%0, %1, #%n3" + "TARGET_32BIT + && (INTVAL (operands[2]) + == trunc_int_for_mode (-INTVAL (operands[3]), SImode))" +{ + /* For 0 and INT_MIN it is essential that we use subs, as adds + will result in different condition codes (like cmn rather than + like cmp). For other immediates, we should choose whatever + will have smaller encoding. */ + if (operands[2] == const0_rtx + || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000) + || which_alternative == 1) + return "subs%?\\t%0, %1, #%n3"; + else + return "adds%?\\t%0, %1, %3"; +} [(set_attr "conds" "set") (set_attr "type" "alus_sreg")] ) @@ -9302,7 +9313,7 @@ (define_split (cond_exec (ne:CC (reg:CC CC_REGNUM) (const_int 0)) (set (match_dup 0) (const_int 1)))] { - operands[3] = GEN_INT (-INTVAL (operands[2])); + operands[3] = gen_int_mode (-INTVAL (operands[2]), SImode); }) (define_split @@ -10082,7 +10093,8 @@ (define_insn_and_split "*negscc" /* Emit subs\\t%0, %1, %2\;mvnne\\t%0, #0 */ if (CONST_INT_P (operands[2])) emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2], - GEN_INT (- INTVAL (operands[2])))); + gen_int_mode (-INTVAL (operands[2]), + SImode))); else emit_insn (gen_subsi3_compare (operands[0], operands[1], operands[2])); --- gcc/config/arm/thumb2.md.jj 2019-02-28 08:14:55.970600405 +0100 +++ gcc/config/arm/thumb2.md 2019-02-28 21:51:20.387902673 +0100 @@ -913,7 +913,8 @@ (define_insn_and_split "*thumb2_negscc" /* Emit subs\\t%0, %1, %2\;it\\tne\;mvnne\\t%0, #0 */ if (CONST_INT_P (operands[2])) emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2], - GEN_INT (- INTVAL (operands[2])))); + gen_int_mode (-INTVAL (operands[2]), + SImode))); else emit_insn (gen_subsi3_compare (operands[0], operands[1], operands[2])); --- gcc/testsuite/gcc.dg/pr89506.c.jj 2019-02-28 21:51:20.387902673 +0100 +++ gcc/testsuite/gcc.dg/pr89506.c 2019-02-28 21:51:20.387902673 +0100 @@ -0,0 +1,14 @@ +/* PR target/89506 */ +/* { dg-do compile } */ +/* { dg-options "-Og -g -w" } */ + +long long a; +int c; + +int +foo (long long d, short e) +{ + __builtin_sub_overflow (0xffffffff, c, &a); + e >>= ~2147483647 != (int) a; + return d + e; +}