diff mbox series

ARM cmpsi2_addneg fixes (PR target/89506)

Message ID 20190228214315.GH7611@tucnak
State New
Headers show
Series ARM cmpsi2_addneg fixes (PR target/89506) | expand

Commit Message

Jakub Jelinek Feb. 28, 2019, 9:43 p.m. UTC
Hi!

The following testcase ICEs on ARM, because the backend creates CONST_INTs
that aren't valid for SImode, in which they are used (0x80000000 rather than
the canonical -0x80000000).  This is fixed by the 3 gen_int_mode calls
instead of just GEN_INT.
Once that is fixed, we ICE because if both the CONST_INTs are -0x80000000,
then INTVAL (operands[2]) == -INTVAL (operands[3]) is no longer true and
thus cmpsi2_addneg doesn't match and no other pattern does.
Fixing that is pretty obvious thing to do as well, similarly to the recent
*subsi3_carryin_compare_const fix.

The next problem is that the result doesn't bootstrap.  The problem is that
the instruction emitted for that -0x80000000 immediate - adds reg, reg, #-0x80000000
actually doesn't do what the RTL pattern for it says.
The pattern is like subsi3_compare, except that the MINUS with
constant second argument is canonicalized as RTL normally does into PLUS of
the negated constant.  The mode on the condition code register is CCmode,
so all Z, N, C and V bits should be valid, and the pattern is like that of
a cmp instruction, so the behavior vs. condition codes should be like cmp
instruction, which is what the subs instruction does.  The adds r1, r2, #N
and subs r1, r2, #-N instructions actually behave the same most of the time.
The result is the same, Z and N flags are always the same as well.  And,
it seems that for all N except for 0 and 0x80000000 also the C and V bits
are the same (I've proved that by using the valgrind subs condition code
algorithm (which is the same as cmp) vs. valgrind adds condition code
algorithm (which is the same as cmn) and even emulated it using 8-bit x
8-bit exhaustive check).  For 0 and 0x80000000 it is different and as can be
seen by the bootstrap failure, it is significant.
Previously before the above change we got by by the pattern never triggering
by the combiner for 0x80000000 (because the combiner would always try
canonical CONST_INTs for both and those don't have INTVAL == -INTVAL), but
it worked for the *compare_scc splitter/*negscc/*thumb2_negscc patterns
(which created invalid RTL and used adds rather than subs, but didn't care,
as it only tested the Z bit using ne condition).

My next attempt was just to switch the two alternatives, so that if
operands[2] matches both "I" and "L" constraints and we can choose, we'd
use subs.  That cured the bootstrap failure, but introduced
+FAIL: gcc.target/arm/pr43920-2.c object-size text <= 54
+FAIL: gcc.target/arm/thumb2-cmpneg2add-2.c scan-assembler adds
regressions, in those 2 testcases neither the problematic 0 nor INT_MIN
is used, so both adds and subs are equivalent, but
-	adds	r2, r4, #1
+	subs	r2, r4, #-1
results in larger code.
Note, I've seen the patch creating smaller code in some cases too,
when the combiner matched the cmpsi2_addneg pattern with INT_MIN and
previously emitted loading the constant into some register and performing
subs with 3 registers instead of 2 and immediate.

So, this is another alternative I'm proposing, just make sure we use
subs for #0 and #-2147483648 (where it is essential for correctness)
and for others keep previous behavior.

Ok for trunk?

Or is there an easy way to estimate if a constant satisfies both "I" and "L"
constraints at the same time and is not one of 0 and INT_MIN, which
of the two (positive or negated) would have smaller encoding and decide
based on that?

2019-02-28  Jakub Jelinek  <jakub@redhat.com>

	PR target/89506
	* config/arm/arm.md (cmpsi2_addneg): Use
	trunc_int_for_mode (-INTVAL (...), SImode) instead of -INTVAL (...).
	If operands[2] is 0 or INT_MIN, force use of subs.
	(*compare_scc splitter): Use gen_int_mode.
	(*negscc): Likewise.
	* config/arm/thumb2.md (*thumb2_negscc): Likewise.

	* gcc.dg/pr89506.c: New test.


	Jakub

Comments

Kyrill Tkachov March 1, 2019, 9:21 a.m. UTC | #1
Hi Jakub,

On 2/28/19 9:43 PM, Jakub Jelinek wrote:
> Hi!
>
> The following testcase ICEs on ARM, because the backend creates CONST_INTs
> that aren't valid for SImode, in which they are used (0x80000000 rather than
> the canonical -0x80000000).  This is fixed by the 3 gen_int_mode calls
> instead of just GEN_INT.
> Once that is fixed, we ICE because if both the CONST_INTs are -0x80000000,
> then INTVAL (operands[2]) == -INTVAL (operands[3]) is no longer true and
> thus cmpsi2_addneg doesn't match and no other pattern does.
> Fixing that is pretty obvious thing to do as well, similarly to the recent
> *subsi3_carryin_compare_const fix.
>
> The next problem is that the result doesn't bootstrap.  The problem is that
> the instruction emitted for that -0x80000000 immediate - adds reg, reg, #-0x80000000
> actually doesn't do what the RTL pattern for it says.
> The pattern is like subsi3_compare, except that the MINUS with
> constant second argument is canonicalized as RTL normally does into PLUS of
> the negated constant.  The mode on the condition code register is CCmode,
> so all Z, N, C and V bits should be valid, and the pattern is like that of
> a cmp instruction, so the behavior vs. condition codes should be like cmp
> instruction, which is what the subs instruction does.  The adds r1, r2, #N
> and subs r1, r2, #-N instructions actually behave the same most of the time.
> The result is the same, Z and N flags are always the same as well.  And,
> it seems that for all N except for 0 and 0x80000000 also the C and V bits
> are the same (I've proved that by using the valgrind subs condition code
> algorithm (which is the same as cmp) vs. valgrind adds condition code
> algorithm (which is the same as cmn) and even emulated it using 8-bit x
> 8-bit exhaustive check).  For 0 and 0x80000000 it is different and as can be
> seen by the bootstrap failure, it is significant.
> Previously before the above change we got by by the pattern never triggering
> by the combiner for 0x80000000 (because the combiner would always try
> canonical CONST_INTs for both and those don't have INTVAL == -INTVAL), but
> it worked for the *compare_scc splitter/*negscc/*thumb2_negscc patterns
> (which created invalid RTL and used adds rather than subs, but didn't care,
> as it only tested the Z bit using ne condition).
>
> My next attempt was just to switch the two alternatives, so that if
> operands[2] matches both "I" and "L" constraints and we can choose, we'd
> use subs.  That cured the bootstrap failure, but introduced
> +FAIL: gcc.target/arm/pr43920-2.c object-size text <= 54
> +FAIL: gcc.target/arm/thumb2-cmpneg2add-2.c scan-assembler adds
> regressions, in those 2 testcases neither the problematic 0 nor INT_MIN
> is used, so both adds and subs are equivalent, but
> -	adds	r2, r4, #1
> +	subs	r2, r4, #-1
> results in larger code.
> Note, I've seen the patch creating smaller code in some cases too,
> when the combiner matched the cmpsi2_addneg pattern with INT_MIN and
> previously emitted loading the constant into some register and performing
> subs with 3 registers instead of 2 and immediate.
>
> So, this is another alternative I'm proposing, just make sure we use
> subs for #0 and #-2147483648 (where it is essential for correctness)
> and for others keep previous behavior.
>
> Ok for trunk?

Ok.


>
> Or is there an easy way to estimate if a constant satisfies both "I" and "L"
> constraints at the same time and is not one of 0 and INT_MIN, which
> of the two (positive or negated) would have smaller encoding and decide
> based on that?

Don't think there's a cleaner way of doing that. There are some helper 
functions in arm.c that can estimate the number instructions neeedd to 
synthesise a constant, but nothing that takes size into account. The 
encoding size can only change for TARGET_THUMB2 (not TARGET_ARM) but 
it's not worth gating that decision on TARGET_THUMB2 as well IMO.

Thanks,

Kyrill

>
> 2019-02-28  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR target/89506
> 	* config/arm/arm.md (cmpsi2_addneg): Use
> 	trunc_int_for_mode (-INTVAL (...), SImode) instead of -INTVAL (...).
> 	If operands[2] is 0 or INT_MIN, force use of subs.
> 	(*compare_scc splitter): Use gen_int_mode.
> 	(*negscc): Likewise.
> 	* config/arm/thumb2.md (*thumb2_negscc): Likewise.
>
> 	* gcc.dg/pr89506.c: New test.
>
> --- gcc/config/arm/arm.md.jj	2019-02-28 14:13:08.368267536 +0100
> +++ gcc/config/arm/arm.md	2019-02-28 22:09:03.089588596 +0100
> @@ -867,10 +867,21 @@ (define_insn "cmpsi2_addneg"
>      (set (match_operand:SI 0 "s_register_operand" "=r,r")
>   	(plus:SI (match_dup 1)
>   		 (match_operand:SI 3 "arm_addimm_operand" "I,L")))]
> -  "TARGET_32BIT && INTVAL (operands[2]) == -INTVAL (operands[3])"
> -  "@
> -   adds%?\\t%0, %1, %3
> -   subs%?\\t%0, %1, #%n3"
> +  "TARGET_32BIT
> +   && (INTVAL (operands[2])
> +       == trunc_int_for_mode (-INTVAL (operands[3]), SImode))"
> +{
> +  /* For 0 and INT_MIN it is essential that we use subs, as adds
> +     will result in different condition codes (like cmn rather than
> +     like cmp).  For other immediates, we should choose whatever
> +     will have smaller encoding.  */
> +  if (operands[2] == const0_rtx
> +      || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000)
> +      || which_alternative == 1)
> +    return "subs%?\\t%0, %1, #%n3";
> +  else
> +    return "adds%?\\t%0, %1, %3";
> +}
>     [(set_attr "conds" "set")
>      (set_attr "type" "alus_sreg")]
>   )
> @@ -9302,7 +9313,7 @@ (define_split
>      (cond_exec (ne:CC (reg:CC CC_REGNUM) (const_int 0))
>   	      (set (match_dup 0) (const_int 1)))]
>   {
> -  operands[3] = GEN_INT (-INTVAL (operands[2]));
> +  operands[3] = gen_int_mode (-INTVAL (operands[2]), SImode);
>   })
>   
>   (define_split
> @@ -10082,7 +10093,8 @@ (define_insn_and_split "*negscc"
>           /* Emit subs\\t%0, %1, %2\;mvnne\\t%0, #0 */
>           if (CONST_INT_P (operands[2]))
>             emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2],
> -                                        GEN_INT (- INTVAL (operands[2]))));
> +                                        gen_int_mode (-INTVAL (operands[2]),
> +						      SImode)));
>           else
>             emit_insn (gen_subsi3_compare (operands[0], operands[1], operands[2]));
>   
> --- gcc/config/arm/thumb2.md.jj	2019-02-28 08:14:55.970600405 +0100
> +++ gcc/config/arm/thumb2.md	2019-02-28 21:51:20.387902673 +0100
> @@ -913,7 +913,8 @@ (define_insn_and_split "*thumb2_negscc"
>           /* Emit subs\\t%0, %1, %2\;it\\tne\;mvnne\\t%0, #0 */
>           if (CONST_INT_P (operands[2]))
>             emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2],
> -                                        GEN_INT (- INTVAL (operands[2]))));
> +                                        gen_int_mode (-INTVAL (operands[2]),
> +						      SImode)));
>           else
>             emit_insn (gen_subsi3_compare (operands[0], operands[1], operands[2]));
>   
> --- gcc/testsuite/gcc.dg/pr89506.c.jj	2019-02-28 21:51:20.387902673 +0100
> +++ gcc/testsuite/gcc.dg/pr89506.c	2019-02-28 21:51:20.387902673 +0100
> @@ -0,0 +1,14 @@
> +/* PR target/89506 */
> +/* { dg-do compile } */
> +/* { dg-options "-Og -g -w" } */
> +
> +long long a;
> +int c;
> +
> +int
> +foo (long long d, short e)
> +{
> +  __builtin_sub_overflow (0xffffffff, c, &a);
> +  e >>= ~2147483647 != (int) a;
> +  return d + e;
> +}
>
> 	Jakub
Jakub Jelinek March 1, 2019, 9:36 a.m. UTC | #2
On Fri, Mar 01, 2019 at 09:21:16AM +0000, Kyrill Tkachov wrote:
> > Ok for trunk?
> 
> Ok.

Thanks.  I'll wait for my regtest, previously I've regtested it only with
an older version of this patch.

> > Or is there an easy way to estimate if a constant satisfies both "I" and "L"
> > constraints at the same time and is not one of 0 and INT_MIN, which
> > of the two (positive or negated) would have smaller encoding and decide
> > based on that?
> 
> Don't think there's a cleaner way of doing that. There are some helper
> functions in arm.c that can estimate the number instructions neeedd to
> synthesise a constant, but nothing that takes size into account. The
> encoding size can only change for TARGET_THUMB2 (not TARGET_ARM) but it's
> not worth gating that decision on TARGET_THUMB2 as well IMO.

Well, with the patch the decision which insn is chosen is done in C code.
So it could look like:
  if (operands[2] == const0_rtx
      || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000)
    subs; // mandatory
  else if (TARGET_THUMB2
	   && arm_immediate_operand (operands[2], SImode)
	   && arm_neg_immediate_operand (operands[2], SImode))
    {
      // Both adds and subs can do the job here, and unlike
      // non-thumb mode, the instructions can have different
      // sizes.  Pick the shorter one.
    }
  else if (which_alternative == 1)
    subs;
  else
    adds;

This can be done incrementally, I just have no idea what the rules
for thumb2 constant encoding are.

	Jakub
Kyrill Tkachov March 1, 2019, 9:42 a.m. UTC | #3
On 3/1/19 9:36 AM, Jakub Jelinek wrote:
> On Fri, Mar 01, 2019 at 09:21:16AM +0000, Kyrill Tkachov wrote:
>>> Ok for trunk?
>> Ok.
> Thanks.  I'll wait for my regtest, previously I've regtested it only with
> an older version of this patch.
>
>>> Or is there an easy way to estimate if a constant satisfies both "I" and "L"
>>> constraints at the same time and is not one of 0 and INT_MIN, which
>>> of the two (positive or negated) would have smaller encoding and decide
>>> based on that?
>> Don't think there's a cleaner way of doing that. There are some helper
>> functions in arm.c that can estimate the number instructions neeedd to
>> synthesise a constant, but nothing that takes size into account. The
>> encoding size can only change for TARGET_THUMB2 (not TARGET_ARM) but it's
>> not worth gating that decision on TARGET_THUMB2 as well IMO.
> Well, with the patch the decision which insn is chosen is done in C code.
> So it could look like:
>    if (operands[2] == const0_rtx
>        || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000)
>      subs; // mandatory
>    else if (TARGET_THUMB2
> 	   && arm_immediate_operand (operands[2], SImode)
> 	   && arm_neg_immediate_operand (operands[2], SImode))
>      {
>        // Both adds and subs can do the job here, and unlike
>        // non-thumb mode, the instructions can have different
>        // sizes.  Pick the shorter one.
>      }
>    else if (which_alternative == 1)
>      subs;
>    else
>      adds;
>
> This can be done incrementally, I just have no idea what the rules
> for thumb2 constant encoding are.


Yeah, I had considered this. But I don't think it's worth the extra 
code. I'm not aware of any implementation where adds and subs differ in 
performance and I'd be surprised so when there's no size advantage I 
wouldn't be picky.

Thanks,

Kyrill


>
> 	Jakub
Kyrill Tkachov March 1, 2019, 9:43 a.m. UTC | #4
On 3/1/19 9:42 AM, Kyrill Tkachov wrote:
>
> On 3/1/19 9:36 AM, Jakub Jelinek wrote:
> > On Fri, Mar 01, 2019 at 09:21:16AM +0000, Kyrill Tkachov wrote:
> >>> Ok for trunk?
> >> Ok.
> > Thanks.  I'll wait for my regtest, previously I've regtested it only 
> with
> > an older version of this patch.
> >
> >>> Or is there an easy way to estimate if a constant satisfies both 
> "I" and "L"
> >>> constraints at the same time and is not one of 0 and INT_MIN, which
> >>> of the two (positive or negated) would have smaller encoding and 
> decide
> >>> based on that?
> >> Don't think there's a cleaner way of doing that. There are some helper
> >> functions in arm.c that can estimate the number instructions neeedd to
> >> synthesise a constant, but nothing that takes size into account. The
> >> encoding size can only change for TARGET_THUMB2 (not TARGET_ARM) 
> but it's
> >> not worth gating that decision on TARGET_THUMB2 as well IMO.
> > Well, with the patch the decision which insn is chosen is done in C 
> code.
> > So it could look like:
> >    if (operands[2] == const0_rtx
> >        || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000)
> >      subs; // mandatory
> >    else if (TARGET_THUMB2
> >           && arm_immediate_operand (operands[2], SImode)
> >           && arm_neg_immediate_operand (operands[2], SImode))
> >      {
> >        // Both adds and subs can do the job here, and unlike
> >        // non-thumb mode, the instructions can have different
> >        // sizes.  Pick the shorter one.
> >      }
> >    else if (which_alternative == 1)
> >      subs;
> >    else
> >      adds;
> >
> > This can be done incrementally, I just have no idea what the rules
> > for thumb2 constant encoding are.
>
>
> Yeah, I had considered this. But I don't think it's worth the extra
> code. I'm not aware of any implementation where adds and subs differ in
> performance and I'd be surprised so when there's no size advantage I
> wouldn't be picky.


Two thoughts collided in my head on the last sentence. I meant to say:

"I'm not aware of any implementation where adds and subs differ in
performance so when there's no size advantage I wouldn't be picky."

Thanks,

Kyrill

>
> Thanks,
>
> Kyrill
>
>
> >
> >        Jakub
diff mbox series

Patch

--- gcc/config/arm/arm.md.jj	2019-02-28 14:13:08.368267536 +0100
+++ gcc/config/arm/arm.md	2019-02-28 22:09:03.089588596 +0100
@@ -867,10 +867,21 @@  (define_insn "cmpsi2_addneg"
    (set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(plus:SI (match_dup 1)
 		 (match_operand:SI 3 "arm_addimm_operand" "I,L")))]
-  "TARGET_32BIT && INTVAL (operands[2]) == -INTVAL (operands[3])"
-  "@
-   adds%?\\t%0, %1, %3
-   subs%?\\t%0, %1, #%n3"
+  "TARGET_32BIT
+   && (INTVAL (operands[2])
+       == trunc_int_for_mode (-INTVAL (operands[3]), SImode))"
+{
+  /* For 0 and INT_MIN it is essential that we use subs, as adds
+     will result in different condition codes (like cmn rather than
+     like cmp).  For other immediates, we should choose whatever
+     will have smaller encoding.  */
+  if (operands[2] == const0_rtx
+      || INTVAL (operands[2]) == -HOST_WIDE_INT_C (0x80000000)
+      || which_alternative == 1)
+    return "subs%?\\t%0, %1, #%n3";
+  else
+    return "adds%?\\t%0, %1, %3";
+}
   [(set_attr "conds" "set")
    (set_attr "type" "alus_sreg")]
 )
@@ -9302,7 +9313,7 @@  (define_split
    (cond_exec (ne:CC (reg:CC CC_REGNUM) (const_int 0))
 	      (set (match_dup 0) (const_int 1)))]
 {
-  operands[3] = GEN_INT (-INTVAL (operands[2]));
+  operands[3] = gen_int_mode (-INTVAL (operands[2]), SImode);
 })
 
 (define_split
@@ -10082,7 +10093,8 @@  (define_insn_and_split "*negscc"
         /* Emit subs\\t%0, %1, %2\;mvnne\\t%0, #0 */
         if (CONST_INT_P (operands[2]))
           emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2],
-                                        GEN_INT (- INTVAL (operands[2]))));
+                                        gen_int_mode (-INTVAL (operands[2]),
+						      SImode)));
         else
           emit_insn (gen_subsi3_compare (operands[0], operands[1], operands[2]));
 
--- gcc/config/arm/thumb2.md.jj	2019-02-28 08:14:55.970600405 +0100
+++ gcc/config/arm/thumb2.md	2019-02-28 21:51:20.387902673 +0100
@@ -913,7 +913,8 @@  (define_insn_and_split "*thumb2_negscc"
         /* Emit subs\\t%0, %1, %2\;it\\tne\;mvnne\\t%0, #0 */
         if (CONST_INT_P (operands[2]))
           emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2],
-                                        GEN_INT (- INTVAL (operands[2]))));
+                                        gen_int_mode (-INTVAL (operands[2]),
+						      SImode)));
         else
           emit_insn (gen_subsi3_compare (operands[0], operands[1], operands[2]));
 
--- gcc/testsuite/gcc.dg/pr89506.c.jj	2019-02-28 21:51:20.387902673 +0100
+++ gcc/testsuite/gcc.dg/pr89506.c	2019-02-28 21:51:20.387902673 +0100
@@ -0,0 +1,14 @@ 
+/* PR target/89506 */
+/* { dg-do compile } */
+/* { dg-options "-Og -g -w" } */
+
+long long a;
+int c;
+
+int
+foo (long long d, short e)
+{
+  __builtin_sub_overflow (0xffffffff, c, &a);
+  e >>= ~2147483647 != (int) a;
+  return d + e;
+}