Patchwork [PATCH:,PR,target/46975] Replace 32 bit instructions with 16 bit instructions in thumb2

login
register
mail settings
Submitter Carrot Wei
Date Dec. 16, 2010, 10:45 p.m.
Message ID <AANLkTinxr9MYzOtgiyPEBtkTu9tjm=NOussw3G8qrDek@mail.gmail.com>
Download mbox | patch
Permalink /patch/75815/
State New
Headers show

Comments

Carrot Wei - Dec. 16, 2010, 10:45 p.m.
Hi

Compile the following c code with options -march=armv7-a -mthumb -Os

int foo (int s)
{
    return s == 1;
}

GCC 4.6 generates:

00000000 <foo>:
   0:    f1a0 0301     sub.w    r3, r0, #1    // A
   4:    4258          negs    r0, r3
   6:    eb40 0003     adc.w    r0, r0, r3      // B
   a:    4770          bx    lr

Notice that instructions A and B are 32 bits. In thumb2 we can use subs and adcs
instead so they will be 16 bits.

The root cause is the following peephole2 generates 32 bit instructions:

8731 ;; Attempt to improve the sequence generated by the compare_scc splitters
8732 ;; not to use conditional execution.
8733 (define_peephole2
8734   [(set (reg:CC CC_REGNUM)
8735         (compare:CC (match_operand:SI 1 "register_operand" "")
8736                     (match_operand:SI 2 "arm_rhs_operand" "")))
8737    (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
8738               (set (match_operand:SI 0 "register_operand" "")
(const_int 0)))
8739    (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
8740               (set (match_dup 0) (const_int 1)))
8741    (match_scratch:SI 3 "r")]
8742   "TARGET_32BIT"
8743   [(set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))
8744    (parallel
8745     [(set (reg:CC CC_REGNUM)
8746           (compare:CC (const_int 0) (match_dup 3)))
8747      (set (match_dup 0) (minus:SI (const_int 0) (match_dup 3)))])
8748    (set (match_dup 0)
8749         (plus:SI (plus:SI (match_dup 0) (match_dup 3))
8750                  (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
8751

This patch modifies the peephole2 to generate 16 bit instructions, and passed
regression test on arm qemu.

thanks
Guozhi


ChangeLog:
2010-12-16  Wei Guozhi  <carrot@google.com>

        PR target/46975
        * config/arm/arm.md (*addsi3_carryin_compare0_<optab>): New pattern.
        (peephole2 for conditional move): Generate 16 bit instructions.

ChangeLog:
2010-12-16  Wei Guozhi  <carrot@google.com>

        PR target/46975
        * gcc.target/arm/pr46975.c: New testcase.
Richard Earnshaw - Dec. 17, 2010, 12:18 p.m.
On Thu, 2010-12-16 at 14:45 -0800, Carrot Wei wrote:
> Hi
> 
> Compile the following c code with options -march=armv7-a -mthumb -Os
> 
> int foo (int s)
> {
>     return s == 1;
> }
> 
> GCC 4.6 generates:
> 
> 00000000 <foo>:
>    0:    f1a0 0301     sub.w    r3, r0, #1    // A
>    4:    4258          negs    r0, r3
>    6:    eb40 0003     adc.w    r0, r0, r3      // B
>    a:    4770          bx    lr
> 
> Notice that instructions A and B are 32 bits. In thumb2 we can use subs and adcs
> instead so they will be 16 bits.
> 

This sequence already contains an instruction that sets the flags as a
necessary part of the sequence.  Why doesn't it also generate
flag-corrupting variants of the other two instructions when the
registers selected are suitable?  It seems silly to force the compiler
to do yet more work to clean up this code.

R.

> The root cause is the following peephole2 generates 32 bit instructions:
> 
> 8731 ;; Attempt to improve the sequence generated by the compare_scc splitters
> 8732 ;; not to use conditional execution.
> 8733 (define_peephole2
> 8734   [(set (reg:CC CC_REGNUM)
> 8735         (compare:CC (match_operand:SI 1 "register_operand" "")
> 8736                     (match_operand:SI 2 "arm_rhs_operand" "")))
> 8737    (cond_exec (ne (reg:CC CC_REGNUM) (const_int 0))
> 8738               (set (match_operand:SI 0 "register_operand" "")
> (const_int 0)))
> 8739    (cond_exec (eq (reg:CC CC_REGNUM) (const_int 0))
> 8740               (set (match_dup 0) (const_int 1)))
> 8741    (match_scratch:SI 3 "r")]
> 8742   "TARGET_32BIT"
> 8743   [(set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))
> 8744    (parallel
> 8745     [(set (reg:CC CC_REGNUM)
> 8746           (compare:CC (const_int 0) (match_dup 3)))
> 8747      (set (match_dup 0) (minus:SI (const_int 0) (match_dup 3)))])
> 8748    (set (match_dup 0)
> 8749         (plus:SI (plus:SI (match_dup 0) (match_dup 3))
> 8750                  (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
> 8751
> 
> This patch modifies the peephole2 to generate 16 bit instructions, and passed
> regression test on arm qemu.
> 
> thanks
> Guozhi
> 
> 
> ChangeLog:
> 2010-12-16  Wei Guozhi  <carrot@google.com>
> 
>         PR target/46975
>         * config/arm/arm.md (*addsi3_carryin_compare0_<optab>): New pattern.
>         (peephole2 for conditional move): Generate 16 bit instructions.
> 
> ChangeLog:
> 2010-12-16  Wei Guozhi  <carrot@google.com>
> 
>         PR target/46975
>         * gcc.target/arm/pr46975.c: New testcase.
> 
> 
> Index: arm.md
> ===================================================================
> --- arm.md	(revision 167861)
> +++ arm.md	(working copy)
> @@ -974,6 +974,21 @@
>  		      (const_string "alu_shift_reg")))]
>  )
> 
> +(define_insn "*addsi3_carryin_compare0_<optab>"
> +  [(set (reg:CC CC_REGNUM)
> +	(compare:CC
> +	 (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r")
> +			   (match_operand:SI 2 "arm_rhs_operand" "rI"))
> +		  (LTUGEU:SI (reg:<cnb> CC_REGNUM) (const_int 0)))
> +	 (const_int 0)))
> +   (set (match_operand:SI 0 "s_register_operand" "=r")
> +	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
> +		 (LTUGEU:SI (reg:<cnb> CC_REGNUM) (const_int 0))))]
> +   "TARGET_32BIT"
> +   "adc%.\\t%0, %1, %2"
> +   [(set_attr "conds" "set")]
> + )
> +
>  (define_expand "incscc"
>    [(set (match_operand:SI 0 "s_register_operand" "=r,r")
>          (plus:SI (match_operator:SI 2 "arm_comparison_operator"
> @@ -8748,14 +8763,22 @@
>  	      (set (match_dup 0) (const_int 1)))
>     (match_scratch:SI 3 "r")]
>    "TARGET_32BIT"
> -  [(set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))
> +  [(parallel
> +    [(set (reg:CC CC_REGNUM)
> +	  (compare:CC (match_dup 1) (match_dup 2)))
> +     (set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))])
>     (parallel
>      [(set (reg:CC CC_REGNUM)
>  	  (compare:CC (const_int 0) (match_dup 3)))
>       (set (match_dup 0) (minus:SI (const_int 0) (match_dup 3)))])
> -   (set (match_dup 0)
> -	(plus:SI (plus:SI (match_dup 0) (match_dup 3))
> -		 (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
> +   (parallel
> +    [(set (reg:CC CC_REGNUM)
> +	  (compare:CC (plus:SI (plus:SI (match_dup 0) (match_dup 3))
> +			       (geu:SI (reg:CC CC_REGNUM) (const_int 0)))
> +		      (const_int 0)))
> +     (set (match_dup 0)
> +	  (plus:SI (plus:SI (match_dup 0) (match_dup 3))
> +		   (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])])
> 
>  (define_insn "*cond_move"
>    [(set (match_operand:SI 0 "s_register_operand" "=r,r,r")
> 
> 
> Index: pr46975.c
> ===================================================================
> --- pr46975.c	(revision 0)
> +++ pr46975.c	(revision 0)
> @@ -0,0 +1,9 @@
> +/* { dg-options "-mthumb -Os" } */
> +/* { dg-require-effective-target arm_thumb2_ok } */
> +/* { dg-final { scan-assembler "subs" } } */
> +/* { dg-final { scan-assembler "adcs" } } */
> +
> +int foo (int s)
> +{
> +      return s == 1;
> +}
Carrot Wei - Dec. 17, 2010, 7:30 p.m.
On Fri, Dec 17, 2010 at 4:18 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>
> On Thu, 2010-12-16 at 14:45 -0800, Carrot Wei wrote:
>> Hi
>>
>> Compile the following c code with options -march=armv7-a -mthumb -Os
>>
>> int foo (int s)
>> {
>>     return s == 1;
>> }
>>
>> GCC 4.6 generates:
>>
>> 00000000 <foo>:
>>    0:    f1a0 0301     sub.w    r3, r0, #1    // A
>>    4:    4258          negs    r0, r3
>>    6:    eb40 0003     adc.w    r0, r0, r3      // B
>>    a:    4770          bx    lr
>>
>> Notice that instructions A and B are 32 bits. In thumb2 we can use subs and adcs
>> instead so they will be 16 bits.
>>
>
> This sequence already contains an instruction that sets the flags as a
> necessary part of the sequence.  Why doesn't it also generate
> flag-corrupting variants of the other two instructions when the

That is the root cause of this problem. The old pattern simply didn't
generate the flag setting version.

> registers selected are suitable?  It seems silly to force the compiler
> to do yet more work to clean up this code.
>

This patch doesn't force the compiler to do more work. It directly
modifies the original peephole2 pattern to generate flag setting
instructions as you suggested. And the new insn pattern is for
instruction adcs which didn't exist previously.

thanks
Guozhi
Carrot Wei - Jan. 10, 2011, 4:30 a.m.
Ping

On Sat, Dec 18, 2010 at 3:30 AM, Carrot Wei <carrot@google.com> wrote:
>
> On Fri, Dec 17, 2010 at 4:18 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
> >
> > On Thu, 2010-12-16 at 14:45 -0800, Carrot Wei wrote:
> >> Hi
> >>
> >> Compile the following c code with options -march=armv7-a -mthumb -Os
> >>
> >> int foo (int s)
> >> {
> >>     return s == 1;
> >> }
> >>
> >> GCC 4.6 generates:
> >>
> >> 00000000 <foo>:
> >>    0:    f1a0 0301     sub.w    r3, r0, #1    // A
> >>    4:    4258          negs    r0, r3
> >>    6:    eb40 0003     adc.w    r0, r0, r3      // B
> >>    a:    4770          bx    lr
> >>
> >> Notice that instructions A and B are 32 bits. In thumb2 we can use subs and adcs
> >> instead so they will be 16 bits.
> >>
> >
> > This sequence already contains an instruction that sets the flags as a
> > necessary part of the sequence.  Why doesn't it also generate
> > flag-corrupting variants of the other two instructions when the
>
> That is the root cause of this problem. The old pattern simply didn't
> generate the flag setting version.
>
> > registers selected are suitable?  It seems silly to force the compiler
> > to do yet more work to clean up this code.
> >
>
> This patch doesn't force the compiler to do more work. It directly
> modifies the original peephole2 pattern to generate flag setting
> instructions as you suggested. And the new insn pattern is for
> instruction adcs which didn't exist previously.
>
> thanks
> Guozhi
Carrot Wei - Jan. 26, 2011, 7:53 a.m.
Ping

On Mon, Jan 10, 2011 at 12:30 PM, Carrot Wei <carrot@google.com> wrote:
> Ping
>
> On Sat, Dec 18, 2010 at 3:30 AM, Carrot Wei <carrot@google.com> wrote:
>>
>> On Fri, Dec 17, 2010 at 4:18 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>> >
>> > On Thu, 2010-12-16 at 14:45 -0800, Carrot Wei wrote:
>> >> Hi
>> >>
>> >> Compile the following c code with options -march=armv7-a -mthumb -Os
>> >>
>> >> int foo (int s)
>> >> {
>> >>     return s == 1;
>> >> }
>> >>
>> >> GCC 4.6 generates:
>> >>
>> >> 00000000 <foo>:
>> >>    0:    f1a0 0301     sub.w    r3, r0, #1    // A
>> >>    4:    4258          negs    r0, r3
>> >>    6:    eb40 0003     adc.w    r0, r0, r3      // B
>> >>    a:    4770          bx    lr
>> >>
>> >> Notice that instructions A and B are 32 bits. In thumb2 we can use subs and adcs
>> >> instead so they will be 16 bits.
>> >>
>> >
>> > This sequence already contains an instruction that sets the flags as a
>> > necessary part of the sequence.  Why doesn't it also generate
>> > flag-corrupting variants of the other two instructions when the
>>
>> That is the root cause of this problem. The old pattern simply didn't
>> generate the flag setting version.
>>
>> > registers selected are suitable?  It seems silly to force the compiler
>> > to do yet more work to clean up this code.
>> >
>>
>> This patch doesn't force the compiler to do more work. It directly
>> modifies the original peephole2 pattern to generate flag setting
>> instructions as you suggested. And the new insn pattern is for
>> instruction adcs which didn't exist previously.
>>
>> thanks
>> Guozhi
>
Carrot Wei - March 18, 2011, 7:13 a.m.
Ping

On Sat, Dec 18, 2010 at 3:30 AM, Carrot Wei <carrot@google.com> wrote:
> On Fri, Dec 17, 2010 at 4:18 AM, Richard Earnshaw <rearnsha@arm.com> wrote:
>>
>> On Thu, 2010-12-16 at 14:45 -0800, Carrot Wei wrote:
>>> Hi
>>>
>>> Compile the following c code with options -march=armv7-a -mthumb -Os
>>>
>>> int foo (int s)
>>> {
>>>     return s == 1;
>>> }
>>>
>>> GCC 4.6 generates:
>>>
>>> 00000000 <foo>:
>>>    0:    f1a0 0301     sub.w    r3, r0, #1    // A
>>>    4:    4258          negs    r0, r3
>>>    6:    eb40 0003     adc.w    r0, r0, r3      // B
>>>    a:    4770          bx    lr
>>>
>>> Notice that instructions A and B are 32 bits. In thumb2 we can use subs and adcs
>>> instead so they will be 16 bits.
>>>
>>
>> This sequence already contains an instruction that sets the flags as a
>> necessary part of the sequence.  Why doesn't it also generate
>> flag-corrupting variants of the other two instructions when the
>
> That is the root cause of this problem. The old pattern simply didn't
> generate the flag setting version.
>
>> registers selected are suitable?  It seems silly to force the compiler
>> to do yet more work to clean up this code.
>>
>
> This patch doesn't force the compiler to do more work. It directly
> modifies the original peephole2 pattern to generate flag setting
> instructions as you suggested. And the new insn pattern is for
> instruction adcs which didn't exist previously.
>
> thanks
> Guozhi
>

Patch

Index: arm.md
===================================================================
--- arm.md	(revision 167861)
+++ arm.md	(working copy)
@@ -974,6 +974,21 @@ 
 		      (const_string "alu_shift_reg")))]
 )

+(define_insn "*addsi3_carryin_compare0_<optab>"
+  [(set (reg:CC CC_REGNUM)
+	(compare:CC
+	 (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%r")
+			   (match_operand:SI 2 "arm_rhs_operand" "rI"))
+		  (LTUGEU:SI (reg:<cnb> CC_REGNUM) (const_int 0)))
+	 (const_int 0)))
+   (set (match_operand:SI 0 "s_register_operand" "=r")
+	(plus:SI (plus:SI (match_dup 1) (match_dup 2))
+		 (LTUGEU:SI (reg:<cnb> CC_REGNUM) (const_int 0))))]
+   "TARGET_32BIT"
+   "adc%.\\t%0, %1, %2"
+   [(set_attr "conds" "set")]
+ )
+
 (define_expand "incscc"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r")
         (plus:SI (match_operator:SI 2 "arm_comparison_operator"
@@ -8748,14 +8763,22 @@ 
 	      (set (match_dup 0) (const_int 1)))
    (match_scratch:SI 3 "r")]
   "TARGET_32BIT"
-  [(set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))
+  [(parallel
+    [(set (reg:CC CC_REGNUM)
+	  (compare:CC (match_dup 1) (match_dup 2)))
+     (set (match_dup 3) (minus:SI (match_dup 1) (match_dup 2)))])
    (parallel
     [(set (reg:CC CC_REGNUM)
 	  (compare:CC (const_int 0) (match_dup 3)))
      (set (match_dup 0) (minus:SI (const_int 0) (match_dup 3)))])
-   (set (match_dup 0)
-	(plus:SI (plus:SI (match_dup 0) (match_dup 3))
-		 (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])
+   (parallel
+    [(set (reg:CC CC_REGNUM)
+	  (compare:CC (plus:SI (plus:SI (match_dup 0) (match_dup 3))
+			       (geu:SI (reg:CC CC_REGNUM) (const_int 0)))
+		      (const_int 0)))
+     (set (match_dup 0)
+	  (plus:SI (plus:SI (match_dup 0) (match_dup 3))
+		   (geu:SI (reg:CC CC_REGNUM) (const_int 0))))])])

 (define_insn "*cond_move"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r,r")


Index: pr46975.c
===================================================================
--- pr46975.c	(revision 0)
+++ pr46975.c	(revision 0)
@@ -0,0 +1,9 @@ 
+/* { dg-options "-mthumb -Os" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-final { scan-assembler "subs" } } */
+/* { dg-final { scan-assembler "adcs" } } */
+
+int foo (int s)
+{
+      return s == 1;
+}