
[GCC/Thumb1] Improve 64bit constant load for Thumb1

Message ID 000001cf5558$b40149a0$1c03dce0$@arm.com
State New

Commit Message

Terry Guo April 11, 2014, 7:36 a.m. UTC
Hi there,

Currently GCC prefers to use two LDR instructions to load 64-bit constants.
This can miss cases where the 64-bit load could be done in fewer instructions
or fewer cycles. For example, the code below, which loads 0x100000001,

mov	r0, #1
mov	r1, #1

is better than the current solution:

        ldr     r1, .L2+4
        ldr     r0, .L2
.L2:
        .word   1
        .word   1

The attached patch splits such 64-bit constant loads to take advantage of
these cases. Tested with the GCC regression test suite on cortex-m0; no new
regressions.
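
For example, source like the following (essentially what the new tests added
by this patch do) then needs no literal-pool load at all on Thumb1:

        extern long long madd (long long a);

        long long
        foo (void)
        {
          /* Both halves of 0x100000001 fit in a Thumb1 "mov" immediate,
             so after the split no "ldr" from a literal pool is needed.  */
          return madd (0x0000000100000001LL);
        }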

Is it OK for stage 1?

BR,
Terry

gcc/
2014-04-11  Terry Guo  <terry.guo@arm.com>

        * config/arm/arm.md (split 64-bit constant for Thumb1): New split
        pattern.

gcc/testsuite/
2014-04-11  Terry Guo  <terry.guo@arm.com>

        * gcc.target/arm/thumb1-load-64bit-constant-1.c: New test.
        * gcc.target/arm/thumb1-load-64bit-constant-2.c: Ditto.
        * gcc.target/arm/thumb1-load-64bit-constant-3.c: Ditto.

Comments

Ramana Radhakrishnan May 21, 2014, 8:56 a.m. UTC | #1
> -(define_split
> +; Split the load of 64-bit constant into two loads for high and low 32-bit parts respectively
> +; to see if we can load them in fewer instructions or fewer cycles.
> +; For the small 64-bit integer constants that satisfy constraint J, the instruction pattern
> +; thumb1_movdi_insn has a better way to handle them.
> +(define_split
> +  [(set (match_operand:ANY64 0 "arm_general_register_operand" "")
> +    (match_operand:ANY64 1 "const_double_operand" ""))]
> +  "TARGET_THUMB1 && reload_completed && !satisfies_constraint_J (operands[1])"
> +  [(set (match_dup 0) (match_dup 1))
> +   (set (match_dup 2) (match_dup 3))]

Not ok - this splitter used to kick in in ARM state; now you've turned it off.
Look at the movdi patterns in ARM state which deal with immediate
moves with a #.

So, the condition should read

(TARGET_32BIT) || (TARGET_THUMB1 && ....)
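
Spelled out with the Thumb1 part taken from the patch, that condition would
read roughly as follows (a sketch of the suggestion only; as the follow-up
below notes, the existing ARM-state splitter is in fact a separate, untouched
pattern, so the combined form turned out not to be needed):

        "TARGET_32BIT
         || (TARGET_THUMB1 && reload_completed
             && !satisfies_constraint_J (operands[1]))"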

Ok with that change and if no regressions.

regards
Ramana




Terry Guo May 21, 2014, 9:29 a.m. UTC | #2
> -----Original Message-----
> From: Ramana Radhakrishnan [mailto:ramana.gcc@googlemail.com]
> Sent: Wednesday, May 21, 2014 4:56 PM
> To: Terry Guo
> Cc: gcc-patches; Richard Earnshaw; Ramana Radhakrishnan
> Subject: Re: [Patch, GCC/Thumb1] Improve 64bit constant load for Thumb1
> 
> > -(define_split
> > +; Split the load of 64-bit constant into two loads for high and low 32-bit parts respectively
> > +; to see if we can load them in fewer instructions or fewer cycles.
> > +; For the small 64-bit integer constants that satisfy constraint J, the instruction pattern
> > +; thumb1_movdi_insn has a better way to handle them.
> > +(define_split
> > +  [(set (match_operand:ANY64 0 "arm_general_register_operand" "")
> > +    (match_operand:ANY64 1 "const_double_operand" ""))]
> > +  "TARGET_THUMB1 && reload_completed && !satisfies_constraint_J (operands[1])"
> > +  [(set (match_dup 0) (match_dup 1))
> > +   (set (match_dup 2) (match_dup 3))]
> 
> Not ok - this splitter used to kick in in ARM state; now you've turned it off.
> Look at the movdi patterns in ARM state which deal with immediate
> moves with a #.
> 
> So, the condition should read
> 
> (TARGET_32BIT) || (TARGET_THUMB1 && ....)
> 
> Ok with that change and if no regressions.
> 
> regards
> Ramana
> 

Thanks for reviewing. This is a totally new splitter dedicated to Thumb1 targets; the existing one for ARM isn't touched.

BR,
Terry

Ramana Radhakrishnan May 21, 2014, 12:01 p.m. UTC | #3
On Wed, May 21, 2014 at 10:29 AM, Terry Guo <terry.guo@arm.com> wrote:
>
>
>> -----Original Message-----
>> From: Ramana Radhakrishnan [mailto:ramana.gcc@googlemail.com]
>> Sent: Wednesday, May 21, 2014 4:56 PM
>> To: Terry Guo
>> Cc: gcc-patches; Richard Earnshaw; Ramana Radhakrishnan
>> Subject: Re: [Patch, GCC/Thumb1] Improve 64bit constant load for Thumb1
>>
>> > -(define_split
>> > +; Split the load of 64-bit constant into two loads for high and low 32-bit parts respectively
>> > +; to see if we can load them in fewer instructions or fewer cycles.
>> > +; For the small 64-bit integer constants that satisfy constraint J, the instruction pattern
>> > +; thumb1_movdi_insn has a better way to handle them.
>> > +(define_split
>> > +  [(set (match_operand:ANY64 0 "arm_general_register_operand" "")
>> > +    (match_operand:ANY64 1 "const_double_operand" ""))]
>> > +  "TARGET_THUMB1 && reload_completed && !satisfies_constraint_J (operands[1])"
>> > +  [(set (match_dup 0) (match_dup 1))
>> > +   (set (match_dup 2) (match_dup 3))]
>>
>> Not ok - this splitter used to kick in in ARM state; now you've turned it off.
>> Look at the movdi patterns in ARM state which deal with immediate
>> moves with a #.
>>
>> So, the condition should read
>>
>> (TARGET_32BIT) || (TARGET_THUMB1 && ....)
>>
>> Ok with that change and if no regressions.
>>
>> regards
>> Ramana
>>
>
> Thanks for reviewing. This is a totally new splitter dedicated to Thumb1 targets; the existing one for ARM isn't touched.
>

Bah - sorry, I must have misread the patch. This is OK - please apply.
I was changing a very similar splitter yesterday so that I could define
TARGET_SUPPORTS_WIDE_INT, which is why I must have misread it.


Ramana


Patch

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2ddda02..b339e83 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6422,7 +6422,26 @@ 
    (set_attr "pool_range" "*,*,*,*,*,*,1018,*,*")
    (set_attr "conds" "set,clob,*,*,nocond,nocond,nocond,nocond,nocond")])
 
-(define_split 
+; Split the load of 64-bit constant into two loads for high and low 32-bit parts respectively
+; to see if we can load them in fewer instructions or fewer cycles.
+; For the small 64-bit integer constants that satisfy constraint J, the instruction pattern
+; thumb1_movdi_insn has a better way to handle them.
+(define_split
+  [(set (match_operand:ANY64 0 "arm_general_register_operand" "")
+	(match_operand:ANY64 1 "const_double_operand" ""))]
+  "TARGET_THUMB1 && reload_completed && !satisfies_constraint_J (operands[1])"
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 2) (match_dup 3))]
+  "
+  operands[2] = gen_highpart (SImode, operands[0]);
+  operands[3] = gen_highpart_mode (SImode, GET_MODE (operands[0]),
+				   operands[1]);
+  operands[0] = gen_lowpart (SImode, operands[0]);
+  operands[1] = gen_lowpart (SImode, operands[1]);
+  "
+)
+
+(define_split
   [(set (match_operand:SI 0 "register_operand" "")
 	(match_operand:SI 1 "const_int_operand" ""))]
   "TARGET_THUMB1 && satisfies_constraint_J (operands[1])"
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-1.c b/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-1.c
new file mode 100644
index 0000000..9537aaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-1.c
@@ -0,0 +1,14 @@ 
+/* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-skip-if "" { ! { arm_thumb1 } } } */
+
+extern long long madd (long long a, long long b);
+
+long long
+foo ()
+{
+  return madd (0x0000000100000001LL, 0x0000011100000001LL);
+}
+
+/* { dg-final { scan-assembler-not "ldr" } } */
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-2.c b/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-2.c
new file mode 100644
index 0000000..836682b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-2.c
@@ -0,0 +1,14 @@ 
+/* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-skip-if "" { ! { arm_thumb1 } } } */
+
+extern long long madd (long long a);
+
+long long
+foo ()
+{
+  return madd (0x0000000100000001LL);
+}
+
+/* { dg-final { scan-assembler-not "ldr" } } */
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-3.c b/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-3.c
new file mode 100644
index 0000000..cf4d0be
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-load-64bit-constant-3.c
@@ -0,0 +1,14 @@ 
+/* { dg-require-effective-target arm_thumb1_ok } */
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-skip-if "" { ! { arm_thumb1 } } } */
+
+long long
+foo (int len)
+{
+  return (long long) (((long long) 1 << len) - 1);
+}
+
+/* { dg-final { scan-assembler-not "ldr" } } */
+/* { dg-final { scan-assembler-times "neg" 1 } } */
+