Patchwork [Patch/ARM] Cortex-M4 core pipeline patch to tune LDR/STR pairs

Submitter Terry Guo
Date March 29, 2013, 9:59 a.m.
Message ID <000001ce2c64$2a90af80$7fb20e80$@arm.com>
Permalink /patch/232348/
State New

Comments

Terry Guo - March 29, 2013, 9:59 a.m.
Hello,

The attached pipeline patch is intended to change the following generated code

ldr r5, [r4, #12]
adds r2, r2, #16
str r5, [r3, #8]

to

ldr r5, [r4, #12]
str r5, [r3, #8]
adds r2, r2, #16

The reason is that an STR can start in the second cycle of the preceding
LDR, which takes 2 cycles, as long as the result of the LDR isn't used as
the memory address of the STR.
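
As a rough illustration of the condition described above, here is a minimal C sketch of the latency a scheduler might assume between a load and a store that depends on it. The `struct insn` record and both helper functions are hypothetical simplifications, not GCC's actual data structures; the real guard named in the patch, arm_no_early_store_addr_dep, operates on RTL. The fallback of 2 cycles is taken from the LDR latency quoted in this message, and the real pipeline description may apply a further penalty for address dependencies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified instruction record (not GCC's rtx):
   dest     - register written, or -1 if none
   addr_reg - base register used to form the memory address
   data_reg - register whose value is stored (stores only), else -1 */
struct insn {
    int dest;
    int addr_reg;
    int data_reg;
};

/* True when the store's address does not use the producer's result.
   A simplified stand-in for the guard named in the patch. */
static bool no_early_store_addr_dep(const struct insn *producer,
                                    const struct insn *store)
{
    return producer->dest != store->addr_reg;
}

/* Latency assumed between a load and a dependent store: 1 when the
   bypass applies, otherwise the 2-cycle load latency quoted in the
   message (an assumption; the real description may charge more). */
static int load_to_store_latency(const struct insn *load,
                                 const struct insn *store)
{
    return no_early_store_addr_dep(load, store) ? 1 : 2;
}

/* Self-check using the registers from the example above. */
static void check_examples(void)
{
    struct insn ldr      = {  5, 4, -1 };  /* ldr r5, [r4, #12] */
    struct insn str_data = { -1, 3,  5 };  /* str r5, [r3, #8]  */
    struct insn str_addr = { -1, 5,  2 };  /* str r2, [r5, #8], hypothetical */

    assert(load_to_store_latency(&ldr, &str_data) == 1);
    assert(load_to_store_latency(&ldr, &str_addr) == 2);
}
```

With the shorter latency, the scheduler has no reason to place the ADDS between the LDR and the STR, which is the reordering shown in the example.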

Tested with various benchmarks on a Cortex-M4 MPS. Except for one regression
caused by register allocation, the benchmarks show either a performance
improvement or no change.

Is it OK for trunk?

BR,
Terry

2013-03-29  Terry Guo  <terry.guo@arm.com>

                * gcc/config/arm/cortex-m4.md: New bypass to tune LDR/STR pairs.
From 19dd8bdc9a03f78690700ded911e0cee66328c01 Mon Sep 17 00:00:00 2001
From: Terry Guo <terry.guo@arm.com>
Date: Wed, 27 Mar 2013 17:23:09 +0800
Subject: [PATCH] improve m4 pipeline description

---
 gcc/config/arm/cortex-m4.md |    4 ++++
 1 file changed, 4 insertions(+)
Terry Guo - April 16, 2013, 4:25 a.m.
Hello Ramana,

Can you please review my patch at
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg01252.html.

Thanks.

Terry

Richard Earnshaw - April 16, 2013, 9:56 a.m.
On 29/03/13 09:59, Terry Guo wrote:
> Hello,
>
> The attached pipeline patch intends to turn following code generation
>
> ldr r5, [r4, #12]
> adds r2, r2, #16
> str r5, [r3, #8]
>
> to
>
> ldr r5, [r4, #12]
> str r5, [r3, #8]
> adds r2, r2, #16
>
> The reason is that the STR can be started from the second cycle of its
> preceding LDR which takes 2 cycles, as long as the result of LDR isn't used
> as memory address of STR.
>
> Tested with various benchmarks on Cortex-M4 MPS. Except one regression
> caused by register allocation, the others either show performance
> improvement or no change.
>
> Is it OK to trunk?
>
> BR,
> Terry
>
> 2013-03-29  Terry Guo  <terry.guo@arm.com>
>
>                  * gcc/config/arm/cortex-m4.md: New bypass to tune LDR/STR
> pairs.
>

OK.

R.

Patch

diff --git a/gcc/config/arm/cortex-m4.md b/gcc/config/arm/cortex-m4.md
index 187867b..47b0364 100644
--- a/gcc/config/arm/cortex-m4.md
+++ b/gcc/config/arm/cortex-m4.md
@@ -84,6 +84,10 @@ 
        (eq_attr "type" "store4"))
   "cortex_m4_ex*5")
 
+(define_bypass 1 "cortex_m4_load1"
+                 "cortex_m4_store1_1,cortex_m4_store1_2"
+                 "arm_no_early_store_addr_dep")
+
 ;; If the address of load or store depends on the result of the preceding
 ;; instruction, the latency is increased by one.