Message ID | 000001ce2c64$2a90af80$7fb20e80$@arm.com |
---|---|
State | New |
Hello Ramana,

Can you please review my patch at
http://gcc.gnu.org/ml/gcc-patches/2013-03/msg01252.html.

Thanks.
Terry

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org]
> On Behalf Of Terry Guo
> Sent: Friday, March 29, 2013 6:00 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [Patch/ARM] Cortex-M4 core pipeline patch to tune LDR/STR pairs
>
> Hello,
>
> The attached pipeline patch intends to turn following code generation
>
>     ldr   r5, [r4, #12]
>     adds  r2, r2, #16
>     str   r5, [r3, #8]
>
> to
>
>     ldr   r5, [r4, #12]
>     str   r5, [r3, #8]
>     adds  r2, r2, #16
>
> The reason is that the STR can be started from the second cycle of its
> preceding LDR which takes 2 cycles, as long as the result of LDR isn't
> used as memory address of STR.
>
> Tested with various benchmarks on Cortex-M4 MPS. Except one regression
> caused by register allocation, the others either show performance
> improvement or no change.
>
> Is it OK to trunk?
>
> BR,
> Terry
>
> 2013-03-29 Terry Guo <terry.guo@arm.com>
>
>     * gcc/config/arm/cortex-m4.md: New bypass to tune LDR/STR pairs.
On 29/03/13 09:59, Terry Guo wrote:
> Hello,
>
> The attached pipeline patch intends to turn following code generation
>
>     ldr   r5, [r4, #12]
>     adds  r2, r2, #16
>     str   r5, [r3, #8]
>
> to
>
>     ldr   r5, [r4, #12]
>     str   r5, [r3, #8]
>     adds  r2, r2, #16
>
> The reason is that the STR can be started from the second cycle of its
> preceding LDR which takes 2 cycles, as long as the result of LDR isn't
> used as memory address of STR.
>
> Tested with various benchmarks on Cortex-M4 MPS. Except one regression
> caused by register allocation, the others either show performance
> improvement or no change.
>
> Is it OK to trunk?
>
> BR,
> Terry
>
> 2013-03-29 Terry Guo <terry.guo@arm.com>
>
>     * gcc/config/arm/cortex-m4.md: New bypass to tune LDR/STR
>     pairs.
>

OK.

R.
diff --git a/gcc/config/arm/cortex-m4.md b/gcc/config/arm/cortex-m4.md
index 187867b..47b0364 100644
--- a/gcc/config/arm/cortex-m4.md
+++ b/gcc/config/arm/cortex-m4.md
@@ -84,6 +84,10 @@
        (eq_attr "type" "store4"))
   "cortex_m4_ex*5")
 
+(define_bypass 1 "cortex_m4_load1"
+               "cortex_m4_store1_1,cortex_m4_store1_2"
+               "arm_no_early_store_addr_dep")
+
 ;; If the address of load or store depends on the result of the preceding
 ;; instruction, the latency is increased by one.
Hello,

The attached pipeline patch intends to turn following code generation

    ldr   r5, [r4, #12]
    adds  r2, r2, #16
    str   r5, [r3, #8]

to

    ldr   r5, [r4, #12]
    str   r5, [r3, #8]
    adds  r2, r2, #16

The reason is that the STR can be started from the second cycle of its
preceding LDR which takes 2 cycles, as long as the result of LDR isn't used
as memory address of STR.

Tested with various benchmarks on Cortex-M4 MPS. Except one regression
caused by register allocation, the others either show performance
improvement or no change.

Is it OK to trunk?

BR,
Terry

2013-03-29 Terry Guo <terry.guo@arm.com>

    * gcc/config/arm/cortex-m4.md: New bypass to tune LDR/STR pairs.

From 19dd8bdc9a03f78690700ded911e0cee66328c01 Mon Sep 17 00:00:00 2001
From: Terry Guo <terry.guo@arm.com>
Date: Wed, 27 Mar 2013 17:23:09 +0800
Subject: [PATCH] improve m4 pipeline description

---
 gcc/config/arm/cortex-m4.md | 4 ++++
 1 file changed, 4 insertions(+)
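
For readers unfamiliar with `define_bypass`, the effect of the new entry can be pictured with a small toy model. This is only a sketch and not GCC code: the `Load` and `Store` classes and both function names are hypothetical, the default latency of 2 is taken from the "LDR which takes 2 cycles" statement in the message, and the existing rule that raises the latency for address dependencies is deliberately left out. The model assumes the store reads the loaded register somewhere, since the scheduler only asks for a latency between dependent instructions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Load:
    """Toy form of `ldr dest, [base, #imm]` (hypothetical, not a GCC type)."""
    dest: str
    base: str


@dataclass(frozen=True)
class Store:
    """Toy form of `str data, [base, #imm]` (hypothetical, not a GCC type)."""
    data: str
    base: str


def no_early_store_addr_dep(producer: Load, consumer: Store) -> bool:
    """Stand-in for the bypass guard: true when the store does not need
    the loaded value to form its address."""
    return consumer.base != producer.dest


def load_to_store_latency(producer: Load, consumer: Store,
                          default_latency: int = 2,
                          bypass_latency: int = 1) -> int:
    """Latency the scheduler would assume between a dependent LDR/STR pair.

    The bypass latency applies only when the guard holds; otherwise the
    default latency of the load is used (assumed to be 2 here).
    """
    if no_early_store_addr_dep(producer, consumer):
        return bypass_latency
    return default_latency


if __name__ == "__main__":
    ldr = Load(dest="r5", base="r4")        # ldr r5, [r4, #12]
    str_data = Store(data="r5", base="r3")  # str r5, [r3, #8]
    str_addr = Store(data="r2", base="r5")  # str r2, [r5, #8]
    print(load_to_store_latency(ldr, str_data))  # loaded value is store data
    print(load_to_store_latency(ldr, str_addr))  # loaded value is the address
```

With the shorter latency, the scheduler is free to place the STR directly after the LDR, as in the reordered sequence from the message; when the loaded register is used as the store address, the guard fails and the model keeps the default latency.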