Patchwork [U-Boot] OMAP3 patch to improve performance?

login
register
mail settings
Submitter Wolfgang Denk
Date Nov. 19, 2010, 10:35 p.m.
Message ID <20101119223507.B949A14E648@gemini.denx.de>
Download mbox | patch
Permalink /patch/72323/
State Not Applicable
Delegated to: Tom Rini
Headers show

Comments

Wolfgang Denk - Nov. 19, 2010, 10:35 p.m.
Dear Sandeep,

In message <0554BEF07D437848AF01B9C9B5F0BC5DBD1C11AD@dlee01.ent.ti.com> you wrote:
> I came across this patch that might be useful
> 
> http://lists.denx.de/pipermail/u-boot/2010-February/067534.html
> 
> In any case since the u-boot structure has changed this cannot be
> applied. If anybody is interested please send a new patch.

That's actually prtty straightforward.  Here it comes (completely
untested :-)

----------------------------------------------------------------------

From 0223e42a48417ad18276d2709206b404c8796807 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Sat, 6 Feb 2010 16:19:46 +0000
Subject: [PATCH] OMAP3: remove useless ASA bit from AUXCR

Setting ASA bit hurts performance for the code which has lots of I-cache
misses and there are no Cortex-A8 errata workarounds which would require
to have it.

A test program which intentionally stresses I-cache misses on conditional
branches is attached.

ASA bit is not set:

real    0m2.940s
user    0m2.930s
sys     0m0.008s

ASA bit is set:

real    0m3.470s
user    0m3.461s
sys     0m0.008s

The difference on some real applications is much more modest and is just
something like ~0.5%, but every little bit helps.

/**** start of bench_ASA.c ****/
void __attribute__((naked)) f(int count, void *rand)
{
    asm volatile (
        "    push   {r4, r5, r6, lr}\n"
        "    mov    r4, r0\n"
        "    mov    r5, r1\n"
        "0:\n"
        ".rept 4096\n"
        "    blx    r5\n"
        "    tst    r0, #1\n"
        "    bne    1f\n"
        "    b      2f\n"
        ".balign 64\n"
        "1:\n"
        ".rept 15\n"
        "    add    r0, r0, #0\n"
        ".endr\n"
        "    b      3f\n"
        ".balign 64\n"
        "2:\n"
        ".rept 16\n"
        "    add    r0, r0, #0\n"
        ".endr\n"
        "3:\n"
        ".endr\n"
        "    subs r4, r4, #1\n"
        "    bgt  0b\n"
        "    pop  {r4, r5, r6, pc}\n"
    );
}
int main()
{
    f(1000, rand);
    return 0;
}
/**** end of bench_ASA.c ****/

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
---
 arch/arm/cpu/armv7/omap3/cache.S |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)
Sandeep Paulraj - Nov. 19, 2010, 11:12 p.m.
> 
> Dear Sandeep,
> 
> In message <0554BEF07D437848AF01B9C9B5F0BC5DBD1C11AD@dlee01.ent.ti.com>
> you wrote:
> > I came across this patch that might be useful
> >
> > http://lists.denx.de/pipermail/u-boot/2010-February/067534.html
> >
> > In any case since the u-boot structure has changed this cannot be
> > applied. If anybody is interested please send a new patch.
> 
> That's actually prtty straightforward.  Here it comes (completely
> untested :-)

Thanks but this now effects both OMAP3 and OMAP4, it needs quite a bit of testing in my opinion.

> 
> ----------------------------------------------------------------------
> 
> From 0223e42a48417ad18276d2709206b404c8796807 Mon Sep 17 00:00:00 2001
> From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
> Date: Sat, 6 Feb 2010 16:19:46 +0000
> Subject: [PATCH] OMAP3: remove useless ASA bit from AUXCR
> 
> Setting ASA bit hurts performance for the code which has lots of I-cache
> misses and there are no Cortex-A8 errata workarounds which would require
> to have it.
> 
> A test program which intentionally stresses I-cache misses on conditional
> branches is attached.
> 
> ASA bit is not set:
> 
> real    0m2.940s
> user    0m2.930s
> sys     0m0.008s
> 
> ASA bit is set:
> 
> real    0m3.470s
> user    0m3.461s
> sys     0m0.008s
> 
> The difference on some real applications is much more modest and is just
> something like ~0.5%, but every little bit helps.
> 
> /**** start of bench_ASA.c ****/
> void __attribute__((naked)) f(int count, void *rand)
> {
>     asm volatile (
>         "    push   {r4, r5, r6, lr}\n"
>         "    mov    r4, r0\n"
>         "    mov    r5, r1\n"
>         "0:\n"
>         ".rept 4096\n"
>         "    blx    r5\n"
>         "    tst    r0, #1\n"
>         "    bne    1f\n"
>         "    b      2f\n"
>         ".balign 64\n"
>         "1:\n"
>         ".rept 15\n"
>         "    add    r0, r0, #0\n"
>         ".endr\n"
>         "    b      3f\n"
>         ".balign 64\n"
>         "2:\n"
>         ".rept 16\n"
>         "    add    r0, r0, #0\n"
>         ".endr\n"
>         "3:\n"
>         ".endr\n"
>         "    subs r4, r4, #1\n"
>         "    bgt  0b\n"
>         "    pop  {r4, r5, r6, pc}\n"
>     );
> }
> int main()
> {
>     f(1000, rand);
>     return 0;
> }
> /**** end of bench_ASA.c ****/
> 
> Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
> ---
>  arch/arm/cpu/armv7/omap3/cache.S |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv7/omap3/cache.S
> b/arch/arm/cpu/armv7/omap3/cache.S
> index cda87ba..2854771 100644
> --- a/arch/arm/cpu/armv7/omap3/cache.S
> +++ b/arch/arm/cpu/armv7/omap3/cache.S
> @@ -169,7 +169,6 @@ setup_auxcr:
>  	orr	r1, r3, r2, lsr #20-4		@ combine variant and revision
>  	mov	r12, #0x3
>  	mrc	p15, 0, r0, c1, c0, 1
> -	orr	r0, r0, #0x10			@ Enable ASA
>  	@ Enable L1NEON on pre-r2p1 (erratum 621766 workaround)
>  	cmp	r1, #0x21
>  	orrlt	r0, r0, #1 << 5
> --
> 1.7.3.2
> 
> 
> Best regards,
> 
> Wolfgang Denk
> 
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Never put off until tomorrow what you can put off indefinitely.
Siarhei Siamashka - Nov. 24, 2010, 12:21 p.m.
On Saturday 20 November 2010 01:12:22 Paulraj, Sandeep wrote:
> > Dear Sandeep,
> > 
> > In message <0554BEF07D437848AF01B9C9B5F0BC5DBD1C11AD@dlee01.ent.ti.com>
> > 
> > you wrote:
> > > I came across this patch that might be useful
> > > 
> > > http://lists.denx.de/pipermail/u-boot/2010-February/067534.html
> > > 
> > > In any case since the u-boot structure has changed this cannot be
> > > applied. If anybody is interested please send a new patch.
> > 
> > That's actually prtty straightforward.  Here it comes (completely
> > untested :-)

Yes, this straightforward change looks ok to me.

> Thanks but this now effects both OMAP3 and OMAP4, it needs quite a bit of
> testing in my opinion.

Why do you think it affects OMAP4?

> > +++ b/arch/arm/cpu/armv7/omap3/cache.S
                             ^^^^^
 
OMAP4 uses Cortex-A9 with totally different bits in Auxiliary Control Register.
This ASA bit (and the patch) is only relevant for Cortex-A8.

Anyway, additional testing and ACK would be surely welcome :-)

Is there anything else expected to be done from my side? As I mentioned before,
I would really prefer if TI could deal with this stuff themselves. Just because
they ought to have all the most relevant information.

Patch

diff --git a/arch/arm/cpu/armv7/omap3/cache.S b/arch/arm/cpu/armv7/omap3/cache.S
index cda87ba..2854771 100644
--- a/arch/arm/cpu/armv7/omap3/cache.S
+++ b/arch/arm/cpu/armv7/omap3/cache.S
@@ -169,7 +169,6 @@  setup_auxcr:
 	orr	r1, r3, r2, lsr #20-4		@ combine variant and revision
 	mov	r12, #0x3
 	mrc	p15, 0, r0, c1, c0, 1
-	orr	r0, r0, #0x10			@ Enable ASA
 	@ Enable L1NEON on pre-r2p1 (erratum 621766 workaround)
 	cmp	r1, #0x21
 	orrlt	r0, r0, #1 << 5