Patchwork [U-Boot] Performance of the ARM's PL310 L2 cache.

Submitter Łukasz Majewski
Date Aug. 17, 2012, 3:49 p.m.
Message ID <20120817174953.08add23e@amdc308.digital.local>
Permalink /patch/178280/
State RFC

Comments

Łukasz Majewski - Aug. 17, 2012, 3:49 p.m.
Hi Aneesh,

I've enabled the L2 cache on the Trats board. Please find the results of
the performance tests below.
The test function, as well as my method of enabling L2, is attached to
this e-mail.

I simply left the default configuration (number of ways, associativity)
as it is in the Linux kernel driver.
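
For reference, the "number of ways" and the way size mentioned above are
fields of the PL310 auxiliary control register. The sketch below decodes
them from a raw register value. The bit positions are assumed to follow
the ARM L2C-310 TRM (bit 16 for associativity, bits [19:17] for way size)
and the helper names are hypothetical, not part of U-Boot or of the
attached patch, so please check them against the TRM before relying on
them.

```c
#include <stdint.h>

/*
 * Assumed layout of the PL310 Auxiliary Control Register:
 *   bit 16       associativity (0 = 8-way, 1 = 16-way)
 *   bits [19:17] way size (1 = 16 KB, 2 = 32 KB, ... 6 = 512 KB)
 * The names below are illustrative only.
 */
#define PL310_AUX_ASSOC_SHIFT		16
#define PL310_AUX_WAY_SIZE_SHIFT	17
#define PL310_AUX_WAY_SIZE_MASK		0x7u

/* Number of ways selected by the associativity bit. */
unsigned int pl310_aux_ways(uint32_t aux)
{
	return ((aux >> PL310_AUX_ASSOC_SHIFT) & 1u) ? 16u : 8u;
}

/* Way size in KB selected by the way-size field. */
unsigned int pl310_aux_way_size_kb(uint32_t aux)
{
	uint32_t field = (aux >> PL310_AUX_WAY_SIZE_SHIFT) &
			 PL310_AUX_WAY_SIZE_MASK;

	/* Encodings 0 and 7 are reserved; assumed to act as 16 KB / 512 KB. */
	if (field == 0)
		field = 1;
	if (field == 7)
		field = 6;

	return 8u << field;	/* 1 -> 16 KB, 2 -> 32 KB, ..., 6 -> 512 KB */
}

/* Total cache size in KB implied by the two fields. */
unsigned int pl310_aux_total_kb(uint32_t aux)
{
	return pl310_aux_ways(aux) * pl310_aux_way_size_kb(aux);
}
```

With these assumptions, a value with bit 16 set and a way-size field of 3
(0x00070000) decodes to 16 ways of 64 KB each, i.e. 1024 KB in total.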

Results:

test_l2_cache() performed once:
L1    L2     TIME [seconds]
OFF   OFF    90.359
ON    OFF    62.236
ON    ON     61.687
 
L1 speedup: ~31 %
L2 speedup (when compared to L1): < 1 %

test_l2_cache() performed 5000 times:
L1    L2     TIME [seconds]
OFF   OFF    444.9
ON    OFF    320.55
ON    ON     287.21
 
L1 speedup: ~28 %
L2 speedup (when compared to L1): ~ 10%
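
The percentages above appear to be relative reductions in run time,
(t_before - t_after) / t_before. A minimal sketch of that calculation
follows; the function name is hypothetical and is used here only for
illustration.

```c
#include <assert.h>

/*
 * Percentage of run time saved when a test that took t_before seconds
 * takes t_after seconds after a cache level is enabled.
 */
double time_saved_percent(double t_before, double t_after)
{
	assert(t_before > 0.0);
	return 100.0 * (t_before - t_after) / t_before;
}
```

Applied to the 5000-iteration run, time_saved_percent(444.9, 320.55)
gives about 28.0 for L1, and time_saved_percent(320.55, 287.21) gives
about 10.4 for L2 on top of L1.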

Normal U-Boot operation (from system startup until execution is passed
to the kernel):

L1    L2     TIME [seconds]
OFF   OFF    1.813
ON    OFF    1.552
ON    ON     1.533

As one can observe, for normal U-Boot operation enabling L2 on top of L1
makes no significant difference.

Have you seen similar results on OMAP?
Do you do any additional configuration when enabling L2 on OMAP?

The assembly code presented below (armv7/omap-common/lowlevel_init.S)
puzzles me a bit...

ENTRY(set_pl310_ctrl_reg)
	LDR	r12, =0x102	@ Set PL310 control register - value in R0
	.word	0xe1600070	@ SMC #0 - hand assembled because -march=armv5
				@ call ROM Code API to set control register
ENDPROC(set_pl310_ctrl_reg)
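
As a side note on the hand-assembled word: 0xe1600070 matches the
ARM-state (A1) encoding of SMC #0 with the "always" condition. The
sketch below rebuilds that word from its fields. The function name is
hypothetical, and the field layout is the one assumed from the ARMv7-A
architecture manual, so treat it as an illustration rather than a
reference.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed A1 encoding of SMC #imm4:
 *   cond[31:28] 0001 0110 0000 0000 0000 0111 imm4[3:0]
 * cond = 0xE means "always".
 */
uint32_t encode_arm_smc(uint32_t cond, uint32_t imm4)
{
	assert(cond <= 0xFu && imm4 <= 0xFu);
	return (cond << 28) | 0x01600070u | imm4;
}
```

Under this assumption, encode_arm_smc(0xE, 0) yields 0xE1600070, the same
word that is emitted with .word above because an assembler targeting
-march=armv5 does not accept the SMC mnemonic.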

Are any special operations performed by the "ROM Code API"?
Tom Rini - Aug. 17, 2012, 7:09 p.m.
On Fri, Aug 17, 2012 at 05:49:53PM +0200, Lukasz Majewski wrote:

> Hi Aneesh,
> 
> I've enabled the L2 cache for Trats board. Please find results from
> performance tests.
> The test function as well as my way for enabling L2 are attached to
> this e-mail. 
[snip]
> Have you had similar results with OMAP? 
> Do you do more configuration when enabling the L2 at OMAP? 

At least on some parts, it's similar here.  The normal sequence of
operations is loading a relatively small payload (kernel, maybe device
tree) from storage and then booting it (which turns off L2 anyways).
This is why I was willing to disable DCACHE as a USB workaround: the
common use case doesn't benefit much from dcache being on.
Marek Vasut - Sept. 23, 2012, 8:12 p.m.
Dear Tom Rini,

> On Fri, Aug 17, 2012 at 05:49:53PM +0200, Lukasz Majewski wrote:
> > Hi Aneesh,
> > 
> > I've enabled the L2 cache for Trats board. Please find results from
> > performance tests.
> > The test function as well as my way for enabling L2 are attached to
> > this e-mail.
> 
> [snip]
> 
> > Have you had similar results with OMAP?
> > Do you do more configuration when enabling the L2 at OMAP?
> 
> At least on some parts, it's similar here.  The normal sequence of
> operations is loading a relatively small payload (kernel, maybe device
> tree) from storage and then booting it (which turns off L2 anyways).
> This is why I was willing to disable DCACHE as a USB workaround, the
> common use-case doesn't see a great deal of help from dcache being on.

I saw a pretty significant performance boost with L1. I was planning to
check L2 on mx6q, but I'm not sure it's worth it anymore.

Best regards,
Marek Vasut
Łukasz Majewski - Oct. 1, 2012, 8:23 a.m.
Hi Marek,

> Dear Tom Rini,
> 
> > On Fri, Aug 17, 2012 at 05:49:53PM +0200, Lukasz Majewski wrote:
> > > Hi Aneesh,
> > > 
> > > I've enabled the L2 cache for Trats board. Please find results
> > > from performance tests.
> > > The test function as well as my way for enabling L2 are attached
> > > to this e-mail.
> > 
> > [snip]
> > 
> > > Have you had similar results with OMAP?
> > > Do you do more configuration when enabling the L2 at OMAP?
> > 
> > At least on some parts, it's similar here.  The normal sequence of
> > operations is loading a relatively small payload (kernel, maybe
> > device tree) from storage and then booting it (which turns off L2
> > anyways). This is why I was willing to disable DCACHE as a USB
> > workaround, the common use-case doesn't see a great deal of help
> > from dcache being on.
> 
> I saw some pretty significant perf. boost with L1, I was planning to
> check L2 on mx6q, but I'm not sure if it's worth it anymore.
> 
> Best regards,
> Marek Vasut

I agree that enabling L1 provides a significant performance boost.
In our case, enabling L2 gives very little additional boost. Please try
L2 on mx6q and share the results.

Patch

From ce66526f772e748234d1f4bf3d264df90274e8c3 Mon Sep 17 00:00:00 2001
From: Lukasz Majewski <l.majewski@samsung.com>
Date: Thu, 16 Aug 2012 15:23:49 +0200
Subject: [PATCH] cache: wip: Test program to evaluate if L2 is working.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
---
 arch/arm/cpu/armv7/exynos/soc.c |   28 ++++++++++++++++++++++++++++
 arch/arm/lib/cache-pl310.c      |    2 --
 board/samsung/trats/trats.c     |    2 ++
 include/common.h                |    1 +
 include/configs/trats.h         |    3 +++
 5 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/arch/arm/cpu/armv7/exynos/soc.c b/arch/arm/cpu/armv7/exynos/soc.c
index 9e8705f..9660daa 100644
--- a/arch/arm/cpu/armv7/exynos/soc.c
+++ b/arch/arm/cpu/armv7/exynos/soc.c
@@ -24,6 +24,7 @@ 
 #include <common.h>
 #include <asm/io.h>
 #include <asm/pl310.h>
+#include <linux/compiler.h>
 
 void reset_cpu(ulong addr)
 {
@@ -41,6 +42,7 @@  void enable_caches(void)
 #ifndef CONFIG_SYS_L2CACHE_OFF
 void v7_outer_cache_enable(void)
 {
+	/* puts("-\n"); */
 	pl310_enable();
 }
 void v7_outer_cache_disable(void)
@@ -48,3 +50,29 @@  void v7_outer_cache_disable(void)
 	pl310_disable();
 }
 #endif
+
+#define TEST_BUF_SIZE (1024*256)
+void test_l2_cache(void)
+{
+	int i, j;
+	u32 *ptr = (u32 *) 0x52000000;
+	volatile u64 sigma = 0;
+
+	/* Setup the buffer */
+	for (i = 0; i < TEST_BUF_SIZE; i++) {
+		*(ptr + i) = i;
+	}
+
+	flush_dcache_range((u32) ptr,
+			   (u32) ptr + (TEST_BUF_SIZE * sizeof(u32)));
+
+	/* At this point the cache contents are in sync with SDRAM */
+	for (j = 0; j < 5000; j++) {
+		for (i = 0, sigma = 0; i < TEST_BUF_SIZE; i++)
+			sigma += *(ptr + i);
+		if (!(j % 500))
+			puts(".");
+	}
+
+	printf("sig: 0x%llx\n", sigma);
+}
diff --git a/arch/arm/lib/cache-pl310.c b/arch/arm/lib/cache-pl310.c
index 76d60cf..885063b 100644
--- a/arch/arm/lib/cache-pl310.c
+++ b/arch/arm/lib/cache-pl310.c
@@ -120,8 +120,6 @@  void v7_outer_cache_inval_range(u32 start, u32 stop)
 void pl310_enable(void)
 {
 	writel(1, &pl310->pl310_ctrl);
-	printf("p310_ctrl: 0x%x p310_aux_ctrl: 0x%x\n",
-	       readl(&pl310->pl310_ctrl), readl(&pl310->pl310_aux_ctrl));
 }
 
 void pl310_disable(void)
diff --git a/board/samsung/trats/trats.c b/board/samsung/trats/trats.c
index 4f9cb5a..0db22cb 100644
--- a/board/samsung/trats/trats.c
+++ b/board/samsung/trats/trats.c
@@ -72,6 +72,8 @@  int board_init(void)
 	pmic_init();
 #endif
 
+	test_l2_cache();
+
 	return 0;
 }
 
diff --git a/include/common.h b/include/common.h
index 39859d3..ab4f009 100644
--- a/include/common.h
+++ b/include/common.h
@@ -570,6 +570,7 @@  int	checkdcache   (void);
 void	upmconfig     (unsigned int, unsigned int *, unsigned int);
 ulong	get_tbclk     (void);
 void	reset_cpu     (ulong addr);
+void    test_l2_cache (void);
 #if defined (CONFIG_OF_LIBFDT) && defined (CONFIG_OF_BOARD_SETUP)
 void ft_cpu_setup(void *blob, bd_t *bd);
 #ifdef CONFIG_PCI
diff --git a/include/configs/trats.h b/include/configs/trats.h
index 4b2b4d6..f2e5bd9 100644
--- a/include/configs/trats.h
+++ b/include/configs/trats.h
@@ -42,6 +42,9 @@ 
 #define CONFIG_DISPLAY_CPUINFO
 #define CONFIG_DISPLAY_BOARDINFO
 
+#define CONFIG_SYS_DCACHE_OFF
+#define CONFIG_SYS_L2CACHE_OFF
+
 #ifndef CONFIG_SYS_L2CACHE_OFF
 #define CONFIG_SYS_L2_PL310
 #define CONFIG_SYS_PL310_BASE	0x10502000
-- 
1.7.2.3