diff mbox

[U-Boot] arm: pxa: PXA270 D-Cache as ram

Message ID 1369087586-1344-1-git-send-email-ynvich@gmail.com
State Rejected
Delegated to: Marek Vasut
Headers show

Commit Message

Sergey Yanovich May 20, 2013, 10:06 p.m. UTC
2.2.5.2 of Marvell PXA27x Processor Family Developers Manual says:
"The PXA27x processor cache configuration is identical to that of
the PXA255 processor."

As a result, it is perfectly legitimate to use PXA25X
'lock_cache_for_stack' on PXA27X as well.

Signed-off-by: Sergey Yanovich <ynvich@gmail.com>
---
 arch/arm/cpu/pxa/start.S |   10 ++++++++--
 include/configs/lp8x4x.h |    5 +++--
 2 files changed, 11 insertions(+), 4 deletions(-)

Comments

Marek Vasut May 21, 2013, 10:39 a.m. UTC | #1
Dear Sergey Yanovich,

> 2.2.5.2 of Marvell PXA27x Processor Family Developers Manual says:
> "The PXA27x processor cache configuration is identical to that of
> the PXA255 processor."
> 
> As a result, it is perfectly legitimate to use PXA25X
> 'lock_cache_for_stack' on PXA27X as well.
> 
> Signed-off-by: Sergey Yanovich <ynvich@gmail.com>
> ---
>  arch/arm/cpu/pxa/start.S |   10 ++++++++--
>  include/configs/lp8x4x.h |    5 +++--
>  2 files changed, 11 insertions(+), 4 deletions(-)

Why would you need this on PXA27x ? Is SRAM not enough?

Best regards,
Marek Vasut
Sergey Yanovich May 21, 2013, 10:42 a.m. UTC | #2
Dear Marek Vasut,

On Tue, 2013-05-21 at 12:39 +0200, Marek Vasut wrote:
> > 2.2.5.2 of Marvell PXA27x Processor Family Developers Manual says:
> > "The PXA27x processor cache configuration is identical to that of
> > the PXA255 processor."
> > 
> > As a result, it is perfectly legitimate to use PXA25X
> > 'lock_cache_for_stack' on PXA27X as well.

> Why would you need this on PXA27x ? Is SRAM not enough?

SRAM is battery-backup. It can be used as a ultra-fast persistent
storage in OS. Wasting a quater of it just to boot the system isn't the
best choice.
Marek Vasut May 21, 2013, 10:54 a.m. UTC | #3
Dear Sergey Yanovich,

> Dear Marek Vasut,
> 
> On Tue, 2013-05-21 at 12:39 +0200, Marek Vasut wrote:
> > > 2.2.5.2 of Marvell PXA27x Processor Family Developers Manual says:
> > > "The PXA27x processor cache configuration is identical to that of
> > > the PXA255 processor."
> > > 
> > > As a result, it is perfectly legitimate to use PXA25X
> > > 'lock_cache_for_stack' on PXA27X as well.
> > 
> > Why would you need this on PXA27x ? Is SRAM not enough?
> 
> SRAM is battery-backup.

SRAM is just the in-CPU bit of fast RAM. What do you mean by "battery-backup" ?

> It can be used as a ultra-fast persistent
> storage in OS. Wasting a quater of it just to boot the system isn't the
> best choice.

The SRAM is used for stack in U-Boot until you leave board_init_f, then the 
stack is relocated to DRAM. The OS can use SRAM as needed, U-Boot is no longer 
operational once you load subsequent OS. What's the problem?

Best regards,
Marek Vasut
Sergey Yanovich May 21, 2013, 11:23 a.m. UTC | #4
Dear Marek Vasut,

On Tue, 2013-05-21 at 12:54 +0200, Marek Vasut wrote:
> SRAM is just the in-CPU bit of fast RAM. What do you mean by "battery-backup" ?

Yes, you are right. It is 'for high speed code or data storage preserved
during low-power states' using a quote from PXA270 EMTS (top of page 1).
Battery-backup is optional. I mixed PXA270 and LP-8x4x specs.

> > It can be used as a ultra-fast persistent
> > storage in OS. Wasting a quater of it just to boot the system isn't the
> > best choice.
> 
> The SRAM is used for stack in U-Boot until you leave board_init_f, then the 
> stack is relocated to DRAM. The OS can use SRAM as needed, U-Boot is no longer 
> operational once you load subsequent OS. What's the problem?

Anyway, SRAM preserves its state when power is off. Poweroff time could
be in years with a backup battery. In addition, D-Cache is an order of
magnitude faster than SRAM (approx. 9 times) for both reads and writes.
Marek Vasut May 21, 2013, 11:38 a.m. UTC | #5
Dear Sergey Yanovich,

> Dear Marek Vasut,
> 
> On Tue, 2013-05-21 at 12:54 +0200, Marek Vasut wrote:
> > SRAM is just the in-CPU bit of fast RAM. What do you mean by
> > "battery-backup" ?
> 
> Yes, you are right. It is 'for high speed code or data storage preserved
> during low-power states' using a quote from PXA270 EMTS (top of page 1).
> Battery-backup is optional. I mixed PXA270 and LP-8x4x specs.

Yes, it's just an in-CPU RAM.

> > > It can be used as a ultra-fast persistent
> > > storage in OS. Wasting a quater of it just to boot the system isn't the
> > > best choice.
> > 
> > The SRAM is used for stack in U-Boot until you leave board_init_f, then
> > the stack is relocated to DRAM. The OS can use SRAM as needed, U-Boot is
> > no longer operational once you load subsequent OS. What's the problem?
> 
> Anyway, SRAM preserves its state when power is off. Poweroff time could
> be in years with a backup battery. In addition, D-Cache is an order of
> magnitude faster than SRAM (approx. 9 times) for both reads and writes.

Is there any measurable difference between using DCache and SRAM? Do you have 
any evidence that a speedup happens?

Still, the SRAM/DCache is only used until you leave board_init_f(), then it's 
all DRAM.

Best regards,
Marek Vasut
Sergey Yanovich May 21, 2013, 11:56 a.m. UTC | #6
Dear Marek Vasut,

On Tue, 2013-05-21 at 13:38 +0200, Marek Vasut wrote:
> Yes, it's just an in-CPU RAM.

Well, it is not 'just' RAM. It preserves its state during deep sleep and
power off modes.

> > Anyway, SRAM preserves its state when power is off. Poweroff time could
> > be in years with a backup battery. In addition, D-Cache is an order of
> > magnitude faster than SRAM (approx. 9 times) for both reads and writes.
> 
> Is there any measurable difference between using DCache and SRAM? Do you have 
> any evidence that a speedup happens?

I haven't done any special profiling. I am just relying on PXA270 EMTS
table 6-14. The table says SRAM reads take 9 cycles, writes take 7
cycles. D-Cache operations take 1 cycle.

> Still, the SRAM/DCache is only used until you leave board_init_f(), then it's 
> all DRAM.

Yes, the patch as it is will only affects relocation speed and preserve
SRAM from corruption. The speed gain can also be applied to uImage
copying/unpacking, but that requires deeper understanding than I have at
the moment.
Marek Vasut May 21, 2013, 3 p.m. UTC | #7
Dear Sergey Yanovich,

> Dear Marek Vasut,
> 
> On Tue, 2013-05-21 at 13:38 +0200, Marek Vasut wrote:
> > Yes, it's just an in-CPU RAM.
> 
> Well, it is not 'just' RAM. It preserves its state during deep sleep and
> power off modes.

So does RAM during sleep state ;-)

> > > Anyway, SRAM preserves its state when power is off. Poweroff time could
> > > be in years with a backup battery. In addition, D-Cache is an order of
> > > magnitude faster than SRAM (approx. 9 times) for both reads and writes.
> > 
> > Is there any measurable difference between using DCache and SRAM? Do you
> > have any evidence that a speedup happens?
> 
> I haven't done any special profiling. I am just relying on PXA270 EMTS
> table 6-14. The table says SRAM reads take 9 cycles, writes take 7
> cycles. D-Cache operations take 1 cycle.
> 
> > Still, the SRAM/DCache is only used until you leave board_init_f(), then
> > it's all DRAM.
> 
> Yes, the patch as it is will only affects relocation speed and preserve
> SRAM from corruption.

Now this is the right (convincing) argument! What kind of corruption ? When does 
it occur ?

> The speed gain can also be applied to uImage
> copying/unpacking, but that requires deeper understanding than I have at
> the moment.

Uh ... I lost you here. Can you please elaborate some more ?

Best regards,
Marek Vasut
Sergey Yanovich May 21, 2013, 4:31 p.m. UTC | #8
On Tue, 2013-05-21 at 17:00 +0200, Marek Vasut wrote:
> > Yes, the patch as it is will only affects relocation speed and preserve
> > SRAM from corruption.
> 
> Now this is the right (convincing) argument! What kind of corruption ? When does 
> it occur ?

The whole 256 kB of SRAM could be used for persistent storage with the
patch. Without it, part of SRAM should be dedicated for U-Boot stack or
be overwritten on boot.

> > The speed gain can also be applied to uImage
> > copying/unpacking, but that requires deeper understanding than I have at
> > the moment.
> 
> Uh ... I lost you here. Can you please elaborate some more ?

You are right. After SDRAM is configured, it is enough to turn on data
caching to receive its speed benefits.
Marek Vasut May 21, 2013, 7:02 p.m. UTC | #9
Dear Sergey Yanovich,

> On Tue, 2013-05-21 at 17:00 +0200, Marek Vasut wrote:
> > > Yes, the patch as it is will only affects relocation speed and preserve
> > > SRAM from corruption.
> > 
> > Now this is the right (convincing) argument! What kind of corruption ?
> > When does it occur ?
> 
> The whole 256 kB of SRAM could be used for persistent storage with the
> patch. Without it, part of SRAM should be dedicated for U-Boot stack or
> be overwritten on boot.

This won't hold on any PXA that uses SPL, like the vpac270 with OneNAND SPL and 
PXA3xx (which is out of tree, none of your concern ;-) )

> > > The speed gain can also be applied to uImage
> > > copying/unpacking, but that requires deeper understanding than I have
> > > at the moment.
> > 
> > Uh ... I lost you here. Can you please elaborate some more ?
> 
> You are right. After SDRAM is configured, it is enough to turn on data
> caching to receive its speed benefits.

You must make sure anything that uses DMA won't crash. But I don't understand 
how locking cachelines as RAM and enabling dcache relate to each other in this 
context.

Best regards,
Marek Vasut
Sergey Yanovich May 21, 2013, 7:18 p.m. UTC | #10
Dear Marek Vasut,

On Tue, 2013-05-21 at 21:02 +0200, Marek Vasut wrote:
> > The whole 256 kB of SRAM could be used for persistent storage with the
> > patch. Without it, part of SRAM should be dedicated for U-Boot stack or
> > be overwritten on boot.
> 
> This won't hold on any PXA that uses SPL, like the vpac270 with OneNAND SPL and 
> PXA3xx (which is out of tree, none of your concern ;-) )

I am no way trying to enforce D-Cache as RAM. The patch just provides an
option for those who needs it.

> > You are right. After SDRAM is configured, it is enough to turn on data
> > caching to receive its speed benefits.
> 
> You must make sure anything that uses DMA won't crash.

I wasn't sure why data cache is disabled in U-Boot for every board I saw
configuration of. Thanks for pointing out.

> But I don't understand 
> how locking cachelines as RAM and enabling dcache relate to each other in this 
> context.

I meant D-Cache is several times faster than SDRAM, so it may be
possible to get a bit faster, if stack remains on D-Cache even after
SDRAM is configured. Repeating my hedge, I don't see the full picture,
yet. It may well be impossible (if U-Boot needs more than 32 kB of
stack) or not worth the effort (if the gain is too small).
Marek Vasut May 21, 2013, 7:24 p.m. UTC | #11
Dear Sergey Yanovich,

> Dear Marek Vasut,
> 
> On Tue, 2013-05-21 at 21:02 +0200, Marek Vasut wrote:
> > > The whole 256 kB of SRAM could be used for persistent storage with the
> > > patch. Without it, part of SRAM should be dedicated for U-Boot stack or
> > > be overwritten on boot.
> > 
> > This won't hold on any PXA that uses SPL, like the vpac270 with OneNAND
> > SPL and PXA3xx (which is out of tree, none of your concern ;-) )
> 
> I am no way trying to enforce D-Cache as RAM. The patch just provides an
> option for those who needs it.

I'd love to have a uniform way to do this cache thing, really ...

> > > You are right. After SDRAM is configured, it is enough to turn on data
> > > caching to receive its speed benefits.
> > 
> > You must make sure anything that uses DMA won't crash.
> 
> I wasn't sure why data cache is disabled in U-Boot for every board I saw
> configuration of. Thanks for pointing out.
> 
> > But I don't understand
> > how locking cachelines as RAM and enabling dcache relate to each other in
> > this context.
> 
> I meant D-Cache is several times faster than SDRAM, so it may be
> possible to get a bit faster, if stack remains on D-Cache even after
> SDRAM is configured.

Not really, enabling dcache altogether and marking DRAM region as cached would 
be much better.

> Repeating my hedge, I don't see the full picture,
> yet. It may well be impossible (if U-Boot needs more than 32 kB of
> stack)

No way.

> or not worth the effort (if the gain is too small).

The larger gain would be from fixing the U-Boot drivers for PXA to work well 
with DCache ;-) Then the speedup would really be plenty significant, this can be 
well confirmed on many other ARM chips.

Best regards,
Marek Vasut
Sergey Yanovich May 21, 2013, 7:42 p.m. UTC | #12
Dear Marek Vasut,

On Tue, 2013-05-21 at 21:24 +0200, Marek Vasut wrote:
> I'd love to have a uniform way to do this cache thing, really ...

Requoting the spec 'The PXA27x processor cache configuration is
identical to that of the PXA255 processor'. It looks safe to configure
all PXA2XX chipsets this way.

Maybe I am missing something, but SPL is no exception in this case.

> Not really, enabling dcache altogether and marking DRAM region as cached would 
> be much better.
> 
> > Repeating my hedge, I don't see the full picture,
> > yet. It may well be impossible (if U-Boot needs more than 32 kB of
> > stack)
> 
> No way.
> 
> > or not worth the effort (if the gain is too small).
> 
> The larger gain would be from fixing the U-Boot drivers for PXA to work well 
> with DCache ;-) Then the speedup would really be plenty significant, this can be 
> well confirmed on many other ARM chips.

I have plans to dig deeper into this after I complete the current
project. Faster boot is always a good thing. Thanks for explaining in
details.
Marek Vasut May 21, 2013, 8:07 p.m. UTC | #13
Dear Sergey Yanovich,

> Dear Marek Vasut,
> 
> On Tue, 2013-05-21 at 21:24 +0200, Marek Vasut wrote:
> > I'd love to have a uniform way to do this cache thing, really ...
> 
> Requoting the spec 'The PXA27x processor cache configuration is
> identical to that of the PXA255 processor'. It looks safe to configure
> all PXA2XX chipsets this way.
> 
> Maybe I am missing something, but SPL is no exception in this case.

The OneNAND has 1kbyte limit on code it will directly address when booting from 
it, you can't even make generate the MMU table there.

> > Not really, enabling dcache altogether and marking DRAM region as cached
> > would be much better.
> > 
> > > Repeating my hedge, I don't see the full picture,
> > > yet. It may well be impossible (if U-Boot needs more than 32 kB of
> > > stack)
> > 
> > No way.
> > 
> > > or not worth the effort (if the gain is too small).
> > 
> > The larger gain would be from fixing the U-Boot drivers for PXA to work
> > well with DCache ;-) Then the speedup would really be plenty
> > significant, this can be well confirmed on many other ARM chips.
> 
> I have plans to dig deeper into this after I complete the current
> project. Faster boot is always a good thing. Thanks for explaining in
> details.

Sure, yet I think I just piled work onto you ;-)

Best regards,
Marek Vasut
Sergey Yanovich May 21, 2013, 8:24 p.m. UTC | #14
Dear Marek Vasut,

On Tue, 2013-05-21 at 22:07 +0200, Marek Vasut wrote:
> The OneNAND has 1kbyte limit on code it will directly address when booting from 
> it, you can't even make generate the MMU table there.

Do you mean there is no space left inside that 1K for
lock_cache_for_stack()?

> > I have plans to dig deeper into this after I complete the current
> > project. Faster boot is always a good thing. Thanks for explaining in
> > details.
> 
> Sure, yet I think I just piled work onto you ;-)

It's OK. I have plans to do this anyway.
Marek Vasut May 21, 2013, 9:38 p.m. UTC | #15
Dear Sergey Yanovich,

> Dear Marek Vasut,
> 
> On Tue, 2013-05-21 at 22:07 +0200, Marek Vasut wrote:
> > The OneNAND has 1kbyte limit on code it will directly address when
> > booting from it, you can't even make generate the MMU table there.
> 
> Do you mean there is no space left inside that 1K for
> lock_cache_for_stack()?

How would you do that ? You need MMU enabled to lock lines IIRC.

> > > I have plans to dig deeper into this after I complete the current
> > > project. Faster boot is always a good thing. Thanks for explaining in
> > > details.
> > 
> > Sure, yet I think I just piled work onto you ;-)
> 
> It's OK. I have plans to do this anyway.

CCing Mike Dunn, you two should coordinate.

Best regards,
Marek Vasut
Sergey Yanovich May 22, 2013, 1:02 p.m. UTC | #16
Dear Marek Vasut,

On Tue, 2013-05-21 at 23:38 +0200, Marek Vasut wrote:
> > Do you mean there is no space left inside that 1K for
> > lock_cache_for_stack()?
> 
> How would you do that ? You need MMU enabled to lock lines IIRC.

I see I don't know enough to continue the discussion, yet.

Concerning the patch in question, is it possible to allow cache for ram
on PXA270 somehow before we have a final solution?
Marek Vasut May 22, 2013, 1:04 p.m. UTC | #17
Dear Sergey Yanovich,

> Dear Marek Vasut,
> 
> On Tue, 2013-05-21 at 23:38 +0200, Marek Vasut wrote:
> > > Do you mean there is no space left inside that 1K for
> > > lock_cache_for_stack()?
> > 
> > How would you do that ? You need MMU enabled to lock lines IIRC.
> 
> I see I don't know enough to continue the discussion, yet.
> 
> Concerning the patch in question, is it possible to allow cache for ram
> on PXA270 somehow before we have a final solution?

I dont mind discussing this further and even using DCache for stack in 
board_init_f(), but we must make sure nothing gets broken.

Best regards,
Marek Vasut
Sergey Yanovich May 22, 2013, 1:21 p.m. UTC | #18
Dear Marek Vasut,

On Wed, 2013-05-22 at 15:04 +0200, Marek Vasut wrote:
> > Concerning the patch in question, is it possible to allow cache for ram
> > on PXA270 somehow before we have a final solution?
> 
> I dont mind discussing this further and even using DCache for stack in 
> board_init_f(), but we must make sure nothing gets broken.

If it is possible, I would like to have a configuration option, which
allows to use D-Cache as RAM not only on PXA25X, but also on PXA27X. I
propose to preserve status-quo for now and only use D-Cache as RAM when
it is explicitly configured so.

You mentioned in a separate letter, that config options need
documenting. This part is missing in my patch, and I'll fix this, if the
above is OK.
Marek Vasut May 22, 2013, 1:54 p.m. UTC | #19
Dear Sergey Yanovich,

> Dear Marek Vasut,
> 
> On Wed, 2013-05-22 at 15:04 +0200, Marek Vasut wrote:
> > > Concerning the patch in question, is it possible to allow cache for ram
> > > on PXA270 somehow before we have a final solution?
> > 
> > I dont mind discussing this further and even using DCache for stack in
> > board_init_f(), but we must make sure nothing gets broken.
> 
> If it is possible, I would like to have a configuration option, which
> allows to use D-Cache as RAM not only on PXA25X, but also on PXA27X. I
> propose to preserve status-quo for now and only use D-Cache as RAM when
> it is explicitly configured so.

See above, let's do it properly please. Check the OneNAND SPL config on vpac270 
board, will it still work with this enabled?

> You mentioned in a separate letter, that config options need
> documenting. This part is missing in my patch, and I'll fix this, if the
> above is OK.

Best regards,
Marek Vasut
Mike Dunn May 23, 2013, 5:43 p.m. UTC | #20
On 05/21/2013 02:38 PM, Marek Vasut wrote:
> Dear Sergey Yanovich,
> 
>> Dear Marek Vasut,
>>
>> On Tue, 2013-05-21 at 22:07 +0200, Marek Vasut wrote:
>>> The OneNAND has 1kbyte limit on code it will directly address when
>>> booting from it, you can't even make generate the MMU table there.
>>
>> Do you mean there is no space left inside that 1K for
>> lock_cache_for_stack()?
> 
> How would you do that ? You need MMU enabled to lock lines IIRC.
> 
>>>> I have plans to dig deeper into this after I complete the current
>>>> project. Faster boot is always a good thing. Thanks for explaining in
>>>> details.
>>>
>>> Sure, yet I think I just piled work onto you ;-)
>>
>> It's OK. I have plans to do this anyway.
> 
> CCing Mike Dunn, you two should coordinate.


Thanks Marek.  Was following this on the ML.  Will take a look when I get a chance.
diff mbox

Patch

diff --git a/arch/arm/cpu/pxa/start.S b/arch/arm/cpu/pxa/start.S
index ada91a6..5ea512e 100644
--- a/arch/arm/cpu/pxa/start.S
+++ b/arch/arm/cpu/pxa/start.S
@@ -40,6 +40,12 @@ 
 #include <version.h>
 
 #ifdef CONFIG_CPU_PXA25X
+#ifndef CONFIG_PXA2XX_CACHE_AS_RAM
+#define CONFIG_PXA2XX_CACHE_AS_RAM
+#endif
+#endif
+
+#ifdef CONFIG_PXA2XX_CACHE_AS_RAM
 #if ((CONFIG_SYS_INIT_SP_ADDR) != 0xfffff800)
 #error "Init SP address must be set to 0xfffff800 for PXA250"
 #endif
@@ -164,7 +170,7 @@  reset:
 	bl  cpu_init_crit
 #endif
 
-#ifdef	CONFIG_CPU_PXA25X
+#ifdef	CONFIG_PXA2XX_CACHE_AS_RAM
 	bl	lock_cache_for_stack
 #endif
 
@@ -482,7 +488,7 @@  fiq:
  * This is useful on PXA25x and PXA26x in early bootstages, where there is no
  * other possible memory available to hold stack.
  */
-#ifdef CONFIG_CPU_PXA25X
+#ifdef CONFIG_PXA2XX_CACHE_AS_RAM
 .macro CPWAIT reg
 	mrc	p15, 0, \reg, c2, c0, 0
 	mov	\reg, \reg
diff --git a/include/configs/lp8x4x.h b/include/configs/lp8x4x.h
index 026f321..27ff2f4 100644
--- a/include/configs/lp8x4x.h
+++ b/include/configs/lp8x4x.h
@@ -149,8 +149,9 @@ 
 
 #define	CONFIG_SYS_LOAD_ADDR		0xa0008000
 #define CONFIG_SYS_SDRAM_BASE		PHYS_SDRAM_1
-/* Use first 64kb bank of the internal SRAM */
-#define	CONFIG_SYS_INIT_SP_ADDR		0x5c010000
+/* Use CPU data cache as internal RAM */
+#define CONFIG_PXA2XX_CACHE_AS_RAM
+#define	CONFIG_SYS_INIT_SP_ADDR		0xfffff800
 
 /*
  * NOR FLASH