diff mbox

[U-Boot,RFC] common: board_f: Make reserve_mmu a weak function

Message ID 524458978d26f518c021d3716da46519ea8a410f.1499953800.git.michal.simek@xilinx.com
State Accepted
Commit 60873f736e46c0b2fe80c8c0d322bcdbdb8acc63
Delegated to: Tom Rini
Headers show

Commit Message

Michal Simek July 13, 2017, 1:50 p.m. UTC
From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>

Make reserve_mmu a weak so that it provides an option
to customize this routine as per platform need

Signed-off-by: Siva Durga Prasad Paladugu <sivadur@xilinx.com>
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
---

 common/board_f.c | 2 +-
 include/common.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

Comments

Simon Glass July 18, 2017, 2:01 p.m. UTC | #1
Hi Michal,

On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>
> Make reserve_mmu a weak so that it provides an option
> to customize this routine as per platform need

Why do you need this? Can we instead have the generic code do the
right thing? Or can we use reserve_arch() to handle this?

>
> Signed-off-by: Siva Durga Prasad Paladugu <sivadur@xilinx.com>
> Signed-off-by: Michal Simek <michal.simek@xilinx.com>
> ---
>
>  common/board_f.c | 2 +-
>  include/common.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/common/board_f.c b/common/board_f.c
> index 2cdd12a503ae..09f0fbfb6c96 100644
> --- a/common/board_f.c
> +++ b/common/board_f.c
> @@ -340,7 +340,7 @@ static int reserve_round_4k(void)
>  }
>
>  #ifdef CONFIG_ARM
> -static int reserve_mmu(void)
> +__weak int reserve_mmu(void)
>  {
>  #if !(defined(CONFIG_SYS_ICACHE_OFF) && defined(CONFIG_SYS_DCACHE_OFF))
>         /* reserve TLB table */
> diff --git a/include/common.h b/include/common.h
> index 1a98512ab618..ce4c92015928 100644
> --- a/include/common.h
> +++ b/include/common.h
> @@ -286,6 +286,7 @@ void board_show_dram(phys_size_t size);
>   */
>  int arch_fixup_fdt(void *blob);
>
> +int reserve_mmu(void);
>  /* common/flash.c */
>  void flash_perror (int);
>
> --
> 1.9.1
>
Michal Simek July 18, 2017, 2:23 p.m. UTC | #2
Hi Simon,

On 18.7.2017 16:01, Simon Glass wrote:
> Hi Michal,
> 
> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>
>> Make reserve_mmu a weak so that it provides an option
>> to customize this routine as per platform need
> 
> Why do you need this? Can we instead have the generic code do the
> right thing? Or can we use reserve_arch() to handle this?

reserve_mmu is called just for ARM.
What we need to do is support configuration where we have small memory
OCM and we need to put mmu table to different location (TCM) because it
is simply big and it can't be placed where it is placed now.

Based on looking at code the key question is if we can call reserve_mmu
at that time when reserve_arch is called now.
On zynqmp I can't see any issue to allocate tlb on the stack and call it
from it. The question is if this is valid for all arms.

This is just a small hack I have created to move it to stack.

 diff --git a/common/board_f.c b/common/board_f.c
 index 88e20e0168b2..32386c957962 100644
 --- a/common/board_f.c
 +++ b/common/board_f.c
 @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
  {
         /* reserve TLB table */
         gd->arch.tlb_size = PGTABLE_SIZE;
 -       gd->relocaddr -= gd->arch.tlb_size;
 +       gd->start_addr_sp -= gd->arch.tlb_size;

         /* round down to next 64 kB limit */
 -       gd->relocaddr &= ~(0x10000 - 1);
 +       gd->start_addr_sp &= ~(0x10000 - 1);

 -       gd->arch.tlb_addr = gd->relocaddr;
 -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
 +       gd->arch.tlb_addr = gd->start_addr_sp;
 +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
               gd->arch.tlb_addr + gd->arch.tlb_size);

  #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
 @@ -837,7 +837,7 @@ static int initf_dm(void)
  /* Architecture-specific memory reservation */
  __weak int reserve_arch(void)
  {
 -       return 0;
 +       return reserve_mmu();
  }

  __weak int arch_cpu_init_dm(void)
 @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
         reserve_pram,
  #endif
         reserve_round_4k,
 -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
defined(CONFIG_SYS_DCACHE_OFF)) && \
 -               defined(CONFIG_ARM)
 -       reserve_mmu,
 -#endif
  #ifdef CONFIG_DM_VIDEO
         reserve_video,
  #else

For us we can't just rewrite tlb addresses as we want to do because we
can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
relocation structure.
What do you think?

Thanks,
Michal
Simon Glass July 18, 2017, 2:50 p.m. UTC | #3
Hi Michal,

On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
> Hi Simon,
>
> On 18.7.2017 16:01, Simon Glass wrote:
>> Hi Michal,
>>
>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>
>>> Make reserve_mmu a weak so that it provides an option
>>> to customize this routine as per platform need
>>
>> Why do you need this? Can we instead have the generic code do the
>> right thing? Or can we use reserve_arch() to handle this?
>
> reserve_mmu is called just for ARM.

What arch are you using?

> What we need to do is support configuration where we have small memory
> OCM and we need to put mmu table to different location (TCM) because it
> is simply big and it can't be placed where it is placed now.

What is TCM? Is this the internal SRAM?

I don't understand 'simply big'. Are you saying it is too big to go
into main memory?

>
> Based on looking at code the key question is if we can call reserve_mmu
> at that time when reserve_arch is called now.
> On zynqmp I can't see any issue to allocate tlb on the stack and call it
> from it. The question is if this is valid for all arms.
>
> This is just a small hack I have created to move it to stack.
>
>  diff --git a/common/board_f.c b/common/board_f.c
>  index 88e20e0168b2..32386c957962 100644
>  --- a/common/board_f.c
>  +++ b/common/board_f.c
>  @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>   {
>          /* reserve TLB table */
>          gd->arch.tlb_size = PGTABLE_SIZE;
>  -       gd->relocaddr -= gd->arch.tlb_size;
>  +       gd->start_addr_sp -= gd->arch.tlb_size;
>
>          /* round down to next 64 kB limit */
>  -       gd->relocaddr &= ~(0x10000 - 1);
>  +       gd->start_addr_sp &= ~(0x10000 - 1);
>
>  -       gd->arch.tlb_addr = gd->relocaddr;
>  -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>  +       gd->arch.tlb_addr = gd->start_addr_sp;
>  +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>                gd->arch.tlb_addr + gd->arch.tlb_size);
>
>   #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>  @@ -837,7 +837,7 @@ static int initf_dm(void)
>   /* Architecture-specific memory reservation */
>   __weak int reserve_arch(void)
>   {
>  -       return 0;
>  +       return reserve_mmu();
>   }
>
>   __weak int arch_cpu_init_dm(void)
>  @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>          reserve_pram,
>   #endif
>          reserve_round_4k,
>  -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
> defined(CONFIG_SYS_DCACHE_OFF)) && \
>  -               defined(CONFIG_ARM)
>  -       reserve_mmu,
>  -#endif
>   #ifdef CONFIG_DM_VIDEO
>          reserve_video,
>   #else
>
> For us we can't just rewrite tlb addresses as we want to do because we
> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
> relocation structure.

Why not?

> What do you think?
>
> Thanks,
> Michal
>

Regards,
Simon
Michal Simek July 18, 2017, 2:59 p.m. UTC | #4
Hi Simon,

On 18.7.2017 16:50, Simon Glass wrote:
> Hi Michal,
> 
> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>> Hi Simon,
>>
>> On 18.7.2017 16:01, Simon Glass wrote:
>>> Hi Michal,
>>>
>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>
>>>> Make reserve_mmu a weak so that it provides an option
>>>> to customize this routine as per platform need
>>>
>>> Why do you need this? Can we instead have the generic code do the
>>> right thing? Or can we use reserve_arch() to handle this?
>>
>> reserve_mmu is called just for ARM.
> 
> What arch are you using?

arm64 - a53s

> 
>> What we need to do is support configuration where we have small memory
>> OCM and we need to put mmu table to different location (TCM) because it
>> is simply big and it can't be placed where it is placed now.
> 
> What is TCM? Is this the internal SRAM?

It is internal sram which is used mostly by R5s in the system but a53
has also access to it.

> 
> I don't understand 'simply big'. Are you saying it is too big to go
> into main memory?

These configuration can't use DDR. One of that configuration which we
are using is DDR memory test. It means you run from internal sram (256kB
OCM + 256k TCM).
Another configuration is for loading stuff to EMMC on different boards.
It means you don't need to setup DDR for generic emmc programming.

All these configurations are using console over DCC - arm jtag uart.

> 
>>
>> Based on looking at code the key question is if we can call reserve_mmu
>> at that time when reserve_arch is called now.
>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>> from it. The question is if this is valid for all arms.
>>
>> This is just a small hack I have created to move it to stack.
>>
>>  diff --git a/common/board_f.c b/common/board_f.c
>>  index 88e20e0168b2..32386c957962 100644
>>  --- a/common/board_f.c
>>  +++ b/common/board_f.c
>>  @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>   {
>>          /* reserve TLB table */
>>          gd->arch.tlb_size = PGTABLE_SIZE;
>>  -       gd->relocaddr -= gd->arch.tlb_size;
>>  +       gd->start_addr_sp -= gd->arch.tlb_size;
>>
>>          /* round down to next 64 kB limit */
>>  -       gd->relocaddr &= ~(0x10000 - 1);
>>  +       gd->start_addr_sp &= ~(0x10000 - 1);
>>
>>  -       gd->arch.tlb_addr = gd->relocaddr;
>>  -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>  +       gd->arch.tlb_addr = gd->start_addr_sp;
>>  +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>                gd->arch.tlb_addr + gd->arch.tlb_size);
>>
>>   #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>  @@ -837,7 +837,7 @@ static int initf_dm(void)
>>   /* Architecture-specific memory reservation */
>>   __weak int reserve_arch(void)
>>   {
>>  -       return 0;
>>  +       return reserve_mmu();
>>   }
>>
>>   __weak int arch_cpu_init_dm(void)
>>  @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>          reserve_pram,
>>   #endif
>>          reserve_round_4k,
>>  -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>  -               defined(CONFIG_ARM)
>>  -       reserve_mmu,
>>  -#endif
>>   #ifdef CONFIG_DM_VIDEO
>>          reserve_video,
>>   #else
>>
>> For us we can't just rewrite tlb addresses as we want to do because we
>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>> relocation structure.
> 
> Why not?

Because we need to fit to 256kB of memory also with relocation and also
space for data. And IIRC MMU table size is 64kB.
Another option could be to disable relocation.

Thanks,
Michal
York Sun July 19, 2017, 2:18 a.m. UTC | #5
On 07/18/2017 10:59 PM, Michal Simek wrote:
> Hi Simon,
> 
> On 18.7.2017 16:50, Simon Glass wrote:
>> Hi Michal,
>>
>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>> Hi Simon,
>>>
>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>> Hi Michal,
>>>>
>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>
>>>>> Make reserve_mmu a weak so that it provides an option
>>>>> to customize this routine as per platform need
>>>>
>>>> Why do you need this? Can we instead have the generic code do the
>>>> right thing? Or can we use reserve_arch() to handle this?
>>>
>>> reserve_mmu is called just for ARM.
>>
>> What arch are you using?
> 
> arm64 - a53s
> 
>>
>>> What we need to do is support configuration where we have small memory
>>> OCM and we need to put mmu table to different location (TCM) because it
>>> is simply big and it can't be placed where it is placed now.
>>
>> What is TCM? Is this the internal SRAM?
> 
> It is internal sram which is used mostly by R5s in the system but a53
> has also access to it.
> 
>>
>> I don't understand 'simply big'. Are you saying it is too big to go
>> into main memory?
> 
> These configuration can't use DDR. One of that configuration which we
> are using is DDR memory test. It means you run from internal sram (256kB
> OCM + 256k TCM).
> Another configuration is for loading stuff to EMMC on different boards.
> It means you don't need to setup DDR for generic emmc programming.
> 
> All these configurations are using console over DCC - arm jtag uart.
> 
>>
>>>
>>> Based on looking at code the key question is if we can call reserve_mmu
>>> at that time when reserve_arch is called now.
>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>> from it. The question is if this is valid for all arms.
>>>
>>> This is just a small hack I have created to move it to stack.
>>>
>>>   diff --git a/common/board_f.c b/common/board_f.c
>>>   index 88e20e0168b2..32386c957962 100644
>>>   --- a/common/board_f.c
>>>   +++ b/common/board_f.c
>>>   @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>    {
>>>           /* reserve TLB table */
>>>           gd->arch.tlb_size = PGTABLE_SIZE;
>>>   -       gd->relocaddr -= gd->arch.tlb_size;
>>>   +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>
>>>           /* round down to next 64 kB limit */
>>>   -       gd->relocaddr &= ~(0x10000 - 1);
>>>   +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>
>>>   -       gd->arch.tlb_addr = gd->relocaddr;
>>>   -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>   +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>   +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>                 gd->arch.tlb_addr + gd->arch.tlb_size);
>>>
>>>    #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>   @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>    /* Architecture-specific memory reservation */
>>>    __weak int reserve_arch(void)
>>>    {
>>>   -       return 0;
>>>   +       return reserve_mmu();
>>>    }
>>>
>>>    __weak int arch_cpu_init_dm(void)
>>>   @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>           reserve_pram,
>>>    #endif
>>>           reserve_round_4k,
>>>   -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>   -               defined(CONFIG_ARM)
>>>   -       reserve_mmu,
>>>   -#endif
>>>    #ifdef CONFIG_DM_VIDEO
>>>           reserve_video,
>>>    #else
>>>
>>> For us we can't just rewrite tlb addresses as we want to do because we
>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>> relocation structure.
>>
>> Why not?
> 
> Because we need to fit to 256kB of memory also with relocation and also
> space for data. And IIRC MMU table size is 64kB.
> Another option could be to disable relocation.
> 
Michal,

For your reference, we use two stage MMU setup for FSL ARMv8 SoCs. We 
setup an early MMU in arch_cpu_init() so we can enable d-cache very 
early to speed up execution. Our DDR init and testing is done on top of 
that. After DDR init, we use the reserved MMU table in DDR to setup a 
final MMU for normal use.

York
Michal Simek July 19, 2017, 8:25 a.m. UTC | #6
On 19.7.2017 04:18, York Sun wrote:
> On 07/18/2017 10:59 PM, Michal Simek wrote:
>> Hi Simon,
>>
>> On 18.7.2017 16:50, Simon Glass wrote:
>>> Hi Michal,
>>>
>>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>>> Hi Simon,
>>>>
>>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>>> Hi Michal,
>>>>>
>>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>>
>>>>>> Make reserve_mmu a weak so that it provides an option
>>>>>> to customize this routine as per platform need
>>>>>
>>>>> Why do you need this? Can we instead have the generic code do the
>>>>> right thing? Or can we use reserve_arch() to handle this?
>>>>
>>>> reserve_mmu is called just for ARM.
>>>
>>> What arch are you using?
>>
>> arm64 - a53s
>>
>>>
>>>> What we need to do is support configuration where we have small memory
>>>> OCM and we need to put mmu table to different location (TCM) because it
>>>> is simply big and it can't be placed where it is placed now.
>>>
>>> What is TCM? Is this the internal SRAM?
>>
>> It is internal sram which is used mostly by R5s in the system but a53
>> has also access to it.
>>
>>>
>>> I don't understand 'simply big'. Are you saying it is too big to go
>>> into main memory?
>>
>> These configuration can't use DDR. One of that configuration which we
>> are using is DDR memory test. It means you run from internal sram (256kB
>> OCM + 256k TCM).
>> Another configuration is for loading stuff to EMMC on different boards.
>> It means you don't need to setup DDR for generic emmc programming.
>>
>> All these configurations are using console over DCC - arm jtag uart.
>>
>>>
>>>>
>>>> Based on looking at code the key question is if we can call reserve_mmu
>>>> at that time when reserve_arch is called now.
>>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>>> from it. The question is if this is valid for all arms.
>>>>
>>>> This is just a small hack I have created to move it to stack.
>>>>
>>>>   diff --git a/common/board_f.c b/common/board_f.c
>>>>   index 88e20e0168b2..32386c957962 100644
>>>>   --- a/common/board_f.c
>>>>   +++ b/common/board_f.c
>>>>   @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>>    {
>>>>           /* reserve TLB table */
>>>>           gd->arch.tlb_size = PGTABLE_SIZE;
>>>>   -       gd->relocaddr -= gd->arch.tlb_size;
>>>>   +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>>
>>>>           /* round down to next 64 kB limit */
>>>>   -       gd->relocaddr &= ~(0x10000 - 1);
>>>>   +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>>
>>>>   -       gd->arch.tlb_addr = gd->relocaddr;
>>>>   -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>   +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>>   +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>                 gd->arch.tlb_addr + gd->arch.tlb_size);
>>>>
>>>>    #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>>   @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>>    /* Architecture-specific memory reservation */
>>>>    __weak int reserve_arch(void)
>>>>    {
>>>>   -       return 0;
>>>>   +       return reserve_mmu();
>>>>    }
>>>>
>>>>    __weak int arch_cpu_init_dm(void)
>>>>   @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>>           reserve_pram,
>>>>    #endif
>>>>           reserve_round_4k,
>>>>   -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>>   -               defined(CONFIG_ARM)
>>>>   -       reserve_mmu,
>>>>   -#endif
>>>>    #ifdef CONFIG_DM_VIDEO
>>>>           reserve_video,
>>>>    #else
>>>>
>>>> For us we can't just rewrite tlb addresses as we want to do because we
>>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>>> relocation structure.
>>>
>>> Why not?
>>
>> Because we need to fit to 256kB of memory also with relocation and also
>> space for data. And IIRC MMU table size is 64kB.
>> Another option could be to disable relocation.
>>
> Michal,
> 
> For your reference, we use two stage MMU setup for FSL ARMv8 SoCs. We 
> setup an early MMU in arch_cpu_init() so we can enable d-cache very 
> early to speed up execution. 

How big is this speed up?

Our DDR init and testing is done on top of
> that. After DDR init, we use the reserved MMU table in DDR to setup a 
> final MMU for normal use.

ok. This is a little bit different purpose but yes - we just need what
you do in early to be consistent.
Maybe there is an option simply check if tlb_size is filled and do not
allocate space for it but this will break your SoC because you don't
want to keep early mmu mapping in OCRAM.

Thanks,
Michal
Simon Glass July 19, 2017, 9:05 a.m. UTC | #7
Hi Michal,

On 18 July 2017 at 07:59, Michal Simek <michal.simek@xilinx.com> wrote:
> Hi Simon,
>
> On 18.7.2017 16:50, Simon Glass wrote:
>> Hi Michal,
>>
>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>> Hi Simon,
>>>
>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>> Hi Michal,
>>>>
>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>
>>>>> Make reserve_mmu a weak so that it provides an option
>>>>> to customize this routine as per platform need
>>>>
>>>> Why do you need this? Can we instead have the generic code do the
>>>> right thing? Or can we use reserve_arch() to handle this?
>>>
>>> reserve_mmu is called just for ARM.
>>
>> What arch are you using?
>
> arm64 - a53s
>
>>
>>> What we need to do is support configuration where we have small memory
>>> OCM and we need to put mmu table to different location (TCM) because it
>>> is simply big and it can't be placed where it is placed now.
>>
>> What is TCM? Is this the internal SRAM?
>
> It is internal sram which is used mostly by R5s in the system but a53
> has also access to it.
>
>>
>> I don't understand 'simply big'. Are you saying it is too big to go
>> into main memory?
>
> These configuration can't use DDR. One of that configuration which we
> are using is DDR memory test. It means you run from internal sram (256kB
> OCM + 256k TCM).
> Another configuration is for loading stuff to EMMC on different boards.
> It means you don't need to setup DDR for generic emmc programming.
>
> All these configurations are using console over DCC - arm jtag uart.

OK I see.

>
>>
>>>
>>> Based on looking at code the key question is if we can call reserve_mmu
>>> at that time when reserve_arch is called now.
>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>> from it. The question is if this is valid for all arms.
>>>
>>> This is just a small hack I have created to move it to stack.
>>>
>>>  diff --git a/common/board_f.c b/common/board_f.c
>>>  index 88e20e0168b2..32386c957962 100644
>>>  --- a/common/board_f.c
>>>  +++ b/common/board_f.c
>>>  @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>   {
>>>          /* reserve TLB table */
>>>          gd->arch.tlb_size = PGTABLE_SIZE;
>>>  -       gd->relocaddr -= gd->arch.tlb_size;
>>>  +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>
>>>          /* round down to next 64 kB limit */
>>>  -       gd->relocaddr &= ~(0x10000 - 1);
>>>  +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>
>>>  -       gd->arch.tlb_addr = gd->relocaddr;
>>>  -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>  +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>  +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>                gd->arch.tlb_addr + gd->arch.tlb_size);
>>>
>>>   #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>  @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>   /* Architecture-specific memory reservation */
>>>   __weak int reserve_arch(void)
>>>   {
>>>  -       return 0;
>>>  +       return reserve_mmu();
>>>   }
>>>
>>>   __weak int arch_cpu_init_dm(void)
>>>  @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>          reserve_pram,
>>>   #endif
>>>          reserve_round_4k,
>>>  -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>  -               defined(CONFIG_ARM)
>>>  -       reserve_mmu,
>>>  -#endif
>>>   #ifdef CONFIG_DM_VIDEO
>>>          reserve_video,
>>>   #else
>>>
>>> For us we can't just rewrite tlb addresses as we want to do because we
>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>> relocation structure.
>>
>> Why not?
>
> Because we need to fit to 256kB of memory also with relocation and also
> space for data. And IIRC MMU table size is 64kB.
> Another option could be to disable relocation.

I don't have a better option for this case unfortunately. I did send a
board uclass series a while back which could have handled this, but
for now I think what you have is reasonable.

Reviewed-by: Simon Glass <sjg@chromium.org>

Regards,
Simon
York Sun July 19, 2017, 9:11 a.m. UTC | #8
> On Jul 19, 2017, at 16:25, Michal Simek <michal.simek@xilinx.com> wrote:
> 
>> On 19.7.2017 04:18, York Sun wrote:
>>> On 07/18/2017 10:59 PM, Michal Simek wrote:
>>> Hi Simon,
>>> 
>>>> On 18.7.2017 16:50, Simon Glass wrote:
>>>> Hi Michal,
>>>> 
>>>>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>> Hi Simon,
>>>>> 
>>>>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>>>> Hi Michal,
>>>>>> 
>>>>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>>> 
>>>>>>> Make reserve_mmu a weak so that it provides an option
>>>>>>> to customize this routine as per platform need
>>>>>> 
>>>>>> Why do you need this? Can we instead have the generic code do the
>>>>>> right thing? Or can we use reserve_arch() to handle this?
>>>>> 
>>>>> reserve_mmu is called just for ARM.
>>>> 
>>>> What arch are you using?
>>> 
>>> arm64 - a53s
>>> 
>>>> 
>>>>> What we need to do is support configuration where we have small memory
>>>>> OCM and we need to put mmu table to different location (TCM) because it
>>>>> is simply big and it can't be placed where it is placed now.
>>>> 
>>>> What is TCM? Is this the internal SRAM?
>>> 
>>> It is internal sram which is used mostly by R5s in the system but a53
>>> has also access to it.
>>> 
>>>> 
>>>> I don't understand 'simply big'. Are you saying it is too big to go
>>>> into main memory?
>>> 
>>> These configuration can't use DDR. One of that configuration which we
>>> are using is DDR memory test. It means you run from internal sram (256kB
>>> OCM + 256k TCM).
>>> Another configuration is for loading stuff to EMMC on different boards.
>>> It means you don't need to setup DDR for generic emmc programming.
>>> 
>>> All these configurations are using console over DCC - arm jtag uart.
>>> 
>>>> 
>>>>> 
>>>>> Based on looking at code the key question is if we can call reserve_mmu
>>>>> at that time when reserve_arch is called now.
>>>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>>>> from it. The question is if this is valid for all arms.
>>>>> 
>>>>> This is just a small hack I have created to move it to stack.
>>>>> 
>>>>>  diff --git a/common/board_f.c b/common/board_f.c
>>>>>  index 88e20e0168b2..32386c957962 100644
>>>>>  --- a/common/board_f.c
>>>>>  +++ b/common/board_f.c
>>>>>  @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>>>   {
>>>>>          /* reserve TLB table */
>>>>>          gd->arch.tlb_size = PGTABLE_SIZE;
>>>>>  -       gd->relocaddr -= gd->arch.tlb_size;
>>>>>  +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>>> 
>>>>>          /* round down to next 64 kB limit */
>>>>>  -       gd->relocaddr &= ~(0x10000 - 1);
>>>>>  +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>>> 
>>>>>  -       gd->arch.tlb_addr = gd->relocaddr;
>>>>>  -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>  +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>>>  +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>                gd->arch.tlb_addr + gd->arch.tlb_size);
>>>>> 
>>>>>   #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>>>  @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>>>   /* Architecture-specific memory reservation */
>>>>>   __weak int reserve_arch(void)
>>>>>   {
>>>>>  -       return 0;
>>>>>  +       return reserve_mmu();
>>>>>   }
>>>>> 
>>>>>   __weak int arch_cpu_init_dm(void)
>>>>>  @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>>>          reserve_pram,
>>>>>   #endif
>>>>>          reserve_round_4k,
>>>>>  -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>>>  -               defined(CONFIG_ARM)
>>>>>  -       reserve_mmu,
>>>>>  -#endif
>>>>>   #ifdef CONFIG_DM_VIDEO
>>>>>          reserve_video,
>>>>>   #else
>>>>> 
>>>>> For us we can't just rewrite tlb addresses as we want to do because we
>>>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>>>> relocation structure.
>>>> 
>>>> Why not?
>>> 
>>> Because we need to fit to 256kB of memory also with relocation and also
>>> space for data. And IIRC MMU table size is 64kB.
>>> Another option could be to disable relocation.
>>> 
>> Michal,
>> 
>> For your reference, we use two stage MMU setup for FSL ARMv8 SoCs. We 
>> setup an early MMU in arch_cpu_init() so we can enable d-cache very 
>> early to speed up execution. 
> 
> How big is this speed up?

Huge improvement (feeling like 10x faster at least), especially on emulators. 

York
Michal Simek July 19, 2017, 9:17 a.m. UTC | #9
On 19.7.2017 11:11, York Sun wrote:
> 
>> On Jul 19, 2017, at 16:25, Michal Simek <michal.simek@xilinx.com> wrote:
>>
>>> On 19.7.2017 04:18, York Sun wrote:
>>>> On 07/18/2017 10:59 PM, Michal Simek wrote:
>>>> Hi Simon,
>>>>
>>>>> On 18.7.2017 16:50, Simon Glass wrote:
>>>>> Hi Michal,
>>>>>
>>>>>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>> Hi Simon,
>>>>>>
>>>>>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>>>>> Hi Michal,
>>>>>>>
>>>>>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>>>>
>>>>>>>> Make reserve_mmu a weak so that it provides an option
>>>>>>>> to customize this routine as per platform need
>>>>>>>
>>>>>>> Why do you need this? Can we instead have the generic code do the
>>>>>>> right thing? Or can we use reserve_arch() to handle this?
>>>>>>
>>>>>> reserve_mmu is called just for ARM.
>>>>>
>>>>> What arch are you using?
>>>>
>>>> arm64 - a53s
>>>>
>>>>>
>>>>>> What we need to do is support configuration where we have small memory
>>>>>> OCM and we need to put mmu table to different location (TCM) because it
>>>>>> is simply big and it can't be placed where it is placed now.
>>>>>
>>>>> What is TCM? Is this the internal SRAM?
>>>>
>>>> It is internal sram which is used mostly by R5s in the system but a53
>>>> has also access to it.
>>>>
>>>>>
>>>>> I don't understand 'simply big'. Are you saying it is too big to go
>>>>> into main memory?
>>>>
>>>> These configuration can't use DDR. One of that configuration which we
>>>> are using is DDR memory test. It means you run from internal sram (256kB
>>>> OCM + 256k TCM).
>>>> Another configuration is for loading stuff to EMMC on different boards.
>>>> It means you don't need to setup DDR for generic emmc programming.
>>>>
>>>> All these configurations are using console over DCC - arm jtag uart.
>>>>
>>>>>
>>>>>>
>>>>>> Based on looking at code the key question is if we can call reserve_mmu
>>>>>> at that time when reserve_arch is called now.
>>>>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>>>>> from it. The question is if this is valid for all arms.
>>>>>>
>>>>>> This is just a small hack I have created to move it to stack.
>>>>>>
>>>>>>  diff --git a/common/board_f.c b/common/board_f.c
>>>>>>  index 88e20e0168b2..32386c957962 100644
>>>>>>  --- a/common/board_f.c
>>>>>>  +++ b/common/board_f.c
>>>>>>  @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>>>>   {
>>>>>>          /* reserve TLB table */
>>>>>>          gd->arch.tlb_size = PGTABLE_SIZE;
>>>>>>  -       gd->relocaddr -= gd->arch.tlb_size;
>>>>>>  +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>>>>
>>>>>>          /* round down to next 64 kB limit */
>>>>>>  -       gd->relocaddr &= ~(0x10000 - 1);
>>>>>>  +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>>>>
>>>>>>  -       gd->arch.tlb_addr = gd->relocaddr;
>>>>>>  -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>  +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>>>>  +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>                gd->arch.tlb_addr + gd->arch.tlb_size);
>>>>>>
>>>>>>   #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>>>>  @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>>>>   /* Architecture-specific memory reservation */
>>>>>>   __weak int reserve_arch(void)
>>>>>>   {
>>>>>>  -       return 0;
>>>>>>  +       return reserve_mmu();
>>>>>>   }
>>>>>>
>>>>>>   __weak int arch_cpu_init_dm(void)
>>>>>>  @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>>>>          reserve_pram,
>>>>>>   #endif
>>>>>>          reserve_round_4k,
>>>>>>  -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>>>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>>>>  -               defined(CONFIG_ARM)
>>>>>>  -       reserve_mmu,
>>>>>>  -#endif
>>>>>>   #ifdef CONFIG_DM_VIDEO
>>>>>>          reserve_video,
>>>>>>   #else
>>>>>>
>>>>>> For us we can't just rewrite tlb addresses as we want to do because we
>>>>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>>>>> relocation structure.
>>>>>
>>>>> Why not?
>>>>
>>>> Because we need to fit to 256kB of memory also with relocation and also
>>>> space for data. And IIRC MMU table size is 64kB.
>>>> Another option could be to disable relocation.
>>>>
>>> Michal,
>>>
>>> For your reference, we use two stage MMU setup for FSL ARMv8 SoCs. We 
>>> setup an early MMU in arch_cpu_init() so we can enable d-cache very 
>>> early to speed up execution. 
>>
>> How big is this speed up?
> 
> Huge improvement (feeling like 10x faster at least), especially on emulators. 

ok. It means on real silicon I see that improvement could be till a
second right?

Thanks,
Michal
Michal Simek July 19, 2017, 9:20 a.m. UTC | #10
Hi Simon and Tom,

On 19.7.2017 11:05, Simon Glass wrote:
> Hi Michal,
> 
> On 18 July 2017 at 07:59, Michal Simek <michal.simek@xilinx.com> wrote:
>> Hi Simon,
>>
>> On 18.7.2017 16:50, Simon Glass wrote:
>>> Hi Michal,
>>>
>>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>>> Hi Simon,
>>>>
>>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>>> Hi Michal,
>>>>>
>>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>>
>>>>>> Make reserve_mmu a weak so that it provides an option
>>>>>> to customize this routine as per platform need
>>>>>
>>>>> Why do you need this? Can we instead have the generic code do the
>>>>> right thing? Or can we use reserve_arch() to handle this?
>>>>
>>>> reserve_mmu is called just for ARM.
>>>
>>> What arch are you using?
>>
>> arm64 - a53s
>>
>>>
>>>> What we need to do is support configuration where we have small memory
>>>> OCM and we need to put mmu table to different location (TCM) because it
>>>> is simply big and it can't be placed where it is placed now.
>>>
>>> What is TCM? Is this the internal SRAM?
>>
>> It is internal sram which is used mostly by R5s in the system but a53
>> has also access to it.
>>
>>>
>>> I don't understand 'simply big'. Are you saying it is too big to go
>>> into main memory?
>>
>> These configuration can't use DDR. One of that configuration which we
>> are using is DDR memory test. It means you run from internal sram (256kB
>> OCM + 256k TCM).
>> Another configuration is for loading stuff to EMMC on different boards.
>> It means you don't need to setup DDR for generic emmc programming.
>>
>> All these configurations are using console over DCC - arm jtag uart.
> 
> OK I see.
> 
>>
>>>
>>>>
>>>> Based on looking at code the key question is if we can call reserve_mmu
>>>> at that time when reserve_arch is called now.
>>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>>> from it. The question is if this is valid for all arms.
>>>>
>>>> This is just a small hack I have created to move it to stack.
>>>>
>>>>  diff --git a/common/board_f.c b/common/board_f.c
>>>>  index 88e20e0168b2..32386c957962 100644
>>>>  --- a/common/board_f.c
>>>>  +++ b/common/board_f.c
>>>>  @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>>   {
>>>>          /* reserve TLB table */
>>>>          gd->arch.tlb_size = PGTABLE_SIZE;
>>>>  -       gd->relocaddr -= gd->arch.tlb_size;
>>>>  +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>>
>>>>          /* round down to next 64 kB limit */
>>>>  -       gd->relocaddr &= ~(0x10000 - 1);
>>>>  +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>>
>>>>  -       gd->arch.tlb_addr = gd->relocaddr;
>>>>  -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>  +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>>  +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>                gd->arch.tlb_addr + gd->arch.tlb_size);
>>>>
>>>>   #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>>  @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>>   /* Architecture-specific memory reservation */
>>>>   __weak int reserve_arch(void)
>>>>   {
>>>>  -       return 0;
>>>>  +       return reserve_mmu();
>>>>   }
>>>>
>>>>   __weak int arch_cpu_init_dm(void)
>>>>  @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>>          reserve_pram,
>>>>   #endif
>>>>          reserve_round_4k,
>>>>  -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>>  -               defined(CONFIG_ARM)
>>>>  -       reserve_mmu,
>>>>  -#endif
>>>>   #ifdef CONFIG_DM_VIDEO
>>>>          reserve_video,
>>>>   #else
>>>>
>>>> For us we can't just rewrite tlb addresses as we want to do because we
>>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>>> relocation structure.
>>>
>>> Why not?
>>
>> Because we need to fit to 256kB of memory also with relocation and also
>> space for data. And IIRC MMU table size is 64kB.
>> Another option could be to disable relocation.
> 
> I don't have a better option for this case unfortunately. I did send a
> board uclass series a while back which could have handled this, but
> for now I think what you have is reasonable.
> 
> Reviewed-by: Simon Glass <sjg@chromium.org>

Simon: thanks.

Tom: Any comment or do you know better way how to support this?

Thanks,
Michal
York Sun July 19, 2017, 10:10 a.m. UTC | #11
> On Jul 19, 2017, at 17:18, Michal Simek <michal.simek@xilinx.com> wrote:
> 
>> On 19.7.2017 11:11, York Sun wrote:
>> 
>>>> On Jul 19, 2017, at 16:25, Michal Simek <michal.simek@xilinx.com> wrote:
>>>> 
>>>>> On 19.7.2017 04:18, York Sun wrote:
>>>>> On 07/18/2017 10:59 PM, Michal Simek wrote:
>>>>> Hi Simon,
>>>>> 
>>>>>> On 18.7.2017 16:50, Simon Glass wrote:
>>>>>> Hi Michal,
>>>>>> 
>>>>>>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>> Hi Simon,
>>>>>>> 
>>>>>>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>>>>>> Hi Michal,
>>>>>>>> 
>>>>>>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>>>>> 
>>>>>>>>> Make reserve_mmu a weak so that it provides an option
>>>>>>>>> to customize this routine as per platform need
>>>>>>>> 
>>>>>>>> Why do you need this? Can we instead have the generic code do the
>>>>>>>> right thing? Or can we use reserve_arch() to handle this?
>>>>>>> 
>>>>>>> reserve_mmu is called just for ARM.
>>>>>> 
>>>>>> What arch are you using?
>>>>> 
>>>>> arm64 - a53s
>>>>> 
>>>>>> 
>>>>>>> What we need to do is support configuration where we have small memory
>>>>>>> OCM and we need to put mmu table to different location (TCM) because it
>>>>>>> is simply big and it can't be placed where it is placed now.
>>>>>> 
>>>>>> What is TCM? Is this the internal SRAM?
>>>>> 
>>>>> It is internal sram which is used mostly by R5s in the system but a53
>>>>> has also access to it.
>>>>> 
>>>>>> 
>>>>>> I don't understand 'simply big'. Are you saying it is too big to go
>>>>>> into main memory?
>>>>> 
>>>>> These configuration can't use DDR. One of that configuration which we
>>>>> are using is DDR memory test. It means you run from internal sram (256kB
>>>>> OCM + 256k TCM).
>>>>> Another configuration is for loading stuff to EMMC on different boards.
>>>>> It means you don't need to setup DDR for generic emmc programming.
>>>>> 
>>>>> All these configurations are using console over DCC - arm jtag uart.
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Based on looking at code the key question is if we can call reserve_mmu
>>>>>>> at that time when reserve_arch is called now.
>>>>>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>>>>>> from it. The question is if this is valid for all arms.
>>>>>>> 
>>>>>>> This is just a small hack I have created to move it to stack.
>>>>>>> 
>>>>>>> diff --git a/common/board_f.c b/common/board_f.c
>>>>>>> index 88e20e0168b2..32386c957962 100644
>>>>>>> --- a/common/board_f.c
>>>>>>> +++ b/common/board_f.c
>>>>>>> @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>>>>>  {
>>>>>>>         /* reserve TLB table */
>>>>>>>         gd->arch.tlb_size = PGTABLE_SIZE;
>>>>>>> -       gd->relocaddr -= gd->arch.tlb_size;
>>>>>>> +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>>>>> 
>>>>>>>         /* round down to next 64 kB limit */
>>>>>>> -       gd->relocaddr &= ~(0x10000 - 1);
>>>>>>> +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>>>>> 
>>>>>>> -       gd->arch.tlb_addr = gd->relocaddr;
>>>>>>> -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>> +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>>>>> +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>>               gd->arch.tlb_addr + gd->arch.tlb_size);
>>>>>>> 
>>>>>>>  #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>>>>> @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>>>>>  /* Architecture-specific memory reservation */
>>>>>>>  __weak int reserve_arch(void)
>>>>>>>  {
>>>>>>> -       return 0;
>>>>>>> +       return reserve_mmu();
>>>>>>>  }
>>>>>>> 
>>>>>>>  __weak int arch_cpu_init_dm(void)
>>>>>>> @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>>>>>         reserve_pram,
>>>>>>>  #endif
>>>>>>>         reserve_round_4k,
>>>>>>> -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>>>>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>>>>> -               defined(CONFIG_ARM)
>>>>>>> -       reserve_mmu,
>>>>>>> -#endif
>>>>>>>  #ifdef CONFIG_DM_VIDEO
>>>>>>>         reserve_video,
>>>>>>>  #else
>>>>>>> 
>>>>>>> For us we can't just rewrite tlb addresses as we want to do because we
>>>>>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>>>>>> relocation structure.
>>>>>> 
>>>>>> Why not?
>>>>> 
>>>>> Because we need to fit to 256kB of memory also with relocation and also
>>>>> space for data. And IIRC MMU table size is 64kB.
>>>>> Another option could be to disable relocation.
>>>>> 
>>>> Michal,
>>>> 
>>>> For your reference, we use two stage MMU setup for FSL ARMv8 SoCs. We 
>>>> setup an early MMU in arch_cpu_init() so we can enable d-cache very 
>>>> early to speed up execution. 
>>> 
>>> How big is this speed up?
>> 
>> Huge improvement (feeling like 10x faster at least), especially on emulators. 
> 
> ok. It means on real silicon I see that improvement could be till a
> second right?

It may be several hundreds milliseconds difference on real silicon, depends on core speed and booting device speed and booting method. For example, the NOR flash boot is very slow due to the interface.

York
Michal Simek July 19, 2017, 11:05 a.m. UTC | #12
On 19.7.2017 12:10, York Sun wrote:
> 
>> On Jul 19, 2017, at 17:18, Michal Simek <michal.simek@xilinx.com> wrote:
>>
>>> On 19.7.2017 11:11, York Sun wrote:
>>>
>>>>> On Jul 19, 2017, at 16:25, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>
>>>>>> On 19.7.2017 04:18, York Sun wrote:
>>>>>> On 07/18/2017 10:59 PM, Michal Simek wrote:
>>>>>> Hi Simon,
>>>>>>
>>>>>>> On 18.7.2017 16:50, Simon Glass wrote:
>>>>>>> Hi Michal,
>>>>>>>
>>>>>>>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>> Hi Simon,
>>>>>>>>
>>>>>>>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>>>>>>> Hi Michal,
>>>>>>>>>
>>>>>>>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>>>>>>
>>>>>>>>>> Make reserve_mmu a weak so that it provides an option
>>>>>>>>>> to customize this routine as per platform need
>>>>>>>>>
>>>>>>>>> Why do you need this? Can we instead have the generic code do the
>>>>>>>>> right thing? Or can we use reserve_arch() to handle this?
>>>>>>>>
>>>>>>>> reserve_mmu is called just for ARM.
>>>>>>>
>>>>>>> What arch are you using?
>>>>>>
>>>>>> arm64 - a53s
>>>>>>
>>>>>>>
>>>>>>>> What we need to do is support configuration where we have small memory
>>>>>>>> OCM and we need to put mmu table to different location (TCM) because it
>>>>>>>> is simply big and it can't be placed where it is placed now.
>>>>>>>
>>>>>>> What is TCM? Is this the internal SRAM?
>>>>>>
>>>>>> It is internal sram which is used mostly by R5s in the system but a53
>>>>>> has also access to it.
>>>>>>
>>>>>>>
>>>>>>> I don't understand 'simply big'. Are you saying it is too big to go
>>>>>>> into main memory?
>>>>>>
>>>>>> These configuration can't use DDR. One of that configuration which we
>>>>>> are using is DDR memory test. It means you run from internal sram (256kB
>>>>>> OCM + 256k TCM).
>>>>>> Another configuration is for loading stuff to EMMC on different boards.
>>>>>> It means you don't need to setup DDR for generic emmc programming.
>>>>>>
>>>>>> All these configurations are using console over DCC - arm jtag uart.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Based on looking at code the key question is if we can call reserve_mmu
>>>>>>>> at that time when reserve_arch is called now.
>>>>>>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>>>>>>> from it. The question is if this is valid for all arms.
>>>>>>>>
>>>>>>>> This is just a small hack I have created to move it to stack.
>>>>>>>>
>>>>>>>> diff --git a/common/board_f.c b/common/board_f.c
>>>>>>>> index 88e20e0168b2..32386c957962 100644
>>>>>>>> --- a/common/board_f.c
>>>>>>>> +++ b/common/board_f.c
>>>>>>>> @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>>>>>>  {
>>>>>>>>         /* reserve TLB table */
>>>>>>>>         gd->arch.tlb_size = PGTABLE_SIZE;
>>>>>>>> -       gd->relocaddr -= gd->arch.tlb_size;
>>>>>>>> +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>>>>>>
>>>>>>>>         /* round down to next 64 kB limit */
>>>>>>>> -       gd->relocaddr &= ~(0x10000 - 1);
>>>>>>>> +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>>>>>>
>>>>>>>> -       gd->arch.tlb_addr = gd->relocaddr;
>>>>>>>> -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>>> +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>>>>>> +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>>>               gd->arch.tlb_addr + gd->arch.tlb_size);
>>>>>>>>
>>>>>>>>  #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>>>>>> @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>>>>>>  /* Architecture-specific memory reservation */
>>>>>>>>  __weak int reserve_arch(void)
>>>>>>>>  {
>>>>>>>> -       return 0;
>>>>>>>> +       return reserve_mmu();
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  __weak int arch_cpu_init_dm(void)
>>>>>>>> @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>>>>>>         reserve_pram,
>>>>>>>>  #endif
>>>>>>>>         reserve_round_4k,
>>>>>>>> -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>>>>>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>>>>>> -               defined(CONFIG_ARM)
>>>>>>>> -       reserve_mmu,
>>>>>>>> -#endif
>>>>>>>>  #ifdef CONFIG_DM_VIDEO
>>>>>>>>         reserve_video,
>>>>>>>>  #else
>>>>>>>>
>>>>>>>> For us we can't just rewrite tlb addresses as we want to do because we
>>>>>>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>>>>>>> relocation structure.
>>>>>>>
>>>>>>> Why not?
>>>>>>
>>>>>> Because we need to fit to 256kB of memory also with relocation and also
>>>>>> space for data. And IIRC MMU table size is 64kB.
>>>>>> Another option could be to disable relocation.
>>>>>>
>>>>> Michal,
>>>>>
>>>>> For your reference, we use two stage MMU setup for FSL ARMv8 SoCs. We 
>>>>> setup an early MMU in arch_cpu_init() so we can enable d-cache very 
>>>>> early to speed up execution. 
>>>>
>>>> How big is this speed up?
>>>
>>> Huge improvement (feeling like 10x faster at least), especially on emulators. 
>>
>> ok. It means on real silicon I see that improvement could be till a
>> second right?
> 
> It may be several hundreds milliseconds difference on real silicon, depends on core speed and booting device speed and booting method. For example, the NOR flash boot is very slow due to the interface.

Ok. I see. Time to do some measurements and see if makes sense to invest
our time to do this. Till now I am not aware about any request to speed
up u-boot boot time.
Definitely thanks for pointer.

Thanks,
Michal
York Sun July 19, 2017, 11:10 a.m. UTC | #13
> On Jul 19, 2017, at 19:06, Michal Simek <michal.simek@xilinx.com> wrote:
> 
>> On 19.7.2017 12:10, York Sun wrote:
>> 
>>>> On Jul 19, 2017, at 17:18, Michal Simek <michal.simek@xilinx.com> wrote:
>>>> 
>>>> On 19.7.2017 11:11, York Sun wrote:
>>>> 
>>>>>> On Jul 19, 2017, at 16:25, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>> 
>>>>>>> On 19.7.2017 04:18, York Sun wrote:
>>>>>>> On 07/18/2017 10:59 PM, Michal Simek wrote:
>>>>>>> Hi Simon,
>>>>>>> 
>>>>>>>> On 18.7.2017 16:50, Simon Glass wrote:
>>>>>>>> Hi Michal,
>>>>>>>> 
>>>>>>>>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>>> Hi Simon,
>>>>>>>>> 
>>>>>>>>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>>>>>>>> Hi Michal,
>>>>>>>>>> 
>>>>>>>>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>>>>>>> 
>>>>>>>>>>> Make reserve_mmu a weak so that it provides an option
>>>>>>>>>>> to customize this routine as per platform need
>>>>>>>>>> 
>>>>>>>>>> Why do you need this? Can we instead have the generic code do the
>>>>>>>>>> right thing? Or can we use reserve_arch() to handle this?
>>>>>>>>> 
>>>>>>>>> reserve_mmu is called just for ARM.
>>>>>>>> 
>>>>>>>> What arch are you using?
>>>>>>> 
>>>>>>> arm64 - a53s
>>>>>>> 
>>>>>>>> 
>>>>>>>>> What we need to do is support configuration where we have small memory
>>>>>>>>> OCM and we need to put mmu table to different location (TCM) because it
>>>>>>>>> is simply big and it can't be placed where it is placed now.
>>>>>>>> 
>>>>>>>> What is TCM? Is this the internal SRAM?
>>>>>>> 
>>>>>>> It is internal sram which is used mostly by R5s in the system but a53
>>>>>>> has also access to it.
>>>>>>> 
>>>>>>>> 
>>>>>>>> I don't understand 'simply big'. Are you saying it is too big to go
>>>>>>>> into main memory?
>>>>>>> 
>>>>>>> These configuration can't use DDR. One of that configuration which we
>>>>>>> are using is DDR memory test. It means you run from internal sram (256kB
>>>>>>> OCM + 256k TCM).
>>>>>>> Another configuration is for loading stuff to EMMC on different boards.
>>>>>>> It means you don't need to setup DDR for generic emmc programming.
>>>>>>> 
>>>>>>> All these configurations are using console over DCC - arm jtag uart.
>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Based on looking at code the key question is if we can call reserve_mmu
>>>>>>>>> at that time when reserve_arch is called now.
>>>>>>>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>>>>>>>> from it. The question is if this is valid for all arms.
>>>>>>>>> 
>>>>>>>>> This is just a small hack I have created to move it to stack.
>>>>>>>>> 
>>>>>>>>> diff --git a/common/board_f.c b/common/board_f.c
>>>>>>>>> index 88e20e0168b2..32386c957962 100644
>>>>>>>>> --- a/common/board_f.c
>>>>>>>>> +++ b/common/board_f.c
>>>>>>>>> @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>>>>>>> {
>>>>>>>>>        /* reserve TLB table */
>>>>>>>>>        gd->arch.tlb_size = PGTABLE_SIZE;
>>>>>>>>> -       gd->relocaddr -= gd->arch.tlb_size;
>>>>>>>>> +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>>>>>>> 
>>>>>>>>>        /* round down to next 64 kB limit */
>>>>>>>>> -       gd->relocaddr &= ~(0x10000 - 1);
>>>>>>>>> +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>>>>>>> 
>>>>>>>>> -       gd->arch.tlb_addr = gd->relocaddr;
>>>>>>>>> -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>>>> +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>>>>>>> +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>>>>              gd->arch.tlb_addr + gd->arch.tlb_size);
>>>>>>>>> 
>>>>>>>>> #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>>>>>>> @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>>>>>>> /* Architecture-specific memory reservation */
>>>>>>>>> __weak int reserve_arch(void)
>>>>>>>>> {
>>>>>>>>> -       return 0;
>>>>>>>>> +       return reserve_mmu();
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> __weak int arch_cpu_init_dm(void)
>>>>>>>>> @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>>>>>>>        reserve_pram,
>>>>>>>>> #endif
>>>>>>>>>        reserve_round_4k,
>>>>>>>>> -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>>>>>>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>>>>>>> -               defined(CONFIG_ARM)
>>>>>>>>> -       reserve_mmu,
>>>>>>>>> -#endif
>>>>>>>>> #ifdef CONFIG_DM_VIDEO
>>>>>>>>>        reserve_video,
>>>>>>>>> #else
>>>>>>>>> 
>>>>>>>>> For us we can't just rewrite tlb addresses as we want to do because we
>>>>>>>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>>>>>>>> relocation structure.
>>>>>>>> 
>>>>>>>> Why not?
>>>>>>> 
>>>>>>> Because we need to fit to 256kB of memory also with relocation and also
>>>>>>> space for data. And IIRC MMU table size is 64kB.
>>>>>>> Another option could be to disable relocation.
>>>>>>> 
>>>>>> Michal,
>>>>>> 
>>>>>> For your reference, we use two stage MMU setup for FSL ARMv8 SoCs. We 
>>>>>> setup an early MMU in arch_cpu_init() so we can enable d-cache very 
>>>>>> early to speed up execution. 
>>>>> 
>>>>> How big is this speed up?
>>>> 
>>>> Huge improvement (feeling like 10x faster at least), especially on emulators. 
>>> 
>>> ok. It means on real silicon I see that improvement could be till a
>>> second right?
>> 
>> It may be several hundreds milliseconds difference on real silicon, depends on core speed and booting device speed and booting method. For example, the NOR flash boot is very slow due to the interface.
> 
> Ok. I see. Time to do some measurements and see if makes sense to invest
> our time to do this. Till now I am not aware about any request to speed
> up u-boot boot time.
> Definitely thanks for pointer.
> 
It makes sense on emulators because they runs thousands times slower. 

York
Michal Simek July 19, 2017, 11:17 a.m. UTC | #14
On 19.7.2017 13:10, York Sun wrote:
> 
>> On Jul 19, 2017, at 19:06, Michal Simek <michal.simek@xilinx.com> wrote:
>>
>>> On 19.7.2017 12:10, York Sun wrote:
>>>
>>>>> On Jul 19, 2017, at 17:18, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>
>>>>> On 19.7.2017 11:11, York Sun wrote:
>>>>>
>>>>>>> On Jul 19, 2017, at 16:25, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>
>>>>>>>> On 19.7.2017 04:18, York Sun wrote:
>>>>>>>> On 07/18/2017 10:59 PM, Michal Simek wrote:
>>>>>>>> Hi Simon,
>>>>>>>>
>>>>>>>>> On 18.7.2017 16:50, Simon Glass wrote:
>>>>>>>>> Hi Michal,
>>>>>>>>>
>>>>>>>>>> On 18 July 2017 at 07:23, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>>>> Hi Simon,
>>>>>>>>>>
>>>>>>>>>>> On 18.7.2017 16:01, Simon Glass wrote:
>>>>>>>>>>> Hi Michal,
>>>>>>>>>>>
>>>>>>>>>>>> On 13 July 2017 at 06:50, Michal Simek <michal.simek@xilinx.com> wrote:
>>>>>>>>>>>> From: Siva Durga Prasad Paladugu <siva.durga.paladugu@xilinx.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Make reserve_mmu a weak so that it provides an option
>>>>>>>>>>>> to customize this routine as per platform need
>>>>>>>>>>>
>>>>>>>>>>> Why do you need this? Can we instead have the generic code do the
>>>>>>>>>>> right thing? Or can we use reserve_arch() to handle this?
>>>>>>>>>>
>>>>>>>>>> reserve_mmu is called just for ARM.
>>>>>>>>>
>>>>>>>>> What arch are you using?
>>>>>>>>
>>>>>>>> arm64 - a53s
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> What we need to do is support configuration where we have small memory
>>>>>>>>>> OCM and we need to put mmu table to different location (TCM) because it
>>>>>>>>>> is simply big and it can't be placed where it is placed now.
>>>>>>>>>
>>>>>>>>> What is TCM? Is this the internal SRAM?
>>>>>>>>
>>>>>>>> It is internal sram which is used mostly by R5s in the system but a53
>>>>>>>> has also access to it.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't understand 'simply big'. Are you saying it is too big to go
>>>>>>>>> into main memory?
>>>>>>>>
>>>>>>>> These configuration can't use DDR. One of that configuration which we
>>>>>>>> are using is DDR memory test. It means you run from internal sram (256kB
>>>>>>>> OCM + 256k TCM).
>>>>>>>> Another configuration is for loading stuff to EMMC on different boards.
>>>>>>>> It means you don't need to setup DDR for generic emmc programming.
>>>>>>>>
>>>>>>>> All these configurations are using console over DCC - arm jtag uart.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Based on looking at code the key question is if we can call reserve_mmu
>>>>>>>>>> at that time when reserve_arch is called now.
>>>>>>>>>> On zynqmp I can't see any issue to allocate tlb on the stack and call it
>>>>>>>>>> from it. The question is if this is valid for all arms.
>>>>>>>>>>
>>>>>>>>>> This is just a small hack I have created to move it to stack.
>>>>>>>>>>
>>>>>>>>>> diff --git a/common/board_f.c b/common/board_f.c
>>>>>>>>>> index 88e20e0168b2..32386c957962 100644
>>>>>>>>>> --- a/common/board_f.c
>>>>>>>>>> +++ b/common/board_f.c
>>>>>>>>>> @@ -433,13 +433,13 @@ __weak int reserve_mmu(void)
>>>>>>>>>> {
>>>>>>>>>>        /* reserve TLB table */
>>>>>>>>>>        gd->arch.tlb_size = PGTABLE_SIZE;
>>>>>>>>>> -       gd->relocaddr -= gd->arch.tlb_size;
>>>>>>>>>> +       gd->start_addr_sp -= gd->arch.tlb_size;
>>>>>>>>>>
>>>>>>>>>>        /* round down to next 64 kB limit */
>>>>>>>>>> -       gd->relocaddr &= ~(0x10000 - 1);
>>>>>>>>>> +       gd->start_addr_sp &= ~(0x10000 - 1);
>>>>>>>>>>
>>>>>>>>>> -       gd->arch.tlb_addr = gd->relocaddr;
>>>>>>>>>> -       debug("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>>>>> +       gd->arch.tlb_addr = gd->start_addr_sp;
>>>>>>>>>> +       printf("TLB table from %08lx to %08lx\n", gd->arch.tlb_addr,
>>>>>>>>>>              gd->arch.tlb_addr + gd->arch.tlb_size);
>>>>>>>>>>
>>>>>>>>>> #ifdef CONFIG_SYS_MEM_RESERVE_SECURE
>>>>>>>>>> @@ -837,7 +837,7 @@ static int initf_dm(void)
>>>>>>>>>> /* Architecture-specific memory reservation */
>>>>>>>>>> __weak int reserve_arch(void)
>>>>>>>>>> {
>>>>>>>>>> -       return 0;
>>>>>>>>>> +       return reserve_mmu();
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> __weak int arch_cpu_init_dm(void)
>>>>>>>>>> @@ -998,10 +998,6 @@ static init_fnc_t init_sequence_f[] = {
>>>>>>>>>>        reserve_pram,
>>>>>>>>>> #endif
>>>>>>>>>>        reserve_round_4k,
>>>>>>>>>> -#if !(defined(CONFIG_SYS_ICACHE_OFF) &&
>>>>>>>>>> defined(CONFIG_SYS_DCACHE_OFF)) && \
>>>>>>>>>> -               defined(CONFIG_ARM)
>>>>>>>>>> -       reserve_mmu,
>>>>>>>>>> -#endif
>>>>>>>>>> #ifdef CONFIG_DM_VIDEO
>>>>>>>>>>        reserve_video,
>>>>>>>>>> #else
>>>>>>>>>>
>>>>>>>>>> For us we can't just rewrite tlb addresses as we want to do because we
>>>>>>>>>> can't allow reserve_mmu to allocate 64kB alied PGTABLE_SIZE in
>>>>>>>>>> relocation structure.
>>>>>>>>>
>>>>>>>>> Why not?
>>>>>>>>
>>>>>>>> Because we need to fit to 256kB of memory also with relocation and also
>>>>>>>> space for data. And IIRC MMU table size is 64kB.
>>>>>>>> Another option could be to disable relocation.
>>>>>>>>
>>>>>>> Michal,
>>>>>>>
>>>>>>> For your reference, we use two stage MMU setup for FSL ARMv8 SoCs. We 
>>>>>>> setup an early MMU in arch_cpu_init() so we can enable d-cache very 
>>>>>>> early to speed up execution. 
>>>>>>
>>>>>> How big is this speed up?
>>>>>
>>>>> Huge improvement (feeling like 10x faster at least), especially on emulators. 
>>>>
>>>> ok. It means on real silicon I see that improvement could be till a
>>>> second right?
>>>
>>> It may be several hundreds milliseconds difference on real silicon, depends on core speed and booting device speed and booting method. For example, the NOR flash boot is very slow due to the interface.
>>
>> Ok. I see. Time to do some measurements and see if makes sense to invest
>> our time to do this. Till now I am not aware about any request to speed
>> up u-boot boot time.
>> Definitely thanks for pointer.
>>
> It makes sense on emulators because they runs thousands times slower. 

Agree but depends on emulators :-).

M
diff mbox

Patch

diff --git a/common/board_f.c b/common/board_f.c
index 2cdd12a503ae..09f0fbfb6c96 100644
--- a/common/board_f.c
+++ b/common/board_f.c
@@ -340,7 +340,7 @@  static int reserve_round_4k(void)
 }
 
 #ifdef CONFIG_ARM
-static int reserve_mmu(void)
+__weak int reserve_mmu(void)
 {
 #if !(defined(CONFIG_SYS_ICACHE_OFF) && defined(CONFIG_SYS_DCACHE_OFF))
 	/* reserve TLB table */
diff --git a/include/common.h b/include/common.h
index 1a98512ab618..ce4c92015928 100644
--- a/include/common.h
+++ b/include/common.h
@@ -286,6 +286,7 @@  void board_show_dram(phys_size_t size);
  */
 int arch_fixup_fdt(void *blob);
 
+int reserve_mmu(void);
 /* common/flash.c */
 void flash_perror (int);