diff mbox series

[3/3] malloc: Enable SYS_MALLOC_RUNTIME_INIT by default in SPL

Message ID 20230731223327.109865-4-sean.anderson@seco.com
State Changes Requested
Delegated to: Tom Rini
Headers show
Series malloc: Reduce size by initializing data at runtime | expand

Commit Message

Sean Anderson July 31, 2023, 10:33 p.m. UTC
On boards with size restrictions, 1-2k can be a significant fraction of
the binary size. Add a new SPL version of SYS_MALLOC_RUNTIME_INIT and
enable it by default.

Signed-off-by: Sean Anderson <sean.anderson@seco.com>
---

 Kconfig | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Tom Rini Aug. 1, 2023, 3:25 p.m. UTC | #1
On Mon, Jul 31, 2023 at 06:33:27PM -0400, Sean Anderson wrote:

> On boards with size restrictions, 1-2k can be a significant fraction of
> the binary size. Add a new SPL version of SYS_MALLOC_RUNTIME_INIT and
> enable it by default.
> 
> Signed-off-by: Sean Anderson <sean.anderson@seco.com>
> ---
> 
>  Kconfig | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/Kconfig b/Kconfig
> index 4b32286b69..3cb31a9346 100644
> --- a/Kconfig
> +++ b/Kconfig
> @@ -381,6 +381,17 @@ config SYS_MALLOC_RUNTIME_INIT
>           reduce the size of U-Boot by letting malloc's data reside in .bss
>           instead of .data.
>  
> +config SPL_SYS_MALLOC_RUNTIME_INIT
> +	bool "Initialize malloc's internal data at runtime in SPL"
> +	default y
> +	depends on SPL
> +	help
> +	 Initialize malloc's internal data structures at SPL runtime, rather
> +	 than at compile-time. This is necessary if relocating the malloc arena
> +	 from a smaller static memory to a large DDR memory. It can also reduce
> +	 the size of U-Boot by letting malloc's data reside in .bss instead of
> +	 .data.
> +
>  config TOOLS_DEBUG
>  	bool "Enable debug information for tools"
>  	help

Can you use something like grabserial (or other tooling) to quantify the
change on a platform or two?
Sean Anderson Aug. 1, 2023, 11:51 p.m. UTC | #2
On 8/1/23 11:25, Tom Rini wrote:
> On Mon, Jul 31, 2023 at 06:33:27PM -0400, Sean Anderson wrote:
> 
>> On boards with size restrictions, 1-2k can be a significant fraction of
>> the binary size. Add a new SPL version of SYS_MALLOC_RUNTIME_INIT and
>> enable it by default.
>> 
>> Signed-off-by: Sean Anderson <sean.anderson@seco.com>
>> ---
>> 
>>  Kconfig | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>> 
>> diff --git a/Kconfig b/Kconfig
>> index 4b32286b69..3cb31a9346 100644
>> --- a/Kconfig
>> +++ b/Kconfig
>> @@ -381,6 +381,17 @@ config SYS_MALLOC_RUNTIME_INIT
>>           reduce the size of U-Boot by letting malloc's data reside in .bss
>>           instead of .data.
>>  
>> +config SPL_SYS_MALLOC_RUNTIME_INIT
>> +	bool "Initialize malloc's internal data at runtime in SPL"
>> +	default y
>> +	depends on SPL
>> +	help
>> +	 Initialize malloc's internal data structures at SPL runtime, rather
>> +	 than at compile-time. This is necessary if relocating the malloc arena
>> +	 from a smaller static memory to a large DDR memory. It can also reduce
>> +	 the size of U-Boot by letting malloc's data reside in .bss instead of
>> +	 .data.
>> +
>>  config TOOLS_DEBUG
>>  	bool "Enable debug information for tools"
>>  	help
> 
> Can you use something like grabserial (or other tooling) to quantify the
> change on a platform or two?

I had a ZynqMP board roughly based on xilinx_zynqmp_virt_defconfig lying
around. On this board, serial is not initialized until after malloc, so
I used the internal timer and printed the times later. DEBUG_UART would
probably be better but I didn't get it working.

diff --git a/common/board_r.c b/common/board_r.c
index d798c00a80a..d7aee85e5b1 100644
--- a/common/board_r.c
+++ b/common/board_r.c
@@ -194,6 +194,7 @@ static int initr_barrier(void)
        return 0;
 }
 
+static ulong malloc_begin, malloc_end;
 static int initr_malloc(void)
 {
        ulong malloc_start;
@@ -208,8 +209,10 @@ static int initr_malloc(void)
         * reserve_noncached().
         */
        malloc_start = gd->relocaddr - TOTAL_MALLOC_LEN;
+       malloc_begin = timer_get_boot_us();
        mem_malloc_init((ulong)map_sysmem(malloc_start, TOTAL_MALLOC_LEN),
                        TOTAL_MALLOC_LEN);
+       malloc_end = timer_get_boot_us();
        return 0;
 }
 
@@ -569,6 +572,7 @@ static int dm_announce(void)
 
 static int run_main_loop(void)
 {
+       printf("malloc_init took %luus (%lu %lu)\n", malloc_end - malloc_begin, malloc_begin, malloc_end);
 #ifdef CONFIG_SANDBOX
        sandbox_main_loop_init();
 #endif
diff --git a/common/spl/spl.c b/common/spl/spl.c
index d74acec10b5..09abcc74442 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -755,7 +755,9 @@ void board_init_r(gd_t *dummy1, ulong dummy2)
        spl_set_bd();
 
 #if defined(CONFIG_SYS_SPL_MALLOC)
+       ulong malloc_begin = timer_get_boot_us();
        mem_malloc_init(SYS_SPL_MALLOC_START, CONFIG_SYS_SPL_MALLOC_SIZE);
+       ulong malloc_end = timer_get_boot_us();
        gd->flags |= GD_FLG_FULL_MALLOC_INIT;
 #endif
        if (!(gd->flags & GD_FLG_SPL_INIT)) {
@@ -817,6 +819,7 @@ void board_init_r(gd_t *dummy1, ulong dummy2)
        spl_image.boot_device = BOOT_DEVICE_NONE;
        board_boot_order(spl_boot_list);
 
+       printf("malloc_init took %luus (%lu %lu)\n", malloc_end - malloc_begin, malloc_begin, malloc_end);
        ret = boot_from_devices(&spl_image, spl_boot_list,
                                ARRAY_SIZE(spl_boot_list));
        if (ret) {

I recorded times (in us) from five boots. The value of
(SPL_)SYS_MALLOC_RUNTIME_INIT is in parentheses:

SPL (n) U-Boot (n) SPL (y) U-Boot (y)
======= ========== ======= ==========
192975       47557  135027      47557
192940       47557  135059      47557
193006       47557  134722      47558
193015       47556  135055      47557
193987       47557  134790      47557
             
So SPL took 60 ms shorter (?!) and U-Boot was mostly unaffected. Not
sure how that happened. The raw values for begin/end look reasonable:

  malloc_init took 47557us (6778108 6825665)
  malloc_init took 47557us (6779290 6826847)
  malloc_init took 47558us (6780379 6827937)

etc. but to be honest I don't really see how SPL can spend 190ms setting
some variables and clearing the arena.

--Sean
Tom Rini Aug. 2, 2023, 5:02 p.m. UTC | #3
On Tue, Aug 01, 2023 at 07:51:20PM -0400, Sean Anderson wrote:
> On 8/1/23 11:25, Tom Rini wrote:
> > On Mon, Jul 31, 2023 at 06:33:27PM -0400, Sean Anderson wrote:
> > 
> >> On boards with size restrictions, 1-2k can be a significant fraction of
> >> the binary size. Add a new SPL version of SYS_MALLOC_RUNTIME_INIT and
> >> enable it by default.
> >> 
> >> Signed-off-by: Sean Anderson <sean.anderson@seco.com>
> >> ---
> >> 
> >>  Kconfig | 11 +++++++++++
> >>  1 file changed, 11 insertions(+)
> >> 
> >> diff --git a/Kconfig b/Kconfig
> >> index 4b32286b69..3cb31a9346 100644
> >> --- a/Kconfig
> >> +++ b/Kconfig
> >> @@ -381,6 +381,17 @@ config SYS_MALLOC_RUNTIME_INIT
> >>           reduce the size of U-Boot by letting malloc's data reside in .bss
> >>           instead of .data.
> >>  
> >> +config SPL_SYS_MALLOC_RUNTIME_INIT
> >> +	bool "Initialize malloc's internal data at runtime in SPL"
> >> +	default y
> >> +	depends on SPL
> >> +	help
> >> +	 Initialize malloc's internal data structures at SPL runtime, rather
> >> +	 than at compile-time. This is necessary if relocating the malloc arena
> >> +	 from a smaller static memory to a large DDR memory. It can also reduce
> >> +	 the size of U-Boot by letting malloc's data reside in .bss instead of
> >> +	 .data.
> >> +
> >>  config TOOLS_DEBUG
> >>  	bool "Enable debug information for tools"
> >>  	help
> > 
> > Can you use something like grabserial (or other tooling) to quantify the
> > change on a platform or two?
> 
> I had a ZynqMP board roughly based on xilinx_zynqmp_virt_defconfig lying
> around. On this board, serial is not initialized until after malloc, so
> I used the internal timer and printed the times later. DEBUG_UART would
> probably be better but I didn't get it working.
> 
> diff --git a/common/board_r.c b/common/board_r.c
> index d798c00a80a..d7aee85e5b1 100644
> --- a/common/board_r.c
> +++ b/common/board_r.c
> @@ -194,6 +194,7 @@ static int initr_barrier(void)
>         return 0;
>  }
>  
> +static ulong malloc_begin, malloc_end;
>  static int initr_malloc(void)
>  {
>         ulong malloc_start;
> @@ -208,8 +209,10 @@ static int initr_malloc(void)
>          * reserve_noncached().
>          */
>         malloc_start = gd->relocaddr - TOTAL_MALLOC_LEN;
> +       malloc_begin = timer_get_boot_us();
>         mem_malloc_init((ulong)map_sysmem(malloc_start, TOTAL_MALLOC_LEN),
>                         TOTAL_MALLOC_LEN);
> +       malloc_end = timer_get_boot_us();
>         return 0;
>  }
>  
> @@ -569,6 +572,7 @@ static int dm_announce(void)
>  
>  static int run_main_loop(void)
>  {
> +       printf("malloc_init took %luus (%lu %lu)\n", malloc_end - malloc_begin, malloc_begin, malloc_end);
>  #ifdef CONFIG_SANDBOX
>         sandbox_main_loop_init();
>  #endif
> diff --git a/common/spl/spl.c b/common/spl/spl.c
> index d74acec10b5..09abcc74442 100644
> --- a/common/spl/spl.c
> +++ b/common/spl/spl.c
> @@ -755,7 +755,9 @@ void board_init_r(gd_t *dummy1, ulong dummy2)
>         spl_set_bd();
>  
>  #if defined(CONFIG_SYS_SPL_MALLOC)
> +       ulong malloc_begin = timer_get_boot_us();
>         mem_malloc_init(SYS_SPL_MALLOC_START, CONFIG_SYS_SPL_MALLOC_SIZE);
> +       ulong malloc_end = timer_get_boot_us();
>         gd->flags |= GD_FLG_FULL_MALLOC_INIT;
>  #endif
>         if (!(gd->flags & GD_FLG_SPL_INIT)) {
> @@ -817,6 +819,7 @@ void board_init_r(gd_t *dummy1, ulong dummy2)
>         spl_image.boot_device = BOOT_DEVICE_NONE;
>         board_boot_order(spl_boot_list);
>  
> +       printf("malloc_init took %luus (%lu %lu)\n", malloc_end - malloc_begin, malloc_begin, malloc_end);
>         ret = boot_from_devices(&spl_image, spl_boot_list,
>                                 ARRAY_SIZE(spl_boot_list));
>         if (ret) {
> 
> I recorded times (in us) from five boots. The value of
> (SPL_)SYS_MALLOC_RUNTIME_INIT is in parentheses:
> 
> SPL (n) U-Boot (n) SPL (y) U-Boot (y)
> ======= ========== ======= ==========
> 192975       47557  135027      47557
> 192940       47557  135059      47557
> 193006       47557  134722      47558
> 193015       47556  135055      47557
> 193987       47557  134790      47557
>              
> So SPL took 60 ms shorter (?!) and U-Boot was mostly unaffected. Not
> sure how that happened. The raw values for begin/end look reasonable:
> 
>   malloc_init took 47557us (6778108 6825665)
>   malloc_init took 47557us (6779290 6826847)
>   malloc_init took 47558us (6780379 6827937)
> 
> etc. but to be honest I don't really see how SPL can spend 190ms setting
> some variables and clearing the arena.

OK.  Please repeat this for v2 and assuming that once again the timing
change is marginal, I think we can emphasize it's generally a wash.
diff mbox series

Patch

diff --git a/Kconfig b/Kconfig
index 4b32286b69..3cb31a9346 100644
--- a/Kconfig
+++ b/Kconfig
@@ -381,6 +381,17 @@  config SYS_MALLOC_RUNTIME_INIT
          reduce the size of U-Boot by letting malloc's data reside in .bss
          instead of .data.
 
+config SPL_SYS_MALLOC_RUNTIME_INIT
+	bool "Initialize malloc's internal data at runtime in SPL"
+	default y
+	depends on SPL
+	help
+	 Initialize malloc's internal data structures at SPL runtime, rather
+	 than at compile-time. This is necessary if relocating the malloc arena
+	 from a smaller static memory to a large DDR memory. It can also reduce
+	 the size of U-Boot by letting malloc's data reside in .bss instead of
+	 .data.
+
 config TOOLS_DEBUG
 	bool "Enable debug information for tools"
 	help