diff mbox

[U-Boot] arm: spl: Allow board_init_r() to run with a larger stack

Message ID 1421607336-26934-1-git-send-email-sjg@chromium.org
State Superseded
Delegated to: Albert ARIBAUD
Headers show

Commit Message

Simon Glass Jan. 18, 2015, 6:55 p.m. UTC
At present SPL uses a single stack, either CONFIG_SPL_STACK or
CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
point into SDRAM. They then set up SDRAM very early, before board_init_f(),
so that the larger stack can be used.

This is an abuse of lowlevel_init(). That function should only be used for
essential start-up code which cannot be delayed. An example of a valid use is
when only part of the SPL code is visible/executable, and the SoC must be set
up so that board_init_f() can be reached. It should not be used for SDRAM
init, console init, etc.

Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
address before board_init_r() is called in SPL.

The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:

Execution starts with start.S. Two main functions can be provided by the
board implementation. The purpose and limitations of each is described below.
After that, the common board_init_r() is called to perform the SPL task.

lowlevel_init():
	- purpose: essential init to permit execution to reach board_init_f()
	- no global_data, but there is a stack
	- must not set up SDRAM or use console
	- must only do the bare minimum to allow execution to continue to
		board_init_f()
	- this is almost never needed
	- return normally from this function

board_init_f():
	- purpose: set up the machine ready for running board_init_r():
		i.e. SDRAM and serial UART
	- global_data is available
	- preloader_console_init() can be called here in extremis
	- stack is in SRAM
	- should set up SDRAM, and anything needed to make the UART work
	- these is no need to clear BSS, it will be done by crt0.S
	- must return normally from this function (don't call board_init_r()
		directly)

Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
the stack and global_data are relocated to below that address.

board_init_r():
	- purpose: main execution, common code
	- global_data is available
	- SDRAM is available
	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
		points into SDRAM
	- preloader_console_init() can be called here - typically this is
		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
		spl_board_init() function containing this call
	- loads U-Boot or (in falcon mode) Linux

Note: This patch is intended to apply over the top of Tom's SPL changes and
this series:

https://patchwork.ozlabs.org/patch/423785/

Signed-off-by: Simon Glass <sjg@chromium.org>
---

 arch/arm/lib/crt0.S | 13 ++++++++++---
 common/spl/spl.c    | 35 +++++++++++++++++++++++++++++++++++
 doc/README.SPL      | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 3 deletions(-)

Comments

Heiko Schocher Jan. 19, 2015, 6:46 a.m. UTC | #1
Hello Simon,

added Bo Shen to cc, as he currently try to set BSS (and stack) into
SDRAM for at91 based boards ... Bo, could you try this aproach?

Am 18.01.2015 19:55, schrieb Simon Glass:
> At present SPL uses a single stack, either CONFIG_SPL_STACK or
> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
> so that the larger stack can be used.
>
> This is an abuse of lowlevel_init(). That function should only be used for
> essential start-up code which cannot be delayed. An example of a valid use is
> when only part of the SPL code is visible/executable, and the SoC must be set
> up so that board_init_f() can be reached. It should not be used for SDRAM
> init, console init, etc.
>
> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
> address before board_init_r() is called in SPL.
>
> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
>
> Execution starts with start.S. Two main functions can be provided by the
> board implementation. The purpose and limitations of each is described below.
> After that, the common board_init_r() is called to perform the SPL task.
>
> lowlevel_init():
> 	- purpose: essential init to permit execution to reach board_init_f()
> 	- no global_data, but there is a stack
> 	- must not set up SDRAM or use console
> 	- must only do the bare minimum to allow execution to continue to
> 		board_init_f()
> 	- this is almost never needed
> 	- return normally from this function
>
> board_init_f():
> 	- purpose: set up the machine ready for running board_init_r():
> 		i.e. SDRAM and serial UART
> 	- global_data is available
> 	- preloader_console_init() can be called here in extremis
> 	- stack is in SRAM
> 	- should set up SDRAM, and anything needed to make the UART work
> 	- these is no need to clear BSS, it will be done by crt0.S
> 	- must return normally from this function (don't call board_init_r()
> 		directly)
>
> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> the stack and global_data are relocated to below that address.
>
> board_init_r():
> 	- purpose: main execution, common code
> 	- global_data is available
> 	- SDRAM is available
> 	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> 		points into SDRAM
> 	- preloader_console_init() can be called here - typically this is
> 		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> 		spl_board_init() function containing this call
> 	- loads U-Boot or (in falcon mode) Linux
>
> Note: This patch is intended to apply over the top of Tom's SPL changes and

Sound good! Can you point me, which patches you mean? Thanks!

> this series:
>
> https://patchwork.ozlabs.org/patch/423785/
>
> Signed-off-by: Simon Glass <sjg@chromium.org>
> ---
>
>   arch/arm/lib/crt0.S | 13 ++++++++++---
>   common/spl/spl.c    | 35 +++++++++++++++++++++++++++++++++++
>   doc/README.SPL      | 42 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 87 insertions(+), 3 deletions(-)

Hmm... I miss here "arch/arm/lib/spl.c" where "clearing BSS"
should get removed ... or?

> diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
> index 22df3e5..7939ced 100644
> --- a/arch/arm/lib/crt0.S
> +++ b/arch/arm/lib/crt0.S
> @@ -113,7 +113,14 @@ here:
>   /* Set up final (full) environment */
>
>   	bl	c_runtime_cpu_setup	/* we still call old routine here */
> -
> +#endif
> +#if !defined(CONFIG_SPL_BUILD) || defined(CONFIG_SPL_FRAMEWORK)
> +# ifdef CONFIG_SPL_BUILD
> +	/* Use a DRAM stack for the rest of SPL, if requested */
> +	bl	spl_relocate_stack_gd
> +	cmp	r0, #0
> +	movne	sp, r0
> +# endif
>   	ldr	r0, =__bss_start	/* this is auto-relocated! */
>   	ldr	r1, =__bss_end		/* this is auto-relocated! */
>
> @@ -124,9 +131,10 @@ clbss_l:cmp	r0, r1			/* while not at end of BSS */
>   	addlo	r0, r0, #4		/* move to next */
>   	blo	clbss_l
>
> +#if ! defined(CONFIG_SPL_BUILD)
>   	bl coloured_LED_init
>   	bl red_led_on
> -
> +#endif
>   	/* call board_init_r(gd_t *id, ulong dest_addr) */
>   	mov     r0, r9                  /* gd_t */
>   	ldr	r1, [r9, #GD_RELOCADDR]	/* dest_addr */
> @@ -134,7 +142,6 @@ clbss_l:cmp	r0, r1			/* while not at end of BSS */
>   	ldr	pc, =board_init_r	/* this is auto-relocated! */
>
>   	/* we should not return here. */
> -
>   #endif
>
>   ENDPROC(_main)
> diff --git a/common/spl/spl.c b/common/spl/spl.c
> index 1826c47..78bb279 100644
> --- a/common/spl/spl.c
> +++ b/common/spl/spl.c
> @@ -276,3 +276,38 @@ void preloader_console_init(void)
>   	spl_display_print();
>   #endif
>   }
> +
> +/**
> + * spl_relocate_stack_gd() - Relocate stack ready for board_init_r() execution
> + *
> + * Sometimes board_init_f() runs with a stack in SRAM but we want to use SDRAM
> + * for the main board_init_r() execution. This is typically because we need
> + * more stack space for things like the MMC sub-system.
> + *
> + * This function calculates the stack position, copies the global_data into
> + * place and returns the new stack position. The caller is responsible for
> + * setting up the sp register.
> + *
> + * @return new stack location, or 0 to use the same stack
> + */
> +ulong spl_relocate_stack_gd(void)
> +{
> +#ifdef CONFIG_SPL_STACK_R
> +	gd_t *new_gd;
> +	ulong ptr;
> +
> +	/* Get stack position: use 8-byte alignment for ABI compliance */
> +	ptr = CONFIG_SPL_STACK_R - sizeof(gd_t);
> +	ptr &= ~7;
> +	new_gd = (gd_t *)ptr;
> +	memcpy(new_gd, (void *)gd, sizeof(gd_t));
> +	gd = new_gd;
> +
> +	/* Clear the BSS. */
> +	memset(__bss_start, 0, __bss_end - __bss_start);
> +
> +	return ptr;
> +#else
> +	return 0;
> +#endif
> +}

I try this patch out on a sam9260 board with 4k sram for SPL (if I have the
hw) ... looking into my current objdump with the approach from Bo,
see patch [1], I do not find "memcpy" in it ... I currently have a
SPL size (untested, compiled code only) currently from 3392 bytes, which
perfectly fits in 4k ... adding a memcpy in code, I get a size of 3464 bytes
fits also in 4k ... with space for some init stack in sram  ... so, I try
your patch ASAP, thanks!

bye,
Heiko
[1] http://patchwork.ozlabs.org/patch/429667/

> diff --git a/doc/README.SPL b/doc/README.SPL
> index 3ba313c..327d3e2 100644
> --- a/doc/README.SPL
> +++ b/doc/README.SPL
> @@ -95,3 +95,45 @@ cflow will spit out a number of warnings as it does not parse
>   the config files and picks functions based on #ifdef.  Parsing the '.i'
>   files instead introduces another set of headaches.  These warnings are
>   not usually important to understanding the flow, however.
> +
> +
> +ARM SPL Control Flow
> +--------------------
> +
> +Execution starts with start.S. Two main functions can be provided by the
> +board implementation. The purpose and limitations of each is described below.
> +After that, the common board_init_r() is called to perform the SPL task.
> +
> +lowlevel_init():
> +	- purpose: essential init to permit execution to reach board_init_f()
> +	- no global_data, but there is a stack
> +	- must not set up SDRAM or use console
> +	- must only do the bare minimum to allow execution to continue to
> +		board_init_f()
> +	- this is almost never needed
> +	- return normally from this function
> +
> +board_init_f():
> +	- purpose: set up the machine ready for running board_init_r():
> +		i.e. SDRAM and serial UART
> +	- global_data is available
> +	- preloader_console_init() can be called here in extremis
> +	- stack is in SRAM
> +	- should set up SDRAM, and anything needed to make the UART work
> +	- these is no need to clear BSS, it will be done by crt0.S
> +	- must return normally from this function (don't call board_init_r()
> +		directly)
> +
> +Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> +the stack and global_data are relocated to below that address.
> +
> +board_init_r():
> +	- purpose: main execution, common code
> +	- global_data is available
> +	- SDRAM is available
> +	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> +		points into SDRAM
> +	- preloader_console_init() can be called here - typically this is
> +		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> +		spl_board_init() function containing this call
> +	- loads U-Boot or (in falcon mode) Linux
>
Albert ARIBAUD Jan. 19, 2015, 6:54 a.m. UTC | #2
Hello Simon,

On Sun, 18 Jan 2015 11:55:36 -0700, Simon Glass <sjg@chromium.org>
wrote:
> At present SPL uses a single stack, either CONFIG_SPL_STACK or
> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
> so that the larger stack can be used.
> 
> This is an abuse of lowlevel_init(). That function should only be used for
> essential start-up code which cannot be delayed. An example of a valid use is
> when only part of the SPL code is visible/executable, and the SoC must be set
> up so that board_init_f() can be reached. It should not be used for SDRAM
> init, console init, etc.
> 
> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
> address before board_init_r() is called in SPL.
> 
> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
> 
> Execution starts with start.S. Two main functions can be provided by the
> board implementation. The purpose and limitations of each is described below.
> After that, the common board_init_r() is called to perform the SPL task.
> 
> lowlevel_init():
> 	- purpose: essential init to permit execution to reach board_init_f()
> 	- no global_data, but there is a stack
> 	- must not set up SDRAM or use console
> 	- must only do the bare minimum to allow execution to continue to
> 		board_init_f()
> 	- this is almost never needed
> 	- return normally from this function
> 
> board_init_f():
> 	- purpose: set up the machine ready for running board_init_r():
> 		i.e. SDRAM and serial UART
> 	- global_data is available
> 	- preloader_console_init() can be called here in extremis
> 	- stack is in SRAM
> 	- should set up SDRAM, and anything needed to make the UART work
> 	- these is no need to clear BSS, it will be done by crt0.S
> 	- must return normally from this function (don't call board_init_r()
> 		directly)
> 
> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> the stack and global_data are relocated to below that address.
> 
> board_init_r():
> 	- purpose: main execution, common code
> 	- global_data is available
> 	- SDRAM is available
> 	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> 		points into SDRAM
> 	- preloader_console_init() can be called here - typically this is
> 		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> 		spl_board_init() function containing this call
> 	- loads U-Boot or (in falcon mode) Linux

Seems to me that now SPL and non-SPL boot sequences are mostly similar
in the name, order and purpose of the functions called (which is a good
thing!) so maybe this sequence should be described in a single document
rather than in doc/README.SPL? Just opening the discussion; I have no
strong opinion on this.

> Note: This patch is intended to apply over the top of Tom's SPL changes and
> this series:
> 
> https://patchwork.ozlabs.org/patch/423785/
> 
> Signed-off-by: Simon Glass <sjg@chromium.org>
> ---

Acked-by: Albert ARIBAUD <albert.u.boot@aribaudnet>

Amicalement,
Stefan Roese Jan. 19, 2015, 7:08 a.m. UTC | #3
On 18.01.2015 19:55, Simon Glass wrote:
> At present SPL uses a single stack, either CONFIG_SPL_STACK or
> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
> so that the larger stack can be used.
>
> This is an abuse of lowlevel_init(). That function should only be used for
> essential start-up code which cannot be delayed. An example of a valid use is
> when only part of the SPL code is visible/executable, and the SoC must be set
> up so that board_init_f() can be reached. It should not be used for SDRAM
> init, console init, etc.
>
> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
> address before board_init_r() is called in SPL.
>
> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
>
> Execution starts with start.S. Two main functions can be provided by the
> board implementation. The purpose and limitations of each is described below.
> After that, the common board_init_r() is called to perform the SPL task.
>
> lowlevel_init():
> 	- purpose: essential init to permit execution to reach board_init_f()
> 	- no global_data, but there is a stack
> 	- must not set up SDRAM or use console
> 	- must only do the bare minimum to allow execution to continue to
> 		board_init_f()
> 	- this is almost never needed
> 	- return normally from this function
>
> board_init_f():
> 	- purpose: set up the machine ready for running board_init_r():
> 		i.e. SDRAM and serial UART
> 	- global_data is available
> 	- preloader_console_init() can be called here in extremis
> 	- stack is in SRAM
> 	- should set up SDRAM, and anything needed to make the UART work
> 	- these is no need to clear BSS, it will be done by crt0.S
> 	- must return normally from this function (don't call board_init_r()
> 		directly)
>
> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> the stack and global_data are relocated to below that address.
>
> board_init_r():
> 	- purpose: main execution, common code
> 	- global_data is available
> 	- SDRAM is available
> 	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> 		points into SDRAM
> 	- preloader_console_init() can be called here - typically this is
> 		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> 		spl_board_init() function containing this call
> 	- loads U-Boot or (in falcon mode) Linux
>
> Note: This patch is intended to apply over the top of Tom's SPL changes and
> this series:
>
> https://patchwork.ozlabs.org/patch/423785/
>
> Signed-off-by: Simon Glass <sjg@chromium.org>

Reviewed-by: Stefan Roese <sr@denx.de>

Thanks,
Stefan
Stefano Babic Jan. 19, 2015, 7:31 a.m. UTC | #4
Hi Simon,

On 18/01/2015 19:55, Simon Glass wrote:
> At present SPL uses a single stack, either CONFIG_SPL_STACK or
> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
> so that the larger stack can be used.
> 
> This is an abuse of lowlevel_init(). That function should only be used for
> essential start-up code which cannot be delayed. An example of a valid use is
> when only part of the SPL code is visible/executable, and the SoC must be set
> up so that board_init_f() can be reached. It should not be used for SDRAM
> init, console init, etc.
> 
> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
> address before board_init_r() is called in SPL.
> 
> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
> 
> Execution starts with start.S. Two main functions can be provided by the
> board implementation. The purpose and limitations of each is described below.
> After that, the common board_init_r() is called to perform the SPL task.
> 
> lowlevel_init():
> 	- purpose: essential init to permit execution to reach board_init_f()
> 	- no global_data, but there is a stack
> 	- must not set up SDRAM or use console
> 	- must only do the bare minimum to allow execution to continue to
> 		board_init_f()
> 	- this is almost never needed
> 	- return normally from this function
> 
> board_init_f():
> 	- purpose: set up the machine ready for running board_init_r():
> 		i.e. SDRAM and serial UART
> 	- global_data is available
> 	- preloader_console_init() can be called here in extremis
> 	- stack is in SRAM
> 	- should set up SDRAM, and anything needed to make the UART work
> 	- these is no need to clear BSS, it will be done by crt0.S
> 	- must return normally from this function (don't call board_init_r()
> 		directly)
> 
> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> the stack and global_data are relocated to below that address.
> 
> board_init_r():
> 	- purpose: main execution, common code
> 	- global_data is available
> 	- SDRAM is available
> 	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> 		points into SDRAM
> 	- preloader_console_init() can be called here - typically this is
> 		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> 		spl_board_init() function containing this call
> 	- loads U-Boot or (in falcon mode) Linux
> 
> Note: This patch is intended to apply over the top of Tom's SPL changes and
> this series:
> 
> https://patchwork.ozlabs.org/patch/423785/
> 
> Signed-off-by: Simon Glass <sjg@chromium.org>
> ---
> 
>  arch/arm/lib/crt0.S | 13 ++++++++++---
>  common/spl/spl.c    | 35 +++++++++++++++++++++++++++++++++++
>  doc/README.SPL      | 42 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 87 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
> index 22df3e5..7939ced 100644
> --- a/arch/arm/lib/crt0.S
> +++ b/arch/arm/lib/crt0.S
> @@ -113,7 +113,14 @@ here:
>  /* Set up final (full) environment */
>  
>  	bl	c_runtime_cpu_setup	/* we still call old routine here */
> -
> +#endif
> +#if !defined(CONFIG_SPL_BUILD) || defined(CONFIG_SPL_FRAMEWORK)
> +# ifdef CONFIG_SPL_BUILD
> +	/* Use a DRAM stack for the rest of SPL, if requested */
> +	bl	spl_relocate_stack_gd
> +	cmp	r0, #0
> +	movne	sp, r0
> +# endif
>  	ldr	r0, =__bss_start	/* this is auto-relocated! */
>  	ldr	r1, =__bss_end		/* this is auto-relocated! */
>  
> @@ -124,9 +131,10 @@ clbss_l:cmp	r0, r1			/* while not at end of BSS */
>  	addlo	r0, r0, #4		/* move to next */
>  	blo	clbss_l
>  
> +#if ! defined(CONFIG_SPL_BUILD)
>  	bl coloured_LED_init
>  	bl red_led_on
> -
> +#endif
>  	/* call board_init_r(gd_t *id, ulong dest_addr) */
>  	mov     r0, r9                  /* gd_t */
>  	ldr	r1, [r9, #GD_RELOCADDR]	/* dest_addr */
> @@ -134,7 +142,6 @@ clbss_l:cmp	r0, r1			/* while not at end of BSS */
>  	ldr	pc, =board_init_r	/* this is auto-relocated! */
>  
>  	/* we should not return here. */
> -
>  #endif
>  
>  ENDPROC(_main)
> diff --git a/common/spl/spl.c b/common/spl/spl.c
> index 1826c47..78bb279 100644
> --- a/common/spl/spl.c
> +++ b/common/spl/spl.c
> @@ -276,3 +276,38 @@ void preloader_console_init(void)
>  	spl_display_print();
>  #endif
>  }
> +
> +/**
> + * spl_relocate_stack_gd() - Relocate stack ready for board_init_r() execution
> + *
> + * Sometimes board_init_f() runs with a stack in SRAM but we want to use SDRAM
> + * for the main board_init_r() execution. This is typically because we need
> + * more stack space for things like the MMC sub-system.
> + *
> + * This function calculates the stack position, copies the global_data into
> + * place and returns the new stack position. The caller is responsible for
> + * setting up the sp register.
> + *
> + * @return new stack location, or 0 to use the same stack
> + */
> +ulong spl_relocate_stack_gd(void)
> +{
> +#ifdef CONFIG_SPL_STACK_R
> +	gd_t *new_gd;
> +	ulong ptr;
> +
> +	/* Get stack position: use 8-byte alignment for ABI compliance */
> +	ptr = CONFIG_SPL_STACK_R - sizeof(gd_t);
> +	ptr &= ~7;
> +	new_gd = (gd_t *)ptr;
> +	memcpy(new_gd, (void *)gd, sizeof(gd_t));
> +	gd = new_gd;
> +
> +	/* Clear the BSS. */
> +	memset(__bss_start, 0, __bss_end - __bss_start);
> +
> +	return ptr;
> +#else
> +	return 0;
> +#endif
> +}
> diff --git a/doc/README.SPL b/doc/README.SPL
> index 3ba313c..327d3e2 100644
> --- a/doc/README.SPL
> +++ b/doc/README.SPL
> @@ -95,3 +95,45 @@ cflow will spit out a number of warnings as it does not parse
>  the config files and picks functions based on #ifdef.  Parsing the '.i'
>  files instead introduces another set of headaches.  These warnings are
>  not usually important to understanding the flow, however.
> +
> +
> +ARM SPL Control Flow
> +--------------------
> +
> +Execution starts with start.S. Two main functions can be provided by the
> +board implementation. The purpose and limitations of each is described below.
> +After that, the common board_init_r() is called to perform the SPL task.
> +
> +lowlevel_init():
> +	- purpose: essential init to permit execution to reach board_init_f()
> +	- no global_data, but there is a stack
> +	- must not set up SDRAM or use console
> +	- must only do the bare minimum to allow execution to continue to
> +		board_init_f()
> +	- this is almost never needed
> +	- return normally from this function
> +
> +board_init_f():
> +	- purpose: set up the machine ready for running board_init_r():
> +		i.e. SDRAM and serial UART
> +	- global_data is available
> +	- preloader_console_init() can be called here in extremis
> +	- stack is in SRAM
> +	- should set up SDRAM, and anything needed to make the UART work
> +	- these is no need to clear BSS, it will be done by crt0.S
> +	- must return normally from this function (don't call board_init_r()
> +		directly)
> +
> +Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> +the stack and global_data are relocated to below that address.
> +
> +board_init_r():
> +	- purpose: main execution, common code
> +	- global_data is available
> +	- SDRAM is available
> +	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> +		points into SDRAM
> +	- preloader_console_init() can be called here - typically this is
> +		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> +		spl_board_init() function containing this call
> +	- loads U-Boot or (in falcon mode) Linux
> 

Reviewed-by: Stefano Babic <sbabic@denx.de>

Regards,
Stefano
Simon Glass Jan. 19, 2015, 7:38 p.m. UTC | #5
Hi,

On 18 January 2015 at 23:46, Heiko Schocher <hs@denx.de> wrote:
> Hello Simon,
>
> added Bo Shen to cc, as he currently try to set BSS (and stack) into
> SDRAM for at91 based boards ... Bo, could you try this aproach?
>
> Am 18.01.2015 19:55, schrieb Simon Glass:
>
>> At present SPL uses a single stack, either CONFIG_SPL_STACK or
>> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
>> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
>> point into SDRAM. They then set up SDRAM very early, before
>> board_init_f(),
>> so that the larger stack can be used.
>>
>> This is an abuse of lowlevel_init(). That function should only be used for
>> essential start-up code which cannot be delayed. An example of a valid use
>> is
>> when only part of the SPL code is visible/executable, and the SoC must be
>> set
>> up so that board_init_f() can be reached. It should not be used for SDRAM
>> init, console init, etc.
>>
>> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a
>> new
>> address before board_init_r() is called in SPL.
>>
>> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
>>
>> Execution starts with start.S. Two main functions can be provided by the
>> board implementation. The purpose and limitations of each is described
>> below.
>> After that, the common board_init_r() is called to perform the SPL task.
>>
>> lowlevel_init():
>>         - purpose: essential init to permit execution to reach
>> board_init_f()
>>         - no global_data, but there is a stack
>>         - must not set up SDRAM or use console
>>         - must only do the bare minimum to allow execution to continue to
>>                 board_init_f()
>>         - this is almost never needed
>>         - return normally from this function
>>
>> board_init_f():
>>         - purpose: set up the machine ready for running board_init_r():
>>                 i.e. SDRAM and serial UART
>>         - global_data is available
>>         - preloader_console_init() can be called here in extremis
>>         - stack is in SRAM
>>         - should set up SDRAM, and anything needed to make the UART work
>>         - these is no need to clear BSS, it will be done by crt0.S
>>         - must return normally from this function (don't call
>> board_init_r()
>>                 directly)
>>
>> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this
>> point
>> the stack and global_data are relocated to below that address.
>>
>> board_init_r():
>>         - purpose: main execution, common code
>>         - global_data is available
>>         - SDRAM is available
>>         - stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined
>> and
>>                 points into SDRAM
>>         - preloader_console_init() can be called here - typically this is
>>                 done by defining CONFIG_SPL_BOARD_INIT and then supplying
>> a
>>                 spl_board_init() function containing this call
>>         - loads U-Boot or (in falcon mode) Linux
>>
>> Note: This patch is intended to apply over the top of Tom's SPL changes
>> and
>
>
> Sound good! Can you point me, which patches you mean? Thanks!
>

Tom's ones are here, although he is planning to update them I think.

http://patchwork.ozlabs.org/patch/422992/
http://patchwork.ozlabs.org/patch/422993/

>> this series:
>>
>> https://patchwork.ozlabs.org/patch/423785/
>>
>> Signed-off-by: Simon Glass <sjg@chromium.org>
>> ---
>>
>>   arch/arm/lib/crt0.S | 13 ++++++++++---
>>   common/spl/spl.c    | 35 +++++++++++++++++++++++++++++++++++
>>   doc/README.SPL      | 42 ++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 87 insertions(+), 3 deletions(-)
>
>
> Hmm... I miss here "arch/arm/lib/spl.c" where "clearing BSS"
> should get removed ... or?
>

I haven't removed it from existing board code, just added it to crt0.S
so that it will then be possible to clean up the board code.

>
>> diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
>> index 22df3e5..7939ced 100644
>> --- a/arch/arm/lib/crt0.S
>> +++ b/arch/arm/lib/crt0.S
>> @@ -113,7 +113,14 @@ here:
>>   /* Set up final (full) environment */
>>
>>         bl      c_runtime_cpu_setup     /* we still call old routine here
>> */
>> -
>> +#endif
>> +#if !defined(CONFIG_SPL_BUILD) || defined(CONFIG_SPL_FRAMEWORK)
>> +# ifdef CONFIG_SPL_BUILD
>> +       /* Use a DRAM stack for the rest of SPL, if requested */
>> +       bl      spl_relocate_stack_gd
>> +       cmp     r0, #0
>> +       movne   sp, r0
>> +# endif
>>         ldr     r0, =__bss_start        /* this is auto-relocated! */
>>         ldr     r1, =__bss_end          /* this is auto-relocated! */
>>
>> @@ -124,9 +131,10 @@ clbss_l:cmp        r0, r1                  /* while
>> not at end of BSS */
>>         addlo   r0, r0, #4              /* move to next */
>>         blo     clbss_l
>>
>> +#if ! defined(CONFIG_SPL_BUILD)
>>         bl coloured_LED_init
>>         bl red_led_on
>> -
>> +#endif
>>         /* call board_init_r(gd_t *id, ulong dest_addr) */
>>         mov     r0, r9                  /* gd_t */
>>         ldr     r1, [r9, #GD_RELOCADDR] /* dest_addr */
>> @@ -134,7 +142,6 @@ clbss_l:cmp r0, r1                  /* while not at
>> end of BSS */
>>         ldr     pc, =board_init_r       /* this is auto-relocated! */
>>
>>         /* we should not return here. */
>> -
>>   #endif
>>
>>   ENDPROC(_main)
>> diff --git a/common/spl/spl.c b/common/spl/spl.c
>> index 1826c47..78bb279 100644
>> --- a/common/spl/spl.c
>> +++ b/common/spl/spl.c
>> @@ -276,3 +276,38 @@ void preloader_console_init(void)
>>         spl_display_print();
>>   #endif
>>   }
>> +
>> +/**
>> + * spl_relocate_stack_gd() - Relocate stack ready for board_init_r()
>> execution
>> + *
>> + * Sometimes board_init_f() runs with a stack in SRAM but we want to use
>> SDRAM
>> + * for the main board_init_r() execution. This is typically because we
>> need
>> + * more stack space for things like the MMC sub-system.
>> + *
>> + * This function calculates the stack position, copies the global_data
>> into
>> + * place and returns the new stack position. The caller is responsible
>> for
>> + * setting up the sp register.
>> + *
>> + * @return new stack location, or 0 to use the same stack
>> + */
>> +ulong spl_relocate_stack_gd(void)
>> +{
>> +#ifdef CONFIG_SPL_STACK_R
>> +       gd_t *new_gd;
>> +       ulong ptr;
>> +
>> +       /* Get stack position: use 8-byte alignment for ABI compliance */
>> +       ptr = CONFIG_SPL_STACK_R - sizeof(gd_t);
>> +       ptr &= ~7;
>> +       new_gd = (gd_t *)ptr;
>> +       memcpy(new_gd, (void *)gd, sizeof(gd_t));
>> +       gd = new_gd;
>> +
>> +       /* Clear the BSS. */
>> +       memset(__bss_start, 0, __bss_end - __bss_start);
>> +
>> +       return ptr;
>> +#else
>> +       return 0;
>> +#endif
>> +}
>
>
> I try this patch out on a sam9260 board with 4k sram for SPL (if I have the
> hw) ... looking into my current objdump with the approach from Bo,
> see patch [1], I do not find "memcpy" in it ... I currently have a
> SPL size (untested, compiled code only) currently from 3392 bytes, which
> perfectly fits in 4k ... adding a memcpy in code, I get a size of 3464 bytes
> fits also in 4k ... with space for some init stack in sram  ... so, I try
> your patch ASAP, thanks!

OK thanks.

- Simon
Simon Glass Jan. 19, 2015, 7:39 p.m. UTC | #6
Hi Albert,

On 18 January 2015 at 23:54, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote:
> Hello Simon,
>
> On Sun, 18 Jan 2015 11:55:36 -0700, Simon Glass <sjg@chromium.org>
> wrote:
>> At present SPL uses a single stack, either CONFIG_SPL_STACK or
>> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
>> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
>> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
>> so that the larger stack can be used.
>>
>> This is an abuse of lowlevel_init(). That function should only be used for
>> essential start-up code which cannot be delayed. An example of a valid use is
>> when only part of the SPL code is visible/executable, and the SoC must be set
>> up so that board_init_f() can be reached. It should not be used for SDRAM
>> init, console init, etc.
>>
>> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
>> address before board_init_r() is called in SPL.
>>
>> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
>>
>> Execution starts with start.S. Two main functions can be provided by the
>> board implementation. The purpose and limitations of each is described below.
>> After that, the common board_init_r() is called to perform the SPL task.
>>
>> lowlevel_init():
>>       - purpose: essential init to permit execution to reach board_init_f()
>>       - no global_data, but there is a stack
>>       - must not set up SDRAM or use console
>>       - must only do the bare minimum to allow execution to continue to
>>               board_init_f()
>>       - this is almost never needed
>>       - return normally from this function
>>
>> board_init_f():
>>       - purpose: set up the machine ready for running board_init_r():
>>               i.e. SDRAM and serial UART
>>       - global_data is available
>>       - preloader_console_init() can be called here in extremis
>>       - stack is in SRAM
>>       - should set up SDRAM, and anything needed to make the UART work
>>       - these is no need to clear BSS, it will be done by crt0.S
>>       - must return normally from this function (don't call board_init_r()
>>               directly)
>>
>> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
>> the stack and global_data are relocated to below that address.
>>
>> board_init_r():
>>       - purpose: main execution, common code
>>       - global_data is available
>>       - SDRAM is available
>>       - stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
>>               points into SDRAM
>>       - preloader_console_init() can be called here - typically this is
>>               done by defining CONFIG_SPL_BOARD_INIT and then supplying a
>>               spl_board_init() function containing this call
>>       - loads U-Boot or (in falcon mode) Linux
>
> Seems to me that now SPL and non-SPL boot sequences are mostly similar
> in the name, order and purpose of the functions called (which is a good
> thing!) so maybe this sequence should be described in a single document
> rather than in doc/README.SPL? Just opening the discussion; I have no
> strong opinion on this.

Yes that is the idea. Yes I think it would be good to documentation
this more generally, although I wonder if we should wait until some
boards actually support this? :-)

Regards,
Simon
Albert ARIBAUD Jan. 20, 2015, 6:46 a.m. UTC | #7
Hello Simon,

On Mon, 19 Jan 2015 12:39:34 -0700, Simon Glass <sjg@chromium.org>
wrote:
> Hi Albert,
> 
> On 18 January 2015 at 23:54, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote:
> > Hello Simon,
> >
> > On Sun, 18 Jan 2015 11:55:36 -0700, Simon Glass <sjg@chromium.org>
> > wrote:
> >> At present SPL uses a single stack, either CONFIG_SPL_STACK or
> >> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
> >> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
> >> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
> >> so that the larger stack can be used.
> >>
> >> This is an abuse of lowlevel_init(). That function should only be used for
> >> essential start-up code which cannot be delayed. An example of a valid use is
> >> when only part of the SPL code is visible/executable, and the SoC must be set
> >> up so that board_init_f() can be reached. It should not be used for SDRAM
> >> init, console init, etc.
> >>
> >> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
> >> address before board_init_r() is called in SPL.
> >>
> >> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
> >>
> >> Execution starts with start.S. Two main functions can be provided by the
> >> board implementation. The purpose and limitations of each is described below.
> >> After that, the common board_init_r() is called to perform the SPL task.
> >>
> >> lowlevel_init():
> >>       - purpose: essential init to permit execution to reach board_init_f()
> >>       - no global_data, but there is a stack
> >>       - must not set up SDRAM or use console
> >>       - must only do the bare minimum to allow execution to continue to
> >>               board_init_f()
> >>       - this is almost never needed
> >>       - return normally from this function
> >>
> >> board_init_f():
> >>       - purpose: set up the machine ready for running board_init_r():
> >>               i.e. SDRAM and serial UART
> >>       - global_data is available
> >>       - preloader_console_init() can be called here in extremis
> >>       - stack is in SRAM
> >>       - should set up SDRAM, and anything needed to make the UART work
> >>       - these is no need to clear BSS, it will be done by crt0.S
> >>       - must return normally from this function (don't call board_init_r()
> >>               directly)
> >>
> >> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> >> the stack and global_data are relocated to below that address.
> >>
> >> board_init_r():
> >>       - purpose: main execution, common code
> >>       - global_data is available
> >>       - SDRAM is available
> >>       - stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> >>               points into SDRAM
> >>       - preloader_console_init() can be called here - typically this is
> >>               done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> >>               spl_board_init() function containing this call
> >>       - loads U-Boot or (in falcon mode) Linux
> >
> > Seems to me that now SPL and non-SPL boot sequences are mostly similar
> > in the name, order and purpose of the functions called (which is a good
> > thing!) so maybe this sequence should be described in a single document
> > rather than in doc/README.SPL? Just opening the discussion; I have no
> > strong opinion on this.
> 
> Yes that is the idea. Yes I think it would be good to documentation
> this more generally, although I wonder if we should wait until some
> boards actually support this? :-)

Not sure I'm getting your point: waiting for a board to support this
may be a strongly preferred prerequisite for a custodian to apply the
patchset (it would be for me if that landed in my custody, at least);
but that does not prevent posting a v2 with a more complete boot
sequence documentation in a dedicated file.

> Regards,
> Simon

Amicalement,
Bo Shen Jan. 21, 2015, 9:48 a.m. UTC | #8
Hi Simon Glass,

On 01/19/2015 02:55 AM, Simon Glass wrote:
> At present SPL uses a single stack, either CONFIG_SPL_STACK or
> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
> so that the larger stack can be used.
>
> This is an abuse of lowlevel_init(). That function should only be used for
> essential start-up code which cannot be delayed. An example of a valid use is
> when only part of the SPL code is visible/executable, and the SoC must be set
> up so that board_init_f() can be reached. It should not be used for SDRAM
> init, console init, etc.
>
> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
> address before board_init_r() is called in SPL.
>
> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
>
> Execution starts with start.S. Two main functions can be provided by the
> board implementation. The purpose and limitations of each is described below.
> After that, the common board_init_r() is called to perform the SPL task.
>
> lowlevel_init():
> 	- purpose: essential init to permit execution to reach board_init_f()
> 	- no global_data, but there is a stack
> 	- must not set up SDRAM or use console
> 	- must only do the bare minimum to allow execution to continue to
> 		board_init_f()
> 	- this is almost never needed
> 	- return normally from this function
>
> board_init_f():
> 	- purpose: set up the machine ready for running board_init_r():
> 		i.e. SDRAM and serial UART
> 	- global_data is available
> 	- preloader_console_init() can be called here in extremis
> 	- stack is in SRAM
> 	- should set up SDRAM, and anything needed to make the UART work
> 	- these is no need to clear BSS, it will be done by crt0.S
> 	- must return normally from this function (don't call board_init_r()
> 		directly)
>
> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> the stack and global_data are relocated to below that address.
>
> board_init_r():
> 	- purpose: main execution, common code
> 	- global_data is available
> 	- SDRAM is available
> 	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> 		points into SDRAM
> 	- preloader_console_init() can be called here - typically this is
> 		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> 		spl_board_init() function containing this call
> 	- loads U-Boot or (in falcon mode) Linux
>
> Note: This patch is intended to apply over the top of Tom's SPL changes and
> this series:
>
> https://patchwork.ozlabs.org/patch/423785/
>
> Signed-off-by: Simon Glass <sjg@chromium.org>

Tested-by: Bo Shen <voice.shen@atmel.com>
Acked-by: Bo Shen <voice.shen@atmel.com>

> ---
>
>   arch/arm/lib/crt0.S | 13 ++++++++++---
>   common/spl/spl.c    | 35 +++++++++++++++++++++++++++++++++++
>   doc/README.SPL      | 42 ++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 87 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
> index 22df3e5..7939ced 100644
> --- a/arch/arm/lib/crt0.S
> +++ b/arch/arm/lib/crt0.S
> @@ -113,7 +113,14 @@ here:
>   /* Set up final (full) environment */
>
>   	bl	c_runtime_cpu_setup	/* we still call old routine here */
> -
> +#endif
> +#if !defined(CONFIG_SPL_BUILD) || defined(CONFIG_SPL_FRAMEWORK)
> +# ifdef CONFIG_SPL_BUILD
> +	/* Use a DRAM stack for the rest of SPL, if requested */
> +	bl	spl_relocate_stack_gd
> +	cmp	r0, #0
> +	movne	sp, r0
> +# endif
>   	ldr	r0, =__bss_start	/* this is auto-relocated! */
>   	ldr	r1, =__bss_end		/* this is auto-relocated! */
>
> @@ -124,9 +131,10 @@ clbss_l:cmp	r0, r1			/* while not at end of BSS */
>   	addlo	r0, r0, #4		/* move to next */
>   	blo	clbss_l
>
> +#if ! defined(CONFIG_SPL_BUILD)
>   	bl coloured_LED_init
>   	bl red_led_on
> -
> +#endif
>   	/* call board_init_r(gd_t *id, ulong dest_addr) */
>   	mov     r0, r9                  /* gd_t */
>   	ldr	r1, [r9, #GD_RELOCADDR]	/* dest_addr */
> @@ -134,7 +142,6 @@ clbss_l:cmp	r0, r1			/* while not at end of BSS */
>   	ldr	pc, =board_init_r	/* this is auto-relocated! */
>
>   	/* we should not return here. */
> -
>   #endif
>
>   ENDPROC(_main)
> diff --git a/common/spl/spl.c b/common/spl/spl.c
> index 1826c47..78bb279 100644
> --- a/common/spl/spl.c
> +++ b/common/spl/spl.c
> @@ -276,3 +276,38 @@ void preloader_console_init(void)
>   	spl_display_print();
>   #endif
>   }
> +
> +/**
> + * spl_relocate_stack_gd() - Relocate stack ready for board_init_r() execution
> + *
> + * Sometimes board_init_f() runs with a stack in SRAM but we want to use SDRAM
> + * for the main board_init_r() execution. This is typically because we need
> + * more stack space for things like the MMC sub-system.
> + *
> + * This function calculates the stack position, copies the global_data into
> + * place and returns the new stack position. The caller is responsible for
> + * setting up the sp register.
> + *
> + * @return new stack location, or 0 to use the same stack
> + */
> +ulong spl_relocate_stack_gd(void)
> +{
> +#ifdef CONFIG_SPL_STACK_R
> +	gd_t *new_gd;
> +	ulong ptr;
> +
> +	/* Get stack position: use 8-byte alignment for ABI compliance */
> +	ptr = CONFIG_SPL_STACK_R - sizeof(gd_t);
> +	ptr &= ~7;
> +	new_gd = (gd_t *)ptr;
> +	memcpy(new_gd, (void *)gd, sizeof(gd_t));
> +	gd = new_gd;
> +
> +	/* Clear the BSS. */
> +	memset(__bss_start, 0, __bss_end - __bss_start);
> +
> +	return ptr;
> +#else
> +	return 0;
> +#endif
> +}
> diff --git a/doc/README.SPL b/doc/README.SPL
> index 3ba313c..327d3e2 100644
> --- a/doc/README.SPL
> +++ b/doc/README.SPL
> @@ -95,3 +95,45 @@ cflow will spit out a number of warnings as it does not parse
>   the config files and picks functions based on #ifdef.  Parsing the '.i'
>   files instead introduces another set of headaches.  These warnings are
>   not usually important to understanding the flow, however.
> +
> +
> +ARM SPL Control Flow
> +--------------------
> +
> +Execution starts with start.S. Two main functions can be provided by the
> +board implementation. The purpose and limitations of each is described below.
> +After that, the common board_init_r() is called to perform the SPL task.
> +
> +lowlevel_init():
> +	- purpose: essential init to permit execution to reach board_init_f()
> +	- no global_data, but there is a stack
> +	- must not set up SDRAM or use console
> +	- must only do the bare minimum to allow execution to continue to
> +		board_init_f()
> +	- this is almost never needed
> +	- return normally from this function
> +
> +board_init_f():
> +	- purpose: set up the machine ready for running board_init_r():
> +		i.e. SDRAM and serial UART
> +	- global_data is available
> +	- preloader_console_init() can be called here in extremis
> +	- stack is in SRAM
> +	- should set up SDRAM, and anything needed to make the UART work
> +	- these is no need to clear BSS, it will be done by crt0.S
> +	- must return normally from this function (don't call board_init_r()
> +		directly)
> +
> +Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> +the stack and global_data are relocated to below that address.
> +
> +board_init_r():
> +	- purpose: main execution, common code
> +	- global_data is available
> +	- SDRAM is available
> +	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> +		points into SDRAM
> +	- preloader_console_init() can be called here - typically this is
> +		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> +		spl_board_init() function containing this call
> +	- loads U-Boot or (in falcon mode) Linux
>

Best Regards,
Bo Shen
Masahiro Yamada Jan. 21, 2015, 10:51 a.m. UTC | #9
Hi Simon,



On Sun, 18 Jan 2015 11:55:36 -0700
Simon Glass <sjg@chromium.org> wrote:

> At present SPL uses a single stack, either CONFIG_SPL_STACK or
> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
> so that the larger stack can be used.
> 
> This is an abuse of lowlevel_init(). That function should only be used for
> essential start-up code which cannot be delayed. An example of a valid use is
> when only part of the SPL code is visible/executable, and the SoC must be set
> up so that board_init_f() can be reached. It should not be used for SDRAM
> init, console init, etc.
> 
> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
> address before board_init_r() is called in SPL.
> 
> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
> 
> Execution starts with start.S. Two main functions can be provided by the
> board implementation. The purpose and limitations of each is described below.
> After that, the common board_init_r() is called to perform the SPL task.
> 
> lowlevel_init():
> 	- purpose: essential init to permit execution to reach board_init_f()
> 	- no global_data, but there is a stack
> 	- must not set up SDRAM or use console
> 	- must only do the bare minimum to allow execution to continue to
> 		board_init_f()
> 	- this is almost never needed
> 	- return normally from this function
> 
> board_init_f():
> 	- purpose: set up the machine ready for running board_init_r():
> 		i.e. SDRAM and serial UART
> 	- global_data is available
> 	- preloader_console_init() can be called here in extremis
> 	- stack is in SRAM
> 	- should set up SDRAM, and anything needed to make the UART work
> 	- these is no need to clear BSS, it will be done by crt0.S
> 	- must return normally from this function (don't call board_init_r()
> 		directly)
> 
> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
> the stack and global_data are relocated to below that address.
> 
> board_init_r():
> 	- purpose: main execution, common code
> 	- global_data is available
> 	- SDRAM is available
> 	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
> 		points into SDRAM
> 	- preloader_console_init() can be called here - typically this is
> 		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
> 		spl_board_init() function containing this call
> 	- loads U-Boot or (in falcon mode) Linux
> 
> Note: This patch is intended to apply over the top of Tom's SPL changes and
> this series:
> 
> https://patchwork.ozlabs.org/patch/423785/



I still have an opinion that global_data itself
is a nightmare rather than a useful stuff.






> @@ -124,9 +131,10 @@ clbss_l:cmp	r0, r1			/* while not at end of BSS */
>  	addlo	r0, r0, #4		/* move to next */
>  	blo	clbss_l
>  
> +#if ! defined(CONFIG_SPL_BUILD)
>  	bl coloured_LED_init
>  	bl red_led_on
> -
> +#endif

It is not clear to me why this change is related to a larger stack.



>  }
> +
> +/**
> + * spl_relocate_stack_gd() - Relocate stack ready for board_init_r() execution
> + *
> + * Sometimes board_init_f() runs with a stack in SRAM but we want to use SDRAM
> + * for the main board_init_r() execution. This is typically because we need
> + * more stack space for things like the MMC sub-system.
> + *
> + * This function calculates the stack position, copies the global_data into
> + * place and returns the new stack position. The caller is responsible for
> + * setting up the sp register.
> + *
> + * @return new stack location, or 0 to use the same stack
> + */
> +ulong spl_relocate_stack_gd(void)
> +{
> +#ifdef CONFIG_SPL_STACK_R

I guess CONFIG_SPL_STACK_R has the type, "hex", not "bool".

In terms of Kconfig's way, the "ifdef CONFIG_SPL_STACK_R"
is the abuse of the #ifdef conditonal.

When Kconfig was introduced, I might have mentioned
"Add a new CONFIG to Kconfig rather than headers when you introduce a new feature
and document the usage in Kconfig".

Most of people do not stick to that, so I think you can excuse here.

I assume you (or somebody else) will implement it correctly when it is moved to Kconfig.
(I should be easy.)





> diff --git a/doc/README.SPL b/doc/README.SPL
> index 3ba313c..327d3e2 100644
> --- a/doc/README.SPL
> +++ b/doc/README.SPL
> @@ -95,3 +95,45 @@ cflow will spit out a number of warnings as it does not parse
>  the config files and picks functions based on #ifdef.  Parsing the '.i'
>  files instead introduces another set of headaches.  These warnings are
>  not usually important to understanding the flow, however.
> +
> +
> +ARM SPL Control Flow
> +--------------------


Is this flow ARM-specific?
It looks like the following description is very generic
althogh I do not familiar with the other architectures..



> +Execution starts with start.S. Two main functions can be provided by the
> +board implementation. The purpose and limitations of each is described below.
> +After that, the common board_init_r() is called to perform the SPL task.
> +
> +lowlevel_init():
> +	- purpose: essential init to permit execution to reach board_init_f()
> +	- no global_data, but there is a stack
> +	- must not set up SDRAM or use console
> +	- must only do the bare minimum to allow execution to continue to
> +		board_init_f()
> +	- this is almost never needed
> +	- return normally from this function
> +
> +board_init_f():
> +	- purpose: set up the machine ready for running board_init_r():
> +		i.e. SDRAM and serial UART
> +	- global_data is available
> +	- preloader_console_init() can be called here in extremis
> +	- stack is in SRAM
> +	- should set up SDRAM, and anything needed to make the UART work

I do not mean I object against this patch.
From here, this is beyond the scope of this patch.
Just comments about the current SPL boot flow that, I think, is not very nice.

One of the most important tasks of a boot loader is DRAM initialization.
This is sometimes problematic, so the printf debug (or any other early UART feature)
would be really helpful.
If DRAM init fails, I want some error messages on the console rather than the silent die.

In the main U-Boot boot flow (common/board_f.c), the initcalls are invoked in this order:
  initf_dm, serial_init, console_init_f, dram_init

It is nice the UART is available in dram_init().


On the other hand, what is happening in SPL is like this:
 [1] You are introducing CONFIG_SPL_DM
 [2] I assume the legacy drivers will be all dropped including UART of SPL
 [3] The DM scan (dm_init_and_scan) is called in board_init_r()
 [4] As you mentioned above in the README, DRAM should be setup in board_init_f()

Both [3] and [4] together make it difficult the UART debug of dram_init().

I guess [3] is just a temporary workaround in order to introduce DM into SPL and
we will have to re-design the SPL boot flow someday.

Perhaps what will happen next might be to reuse common/board_f.c for SPL.
(i.e.  Generic Board for SPL,    CONFIG_SPL_GENERIC_BOARD?)

Roughly, what we want to do in SPL is all included in common/board_f.c

Moreover, if SPL is enabled, we can skip common/board_f.c in the main U-boot image.
Most of the initializations have already been done in SPL.
We do not have to do board_init_f() twice.






Best Regards
Masahiro Yamada
Heiko Schocher Jan. 21, 2015, 2:11 p.m. UTC | #10
Hello Simon,

Am 19.01.2015 07:46, schrieb Heiko Schocher:
> Hello Simon,
>
> added Bo Shen to cc, as he currently try to set BSS (and stack) into
> SDRAM for at91 based boards ... Bo, could you try this aproach?
>
> Am 18.01.2015 19:55, schrieb Simon Glass:
>> At present SPL uses a single stack, either CONFIG_SPL_STACK or
>> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
>> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
>> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
>> so that the larger stack can be used.
>>
>> This is an abuse of lowlevel_init(). That function should only be used for
>> essential start-up code which cannot be delayed. An example of a valid use is
>> when only part of the SPL code is visible/executable, and the SoC must be set
>> up so that board_init_f() can be reached. It should not be used for SDRAM
>> init, console init, etc.
>>
>> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
>> address before board_init_r() is called in SPL.
>>
>> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
>>
>> Execution starts with start.S. Two main functions can be provided by the
>> board implementation. The purpose and limitations of each is described below.
>> After that, the common board_init_r() is called to perform the SPL task.
>>
>> lowlevel_init():
>>     - purpose: essential init to permit execution to reach board_init_f()
>>     - no global_data, but there is a stack
>>     - must not set up SDRAM or use console
>>     - must only do the bare minimum to allow execution to continue to
>>         board_init_f()
>>     - this is almost never needed
>>     - return normally from this function
>>
>> board_init_f():
>>     - purpose: set up the machine ready for running board_init_r():
>>         i.e. SDRAM and serial UART
>>     - global_data is available
>>     - preloader_console_init() can be called here in extremis
>>     - stack is in SRAM
>>     - should set up SDRAM, and anything needed to make the UART work
>>     - these is no need to clear BSS, it will be done by crt0.S
>>     - must return normally from this function (don't call board_init_r()
>>         directly)
>>
>> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
>> the stack and global_data are relocated to below that address.
>>
>> board_init_r():
>>     - purpose: main execution, common code
>>     - global_data is available
>>     - SDRAM is available
>>     - stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
>>         points into SDRAM
>>     - preloader_console_init() can be called here - typically this is
>>         done by defining CONFIG_SPL_BOARD_INIT and then supplying a
>>         spl_board_init() function containing this call
>>     - loads U-Boot or (in falcon mode) Linux
>>
>> Note: This patch is intended to apply over the top of Tom's SPL changes and
>
> Sound good! Can you point me, which patches you mean? Thanks!
>
>> this series:
>>
>> https://patchwork.ozlabs.org/patch/423785/
>>
>> Signed-off-by: Simon Glass <sjg@chromium.org>
>> ---
>>
>>   arch/arm/lib/crt0.S | 13 ++++++++++---
>>   common/spl/spl.c    | 35 +++++++++++++++++++++++++++++++++++
>>   doc/README.SPL      | 42 ++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 87 insertions(+), 3 deletions(-)
[...]
>> diff --git a/common/spl/spl.c b/common/spl/spl.c
>> index 1826c47..78bb279 100644
>> --- a/common/spl/spl.c
>> +++ b/common/spl/spl.c
>> @@ -276,3 +276,38 @@ void preloader_console_init(void)
>>       spl_display_print();
>>   #endif
>>   }
>> +
>> +/**
>> + * spl_relocate_stack_gd() - Relocate stack ready for board_init_r() execution
>> + *
>> + * Sometimes board_init_f() runs with a stack in SRAM but we want to use SDRAM
>> + * for the main board_init_r() execution. This is typically because we need
>> + * more stack space for things like the MMC sub-system.
>> + *
>> + * This function calculates the stack position, copies the global_data into
>> + * place and returns the new stack position. The caller is responsible for
>> + * setting up the sp register.
>> + *
>> + * @return new stack location, or 0 to use the same stack
>> + */
>> +ulong spl_relocate_stack_gd(void)
>> +{
>> +#ifdef CONFIG_SPL_STACK_R
>> +    gd_t *new_gd;
>> +    ulong ptr;
>> +
>> +    /* Get stack position: use 8-byte alignment for ABI compliance */
>> +    ptr = CONFIG_SPL_STACK_R - sizeof(gd_t);
>> +    ptr &= ~7;
>> +    new_gd = (gd_t *)ptr;
>> +    memcpy(new_gd, (void *)gd, sizeof(gd_t));
>> +    gd = new_gd;
>> +
>> +    /* Clear the BSS. */
>> +    memset(__bss_start, 0, __bss_end - __bss_start);
>> +
>> +    return ptr;
>> +#else
>> +    return 0;
>> +#endif
>> +}
>
> I try this patch out on a sam9260 board with 4k sram for SPL (if I have the
> hw) ... looking into my current objdump with the approach from Bo,
> see patch [1], I do not find "memcpy" in it ... I currently have a
> SPL size (untested, compiled code only) currently from 3392 bytes, which
> perfectly fits in 4k ... adding a memcpy in code, I get a size of 3464 bytes
> fits also in 4k ... with space for some init stack in sram  ... so, I try
> your patch ASAP, thanks!

Tested on an sam9260 based board (not mainlined yet) with 4k sram
for SPL and relocating stack to RAM with CONFIG_SPL_STACK_R

:-D

Acked-by: Heiko Schocher <hs@denx.de>
Tested-by: Heiko Schocher <hs@denx.de>

Thanks!

bye,
Heiko
BTW: before your patch SPL size was 3392 bytes, with your patch
size shrunk to 3227 bytes :-)
Simon Glass Jan. 21, 2015, 3:51 p.m. UTC | #11
Hi Masahiro,

On 21 January 2015 at 03:51, Masahiro Yamada <yamada.m@jp.panasonic.com> wrote:
> Hi Simon,
>
>
>
> On Sun, 18 Jan 2015 11:55:36 -0700
> Simon Glass <sjg@chromium.org> wrote:
>
>> At present SPL uses a single stack, either CONFIG_SPL_STACK or
>> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
>> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
>> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
>> so that the larger stack can be used.
>>
>> This is an abuse of lowlevel_init(). That function should only be used for
>> essential start-up code which cannot be delayed. An example of a valid use is
>> when only part of the SPL code is visible/executable, and the SoC must be set
>> up so that board_init_f() can be reached. It should not be used for SDRAM
>> init, console init, etc.
>>
>> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
>> address before board_init_r() is called in SPL.
>>
>> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
>>
>> Execution starts with start.S. Two main functions can be provided by the
>> board implementation. The purpose and limitations of each is described below.
>> After that, the common board_init_r() is called to perform the SPL task.
>>
>> lowlevel_init():
>>       - purpose: essential init to permit execution to reach board_init_f()
>>       - no global_data, but there is a stack
>>       - must not set up SDRAM or use console
>>       - must only do the bare minimum to allow execution to continue to
>>               board_init_f()
>>       - this is almost never needed
>>       - return normally from this function
>>
>> board_init_f():
>>       - purpose: set up the machine ready for running board_init_r():
>>               i.e. SDRAM and serial UART
>>       - global_data is available
>>       - preloader_console_init() can be called here in extremis
>>       - stack is in SRAM
>>       - should set up SDRAM, and anything needed to make the UART work
>>       - these is no need to clear BSS, it will be done by crt0.S
>>       - must return normally from this function (don't call board_init_r()
>>               directly)
>>
>> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
>> the stack and global_data are relocated to below that address.
>>
>> board_init_r():
>>       - purpose: main execution, common code
>>       - global_data is available
>>       - SDRAM is available
>>       - stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
>>               points into SDRAM
>>       - preloader_console_init() can be called here - typically this is
>>               done by defining CONFIG_SPL_BOARD_INIT and then supplying a
>>               spl_board_init() function containing this call
>>       - loads U-Boot or (in falcon mode) Linux
>>
>> Note: This patch is intended to apply over the top of Tom's SPL changes and
>> this series:
>>
>> https://patchwork.ozlabs.org/patch/423785/
>
>
>
> I still have an opinion that global_data itself
> is a nightmare rather than a useful stuff.
>
>

I believe its function is to provide a place to put variables before
BSS is available. After that I suppose, since the values are sitting
there, it makes sense to keep using it. But it can be abused.

>
>
>
>
>> @@ -124,9 +131,10 @@ clbss_l:cmp      r0, r1                  /* while not at end of BSS */
>>       addlo   r0, r0, #4              /* move to next */
>>       blo     clbss_l
>>
>> +#if ! defined(CONFIG_SPL_BUILD)
>>       bl coloured_LED_init
>>       bl red_led_on
>> -
>> +#endif
>
> It is not clear to me why this change is related to a larger stack.

We don't want to call this code from SPL.

>
>
>
>>  }
>> +
>> +/**
>> + * spl_relocate_stack_gd() - Relocate stack ready for board_init_r() execution
>> + *
>> + * Sometimes board_init_f() runs with a stack in SRAM but we want to use SDRAM
>> + * for the main board_init_r() execution. This is typically because we need
>> + * more stack space for things like the MMC sub-system.
>> + *
>> + * This function calculates the stack position, copies the global_data into
>> + * place and returns the new stack position. The caller is responsible for
>> + * setting up the sp register.
>> + *
>> + * @return new stack location, or 0 to use the same stack
>> + */
>> +ulong spl_relocate_stack_gd(void)
>> +{
>> +#ifdef CONFIG_SPL_STACK_R
>
> I guess CONFIG_SPL_STACK_R has the type, "hex", not "bool".
>
> In terms of Kconfig's way, the "ifdef CONFIG_SPL_STACK_R"
> is the abuse of the #ifdef conditonal.
>
> When Kconfig was introduced, I might have mentioned
> "Add a new CONFIG to Kconfig rather than headers when you introduce a new feature
> and document the usage in Kconfig".
>
> Most of people do not stick to that, so I think you can excuse here.
>
> I assume you (or somebody else) will implement it correctly when it is moved to Kconfig.
> (I should be easy.)

Yes agreed. I can do that. It seems like the concept is agreed at least.

So how about this:

CONFIG_SPL_STACK_R - bool
CONFIG_SPL_STACK_R _SIZE - hex

>
>
>
>
>
>> diff --git a/doc/README.SPL b/doc/README.SPL
>> index 3ba313c..327d3e2 100644
>> --- a/doc/README.SPL
>> +++ b/doc/README.SPL
>> @@ -95,3 +95,45 @@ cflow will spit out a number of warnings as it does not parse
>>  the config files and picks functions based on #ifdef.  Parsing the '.i'
>>  files instead introduces another set of headaches.  These warnings are
>>  not usually important to understanding the flow, however.
>> +
>> +
>> +ARM SPL Control Flow
>> +--------------------
>
>
> Is this flow ARM-specific?
> It looks like the following description is very generic
> althogh I do not familiar with the other architectures..
>

Yes it is at present. My change only affects ARM (32-bit) as that's
all I can test with.

That said, Albert has asked me to generalise this to cover U-Boot as
well as SPL, so I'll do that. I can change the heading here to be
generic, and then mention that it currently applies only for ARM.
>
>
>> +Execution starts with start.S. Two main functions can be provided by the
>> +board implementation. The purpose and limitations of each is described below.
>> +After that, the common board_init_r() is called to perform the SPL task.
>> +
>> +lowlevel_init():
>> +     - purpose: essential init to permit execution to reach board_init_f()
>> +     - no global_data, but there is a stack
>> +     - must not set up SDRAM or use console
>> +     - must only do the bare minimum to allow execution to continue to
>> +             board_init_f()
>> +     - this is almost never needed
>> +     - return normally from this function
>> +
>> +board_init_f():
>> +     - purpose: set up the machine ready for running board_init_r():
>> +             i.e. SDRAM and serial UART
>> +     - global_data is available
>> +     - preloader_console_init() can be called here in extremis
>> +     - stack is in SRAM
>> +     - should set up SDRAM, and anything needed to make the UART work
>
> I do not mean I object against this patch.
> From here, this is beyond the scope of this patch.
> Just comments about the current SPL boot flow that, I think, is not very nice.
>
> One of the most important tasks of a boot loader is DRAM initialization.
> This is sometimes problematic, so the printf debug (or any other early UART feature)
> would be really helpful.
> If DRAM init fails, I want some error messages on the console rather than the silent die.
>
> In the main U-Boot boot flow (common/board_f.c), the initcalls are invoked in this order:
>   initf_dm, serial_init, console_init_f, dram_init
>
> It is nice the UART is available in dram_init().
>
>
> On the other hand, what is happening in SPL is like this:
>  [1] You are introducing CONFIG_SPL_DM
>  [2] I assume the legacy drivers will be all dropped including UART of SPL
>  [3] The DM scan (dm_init_and_scan) is called in board_init_r()
>  [4] As you mentioned above in the README, DRAM should be setup in board_init_f()
>
> Both [3] and [4] together make it difficult the UART debug of dram_init().
>
> I guess [3] is just a temporary workaround in order to introduce DM into SPL and
> we will have to re-design the SPL boot flow someday.
>
> Perhaps what will happen next might be to reuse common/board_f.c for SPL.
> (i.e.  Generic Board for SPL,    CONFIG_SPL_GENERIC_BOARD?)
>
> Roughly, what we want to do in SPL is all included in common/board_f.c
>
> Moreover, if SPL is enabled, we can skip common/board_f.c in the main U-boot image.
> Most of the initializations have already been done in SPL.
> We do not have to do board_init_f() twice.

This idea of duplicate init is not well developed. I think we should
establish some principles here also - e.g. doing the same thing in SPL
and U-Boot proper seems wrong to me.

But for your particular case, I certainly would like the UART to be
available early in SPL. We are in the very early days of driver model
for SPL. Once we get it merged (thanks to Tom's work this should be
soon) we can look at how to get the UART available early. One hacky
idea is to do the DM scan in board_init_f() in SPL, for serial only.
But it can be done.

Yes generic board for SPL is where I'd like to head (but hopefully
using driver model for all init). This patch is a step along the way.

Regards,
Simon
Masahiro Yamada Jan. 21, 2015, 5:12 p.m. UTC | #12
Hi Simon,



> Yes agreed. I can do that. It seems like the concept is agreed at least.
>
> So how about this:
>
> CONFIG_SPL_STACK_R - bool
> CONFIG_SPL_STACK_R _SIZE - hex


Do you need the size of stack?  Or the base address?



> But for your particular case, I certainly would like the UART to be
> available early in SPL. We are in the very early days of driver model
> for SPL. Once we get it merged (thanks to Tom's work this should be
> soon) we can look at how to get the UART available early. One hacky
> idea is to do the DM scan in board_init_f() in SPL, for serial only.
> But it can be done.


I think we can move the DM scan from board_init_r() to board_init_f().
Simon Glass Jan. 21, 2015, 5:38 p.m. UTC | #13
Hi Masahiro,

On 21 January 2015 at 10:12, Masahiro YAMADA <yamada.m@jp.panasonic.com> wrote:
> Hi Simon,
>
>
>
>> Yes agreed. I can do that. It seems like the concept is agreed at least.
>>
>> So how about this:
>>
>> CONFIG_SPL_STACK_R - bool
>> CONFIG_SPL_STACK_R _SIZE - hex
>
>
> Do you need the size of stack?  Or the base address?

The address, of course, so:

CONFIG_SPL_STACK_R - bool
CONFIG_SPL_STACK_R _ADDR - hex

>
>
>
>> But for your particular case, I certainly would like the UART to be
>> available early in SPL. We are in the very early days of driver model
>> for SPL. Once we get it merged (thanks to Tom's work this should be
>> soon) we can look at how to get the UART available early. One hacky
>> idea is to do the DM scan in board_init_f() in SPL, for serial only.
>> But it can be done.
>
>
> I think we can move the DM scan from board_init_r() to board_init_f().

OK. I'll revisit the DM SPL series soon, and see how it all looks.

Regards,
Simon
Heiko Schocher Jan. 22, 2015, 6:28 a.m. UTC | #14
Hello Simon, Masahiro,

Am 21.01.2015 16:51, schrieb Simon Glass:
> Hi Masahiro,
>
> On 21 January 2015 at 03:51, Masahiro Yamada <yamada.m@jp.panasonic.com> wrote:
>> Hi Simon,
>>
>>
>>
>> On Sun, 18 Jan 2015 11:55:36 -0700
>> Simon Glass <sjg@chromium.org> wrote:
>>
>>> At present SPL uses a single stack, either CONFIG_SPL_STACK or
>>> CONFIG_SYS_INIT_SP_ADDR. Since some SPL features (such as MMC and
>>> environment) require a lot of stack, some boards set CONFIG_SPL_STACK to
>>> point into SDRAM. They then set up SDRAM very early, before board_init_f(),
>>> so that the larger stack can be used.
>>>
>>> This is an abuse of lowlevel_init(). That function should only be used for
>>> essential start-up code which cannot be delayed. An example of a valid use is
>>> when only part of the SPL code is visible/executable, and the SoC must be set
>>> up so that board_init_f() can be reached. It should not be used for SDRAM
>>> init, console init, etc.
>>>
>>> Add a CONFIG_SPL_STACK_R option, which allows the stack to be moved to a new
>>> address before board_init_r() is called in SPL.
>>>
>>> The expected SPL flow (for CONFIG_SPL_FRAMEWORK) is now:
>>>
>>> Execution starts with start.S. Two main functions can be provided by the
>>> board implementation. The purpose and limitations of each is described below.
>>> After that, the common board_init_r() is called to perform the SPL task.
>>>
>>> lowlevel_init():
>>>        - purpose: essential init to permit execution to reach board_init_f()
>>>        - no global_data, but there is a stack
>>>        - must not set up SDRAM or use console
>>>        - must only do the bare minimum to allow execution to continue to
>>>                board_init_f()
>>>        - this is almost never needed
>>>        - return normally from this function
>>>
>>> board_init_f():
>>>        - purpose: set up the machine ready for running board_init_r():
>>>                i.e. SDRAM and serial UART
>>>        - global_data is available
>>>        - preloader_console_init() can be called here in extremis
>>>        - stack is in SRAM
>>>        - should set up SDRAM, and anything needed to make the UART work
>>>        - these is no need to clear BSS, it will be done by crt0.S
>>>        - must return normally from this function (don't call board_init_r()
>>>                directly)
>>>
>>> Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
>>> the stack and global_data are relocated to below that address.
>>>
>>> board_init_r():
>>>        - purpose: main execution, common code
>>>        - global_data is available
>>>        - SDRAM is available
>>>        - stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
>>>                points into SDRAM
>>>        - preloader_console_init() can be called here - typically this is
>>>                done by defining CONFIG_SPL_BOARD_INIT and then supplying a
>>>                spl_board_init() function containing this call
>>>        - loads U-Boot or (in falcon mode) Linux
>>>
>>> Note: This patch is intended to apply over the top of Tom's SPL changes and
>>> this series:
>>>
>>> https://patchwork.ozlabs.org/patch/423785/
>>
>>
>>
>> I still have an opinion that global_data itself
>> is a nightmare rather than a useful stuff.
>>
>>
>
> I believe its function is to provide a place to put variables before
> BSS is available. After that I suppose, since the values are sitting
> there, it makes sense to keep using it. But it can be abused.

Yes, IIRC thats the reason fot it.

>>> @@ -124,9 +131,10 @@ clbss_l:cmp      r0, r1                  /* while not at end of BSS */
>>>        addlo   r0, r0, #4              /* move to next */
>>>        blo     clbss_l
>>>
>>> +#if ! defined(CONFIG_SPL_BUILD)
>>>        bl coloured_LED_init
>>>        bl red_led_on
>>> -
>>> +#endif
>>
>> It is not clear to me why this change is related to a larger stack.
>
> We don't want to call this code from SPL.

Ah, now I see it ... Hmm... I am not sure if the at91 boards
really not want this ... Bo? Andreas? Could you help here?

I wondered also why this is in SPL code ... but just bringing
up SPL on this samp9260 based board with 4k sram only ... there
you have no debug console, and at last switching some LEDs on
is helpfull ... (I must admit that I have no physical contact to
this board, so I should write "is may be helpful" ;-)

>>>   }
>>> +
>>> +/**
>>> + * spl_relocate_stack_gd() - Relocate stack ready for board_init_r() execution
>>> + *
>>> + * Sometimes board_init_f() runs with a stack in SRAM but we want to use SDRAM
>>> + * for the main board_init_r() execution. This is typically because we need
>>> + * more stack space for things like the MMC sub-system.
>>> + *
>>> + * This function calculates the stack position, copies the global_data into
>>> + * place and returns the new stack position. The caller is responsible for
>>> + * setting up the sp register.
>>> + *
>>> + * @return new stack location, or 0 to use the same stack
>>> + */
>>> +ulong spl_relocate_stack_gd(void)
>>> +{
>>> +#ifdef CONFIG_SPL_STACK_R
>>
>> I guess CONFIG_SPL_STACK_R has the type, "hex", not "bool".
>>
>> In terms of Kconfig's way, the "ifdef CONFIG_SPL_STACK_R"
>> is the abuse of the #ifdef conditonal.
>>
>> When Kconfig was introduced, I might have mentioned
>> "Add a new CONFIG to Kconfig rather than headers when you introduce a new feature
>> and document the usage in Kconfig".
>>
>> Most of people do not stick to that, so I think you can excuse here.
>>
>> I assume you (or somebody else) will implement it correctly when it is moved to Kconfig.
>> (I should be easy.)
>
> Yes agreed. I can do that. It seems like the concept is agreed at least.
>
> So how about this:
>
> CONFIG_SPL_STACK_R - bool
> CONFIG_SPL_STACK_R _SIZE - hex

Ok from my side, but it is an address, so I think, "CONFIG_SPL_STACK_R_ADDR"
would be better

>>> diff --git a/doc/README.SPL b/doc/README.SPL
>>> index 3ba313c..327d3e2 100644
>>> --- a/doc/README.SPL
>>> +++ b/doc/README.SPL
>>> @@ -95,3 +95,45 @@ cflow will spit out a number of warnings as it does not parse
>>>   the config files and picks functions based on #ifdef.  Parsing the '.i'
>>>   files instead introduces another set of headaches.  These warnings are
>>>   not usually important to understanding the flow, however.
>>> +
>>> +
>>> +ARM SPL Control Flow
>>> +--------------------
>>
>>
>> Is this flow ARM-specific?
>> It looks like the following description is very generic
>> althogh I do not familiar with the other architectures..
>>
>
> Yes it is at present. My change only affects ARM (32-bit) as that's
> all I can test with.
>
> That said, Albert has asked me to generalise this to cover U-Boot as
> well as SPL, so I'll do that. I can change the heading here to be
> generic, and then mention that it currently applies only for ARM.
>>
>>
>>> +Execution starts with start.S. Two main functions can be provided by the
>>> +board implementation. The purpose and limitations of each is described below.
>>> +After that, the common board_init_r() is called to perform the SPL task.
>>> +
>>> +lowlevel_init():
>>> +     - purpose: essential init to permit execution to reach board_init_f()
>>> +     - no global_data, but there is a stack
>>> +     - must not set up SDRAM or use console
>>> +     - must only do the bare minimum to allow execution to continue to
>>> +             board_init_f()
>>> +     - this is almost never needed
>>> +     - return normally from this function
>>> +
>>> +board_init_f():
>>> +     - purpose: set up the machine ready for running board_init_r():
>>> +             i.e. SDRAM and serial UART
>>> +     - global_data is available
>>> +     - preloader_console_init() can be called here in extremis
>>> +     - stack is in SRAM
>>> +     - should set up SDRAM, and anything needed to make the UART work
>>
>> I do not mean I object against this patch.
>>  From here, this is beyond the scope of this patch.
>> Just comments about the current SPL boot flow that, I think, is not very nice.
>>
>> One of the most important tasks of a boot loader is DRAM initialization.
>> This is sometimes problematic, so the printf debug (or any other early UART feature)
>> would be really helpful.
>> If DRAM init fails, I want some error messages on the console rather than the silent die.
>>
>> In the main U-Boot boot flow (common/board_f.c), the initcalls are invoked in this order:
>>    initf_dm, serial_init, console_init_f, dram_init
>>
>> It is nice the UART is available in dram_init().
>>
>>
>> On the other hand, what is happening in SPL is like this:
>>   [1] You are introducing CONFIG_SPL_DM
>>   [2] I assume the legacy drivers will be all dropped including UART of SPL
>>   [3] The DM scan (dm_init_and_scan) is called in board_init_r()
>>   [4] As you mentioned above in the README, DRAM should be setup in board_init_f()
>>
>> Both [3] and [4] together make it difficult the UART debug of dram_init().
>>
>> I guess [3] is just a temporary workaround in order to introduce DM into SPL and
>> we will have to re-design the SPL boot flow someday.
>>
>> Perhaps what will happen next might be to reuse common/board_f.c for SPL.
>> (i.e.  Generic Board for SPL,    CONFIG_SPL_GENERIC_BOARD?)
>>
>> Roughly, what we want to do in SPL is all included in common/board_f.c
>>
>> Moreover, if SPL is enabled, we can skip common/board_f.c in the main U-boot image.
>> Most of the initializations have already been done in SPL.
>> We do not have to do board_init_f() twice.
>
> This idea of duplicate init is not well developed. I think we should
> establish some principles here also - e.g. doing the same thing in SPL
> and U-Boot proper seems wrong to me.
>
> But for your particular case, I certainly would like the UART to be
> available early in SPL. We are in the very early days of driver model
> for SPL. Once we get it merged (thanks to Tom's work this should be
> soon) we can look at how to get the UART available early. One hacky
> idea is to do the DM scan in board_init_f() in SPL, for serial only.
> But it can be done.
>
> Yes generic board for SPL is where I'd like to head (but hopefully
> using driver model for all init). This patch is a step along the way.

I know, I am geting annoying, but we should not forget boards/SoCs
which have minimal sram ... like the sam9260 with 4k only! So we must
have an option to get SPL running there ... to set BSS and stack
into RAM is a great option, so we have no problems with that anymore,
but there is currently no place to get an UART running on it... so I
feel DM is also to big ...

bye,
Heiko
Masahiro Yamada Jan. 22, 2015, 7:33 a.m. UTC | #15
Simon, Heiko,


On Thu, 22 Jan 2015 07:28:37 +0100
Heiko Schocher <hs@denx.de> wrote:

> > But for your particular case, I certainly would like the UART to be
> > available early in SPL. We are in the very early days of driver model
> > for SPL. Once we get it merged (thanks to Tom's work this should be
> > soon) we can look at how to get the UART available early. One hacky
> > idea is to do the DM scan in board_init_f() in SPL, for serial only.
> > But it can be done.
> >
> > Yes generic board for SPL is where I'd like to head (but hopefully
> > using driver model for all init). This patch is a step along the way.
> 
> I know, I am geting annoying, but we should not forget boards/SoCs
> which have minimal sram ... like the sam9260 with 4k only! So we must
> have an option to get SPL running there ... to set BSS and stack
> into RAM is a great option, so we have no problems with that anymore,
> but there is currently no place to get an UART running on it... so I
> feel DM is also to big ...


Simon,
Have you ever estimated how much the DM increase the binary image size?
(for ARM)

Heiko,
In your case, perhaps, you can disable the DM for SPL.
If you do not use UART on SPL, the DM core is not needed either.



Best Regards
Masahiro Yamada
Simon Glass Jan. 22, 2015, 3:10 p.m. UTC | #16
Hi Masahiro,

On 22 January 2015 at 00:33, Masahiro Yamada <yamada.m@jp.panasonic.com> wrote:
> Simon, Heiko,
>
>
> On Thu, 22 Jan 2015 07:28:37 +0100
> Heiko Schocher <hs@denx.de> wrote:
>
>> > But for your particular case, I certainly would like the UART to be
>> > available early in SPL. We are in the very early days of driver model
>> > for SPL. Once we get it merged (thanks to Tom's work this should be
>> > soon) we can look at how to get the UART available early. One hacky
>> > idea is to do the DM scan in board_init_f() in SPL, for serial only.
>> > But it can be done.
>> >
>> > Yes generic board for SPL is where I'd like to head (but hopefully
>> > using driver model for all init). This patch is a step along the way.
>>
>> I know, I am geting annoying, but we should not forget boards/SoCs
>> which have minimal sram ... like the sam9260 with 4k only! So we must
>> have an option to get SPL running there ... to set BSS and stack
>> into RAM is a great option, so we have no problems with that anymore,
>> but there is currently no place to get an UART running on it... so I
>> feel DM is also to big ...
>
>
> Simon,
> Have you ever estimated how much the DM increase the binary image size?
> (for ARM)

I did originally (around 2.2KB for the core) but it has grown since.
When I revisit SPL (soon) I suppose we will get another data point.

One problem is that the new bus functionality is likely not needed in
SPL for things like AT91. I have a SAM9260 board so can try it out.
However, that chip is very old. Are there chips in common use still
which are so limited in SRAM?

Also Masahiro you should badger your hardware guys to add e.g. 128KB
of SRAM to their next chip!

>
> Heiko,
> In your case, perhaps, you can disable the DM for SPL.
> If you do not use UART on SPL, the DM core is not needed either.

Regards,
Simon
Masahiro Yamada Jan. 23, 2015, 1:53 p.m. UTC | #17
Hi Simon,


On Thu, 22 Jan 2015 08:10:07 -0700
Simon Glass <sjg@chromium.org> wrote:

> Hi Masahiro,
> 
> On 22 January 2015 at 00:33, Masahiro Yamada <yamada.m@jp.panasonic.com> wrote:
> > Simon, Heiko,
> >
> >
> > On Thu, 22 Jan 2015 07:28:37 +0100
> > Heiko Schocher <hs@denx.de> wrote:
> >
> >> > But for your particular case, I certainly would like the UART to be
> >> > available early in SPL. We are in the very early days of driver model
> >> > for SPL. Once we get it merged (thanks to Tom's work this should be
> >> > soon) we can look at how to get the UART available early. One hacky
> >> > idea is to do the DM scan in board_init_f() in SPL, for serial only.
> >> > But it can be done.
> >> >
> >> > Yes generic board for SPL is where I'd like to head (but hopefully
> >> > using driver model for all init). This patch is a step along the way.
> >>
> >> I know, I am geting annoying, but we should not forget boards/SoCs
> >> which have minimal sram ... like the sam9260 with 4k only! So we must
> >> have an option to get SPL running there ... to set BSS and stack
> >> into RAM is a great option, so we have no problems with that anymore,
> >> but there is currently no place to get an UART running on it... so I
> >> feel DM is also to big ...
> >
> >
> > Simon,
> > Have you ever estimated how much the DM increase the binary image size?
> > (for ARM)
> 
> I did originally (around 2.2KB for the core) but it has grown since.
> When I revisit SPL (soon) I suppose we will get another data point.
> 
> One problem is that the new bus functionality is likely not needed in
> SPL for things like AT91. I have a SAM9260 board so can try it out.
> However, that chip is very old. Are there chips in common use still
> which are so limited in SRAM?
> 
> Also Masahiro you should badger your hardware guys to add e.g. 128KB
> of SRAM to their next chip!

I do not think this would happen.

Some old Panasonic SoCs could load 128KB,
but newer ones can only load 64KB.

Looks like SRAM is the first thing hardware folks want to cut down
for chip-shrinking.

Anyway, the SPL image size for my board is around 35KB;  29KB space available.
No problem with enabling SPL DM on my boards!


Best Regards
Masahiro Yamada
diff mbox

Patch

diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 22df3e5..7939ced 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -113,7 +113,14 @@  here:
 /* Set up final (full) environment */
 
 	bl	c_runtime_cpu_setup	/* we still call old routine here */
-
+#endif
+#if !defined(CONFIG_SPL_BUILD) || defined(CONFIG_SPL_FRAMEWORK)
+# ifdef CONFIG_SPL_BUILD
+	/* Use a DRAM stack for the rest of SPL, if requested */
+	bl	spl_relocate_stack_gd
+	cmp	r0, #0
+	movne	sp, r0
+# endif
 	ldr	r0, =__bss_start	/* this is auto-relocated! */
 	ldr	r1, =__bss_end		/* this is auto-relocated! */
 
@@ -124,9 +131,10 @@  clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	addlo	r0, r0, #4		/* move to next */
 	blo	clbss_l
 
+#if ! defined(CONFIG_SPL_BUILD)
 	bl coloured_LED_init
 	bl red_led_on
-
+#endif
 	/* call board_init_r(gd_t *id, ulong dest_addr) */
 	mov     r0, r9                  /* gd_t */
 	ldr	r1, [r9, #GD_RELOCADDR]	/* dest_addr */
@@ -134,7 +142,6 @@  clbss_l:cmp	r0, r1			/* while not at end of BSS */
 	ldr	pc, =board_init_r	/* this is auto-relocated! */
 
 	/* we should not return here. */
-
 #endif
 
 ENDPROC(_main)
diff --git a/common/spl/spl.c b/common/spl/spl.c
index 1826c47..78bb279 100644
--- a/common/spl/spl.c
+++ b/common/spl/spl.c
@@ -276,3 +276,38 @@  void preloader_console_init(void)
 	spl_display_print();
 #endif
 }
+
+/**
+ * spl_relocate_stack_gd() - Relocate stack ready for board_init_r() execution
+ *
+ * Sometimes board_init_f() runs with a stack in SRAM but we want to use SDRAM
+ * for the main board_init_r() execution. This is typically because we need
+ * more stack space for things like the MMC sub-system.
+ *
+ * This function calculates the stack position, copies the global_data into
+ * place and returns the new stack position. The caller is responsible for
+ * setting up the sp register.
+ *
+ * @return new stack location, or 0 to use the same stack
+ */
+ulong spl_relocate_stack_gd(void)
+{
+#ifdef CONFIG_SPL_STACK_R
+	gd_t *new_gd;
+	ulong ptr;
+
+	/* Get stack position: use 8-byte alignment for ABI compliance */
+	ptr = CONFIG_SPL_STACK_R - sizeof(gd_t);
+	ptr &= ~7;
+	new_gd = (gd_t *)ptr;
+	memcpy(new_gd, (void *)gd, sizeof(gd_t));
+	gd = new_gd;
+
+	/* Clear the BSS. */
+	memset(__bss_start, 0, __bss_end - __bss_start);
+
+	return ptr;
+#else
+	return 0;
+#endif
+}
diff --git a/doc/README.SPL b/doc/README.SPL
index 3ba313c..327d3e2 100644
--- a/doc/README.SPL
+++ b/doc/README.SPL
@@ -95,3 +95,45 @@  cflow will spit out a number of warnings as it does not parse
 the config files and picks functions based on #ifdef.  Parsing the '.i'
 files instead introduces another set of headaches.  These warnings are
 not usually important to understanding the flow, however.
+
+
+ARM SPL Control Flow
+--------------------
+
+Execution starts with start.S. Two main functions can be provided by the
+board implementation. The purpose and limitations of each is described below.
+After that, the common board_init_r() is called to perform the SPL task.
+
+lowlevel_init():
+	- purpose: essential init to permit execution to reach board_init_f()
+	- no global_data, but there is a stack
+	- must not set up SDRAM or use console
+	- must only do the bare minimum to allow execution to continue to
+		board_init_f()
+	- this is almost never needed
+	- return normally from this function
+
+board_init_f():
+	- purpose: set up the machine ready for running board_init_r():
+		i.e. SDRAM and serial UART
+	- global_data is available
+	- preloader_console_init() can be called here in extremis
+	- stack is in SRAM
+	- should set up SDRAM, and anything needed to make the UART work
+	- these is no need to clear BSS, it will be done by crt0.S
+	- must return normally from this function (don't call board_init_r()
+		directly)
+
+Here the BSS is cleared. If CONFIG_SPL_STACK_R is defined, then at this point
+the stack and global_data are relocated to below that address.
+
+board_init_r():
+	- purpose: main execution, common code
+	- global_data is available
+	- SDRAM is available
+	- stack is optionally in SDRAM, if CONFIG_SPL_STACK_R is defined and
+		points into SDRAM
+	- preloader_console_init() can be called here - typically this is
+		done by defining CONFIG_SPL_BOARD_INIT and then supplying a
+		spl_board_init() function containing this call
+	- loads U-Boot or (in falcon mode) Linux