diff mbox

[U-Boot,v5] Fix board init code to respect the C runtime environment

Message ID 1447690927-18871-1-git-send-email-albert.u.boot@aribaud.net
State Superseded
Delegated to: Tom Rini
Headers show

Commit Message

Albert ARIBAUD Nov. 16, 2015, 4:22 p.m. UTC
board_init_f_mem() alters the C runtime environment's
stack it is actually already using. This is not a valid
behaviour within a C runtime environment.

Split board_init_f_mem into C functions which do not alter
their own stack and always behave properly with respect to
their C runtime environment.

Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
---
Cc:ing custodians for all architectures which this
patch affects.

This patch has been build-tested for all there
architectures, and run-tested on ARM OpenRD Client.

For NIOS2, this patch contains all manual v1 and v2
fixes by Thomas.

For x86, this patch contains all manual v2 fixes by Bin.

Changes in v5:
- Reword commit message (including summary/subject)
- Reword comments in ARC code

Changes in v4:
- Add comments on reserving GD at the lowest location
- Add comments on post-incrementing base for each "chunk"

Changes in v3:
- Rebase after commit 9ac4fc82
- Simplify malloc conditional as per commit 9ac4fc82
- Fix NIOS2 return value register (r2, not r4)
- Fix x86 subl argument inversion
- Group allocations in a single function
- Group initializations in a single function

Changes in v2:
- Fix all checkpatch issues
- Fix board_init_f_malloc prototype mismatch
- Fix board_init_[f_]xxx typo in NIOS2
- Fix aarch64 asm 'sub' syntax error

 arch/arc/lib/start.S            |  16 +++---
 arch/arm/lib/crt0.S             |   6 ++-
 arch/arm/lib/crt0_64.S          |   6 ++-
 arch/microblaze/cpu/start.S     |   4 +-
 arch/nios2/cpu/start.S          |  13 +++--
 arch/powerpc/cpu/ppc4xx/start.S |  10 ++--
 arch/x86/cpu/start.S            |   5 +-
 arch/x86/lib/fsp/fsp_common.c   |   4 +-
 common/init/board_init.c        | 108 +++++++++++++++++++++++++++++++++++-----
 include/common.h                |  33 +++++-------
 10 files changed, 146 insertions(+), 59 deletions(-)

Comments

Alexey Brodkin Nov. 16, 2015, 5:25 p.m. UTC | #1
Hi Albert,

On Mon, 2015-11-16 at 17:22 +0100, Albert ARIBAUD wrote:
> board_init_f_mem() alters the C runtime environment's
> stack it is actually already using. This is not a valid
> behaviour within a C runtime environment.
> 
> Split board_init_f_mem into C functions which do not alter
> their own stack and always behave properly with respect to
> their C runtime environment.
> 
> Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
> ---
> Cc:ing custodians for all architectures which this
> patch affects.
> 
> This patch has been build-tested for all there
> architectures, and run-tested on ARM OpenRD Client.
> 
> For NIOS2, this patch contains all manual v1 and v2
> fixes by Thomas.
> 
> For x86, this patch contains all manual v2 fixes by Bin.
> 
> Changes in v5:
> - Reword commit message (including summary/subject)
> - Reword comments in ARC code
> 
> Changes in v4:
> - Add comments on reserving GD at the lowest location
> - Add comments on post-incrementing base for each "chunk"
> 
> Changes in v3:
> - Rebase after commit 9ac4fc82
> - Simplify malloc conditional as per commit 9ac4fc82
> - Fix NIOS2 return value register (r2, not r4)
> - Fix x86 subl argument inversion
> - Group allocations in a single function
> - Group initializations in a single function
> 
> Changes in v2:
> - Fix all checkpatch issues
> - Fix board_init_f_malloc prototype mismatch
> - Fix board_init_[f_]xxx typo in NIOS2
> - Fix aarch64 asm 'sub' syntax error
> 
>  arch/arc/lib/start.S            |  16 +++---
>  arch/arm/lib/crt0.S             |   6 ++-
>  arch/arm/lib/crt0_64.S          |   6 ++-
>  arch/microblaze/cpu/start.S     |   4 +-
>  arch/nios2/cpu/start.S          |  13 +++--
>  arch/powerpc/cpu/ppc4xx/start.S |  10 ++--
>  arch/x86/cpu/start.S            |   5 +-
>  arch/x86/lib/fsp/fsp_common.c   |   4 +-
>  common/init/board_init.c        | 108 +++++++++++++++++++++++++++++++++++-----
>  include/common.h                |  33 +++++-------
>  10 files changed, 146 insertions(+), 59 deletions(-)

Looks good to me now, thanks!

-Alexey
Simon Glass Nov. 17, 2015, 4:11 a.m. UTC | #2
Hi Albert,

On 16 November 2015 at 09:22, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote:
>
> board_init_f_mem() alters the C runtime environment's
> stack it is actually already using. This is not a valid
> behaviour within a C runtime environment.
>
> Split board_init_f_mem into C functions which do not alter
> their own stack and always behave properly with respect to
> their C runtime environment.
>
> Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
> ---
> Cc:ing custodians for all architectures which this
> patch affects.
>
> This patch has been build-tested for all there
> architectures, and run-tested on ARM OpenRD Client.
>
> For NIOS2, this patch contains all manual v1 and v2
> fixes by Thomas.
>
> For x86, this patch contains all manual v2 fixes by Bin.
>
> Changes in v5:
> - Reword commit message (including summary/subject)
> - Reword comments in ARC code
>
> Changes in v4:
> - Add comments on reserving GD at the lowest location
> - Add comments on post-incrementing base for each "chunk"
>
> Changes in v3:
> - Rebase after commit 9ac4fc82
> - Simplify malloc conditional as per commit 9ac4fc82
> - Fix NIOS2 return value register (r2, not r4)
> - Fix x86 subl argument inversion
> - Group allocations in a single function
> - Group initializations in a single function
>
> Changes in v2:
> - Fix all checkpatch issues
> - Fix board_init_f_malloc prototype mismatch
> - Fix board_init_[f_]xxx typo in NIOS2
> - Fix aarch64 asm 'sub' syntax error
>
>  arch/arc/lib/start.S            |  16 +++---
>  arch/arm/lib/crt0.S             |   6 ++-
>  arch/arm/lib/crt0_64.S          |   6 ++-
>  arch/microblaze/cpu/start.S     |   4 +-
>  arch/nios2/cpu/start.S          |  13 +++--
>  arch/powerpc/cpu/ppc4xx/start.S |  10 ++--
>  arch/x86/cpu/start.S            |   5 +-
>  arch/x86/lib/fsp/fsp_common.c   |   4 +-
>  common/init/board_init.c        | 108 +++++++++++++++++++++++++++++++++++-----
>  include/common.h                |  33 +++++-------
>  10 files changed, 146 insertions(+), 59 deletions(-)
>
> diff --git a/arch/arc/lib/start.S b/arch/arc/lib/start.S
> index 26a5934..fddd536 100644
> --- a/arch/arc/lib/start.S
> +++ b/arch/arc/lib/start.S
> @@ -50,18 +50,20 @@ ENTRY(_start)
>  1:
>  #endif
>
> -       /* Setup stack- and frame-pointers */
> +       /* Establish C runtime stack and frame */
>         mov     %sp, CONFIG_SYS_INIT_SP_ADDR
>         mov     %fp, %sp
>
> -       /* Allocate and zero GD, update SP */
> -       mov     %r0, %sp
> -       bl      board_init_f_mem
> -
> -       /* Update stack- and frame-pointers */
> -       mov     %sp, %r0
> +       /* Get reserved area size */
> +       bl      board_init_f_get_reserve_size
> +       /* Allocate reserved size on stack, adjust frame pointer accordingly */
> +       sub     %sp, %sp, %r0
>         mov     %fp, %sp
>
> +       /* Initialize reserved area */
> +       mov     %r0, %sp
> +       bl      board_init_f_init_reserve
> +
>         /* Zero the one and only argument of "board_init_f" */
>         mov_s   %r0, 0
>         j       board_init_f
> diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
> index 80548eb..4ec89a4 100644
> --- a/arch/arm/lib/crt0.S
> +++ b/arch/arm/lib/crt0.S
> @@ -82,9 +82,11 @@ ENTRY(_main)
>  #else
>         bic     sp, sp, #7      /* 8-byte alignment for ABI compliance */
>  #endif
> +       bl      board_init_f_get_reserve_size
> +       sub     sp, sp, r0
> +       mov     r9, sp

Why do you need that?

>         mov     r0, sp
> -       bl      board_init_f_mem
> -       mov     sp, r0
> +       bl      board_init_f_init_reserve
>
>         mov     r0, #0
>         bl      board_init_f
> diff --git a/arch/arm/lib/crt0_64.S b/arch/arm/lib/crt0_64.S
> index cef1c71..2891071 100644
> --- a/arch/arm/lib/crt0_64.S
> +++ b/arch/arm/lib/crt0_64.S
> @@ -75,8 +75,10 @@ ENTRY(_main)
>         ldr     x0, =(CONFIG_SYS_INIT_SP_ADDR)
>  #endif
>         bic     sp, x0, #0xf    /* 16-byte alignment for ABI compliance */
> -       bl      board_init_f_mem
> -       mov     sp, x0
> +       bl      board_init_f_get_reserve_size
> +       sub     sp, sp, x0
> +       mov     x0, sp
> +       bl      board_init_f_init_reserve
>
>         mov     x0, #0
>         bl      board_init_f
> diff --git a/arch/microblaze/cpu/start.S b/arch/microblaze/cpu/start.S
> index 14f46a8..206be3e 100644
> --- a/arch/microblaze/cpu/start.S
> +++ b/arch/microblaze/cpu/start.S
> @@ -25,7 +25,7 @@ _start:
>
>         addi    r8, r0, __end
>         mts     rslr, r8
> -       /* TODO: Redo this code to call board_init_f_mem() */
> +       /* TODO: Redo this code to call board_init_f_*() */
>  #if defined(CONFIG_SPL_BUILD)
>         addi    r1, r0, CONFIG_SPL_STACK_ADDR
>         mts     rshr, r1
> @@ -142,7 +142,7 @@ _start:
>         ori     r12, r12, 0x1a0
>         mts     rmsr, r12
>
> -       /* TODO: Redo this code to call board_init_f_mem() */
> +       /* TODO: Redo this code to call board_init_f_*() */
>  clear_bss:
>         /* clear BSS segments */
>         addi    r5, r0, __bss_start
> diff --git a/arch/nios2/cpu/start.S b/arch/nios2/cpu/start.S
> index 54787c5..45bf0fd 100644
> --- a/arch/nios2/cpu/start.S
> +++ b/arch/nios2/cpu/start.S
> @@ -106,14 +106,17 @@ _reloc:
>         stw     r0, 4(sp)
>         mov     fp, sp
>
> -       /* Allocate and zero GD, update SP */
> +       /* Allocate and initialize reserved area, update SP */
> +       movhi   r2, %hi(board_init_f_get_reserve_size@h)
> +       ori     r2, r2, %lo(board_init_f_get_reserve_size@h)
> +       callr   r2
> +       sub     sp, sp, r2
>         mov     r4, sp
> -       movhi   r2, %hi(board_init_f_mem@h)
> -       ori     r2, r2, %lo(board_init_f_mem@h)
> +       movhi   r2, %hi(board_init_f_init_reserve@h)
> +       ori     r2, r2, %lo(board_init_f_init_reserve@h)
>         callr   r2
>
> -       /* Update stack- and frame-pointers */
> -       mov     sp, r2
> +       /* Update frame-pointer */
>         mov     fp, sp
>
>         /* Call board_init_f -- never returns */
> diff --git a/arch/powerpc/cpu/ppc4xx/start.S b/arch/powerpc/cpu/ppc4xx/start.S
> index 77d4040..44222d2 100644
> --- a/arch/powerpc/cpu/ppc4xx/start.S
> +++ b/arch/powerpc/cpu/ppc4xx/start.S
> @@ -761,9 +761,10 @@ _start:
>
>         bl      cpu_init_f      /* run low-level CPU init code     (from Flash) */
>  #ifdef CONFIG_SYS_GENERIC_BOARD
> +       bl      board_init_f_get_reserve_size
> +       sub     r1, r1, r3
>         mr      r3, r1
> -       bl      board_init_f_mem
> -       mr      r1, r3
> +       bl      board_init_f_init_reserve
>         li      r0,0
>         stwu    r0, -4(r1)
>         stwu    r0, -4(r1)
> @@ -1037,9 +1038,10 @@ _start:
>
>         bl      cpu_init_f      /* run low-level CPU init code     (from Flash) */
>  #ifdef CONFIG_SYS_GENERIC_BOARD
> +       bl      board_init_f_get_reserve_size
> +       sub     r1, r1, r3
>         mr      r3, r1
> -       bl      board_init_f_mem
> -       mr      r1, r3
> +       bl      board_init_f_init_reserve
>         stwu    r0, -4(r1)
>         stwu    r0, -4(r1)
>  #endif
> diff --git a/arch/x86/cpu/start.S b/arch/x86/cpu/start.S
> index 5b4ee79..932672c 100644
> --- a/arch/x86/cpu/start.S
> +++ b/arch/x86/cpu/start.S
> @@ -122,9 +122,10 @@ car_init_ret:
>          */
>  #endif
>         /* Set up global data */
> +       call    board_init_f_get_reserve_size
> +       subl    %eax, %esp
>         mov     %esp, %eax
> -       call    board_init_f_mem
> -       mov     %eax, %esp
> +       call    board_init_f_init_reserve
>
>  #ifdef CONFIG_DEBUG_UART
>         call    debug_uart_init
> diff --git a/arch/x86/lib/fsp/fsp_common.c b/arch/x86/lib/fsp/fsp_common.c
> index c78df94..72c458f 100644
> --- a/arch/x86/lib/fsp/fsp_common.c
> +++ b/arch/x86/lib/fsp/fsp_common.c
> @@ -95,8 +95,8 @@ int x86_fsp_init(void)
>                 /*
>                  * The second time we enter here, adjust the size of malloc()
>                  * pool before relocation. Given gd->malloc_base was adjusted
> -                * after the call to board_init_f_mem() in arch/x86/cpu/start.S,
> -                * we should fix up gd->malloc_limit here.
> +                * after the call to board_init_f_init_reserve() in arch/x86/
> +                * cpu/start.S, we should fix up gd->malloc_limit here.
>                  */
>                 gd->malloc_limit += CONFIG_FSP_SYS_MALLOC_F_LEN;
>         }
> diff --git a/common/init/board_init.c b/common/init/board_init.c
> index 1c6126d..db4d9a0 100644
> --- a/common/init/board_init.c
> +++ b/common/init/board_init.c
> @@ -21,39 +21,121 @@ DECLARE_GLOBAL_DATA_PTR;
>  #define _USE_MEMCPY
>  #endif
>
> -/* Unfortunately x86 can't compile this code as gd cannot be assigned */
> -#ifndef CONFIG_X86
> +/* Unfortunately x86 or ARM can't compile this code as gd cannot be assigned */
> +#if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
>  __weak void arch_setup_gd(struct global_data *gd_ptr)
>  {
>         gd = gd_ptr;
>  }
> -#endif /* !CONFIG_X86 */
> +#endif /* !CONFIG_X86 && !CONFIG_SYS_THUMB_BUILD */
>
> -ulong board_init_f_mem(ulong top)
> +/*
> + * Compute size of space to reserve on stack for use as 'globals'
> + * and return size request for reserved area.
> + *
> + * Notes:
> + *
> + * Actual reservation cannot be done from within this function as
> + * it requires altering the C stack pointer, so this will be done by
> + * the caller upon return from this function.
> + *
> + * IMPORTANT:
> + *
> + * The return value of this function has two uses:
> + * - it determines the amount of stack reserved for global data and
> + *   the malloc arena;

early malloc() area

> + * - it determines where the global data will be located, as on some
> + *   architectures the caller will set gd to the base of the reserved
> + *   location.
> + *
> + */
> +
> +ulong board_init_f_get_reserve_size(void)

Another option would be to pass the address and have this function
return the new address. That would simplify the assembly code slightly
for all archs I think. It would also allow you to align the address
for sure, whereas at present it only works if the original address was
aligned, and CONFIG_SYS_MALLOC_F_LEN is aligned.

> +{
> +       ulong size;
> +
> +       /* Reserve GD (rounded up to a multiple of 16 bytes) */
> +       size = roundup(sizeof(struct global_data), 16);
> +
> +       /* Reserve malloc arena */

early malloc() area

> +#if defined(CONFIG_SYS_MALLOC_F)
> +       size += CONFIG_SYS_MALLOC_F_LEN;
> +#endif
> +
> +       return size;
> +}
> +
> +/*
> + * Initialize reserved space (which has been safely allocated on the C
> + * stack from the C runtime environment handling code).
> + *
> + * Notes:
> + *
> + * Actual reservation was done by the caller; the locations from base
> + * to base+size-1 (where 'size' is the value returned by the allocation
> + * function above) can be accessed freely without risk of corrupting the
> + * C runtime environment.
> + *
> + * IMPORTANT:
> + *
> + * Upon return from the allocation function above, on some architectures
> + * the caller will set gd to the lowest reserved location. Therefore, in
> + * this initialization function, the global data MUST be placed at base.
> + *
> + * ALSO IMPORTANT:
> + *
> + * On some architectures, gd will only be set once arch_setup_gd() is
> + * called. Therefore:
> + *
> + *     Do not use 'gd->' until arch_setup_gd() has been called!
> + *
> + * IMPORTANT TOO:
> + *
> + * Initialization for each "chunk" (GD, malloc arena...) ends with an

early malloc() area

> + * incrementation line of the form 'base += <some size>'. The last of
> + * these incrementations seems useless, as base will not be used any
> + * more after this incrementation; but if/when a new "chunk" is appended,
> + * this increment will be essential as it will give base right value for
> + * this new chunk (which will have to end with its own incrementation
> + * statement). Besides, the compiler's optimizer will silently detect
> + * and remove the last base incrementation, therefore leaving that last
> + * (seemingly useless) incrementation causes no code increase.
> + */
> +
> +void board_init_f_init_reserve(ulong base)
>  {
>         struct global_data *gd_ptr;
>  #ifndef _USE_MEMCPY
>         int *ptr;
>  #endif
>
> -       /* Leave space for the stack we are running with now */
> -       top -= 0x40;
> +       /*
> +        * clear GD entirely and set it up
> +        */
>
> -       top -= sizeof(struct global_data);
> -       top = ALIGN(top, 16);
> -       gd_ptr = (struct global_data *)top;
> +       gd_ptr = (struct global_data *)base;
> +       /* zero the area */
>  #ifdef _USE_MEMCPY
>         memset(gd_ptr, '\0', sizeof(*gd));
>  #else
>         for (ptr = (int *)gd_ptr; ptr < (int *)(gd_ptr + 1); )
>                 *ptr++ = 0;
>  #endif
> +       /* set GD unless architecture did it already */
> +#if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
>         arch_setup_gd(gd_ptr);

Why does this not work for ARM?

> +#endif
> +       /* next alloc will be higher by one GD plus 16-byte alignment */
> +       base += roundup(sizeof(struct global_data), 16);
> +
> +       /*
> +        * record malloc arena start
> +        */
>
>  #if defined(CONFIG_SYS_MALLOC_F)
> -       top -= CONFIG_SYS_MALLOC_F_LEN;
> -       gd->malloc_base = top;
> +       /* go down one malloc arena */
> +       gd->malloc_base = base;

gd_ptr?

> +       /* next alloc will be higher by one malloc arena size */
> +       base += CONFIG_SYS_MALLOC_F_LEN;
>  #endif
> -
> -       return top;
>  }
> diff --git a/include/common.h b/include/common.h
> index 09a131d..988d67a 100644
> --- a/include/common.h
> +++ b/include/common.h
> @@ -225,32 +225,25 @@ void board_init_f(ulong);
>  void board_init_r(gd_t *, ulong) __attribute__ ((noreturn));
>
>  /**
> - * board_init_f_mem() - Allocate global data and set stack position
> + * ulong board_init_f_get_reserve_size - return size of reserved area
>   *
>   * This function is called by each architecture very early in the start-up
> - * code to set up the environment for board_init_f(). It allocates space for
> - * global_data (see include/asm-generic/global_data.h) and places the stack
> - * below this.
> + * code to allow the C runtime to reserve space on the stack for writable
> + * 'globals' such as GD and the malloc arena.
>   *
> - * This function requires a stack[1] Normally this is at @top. The function
> - * starts allocating space from 64 bytes below @top. First it creates space
> - * for global_data. Then it calls arch_setup_gd() which sets gd to point to
> - * the global_data space and can reserve additional bytes of space if
> - * required). Finally it allocates early malloc() memory
> - * (CONFIG_SYS_MALLOC_F_LEN). The new top of the stack is just below this,
> - * and it returned by this function.
> + * @return:    size of reserved area
> + */
> +ulong board_init_f_get_reserve_size(void);
> +
> +/**
> + * board_init_f_init_reserve - initialize the reserved area(s)
>   *
> - * [1] Strictly speaking it would be possible to implement this function
> - * in C on many archs such that it does not require a stack. However this
> - * does not seem hugely important as only 64 byte are wasted. The 64 bytes
> - * are used to handle the calling standard which generally requires pushing
> - * addresses or registers onto the stack. We should be able to get away with
> - * less if this becomes important.
> + * This function is called once the C runtime has allocated the reserved
> + * area on the stack. It must initialize the GD at the base of that area.
>   *
> - * @top:       Top of available memory, also normally the top of the stack
> - * @return:    New stack location
> + * @base:      top from which reservation was done
>   */
> -ulong board_init_f_mem(ulong top);
> +void board_init_f_init_reserve(ulong base);
>
>  /**
>   * arch_setup_gd() - Set up the global_data pointer
> --
> 2.5.0
>

Regards,
Simon
Albert ARIBAUD Nov. 17, 2015, 12:59 p.m. UTC | #3
Hello Simon,

On Mon, 16 Nov 2015 21:11:38 -0700, Simon Glass <sjg@chromium.org>
wrote:

> > +++ b/arch/arm/lib/crt0.S
> > @@ -82,9 +82,11 @@ ENTRY(_main)
> >  #else
> >         bic     sp, sp, #7      /* 8-byte alignment for ABI compliance */
> >  #endif
> > +       bl      board_init_f_get_reserve_size
> > +       sub     sp, sp, r0
> > +       mov     r9, sp
> 
> Why do you need that?

To set gd in ARM architecture, as arch_setup_gd() may not work for
ARM, see my 'Note about the ARM case' in my answer dated Sun, 15 Nov
2015 14:51:04 +0100 for details:

https://www.mail-archive.com/u-boot@lists.denx.de/msg192494.html

I shall post a 2-patch v6 with this fix standalone as 1/2 and the rest
of the changes as 2/2.

> > +++ b/common/init/board_init.c
> > @@ -21,39 +21,121 @@ DECLARE_GLOBAL_DATA_PTR;
> >  #define _USE_MEMCPY
> >  #endif
> >
> > -/* Unfortunately x86 can't compile this code as gd cannot be assigned */
> > -#ifndef CONFIG_X86
> > +/* Unfortunately x86 or ARM can't compile this code as gd cannot be assigned */
> > +#if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
> >  __weak void arch_setup_gd(struct global_data *gd_ptr)
> >  {
> >         gd = gd_ptr;
> >  }
> > -#endif /* !CONFIG_X86 */
> > +#endif /* !CONFIG_X86 && !CONFIG_SYS_THUMB_BUILD */
> >
> > -ulong board_init_f_mem(ulong top)
> > +/*
> > + * Compute size of space to reserve on stack for use as 'globals'
> > + * and return size request for reserved area.
> > + *
> > + * Notes:
> > + *
> > + * Actual reservation cannot be done from within this function as
> > + * it requires altering the C stack pointer, so this will be done by
> > + * the caller upon return from this function.
> > + *
> > + * IMPORTANT:
> > + *
> > + * The return value of this function has two uses:
> > + * - it determines the amount of stack reserved for global data and
> > + *   the malloc arena;
> 
> early malloc() area

"Arena", and "malloc arena", are established designations for the
memory space used by malloc implementations; and I prefer to use this
more specific term, as people may use it as a search keyword when
looking for malloc related stuff.

> Another option would be to pass the address and have this function
> return the new address. That would simplify the assembly code slightly
> for all archs I think. It would also allow you to align the address
> for sure, whereas at present it only works if the original address was
> aligned, and CONFIG_SYS_MALLOC_F_LEN is aligned.

Good point, with a few caveats though.

Regarding alignment of CONFIG_SYS_MALLOC_F_LEN:

Indeed, no attempt is made to check that it is aligned (and no attempt
is made in the original code either) -- all the more since there is no
strict definition of what it should be aligned to. There is, actually,
no indication that it should be aligned, although it will probably only
be used up until the last full 4-byte-word (or 8-byte word for 64-bit
architectures). In fact, the alignment of CONFIG_SYS_MALLOC_F_LEN does
not matter much, see (*) below.

Regarding alignment of the original/'top' address:

This top address is the SP for all architectures, and at least for ARM,
it is aligned to an 8-byte boundary, as this is an ARM architecture
EABI requirement. For other architectures, I cannot claim it is, but I
suspect all initial values of SP, which generally are CONFIG_SPL_STACK
or CONFIG_SYS_INIT_SP_ADDR, are (sufficiently) aligned.

And if CONFIG_SPL_STACK and CONFIG_SYS_INIT_SP_ADDR are not
(sufficiently) aligned in their definition, then either we can fix
these definitions (and that of CONFIG_SYS_MALLOC_F_LEN too, while we
are at it), or, if we can only fix this at run time, then this should be
done where the stack pointer is altered, in the crt0.S file or whatever
its equivalent is for any given architecture, outside the C environment.

But that will have to go in another patch dealing with alignment.

(*)

Indeed board_init_f_get_reserve_size may have to respect some location
or size alignment for each of the chunks it reserves. Right now, for
instance, GD is aligned to a 16-byte boundary, and the malloc arena is
aligned by virtue of the rounded value of CONFIG_SYS_MALLOC_F_LEN.

And yes, should some new reservation be made beside GD and the malloc
arena, it might require some specific alignment not dealt with so far;
for instance, we may need to reserve some 4KB-aligned chunk for memory
management purposes, or whatever, and there is no guarantee that the
original stack is 4KB-aligned.

In that light, ulong board_init_f_get_reserve_size(void) does not
permit controlled alignment, whereas ulong board_init_f_reserve(ulong
top) does, since we can round 'top' down to any value we like.

Note, however, that it will not simplify assembly code: it will turn a
subtraction from sp into an assignment to sp, which is not simpler, and
it will add an assignment to whatever register represents the first
argument, since we'll turn a (void) function into a (ulong top)
function, so all in all, it will add one assembly instruction with
respect to the 'ulong board_init_f_get_reserve_size(void)' approach.

> > +       /* set GD unless architecture did it already */
> > +#if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
> >         arch_setup_gd(gd_ptr);
> 
> Why does this not work for ARM?

See above, but note that whatever the architecture, gd will be useable
after this call, so :

> > +#endif
> > +       /* next alloc will be higher by one GD plus 16-byte alignment */
> > +       base += roundup(sizeof(struct global_data), 16);
> > +
> > +       /*
> > +        * record malloc arena start
> > +        */
> >
> >  #if defined(CONFIG_SYS_MALLOC_F)
> > -       top -= CONFIG_SYS_MALLOC_F_LEN;
> > -       gd->malloc_base = top;
> > +       /* go down one malloc arena */
> > +       gd->malloc_base = base;
> 
> gd_ptr?

gd works at this point, and I want to avoid any aliasing issue.

> Regards,
> Simon

Thanks for your comments! I'll turn

	ulong board_init_f_get_reserve_size(void)

into

	ulong board_init_f_get_reserve(ulong top)

So that arbitrary alignments will be possible, and I will move the r9
fix into its own patch.

Amicalement,
Albert ARIBAUD Nov. 17, 2015, 1:39 p.m. UTC | #4
... and of course, I forgot to add a note at '(*)' re the malloc arena
size alignment...

On Tue, 17 Nov 2015 13:59:32 +0100, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:

> Hello Simon,
> Indeed, no attempt is made to check that it is aligned (and no attempt
> is made in the original code either) -- all the more since there is no
> strict definition of what it should be aligned to. There is, actually,
> no indication that it should be aligned, although it will probably only
> be used up until the last full 4-byte-word (or 8-byte word for 64-bit
> architectures). In fact, the alignment of CONFIG_SYS_MALLOC_F_LEN does
> not matter much, see (*) below.

> (*)

And, in fact, there is not real need for CONFIG_SYS_MALLOC_F_LEN to be
aligned, whether is is reserved above or below GD.

If the malloc arena is reserved above GD (my preferred scenario, as
explained elsewhere), then what counts is aligning its *base*, not
*size*, in the probable case that it should support word accesses and
some architecture may require word alignment.

If the malloc arena is reserved below GD, then what counts is aligning
*GD's base*, not the malloc arena's size.

(that's a general principle, BTW: each 'chunk' allocation should decrease
the current top address by the chunk size requested then round it further
down by the alignment constraint applicable to that chunk. The chunk size
does not matter much.)

Amicalement,
Simon Glass Nov. 19, 2015, 1:05 a.m. UTC | #5
Hi Albert,

On 17 November 2015 at 05:59, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote:
> Hello Simon,
>
> On Mon, 16 Nov 2015 21:11:38 -0700, Simon Glass <sjg@chromium.org>
> wrote:
>
>> > +++ b/arch/arm/lib/crt0.S
>> > @@ -82,9 +82,11 @@ ENTRY(_main)
>> >  #else
>> >         bic     sp, sp, #7      /* 8-byte alignment for ABI compliance */
>> >  #endif
>> > +       bl      board_init_f_get_reserve_size
>> > +       sub     sp, sp, r0
>> > +       mov     r9, sp
>>
>> Why do you need that?
>
> To set gd in ARM architecture, as arch_setup_gd() may not work for
> ARM, see my 'Note about the ARM case' in my answer dated Sun, 15 Nov
> 2015 14:51:04 +0100 for details:
>
> https://www.mail-archive.com/u-boot@lists.denx.de/msg192494.html
>

OK I understand that now.

> I shall post a 2-patch v6 with this fix standalone as 1/2 and the rest
> of the changes as 2/2.
>
>> > +++ b/common/init/board_init.c
>> > @@ -21,39 +21,121 @@ DECLARE_GLOBAL_DATA_PTR;
>> >  #define _USE_MEMCPY
>> >  #endif
>> >
>> > -/* Unfortunately x86 can't compile this code as gd cannot be assigned */
>> > -#ifndef CONFIG_X86
>> > +/* Unfortunately x86 or ARM can't compile this code as gd cannot be assigned */
>> > +#if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
>> >  __weak void arch_setup_gd(struct global_data *gd_ptr)
>> >  {
>> >         gd = gd_ptr;
>> >  }
>> > -#endif /* !CONFIG_X86 */
>> > +#endif /* !CONFIG_X86 && !CONFIG_SYS_THUMB_BUILD */
>> >
>> > -ulong board_init_f_mem(ulong top)
>> > +/*
>> > + * Compute size of space to reserve on stack for use as 'globals'
>> > + * and return size request for reserved area.
>> > + *
>> > + * Notes:
>> > + *
>> > + * Actual reservation cannot be done from within this function as
>> > + * it requires altering the C stack pointer, so this will be done by
>> > + * the caller upon return from this function.
>> > + *
>> > + * IMPORTANT:
>> > + *
>> > + * The return value of this function has two uses:
>> > + * - it determines the amount of stack reserved for global data and
>> > + *   the malloc arena;
>>
>> early malloc() area
>
> "Arena", and "malloc arena", are established designations for the
> memory space used by malloc implementations; and I prefer to use this
> more specific term, as people may use it as a search keyword when
> looking for malloc related stuff.

Arena is OK, but can you please mention 'early' each time? It's
confusing otherwise. I think we should have a clear distinction
between the early malloc() and full malloc().

>
>> Another option would be to pass the address and have this function
>> return the new address. That would simplify the assembly code slightly
>> for all archs I think. It would also allow you to align the address
>> for sure, whereas at present it only works if the original address was
>> aligned, and CONFIG_SYS_MALLOC_F_LEN is aligned.
>
> Good point, with a few caveats though.
>
> Regarding alignment of CONFIG_SYS_MALLOC_F_LEN:
>
> Indeed, no attempt is made to check that it is aligned (and no attempt
> is made in the original code either) -- all the more since there is no
> strict definition of what it should be aligned to. There is, actually,
> no indication that it should be aligned, although it will probably only
> be used up until the last full 4-byte-word (or 8-byte word for 64-bit
> architectures). In fact, the alignment of CONFIG_SYS_MALLOC_F_LEN does
> not matter much, see (*) below.
>
> Regarding alignment of the original/'top' address:
>
> This top address is the SP for all architectures, and at least for ARM,
> it is aligned to an 8-byte boundary, as this is an ARM architecture
> EABI requirement. For other architectures, I cannot claim it is, but I
> suspect all initial values of SP, which generally are CONFIG_SPL_STACK
> or CONFIG_SYS_INIT_SP_ADDR, are (sufficiently) aligned.
>
> And if CONFIG_SPL_STACK and CONFIG_SYS_INIT_SP_ADDR are not
> (sufficiently) aligned in their definition, then either we can fix
> these definitions (and that of CONFIG_SYS_MALLOC_F_LEN too, while we
> are at it), or, if we can only fix this at run time, then this should be
> done where the stack pointer is altered, in the crt0.S file or whatever
> its equivalent is for any given architecture, outside the C environment.
>
> But that will have to go in another patch dealing with alignment.

If you can have it so that the stack top equals the global_data
bottom, then we should be OK.

>
> (*)
>
> Indeed board_init_f_get_reserve_size may have to respect some location
> or size alignment for each of the chunks it reserves. Right now, for
> instance, GD is aligned to a 16-byte boundary, and the malloc arena is
> aligned by virtue of the rounded value of CONFIG_SYS_MALLOC_F_LEN.
>
> And yes, should some new reservation be made beside GD and the malloc
> arena, it might require some specific alignment not dealt with so far;
> for instance, we may need to reserve some 4KB-aligned chunk for memory
> management purposes, or whatever, and there is no guarantee that the
> original stack is 4KB-aligned.
>
> In that light, ulong board_init_f_get_reserve_size(void) does not
> permit controlled alignment, whereas ulong board_init_f_reserve(ulong
> top) does, since we can round 'top' down to any value we like.
>
> Note, however, that it will not simplify assembly code: it will turn a
> subtraction from sp into an assignment to sp, which is not simpler, and
> it will add an assignment to whatever register represents the first
> argument, since we'll turn a (void) function into a (ulong top)
> function, so all in all, it will add one assembly instruction with
> respect to the 'ulong board_init_f_get_reserve_size(void)' approach.

True, but now we are just passing values around rather than doing
arithmetic in assembler.

>
>> > +       /* set GD unless architecture did it already */
>> > +#if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
>> >         arch_setup_gd(gd_ptr);
>>
>> Why does this not work for ARM?
>
> See above, but note that whatever the architecture, gd will be useable
> after this call, so :
>
>> > +#endif
>> > +       /* next alloc will be higher by one GD plus 16-byte alignment */
>> > +       base += roundup(sizeof(struct global_data), 16);
>> > +
>> > +       /*
>> > +        * record malloc arena start
>> > +        */
>> >
>> >  #if defined(CONFIG_SYS_MALLOC_F)
>> > -       top -= CONFIG_SYS_MALLOC_F_LEN;
>> > -       gd->malloc_base = top;
>> > +       /* go down one malloc arena */
>> > +       gd->malloc_base = base;
>>
>> gd_ptr?
>
> gd works at this point, and I want to avoid any aliasing issue.

I don't really understand that, but if you want to use gd I think it
would be worth a one-line comment.

>
>> Regards,
>> Simon
>
> Thanks for your comments! I'll turn
>
>         ulong board_init_f_get_reserve_size(void)
>
> into
>
>         ulong board_init_f_get_reserve(ulong top)
>
> So that arbitrary alignments will be possible, and I will move the r9
> fix into its own patch.

OK sounds good.

>
> Amicalement,
> --
> Albert.
Albert ARIBAUD Nov. 20, 2015, 4:26 p.m. UTC | #6
Hello Simon,

On Wed, 18 Nov 2015 18:05:07 -0700, Simon Glass <sjg@chromium.org>
wrote:
> Hi Albert,
> 
> >> early malloc() area
> >
> > "Arena", and "malloc arena", are established designations for the
> > memory space used by malloc implementations; and I prefer to use this
> > more specific term, as people may use it as a search keyword when
> > looking for malloc related stuff.
> 
> Arena is OK, but can you please mention 'early' each time? It's
> confusing otherwise. I think we should have a clear distinction
> between the early malloc() and full malloc().

Good point, will do.

> If you can have it so that the stack top equals the global_data
> bottom, then we should be OK.

Will do that in v6.

> > Note, however, that it will not simplify assembly code: it will turn a
> > subtraction from sp into an assignment to sp, which is not simpler, and
> > it will add an assignment to whatever register represents the first
> > argument, since we'll turn a (void) function into a (ulong top)
> > function, so all in all, it will add one assembly instruction with
> > respect to the 'ulong board_init_f_get_reserve_size(void)' approach.
> 
> True, but now we are just passing values around rather than doing
> arithmetic in assembler.

Well, at the C level an addition may be more complex than an
assignment, but at the assembly level, that's pretty much equally
simple. Anyway, the change is ok by me.

> > gd works at this point, and I want to avoid any aliasing issue.
> 
> I don't really understand that, but if you want to use gd I think it
> would be worth a one-line comment.

I believe it is already present in v5:

+ * ALSO IMPORTANT:
+ *
+ * On some architectures, gd will only be set once arch_setup_gd() is
+ * called. Therefore:
+ *
+ * 	Do not use 'gd->' until arch_setup_gd() has been called!

I will expand that comment a bit, and add a one-liner in the code too.

Thanks again for your comments!

Amicalement,
diff mbox

Patch

diff --git a/arch/arc/lib/start.S b/arch/arc/lib/start.S
index 26a5934..fddd536 100644
--- a/arch/arc/lib/start.S
+++ b/arch/arc/lib/start.S
@@ -50,18 +50,20 @@  ENTRY(_start)
 1:
 #endif
 
-	/* Setup stack- and frame-pointers */
+	/* Establish C runtime stack and frame */
 	mov	%sp, CONFIG_SYS_INIT_SP_ADDR
 	mov	%fp, %sp
 
-	/* Allocate and zero GD, update SP */
-	mov	%r0, %sp
-	bl	board_init_f_mem
-
-	/* Update stack- and frame-pointers */
-	mov	%sp, %r0
+	/* Get reserved area size */
+	bl	board_init_f_get_reserve_size
+	/* Allocate reserved size on stack, adjust frame pointer accordingly */
+	sub	%sp, %sp, %r0
 	mov	%fp, %sp
 
+	/* Initialize reserved area */
+	mov	%r0, %sp
+	bl	board_init_f_init_reserve
+
 	/* Zero the one and only argument of "board_init_f" */
 	mov_s	%r0, 0
 	j	board_init_f
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 80548eb..4ec89a4 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -82,9 +82,11 @@  ENTRY(_main)
 #else
 	bic	sp, sp, #7	/* 8-byte alignment for ABI compliance */
 #endif
+	bl	board_init_f_get_reserve_size
+	sub	sp, sp, r0
+	mov	r9, sp
 	mov	r0, sp
-	bl	board_init_f_mem
-	mov	sp, r0
+	bl	board_init_f_init_reserve
 
 	mov	r0, #0
 	bl	board_init_f
diff --git a/arch/arm/lib/crt0_64.S b/arch/arm/lib/crt0_64.S
index cef1c71..2891071 100644
--- a/arch/arm/lib/crt0_64.S
+++ b/arch/arm/lib/crt0_64.S
@@ -75,8 +75,10 @@  ENTRY(_main)
 	ldr	x0, =(CONFIG_SYS_INIT_SP_ADDR)
 #endif
 	bic	sp, x0, #0xf	/* 16-byte alignment for ABI compliance */
-	bl	board_init_f_mem
-	mov	sp, x0
+	bl	board_init_f_get_reserve_size
+	sub	sp, sp, x0
+	mov	x0, sp
+	bl	board_init_f_init_reserve
 
 	mov	x0, #0
 	bl	board_init_f
diff --git a/arch/microblaze/cpu/start.S b/arch/microblaze/cpu/start.S
index 14f46a8..206be3e 100644
--- a/arch/microblaze/cpu/start.S
+++ b/arch/microblaze/cpu/start.S
@@ -25,7 +25,7 @@  _start:
 
 	addi	r8, r0, __end
 	mts	rslr, r8
-	/* TODO: Redo this code to call board_init_f_mem() */
+	/* TODO: Redo this code to call board_init_f_*() */
 #if defined(CONFIG_SPL_BUILD)
 	addi	r1, r0, CONFIG_SPL_STACK_ADDR
 	mts	rshr, r1
@@ -142,7 +142,7 @@  _start:
 	ori	r12, r12, 0x1a0
 	mts	rmsr, r12
 
-	/* TODO: Redo this code to call board_init_f_mem() */
+	/* TODO: Redo this code to call board_init_f_*() */
 clear_bss:
 	/* clear BSS segments */
 	addi	r5, r0, __bss_start
diff --git a/arch/nios2/cpu/start.S b/arch/nios2/cpu/start.S
index 54787c5..45bf0fd 100644
--- a/arch/nios2/cpu/start.S
+++ b/arch/nios2/cpu/start.S
@@ -106,14 +106,17 @@  _reloc:
 	stw	r0, 4(sp)
 	mov	fp, sp
 
-	/* Allocate and zero GD, update SP */
+	/* Allocate and initialize reserved area, update SP */
+	movhi	r2, %hi(board_init_f_get_reserve_size@h)
+	ori	r2, r2, %lo(board_init_f_get_reserve_size@h)
+	callr	r2
+	sub	sp, sp, r2
 	mov	r4, sp
-	movhi	r2, %hi(board_init_f_mem@h)
-	ori	r2, r2, %lo(board_init_f_mem@h)
+	movhi	r2, %hi(board_init_f_init_reserve@h)
+	ori	r2, r2, %lo(board_init_f_init_reserve@h)
 	callr	r2
 
-	/* Update stack- and frame-pointers */
-	mov	sp, r2
+	/* Update frame-pointer */
 	mov	fp, sp
 
 	/* Call board_init_f -- never returns */
diff --git a/arch/powerpc/cpu/ppc4xx/start.S b/arch/powerpc/cpu/ppc4xx/start.S
index 77d4040..44222d2 100644
--- a/arch/powerpc/cpu/ppc4xx/start.S
+++ b/arch/powerpc/cpu/ppc4xx/start.S
@@ -761,9 +761,10 @@  _start:
 
 	bl	cpu_init_f	/* run low-level CPU init code	   (from Flash) */
 #ifdef CONFIG_SYS_GENERIC_BOARD
+	bl	board_init_f_get_reserve_size
+	sub	r1, r1, r3
 	mr	r3, r1
-	bl	board_init_f_mem
-	mr	r1, r3
+	bl	board_init_f_init_reserve
 	li	r0,0
 	stwu	r0, -4(r1)
 	stwu	r0, -4(r1)
@@ -1037,9 +1038,10 @@  _start:
 
 	bl	cpu_init_f	/* run low-level CPU init code	   (from Flash) */
 #ifdef CONFIG_SYS_GENERIC_BOARD
+	bl	board_init_f_get_reserve_size
+	sub	r1, r1, r3
 	mr	r3, r1
-	bl	board_init_f_mem
-	mr	r1, r3
+	bl	board_init_f_init_reserve
 	stwu	r0, -4(r1)
 	stwu	r0, -4(r1)
 #endif
diff --git a/arch/x86/cpu/start.S b/arch/x86/cpu/start.S
index 5b4ee79..932672c 100644
--- a/arch/x86/cpu/start.S
+++ b/arch/x86/cpu/start.S
@@ -122,9 +122,10 @@  car_init_ret:
 	 */
 #endif
 	/* Set up global data */
+	call	board_init_f_get_reserve_size
+	subl	%eax, %esp
 	mov	%esp, %eax
-	call	board_init_f_mem
-	mov	%eax, %esp
+	call	board_init_f_init_reserve
 
 #ifdef CONFIG_DEBUG_UART
 	call	debug_uart_init
diff --git a/arch/x86/lib/fsp/fsp_common.c b/arch/x86/lib/fsp/fsp_common.c
index c78df94..72c458f 100644
--- a/arch/x86/lib/fsp/fsp_common.c
+++ b/arch/x86/lib/fsp/fsp_common.c
@@ -95,8 +95,8 @@  int x86_fsp_init(void)
 		/*
 		 * The second time we enter here, adjust the size of malloc()
 		 * pool before relocation. Given gd->malloc_base was adjusted
-		 * after the call to board_init_f_mem() in arch/x86/cpu/start.S,
-		 * we should fix up gd->malloc_limit here.
+		 * after the call to board_init_f_init_reserve() in arch/x86/
+		 * cpu/start.S, we should fix up gd->malloc_limit here.
 		 */
 		gd->malloc_limit += CONFIG_FSP_SYS_MALLOC_F_LEN;
 	}
diff --git a/common/init/board_init.c b/common/init/board_init.c
index 1c6126d..db4d9a0 100644
--- a/common/init/board_init.c
+++ b/common/init/board_init.c
@@ -21,39 +21,121 @@  DECLARE_GLOBAL_DATA_PTR;
 #define _USE_MEMCPY
 #endif
 
-/* Unfortunately x86 can't compile this code as gd cannot be assigned */
-#ifndef CONFIG_X86
+/* Unfortunately x86 or ARM can't compile this code as gd cannot be assigned */
+#if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
 __weak void arch_setup_gd(struct global_data *gd_ptr)
 {
 	gd = gd_ptr;
 }
-#endif /* !CONFIG_X86 */
+#endif /* !CONFIG_X86 && !CONFIG_SYS_THUMB_BUILD */
 
-ulong board_init_f_mem(ulong top)
+/*
+ * Compute size of space to reserve on stack for use as 'globals'
+ * and return size request for reserved area.
+ *
+ * Notes:
+ *
+ * Actual reservation cannot be done from within this function as
+ * it requires altering the C stack pointer, so this will be done by
+ * the caller upon return from this function.
+ *
+ * IMPORTANT:
+ *
+ * The return value of this function has two uses:
+ * - it determines the amount of stack reserved for global data and
+ *   the malloc arena;
+ * - it determines where the global data will be located, as on some
+ *   architectures the caller will set gd to the base of the reserved
+ *   location.
+ *
+ */
+
+ulong board_init_f_get_reserve_size(void)
+{
+	ulong size;
+
+	/* Reserve GD (rounded up to a multiple of 16 bytes) */
+	size = roundup(sizeof(struct global_data), 16);
+
+	/* Reserve malloc arena */
+#if defined(CONFIG_SYS_MALLOC_F)
+	size += CONFIG_SYS_MALLOC_F_LEN;
+#endif
+
+	return size;
+}
+
+/*
+ * Initialize reserved space (which has been safely allocated on the C
+ * stack from the C runtime environment handling code).
+ *
+ * Notes:
+ *
+ * Actual reservation was done by the caller; the locations from base
+ * to base+size-1 (where 'size' is the value returned by the allocation
+ * function above) can be accessed freely without risk of corrupting the
+ * C runtime environment.
+ *
+ * IMPORTANT:
+ *
+ * Upon return from the allocation function above, on some architectures
+ * the caller will set gd to the lowest reserved location. Therefore, in
+ * this initialization function, the global data MUST be placed at base.
+ *
+ * ALSO IMPORTANT:
+ *
+ * On some architectures, gd will only be set once arch_setup_gd() is
+ * called. Therefore:
+ *
+ * 	Do not use 'gd->' until arch_setup_gd() has been called!
+ *
+ * IMPORTANT TOO:
+ *
+ * Initialization for each "chunk" (GD, malloc arena...) ends with an
+ * incrementation line of the form 'base += <some size>'. The last of
+ * these incrementations seems useless, as base will not be used any
+ * more after this incrementation; but if/when a new "chunk" is appended,
+ * this increment will be essential as it will give base right value for
+ * this new chunk (which will have to end with its own incrementation
+ * statement). Besides, the compiler's optimizer will silently detect
+ * and remove the last base incrementation, therefore leaving that last
+ * (seemingly useless) incrementation causes no code increase.
+ */
+
+void board_init_f_init_reserve(ulong base)
 {
 	struct global_data *gd_ptr;
 #ifndef _USE_MEMCPY
 	int *ptr;
 #endif
 
-	/* Leave space for the stack we are running with now */
-	top -= 0x40;
+	/*
+	 * clear GD entirely and set it up
+	 */
 
-	top -= sizeof(struct global_data);
-	top = ALIGN(top, 16);
-	gd_ptr = (struct global_data *)top;
+	gd_ptr = (struct global_data *)base;
+	/* zero the area */
 #ifdef _USE_MEMCPY
 	memset(gd_ptr, '\0', sizeof(*gd));
 #else
 	for (ptr = (int *)gd_ptr; ptr < (int *)(gd_ptr + 1); )
 		*ptr++ = 0;
 #endif
+	/* set GD unless architecture did it already */
+#if !defined(CONFIG_X86) && !defined(CONFIG_ARM)
 	arch_setup_gd(gd_ptr);
+#endif
+	/* next alloc will be higher by one GD plus 16-byte alignment */
+	base += roundup(sizeof(struct global_data), 16);
+
+	/*
+	 * record malloc arena start
+	 */
 
 #if defined(CONFIG_SYS_MALLOC_F)
-	top -= CONFIG_SYS_MALLOC_F_LEN;
-	gd->malloc_base = top;
+	/* go down one malloc arena */
+	gd->malloc_base = base;
+	/* next alloc will be higher by one malloc arena size */
+	base += CONFIG_SYS_MALLOC_F_LEN;
 #endif
-
-	return top;
 }
diff --git a/include/common.h b/include/common.h
index 09a131d..988d67a 100644
--- a/include/common.h
+++ b/include/common.h
@@ -225,32 +225,25 @@  void board_init_f(ulong);
 void board_init_r(gd_t *, ulong) __attribute__ ((noreturn));
 
 /**
- * board_init_f_mem() - Allocate global data and set stack position
+ * ulong board_init_f_get_reserve_size - return size of reserved area
  *
  * This function is called by each architecture very early in the start-up
- * code to set up the environment for board_init_f(). It allocates space for
- * global_data (see include/asm-generic/global_data.h) and places the stack
- * below this.
+ * code to allow the C runtime to reserve space on the stack for writable
+ * 'globals' such as GD and the malloc arena.
  *
- * This function requires a stack[1] Normally this is at @top. The function
- * starts allocating space from 64 bytes below @top. First it creates space
- * for global_data. Then it calls arch_setup_gd() which sets gd to point to
- * the global_data space and can reserve additional bytes of space if
- * required). Finally it allocates early malloc() memory
- * (CONFIG_SYS_MALLOC_F_LEN). The new top of the stack is just below this,
- * and it returned by this function.
+ * @return:	size of reserved area
+ */
+ulong board_init_f_get_reserve_size(void);
+
+/**
+ * board_init_f_init_reserve - initialize the reserved area(s)
  *
- * [1] Strictly speaking it would be possible to implement this function
- * in C on many archs such that it does not require a stack. However this
- * does not seem hugely important as only 64 byte are wasted. The 64 bytes
- * are used to handle the calling standard which generally requires pushing
- * addresses or registers onto the stack. We should be able to get away with
- * less if this becomes important.
+ * This function is called once the C runtime has allocated the reserved
+ * area on the stack. It must initialize the GD at the base of that area.
  *
- * @top:	Top of available memory, also normally the top of the stack
- * @return:	New stack location
+ * @base:	top from which reservation was done
  */
-ulong board_init_f_mem(ulong top);
+void board_init_f_init_reserve(ulong base);
 
 /**
  * arch_setup_gd() - Set up the global_data pointer