diff mbox

[U-Boot,v3,1/2] arm: move C runtime setup code in crt0.S

Message ID 1352566813-30386-2-git-send-email-albert.u.boot@aribaud.net
State Superseded
Headers show

Commit Message

Albert ARIBAUD Nov. 10, 2012, 5 p.m. UTC
Move all the C runtime setup code from every start.S
in arch/arm into arch/arm/lib/crt0.S. This covers
the code sequence from isetting up the initial stack
to calling into board_init_r().

Also, rewrite the C runtime setup and make functions
board_init_*() and relocate_code() behave according to
normal C semantics (no jumping across the C stack any
more, etc).

Some SPL targets had to be touched because they use
start.S exolicitly or for some reason; the relevant
maintainers and custodians are cc:ed.

Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
---
Changes in v3:
- various clarifications and typo fixes
- fixed wrong LED calls conditional
- fixed open comment eating some code
- fixed code overrun in SPL case
Series-changes: 2
- moved description from cover letter to patch commit msg
- added note about tests in the cover letter
- fixed baords with CONFIG_SPL but not CONFIG_SPL_STACK

 arch/arm/cpu/arm1136/start.S              |   69 +++--------
 arch/arm/cpu/arm1176/start.S              |   62 ++--------
 arch/arm/cpu/arm720t/start.S              |   53 ++-------
 arch/arm/cpu/arm920t/start.S              |   61 ++--------
 arch/arm/cpu/arm925t/start.S              |   61 ++--------
 arch/arm/cpu/arm926ejs/start.S            |   80 ++-----------
 arch/arm/cpu/arm946es/start.S             |   56 ++-------
 arch/arm/cpu/arm_intcm/start.S            |   63 ++--------
 arch/arm/cpu/armv7/start.S                |   58 +++-------
 arch/arm/cpu/ixp/start.S                  |   55 ++-------
 arch/arm/cpu/pxa/start.S                  |   63 ++--------
 arch/arm/cpu/s3c44b0/start.S              |   55 ++-------
 arch/arm/cpu/sa1100/start.S               |   50 ++------
 arch/arm/lib/Makefile                     |    2 +
 arch/arm/lib/board.c                      |   11 --
 arch/arm/lib/crt0.S                       |  180 +++++++++++++++++++++++++++++
 include/common.h                          |    2 +-
 include/configs/socfpga_cyclone5.h        |    2 +-
 lib/asm-offsets.c                         |   10 ++
 nand_spl/board/freescale/mx31pdk/Makefile |    6 +-
 nand_spl/board/karo/tx25/Makefile         |    6 +-
 21 files changed, 329 insertions(+), 676 deletions(-)
 create mode 100644 arch/arm/lib/crt0.S

Comments

Simon Glass Nov. 15, 2012, 7:35 p.m. UTC | #1
Hi Albert,

On Sat, Nov 10, 2012 at 9:00 AM, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:
> Move all the C runtime setup code from every start.S
> in arch/arm into arch/arm/lib/crt0.S. This covers
> the code sequence from isetting up the initial stack
> to calling into board_init_r().
>
> Also, rewrite the C runtime setup and make functions
> board_init_*() and relocate_code() behave according to
> normal C semantics (no jumping across the C stack any
> more, etc).
>
> Some SPL targets had to be touched because they use
> start.S exolicitly or for some reason; the relevant
> maintainers and custodians are cc:ed.
>
> Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>

I am very pleased to see this. My generic relocation series from some
time ago is still hanging around. Your series tidies up some things
which would make that series easier to do. I'm happy to take another
look at it.

I have tested your series on a Tegra seaboard, and got it running with
a few changes. Please see comments below.

> ---
> Changes in v3:
> - various clarifications and typo fixes
> - fixed wrong LED calls conditional
> - fixed open comment eating some code
> - fixed code overrun in SPL case
> Series-changes: 2
> - moved description from cover letter to patch commit msg
> - added note about tests in the cover letter
> - fixed baords with CONFIG_SPL but not CONFIG_SPL_STACK
>
>  arch/arm/cpu/arm1136/start.S              |   69 +++--------
>  arch/arm/cpu/arm1176/start.S              |   62 ++--------
>  arch/arm/cpu/arm720t/start.S              |   53 ++-------
>  arch/arm/cpu/arm920t/start.S              |   61 ++--------
>  arch/arm/cpu/arm925t/start.S              |   61 ++--------
>  arch/arm/cpu/arm926ejs/start.S            |   80 ++-----------
>  arch/arm/cpu/arm946es/start.S             |   56 ++-------
>  arch/arm/cpu/arm_intcm/start.S            |   63 ++--------
>  arch/arm/cpu/armv7/start.S                |   58 +++-------
>  arch/arm/cpu/ixp/start.S                  |   55 ++-------
>  arch/arm/cpu/pxa/start.S                  |   63 ++--------
>  arch/arm/cpu/s3c44b0/start.S              |   55 ++-------
>  arch/arm/cpu/sa1100/start.S               |   50 ++------
>  arch/arm/lib/Makefile                     |    2 +
>  arch/arm/lib/board.c                      |   11 --
>  arch/arm/lib/crt0.S                       |  180 +++++++++++++++++++++++++++++
>  include/common.h                          |    2 +-
>  include/configs/socfpga_cyclone5.h        |    2 +-
>  lib/asm-offsets.c                         |   10 ++
>  nand_spl/board/freescale/mx31pdk/Makefile |    6 +-
>  nand_spl/board/karo/tx25/Makefile         |    6 +-
>  21 files changed, 329 insertions(+), 676 deletions(-)
>  create mode 100644 arch/arm/lib/crt0.S
>
[snip]
> diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
> new file mode 100644
> index 0000000..ad6c71d
> --- /dev/null
> +++ b/arch/arm/lib/crt0.S
> @@ -0,0 +1,180 @@
> +/*
> + *  crt0 - C-runtime startup Code for ARM U-Boot
> + *
> + *  Copyright (c) 2012  Albert ARIBAUD <albert.u.boot@aribaud.net>
> + *
> + * See file CREDITS for list of people who contributed to this
> + * project.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of
> + * the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
> + * MA 02111-1307 USA
> + */
> +
> +#include <config.h>
> +#include <asm-offsets.h>
> +
> +/*
> + * This file handles the target-independent stages of the U-Boot
> + * start-up where a C runtime environment is needed. Its entry point
> + * is _main and is branched into from the target's start.S file.
> + *
> + * _main execution sequence is:
> + *
> + * 1. Set up initial environment for calling board_init_f().
> + *    This environment only provides a stack and a place to store
> + *    the GD ('global data') structure, both located in some readily
> + *    available RAM (SRAM, locked cache...). In this context, VARIABLE
> + *    global data, initialized or not (BSS), are UNAVAILABLE; only
> + *    CONSTANT initialized data are available.
> + *
> + * 2. Call board_init_f(). This function prepares the hardware for
> + *    execution from system RAM (DRAM, DDR...) As system RAM may not
> + *    be available yet, , board_init_f() must use the current GD to
> + *    store any data which must be passed on to later stages. These
> + *    data include the relocation destination, the future stack, and
> + *    the future GD location.
> + *
> + * 3. Set up intermediate environment where the stack and GD are the
> + *    ones allocated by board_init_f() in system RAM, but BSS and
> + *    initialized non-const data are still not available.
> + *
> + * 4. Call relocate_code(). This function relocates U-Boot from its
> + *    current location into the relocation destination computed by
> + *    board_init_f().
> + *
> + * 5. Set up final environment for calling board_init_r(). This
> + *    environment has BSS (initialized to 0), initialized non-const
> + *    data (initialized to their intended value), and stack in system
> + *    RAM. GD has retained values set by board_init_f(). Some CPUs
> + *    have some work left to do at this point regarding memory, so
> + *    call c_runtime_cpu_setup.
> + *
> + * 6. Branch to either _nand_boot() or board_init_r().
> + */
> +
> +/*
> + * offset of the nand_boot() function for SPL crt
> + */
> +
> +#if defined(CONFIG_NAND_SPL)
> +
> +.globl nand_boot
> +_nand_boot:
> +       .word nand_boot
> +
> +#elif ! defined(CONFIG_SPL_BUILD)
> +
> +/*
> + * offset of the board_init_r() function for non-SPL crt
> + */
> +
> +.globl board_init_r
> +_board_init_r:
> +       .word board_init_r
> +
> +#endif
> +
> +/*
> + * start and end of BSS
> + */
> +
> +.globl __bss_start
> +.globl __bss_end__
> +
> +/*
> + * entry point of crt0 sequence
> + */
> +
> +.global _main
> +
> +_main:
> +
> +/*
> + * Set up initial C runtime environment and call board_init_f(0).
> + */
> +
> +#if defined(CONFIG_NAND_SPL)
> +       /* deprecated, use instead CONFIG_SPL_BUILD */
> +       ldr     sp, =(CONFIG_SYS_INIT_SP_ADDR)
> +#elif defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK)
> +       ldr     sp, =(CONFIG_SPL_STACK)
> +#else
> +       ldr     sp, =(CONFIG_SYS_INIT_SP_ADDR)
> +#endif
> +       bic     sp, sp, #7 /* 8-byte alignment for ABI compliance */
> +       mov     r8, sp          /* GD is above SP */
> +       mov     r0, #0
> +       bl      board_init_f
> +
> +/*
> + * Set up intermediate environment (new sp and gd) and call
> + * relocate_code(addr_sp, gd, addr_moni). Trick here is that
> + * we'll return 'here' but relocated.
> + */
> +
> +       ldr     sp, [r8, #GD_START_ADDR_SP]     /* r8 = gd->start_addr_sp */
> +       ldr     r8, [r8, #GD_BD]                /* r8 = gd->bd */
> +       sub     r8, r8, #GD_SIZE                /* new GD is below bd */
> +
> +#if ! defined(CONFIG_SPL_BUILD)
> +
> +       adr     lr, here
> +       ldr     r0, [r8, #GD_RELOC_OFF]         /* lr = gd->start_addr_sp */
> +       add     lr, lr, r0
> +       ldr     r0, [r8, #GD_START_ADDR_SP]     /* r0 = gd->start_addr_sp */
> +       mov     r1, r8                          /* r1 = gd */
> +       ldr     r2, [r8, #GD_RELOCADDR]         /* r2 = gd->relocaddr */
> +       b       relocate_code
> +here:
> +
> +#endif
> +
> +/* Set up final (full) environment */
> +
> +       bl      c_runtime_cpu_setup     /* we still call old routine here */
> +
> +       ldr     r0, =__bss_start        /* this is auto-relocated! */
> +       ldr     r1, =__bss_end__        /* this is auto-relocated! */

I noticed this last time around - really we should have __bss_end__,
not __bss_end. Something to look at in future work, perhaps.

> +
> +       mov     r2, #0x00000000         /* prepare zero to clear BSS */
> +
> +clbss_l:cmp    r0, r1                  /* while not at end of BSS */
> +       strlo   r2, [r0]                /* clear 32-bit BSS word */
> +       addlo   r0, r0, #4              /* move to next */
> +       bhs     clbss_l

I think this should be blo, not bhs. Otherwise the BSS does not get
cleared and my seaboard doesn't boot properly.

> +
> +#if ! defined(CONFIG_SPL_BUILD)
> +
> +       bl coloured_LED_init
> +       bl red_led_on
> +
> +#endif
> +
> +#if defined(CONFIG_NAND_SPL)
> +
> +       /* call _nand_boot() */
> +       ldr     pc, _nand_boot
> +
> +#else
> +
> +       /* call board_init_r(gd_t *id, ulong dest_addr) */
> +       mov     r0, r8                  /* gd_t */
> +       ldr     r1, [r8, #GD_RELOCADDR] /* dest_addr */
> +       /* call board_init_r */
> +       ldr     pc, _board_init_r       /* this is auto-relocated! */

For my toolchain I need:

ldr     pc, =board_init_r

otherwise I get:

crt0.S: Assembler messages:
crt0.S:176: Error: internal_relocation (type: OFFSET_IMM) not fixed up

Also _board_init_r seems to be the wrong symbol name for me:

arch/arm/lib/libarm.o: In function `clbss_l':
/home/sjg/trunk/src/third_party/u-boot/files/arch/arm/lib/crt0.S:176:
undefined reference to `_board_init_r'

> +
> +#endif
> +
> +       /* we should not return here. */

[snip]

Regards,
Simon
Albert ARIBAUD Nov. 15, 2012, 10:41 p.m. UTC | #2
Hi Simon,

On Thu, 15 Nov 2012 11:35:27 -0800, Simon Glass <sjg@chromium.org>
wrote:

> Hi Albert,
> 
> On Sat, Nov 10, 2012 at 9:00 AM, Albert ARIBAUD
> <albert.u.boot@aribaud.net> wrote:
> > Move all the C runtime setup code from every start.S
> > in arch/arm into arch/arm/lib/crt0.S. This covers
> > the code sequence from isetting up the initial stack
> > to calling into board_init_r().
> >
> > Also, rewrite the C runtime setup and make functions
> > board_init_*() and relocate_code() behave according to
> > normal C semantics (no jumping across the C stack any
> > more, etc).
> >
> > Some SPL targets had to be touched because they use
> > start.S exolicitly or for some reason; the relevant
> > maintainers and custodians are cc:ed.
> >
> > Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
> 
> I am very pleased to see this. My generic relocation series from some
> time ago is still hanging around. Your series tidies up some things
> which would make that series easier to do. I'm happy to take another
> look at it.

I do remember your series -- it is part of what gave me the incentive 
toput this patch series together. :)

> I have tested your series on a Tegra seaboard, and got it running with
> a few changes. Please see comments below.

> > +       ldr     r0, =__bss_start        /* this is auto-relocated! */
> > +       ldr     r1, =__bss_end__        /* this is auto-relocated! */
> 
> I noticed this last time around - really we should have __bss_end__,
> not __bss_end. Something to look at in future work, perhaps.

Noted.

> > +       bhs     clbss_l
> 
> I think this should be blo, not bhs. Otherwise the BSS does not get
> cleared and my seaboard doesn't boot properly.

Thanks -- not the first time I see these... I'll have a go at checking
all loops.

> > +       /* call board_init_r */
> > +       ldr     pc, _board_init_r       /* this is auto-relocated! */
> 
> For my toolchain I need:
> 
> ldr     pc, =board_init_r
> 
> otherwise I get:
> 
> crt0.S: Assembler messages:
> crt0.S:176: Error: internal_relocation (type: OFFSET_IMM) not fixed up

Thanks -- this could well cure errors that Tom Warren reported to me
for various toolchains.

> Also _board_init_r seems to be the wrong symbol name for me:
> 
> arch/arm/lib/libarm.o: In function `clbss_l':
> /home/sjg/trunk/src/third_party/u-boot/files/arch/arm/lib/crt0.S:176:
> undefined reference to `_board_init_r'

Thanks a lot for the feedback and analysis! This should make V4 work on
a much wider range of ARM boards.

> Regards,
> Simon

Amicalement,
diff mbox

Patch

diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S
index 5d3b4c2..d19c1ed 100644
--- a/arch/arm/cpu/arm1136/start.S
+++ b/arch/arm/cpu/arm1136/start.S
@@ -165,13 +165,7 @@  next:
 	bl  cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -188,14 +182,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _image_copy_end_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -255,52 +245,21 @@  _dynsym_start_ofs:
 	.word __dynsym_start - _start
 #endif
 
-clear_bss:
-#ifdef CONFIG_SPL_BUILD
-	/* No relocation for SPL */
-	ldr	r0, =__bss_start
-	ldr	r1, =__bss_end__
-#else
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-#endif
-	mov	r2, #0x00000000		/* clear			    */
+relocate_done:
 
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
+	bx	lr
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
+_rel_dyn_start_ofs:
+	.word __rel_dyn_start - _start
+_rel_dyn_end_ofs:
+	.word __rel_dyn_end - _start
+_dynsym_start_ofs:
+	.word __dynsym_start - _start
 
-_nand_boot_ofs:
-	.word nand_boot
-#else
-jump_2_ram:
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	bx	lr
 
 /*
  *************************************************************************
diff --git a/arch/arm/cpu/arm1176/start.S b/arch/arm/cpu/arm1176/start.S
index 667a0e0..40df4b1 100644
--- a/arch/arm/cpu/arm1176/start.S
+++ b/arch/arm/cpu/arm1176/start.S
@@ -224,12 +224,7 @@  skip_tcmdisable:
 	 */
 	bl	lowlevel_init		/* go setup pll,mux,memory */
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -246,14 +241,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -343,49 +334,9 @@  mmu_enable:
 skip_hw_init:
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-#ifndef CONFIG_NAND_SPL
-	bl coloured_LED_init
-	bl red_led_on
-#endif
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     pc, _nand_boot
-
-_nand_boot: .word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add     lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -399,6 +350,11 @@  _mmu_table_base:
 	.word mmu_table
 #endif
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 #ifndef CONFIG_NAND_SPL
 /*
  * we assume that cache operation is done before. (eg. cleanup_before_linux())
diff --git a/arch/arm/cpu/arm720t/start.S b/arch/arm/cpu/arm720t/start.S
index c2a7763..771d386 100644
--- a/arch/arm/cpu/arm720t/start.S
+++ b/arch/arm/cpu/arm720t/start.S
@@ -147,12 +147,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -169,14 +164,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -228,43 +219,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -272,6 +230,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm920t/start.S b/arch/arm/cpu/arm920t/start.S
index 14c9156..511d21d 100644
--- a/arch/arm/cpu/arm920t/start.S
+++ b/arch/arm/cpu/arm920t/start.S
@@ -182,12 +182,7 @@  copyex:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -204,14 +199,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -263,51 +254,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
-
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
+relocate_done:
 
-_nand_boot_ofs:
-	.word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -315,6 +265,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm925t/start.S b/arch/arm/cpu/arm925t/start.S
index 3a483f6..82f5b8b 100644
--- a/arch/arm/cpu/arm925t/start.S
+++ b/arch/arm/cpu/arm925t/start.S
@@ -176,12 +176,7 @@  poll1:
 	bl  cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -198,14 +193,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -257,51 +248,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
-
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
+relocate_done:
 
-_nand_boot_ofs:
-	.word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -309,6 +259,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm926ejs/start.S b/arch/arm/cpu/arm926ejs/start.S
index 2188f7e..5dfd49a 100644
--- a/arch/arm/cpu/arm926ejs/start.S
+++ b/arch/arm/cpu/arm926ejs/start.S
@@ -198,20 +198,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-#ifdef CONFIG_NAND_SPL /* deprecated, use instead CONFIG_SPL_BUILD */
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-#else
-#ifdef CONFIG_SPL_BUILD
-	ldr	sp, =(CONFIG_SPL_STACK)
-#else
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-#endif
-#endif
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -229,15 +216,11 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	sub	r9, r6, r0		/* r9 <- relocation offset */
 	cmp	r0, r6
-	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	moveq	r9, #0			/* no relocation. offset(r9) = 0 */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -289,56 +272,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifdef CONFIG_SPL_BUILD
-	/* No relocation for SPL */
-	ldr	r0, =__bss_start
-	ldr	r1, =__bss_end__
-#else
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-#endif
-	mov	r2, #0x00000000		/* clear			    */
+relocate_done:
 
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-#ifndef CONFIG_SPL_BUILD
-	bl coloured_LED_init
-	bl red_led_on
-#endif
-
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
-
-_nand_boot_ofs:
-	.word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	ldr	r1, _TEXT_BASE
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -346,8 +282,14 @@  _rel_dyn_end_ofs:
 	.word __rel_dyn_end - _start
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
+
 #endif
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm946es/start.S b/arch/arm/cpu/arm946es/start.S
index 30e2183..a7a98a4 100644
--- a/arch/arm/cpu/arm946es/start.S
+++ b/arch/arm/cpu/arm946es/start.S
@@ -147,12 +147,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -169,14 +164,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -228,46 +219,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-#endif
-
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     pc, _nand_boot
+relocate_done:
 
-_nand_boot: .word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -275,6 +230,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm_intcm/start.S b/arch/arm/cpu/arm_intcm/start.S
index a133d19..c189849 100644
--- a/arch/arm/cpu/arm_intcm/start.S
+++ b/arch/arm/cpu/arm_intcm/start.S
@@ -143,12 +143,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -165,14 +160,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -224,50 +215,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
-
-_nand_boot_ofs:
-	.word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -276,6 +226,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/armv7/start.S b/arch/arm/cpu/armv7/start.S
index 7df97c5..dcc1f83 100644
--- a/arch/arm/cpu/armv7/start.S
+++ b/arch/arm/cpu/armv7/start.S
@@ -155,12 +155,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -177,14 +172,10 @@  ENTRY(relocate_code)
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _image_copy_end_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -233,34 +224,22 @@  fixnext:
 	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
 	cmp	r2, r3
 	blo	fixloop
-	b	clear_bss
+
+relocate_done:
+
+	bx	lr
+
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
 	.word __rel_dyn_end - _start
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
+ENDPROC(relocate_code)
 
-clear_bss:
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
+#endif
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-jump_2_ram:
+ENTRY(c_runtime_cpu_setup)
 /*
  * If I-cache is enabled invalidate it
  */
@@ -279,20 +258,9 @@  jump_2_ram:
 	mcr     p15, 0, r0, c12, c0, 0  @Set VBAR
 #endif /* !Tegra20 */
 
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-ENDPROC(relocate_code)
-#endif
+	bx	lr
+
+ENDPROC(c_runtime_cpu_setup)
 
 /*************************************************************************
  *
diff --git a/arch/arm/cpu/ixp/start.S b/arch/arm/cpu/ixp/start.S
index c12f1a7..efb5a40 100644
--- a/arch/arm/cpu/ixp/start.S
+++ b/arch/arm/cpu/ixp/start.S
@@ -245,12 +245,7 @@  reset:
 	orr	r0,r0,#0x13
 	msr	cpsr,r0
 
-/* Set initial stackpointer in SDRAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -267,14 +262,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -326,42 +317,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -370,6 +328,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	bx	lr
+
 /****************************************************************************/
 /*									    */
 /* Interrupt handling							    */
diff --git a/arch/arm/cpu/pxa/start.S b/arch/arm/cpu/pxa/start.S
index 536cf5c..72af869 100644
--- a/arch/arm/cpu/pxa/start.S
+++ b/arch/arm/cpu/pxa/start.S
@@ -164,12 +164,7 @@  reset:
 	bl	lock_cache_for_stack
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0, =0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 #ifndef CONFIG_SPL_BUILD
@@ -186,10 +181,6 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 /* Disable the Dcache RAM lock for stack now */
 #ifdef	CONFIG_CPU_PXA25X
 	bl	cpu_init_crit
@@ -198,7 +189,7 @@  stack_setup:
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -250,48 +241,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-#endif	/* #ifndef CONFIG_SPL_BUILD */
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_ONENAND_SPL
-	ldr     r0, _onenand_boot_ofs
-	mov	pc, r0
-
-_onenand_boot_ofs:
-	.word onenand_boot
-#else
-jump_2_ram:
-	ldr	r0, _board_init_r_ofs
-	ldr     r1, _TEXT_BASE
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -299,7 +251,14 @@  _rel_dyn_end_ofs:
 	.word __rel_dyn_end - _start
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
+
 #endif
+
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	bx	lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/s3c44b0/start.S b/arch/arm/cpu/s3c44b0/start.S
index 323b923..4528c91 100644
--- a/arch/arm/cpu/s3c44b0/start.S
+++ b/arch/arm/cpu/s3c44b0/start.S
@@ -128,12 +128,7 @@  reset:
 	bl	lowlevel_init
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -150,14 +145,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -209,42 +200,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -253,6 +211,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	bx	lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/sa1100/start.S b/arch/arm/cpu/sa1100/start.S
index 1ea92d1..3144299 100644
--- a/arch/arm/cpu/sa1100/start.S
+++ b/arch/arm/cpu/sa1100/start.S
@@ -132,12 +132,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -154,14 +149,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -213,40 +204,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -254,6 +215,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 3422ac1..07baee2 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -36,6 +36,8 @@  GLSOBJS	+= _umodsi3.o
 
 GLCOBJS	+= div0.o
 
+SOBJS-y += crt0.o
+
 ifndef CONFIG_SPL_BUILD
 COBJS-y	+= board.o
 COBJS-y	+= bootm.o
diff --git a/arch/arm/lib/board.c b/arch/arm/lib/board.c
index 92cad9a..ad96d7e 100644
--- a/arch/arm/lib/board.c
+++ b/arch/arm/lib/board.c
@@ -272,11 +272,6 @@  void board_init_f(ulong bootflag)
 
 	bootstage_mark_name(BOOTSTAGE_ID_START_UBOOT_F, "board_init_f");
 
-	/* Pointer is writable since we allocated a register for it */
-	gd = (gd_t *) ((CONFIG_SYS_INIT_SP_ADDR) & ~0x07);
-	/* compiler optimization barrier needed for GCC >= 3.4 */
-	__asm__ __volatile__("": : :"memory");
-
 	memset((void *)gd, 0, sizeof(gd_t));
 
 	gd->mon_len = _bss_end_ofs;
@@ -458,10 +453,6 @@  void board_init_f(ulong bootflag)
 		gd->fdt_blob = new_fdt;
 	}
 	memcpy(id, (void *)gd, sizeof(gd_t));
-
-	relocate_code(addr_sp, id, addr);
-
-	/* NOTREACHED - relocate_code() does not return */
 }
 
 #if !defined(CONFIG_SYS_NO_FLASH)
@@ -486,8 +477,6 @@  void board_init_r(gd_t *id, ulong dest_addr)
 	ulong flash_size;
 #endif
 
-	gd = id;
-
 	gd->flags |= GD_FLG_RELOC;	/* tell others: relocation done */
 	bootstage_mark_name(BOOTSTAGE_ID_START_UBOOT_R, "board_init_r");
 
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
new file mode 100644
index 0000000..ad6c71d
--- /dev/null
+++ b/arch/arm/lib/crt0.S
@@ -0,0 +1,180 @@ 
+/*
+ *  crt0 - C-runtime startup Code for ARM U-Boot
+ *
+ *  Copyright (c) 2012  Albert ARIBAUD <albert.u.boot@aribaud.net>
+ *
+ * See file CREDITS for list of people who contributed to this
+ * project.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+ * MA 02111-1307 USA
+ */
+
+#include <config.h>
+#include <asm-offsets.h>
+
+/*
+ * This file handles the target-independent stages of the U-Boot
+ * start-up where a C runtime environment is needed. Its entry point
+ * is _main and is branched into from the target's start.S file.
+ *
+ * _main execution sequence is:
+ *
+ * 1. Set up initial environment for calling board_init_f().
+ *    This environment only provides a stack and a place to store
+ *    the GD ('global data') structure, both located in some readily
+ *    available RAM (SRAM, locked cache...). In this context, VARIABLE
+ *    global data, initialized or not (BSS), are UNAVAILABLE; only
+ *    CONSTANT initialized data are available.
+ *
+ * 2. Call board_init_f(). This function prepares the hardware for
+ *    execution from system RAM (DRAM, DDR...) As system RAM may not
+ *    be available yet, , board_init_f() must use the current GD to
+ *    store any data which must be passed on to later stages. These
+ *    data include the relocation destination, the future stack, and
+ *    the future GD location.
+ *
+ * 3. Set up intermediate environment where the stack and GD are the
+ *    ones allocated by board_init_f() in system RAM, but BSS and
+ *    initialized non-const data are still not available.
+ *
+ * 4. Call relocate_code(). This function relocates U-Boot from its
+ *    current location into the relocation destination computed by
+ *    board_init_f().
+ *
+ * 5. Set up final environment for calling board_init_r(). This
+ *    environment has BSS (initialized to 0), initialized non-const
+ *    data (initialized to their intended value), and stack in system
+ *    RAM. GD has retained values set by board_init_f(). Some CPUs
+ *    have some work left to do at this point regarding memory, so
+ *    call c_runtime_cpu_setup.
+ *
+ * 6. Branch to either _nand_boot() or board_init_r().
+ */
+
+/*
+ * offset of the nand_boot() function for SPL crt
+ */
+
+#if defined(CONFIG_NAND_SPL)
+
+.globl nand_boot
+_nand_boot:
+	.word nand_boot
+
+#elif ! defined(CONFIG_SPL_BUILD)
+
+/*
+ * offset of the board_init_r() function for non-SPL crt
+ */
+
+.globl board_init_r
+_board_init_r:
+	.word board_init_r
+
+#endif
+
+/*
+ * start and end of BSS
+ */
+
+.globl __bss_start
+.globl __bss_end__
+
+/*
+ * entry point of crt0 sequence
+ */
+
+.global _main
+
+_main:
+
+/*
+ * Set up initial C runtime environment and call board_init_f(0).
+ */
+
+#if defined(CONFIG_NAND_SPL)
+	/* deprecated, use instead CONFIG_SPL_BUILD */
+	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
+#elif defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK)
+	ldr	sp, =(CONFIG_SPL_STACK)
+#else
+	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
+#endif
+	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
+	mov	r8, sp		/* GD is above SP */
+	mov	r0, #0
+	bl	board_init_f
+
+/*
+ * Set up intermediate environment (new sp and gd) and call
+ * relocate_code(addr_sp, gd, addr_moni). Trick here is that
+ * we'll return 'here' but relocated.
+ */
+
+	ldr	sp, [r8, #GD_START_ADDR_SP]	/* r8 = gd->start_addr_sp */
+	ldr	r8, [r8, #GD_BD]		/* r8 = gd->bd */
+	sub	r8, r8, #GD_SIZE		/* new GD is below bd */
+
+#if ! defined(CONFIG_SPL_BUILD)
+
+	adr	lr, here
+	ldr	r0, [r8, #GD_RELOC_OFF]		/* lr = gd->start_addr_sp */
+	add	lr, lr, r0
+	ldr	r0, [r8, #GD_START_ADDR_SP]	/* r0 = gd->start_addr_sp */
+	mov	r1, r8				/* r1 = gd */
+	ldr	r2, [r8, #GD_RELOCADDR]		/* r2 = gd->relocaddr */
+	b	relocate_code
+here:
+
+#endif
+
+/* Set up final (full) environment */
+
+	bl	c_runtime_cpu_setup	/* we still call old routine here */
+
+	ldr	r0, =__bss_start	/* this is auto-relocated! */
+	ldr	r1, =__bss_end__	/* this is auto-relocated! */
+
+	mov	r2, #0x00000000		/* prepare zero to clear BSS */
+
+clbss_l:cmp	r0, r1			/* while not at end of BSS */
+	strlo	r2, [r0]		/* clear 32-bit BSS word */
+	addlo	r0, r0, #4		/* move to next */
+	bhs	clbss_l
+
+#if ! defined(CONFIG_SPL_BUILD)
+
+	bl coloured_LED_init
+	bl red_led_on
+
+#endif
+
+#if defined(CONFIG_NAND_SPL)
+
+	/* call _nand_boot() */
+	ldr     pc, _nand_boot
+
+#else
+
+	/* call board_init_r(gd_t *id, ulong dest_addr) */
+	mov	r0, r8			/* gd_t */
+	ldr	r1, [r8, #GD_RELOCADDR]	/* dest_addr */
+	/* call board_init_r */
+	ldr	pc, _board_init_r	/* this is auto-relocated! */
+
+#endif
+
+	/* we should not return here. */
diff --git a/include/common.h b/include/common.h
index 5e3c5ee..69a1ccd 100644
--- a/include/common.h
+++ b/include/common.h
@@ -300,7 +300,7 @@  int	abortboot(int bootdelay);
 extern char console_buffer[];
 
 /* arch/$(ARCH)/lib/board.c */
-void	board_init_f  (ulong) __attribute__ ((noreturn));
+void	board_init_f(ulong);
 void	board_init_r  (gd_t *, ulong) __attribute__ ((noreturn));
 int	checkboard    (void);
 int	checkflash    (void);
diff --git a/include/configs/socfpga_cyclone5.h b/include/configs/socfpga_cyclone5.h
index 42077bd..5633d2a 100644
--- a/include/configs/socfpga_cyclone5.h
+++ b/include/configs/socfpga_cyclone5.h
@@ -65,7 +65,7 @@ 
 #define CONFIG_SYS_BOOTMAPSZ		((256*1024*1024) - (4*1024))
 
 #define CONFIG_SPL_RAM_DEVICE
-#define CONFIG_SPL_STACK (&__stack_start)
+#define CONFIG_SPL_STACK CONFIG_SYS_INIT_SP_ADDR
 #define CONFIG_SYS_SPL_MALLOC_START ((unsigned long) (&__malloc_start))
 #define CONFIG_SYS_SPL_MALLOC_SIZE (&__malloc_end - &__malloc_start)
 
diff --git a/lib/asm-offsets.c b/lib/asm-offsets.c
index c88f5d4..764e27b 100644
--- a/lib/asm-offsets.c
+++ b/lib/asm-offsets.c
@@ -28,5 +28,15 @@  int main(void)
 	DEFINE(GENERATED_BD_INFO_SIZE,
 		(sizeof(struct bd_info) + 15) & ~15);
 
+	DEFINE(GD_SIZE, sizeof(struct global_data));
+
+	DEFINE(GD_BD, offsetof(struct global_data, bd));
+
+	DEFINE(GD_RELOCADDR, offsetof(struct global_data, relocaddr));
+
+	DEFINE(GD_RELOC_OFF, offsetof(struct global_data, reloc_off));
+
+	DEFINE(GD_START_ADDR_SP, offsetof(struct global_data, start_addr_sp));
+
 	return 0;
 }
diff --git a/nand_spl/board/freescale/mx31pdk/Makefile b/nand_spl/board/freescale/mx31pdk/Makefile
index 43e72c4..2793aef 100644
--- a/nand_spl/board/freescale/mx31pdk/Makefile
+++ b/nand_spl/board/freescale/mx31pdk/Makefile
@@ -12,11 +12,12 @@  LDFLAGS := -T $(nandobj)u-boot.lds -Ttext $(CONFIG_SYS_TEXT_BASE) $(LDFLAGS) \
 AFLAGS	+= -DCONFIG_SPL_BUILD -DCONFIG_NAND_SPL
 CFLAGS	+= -DCONFIG_SPL_BUILD -DCONFIG_NAND_SPL
 
-SOBJS	= start.o lowlevel_init.o
+SOBJS	= start.o crt0.o lowlevel_init.o
 COBJS	= nand_boot_fsl_nfc.o
 
 SRCS	:= $(SRCTREE)/nand_spl/nand_boot_fsl_nfc.c
 SRCS	+= $(SRCTREE)/arch/arm/cpu/arm1136/start.S
+SRCS	+= $(SRCTREE)/arch/arm/lib/crt0.S
 SRCS	+= $(SRCTREE)/board/freescale/mx31pdk/lowlevel_init.S
 OBJS	:= $(addprefix $(obj),$(SOBJS) $(COBJS))
 __OBJS	:= $(SOBJS) $(COBJS)
@@ -49,6 +50,9 @@  $(nandobj)u-boot.lds: $(LDSCRIPT) $(LSTSCRIPT)
 $(obj)%.o:	$(SRCTREE)/arch/arm/cpu/arm1136/%.S
 	$(CC) $(AFLAGS) -c -o $@ $<
 
+$(obj)%.o:	$(SRCTREE)/arch/arm/lib/%.S
+	$(CC) $(AFLAGS) -c -o $@ $<
+
 $(obj)%.o:	$(SRCTREE)/board/freescale/mx31pdk/%.S
 	$(CC) $(AFLAGS) -c -o $@ $<
 
diff --git a/nand_spl/board/karo/tx25/Makefile b/nand_spl/board/karo/tx25/Makefile
index becf7fa..825f06d 100644
--- a/nand_spl/board/karo/tx25/Makefile
+++ b/nand_spl/board/karo/tx25/Makefile
@@ -33,11 +33,12 @@  LDFLAGS := -T $(nandobj)u-boot.lds -Ttext $(CONFIG_SYS_TEXT_BASE) $(LDFLAGS) \
 AFLAGS	+= -DCONFIG_SPL_BUILD -DCONFIG_NAND_SPL
 CFLAGS	+= -DCONFIG_SPL_BUILD -DCONFIG_NAND_SPL
 
-SOBJS	= start.o lowlevel_init.o
+SOBJS	= start.o crt0.o lowlevel_init.o
 COBJS	= nand_boot_fsl_nfc.o
 
 SRCS	:= $(SRCTREE)/nand_spl/nand_boot_fsl_nfc.c
 SRCS	+= $(SRCTREE)/arch/arm/cpu/arm926ejs/start.S
+SRCS	+= $(SRCTREE)/arch/arm/lib/crt0.S
 SRCS	+= $(SRCTREE)/board/karo/tx25/lowlevel_init.S
 OBJS	:= $(addprefix $(obj),$(SOBJS) $(COBJS))
 __OBJS	:= $(SOBJS) $(COBJS)
@@ -70,6 +71,9 @@  $(nandobj)u-boot.lds: $(LDSCRIPT) $(LSTSCRIPT)
 $(obj)%.o:	$(SRCTREE)/arch/arm/cpu/arm926ejs/%.S
 	$(CC) $(AFLAGS) -c -o $@ $<
 
+$(obj)%.o:	$(SRCTREE)/arch/arm/lib/%.S
+	$(CC) $(AFLAGS) -c -o $@ $<
+
 $(obj)%.o:	$(SRCTREE)/board/karo/tx25/%.S
 	$(CC) $(AFLAGS) -c -o $@ $<