Patchwork [U-Boot,v4,1/2] arm: move C runtime setup code in crt0.S

Submitter Albert ARIBAUD
Date Nov. 27, 2012, 12:43 p.m.
Message ID <1354020206-24762-2-git-send-email-albert.u.boot@aribaud.net>
Permalink /patch/202203/
State Superseded
Delegated to: Albert ARIBAUD

Comments

Albert ARIBAUD - Nov. 27, 2012, 12:43 p.m.
Move all the C runtime setup code from every start.S
in arch/arm into arch/arm/lib/crt0.S. This covers
the code sequence from setting up the initial stack
to calling into board_init_r().

Also, rewrite the C runtime setup and make functions
board_init_*() and relocate_code() behave according to
normal C semantics (no jumping across the C stack any
more, etc).

Some SPL targets had to be touched because they use
start.S explicitly or for some reason; the relevant
maintainers and custodians are cc:ed.

Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
---
Changes in v4:
- fixed bhs into blo in BSS init loop
- switched from explicit literals to '=target' notation
- fixed location of GD in initial environment
- stopped SPL boot sequence at calling board_init_f

Changes in v3:
- various clarifications and typo fixes
- fixed wrong LED calls conditional
- fixed open comment eating some code
- fixed code overrun in SPL case

Changes in v2:
- moved description from cover letter to patch commit msg
- added note about tests in the cover letter
- fixed boards with CONFIG_SPL but not CONFIG_SPL_STACK

 arch/arm/cpu/arm1136/start.S              |   71 +++---------
 arch/arm/cpu/arm1176/start.S              |   62 ++---------
 arch/arm/cpu/arm720t/start.S              |   53 ++-------
 arch/arm/cpu/arm920t/start.S              |   61 ++--------
 arch/arm/cpu/arm925t/start.S              |   61 ++--------
 arch/arm/cpu/arm926ejs/start.S            |   80 ++-----------
 arch/arm/cpu/arm946es/start.S             |   56 ++--------
 arch/arm/cpu/arm_intcm/start.S            |   63 ++---------
 arch/arm/cpu/armv7/start.S                |   58 +++-------
 arch/arm/cpu/ixp/start.S                  |   55 ++-------
 arch/arm/cpu/pxa/start.S                  |   63 ++---------
 arch/arm/cpu/s3c44b0/start.S              |   55 ++-------
 arch/arm/cpu/sa1100/start.S               |   50 ++-------
 arch/arm/lib/Makefile                     |    2 +
 arch/arm/lib/board.c                      |   11 --
 arch/arm/lib/crt0.S                       |  173 +++++++++++++++++++++++++++++
 include/common.h                          |    2 +-
 include/configs/socfpga_cyclone5.h        |    2 +-
 lib/asm-offsets.c                         |   10 ++
 nand_spl/board/freescale/mx31pdk/Makefile |    6 +-
 nand_spl/board/karo/tx25/Makefile         |    6 +-
 21 files changed, 322 insertions(+), 678 deletions(-)
 create mode 100644 arch/arm/lib/crt0.S
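
The commit message above describes a sequence that runs from setting up the initial stack to calling board_init_r(), with board_init_*() and relocate_code() now returning to their caller like ordinary C functions. A minimal sketch of that ordering in host-side C follows. It is only a model: the `model_*` functions, the `trace` structure and the step names are hypothetical, and the exact contents of crt0.S are not reproduced here. The only detail taken from the diff is the 8-byte stack alignment (`bic sp, sp, #7`).

```c
#include <assert.h>
#include <stdint.h>

/* Steps of the modelled boot sequence, in the order they are expected. */
enum step {
	STEP_STACK,
	STEP_BOARD_INIT_F,
	STEP_RELOCATE,
	STEP_CLEAR_BSS,
	STEP_BOARD_INIT_R
};

#define MAX_STEPS 8

/* Hypothetical log of the steps taken, so the order can be inspected. */
struct trace {
	enum step steps[MAX_STEPS];
	int count;
};

static void record(struct trace *t, enum step s)
{
	assert(t->count < MAX_STEPS);
	t->steps[t->count++] = s;
}

/* Round an address down to a multiple of 8, like `bic sp, sp, #7`. */
static uintptr_t align_down_8(uintptr_t addr)
{
	return addr & ~(uintptr_t)7;
}

/* Hypothetical stand-ins: each one does its work and returns to its caller. */
static void model_board_init_f(struct trace *t) { record(t, STEP_BOARD_INIT_F); }
static void model_relocate_code(struct trace *t) { record(t, STEP_RELOCATE); }
static void model_clear_bss(struct trace *t) { record(t, STEP_CLEAR_BSS); }
static void model_board_init_r(struct trace *t) { record(t, STEP_BOARD_INIT_R); }

/* Model of the sequencing only; returns the aligned initial stack address. */
static uintptr_t model_main(struct trace *t, uintptr_t init_sp_addr)
{
	uintptr_t sp = align_down_8(init_sp_addr);

	record(t, STEP_STACK);
	model_board_init_f(t);	/* returns instead of jumping onwards */
	model_relocate_code(t);	/* returns instead of jumping onwards */
	model_clear_bss(t);
	model_board_init_r(t);	/* the real function is not expected to return */
	return sp;
}
```

Calling `model_main()` with an empty trace records the five steps in order. The point of the sketch is that one caller owns the sequencing, rather than each stage branching directly into the next.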
Simon Glass - Nov. 28, 2012, 9:18 p.m.
Hi Albert,

On Tue, Nov 27, 2012 at 4:43 AM, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:
> Move all the C runtime setup code from every start.S
> in arch/arm into arch/arm/lib/crt0.S. This covers
> the code sequence from setting up the initial stack
> to calling into board_init_r().
>
> Also, rewrite the C runtime setup and make functions
> board_init_*() and relocate_code() behave according to
> normal C semantics (no jumping across the C stack any
> more, etc).
>
> Some SPL targets had to be touched because they use
> start.S explicitly or for some reason; the relevant
> maintainers and custodians are cc:ed.
>
> Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>

I tested this on Seaboard (Tegra 20).

Tested-by: Simon Glass <sjg@chromium.org>

I tried to test it on a snow (exynos5250) but couldn't really sync up
with our tree, so gave up. I am not completely sure about how the
CONFIG_SPL_BUILD stuff fits together in start.S and I got a build
error I wasn't sure how to correct (arch/arm/cpu/armv7/start.S:179:
undefined reference to `relocate_done')

Regards,
Simon
Albert ARIBAUD - Nov. 28, 2012, 10:34 p.m.
Hi Simon,

On Wed, 28 Nov 2012 13:18:57 -0800, Simon Glass <sjg@chromium.org>
wrote:

> Hi Albert,
> 
> On Tue, Nov 27, 2012 at 4:43 AM, Albert ARIBAUD
> <albert.u.boot@aribaud.net> wrote:
> > Move all the C runtime setup code from every start.S
> > in arch/arm into arch/arm/lib/crt0.S. This covers
> > the code sequence from setting up the initial stack
> > to calling into board_init_r().
> >
> > Also, rewrite the C runtime setup and make functions
> > board_init_*() and relocate_code() behave according to
> > normal C semantics (no jumping across the C stack any
> > more, etc).
> >
> > Some SPL targets had to be touched because they use
> > start.S explicitly or for some reason; the relevant
> > maintainers and custodians are cc:ed.
> >
> > Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
> 
> I tested this on Seaboard (Tegra 20).
> 
> Tested-by: Simon Glass <sjg@chromium.org>
> 
> I tried to test it on a snow (exynos5250) but couldn't really sync up
> with our tree, so gave up. I am not completely sure about how the
> CONFIG_SPL_BUILD stuff fits together in start.S and I got a build
> error I wasn't sure how to correct (arch/arm/cpu/armv7/start.S:179:
> undefined reference to `relocate_done')

Thanks Simon.

In file arch/arm/cpu/armv7/start.S, line 179 uses relocate_done and
line 228 provides it, and there is no preprocessor conditional
in-between, so I fail to see how you could get this error.
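
For context, the usual way such an "undefined reference" arises is a symbol defined inside a preprocessor block that is compiled out in one build variant while still being referenced from code that is always built. The C sketch below shows the safe arrangement, with the shared symbol outside the conditional. It is a general illustration, not a diagnosis of the error reported above; `copy_and_fix()` and `relocate()` are hypothetical names, and only `relocate_done` and `CONFIG_SPL_BUILD` come from the thread.

```c
#include <assert.h>

/* Uncomment to model an SPL build (hypothetical use of the real macro name). */
/* #define CONFIG_SPL_BUILD */

#ifndef CONFIG_SPL_BUILD
/* Present only in the full image: stands in for the copy and fixup loops. */
static int copy_and_fix(int words)
{
	return words;	/* pretend every word was processed */
}
#endif

/*
 * Defined outside the conditional, so both build variants can refer to it.
 * Moving this definition inside the #ifndef block above would leave SPL
 * builds with a reference to a symbol that was never compiled.
 */
static int relocate_done(int processed)
{
	return processed;
}

static int relocate(int words)
{
	int processed = 0;

#ifndef CONFIG_SPL_BUILD
	processed = copy_and_fix(words);
#else
	(void)words;
#endif
	return relocate_done(processed);
}
```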

Is your tree (and branch) available so that I can try and see the issue
for myself? Either it'll uncover a problem in my patch series, or it'll
help you merge it (or both).

> Regards,
> Simon

Amicalement,
Simon Glass - Nov. 30, 2012, 10:10 p.m.
Hi Albert,

On Wed, Nov 28, 2012 at 2:34 PM, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:
> Hi Simon,
>
> On Wed, 28 Nov 2012 13:18:57 -0800, Simon Glass <sjg@chromium.org>
> wrote:
>
>> Hi Albert,
>>
>> On Tue, Nov 27, 2012 at 4:43 AM, Albert ARIBAUD
>> <albert.u.boot@aribaud.net> wrote:
>> > Move all the C runtime setup code from every start.S
>> > in arch/arm into arch/arm/lib/crt0.S. This covers
>> > the code sequence from setting up the initial stack
>> > to calling into board_init_r().
>> >
>> > Also, rewrite the C runtime setup and make functions
>> > board_init_*() and relocate_code() behave according to
>> > normal C semantics (no jumping across the C stack any
>> > more, etc).
>> >
>> > Some SPL targets had to be touched because they use
>> > start.S explicitly or for some reason; the relevant
>> > maintainers and custodians are cc:ed.
>> >
>> > Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
>>
>> I tested this on Seaboard (Tegra 20).
>>
>> Tested-by: Simon Glass <sjg@chromium.org>
>>
>> I tried to test it on a snow (exynos5250) but couldn't really sync up
>> with our tree, so gave up. I am not completely sure about how the
>> CONFIG_SPL_BUILD stuff fits together in start.S and I got a build
>> error I wasn't sure how to correct (arch/arm/cpu/armv7/start.S:179:
>> undefined reference to `relocate_done')
>
> Thanks Simon.
>
> In file arch/arm/cpu/armv7/start.S, line 179 uses relocate_done and
> line 228 provides it, and there is no preprocessor conditional
> in-between, so I fail to see how you could get this error.
>
> Is your tree (and branch) available so that I can try and see the issue
> for myself? Either it'll uncover a problem in my patch series, or it'll
> help you merge it (or both).

OK I will try a bit harder and come back to you. Sorry I can't do that
immediately but will be soon...

Regards,
Simon

>
>> Regards,
>> Simon
>
> Amicalement,
> --
> Albert.
Sughosh Ganu - Dec. 9, 2012, 8:33 p.m.
On Tue Nov 27, 2012 at 01:43:25PM +0100, Albert ARIBAUD wrote:
> Move all the C runtime setup code from every start.S
> in arch/arm into arch/arm/lib/crt0.S. This covers
> the code sequence from setting up the initial stack
> to calling into board_init_r().
> 
> Also, rewrite the C runtime setup and make functions
> board_init_*() and relocate_code() behave according to
> normal C semantics (no jumping across the C stack any
> more, etc).
> 
> Some SPL targets had to be touched because they use
> start.S explicitly or for some reason; the relevant
> maintainers and custodians are cc:ed.
> 
> Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
> ---

Tested on hawkboard. SPL and U-Boot images boot up fine with these
changes.

Tested-by: Sughosh Ganu <urwithsughosh@gmail.com>

-sughosh
Albert ARIBAUD - Dec. 23, 2012, 3:03 p.m.
Hi Simon,

On Fri, 30 Nov 2012 14:10:01 -0800, Simon Glass <sjg@chromium.org>
wrote:

> >> I tried to test it on a snow (exynos5250) but couldn't really sync up
> >> with our tree, so gave up. I am not completely sure about how the
> >> CONFIG_SPL_BUILD stuff fits together in start.S and I got a build
> >> error I wasn't sure how to correct (arch/arm/cpu/armv7/start.S:179:
> >> undefined reference to `relocate_done')
> >
> > Thanks Simon.
> >
> > In file arch/arm/cpu/armv7/start.S, line 179 uses relocate_done and
> > line 228 provides it, and there is no preprocessor conditional
> > in-between, so I fail to see how you could get this error.
> >
> > Is your tree (and branch) available so that I can try and see the issue
> > for myself? Either it'll uncover a problem in my patch series, or it'll
> > help you merge it (or both).
> 
> OK I will try a bit harder and come back to you. Sorry I can't do that
> immediately but will be soon...

Did you manage to reproduce or sort out this issue?

> Regards,
> Simon

Amicalement,
Simon Glass - Dec. 26, 2012, 8:41 p.m.
Hi Albert,

On Sun, Dec 23, 2012 at 7:03 AM, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:
> Hi Simon,
>
> On Fri, 30 Nov 2012 14:10:01 -0800, Simon Glass <sjg@chromium.org>
> wrote:
>
>> >> I tried to test it on a snow (exynos5250) but couldn't really sync up
>> >> with our tree, so gave up. I am not completely sure about how the
>> >> CONFIG_SPL_BUILD stuff fits together in start.S and I got a build
>> >> error I wasn't sure how to correct (arch/arm/cpu/armv7/start.S:179:
>> >> undefined reference to `relocate_done')
>> >
>> > Thanks Simon.
>> >
>> > In file arch/arm/cpu/armv7/start.S, line 179 uses relocate_done and
>> > line 228 provides it, and there is no preprocessor conditional
>> > in-between, so I fail to see how you could get this error.
>> >
>> > Is your tree (and branch) available so that I can try and see the issue
>> > for myself? Either it'll uncover a problem in my patch series, or it'll
>> > help you merge it (or both).
>>
>> OK I will try a bit harder and come back to you. Sorry I can't do that
>> immediately but will be soon...
>
> Did you manage to reproduce or sort out this issue?

No, I was hoping that I would be able to test upstream U-Boot on snow
(waiting on some memory patches to appear), but that hasn't happened.
I will have another go at working out the problem - either this week
or next depending on when I can set aside the time for it. However,
please don't hold up the series for it.

>
>> Regards,
>> Simon
>
> Amicalement,
> --
> Albert.

Regards,
Simon
Simon Glass - Jan. 5, 2013, 1 a.m.
Hi Albert,

On Wed, Dec 26, 2012 at 12:41 PM, Simon Glass <sjg@chromium.org> wrote:
> Hi Albert,
>
> On Sun, Dec 23, 2012 at 7:03 AM, Albert ARIBAUD
> <albert.u.boot@aribaud.net> wrote:
>> Hi Simon,
>>
>> On Fri, 30 Nov 2012 14:10:01 -0800, Simon Glass <sjg@chromium.org>
>> wrote:
>>
>>> >> I tried to test it on a snow (exynos5250) but couldn't really sync up
>>> >> with our tree, so gave up. I am not completely sure about how the
>>> >> CONFIG_SPL_BUILD stuff fits together in start.S and I got a build
>>> >> error I wasn't sure how to correct (arch/arm/cpu/armv7/start.S:179:
>>> >> undefined reference to `relocate_done')
>>> >
>>> > Thanks Simon.
>>> >
>>> > In file arch/arm/cpu/armv7/start.S, line 179 uses relocate_done and
>>> > line 228 provides it, and there is no preprocessor conditional
>>> > in-between, so I fail to see how you could get this error.
>>> >
>>> > Is your tree (and branch) available so that I can try and see the issue
>>> > for myself? Either it'll uncover a problem in my patch series, or it'll
>>> > help you merge it (or both).
>>>
>>> OK I will try a bit harder and come back to you. Sorry I can't do that
>>> immediately but will be soon...
>>
>> Did you manage to reproduce or sort out this issue?
>
> No, I was hoping that I would be able to test upstream U-Boot on snow
> (waiting on some memory patches to appear), but that hasn't happened.
> I will have another go at working out the problem - either this week
> or next depending on when I can set aside the time for it. However,
> please don't hold up the series for it.

I have tested this on snow at last. My problem was basically the
distance between our tree and upstream - we are just coming up to a
rebase so are at the furthest point.

Tested-by: Simon Glass <sjg@chromium.org>
Acked-by: Simon Glass <sjg@chromium.org>

>
>>
>>> Regards,
>>> Simon
>>
>> Amicalement,
>> --
>> Albert.
>
> Regards,
> Simon
Tom Rini - Jan. 7, 2013, 2:40 p.m.
On Tue, Nov 27, 2012 at 01:43:25PM +0100, Albert ARIBAUD wrote:

> Move all the C runtime setup code from every start.S
> in arch/arm into arch/arm/lib/crt0.S. This covers
> the code sequence from setting up the initial stack
> to calling into board_init_r().
> 
> Also, rewrite the C runtime setup and make functions
> board_init_*() and relocate_code() behave according to
> normal C semantics (no jumping across the C stack any
> more, etc).
> 
> Some SPL targets had to be touched because they use
> start.S explicitly or for some reason; the relevant
> maintainers and custodians are cc:ed.
> 
> Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>

On beaglebone Rev A5
Tested-by: Tom Rini <trini@ti.com>
Acked-by: Tom Rini <trini@ti.com>
Tom Rini - Jan. 8, 2013, 7:26 p.m.
On Tue, Nov 27, 2012 at 01:43:25PM +0100, Albert ARIBAUD wrote:

> Move all the C runtime setup code from every start.S
> in arch/arm into arch/arm/lib/crt0.S. This covers
> the code sequence from setting up the initial stack
> to calling into board_init_r().
> 
> Also, rewrite the C runtime setup and make functions
> board_init_*() and relocate_code() behave according to
> normal C semantics (no jumping across the C stack any
> more, etc).
> 
> Some SPL targets had to be touched because they use
> start.S explicitly or for some reason; the relevant
> maintainers and custodians are cc:ed.
> 
> Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
> ---
> Changes in v4:
> - fixed bhs into blo in BSS init loop
> - switched from explicit literals to '=target' notation
> - fixed location of GD in initial environment
> - stopped SPL boot sequence at calling board_init_f
> 
> Changes in v3:
> - various clarifications and typo fixes
> - fixed wrong LED calls conditional
> - fixed open comment eating some code
> - fixed code overrun in SPL case
> 
> Changes in v2:
> - moved description from cover letter to patch commit msg
> - added note about tests in the cover letter
> - fixed boards with CONFIG_SPL but not CONFIG_SPL_STACK

This change breaks sandbox, powerpc, mips and m68k.  I suspect fixing
sandbox will be the easiest test-case and catch all of the others.
Albert ARIBAUD - Jan. 8, 2013, 7:50 p.m.
Hi Tom,

On Tue, 8 Jan 2013 12:26:07 -0700, Tom Rini <trini@ti.com> wrote:
> On Tue, Nov 27, 2012 at 01:43:25PM +0100, Albert ARIBAUD wrote:
> 
> > Move all the C runtime setup code from every start.S
> > in arch/arm into arch/arm/lib/crt0.S. This covers
> > the code sequence from setting up the initial stack
> > to calling into board_init_r().
> > 
> > Also, rewrite the C runtime setup and make functions
> > board_init_*() and relocate_code() behave according to
> > normal C semantics (no jumping across the C stack any
> > more, etc).
> > 
> > Some SPL targets had to be touched because they use
> > start.S explicitly or for some reason; the relevant
> > maintainers and custodians are cc:ed.
> > 
> > Signed-off-by: Albert ARIBAUD <albert.u.boot@aribaud.net>
> > ---
> > Changes in v4:
> > - fixed bhs into blo in BSS init loop
> > - switched from explicit literals to '=target' notation
> > - fixed location of GD in initial environment
> > - stopped SPL boot sequence at calling board_init_f
> > 
> > Changes in v3:
> > - various clarifications and typo fixes
> > - fixed wrong LED calls conditional
> > - fixed open comment eating some code
> > - fixed code overrun in SPL case
> > 
> > Changes in v2:
> > - moved description from cover letter to patch commit msg
> > - added note about tests in the cover letter
> > - fixed boards with CONFIG_SPL but not CONFIG_SPL_STACK
> 
> This change breaks sandbox, powerpc, mips and m68k.  I suspect fixing
> sandbox will be the easiest test-case and catch all of the others.
> 
> -- 
> Tom

Ok, then, rolling back ARM and pushing v5 to fix breakage on non-ARM
arches.

Amicalement,
Albert ARIBAUD - Jan. 8, 2013, 8:18 p.m.
The goal of this series is to scrub the start.S files
which have proliferated across arch/arm and eliminate
code redundancy.

This series only factorizes the C runtime framework,
that is, the assembly code responsible for setting up
the C environments needed by board_init_*() and
relocate_code().

Further factorization is possible: relocate_code() can
be moved out and possibly rewritten in C now that it
has plain C function semantics; exception handlers
should also be pretty much cpu-independent.
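
As a rough idea of what a C version of the relocation step could look like, here is a host-side sketch that copies an image and then patches the words listed in a relocation table. It is an assumption-laden illustration, not the series' code: `relocate_image()` and `struct rel_entry` are hypothetical, the image is a plain byte buffer rather than real memory, and only `R_ARM_RELATIVE` entries (ELF type 23) are handled. The 8-byte entry size matches the comment in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_ARM_RELATIVE 23u	/* ELF relocation type for ARM */

/* One .rel.dyn entry: two 32-bit words, 8 bytes in total. */
struct rel_entry {
	uint32_t offset;	/* link-time address of the word to patch */
	uint32_t info;		/* low 8 bits hold the relocation type */
};

/*
 * Copy an image of `size` bytes, linked to run at `link_base`, into `dst`,
 * then add the relocation offset to every word named by an R_ARM_RELATIVE
 * entry. Other relocation types are skipped in this sketch.
 */
static void relocate_image(uint8_t *dst, const uint8_t *src, size_t size,
			   uint32_t link_base, uint32_t new_base,
			   const struct rel_entry *rel, size_t nrel)
{
	uint32_t delta = new_base - link_base;
	size_t i;

	memcpy(dst, src, size);

	for (i = 0; i < nrel; i++) {
		uint32_t off, word;

		if ((rel[i].info & 0xffu) != R_ARM_RELATIVE)
			continue;

		off = rel[i].offset - link_base;
		assert(off + sizeof(word) <= size);

		/* memcpy avoids unaligned or aliasing accesses on the host */
		memcpy(&word, dst + off, sizeof(word));
		word += delta;
		memcpy(dst + off, &word, sizeof(word));
	}
}
```

Because the function takes its inputs as arguments and returns normally, it can be exercised on a build host with a small fake image, which is one practical benefit of plain C function semantics.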

Also, SPL currently makes board_init_f() perform all
of the setting up, including BSS clearing and chain-
loading. This could be streamlined by only putting
basic inits in board_init_f(), letting crt0 do the
BSS initialization and chain-loading U-Boot from
inside board_init_r().
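
The BSS initialization mentioned above amounts to a word-wise zeroing loop. A minimal C sketch follows, assuming a half-open range of 32-bit words. The loop continues only while the current pointer is below the end, which is the unsigned "lower" test that the `blo` condition named in the v4 changelog expresses. The function name echoes the `clear_bss` label seen in the diff, but this is not the assembly from crt0.S.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Zero every 32-bit word in [start, end). Nothing is written when
 * start == end, and the word at `end` itself is never touched.
 */
static void clear_bss(uint32_t *start, uint32_t *end)
{
	assert(start <= end);

	while (start < end)
		*start++ = 0;
}
```

Testing the bound before each store is what keeps an empty range from being written to at all.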

Eventually, start.S files should disappear or contain
CPU-specific code only.

The new C runtime setup sequence has been validated step by
step on targets versatileqemu using gcc version 4.2.2 from
the ELDK4.2 toolchain, and wireless_space using gcc version
4.7.2 (Ubuntu/Linaro 4.7.2-1ubuntu1).

The complete series has been build-tested across all ARM
targets using MAKEALL -a arm.

Changes in v5:
- Fixed build on non-ARM architectures

Changes in v4:
- fixed bhs into blo in BSS init loop
- switched from explicit literals to '=target' notation
- fixed location of GD in initial environment
- stopped SPL boot sequence at calling board_init_f

Changes in v3:
- various clarifications and typo fixes
- fixed wrong LED calls conditional
- fixed open comment eating some code
- fixed code overrun in SPL case

Changes in v2:
- moved description from cover letter to patch commit msg
- added note about tests in the cover letter
- fixed boards with CONFIG_SPL but not CONFIG_SPL_STACK
- removed useless includes in arm926ejs and arm925t

Albert ARIBAUD (2):
  arm: move C runtime setup code in crt0.S
  arm: remove useless code in start.S files

 arch/arm/cpu/arm1136/start.S              |   71 +++---------
 arch/arm/cpu/arm1176/start.S              |   62 ++---------
 arch/arm/cpu/arm720t/start.S              |   53 ++-------
 arch/arm/cpu/arm920t/start.S              |   61 ++--------
 arch/arm/cpu/arm925t/start.S              |   65 ++---------
 arch/arm/cpu/arm926ejs/start.S            |   86 ++------------
 arch/arm/cpu/arm946es/start.S             |   56 ++--------
 arch/arm/cpu/arm_intcm/start.S            |   63 ++---------
 arch/arm/cpu/armv7/start.S                |   58 +++-------
 arch/arm/cpu/ixp/start.S                  |   55 ++-------
 arch/arm/cpu/pxa/start.S                  |   63 ++---------
 arch/arm/cpu/s3c44b0/start.S              |   55 ++-------
 arch/arm/cpu/sa1100/start.S               |   50 ++-------
 arch/arm/lib/Makefile                     |    2 +
 arch/arm/lib/board.c                      |   11 --
 arch/arm/lib/crt0.S                       |  173 +++++++++++++++++++++++++++++
 include/common.h                          |    2 +-
 include/configs/socfpga_cyclone5.h        |    2 +-
 lib/asm-offsets.c                         |   14 +++
 nand_spl/board/freescale/mx31pdk/Makefile |    6 +-
 nand_spl/board/karo/tx25/Makefile         |    6 +-
 21 files changed, 326 insertions(+), 688 deletions(-)
 create mode 100644 arch/arm/lib/crt0.S
Albert ARIBAUD - Jan. 8, 2013, 9:16 p.m.
Hi Albert,

On Tue,  8 Jan 2013 21:18:01 +0100, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:

> The goal of this series is to scrub the start.S files
> which have proliferated across arch/arm and eliminate
> code redundancy.
> 
> This series only factorizes the C runtime framework,
> that is, the assembly code responsible for setting up
> the C environments needed by board_init_*() and
> relocate_code().
> 
> Further factorization is possible: relocate_code() can
> be moved out and possibly rewritten in C now that it
> has plain C function semantics; exception handlers
> should also be pretty much cpu-independent.
> 
> Also, SPL currently makes board_init_f() perform all
> of the setting up, including BSS clearing and chain-
> loading. This could be streamlined by only putting
> basic inits in board_init_f(), letting crt0 do the
> BSS initialization and chain-loading U-Boot from
> inside board_init_r().
> 
> Eventually, start.S files should disappear or contain
> CPU-specific code only.
> 
> The new C runtime setup sequence has been validated step by
> step on targets versatileqemu using gcc version 4.2.2 from
> the ELDK4.2 toolchain, and wireless_space using gcc version
> 4.7.2 (Ubuntu/Linaro 4.7.2-1ubuntu1).
> 
> The complete series has been build-tested across all ARM
> targets using MAKEALL -a arm.
> 
> Changes in v5:
> - Fixed build on non-ARM architectures
> 
> Changes in v4:
> - fixed bhs into blo in BSS init loop
> - switched from explicit literals to '=target' notation
> - fixed location of GD in initial environment
> - stopped SPL boot sequence at calling board_init_f
> 
> Changes in v3:
> - various clarifications and typo fixes
> - fixed wrong LED calls conditional
> - fixed open comment eating some code
> - fixed code overrun in SPL case
> 
> Changes in v2:
> - moved description from cover letter to patch commit msg
> - added note about tests in the cover letter
> - fixed boards with CONFIG_SPL but not CONFIG_SPL_STACK
> - removed useless includes in arm926ejs and arm925t
> 
> Albert ARIBAUD (2):
>   arm: move C runtime setup code in crt0.S
>   arm: remove useless code in start.S files
> 
>  arch/arm/cpu/arm1136/start.S              |   71 +++---------
>  arch/arm/cpu/arm1176/start.S              |   62 ++---------
>  arch/arm/cpu/arm720t/start.S              |   53 ++-------
>  arch/arm/cpu/arm920t/start.S              |   61 ++--------
>  arch/arm/cpu/arm925t/start.S              |   65 ++---------
>  arch/arm/cpu/arm926ejs/start.S            |   86 ++------------
>  arch/arm/cpu/arm946es/start.S             |   56 ++--------
>  arch/arm/cpu/arm_intcm/start.S            |   63 ++---------
>  arch/arm/cpu/armv7/start.S                |   58 +++-------
>  arch/arm/cpu/ixp/start.S                  |   55 ++-------
>  arch/arm/cpu/pxa/start.S                  |   63 ++---------
>  arch/arm/cpu/s3c44b0/start.S              |   55 ++-------
>  arch/arm/cpu/sa1100/start.S               |   50 ++-------
>  arch/arm/lib/Makefile                     |    2 +
>  arch/arm/lib/board.c                      |   11 --
>  arch/arm/lib/crt0.S                       |  173 +++++++++++++++++++++++++++++
>  include/common.h                          |    2 +-
>  include/configs/socfpga_cyclone5.h        |    2 +-
>  lib/asm-offsets.c                         |   14 +++
>  nand_spl/board/freescale/mx31pdk/Makefile |    6 +-
>  nand_spl/board/karo/tx25/Makefile         |    6 +-
>  21 files changed, 326 insertions(+), 688 deletions(-)
>  create mode 100644 arch/arm/lib/crt0.S
> 

Applied to u-boot-arm/master, after rolling back the Marvell PR and
v4 of this patch series.

Amicalement,

Patch

diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S
index 5d3b4c2..a067b8a 100644
--- a/arch/arm/cpu/arm1136/start.S
+++ b/arch/arm/cpu/arm1136/start.S
@@ -165,13 +165,7 @@  next:
 	bl  cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -188,14 +182,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _image_copy_end_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -245,7 +235,15 @@  fixnext:
 	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
 	cmp	r2, r3
 	blo	fixloop
-	b	clear_bss
+	bx	lr
+
+#endif
+
+relocate_done:
+
+	bx	lr
+
+#ifndef CONFIG_SPL_BUILD
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -253,54 +251,13 @@  _rel_dyn_end_ofs:
 	.word __rel_dyn_end - _start
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
-#endif
 
-clear_bss:
-#ifdef CONFIG_SPL_BUILD
-	/* No relocation for SPL */
-	ldr	r0, =__bss_start
-	ldr	r1, =__bss_end__
-#else
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
 #endif
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
 
-_nand_boot_ofs:
-	.word nand_boot
-#else
-jump_2_ram:
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 /*
  *************************************************************************
diff --git a/arch/arm/cpu/arm1176/start.S b/arch/arm/cpu/arm1176/start.S
index 667a0e0..40df4b1 100644
--- a/arch/arm/cpu/arm1176/start.S
+++ b/arch/arm/cpu/arm1176/start.S
@@ -224,12 +224,7 @@  skip_tcmdisable:
 	 */
 	bl	lowlevel_init		/* go setup pll,mux,memory */
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -246,14 +241,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -343,49 +334,9 @@  mmu_enable:
 skip_hw_init:
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-#ifndef CONFIG_NAND_SPL
-	bl coloured_LED_init
-	bl red_led_on
-#endif
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     pc, _nand_boot
-
-_nand_boot: .word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add     lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -399,6 +350,11 @@  _mmu_table_base:
 	.word mmu_table
 #endif
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 #ifndef CONFIG_NAND_SPL
 /*
  * we assume that cache operation is done before. (eg. cleanup_before_linux())
diff --git a/arch/arm/cpu/arm720t/start.S b/arch/arm/cpu/arm720t/start.S
index c2a7763..771d386 100644
--- a/arch/arm/cpu/arm720t/start.S
+++ b/arch/arm/cpu/arm720t/start.S
@@ -147,12 +147,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -169,14 +164,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -228,43 +219,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -272,6 +230,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm920t/start.S b/arch/arm/cpu/arm920t/start.S
index 14c9156..511d21d 100644
--- a/arch/arm/cpu/arm920t/start.S
+++ b/arch/arm/cpu/arm920t/start.S
@@ -182,12 +182,7 @@  copyex:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -204,14 +199,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -263,51 +254,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
-
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
+relocate_done:
 
-_nand_boot_ofs:
-	.word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -315,6 +265,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm925t/start.S b/arch/arm/cpu/arm925t/start.S
index 3a483f6..82f5b8b 100644
--- a/arch/arm/cpu/arm925t/start.S
+++ b/arch/arm/cpu/arm925t/start.S
@@ -176,12 +176,7 @@  poll1:
 	bl  cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -198,14 +193,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -257,51 +248,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
-
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
+relocate_done:
 
-_nand_boot_ofs:
-	.word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -309,6 +259,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm926ejs/start.S b/arch/arm/cpu/arm926ejs/start.S
index 2188f7e..3cdecd9 100644
--- a/arch/arm/cpu/arm926ejs/start.S
+++ b/arch/arm/cpu/arm926ejs/start.S
@@ -198,20 +198,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-#ifdef CONFIG_NAND_SPL /* deprecated, use instead CONFIG_SPL_BUILD */
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-#else
-#ifdef CONFIG_SPL_BUILD
-	ldr	sp, =(CONFIG_SPL_STACK)
-#else
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-#endif
-#endif
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -229,15 +216,11 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	sub	r9, r6, r0		/* r9 <- relocation offset */
 	cmp	r0, r6
-	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	moveq	r9, #0			/* no relocation. offset(r9) = 0 */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -289,56 +272,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifdef CONFIG_SPL_BUILD
-	/* No relocation for SPL */
-	ldr	r0, =__bss_start
-	ldr	r1, =__bss_end__
-#else
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-#endif
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
+relocate_done:
 
-#ifndef CONFIG_SPL_BUILD
-	bl coloured_LED_init
-	bl red_led_on
-#endif
-
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
-
-_nand_boot_ofs:
-	.word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	ldr	r1, _TEXT_BASE
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -346,8 +282,14 @@  _rel_dyn_end_ofs:
 	.word __rel_dyn_end - _start
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
+
 #endif
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	bx	lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm946es/start.S b/arch/arm/cpu/arm946es/start.S
index 30e2183..a7a98a4 100644
--- a/arch/arm/cpu/arm946es/start.S
+++ b/arch/arm/cpu/arm946es/start.S
@@ -147,12 +147,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -169,14 +164,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -228,46 +219,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-#endif
-
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     pc, _nand_boot
+relocate_done:
 
-_nand_boot: .word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -275,6 +230,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm_intcm/start.S b/arch/arm/cpu/arm_intcm/start.S
index a133d19..c189849 100644
--- a/arch/arm/cpu/arm_intcm/start.S
+++ b/arch/arm/cpu/arm_intcm/start.S
@@ -143,12 +143,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -165,14 +160,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -224,50 +215,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_NAND_SPL
-	ldr     r0, _nand_boot_ofs
-	mov	pc, r0
-
-_nand_boot_ofs:
-	.word nand_boot
-#else
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -276,6 +226,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/armv7/start.S b/arch/arm/cpu/armv7/start.S
index 7df97c5..dcc1f83 100644
--- a/arch/arm/cpu/armv7/start.S
+++ b/arch/arm/cpu/armv7/start.S
@@ -155,12 +155,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -177,14 +172,10 @@  ENTRY(relocate_code)
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _image_copy_end_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -233,34 +224,22 @@  fixnext:
 	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
 	cmp	r2, r3
 	blo	fixloop
-	b	clear_bss
+
+relocate_done:
+
+	bx	lr
+
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
 	.word __rel_dyn_end - _start
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
+ENDPROC(relocate_code)
 
-clear_bss:
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
+#endif
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-jump_2_ram:
+ENTRY(c_runtime_cpu_setup)
 /*
  * If I-cache is enabled invalidate it
  */
@@ -279,20 +258,9 @@  jump_2_ram:
 	mcr     p15, 0, r0, c12, c0, 0  @Set VBAR
 #endif /* !Tegra20 */
 
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-ENDPROC(relocate_code)
-#endif
+	bx	lr
+
+ENDPROC(c_runtime_cpu_setup)
 
 /*************************************************************************
  *
diff --git a/arch/arm/cpu/ixp/start.S b/arch/arm/cpu/ixp/start.S
index c12f1a7..efb5a40 100644
--- a/arch/arm/cpu/ixp/start.S
+++ b/arch/arm/cpu/ixp/start.S
@@ -245,12 +245,7 @@  reset:
 	orr	r0,r0,#0x13
 	msr	cpsr,r0
 
-/* Set initial stackpointer in SDRAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -267,14 +262,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -326,42 +317,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -370,6 +328,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	bx	lr
+
 /****************************************************************************/
 /*									    */
 /* Interrupt handling							    */
diff --git a/arch/arm/cpu/pxa/start.S b/arch/arm/cpu/pxa/start.S
index 536cf5c..72af869 100644
--- a/arch/arm/cpu/pxa/start.S
+++ b/arch/arm/cpu/pxa/start.S
@@ -164,12 +164,7 @@  reset:
 	bl	lock_cache_for_stack
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0, =0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 #ifndef CONFIG_SPL_BUILD
@@ -186,10 +181,6 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 /* Disable the Dcache RAM lock for stack now */
 #ifdef	CONFIG_CPU_PXA25X
 	bl	cpu_init_crit
@@ -198,7 +189,7 @@  stack_setup:
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -250,48 +241,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-#endif	/* #ifndef CONFIG_SPL_BUILD */
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-#ifdef CONFIG_ONENAND_SPL
-	ldr     r0, _onenand_boot_ofs
-	mov	pc, r0
-
-_onenand_boot_ofs:
-	.word onenand_boot
-#else
-jump_2_ram:
-	ldr	r0, _board_init_r_ofs
-	ldr     r1, _TEXT_BASE
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
-#endif
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -299,7 +251,14 @@  _rel_dyn_end_ofs:
 	.word __rel_dyn_end - _start
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
+
 #endif
+
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	bx	lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/s3c44b0/start.S b/arch/arm/cpu/s3c44b0/start.S
index 323b923..4528c91 100644
--- a/arch/arm/cpu/s3c44b0/start.S
+++ b/arch/arm/cpu/s3c44b0/start.S
@@ -128,12 +128,7 @@  reset:
 	bl	lowlevel_init
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -150,14 +145,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -209,42 +200,9 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-
-	bl coloured_LED_init
-	bl red_led_on
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
-	mov	pc, lr
-
-_board_init_r_ofs:
-	.word board_init_r - _start
+	bx	lr
 
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
@@ -253,6 +211,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	bx	lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/sa1100/start.S b/arch/arm/cpu/sa1100/start.S
index 1ea92d1..3144299 100644
--- a/arch/arm/cpu/sa1100/start.S
+++ b/arch/arm/cpu/sa1100/start.S
@@ -132,12 +132,7 @@  reset:
 	bl	cpu_init_crit
 #endif
 
-/* Set stackpointer in internal RAM to call board_init_f */
-call_board_init_f:
-	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
-	bic	sp, sp, #7 /* 8-byte alignment for ABI compliance */
-	ldr	r0,=0x00000000
-	bl	board_init_f
+	bl	_main
 
 /*------------------------------------------------------------------------------*/
 
@@ -154,14 +149,10 @@  relocate_code:
 	mov	r5, r1	/* save addr of gd */
 	mov	r6, r2	/* save addr of destination */
 
-	/* Set up the stack						    */
-stack_setup:
-	mov	sp, r4
-
 	adr	r0, _start
 	cmp	r0, r6
 	moveq	r9, #0		/* no relocation. relocation offset(r9) = 0 */
-	beq	clear_bss		/* skip relocation */
+	beq	relocate_done		/* skip relocation */
 	mov	r1, r6			/* r1 <- scratch for copy_loop */
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
@@ -213,40 +204,10 @@  fixnext:
 	blo	fixloop
 #endif
 
-clear_bss:
-#ifndef CONFIG_SPL_BUILD
-	ldr	r0, _bss_start_ofs
-	ldr	r1, _bss_end_ofs
-	mov	r4, r6			/* reloc addr */
-	add	r0, r0, r4
-	add	r1, r1, r4
-	mov	r2, #0x00000000		/* clear			    */
-
-clbss_l:cmp	r0, r1			/* clear loop... */
-	bhs	clbss_e			/* if reached end of bss, exit */
-	str	r2, [r0]
-	add	r0, r0, #4
-	b	clbss_l
-clbss_e:
-#endif
+relocate_done:
 
-/*
- * We are done. Do not return, instead branch to second part of board
- * initialization, now running from RAM.
- */
-	ldr	r0, _board_init_r_ofs
-	adr	r1, _start
-	add	lr, r0, r1
-	add	lr, lr, r9
-	/* setup parameters for board_init_r */
-	mov	r0, r5		/* gd_t */
-	mov	r1, r6		/* dest_addr */
-	/* jump to it ... */
 	mov	pc, lr
 
-_board_init_r_ofs:
-	.word board_init_r - _start
-
 _rel_dyn_start_ofs:
 	.word __rel_dyn_start - _start
 _rel_dyn_end_ofs:
@@ -254,6 +215,11 @@  _rel_dyn_end_ofs:
 _dynsym_start_ofs:
 	.word __dynsym_start - _start
 
+	.globl	c_runtime_cpu_setup
+c_runtime_cpu_setup:
+
+	mov	pc, lr
+
 /*
  *************************************************************************
  *
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 3422ac1..07baee2 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -36,6 +36,8 @@  GLSOBJS	+= _umodsi3.o
 
 GLCOBJS	+= div0.o
 
+SOBJS-y += crt0.o
+
 ifndef CONFIG_SPL_BUILD
 COBJS-y	+= board.o
 COBJS-y	+= bootm.o
diff --git a/arch/arm/lib/board.c b/arch/arm/lib/board.c
index 92cad9a..ad96d7e 100644
--- a/arch/arm/lib/board.c
+++ b/arch/arm/lib/board.c
@@ -272,11 +272,6 @@  void board_init_f(ulong bootflag)
 
 	bootstage_mark_name(BOOTSTAGE_ID_START_UBOOT_F, "board_init_f");
 
-	/* Pointer is writable since we allocated a register for it */
-	gd = (gd_t *) ((CONFIG_SYS_INIT_SP_ADDR) & ~0x07);
-	/* compiler optimization barrier needed for GCC >= 3.4 */
-	__asm__ __volatile__("": : :"memory");
-
 	memset((void *)gd, 0, sizeof(gd_t));
 
 	gd->mon_len = _bss_end_ofs;
@@ -458,10 +453,6 @@  void board_init_f(ulong bootflag)
 		gd->fdt_blob = new_fdt;
 	}
 	memcpy(id, (void *)gd, sizeof(gd_t));
-
-	relocate_code(addr_sp, id, addr);
-
-	/* NOTREACHED - relocate_code() does not return */
 }
 
 #if !defined(CONFIG_SYS_NO_FLASH)
@@ -486,8 +477,6 @@  void board_init_r(gd_t *id, ulong dest_addr)
 	ulong flash_size;
 #endif
 
-	gd = id;
-
 	gd->flags |= GD_FLG_RELOC;	/* tell others: relocation done */
 	bootstage_mark_name(BOOTSTAGE_ID_START_UBOOT_R, "board_init_r");
 
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
new file mode 100644
index 0000000..4f60958
--- /dev/null
+++ b/arch/arm/lib/crt0.S
@@ -0,0 +1,173 @@ 
+/*
+ *  crt0 - C-runtime startup Code for ARM U-Boot
+ *
+ *  Copyright (c) 2012  Albert ARIBAUD <albert.u.boot@aribaud.net>
+ *
+ * See file CREDITS for list of people who contributed to this
+ * project.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+ * MA 02111-1307 USA
+ */
+
+#include <config.h>
+#include <asm-offsets.h>
+
+/*
+ * This file handles the target-independent stages of the U-Boot
+ * start-up where a C runtime environment is needed. Its entry point
+ * is _main and is branched into from the target's start.S file.
+ *
+ * _main execution sequence is:
+ *
+ * 1. Set up initial environment for calling board_init_f().
+ *    This environment only provides a stack and a place to store
+ *    the GD ('global data') structure, both located in some readily
+ *    available RAM (SRAM, locked cache...). In this context, VARIABLE
+ *    global data, initialized or not (BSS), are UNAVAILABLE; only
+ *    CONSTANT initialized data are available.
+ *
+ * 2. Call board_init_f(). This function prepares the hardware for
+ *    execution from system RAM (DRAM, DDR...) As system RAM may not
+ *    be available yet, board_init_f() must use the current GD to
+ *    store any data which must be passed on to later stages. These
+ *    data include the relocation destination, the future stack, and
+ *    the future GD location.
+ *
+ * (the following applies only to non-SPL builds)
+ *
+ * 3. Set up intermediate environment where the stack and GD are the
+ *    ones allocated by board_init_f() in system RAM, but BSS and
+ *    initialized non-const data are still not available.
+ *
+ * 4. Call relocate_code(). This function relocates U-Boot from its
+ *    current location into the relocation destination computed by
+ *    board_init_f().
+ *
+ * 5. Set up final environment for calling board_init_r(). This
+ *    environment has BSS (initialized to 0), initialized non-const
+ *    data (initialized to their intended value), and stack in system
+ *    RAM. GD has retained values set by board_init_f(). Some CPUs
+ *    have some work left to do at this point regarding memory, so
+ *    call c_runtime_cpu_setup.
+ *
+ * 6. Branch to either nand_boot() or board_init_r().
+ */
+
+/*
+ * declare nand_boot() or board_init_r() to jump to at end of crt0
+ */
+
+#if defined(CONFIG_NAND_SPL)
+
+.globl nand_boot
+
+#elif ! defined(CONFIG_SPL_BUILD)
+
+.globl board_init_r
+
+#endif
+
+/*
+ * start and end of BSS
+ */
+
+.globl __bss_start
+.globl __bss_end__
+
+/*
+ * entry point of crt0 sequence
+ */
+
+.global _main
+
+_main:
+
+/*
+ * Set up initial C runtime environment and call board_init_f(0).
+ */
+
+#if defined(CONFIG_NAND_SPL)
+	/* deprecated, use CONFIG_SPL_BUILD instead */
+	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
+#elif defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_STACK)
+	ldr	sp, =(CONFIG_SPL_STACK)
+#else
+	ldr	sp, =(CONFIG_SYS_INIT_SP_ADDR)
+#endif
+	bic	sp, sp, #7	/* 8-byte alignment for ABI compliance */
+	sub	sp, #GD_SIZE	/* allocate one GD above SP */
+	bic	sp, sp, #7	/* 8-byte alignment for ABI compliance */
+	mov	r8, sp		/* GD is above SP */
+	mov	r0, #0
+	bl	board_init_f
+
+#if ! defined(CONFIG_SPL_BUILD)
+
+/*
+ * Set up intermediate environment (new sp and gd) and call
+ * relocate_code(addr_sp, gd, addr_moni). Trick here is that
+ * we'll return 'here' but relocated.
+ */
+
+	ldr	sp, [r8, #GD_START_ADDR_SP]	/* sp = gd->start_addr_sp */
+	bic	sp, sp, #7	/* 8-byte alignment for ABI compliance */
+	ldr	r8, [r8, #GD_BD]		/* r8 = gd->bd */
+	sub	r8, r8, #GD_SIZE		/* new GD is below bd */
+
+	adr	lr, here
+	ldr	r0, [r8, #GD_RELOC_OFF]		/* r0 = gd->reloc_off */
+	add	lr, lr, r0
+	ldr	r0, [r8, #GD_START_ADDR_SP]	/* r0 = gd->start_addr_sp */
+	mov	r1, r8				/* r1 = gd */
+	ldr	r2, [r8, #GD_RELOCADDR]		/* r2 = gd->relocaddr */
+	b	relocate_code
+here:
+
+/* Set up final (full) environment */
+
+	bl	c_runtime_cpu_setup	/* we still call old routine here */
+
+	ldr	r0, =__bss_start	/* this is auto-relocated! */
+	ldr	r1, =__bss_end__	/* this is auto-relocated! */
+
+	mov	r2, #0x00000000		/* prepare zero to clear BSS */
+
+clbss_l:cmp	r0, r1			/* while not at end of BSS */
+	strlo	r2, [r0]		/* clear 32-bit BSS word */
+	addlo	r0, r0, #4		/* move to next */
+	blo	clbss_l
+
+	bl coloured_LED_init
+	bl red_led_on
+
+#if defined(CONFIG_NAND_SPL)
+
+	/* call nand_boot() */
+	ldr     pc, =nand_boot
+
+#else
+
+	/* call board_init_r(gd_t *id, ulong dest_addr) */
+	mov	r0, r8			/* gd_t */
+	ldr	r1, [r8, #GD_RELOCADDR]	/* dest_addr */
+	/* call board_init_r */
+	ldr	pc, =board_init_r	/* this is auto-relocated! */
+
+#endif
+
+	/* we should not return here. */
+
+#endif
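
The initial stack/GD placement done at the top of _main in crt0.S can be modelled in C roughly as follows. This is only an illustrative sketch, not code from the patch: the names `initial_env` and `struct initial_env` are hypothetical, and the caller supplies whatever values stand in for CONFIG_SYS_INIT_SP_ADDR and GD_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where sp and the GD pointer (r8) end up. */
struct initial_env {
	uint32_t sp;	/* initial stack pointer; the stack grows down from here */
	uint32_t gd;	/* base of the GD area, which sits just above the stack */
};

/*
 * Mirror the four instructions in _main:
 *   bic sp, sp, #7 ; sub sp, #GD_SIZE ; bic sp, sp, #7 ; mov r8, sp
 */
static struct initial_env initial_env(uint32_t init_sp, uint32_t gd_size)
{
	struct initial_env env;
	uint32_t sp = init_sp & ~7u;	/* align the configured address down to 8 bytes */

	assert(gd_size <= sp);		/* guard against wrap-around in this model */
	sp -= gd_size;			/* reserve room for one GD */
	sp &= ~7u;			/* re-align, since gd_size may not be a multiple of 8 */

	env.sp = sp;
	env.gd = sp;			/* GD occupies [sp, sp + gd_size) */
	return env;
}
```

Under these assumptions, the GD always lies at or above the returned sp and below the configured address, so pushes onto the downward-growing stack do not overwrite it.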
diff --git a/include/common.h b/include/common.h
index 5e3c5ee..69a1ccd 100644
--- a/include/common.h
+++ b/include/common.h
@@ -300,7 +300,7 @@  int	abortboot(int bootdelay);
 extern char console_buffer[];
 
 /* arch/$(ARCH)/lib/board.c */
-void	board_init_f  (ulong) __attribute__ ((noreturn));
+void	board_init_f(ulong);
 void	board_init_r  (gd_t *, ulong) __attribute__ ((noreturn));
 int	checkboard    (void);
 int	checkflash    (void);
diff --git a/include/configs/socfpga_cyclone5.h b/include/configs/socfpga_cyclone5.h
index 42077bd..5633d2a 100644
--- a/include/configs/socfpga_cyclone5.h
+++ b/include/configs/socfpga_cyclone5.h
@@ -65,7 +65,7 @@ 
 #define CONFIG_SYS_BOOTMAPSZ		((256*1024*1024) - (4*1024))
 
 #define CONFIG_SPL_RAM_DEVICE
-#define CONFIG_SPL_STACK (&__stack_start)
+#define CONFIG_SPL_STACK CONFIG_SYS_INIT_SP_ADDR
 #define CONFIG_SYS_SPL_MALLOC_START ((unsigned long) (&__malloc_start))
 #define CONFIG_SYS_SPL_MALLOC_SIZE (&__malloc_end - &__malloc_start)
 
diff --git a/lib/asm-offsets.c b/lib/asm-offsets.c
index c88f5d4..764e27b 100644
--- a/lib/asm-offsets.c
+++ b/lib/asm-offsets.c
@@ -28,5 +28,15 @@  int main(void)
 	DEFINE(GENERATED_BD_INFO_SIZE,
 		(sizeof(struct bd_info) + 15) & ~15);
 
+	DEFINE(GD_SIZE, sizeof(struct global_data));
+
+	DEFINE(GD_BD, offsetof(struct global_data, bd));
+
+	DEFINE(GD_RELOCADDR, offsetof(struct global_data, relocaddr));
+
+	DEFINE(GD_RELOC_OFF, offsetof(struct global_data, reloc_off));
+
+	DEFINE(GD_START_ADDR_SP, offsetof(struct global_data, start_addr_sp));
+
 	return 0;
 }
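
For readers unfamiliar with asm-offsets, the sketch below shows the idea behind the new GD_* constants: sizeof/offsetof values are computed in C so that assembly can address structure fields by byte offset. It is a hedged illustration only. `struct example_gd`, the `EX_GD_*` names and `load_field` are hypothetical, and the real `struct global_data` layout differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct global_data. */
struct example_gd {
	void *bd;
	unsigned long flags;
	unsigned long relocaddr;
	unsigned long start_addr_sp;
	unsigned long reloc_off;
};

/* What DEFINE(GD_..., ...) would yield for this stand-in structure. */
enum {
	EX_GD_SIZE          = sizeof(struct example_gd),
	EX_GD_RELOCADDR     = offsetof(struct example_gd, relocaddr),
	EX_GD_RELOC_OFF     = offsetof(struct example_gd, reloc_off),
	EX_GD_START_ADDR_SP = offsetof(struct example_gd, start_addr_sp)
};

/* C analogue of "ldr rX, [r8, #offset]": read one word at a byte offset from gd. */
static unsigned long load_field(const struct example_gd *gd, size_t offset)
{
	unsigned long value;

	assert(offset + sizeof(value) <= sizeof(*gd));
	memcpy(&value, (const char *)gd + offset, sizeof(value));
	return value;
}
```

With this model, `load_field(gd, EX_GD_RELOCADDR)` returns the same value as `gd->relocaddr`, which is what crt0.S relies on when it loads `[r8, #GD_RELOCADDR]`.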
diff --git a/nand_spl/board/freescale/mx31pdk/Makefile b/nand_spl/board/freescale/mx31pdk/Makefile
index 43e72c4..2793aef 100644
--- a/nand_spl/board/freescale/mx31pdk/Makefile
+++ b/nand_spl/board/freescale/mx31pdk/Makefile
@@ -12,11 +12,12 @@  LDFLAGS := -T $(nandobj)u-boot.lds -Ttext $(CONFIG_SYS_TEXT_BASE) $(LDFLAGS) \
 AFLAGS	+= -DCONFIG_SPL_BUILD -DCONFIG_NAND_SPL
 CFLAGS	+= -DCONFIG_SPL_BUILD -DCONFIG_NAND_SPL
 
-SOBJS	= start.o lowlevel_init.o
+SOBJS	= start.o crt0.o lowlevel_init.o
 COBJS	= nand_boot_fsl_nfc.o
 
 SRCS	:= $(SRCTREE)/nand_spl/nand_boot_fsl_nfc.c
 SRCS	+= $(SRCTREE)/arch/arm/cpu/arm1136/start.S
+SRCS	+= $(SRCTREE)/arch/arm/lib/crt0.S
 SRCS	+= $(SRCTREE)/board/freescale/mx31pdk/lowlevel_init.S
 OBJS	:= $(addprefix $(obj),$(SOBJS) $(COBJS))
 __OBJS	:= $(SOBJS) $(COBJS)
@@ -49,6 +50,9 @@  $(nandobj)u-boot.lds: $(LDSCRIPT) $(LSTSCRIPT)
 $(obj)%.o:	$(SRCTREE)/arch/arm/cpu/arm1136/%.S
 	$(CC) $(AFLAGS) -c -o $@ $<
 
+$(obj)%.o:	$(SRCTREE)/arch/arm/lib/%.S
+	$(CC) $(AFLAGS) -c -o $@ $<
+
 $(obj)%.o:	$(SRCTREE)/board/freescale/mx31pdk/%.S
 	$(CC) $(AFLAGS) -c -o $@ $<
 
diff --git a/nand_spl/board/karo/tx25/Makefile b/nand_spl/board/karo/tx25/Makefile
index becf7fa..825f06d 100644
--- a/nand_spl/board/karo/tx25/Makefile
+++ b/nand_spl/board/karo/tx25/Makefile
@@ -33,11 +33,12 @@  LDFLAGS := -T $(nandobj)u-boot.lds -Ttext $(CONFIG_SYS_TEXT_BASE) $(LDFLAGS) \
 AFLAGS	+= -DCONFIG_SPL_BUILD -DCONFIG_NAND_SPL
 CFLAGS	+= -DCONFIG_SPL_BUILD -DCONFIG_NAND_SPL
 
-SOBJS	= start.o lowlevel_init.o
+SOBJS	= start.o crt0.o lowlevel_init.o
 COBJS	= nand_boot_fsl_nfc.o
 
 SRCS	:= $(SRCTREE)/nand_spl/nand_boot_fsl_nfc.c
 SRCS	+= $(SRCTREE)/arch/arm/cpu/arm926ejs/start.S
+SRCS	+= $(SRCTREE)/arch/arm/lib/crt0.S
 SRCS	+= $(SRCTREE)/board/karo/tx25/lowlevel_init.S
 OBJS	:= $(addprefix $(obj),$(SOBJS) $(COBJS))
 __OBJS	:= $(SOBJS) $(COBJS)
@@ -70,6 +71,9 @@  $(nandobj)u-boot.lds: $(LDSCRIPT) $(LSTSCRIPT)
 $(obj)%.o:	$(SRCTREE)/arch/arm/cpu/arm926ejs/%.S
 	$(CC) $(AFLAGS) -c -o $@ $<
 
+$(obj)%.o:	$(SRCTREE)/arch/arm/lib/%.S
+	$(CC) $(AFLAGS) -c -o $@ $<
+
 $(obj)%.o:	$(SRCTREE)/board/karo/tx25/%.S
 	$(CC) $(AFLAGS) -c -o $@ $<