
[U-Boot,v8,31/31] arm: Remove duplicated start.S code

Message ID 1362139841-16540-31-git-send-email-benoit.thebaudeau@advansee.com
State Superseded
Delegated to: Stefano Babic

Commit Message

Benoît Thébaudeau March 1, 2013, 12:10 p.m. UTC
Factorize start.S code as much as possible.

Functions that may need to be customized for some start.S are defined weak for
that purpose.

relocate_code_prepare() and relocate_code_finish() are introduced as hooks to be
executed at the beginning and at the end of relocate_code() if needed by some
start.S, e.g. for special cache or MMU operations.
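A minimal host-side C model of the hook arrangement described above follows. It assumes GCC/Clang's `__attribute__((weak))`. The real code is ARM assembly in start.S, and `relocate_code_model()` with its plain byte copy is a hypothetical stand-in. The weak default hooks do nothing. A CPU that needs cache or MMU work would supply strong definitions with the same names, and the linker picks those over the weak ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Weak default hooks: empty. An object file that defines
 * relocate_code_prepare() or relocate_code_finish() without the weak
 * attribute overrides these at link time. */
__attribute__((weak)) void relocate_code_prepare(void) { }
__attribute__((weak)) void relocate_code_finish(void) { }

/* Hypothetical C model of relocate_code(): the hooks bracket the copy
 * (and, in the real code, the relocation fixups). */
void relocate_code_model(void *dst, const void *src, size_t len)
{
	assert(dst != NULL && src != NULL);
	relocate_code_prepare();  /* e.g. special cache setup (PXA) */
	memmove(dst, src, len);   /* copy the image to its new location */
	relocate_code_finish();   /* e.g. cache or MMU maintenance */
}
```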

Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
---
Changes in v8:
 - New patch.

Changes in v7: None
Changes in v6: None
Changes in v5: None
Changes in v4: None
Changes in v3: None
Changes in v2: None

 arch/arm/cpu/arm1136/start.S       |  130 +--------------------------
 arch/arm/cpu/arm1176/start.S       |  113 +----------------------
 arch/arm/cpu/arm720t/start.S       |  124 +-------------------------
 arch/arm/cpu/arm920t/start.S       |  125 +-------------------------
 arch/arm/cpu/arm925t/start.S       |  124 +-------------------------
 arch/arm/cpu/arm926ejs/mxs/start.S |   45 +---------
 arch/arm/cpu/arm926ejs/start.S     |  131 +--------------------------
 arch/arm/cpu/arm946es/start.S      |  124 +-------------------------
 arch/arm/cpu/arm_intcm/start.S     |  124 +-------------------------
 arch/arm/cpu/armv7/start.S         |  117 +-----------------------
 arch/arm/cpu/ixp/start.S           |  124 +-------------------------
 arch/arm/cpu/pxa/start.S           |  141 +++--------------------------
 arch/arm/cpu/s3c44b0/start.S       |  124 +-------------------------
 arch/arm/cpu/sa1100/start.S        |  124 +-------------------------
 arch/arm/include/asm/start_macro.S |  172 ++++++++++++++++++++++++++++++++++++
 arch/arm/lib/crt0.S                |   21 +++++
 16 files changed, 241 insertions(+), 1622 deletions(-)
 create mode 100644 arch/arm/include/asm/start_macro.S

Comments

Albert ARIBAUD March 1, 2013, 3:46 p.m. UTC | #1
Hi Benoît,

On Fri,  1 Mar 2013 13:10:40 +0100, Benoît Thébaudeau
<benoit.thebaudeau@advansee.com> wrote:

> Factorize start.S code as much as possible.
> 
> Functions that may need to be customized for some start.S are defined weak for
> that purpose.
> 
> relocate_code_prepare() and relocate_code_finish() are introduced as hooks to be
> executed at the beginning and at the end of relocate_code() if needed by some
> start.S, e.g. for special cache or MMU operations.

NAK.

1. I don't like this idea of planting hooks inside relocate_code().
This function is about relocating code, not about MMU stuff. If there
are any MMU steps to be performed between calls to board_init_f(),
relocate_code() or board_init_r(), I want them laid out as calls of
their own right in crt0.S.
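Point 1 can be illustrated with a rough C model of the crt0.S sequence, in which any cache or MMU step is a separate, explicitly named call between board_init_f(), relocate_code() and board_init_r(). This is only a sketch. The real crt0.S is assembly and these functions take arguments. The stubs here just record their call order, and `enable_icache_for_reloc()` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the boot steps run (host-side model only). */
static char trace[128];

static void step(const char *name)
{
	assert(strlen(trace) + strlen(name) + 2 < sizeof trace);
	strcat(trace, name);
	strcat(trace, ";");
}

/* Stubs standing in for the real U-Boot entry points. */
static void board_init_f(void)            { step("board_init_f"); }
static void enable_icache_for_reloc(void) { step("icache"); } /* hypothetical */
static void relocate_code(void)           { step("relocate_code"); }
static void board_init_r(void)            { step("board_init_r"); }

/* crt0-style sequence: each non-relocation step is its own call,
 * rather than a hook hidden inside relocate_code(). */
void crt0_model(int need_icache)
{
	trace[0] = '\0';
	board_init_f();
	if (need_icache)
		enable_icache_for_reloc();
	relocate_code();
	board_init_r();
}
```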

2. If we're going to factorize out relocate_code() from the various
start.S files, I want it moved not in crt0.S but in its own file. This
way, i) people can easily create binaries which use crt0.S but do not
relocate, ii) people who want to make relocate_code() a C function
will have it easier, and iii) crt0.S keeps being the ugly ASM glue
needed for flash inits, relocation and RAM inits to have a C proper
run-time environment.
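Point ii) mentions turning relocate_code() into a C function. The sketch below gives a rough idea of that. It assumes the image is copied to `start + off` and that .rel.dyn holds only R_ARM_RELATIVE (type 23) entries, where each fixup adds the relocation offset to a word that holds a link-time address. To keep it runnable on a host, memory is simulated by an array indexed by 32-bit "addresses". `relocate_image()`, `word_at()` and `struct rel32` are hypothetical names, not U-Boot's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define R_ARM_RELATIVE 23u  /* ARM ELF relocation type number */

/* One .rel.dyn entry (same layout as Elf32_Rel). */
struct rel32 {
	uint32_t r_offset;  /* link-time address of the word to fix up */
	uint32_t r_info;    /* low 8 bits: relocation type */
};

/* Simulated 32-bit address space so the sketch runs on a host. */
#define MEM_WORDS 64
static uint32_t mem[MEM_WORDS];

static uint32_t *word_at(uint32_t addr)
{
	assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
	return &mem[addr / 4];
}

/* Hypothetical C relocate_code(): copy [start, end) to start + off,
 * then apply R_ARM_RELATIVE fixups in the relocated copy. */
static void relocate_image(uint32_t start, uint32_t end, uint32_t off,
			   const struct rel32 *rel, size_t nrel)
{
	memmove(word_at(start + off), word_at(start), end - start);
	for (size_t i = 0; i < nrel; i++) {
		if ((rel[i].r_info & 0xff) != R_ARM_RELATIVE)
			continue;  /* other types are not handled in this sketch */
		*word_at(rel[i].r_offset + off) += off;
	}
}
```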

Incidentally, CC:ing Simon:

> Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
> ---
> Changes in v8:
>  - New patch.
> 
> Changes in v7: None
> Changes in v6: None
> Changes in v5: None
> Changes in v4: None
> Changes in v3: None
> Changes in v2: None

Is this produced by patman? If so, is there a way to avoid it? We
*know* it's a new patch in v8, so any history pre-v8 here is
meaningless.

Kind regards,
Benoît Thébaudeau March 1, 2013, 3:50 p.m. UTC | #2
Hi Albert,

On Friday, March 1, 2013 4:46:07 PM, Albert ARIBAUD wrote:
> Hi Benoît,
> 
> On Fri,  1 Mar 2013 13:10:40 +0100, Benoît Thébaudeau
> <benoit.thebaudeau@advansee.com> wrote:
> 
> > Factorize start.S code as much as possible.
> > 
> > Functions that may need to be customized for some start.S are defined weak
> > for that purpose.
> > 
> > relocate_code_prepare() and relocate_code_finish() are introduced as hooks
> > to be executed at the beginning and at the end of relocate_code() if needed
> > by some start.S, e.g. for special cache or MMU operations.
> 
> NAK.
> 
> 1. I don't like this idea of planting hooks inside relocate_code().
> This function is about relocating code, not about MMU stuff. If there
> are any MMU steps to be performed between calls to board_init_f(),
> relocate_code() or board_init_r(), I want them laid out as calls of
> their own right in crt0.S.

I also don't like it. The finish hook was used by SMDK6400 before it was
removed, and the prepare hook is still used by pxa.

So is it OK for you if I just drop relocate_code_finish() and move and
rename the call to relocate_code_prepare() to crt0.S?

> 2. If we're going to factorize out relocate_code() from the various
> start.S files, I want it moved not in crt0.S but in its own file.

It is not in crt0.S, but in arch/arm/include/asm/start_macro.S, which is
almost its own file apart from another macro.

> This
> way, i) people can easily create binaries which use crt0.S but do not
> relocate, ii) people who want to make relocate_code() a C function
> will have it easier, and iii) crt0.S keeps being the ugly ASM glue
> needed for flash inits, relocation and RAM inits to have a C proper
> run-time environment.

Which is already the case with this implementation?

> Incidentally, CC:ing Simon:
> 
> > Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
> > ---
> > Changes in v8:
> >  - New patch.
> > 
> > Changes in v7: None
> > Changes in v6: None
> > Changes in v5: None
> > Changes in v4: None
> > Changes in v3: None
> > Changes in v2: None
> 
> Is this produced by patman?

Yes.

> If so, is there a way to avoid it?

Not that I know of.

> We
> *know* it's a new patch in v8, so any history pre-v8 here is
> meaningless.

Sure.

Best regards,
Benoît
Benoît Thébaudeau March 1, 2013, 4:33 p.m. UTC | #3
On Friday, March 1, 2013 4:50:44 PM, Benoît Thébaudeau wrote:
> Hi Albert,
> 
> On Friday, March 1, 2013 4:46:07 PM, Albert ARIBAUD wrote:
> > Hi Benoît,
> > 
> > On Fri,  1 Mar 2013 13:10:40 +0100, Benoît Thébaudeau
> > <benoit.thebaudeau@advansee.com> wrote:
> > 
> > > Factorize start.S code as much as possible.
> > > 
> > > Functions that may need to be customized for some start.S are defined
> > > weak for that purpose.
> > > 
> > > relocate_code_prepare() and relocate_code_finish() are introduced as
> > > hooks to be executed at the beginning and at the end of relocate_code()
> > > if needed by some start.S, e.g. for special cache or MMU operations.
> > 
> > NAK.
> > 
> > 1. I don't like this idea of planting hooks inside relocate_code().
> > This function is about relocating code, not about MMU stuff. If there
> > are any MMU steps to be performed between calls to board_init_f(),
> > relocate_code() or board_init_r(), I want them laid out as calls of
> > their own right in crt0.S.
> 
> I also don't like it. The finish hook was used by SMDK6400 before it was
> removed, and the prepare hook is still used by pxa.
> 
> So is it OK for you if I just drop relocate_code_finish() and move and
> rename the call to relocate_code_prepare() to crt0.S?
> 
> > 2. If we're going to factorize out relocate_code() from the various
> > start.S files, I want it moved not in crt0.S but in its own file.
> 
> It is not in crt0.S, but in arch/arm/include/asm/start_macro.S, which is
> almost its own file apart from another macro.
> 
> > This
> > way, i) people can easily create binaries which use crt0.S but do not
> > relocate, ii) people who want to make relocate_code() a C function
> > will have it easier, and iii) crt0.S keeps being the ugly ASM glue
> > needed for flash inits, relocation and RAM inits to have a C proper
> > run-time environment.
> 
> Which is already the case with this implementation?

And in case you ask, with relocate_code() as a function in its own file instead
of a macro called from start.S, it does not work because of the _start-relative
word values that require relocate_code() to be in _start's section.

Best regards,
Benoît
Albert ARIBAUD March 1, 2013, 9:56 p.m. UTC | #4
Hi Benoît,

On Fri, 1 Mar 2013 16:50:44 +0100 (CET), Benoît Thébaudeau
<benoit.thebaudeau@advansee.com> wrote:

> Hi Albert,
> 
> On Friday, March 1, 2013 4:46:07 PM, Albert ARIBAUD wrote:
> > Hi Benoît,
> > 
> > On Fri,  1 Mar 2013 13:10:40 +0100, Benoît Thébaudeau
> > <benoit.thebaudeau@advansee.com> wrote:
> > 
> > > Factorize start.S code as much as possible.
> > > 
> > > Functions that may need to be customized for some start.S are defined weak
> > > for that purpose.
> > > 
> > > relocate_code_prepare() and relocate_code_finish() are introduced as hooks
> > > to be executed at the beginning and at the end of relocate_code() if needed
> > > by some start.S, e.g. for special cache or MMU operations.
> > 
> > NAK.
> > 
> > 1. I don't like this idea of planting hooks inside relocate_code().
> > This function is about relocating code, not about MMU stuff. If there
> > are any MMU steps to be performed between calls to board_init_f(),
> > relocate_code() or board_init_r(), I want them laid out as calls of
> > their own right in crt0.S.
> 
> I also don't like it. The finish hook was used by SMDK6400 before it was
> removed, and the prepare hook is still used by pxa.
> 
> So is it OK for you if I just drop relocate_code_finish() and move and
> rename the call to relocate_code_prepare() to crt0.S?

Fine, except for the name: "prepare for relocation" is what every
instruction does from board_init_f() return to relocate_code() entry.
This 'hook' does only a small part, if at all, of preparing for
relocation, and this part must get a less improper name. If we are
enabling the I-cache here, then let's name the function accordingly.
Better yet, let us find out if we do need to enable the instruction
cache here at all.

> > 2. If we're going to factorize out relocate_code() from the various
> > start.S files, I want it moved not in crt0.S but in its own file.
> 
> It is not in crt0.S, but in arch/arm/include/asm/start_macro.S, which is
> almost its own file apart from another macro.

I do not want it as a macro. It is and should stay a function.
Regarding your added comment:

Actually, I'd stopped dead at the relocate_code() changes, but the
other macros I don't like much either; I don't see the point of it.

To be fair, I don't see the point of the whole patch wrt the
objective.

> And in case you ask, with relocate_code() as a function in its own
> file instead of a macro called from start.S, it does not work because
> of the _start-relative word values that require relocate_code() to be
> in _start's section.

How does it "not work" exactly?

> > This
> > way, i) people can easily create binaries which use crt0.S but do not
> > relocate, ii) people who want to make relocate_code() a C function
> > will have it easier, and iii) crt0.S keeps being the ugly ASM glue
> > needed for flash inits, relocation and RAM inits to have a C proper
> > run-time environment.
> 
> Which is already the case with this implementation?

Not with relocate_code() as a macro, though.

This whole thing/way of "factorizing" start.S using macros feels wrong
to me; this departs from what I have started with crt0.S, where code
is actually really factorized in a single file which is actually a
compilation unit, not a helper file.

Do you need patch 31/31 in your series?

> > Incidentally, CC:ing Simon:
> > 
> > > Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
> > > ---
> > > Changes in v8:
> > >  - New patch.
> > > 
> > > Changes in v7: None
> > > Changes in v6: None
> > > Changes in v5: None
> > > Changes in v4: None
> > > Changes in v3: None
> > > Changes in v2: None
> > 
> > Is this produced by patman?
> 
> Yes [...]

Ok, then, don't bother to fix patman's behavior manually in your
own patches -- I'll try and see if I can submit a patch to fix patman
itself.

> Best regards,
> Benoît

Kind regards,
Albert ARIBAUD March 1, 2013, 10:02 p.m. UTC | #5
Hi Albert,

On Fri, 1 Mar 2013 22:56:50 +0100, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:

> Hi Benoît,
> 
> On Fri, 1 Mar 2013 16:50:44 +0100 (CET), Benoît Thébaudeau
> <benoit.thebaudeau@advansee.com> wrote:
> 
> > Hi Albert,
> > 
> > On Friday, March 1, 2013 4:46:07 PM, Albert ARIBAUD wrote:
> > > Hi Benoît,
> > > 
> > > On Fri,  1 Mar 2013 13:10:40 +0100, Benoît Thébaudeau
> > > <benoit.thebaudeau@advansee.com> wrote:
> > > 
> > > > Factorize start.S code as much as possible.
> > > > 
> > > > Functions that may need to be customized for some start.S are defined weak
> > > > for that purpose.
> > > > 
> > > > relocate_code_prepare() and relocate_code_finish() are introduced as hooks
> > > > to be executed at the beginning and at the end of relocate_code() if needed
> > > > by some start.S, e.g. for special cache or MMU operations.
> > > 
> > > NAK.
> > > 
> > > 1. I don't like this idea of planting hooks inside relocate_code().
> > > This function is about relocating code, not about MMU stuff. If there
> > > are any MMU steps to be performed between calls to board_init_f(),
> > > relocate_code() or board_init_r(), I want them laid out as calls of
> > > their own right in crt0.S.
> > 
> > I also don't like it. The finish hook was used by SMDK6400 before it was
> > removed, and the prepare hook is still used by pxa.
> > 
> > So is it OK for you if I just drop relocate_code_finish() and move and
> > rename the call to relocate_code_prepare() to crt0.S?
> 
> Fine, except for the name: "prepare for relocation" is what every
> instruction does from board_init_f() return to relocate_code() entry.
> This 'hook' does only a small part, if at all, of preparing for
> relocation, and this part must get a less improper name. If we are
> enabling the I-cache here, then let's name the function accordingly.
> Better yet, let us find out if we do need to enable the instruction
> cache here at all.
> 
> > > 2. If we're going to factorize out relocate_code() from the various
> > > start.S files, I want it moved not in crt0.S but in its own file.
> > 
> > It is not in crt0.S, but in arch/arm/include/asm/start_macro.S, which is
> > almost its own file apart from another macro.
> 
> I do not want it as a macro. It is and should stay a function.
> Regarding your added comment:

Sorry for the mixup here. Drop the "Regarding..." just above...

> Actually, I'd stopped dead at the relocate_code() changes, but the
> other macros I don't like much either; I don't see the point of it.
> 
> To be fair, I don't see the point of the whole patch wrt the
> objective.

... it should go here.
 
> > And in case you ask, with relocate_code() as a function in its own
> > file instead of a macro called from start.S, it does not work because
> > of the _start-relative word values that require relocate_code() to be
> > in _start's section.
> 
> How does it "not work" exactly?
> 
> > > This
> > > way, i) people can easily create binaries which use crt0.S but do not
> > > relocate, ii) people who want to make relocate_code() a C function
> > > will have it easier, and iii) crt0.S keeps being the ugly ASM glue
> > > needed for flash inits, relocation and RAM inits to have a C proper
> > > run-time environment.
> > 
> > Which is already the case with this implementation?
> 
> Not with relocate_code() as a macro, though.
> 
> This whole thing/way of "factorizing" start.S using macros feels wrong
> to me; this departs from what I have started with crt0.S, where code
> is actually really factorized in a single file which is actually a
> compilation unit, not a helper file.
> 
> Do you need patch 31/31 in your series?
> 
> > > Incidentally, CC:ing Simon:
> > > 
> > > > Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
> > > > ---
> > > > Changes in v8:
> > > >  - New patch.
> > > > 
> > > > Changes in v7: None
> > > > Changes in v6: None
> > > > Changes in v5: None
> > > > Changes in v4: None
> > > > Changes in v3: None
> > > > Changes in v2: None
> > > 
> > > Is this produced by patman?
> > 
> > Yes [...]
> 
> Ok, then, don't bother to fix patman's behavior manually in your
> own patches -- I'll try and see if I can submit a patch to fix patman
> itself.
> 
> > Best regards,
> > Benoît
> 
> Kind regards,


Kind regards,
Benoît Thébaudeau March 1, 2013, 10:54 p.m. UTC | #6
Hi Albert,

On Friday, March 1, 2013 10:56:50 PM, Albert ARIBAUD wrote:
> Hi Benoît,
> 
> On Fri, 1 Mar 2013 16:50:44 +0100 (CET), Benoît Thébaudeau
> <benoit.thebaudeau@advansee.com> wrote:
> 
> > Hi Albert,
> > 
> > On Friday, March 1, 2013 4:46:07 PM, Albert ARIBAUD wrote:
> > > Hi Benoît,
> > > 
> > > On Fri,  1 Mar 2013 13:10:40 +0100, Benoît Thébaudeau
> > > <benoit.thebaudeau@advansee.com> wrote:
> > > 
> > > > Factorize start.S code as much as possible.
> > > > 
> > > > Functions that may need to be customized for some start.S are defined
> > > > weak for that purpose.
> > > > 
> > > > relocate_code_prepare() and relocate_code_finish() are introduced as
> > > > hooks to be executed at the beginning and at the end of relocate_code()
> > > > if needed by some start.S, e.g. for special cache or MMU operations.
> > > 
> > > NAK.
> > > 
> > > 1. I don't like this idea of planting hooks inside relocate_code().
> > > This function is about relocating code, not about MMU stuff. If there
> > > are any MMU steps to be performed between calls to board_init_f(),
> > > relocate_code() or board_init_r(), I want them laid out as calls of
> > > their own right in crt0.S.
> > 
> > I also don't like it. The finish hook was used by SMDK6400 before it was
> > removed, and the prepare hook is still used by pxa.
> > 
> > So is it OK for you if I just drop relocate_code_finish() and move and
> > rename the call to relocate_code_prepare() to crt0.S?
> 
> Fine, except for the name: "prepare for relocation" is what every
> instruction does from board_init_f() return to relocate_code() entry.
> This 'hook' does only a small part, if at all, of preparing for
> relocation, and this part must get a less improper name. If we are
> enabling the I-cache here, then let's name the function accordingly.
> Better yet, let us find out if we do need to enable the instruction
> cache here at all.

Correct. For PXA25X, this cpu_init_crit() is paired with lock_cache_for_stack(),
apparently for some hardware hack, but it is not very clear whether this is
required or whether it could be done otherwise. Do you know a PXA expert?

Marek, you introduced that in commit 7f4cfcf. Do you know the details about why?

If we could drop it or move it elsewhere, that would be great.

> > > 2. If we're going to factorize out relocate_code() from the various
> > > start.S files, I want it moved not in crt0.S but in its own file.
> > 
> > It is not in crt0.S, but in arch/arm/include/asm/start_macro.S, which is
> > almost its own file apart from another macro.
> 
> I do not want it as a macro. It is and should stay a function.
> Regarding your added comment:
> 
> Actually, I'd stopped dead at the relocate_code() changes, but the
> other macros I don't like much either; I don't see the point of it.
> 
> To be fair, I don't see the point of the whole patch wrt the
> objective.

This patch is just appended to the series because it depends on it, not because
the series needs it.

This patch only aims at cleaning up code by removing copied/pasted code in order
to simplify maintenance and to clarify things.

There have been many changes in this relocate_code(), and not always the same
for all start.S, so from the outside, the purpose is unclear because one might
wonder if those differences have been created on purpose or not.

> > And in case you ask, with relocate_code() as a function in its own
> > file instead of a macro called from start.S, it does not work because
> > of the _start-relative word values that require relocate_code() to be
> > in _start's section.
> 
> How does it "not work" exactly?

The assembler issues an error (I don't remember the exact message) for all lines
like ".word __rel_dyn_start - _start", complaining that this is not a kind of
expression that it can prepare for the linker to resolve.

Though, it would still be possible to find a more complicated way of doing the
same thing at runtime.
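A `.word __rel_dyn_start - _start` line asks the assembler for a constant difference between two symbols. As far as GNU as behaviour goes, it can only reduce such an expression to a value plus a single relocation when `_start` is defined in the same section of the same object file. That fits the requirement above that relocate_code() live in `_start`'s section. The "more complicated" runtime alternative could look roughly like the sketch below. It keeps the absolute link-time addresses, which the linker can fill in, and combines them with the address `_start` actually runs from, which real code would obtain PC-relatively. `runtime_addr()` and its parameters are hypothetical, and the addresses in the test are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: run-time address of a linker-provided symbol.
 *   run_start:  address _start actually runs from (PC-relative in real code)
 *   link_start: link-time address of _start (absolute, filled in by the linker)
 *   link_sym:   link-time address of the symbol, e.g. __rel_dyn_start
 * The difference is computed at run time instead of by the assembler. */
static uint32_t runtime_addr(uint32_t run_start, uint32_t link_start,
			     uint32_t link_sym)
{
	assert(link_sym >= link_start);  /* symbol lies after _start in the image */
	return run_start + (link_sym - link_start);
}
```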

> > > This
> > > way, i) people can easily create binaries which use crt0.S but do not
> > > relocate, ii) people who want to make relocate_code() a C function
> > > will have it easier, and iii) crt0.S keeps being the ugly ASM glue
> > > needed for flash inits, relocation and RAM inits to have a C proper
> > > run-time environment.
> > 
> > Which is already the case with this implementation?
> 
> Not with relocate_code() as a macro, though.
> 
> This whole thing/way of "factorizing" start.S using macros feels wrong
> to me; this departs from what I have started with crt0.S, where code
> is actually really factorized in a single file which is actually a
> compilation unit, not a helper file.

Yes. Well, for this patch, I had first moved relocate_code() to crt0.S (could
have been its own file), but I eventually switched to the macro solution because
of the assembler errors.

> Do you need patch 31/31 in your series?

As explained above, no. But I really think that something should be done to stop
relocate_code() duplication, one way or another, by me or someone else. I just
wanted to help here.

> > > Incidentally, CC:ing Simon:
> > > 
> > > > Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
> > > > ---
> > > > Changes in v8:
> > > >  - New patch.
> > > > 
> > > > Changes in v7: None
> > > > Changes in v6: None
> > > > Changes in v5: None
> > > > Changes in v4: None
> > > > Changes in v3: None
> > > > Changes in v2: None
> > > 
> > > Is this produced by patman?
> > 
> > Yes [...]
> 
> Ok, then, don't bother to fix patman's behavior manually in your
> own patches -- I'll try and see if I can submit a patch to fix patman
> itself.

OK.

patman had also removed some "Reviewed-by" that I had to restore manually before
sending. This is a documented behavior, but not cool.

And contrary to what the documentation says, patman adds my SoB line even if I
have forced another SoB in the commit message, which I also had to fix manually.

Best regards,
Benoît
Simon Glass March 2, 2013, 12:22 a.m. UTC | #7
Hi,

On Fri, Mar 1, 2013 at 2:54 PM, Benoît Thébaudeau
<benoit.thebaudeau@advansee.com> wrote:
> Hi Albert,
>
> On Friday, March 1, 2013 10:56:50 PM, Albert ARIBAUD wrote:
>> Hi Benoît,
>>
>> On Fri, 1 Mar 2013 16:50:44 +0100 (CET), Benoît Thébaudeau
>> <benoit.thebaudeau@advansee.com> wrote:
>>
>> > Hi Albert,
>> >
>> > On Friday, March 1, 2013 4:46:07 PM, Albert ARIBAUD wrote:
>> > > Hi Benoît,
>> > >
>> > > On Fri,  1 Mar 2013 13:10:40 +0100, Benoît Thébaudeau
>> > > <benoit.thebaudeau@advansee.com> wrote:
>> > >
>> > > > Factorize start.S code as much as possible.
>> > > >
>> > > > Functions that may need to be customized for some start.S are defined
>> > > > weak for that purpose.
>> > > >
>> > > > relocate_code_prepare() and relocate_code_finish() are introduced as
>> > > > hooks to be executed at the beginning and at the end of relocate_code()
>> > > > if needed by some start.S, e.g. for special cache or MMU operations.
>> > >
>> > > NAK.
>> > >
>> > > 1. I don't like this idea of planting hooks inside relocate_code().
>> > > This function is about relocating code, not about MMU stuff. If there
>> > > are any MMU steps to be performed between calls to board_init_f(),
>> > > relocate_code() or board_init_r(), I want them laid out as calls of
>> > > their own right in crt0.S.
>> >
>> > I also don't like it. The finish hook was used by SMDK6400 before it was
>> > removed, and the prepare hook is still used by pxa.
>> >
>> > So is it OK for you if I just drop relocate_code_finish() and move and
>> > rename the call to relocate_code_prepare() to crt0.S?
>>
>> Fine, except for the name: "prepare for relocation" is what every
>> instruction does from board_init_f() return to relocate_code() entry.
>> This 'hook' does only a small part, if at all, of preparing for
>> relocation, and this part must get a less improper name. If we are
>> enabling the I-cache here, then let's name the function accordingly.
>> Better yet, let us find out if we do need to enable the instruction
>> cache here at all.
>
> Correct. For PXA25X, this cpu_init_crit() is paired with lock_cache_for_stack(),
> apparently for some hardware hack, but it is not very clear whether this is
> required or whether it could be done otherwise. Do you know a PXA expert?
>
> Marek, you introduced that in commit 7f4cfcf. Do you know the details about why?
>
> If we could drop it or move it elsewhere, that would be great.
>
>> > > 2. If we're going to factorize out relocate_code() from the various
>> > > start.S files, I want it moved not in crt0.S but in its own file.
>> >
>> > It is not in crt0.S, but in arch/arm/include/asm/start_macro.S, which is
>> > almost its own file apart from another macro.
>>
>> I do not want it as a macro. It is and should stay a function.
>> Regarding your added comment:
>>
>> Actually, I'd stopped dead at the relocate_code() changes, but the
>> other macros I don't like much either; I don't see the point of it.
>>
>> To be fair, I don't see the point of the whole patch wrt the
>> objective.
>
> This patch is just appended to the series because it depends on it, not because
> the series needs it.
>
> This patch only aims at cleaning up code by removing copied/pasted code in order
> to simplify maintenance and to clarify things.
>
> There have been many changes in this relocate_code(), and not always the same
> for all start.S, so from the outside, the purpose is unclear because one might
> wonder if those differences have been created on purpose or not.
>
>> > And in case you ask, with relocate_code() as a function in its own
>> > file instead of a macro called from start.S, it does not work because
>> > of the _start-relative word values that require relocate_code() to be
>> > in _start's section.
>>
>> How does it "not work" exactly?
>
> The assembler issues an error (I don't remember the exact message) for all lines
> like ".word __rel_dyn_start - _start", complaining that this is not a kind of
> expression that it can prepare for the linker to resolve.
>
> Though, it would still be possible to find a more complicated way of doing the
> same thing at runtime.
>
>> > > This
>> > > way, i) people can easily create binaries which use crt0.S but do not
>> > > relocate, ii) people who want to make relocate_code() a C function
>> > > will have it easier, and iii) crt0.S keeps being the ugly ASM glue
>> > > needed for flash inits, relocation and RAM inits to have a C proper
>> > > run-time environment.
>> >
>> > Which is already the case with this implementation?
>>
>> Not with relocate_code() as a macro, though.
>>
>> This whole thing/way of "factorizing" start.S using macros feels wrong
>> to me; this departs from what I have started with crt0.S, where code
>> is actually really factorized in a single file which is actually a
>> compilation unit, not a helper file.
>
> Yes. Well, for this patch, I had first moved relocate_code() to crt0.S (could
> have been its own file), but I eventually switched to the macro solution because
> of the assembler errors.
>
>> Do you need patch 31/31 in your series?
>
> As explained above, no. But I really think that something should be done to stop
> relocate_code() duplication, one way or another, by me or someone else. I just
> wanted to help here.
>
>> > > Incidentally, CC:ing Simon:
>> > >
>> > > > Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
>> > > > ---
>> > > > Changes in v8:
>> > > >  - New patch.
>> > > >
>> > > > Changes in v7: None
>> > > > Changes in v6: None
>> > > > Changes in v5: None
>> > > > Changes in v4: None
>> > > > Changes in v3: None
>> > > > Changes in v2: None
>> > >
>> > > Is this produced by patman?
>> >
>> > Yes [...]
>>
>> Ok, then, don't bother to fix patman's behavior manually in your
>> own patches -- I'll try and see if I can submit a patch to fix patman
>> itself.
>
> OK.
>
> patman had also removed some "Reviewed-by" that I had to restore manually before
> sending. This is a documented behavior, but not cool.
>
> And contrary to what the documentation says, patman adds my SoB line even if I
> have forced another SoB in the commit message, which I also had to fix manually.

Yes I have hit this myself. Someone should do a couple of patches to
fix this. I will put it on my list in case someone else doesn't get to
it first. Specifically:

- Don't touch/add Signed-off-by:, but perhaps just warn if there is not
at least one in a patch
- Don't touch Reviewed-by: in the normal case - but perhaps provide a
flag to remove this Gerrit tag
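The first rule above could be sketched as follows: leave existing tags alone and only warn when no Signed-off-by: line is present. patman itself is written in Python, so this C code only models the proposed policy. `count_tag_lines()` and `check_signoff()` are hypothetical names, not patman functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count lines of a commit message that start with the given tag,
 * e.g. "Signed-off-by:" or "Reviewed-by:". */
static int count_tag_lines(const char *msg, const char *tag)
{
	size_t taglen = strlen(tag);
	int count = 0;
	const char *line = msg;

	assert(taglen > 0);
	while (*line) {
		if (strncmp(line, tag, taglen) == 0)
			count++;
		const char *nl = strchr(line, '\n');
		if (!nl)
			break;
		line = nl + 1;
	}
	return count;
}

/* Proposed policy: never add or remove tags, only warn when there is
 * no Signed-off-by: line. Returns 1 if a warning was printed. */
static int check_signoff(const char *msg)
{
	if (count_tag_lines(msg, "Signed-off-by:") > 0)
		return 0;
	fprintf(stderr, "warning: no Signed-off-by: line in patch\n");
	return 1;
}
```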

Regards,
Simon

>
> Best regards,
> Benoît
Benoît Thébaudeau March 2, 2013, 1:10 a.m. UTC | #8
Hi Simon,

On Saturday, March 2, 2013 1:22:28 AM, Simon Glass wrote:
> On Fri, Mar 1, 2013 at 2:54 PM, Benoît Thébaudeau
> <benoit.thebaudeau@advansee.com> wrote:
> > On Friday, March 1, 2013 10:56:50 PM, Albert ARIBAUD wrote:
> >> On Fri, 1 Mar 2013 16:50:44 +0100 (CET), Benoît Thébaudeau
> >> <benoit.thebaudeau@advansee.com> wrote:
> >> > On Friday, March 1, 2013 4:46:07 PM, Albert ARIBAUD wrote:
> >> > > On Fri,  1 Mar 2013 13:10:40 +0100, Benoît Thébaudeau
> >> > > <benoit.thebaudeau@advansee.com> wrote:
> >> > > Incidentally, CC:ing Simon:
> >> > >
> >> > > > Signed-off-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
> >> > > > ---
> >> > > > Changes in v8:
> >> > > >  - New patch.
> >> > > >
> >> > > > Changes in v7: None
> >> > > > Changes in v6: None
> >> > > > Changes in v5: None
> >> > > > Changes in v4: None
> >> > > > Changes in v3: None
> >> > > > Changes in v2: None
> >> > >
> >> > > Is this produced by patman?
> >> >
> >> > Yes [...]
> >>
> >> Ok, then, don't bother to fix patman's behavior manually in your
> >> own patches -- I'll try and see if I can submit a patch to fix patman
> >> itself.
> >
> > OK.
> >
> > patman had also removed some "Reviewed-by" that I had to restore manually
> > before sending. This is a documented behavior, but not cool.
> >
> > And contrary to what the documentation says, patman adds my SoB line even
> > if I have forced another SoB in the commit message, which I also had to
> > fix manually.
> 
> Yes I have hit this myself. Someone should do a couple of patches to
> fix this. I will put it on my list in case someone else doesn't get to
> it first. Specifically:
> 
> - Don't touch/add Signed-off-by:, but perhaps just warn if there is not
> at least one in a patch
> - Don't touch Reviewed-by: in the normal case - but perhaps provide a
> flag to remove this Gerrit tag

Thanks, that'd be great. And also a 3rd one for what Albert said (which he might
do himself):
 - Do not report version changes before a patch has been introduced: if a patch
   has been introduced in version n, start reporting version changes for this
   patch from version n. This will probably require a new tag to tell patman in
   which version a patch has been created, e.g.:

Patch-creation: n
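The output rule for the proposed tag could be modelled as shown below. "Patch-creation" is only a tag suggested in this thread, not an existing patman feature, and `format_changes()` is a hypothetical C model of the rule, not patman code. It lists "Changes in vX" entries newest first and omits every version older than the one in which the patch was created.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the proposed "Patch-creation: n" tag.
 * notes[v] is the change note for version v, or NULL for "None".
 * Versions older than `created` are not reported at all. */
static void format_changes(char *out, size_t outsz, int created, int current,
			   const char *const notes[])
{
	size_t used = 0;

	assert(outsz > 0 && created >= 2 && created <= current);
	out[0] = '\0';
	for (int v = current; v >= created; v--) {
		int n = notes[v]
			? snprintf(out + used, outsz - used,
				   "Changes in v%d:\n - %s\n\n", v, notes[v])
			: snprintf(out + used, outsz - used,
				   "Changes in v%d: None\n", v);
		assert(n >= 0 && (size_t)n < outsz - used);  /* no truncation */
		used += (size_t)n;
	}
}
```

With `Patch-creation: 8`, a patch new in v8 would then carry only its "Changes in v8" entry.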

Best regards,
Benoît
Albert ARIBAUD March 2, 2013, 6:45 a.m. UTC | #9
Hi Benoît,

On Fri, 1 Mar 2013 23:54:26 +0100 (CET), Benoît Thébaudeau
<benoit.thebaudeau@advansee.com> wrote:

Re: assembler error messages:

> > > And in case you ask, with relocate_code() as a function in its own
> > > file instead of a macro called from start.S, it does not work because
> > > of the _start-relative word values that require relocate_code() to be
> > > in _start's section.
> > 
> > How does it "not work" exactly?
> 
> The assembler issues an error (I don't remember the exact message) for all lines
> like ".word __rel_dyn_start - _start", complaining that this is not a kind of
> expression that it can prepare for the linker to resolve.
> 
> Though, it would still be possible to find a more complicated way of doing the
> same thing at runtime.

(later)

> > This whole thing/way of "factorizing" start.S using macros feels wrong
> > to me; this departs from what I have started with crt0.S, where code
> > is actually really factorized in a single file which is actually a
> > compilation unit, not a helper file.
> 
> Yes. Well, for this patch, I had first moved relocate_code() to crt0.S (could
> have been its own file), but I eventually switched to the macro solution because
> of the assembler errors.

I've had issues similar to this in the past when I implemented the ELF
relocation, the crt0.S factorization and, more recently, the R_ARM_ABS32
relocation record removal. These must be fixed with great care, for
instance so as not to produce R_ARM_ABS32 relocations, whose removal I
have just submitted a patch for. I think there is a way to
factorize relocate_code() (and other parts) out of start.S files, built
on what I did for crt0.S, which should not cause these issues.

Re: patch 31/31 generally:

> This patch is just appended to the series because it depends on it, not because
> the series needs it.
> 
> This patch only aims at cleaning up code by removing copied/pasted code in order
> to simplify maintenance and to clarify things.
> 
> There have been many changes in this relocate_code(), and not always the same
> for all start.S, so from the outside, the purpose is unclear because one might
> wonder if those differences have been created on purpose or not.

(picked up from later in the reply)

> > Do you need patch 31/31 in your series?
> 
> As explained above, no. But I really think that something should be done to stop
> relocate_code() duplication, one way or another, by me or someone else. I just
> wanted to help here.

First, let me say that I appreciate the great help that you're giving us
with this (30-patch!) series.

And I agree with the premise that the ARM startup sequence, including
but not limited to relocate_code(), is literally all over the place and
that there is a need to fix this; I wrote so in the cover letter of
my crt0.S patch series, which I consider a starting point and an example
of how this should be done.

Also, we must keep in mind that part of the code in ARM should, and
eventually will, be merged into a single U-Boot-wide version. ELF code
relocation is not ARM specific except for the two (to be reduced soon
to only one) ARM relocation record types. Thus, when we touch this
code, we must keep it close, or make it closer, to the code in other
U-Boot architectures; IIRC there are already patches out there to make
relocate_code() a single project-wide true C function.
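
For readers weighing what a "true C function" version would have to do,
here is a minimal host-side C model of the job performed by the assembly
relocate_code() that this patch moves into a macro: copy the image from
_start up to __image_copy_end to the destination, then walk .rel.dyn and
patch R_ARM_RELATIVE (23) and R_ARM_ABS32 (2) entries, ignoring other
types. It is a sketch only, not U-Boot code: relocate_model(), struct
rel32, struct sym32 and the byte-array "address space" are hypothetical
names chosen for illustration, and the hooks, cache/MMU handling and
SPL special cases are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_ARM_ABS32	2	/* absolute: set word to symbol value + offset */
#define R_ARM_RELATIVE	23	/* relative: add offset to the word */

/* One .rel.dyn entry, laid out like Elf32_Rel (8 bytes). */
struct rel32 {
	uint32_t r_offset;	/* link-time address of the word to fix */
	uint32_t r_info;	/* (symbol index << 8) | type */
};

/* One .dynsym entry, laid out like Elf32_Sym (16 bytes). */
struct sym32 {
	uint32_t st_name;
	uint32_t st_value;	/* the only field used here */
	uint32_t st_size;
	uint8_t st_info;
	uint8_t st_other;
	uint16_t st_shndx;
};

/* 32-bit accessors into the simulated address space. */
static uint32_t rd32(const uint8_t *mem, uint32_t addr)
{
	uint32_t v;

	memcpy(&v, mem + addr, sizeof(v));
	return v;
}

static void wr32(uint8_t *mem, uint32_t addr, uint32_t v)
{
	memcpy(mem + addr, &v, sizeof(v));
}

/*
 * Hypothetical model: addresses are byte offsets into mem; start is both
 * the link-time and run-time address of the image (as the assembly
 * assumes via _TEXT_BASE); rel and dynsym are the link-time tables.
 */
static void relocate_model(uint8_t *mem, uint32_t start, uint32_t dest,
			   uint32_t copy_len, const struct rel32 *rel,
			   size_t nrel, const struct sym32 *dynsym)
{
	uint32_t off = dest - start;	/* relocation offset (r9) */
	size_t i;

	if (off == 0)
		return;			/* already at dest: nothing to do */

	/* The assembly copies two words per ldmia/stmia iteration. */
	assert(copy_len % 8 == 0);
	/* memmove for safety; the assembly copies forward, no overlap check. */
	memmove(mem + dest, mem + start, copy_len);

	for (i = 0; i < nrel; i++) {
		uint32_t loc = rel[i].r_offset + off;	/* word in the copy */
		uint32_t type = rel[i].r_info & 0xff;

		if (type == R_ARM_RELATIVE) {
			wr32(mem, loc, rd32(mem, loc) + off);
		} else if (type == R_ARM_ABS32) {
			/*
			 * ELF32_R_SYM is r_info >> 8; the assembly reaches the
			 * same 16-byte entry with r_info >> 4, which equals
			 * index * 16 because type 2 has bits 4-7 clear.
			 */
			wr32(mem, loc, dynsym[rel[i].r_info >> 8].st_value + off);
		}
		/* Any other type is ignored, as in the assembly. */
	}
}
```

As in the assembly, the relocation tables are read at their link-time
location while the fixups are written into the relocated copy.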

And I think that this newly added patch 31/31 in your series does not
match either the way I want ARM startup simplification to go or the
general goal of unifying relocate_code().

Thus, if you don't mind, I'd prefer patch 31/31 to move out of the
series. And, since I want to spare anyone the hassle of going through
this again, I guess I will have to submit a patch for relocate_code()
factorization -- quite probably on top of your series, since many fixes
you make in it may be useful or even needed.

> Best regards,
> Benoît

Amicalement,
Benoît Thébaudeau March 2, 2013, 1:42 p.m. UTC | #10
Hi Albert,

On Saturday, March 2, 2013 7:45:06 AM, Albert ARIBAUD wrote:
> Hi Benoît,
> 
> On Fri, 1 Mar 2013 23:54:26 +0100 (CET), Benoît Thébaudeau
> <benoit.thebaudeau@advansee.com> wrote:
> 
> Re: assembler error messages:
> 
> > > > And in case you ask, with relocate_code() as a function in its own
> > > > file instead of a macro called from start.S, it does not work because
> > > > of the _start-relative word values that require relocate_code() to be
> > > > in _start's section.
> > > 
> > > How does it "not work" exactly?
> > 
> > The assembler issues an error (I don't remember the exact message) for all lines
> > like ".word __rel_dyn_start - _start", complaining that this is not a kind of
> > expression that it can prepare for the linker to resolve.
> > 
> > Though, it would still be possible to find a more complicated way of doing the
> > same thing at runtime.
> 
> (later)
> 
> > > This whole thing/way of "factorizing" start.S using macros feels wrong
> > > to me; this departs from what I have started with crt0.S, where code
> > > is actually really factorized in a single file which is actually a
> > > compilation unit, not a helper file.
> > 
> > Yes. Well, for this patch, I had first moved relocate_code() to crt0.S (could
> > have been its own file), but I eventually switched to the macro solution because
> > of the assembler errors.
> 
> I've had issues similar to this in the past when I implemented the ELF
> relocation, the crt0.S factorization and, more recently, the R_ARM_ABS32
> relocation record removal. These must be fixed with great care, for
> instance so as not to produce R_ARM_ABS32 relocations, whose removal I
> have just submitted a patch for. I think there is a way to
> factorize relocate_code() (and other parts) out of start.S files, built
> on what I did for crt0.S, which should not cause these issues.

Yes, I think so. In the worst case, it should be possible to access out-of-range
symbols PC-relatively at runtime, using adr or adrl extensively, instead of
relying on pre-computed _start-relative offsets.

> Re: patch 31/31 generally:
> 
> > This patch is just appended to the series because it depends on it, not because
> > the series needs it.
> > 
> > This patch only aims at cleaning up code by removing copied/pasted code in order
> > to simplify maintenance and to clarify things.
> > 
> > There have been many changes in this relocate_code(), and not always the same
> > for all start.S, so from the outside, the purpose is unclear because one might
> > wonder if those differences have been created on purpose or not.
> 
> (picked up from later in the reply)
> 
> > > Do you need patch 31/31 in your series?
> > 
> > As explained above, no. But I really think that something should be done to stop
> > relocate_code() duplication, one way or another, by me or someone else. I just
> > wanted to help here.
> 
> First, let me say that I appreciate the great help that you're giving us
> with this (30-patch!) series.
> 
> And I agree with the premise that the ARM startup sequence, including
> but not limited to relocate_code(), is literally all over the place and
> that there is a need to fix this; I wrote so in the cover letter of
> my crt0.S patch series, which I consider a starting point and an example
> of how this should be done.
> 
> Also, we must keep in mind that part of the code in ARM should, and
> eventually will, be merged into a single U-Boot-wide version. ELF code
> relocation is not ARM specific except for the two (to be reduced soon
> to only one) ARM relocation record types. Thus, when we touch this
> code, we must keep it close, or make it closer, to the code in other
> U-Boot architectures; IIRC there are already patches out there to make
> relocate_code() a single project-wide true C function.
> 
> And I think that this newly added patch 31/31 in your series does not
> match either the way I want ARM startup simplification to go or the
> general goal of unifying relocate_code().
> 
> Thus, if you don't mind, I'd prefer patch 31/31 to move out of the
> series. And, since I want to spare anyone the hassle of going through
> this again, I guess I will have to submit a patch for relocate_code()
> factorization -- quite probably on top of your series, since many fixes
> you make in it may be useful or even needed.

OK, let's do this. It will also help to stop postponing the application of this
series because of more new versions.

Please just Cc me when you post these patches so that I can review them.

Best regards,
Benoît
Albert ARIBAUD March 3, 2013, 8:14 a.m. UTC | #11
Hi Benoît,

On Sat, 2 Mar 2013 14:42:16 +0100 (CET), Benoît Thébaudeau
<benoit.thebaudeau@advansee.com> wrote:

> > Thus, if you don't mind, I'd prefer patch 31/31 to move out of the
> > series. And, since I want to spare anyone the hassle of going through
> > this again, I guess I will have to submit a patch for relocate_code()
> > factorization -- quite probably on top of your series, since many fixes
> > you make in it may be useful or even needed.
> 
> OK, let's do this. It will also help to stop postponing the application of this
> series because of more new versions.

Also, yes.

One minor request re your upcoming v9: besides posting the series as you
do on the mailing list, can you provide a git repo and a branch which
could be pulled, even if it contains patman annotations? This way, the
review process continues unchanged on the list, and I can begin the
relocate_code() work without having to wait for your series to be
applied or having to apply every patch in your series manually myself.

> Please just Cc me when you post these patches so that I can review them.

I will.

> Best regards,
> Benoît

Amicalement,

Patch

diff --git a/arch/arm/cpu/arm1136/start.S b/arch/arm/cpu/arm1136/start.S
index eb11b69..19081e8 100644
--- a/arch/arm/cpu/arm1136/start.S
+++ b/arch/arm/cpu/arm1136/start.S
@@ -31,6 +31,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 .globl _start
 _start: b	reset
 #ifdef CONFIG_SPL_BUILD
@@ -86,52 +87,7 @@  _end_vect:
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -173,87 +129,7 @@  next:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
-#ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
-#endif
-
-relocate_done:
-
-	bx	lr
-
-#ifndef CONFIG_SPL_BUILD
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-#endif
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	bx	lr
+	define_relocate_code
 
 /*
  *************************************************************************
diff --git a/arch/arm/cpu/arm1176/start.S b/arch/arm/cpu/arm1176/start.S
index 18ac8d5..10dd4fc 100644
--- a/arch/arm/cpu/arm1176/start.S
+++ b/arch/arm/cpu/arm1176/start.S
@@ -33,6 +33,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 #ifndef CONFIG_SYS_PHY_UBOOT_BASE
 #define CONFIG_SYS_PHY_UBOOT_BASE	CONFIG_SYS_UBOOT_BASE
@@ -93,41 +94,7 @@  _end_vect:
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -225,84 +192,10 @@  skip_tcmdisable:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	bx	lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	mov	pc, lr
-
 #ifndef CONFIG_SPL_BUILD
 /*
  *************************************************************************
diff --git a/arch/arm/cpu/arm720t/start.S b/arch/arm/cpu/arm720t/start.S
index d455793..e7150a6 100644
--- a/arch/arm/cpu/arm720t/start.S
+++ b/arch/arm/cpu/arm720t/start.S
@@ -27,6 +27,7 @@ 
 #include <config.h>
 #include <version.h>
 #include <asm/hardware.h>
+#include <asm/start_macro.S>
 
 /*
  *************************************************************************
@@ -83,52 +84,7 @@  _pad:			.word 0x12345678 /* now 16*4=64 */
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -155,84 +111,10 @@  reset:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	mov	pc, lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	mov	pc, lr
-
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm920t/start.S b/arch/arm/cpu/arm920t/start.S
index a6cb33f..db30842 100644
--- a/arch/arm/cpu/arm920t/start.S
+++ b/arch/arm/cpu/arm920t/start.S
@@ -25,8 +25,8 @@ 
  */
 
 #include <asm-offsets.h>
-#include <common.h>
 #include <config.h>
+#include <asm/start_macro.S>
 
 /*
  *************************************************************************
@@ -71,52 +71,7 @@  _fiq:			.word fiq
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual start code
@@ -194,84 +149,10 @@  copyex:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	mov	pc, lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	mov	pc, lr
-
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm925t/start.S b/arch/arm/cpu/arm925t/start.S
index 78d5ab4..6b77e94 100644
--- a/arch/arm/cpu/arm925t/start.S
+++ b/arch/arm/cpu/arm925t/start.S
@@ -33,6 +33,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 /*
  *************************************************************************
@@ -77,52 +78,7 @@  _fiq:			.word fiq
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -184,84 +140,10 @@  poll1:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	mov	pc, lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	mov	pc, lr
-
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm926ejs/mxs/start.S b/arch/arm/cpu/arm926ejs/mxs/start.S
index 94da398..be8dda0 100644
--- a/arch/arm/cpu/arm926ejs/mxs/start.S
+++ b/arch/arm/cpu/arm926ejs/mxs/start.S
@@ -37,8 +37,8 @@ 
 
 #include <asm-offsets.h>
 #include <config.h>
-#include <common.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 /*
  *************************************************************************
@@ -117,48 +117,7 @@  fiq:
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#ifdef CONFIG_SPL_TEXT_BASE
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
diff --git a/arch/arm/cpu/arm926ejs/start.S b/arch/arm/cpu/arm926ejs/start.S
index 863eb8e..9022af8 100644
--- a/arch/arm/cpu/arm926ejs/start.S
+++ b/arch/arm/cpu/arm926ejs/start.S
@@ -33,8 +33,8 @@ 
 
 #include <asm-offsets.h>
 #include <config.h>
-#include <common.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 /*
  *************************************************************************
@@ -118,52 +118,7 @@  _fiq:
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -190,87 +145,7 @@  reset:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
-#ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
-#endif
-
-relocate_done:
-
-	bx	lr
-
-#ifndef CONFIG_SPL_BUILD
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-#endif
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	bx	lr
+	define_relocate_code
 
 /*
  *************************************************************************
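The fixup loop removed above is easier to follow as a host-side C model. This is a sketch under stated assumptions, not U-Boot code. `struct rel32`, `struct sym32` and `fix_rel_dyn()` are hypothetical names. The RAM copy is modelled as a `uint32_t` array whose first word corresponds to link address `link_base`, and `delta` plays the role of r9 (destination minus the current address of `_start`). Type 23 (`R_ARM_RELATIVE`) adds `delta` to the word already in place. Type 2 (`R_ARM_ABS32`) stores the `.dynsym` symbol value plus `delta`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define R_ARM_ABS32     2u	/* absolute fixup, "cmp r7, #2"  */
#define R_ARM_RELATIVE 23u	/* relative fixup, "cmp r7, #23" */

/* Hypothetical host-side stand-ins for Elf32_Rel (8 bytes) and Elf32_Sym (16 bytes). */
struct rel32 {
	uint32_t r_offset;	/* link-time address of the word to patch */
	uint32_t r_info;	/* (symbol index << 8) | relocation type */
};

struct sym32 {
	uint32_t st_name;
	uint32_t st_value;	/* byte offset 4, read by "ldr r1, [r1, #4]" */
	uint32_t st_size;
	uint8_t  st_info;
	uint8_t  st_other;
	uint16_t st_shndx;
};

/*
 * Hypothetical model of fixloop: patch the RAM copy "image", whose first
 * word sits at link address link_base, so it works when moved by delta.
 */
static void fix_rel_dyn(uint32_t *image, uint32_t link_base, uint32_t delta,
			const struct rel32 *rel, size_t nrel,
			const struct sym32 *dynsym)
{
	for (size_t i = 0; i < nrel; i++) {
		uint32_t type = rel[i].r_info & 0xff;
		uint32_t ofs = rel[i].r_offset - link_base;

		assert(ofs % 4 == 0);	/* each fixup targets an aligned word */
		uint32_t *loc = &image[ofs / 4];

		if (type == R_ARM_RELATIVE) {
			/* Relative: the stored address moves by delta. */
			*loc += delta;
		} else if (type == R_ARM_ABS32) {
			/*
			 * Absolute: relocated symbol value. For R_ARM_ABS32
			 * entries, the assembly's "r_info >> 4" equals
			 * index * sizeof(struct sym32), i.e. a byte offset.
			 */
			*loc = dynsym[rel[i].r_info >> 8].st_value + delta;
		}
		/* Any other type: skipped in this sketch. */
	}
}
```

The sketch differs from the assembly in one respect. For an unknown type, the assembly's `b fixnext` path still executes `str r1, [r0]` while r1 holds `r_info`. Such entries are therefore overwritten, not ignored as the comment says. The sketch skips them instead.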
diff --git a/arch/arm/cpu/arm946es/start.S b/arch/arm/cpu/arm946es/start.S
index cc8158d..c3c3938 100644
--- a/arch/arm/cpu/arm946es/start.S
+++ b/arch/arm/cpu/arm946es/start.S
@@ -34,6 +34,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 /*
  *************************************************************************
@@ -87,52 +88,7 @@  _vectors_end:
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -159,84 +115,10 @@  reset:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	mov	pc, lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	mov	pc, lr
-
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/arm_intcm/start.S b/arch/arm/cpu/arm_intcm/start.S
index 8965d5f..e74ee14 100644
--- a/arch/arm/cpu/arm_intcm/start.S
+++ b/arch/arm/cpu/arm_intcm/start.S
@@ -33,6 +33,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 /*
  *************************************************************************
@@ -83,52 +84,7 @@  _fiq:
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -155,84 +111,10 @@  reset:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	bx	lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	mov	pc, lr
-
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/armv7/start.S b/arch/arm/cpu/armv7/start.S
index d1b7d33..713420b 100644
--- a/arch/arm/cpu/armv7/start.S
+++ b/arch/arm/cpu/armv7/start.S
@@ -34,6 +34,7 @@ 
 #include <version.h>
 #include <asm/system.h>
 #include <linux/linkage.h>
+#include <asm/start_macro.S>
 
 .globl _start
 _start: b	reset
@@ -79,49 +80,7 @@  _end_vect:
  *
  *************************************************************************/
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -164,77 +123,7 @@  reset:
 /*------------------------------------------------------------------------------*/
 
 #ifndef CONFIG_SPL_BUILD
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-ENTRY(relocate_code)
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
-
-relocate_done:
-
-	bx	lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-ENDPROC(relocate_code)
-
+	define_relocate_code
 #endif
 
 ENTRY(c_runtime_cpu_setup)
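The copy loop moves two words per iteration (`ldmia`/`stmia` of a register pair). It compares against the end address only after each pair, so it behaves like a do-while loop over 8-byte chunks. The C model below makes that granularity visible. When `__image_copy_end - _start` is not a multiple of 8, up to 4 bytes past the end are copied as well. `copy_loop()` is an illustrative helper, not a U-Boot function. It assumes the source stays readable up to the next 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of copy_loop: copy 32-bit words in pairs from src
 * to dst until src reaches or passes src_end. Returns the number of
 * bytes actually written. The caller must keep src readable and dst
 * writable up to the next 8-byte boundary past src_end.
 */
static size_t copy_loop(uint32_t *dst, const uint32_t *src,
			const uint32_t *src_end)
{
	const uint32_t *first = src;

	assert(src < src_end);	/* the image is never empty */
	do {
		/* ldmia r0!, {r4-r5} then stmia r1!, {r4-r5} */
		dst[0] = src[0];
		dst[1] = src[1];
		dst += 2;
		src += 2;
	} while (src < src_end);	/* cmp r0, r2 ; blo copy_loop */

	return (size_t)(src - first) * sizeof(uint32_t);
}
```

For a 12-byte image, this writes 16 bytes, one word beyond the nominal end.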
diff --git a/arch/arm/cpu/ixp/start.S b/arch/arm/cpu/ixp/start.S
index d986c31..b24529d 100644
--- a/arch/arm/cpu/ixp/start.S
+++ b/arch/arm/cpu/ixp/start.S
@@ -30,6 +30,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 #include <asm/arch/ixp425.h>
 
 #define MMU_Control_M  0x001    /* Enable MMU */
@@ -96,52 +97,7 @@  _fiq:			.word fiq
  * - jump to second stage
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -257,84 +213,10 @@  reset:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	bx	lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	bx	lr
-
 /****************************************************************************/
 /*									    */
 /* Interrupt handling							    */
diff --git a/arch/arm/cpu/pxa/start.S b/arch/arm/cpu/pxa/start.S
index 3523331..fbad057 100644
--- a/arch/arm/cpu/pxa/start.S
+++ b/arch/arm/cpu/pxa/start.S
@@ -38,6 +38,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 #ifdef CONFIG_CPU_PXA25X
 #if ((CONFIG_SYS_INIT_SP_ADDR) != 0xfffff800)
@@ -100,52 +101,7 @@  _end_vect:
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -171,94 +127,11 @@  reset:
 	bl	_main
 
 /*------------------------------------------------------------------------------*/
-#ifndef CONFIG_SPL_BUILD
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-/* Disable the Dcache RAM lock for stack now */
-#ifdef	CONFIG_CPU_PXA25X
-	mov	r12, lr
-	bl	cpu_init_crit
-	mov	lr, r12
-#endif
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
 
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	bx	lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-#endif
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	bx	lr
-
 /*
  *************************************************************************
  *
@@ -269,6 +142,9 @@  c_runtime_cpu_setup:
  *
  *************************************************************************
  */
+#ifdef CONFIG_CPU_PXA25X
+ENTRY(relocate_code_prepare)
+#endif
 #if !defined(CONFIG_SKIP_LOWLEVEL_INIT) || defined(CONFIG_CPU_PXA25X)
 cpu_init_crit:
 	/*
@@ -288,8 +164,11 @@  cpu_init_crit:
 	orr	r0, r0, #0x00001000	@ set bit 12 (I) I-Cache
 	mcr	p15, 0, r0, c1, c0, 0
 
-	mov	pc, lr		/* back to my caller */
+	bx	lr		/* back to my caller */
 #endif /* !CONFIG_SKIP_LOWLEVEL_INIT || CONFIG_CPU_PXA25X */
+#ifdef CONFIG_CPU_PXA25X
+ENDPROC(relocate_code_prepare)
+#endif
 
 #ifndef CONFIG_SPL_BUILD
 /*
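According to the commit message, the shared `relocate_code` calls `relocate_code_prepare()` at its start and `relocate_code_finish()` at its end. Both have weak defaults that a CPU can override. The PXA25x file above does this by wrapping `cpu_init_crit` in `ENTRY(relocate_code_prepare)`. The sketch below shows the same weak-symbol mechanism in C using the GCC/Clang `__attribute__((weak))` attribute. The `stage` marker and `relocate_code_model()` are hypothetical additions that let the ordering be checked. The real defaults presumably do nothing.

```c
#include <assert.h>

/* Hypothetical progress marker so the sketch's ordering can be checked. */
static int stage;

/*
 * Weak defaults: a non-weak definition of the same name in another
 * object file (e.g. a CPU-specific start file) replaces them at link
 * time; otherwise these are used.
 */
__attribute__((weak)) void relocate_code_prepare(void)
{
	assert(stage == 0);	/* runs before anything is copied */
	stage = 1;
}

__attribute__((weak)) void relocate_code_finish(void)
{
	assert(stage == 2);	/* runs after the copy and the fixups */
	stage = 3;
}

/* Hypothetical C rendering of the common routine's control flow. */
void relocate_code_model(void)
{
	relocate_code_prepare();	/* "bl relocate_code_prepare" */
	stage = 2;			/* stands in for copy_loop and the .rel.dyn fixups */
	relocate_code_finish();
}
```

Suppose this file is linked with another object that defines a non-weak `relocate_code_prepare`. `relocate_code_model()` would then call that definition. This is how a strong symbol in a CPU's start.S replaces the default.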
diff --git a/arch/arm/cpu/s3c44b0/start.S b/arch/arm/cpu/s3c44b0/start.S
index 62093ed..cb9bc6e 100644
--- a/arch/arm/cpu/s3c44b0/start.S
+++ b/arch/arm/cpu/s3c44b0/start.S
@@ -30,6 +30,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 /*
  * Jump vector table
@@ -62,52 +63,7 @@  _start:	b       reset
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -140,84 +96,10 @@  reset:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	bx	lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	bx	lr
-
 /*
  *************************************************************************
  *
diff --git a/arch/arm/cpu/sa1100/start.S b/arch/arm/cpu/sa1100/start.S
index 207c190..90fafc3 100644
--- a/arch/arm/cpu/sa1100/start.S
+++ b/arch/arm/cpu/sa1100/start.S
@@ -28,6 +28,7 @@ 
 #include <asm-offsets.h>
 #include <config.h>
 #include <version.h>
+#include <asm/start_macro.S>
 
 /*
  *************************************************************************
@@ -72,52 +73,7 @@  _fiq:			.word fiq
  *************************************************************************
  */
 
-.globl _TEXT_BASE
-_TEXT_BASE:
-#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
-	.word	CONFIG_SPL_TEXT_BASE
-#else
-	.word	CONFIG_SYS_TEXT_BASE
-#endif
-
-/*
- * These are defined in the board-specific linker script.
- * Subtracting _start from them lets the linker put their
- * relative position in the executable instead of leaving
- * them null.
- */
-.globl _bss_start_ofs
-_bss_start_ofs:
-	.word __bss_start - _start
-
-.globl _image_copy_end_ofs
-_image_copy_end_ofs:
-	.word __image_copy_end - _start
-
-.globl _bss_end_ofs
-_bss_end_ofs:
-	.word __bss_end__ - _start
-
-.globl _end_ofs
-_end_ofs:
-	.word _end - _start
-
-#ifdef CONFIG_USE_IRQ
-/* IRQ stack memory (calculated at run-time) */
-.globl IRQ_STACK_START
-IRQ_STACK_START:
-	.word	0x0badc0de
-
-/* IRQ stack memory (calculated at run-time) */
-.globl FIQ_STACK_START
-FIQ_STACK_START:
-	.word 0x0badc0de
-#endif
-
-/* IRQ stack memory (calculated at run-time) + 8 bytes */
-.globl IRQ_STACK_START_IN
-IRQ_STACK_START_IN:
-	.word	0x0badc0de
+	define_start_symbols
 
 /*
  * the actual reset code
@@ -144,84 +100,10 @@  reset:
 
 /*------------------------------------------------------------------------------*/
 
-/*
- * void relocate_code(addr_moni)
- *
- * This function relocates the monitor code.
- */
-	.globl	relocate_code
-relocate_code:
-	mov	r6, r0	/* save addr of destination */
-
-	adr	r0, _start
-	subs	r9, r6, r0		/* r9 <- relocation offset */
-	beq	relocate_done		/* skip relocation */
-	mov	r1, r6			/* r1 <- scratch for copy_loop */
-	ldr	r3, _image_copy_end_ofs
-	add	r2, r0, r3		/* r2 <- source end address	    */
-
-copy_loop:
-	ldmia	r0!, {r10-r11}		/* copy from source address [r0]    */
-	stmia	r1!, {r10-r11}		/* copy to   target address [r1]    */
-	cmp	r0, r2			/* until source end address [r2]    */
-	blo	copy_loop
-
 #ifndef CONFIG_SPL_BUILD
-	/*
-	 * fix .rel.dyn relocations
-	 */
-	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
-	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
-	add	r10, r10, r0		/* r10 <- sym table in FLASH */
-	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
-	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
-	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
-	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
-fixloop:
-	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
-	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
-	ldr	r1, [r2, #4]
-	and	r7, r1, #0xff
-	cmp	r7, #23			/* relative fixup? */
-	beq	fixrel
-	cmp	r7, #2			/* absolute fixup? */
-	beq	fixabs
-	/* ignore unknown type of fixup */
-	b	fixnext
-fixabs:
-	/* absolute fix: set location to (offset) symbol value */
-	mov	r1, r1, LSR #4		/* r1 <- symbol index in .dynsym */
-	add	r1, r10, r1		/* r1 <- address of symbol in table */
-	ldr	r1, [r1, #4]		/* r1 <- symbol value */
-	add	r1, r1, r9		/* r1 <- relocated sym addr */
-	b	fixnext
-fixrel:
-	/* relative fix: increase location by offset */
-	ldr	r1, [r0]
-	add	r1, r1, r9
-fixnext:
-	str	r1, [r0]
-	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
-	cmp	r2, r3
-	blo	fixloop
+	define_relocate_code
 #endif
 
-relocate_done:
-
-	mov	pc, lr
-
-_rel_dyn_start_ofs:
-	.word __rel_dyn_start - _start
-_rel_dyn_end_ofs:
-	.word __rel_dyn_end - _start
-_dynsym_start_ofs:
-	.word __dynsym_start - _start
-
-	.globl	c_runtime_cpu_setup
-c_runtime_cpu_setup:
-
-	mov	pc, lr
-
 /*
  *************************************************************************
  *
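Each start.S used to define words such as `.word __bss_start - _start`, and `define_start_symbols` now emits them. Both symbols move together when the image is relocated, so their difference is a link-time constant. Adding the run-time address of `_start` (what `adr r0, _start` yields) recovers the symbol's address wherever the image runs. Below is a minimal C model with hypothetical helper names and example addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: what the linker stores in a word like _bss_start_ofs. */
static uint32_t link_time_ofs(uint32_t sym_link_addr, uint32_t start_link_addr)
{
	assert(sym_link_addr >= start_link_addr);	/* symbols follow _start */
	return sym_link_addr - start_link_addr;
}

/* Hypothetical: run-time address = run-time _start + stored offset. */
static uint32_t run_time_addr(uint32_t ofs, uint32_t start_run_addr)
{
	return start_run_addr + ofs;
}
```

The same stored offset gives correct addresses before and after relocation, which an absolute `.word __bss_start` would not.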
diff --git a/arch/arm/include/asm/start_macro.S b/arch/arm/include/asm/start_macro.S
new file mode 100644
index 0000000..539317e
--- /dev/null
+++ b/arch/arm/include/asm/start_macro.S
@@ -0,0 +1,172 @@ 
+/*
+ * Macros for ARM start.S files
+ *
+ * (C) Copyright 2013 ADVANSEE
+ * Benoît Thébaudeau <benoit.thebaudeau@advansee.com>
+ *
+ * Based on arch/arm/cpu/armv7/start.S, which is:
+ * Copyright (c) 2004	Texas Instruments <r-woodruff2@ti.com>
+ * Copyright (c) 2001	Marius Gröger <mag@sysgo.de>
+ * Copyright (c) 2002	Alex Züpke <azu@sysgo.de>
+ * Copyright (c) 2002	Gary Jennejohn <garyj@denx.de>
+ * Copyright (c) 2003	Richard Woodruff <r-woodruff2@ti.com>
+ * Copyright (c) 2003	Kshitij <kshitij@ti.com>
+ * Copyright (c) 2006-2008 Syed Mohammed Khasim <x0khasim@ti.com>
+ *
+ * See file CREDITS for list of people who contributed to this
+ * project.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston,
+ * MA 02111-1307 USA
+ */
+
+#ifndef __START_MACRO_S
+#define __START_MACRO_S
+
+#include <config.h>
+#include <asm-offsets.h>
+#include <linux/linkage.h>
+
+	.macro define_start_symbols
+ENTRY(_TEXT_BASE)
+#if defined(CONFIG_SPL_BUILD) && defined(CONFIG_SPL_TEXT_BASE)
+	.word	CONFIG_SPL_TEXT_BASE
+#else
+	.word	CONFIG_SYS_TEXT_BASE
+#endif
+END(_TEXT_BASE)
+
+/*
+ * These are defined in the board-specific linker script.
+ * Subtracting _start from them lets the linker put their
+ * relative position in the executable instead of leaving
+ * them null.
+ */
+
+ENTRY(_end_ofs)
+	.word	_end - _start
+END(_end_ofs)
+
+ENTRY(_bss_start_ofs)
+	.word	__bss_start - _start
+END(_bss_start_ofs)
+
+ENTRY(_bss_end_ofs)
+	.word	__bss_end__ - _start
+END(_bss_end_ofs)
+
+#ifdef CONFIG_USE_IRQ
+/* IRQ stack memory (calculated at run-time) */
+ENTRY(IRQ_STACK_START)
+	.word	0x0badc0de
+END(IRQ_STACK_START)
+
+/* FIQ stack memory (calculated at run-time) */
+ENTRY(FIQ_STACK_START)
+	.word	0x0badc0de
+END(FIQ_STACK_START)
+#endif
+
+/* IRQ stack memory (calculated at run-time) + 8 bytes */
+ENTRY(IRQ_STACK_START_IN)
+	.word	0x0badc0de
+END(IRQ_STACK_START_IN)
+	.endm /* define_start_symbols */
+
+/*----------------------------------------------------------------------------*/
+
+	.macro define_relocate_code
+/*
+ * void relocate_code(addr_moni)
+ *
+ * This function relocates the monitor code.
+ */
+ENTRY(relocate_code)
+	mov	r6, r0			/* save addr of destination */
+
+	mov	r4, lr
+	bl	relocate_code_prepare
+	mov	lr, r4
+
+	adr	r0, _start
+	subs	r9, r6, r0		/* r9 <- relocation offset */
+	beq	relocate_done		/* skip relocation */
+	mov	r1, r6			/* r1 <- scratch for copy_loop */
+	ldr	r3, _image_copy_end_ofs
+	add	r2, r0, r3		/* r2 <- source end address */
+
+copy_loop:
+	ldmia	r0!, {r4-r5}		/* copy from source address [r0] */
+	stmia	r1!, {r4-r5}		/* copy to   target address [r1] */
+	cmp	r0, r2			/* until source end address [r2] */
+	blo	copy_loop
+
+#ifndef CONFIG_SPL_BUILD
+	/*
+	 * fix .rel.dyn relocations
+	 */
+	ldr	r0, _TEXT_BASE		/* r0 <- Text base */
+	ldr	r10, _dynsym_start_ofs	/* r10 <- sym table ofs */
+	add	r10, r10, r0		/* r10 <- sym table in FLASH */
+	ldr	r2, _rel_dyn_start_ofs	/* r2 <- rel dyn start ofs */
+	add	r2, r2, r0		/* r2 <- rel dyn start in FLASH */
+	ldr	r3, _rel_dyn_end_ofs	/* r3 <- rel dyn end ofs */
+	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
+fixloop:
+	ldr	r0, [r2]		/* r0 <- location to fix up in FLASH */
+	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
+	ldr	r1, [r2, #4]
+	and	r7, r1, #0xff
+	cmp	r7, #23			/* relative fixup? */
+	beq	fixrel
+	cmp	r7, #2			/* absolute fixup? */
+	beq	fixabs
+	/* ignore unknown type of fixup */
+	b	fixnext
+fixabs:
+	/* absolute fix: set location to (offset) symbol value */
+	mov	r1, r1, LSR #4		/* r1 <- symbol offset in .dynsym */
+	add	r1, r10, r1		/* r1 <- address of symbol in table */
+	ldr	r1, [r1, #4]		/* r1 <- symbol value */
+	add	r1, r1, r9		/* r1 <- relocated sym addr */
+	b	fixnext
+fixrel:
+	/* relative fix: increase location by offset */
+	ldr	r1, [r0]
+	add	r1, r1, r9
+fixnext:
+	str	r1, [r0]
+	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
+	cmp	r2, r3
+	blo	fixloop
+#endif
+
+relocate_done:
+	b	relocate_code_finish
+
+_image_copy_end_ofs:
+	.word	__image_copy_end - _start
+#ifndef CONFIG_SPL_BUILD
+_rel_dyn_start_ofs:
+	.word	__rel_dyn_start - _start
+_rel_dyn_end_ofs:
+	.word	__rel_dyn_end - _start
+_dynsym_start_ofs:
+	.word	__dynsym_start - _start
+#endif
+ENDPROC(relocate_code)
+	.endm /* define_relocate_code */
+
+#endif /* __START_MACRO_S */
diff --git a/arch/arm/lib/crt0.S b/arch/arm/lib/crt0.S
index 6fcb7d1..4559d40 100644
--- a/arch/arm/lib/crt0.S
+++ b/arch/arm/lib/crt0.S
@@ -68,9 +68,30 @@ 
  */
 
 /*
+ * Each function is placed in its own section so that unused code can be
+ * discarded by section garbage collection at link time when --gc-sections is
+ * used, which is the case for SPL builds.
+ */
+
+	.section	.text.crt0_default_hooks,"ax",%progbits
+ENTRY(relocate_code_prepare)
+ENTRY(relocate_code_finish)
+ENTRY(c_runtime_cpu_setup)
+#ifdef __ARM_ARCH_4__
+	mov	pc, lr
+#else
+	bx	lr
+#endif
+ENDPROC(c_runtime_cpu_setup)
+ENDPROC(relocate_code_finish)
+ENDPROC(relocate_code_prepare)
+	.weak	relocate_code_prepare, relocate_code_finish, c_runtime_cpu_setup
+
+/*
  * entry point of crt0 sequence
  */
 
+	.section	.text._main,"ax",%progbits
 ENTRY(_main)
 
 /*