Patchwork [U-Boot,RFC,1/3] arm920t: do not set register useless

login
register
mail settings
Submitter Andreas Bießmann
Date Nov. 30, 2010, 7:06 a.m.
Message ID <1291100800-61850-2-git-send-email-andreas.devel@googlemail.com>
Download mbox | patch
Permalink /patch/73576/
State RFC
Headers show

Comments

Andreas Bießmann - Nov. 30, 2010, 7:06 a.m.
in case we are still at relocation target address before relocation we
do not need to load the registers needed for relocation. We should
instead skip the whole relocation part and jump over to clear_bss.

Also prepare to not use target address twice. When we use a scratch
register here r6 is unchanged and can be used later on.

Signed-off-by: Andreas Bießmann <andreas.devel@googlemail.com>
---
 arch/arm/cpu/arm920t/start.S |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)
Albert ARIBAUD - Nov. 30, 2010, 8:07 a.m.
Hi Andreas,

Le 30/11/2010 08:06, Andreas Bießmann a écrit :

> diff --git a/arch/arm/cpu/arm920t/start.S b/arch/arm/cpu/arm920t/start.S
> index 01edb9b..71de373 100644
> --- a/arch/arm/cpu/arm920t/start.S
> +++ b/arch/arm/cpu/arm920t/start.S
> @@ -208,15 +208,16 @@ stack_setup:
>   	mov	sp, r4
>
>   	adr	r0, _start
> +	cmp	r0, r6
> +	beq	clear_bss		/* skip relocation */
> +	mov	r1, r6

Why use r1?

>   	ldr	r2, _TEXT_BASE
>   	ldr	r3, _bss_start_ofs
>   	add	r2, r0, r3		/* r2<- source end address	    */
> -	cmp	r0, r6
> -	beq	clear_bss
>
>   copy_loop:
>   	ldmia	r0!, {r9-r10}		/* copy from source address [r0]    */
> -	stmia	r6!, {r9-r10}		/* copy to   target address [r1]    */
> +	stmia	r1!, {r9-r10}		/* copy to   target address [r1]    */

Ditto.

>   	cmp	r0, r2			/* until source end address [r2]    */
>   	blo	copy_loop

Amicalement,
Andreas Bießmann - Nov. 30, 2010, 8:28 a.m.
Dear Albert ARIBAUD,

Am 30.11.2010 09:07, schrieb Albert ARIBAUD:
> Hi Andreas,
> 
> Le 30/11/2010 08:06, Andreas Bießmann a écrit :
> 
>> diff --git a/arch/arm/cpu/arm920t/start.S b/arch/arm/cpu/arm920t/start.S
>> index 01edb9b..71de373 100644
>> --- a/arch/arm/cpu/arm920t/start.S
>> +++ b/arch/arm/cpu/arm920t/start.S
>> @@ -208,15 +208,16 @@ stack_setup:
>>       mov    sp, r4
>>
>>       adr    r0, _start
>> +    cmp    r0, r6
>> +    beq    clear_bss        /* skip relocation */
>> +    mov    r1, r6
> 
> Why use r1?

Use a scratch register here cause stmia Rn! does increment Rn.

> 
>>       ldr    r2, _TEXT_BASE
>>       ldr    r3, _bss_start_ofs
>>       add    r2, r0, r3        /* r2<- source end address        */
>> -    cmp    r0, r6
>> -    beq    clear_bss
>>
>>   copy_loop:
>>       ldmia    r0!, {r9-r10}        /* copy from source address
>> [r0]    */
>> -    stmia    r6!, {r9-r10}        /* copy to   target address [r1]    */
>> +    stmia    r1!, {r9-r10}        /* copy to   target address [r1]    */

Therefore usage of r6 here would destroy the saved 'addr of
destination'. Cause that fact we have saved the 'addr of destination' @
beginning of relocate_code twice, once to r6 and once to r7. This is
obviously not needed iv we change the register in copy_loop (which was
already used in comment .. if you saw that on the end of the line).

I doubt if this a 'speed up' but we can have a cleaner interface. We
have r4, r5, r6 as storage of input values to relocate_code. We do only
use scratch registers for copy_loop, fixloop for relocation, clear_bss
later on. We do decide if we need fixloop for relocation or if we can
skip that and then setup the relevant scratch registers.

Well this is only an RFC. I found that plus the NULL pointer stuff worth
to mention. It is not that important to get this special patch in cause
the current implementation do also work.

regards

Andreas Bießmann
Albert ARIBAUD - Nov. 30, 2010, 9:25 a.m.
Le 30/11/2010 09:28, Andreas Bießmann a écrit :

>>> +    beq    clear_bss        /* skip relocation */
>>> +    mov    r1, r6
>>
>> Why use r1?
>
> Use a scratch register here cause stmia Rn! does increment Rn.

> Therefore usage of r6 here would destroy the saved 'addr of
> destination'. Cause that fact we have saved the 'addr of destination' @
> beginning of relocate_code twice, once to r6 and once to r7. This is
> obviously not needed iv we change the register in copy_loop (which was
> already used in comment .. if you saw that on the end of the line).

Understood.

> I doubt if this a 'speed up' but we can have a cleaner interface. We
> have r4, r5, r6 as storage of input values to relocate_code. We do only
> use scratch registers for copy_loop, fixloop for relocation, clear_bss
> later on. We do decide if we need fixloop for relocation or if we can
> skip that and then setup the relevant scratch registers.
>
> Well this is only an RFC. I found that plus the NULL pointer stuff worth
> to mention. It is not that important to get this special patch in cause
> the current implementation do also work.

The r8 issue and the zero-filled reloc entry issue are important to pull 
in as they can cause all sorts of time-wasting issues.

> regards
>
> Andreas Bießmann

Amicalement,

Patch

diff --git a/arch/arm/cpu/arm920t/start.S b/arch/arm/cpu/arm920t/start.S
index 01edb9b..71de373 100644
--- a/arch/arm/cpu/arm920t/start.S
+++ b/arch/arm/cpu/arm920t/start.S
@@ -208,15 +208,16 @@  stack_setup:
 	mov	sp, r4
 
 	adr	r0, _start
+	cmp	r0, r6
+	beq	clear_bss		/* skip relocation */
+	mov	r1, r6
 	ldr	r2, _TEXT_BASE
 	ldr	r3, _bss_start_ofs
 	add	r2, r0, r3		/* r2 <- source end address	    */
-	cmp	r0, r6
-	beq	clear_bss
 
 copy_loop:
 	ldmia	r0!, {r9-r10}		/* copy from source address [r0]    */
-	stmia	r6!, {r9-r10}		/* copy to   target address [r1]    */
+	stmia	r1!, {r9-r10}		/* copy to   target address [r1]    */
 	cmp	r0, r2			/* until source end address [r2]    */
 	blo	copy_loop