diff mbox

[U-Boot,RFC] armv7: fixloop: don't fixup if location is NULL

Message ID 4D186A13.4080004@samsung.com
State RFC
Headers show

Commit Message

Minkyu Kang Dec. 27, 2010, 10:27 a.m. UTC
There is possibility that pointers set to NULL before relocation.
In this case, system is hang, because of r0 is invalid location in RAM.

Signed-off-by: Minkyu Kang <mk7.kang@samsung.com>
---
 arch/arm/cpu/armv7/start.S |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

Comments

Andreas Bießmann Jan. 8, 2011, 10:32 a.m. UTC | #1
Dear Minkyu Kang,

Am 27.12.2010 um 11:27 schrieb Minkyu Kang:

> There is possibility that pointers set to NULL before relocation.
> In this case, system is hang, because of r0 is invalid location in RAM.
> 
> Signed-off-by: Minkyu Kang <mk7.kang@samsung.com>
> ---
> arch/arm/cpu/armv7/start.S |    3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/cpu/armv7/start.S b/arch/arm/cpu/armv7/start.S
> index 684f2d2..4eeb12a 100644
> --- a/arch/arm/cpu/armv7/start.S
> +++ b/arch/arm/cpu/armv7/start.S
> @@ -195,6 +195,8 @@ copy_loop:
> 	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
> fixloop:
> 	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
> +	cmp	r0, #0
> +	beq	fixskip

I doubt this is correct. In my investigations for 'NULL fixup' (-> see http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/89864/focus=89906) pointed out that only symbols in 'absolute fixup' loop could be 'NULL' if there is a not aliased/empty weakly linked symbol. I did never see a 'NULL' symbol for 'relative fixup' loop!

Therefore I doubt it is correct to check the location at this place. Can you please give an example?

regards

Andreas Bießmann
Albert ARIBAUD Jan. 8, 2011, 10:49 a.m. UTC | #2
Le 08/01/2011 11:32, Andreas Bießmann a écrit :
> Dear Minkyu Kang,
>
> Am 27.12.2010 um 11:27 schrieb Minkyu Kang:
>
>> There is possibility that pointers set to NULL before relocation.
>> In this case, system is hang, because of r0 is invalid location in RAM.
>>
>> Signed-off-by: Minkyu Kang<mk7.kang@samsung.com>
>> ---
>> arch/arm/cpu/armv7/start.S |    3 +++
>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/arm/cpu/armv7/start.S b/arch/arm/cpu/armv7/start.S
>> index 684f2d2..4eeb12a 100644
>> --- a/arch/arm/cpu/armv7/start.S
>> +++ b/arch/arm/cpu/armv7/start.S
>> @@ -195,6 +195,8 @@ copy_loop:
>> 	add	r3, r3, r0		/* r3<- rel dyn end in FLASH */
>> fixloop:
>> 	ldr	r0, [r2]		/* r0<- location to fix up, IN FLASH! */
>> +	cmp	r0, #0
>> +	beq	fixskip
>
> I doubt this is correct. In my investigations for 'NULL fixup' (->  see http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/89864/focus=89906) pointed out that only symbols in 'absolute fixup' loop could be 'NULL' if there is a not aliased/empty weakly linked symbol. I did never see a 'NULL' symbol for 'relative fixup' loop!
>
> Therefore I doubt it is correct to check the location at this place. Can you please give an example?
>
> regards
>
> Andreas Bießmann

Oops. Thanks Andreas for pointing this out. I second the question.

Amicalement,
Albert ARIBAUD Jan. 8, 2011, 12:18 p.m. UTC | #3
Le 08/01/2011 11:49, Albert ARIBAUD a écrit :

>> In my investigations for 'NULL fixup' (-> see
>> http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/89864/focus=89906)
>> pointed out that only symbols in 'absolute fixup' loop could be 'NULL'
>> if there is a not aliased/empty weakly linked symbol.

BTW: is such a situation normal? IIUC, it means there is a weakly linked 
symbol without *any* defintion, *and* it is referenced in the code. 
Granted, maybe it is checked before it is referenced, but we may want to 
check for and report at build time if possible. Would that be useful?

Amicalement,
Joakim Tjernlund Jan. 8, 2011, 4:44 p.m. UTC | #4
>
> Le 08/01/2011 11:49, Albert ARIBAUD a écrit :
>
> >> In my investigations for 'NULL fixup' (-> see
> >> http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/89864/focus=89906)
> >> pointed out that only symbols in 'absolute fixup' loop could be 'NULL'
> >> if there is a not aliased/empty weakly linked symbol.
>
> BTW: is such a situation normal? IIUC, it means there is a weakly linked
> symbol without *any* defintion, *and* it is referenced in the code.
> Granted, maybe it is checked before it is referenced, but we may want to
> check for and report at build time if possible. Would that be useful?

Not very normal but it is a bug that should be fixed so you don't
crash and burn. The behaviour should be the same on all archs.

Then you could consider adding a build time failure for undef weaks
if WD et. all think it is a good idea. I don't see
a good reason to ban undef. weaks ATM. One day they might be needed.
Andreas Bießmann Jan. 8, 2011, 4:51 p.m. UTC | #5
Dear Albert ARIBAUD,

Am 08.01.2011 um 13:18 schrieb Albert ARIBAUD:

> Le 08/01/2011 11:49, Albert ARIBAUD a écrit :
> 
>>> In my investigations for 'NULL fixup' (-> see
>>> http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/89864/focus=89906)
>>> pointed out that only symbols in 'absolute fixup' loop could be 'NULL'
>>> if there is a not aliased/empty weakly linked symbol.
> 
> BTW: is such a situation normal? IIUC, it means there is a weakly linked symbol without *any* defintion, *and* it is referenced in the code.

Yes you may have a weakly linked symbol which was declared but nowhere implemented. See http://patchwork.ozlabs.org/patch/73647, this fixed such a situation for arm920t/at91 style SoC.

BTW: Without the mentioned patch there was another issue with linking (-> see http://patchwork.ozlabs.org/patch/73563). The linker introduced a .plt section which could not be placed and lead to a segfault of linker.

> Granted, maybe it is checked before it is referenced, but we may want to check for and report at build time if possible. Would that be useful?

AFAIR there was a statement to remove those 'undefined weakly linked symbols' from code. So it would be useful to have a tool to detect those symbols at build time.

regards

Andreas Bießmann
Albert ARIBAUD Jan. 9, 2011, 9 a.m. UTC | #6
Hi Andreas,

Le 08/01/2011 17:51, Andreas Bießmann a écrit :
> Dear Albert ARIBAUD,
>
> Am 08.01.2011 um 13:18 schrieb Albert ARIBAUD:
>
>> Le 08/01/2011 11:49, Albert ARIBAUD a écrit :
>>
>>>> In my investigations for 'NULL fixup' (->  see
>>>> http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/89864/focus=89906)
>>>> pointed out that only symbols in 'absolute fixup' loop could be 'NULL'
>>>> if there is a not aliased/empty weakly linked symbol.
>>
>> BTW: is such a situation normal? IIUC, it means there is a weakly linked symbol without *any* defintion, *and* it is referenced in the code.
>
> Yes you may have a weakly linked symbol which was declared but nowhere
> implemented. See http://patchwork.ozlabs.org/patch/73647, this fixed such
> a situation for arm920t/at91 style SoC.

Not sure I follow you there. The example you give has a definition, 
admittedly for an empty function, right above the weak definition. My 
question is about cases where the weak symbol is declared and has no 
definition at all. Were you meaning to give an example of an undefined
weak symbol being fixed so that it has at least a default definition?

> BTW: Without the mentioned patch there was another issue with linking
> (->  see http://patchwork.ozlabs.org/patch/73563). The linker introduced
> a .plt section which could not be placed and lead to a segfault of linker.

That's more of a linker bug to me. The plt sections are unused that I 
know of. They could probably be put after BSS and marked NOLOAD -- 
giving it a try would be a nice thing.

>> Granted, maybe it is checked before it is referenced, but we may want to
>> check for and report at build time if possible. Would that be useful?
>
> AFAIR there was a statement to remove those 'undefined weakly linked symbols'
> from code. So it would be useful to have a tool to detect those symbols at build
> time.

I'll have a look at what can be done.

> regards
>
> Andreas Bießmann

Amicalement,
Andreas Bießmann Jan. 9, 2011, 9:26 p.m. UTC | #7
Hi Albert,

Am 09.01.2011 um 10:00 schrieb Albert ARIBAUD:

> Hi Andreas,
> 
> Le 08/01/2011 17:51, Andreas Bießmann a écrit :
>> Dear Albert ARIBAUD,
>> 
>> Am 08.01.2011 um 13:18 schrieb Albert ARIBAUD:
>> 
>>> Le 08/01/2011 11:49, Albert ARIBAUD a écrit :
>>> 
>>>>> In my investigations for 'NULL fixup' (->  see
>>>>> http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/89864/focus=89906)
>>>>> pointed out that only symbols in 'absolute fixup' loop could be 'NULL'
>>>>> if there is a not aliased/empty weakly linked symbol.
>>> 
>>> BTW: is such a situation normal? IIUC, it means there is a weakly linked symbol without *any* defintion, *and* it is referenced in the code.
>> 
>> Yes you may have a weakly linked symbol which was declared but nowhere
>> implemented. See http://patchwork.ozlabs.org/patch/73647, this fixed such
>> a situation for arm920t/at91 style SoC.
> 
> Not sure I follow you there.

Ok, so I may got you wrong before.

> The example you give has a definition, admittedly for an empty function, right above the weak definition. My question is about cases where the weak symbol is declared and has no definition at all. Were you meaning to give an example of an undefined
> weak symbol being fixed so that it has at least a default definition?

The mentioned patch did define an empty function stub to have a default definition for this case (board_reset()). This fixed two points
 a) the patch sets an address for board_reset() in .dynsym section other than '0' as it was before that patch
 b) the patch fixed an (maybe) linker bug which let the linker segfault if no .plt section was placed in linker script

The real point behind that is the fact that I could never see a '0' in .rel.dyn section's first 4 bytes. But I did see the value '0' in address part of .dynsym section when I did have a undefined weakly linked symbol.

Therefore I doubt the 'armv7: fixloop: don't fixup if location is NULL' patch do something meaningful. My first approach to get my board working was similar to that patch. But further investigation pointed out that it was wrong! In my case the .rel.dyn section got corrupted due to some bss manipulation before bss setup. That corruption did set the address segment of .rel.dyn entry to another value which lead to an aborted write later on (you know the story).

>> BTW: Without the mentioned patch there was another issue with linking
>> (->  see http://patchwork.ozlabs.org/patch/73563). The linker introduced
>> a .plt section which could not be placed and lead to a segfault of linker.
> 
> That's more of a linker bug to me.

Maybe ..

> The plt sections are unused that I know of.

could not find any meaningful definition about that. AFAIR one doc showed some connection to runtime library loading ... Nevertheless the content of .plt section in my case was not really clear to me.

> They could probably be put after BSS and marked NOLOAD -- giving it a try would be a nice thing.

I needed to add .plt to my linker script, did not try NOLOAD til now.

regards

Andreas Bießmann
Minkyu Kang Jan. 10, 2011, 7:31 a.m. UTC | #8
Hello,

On 8 January 2011 19:49, Albert ARIBAUD <albert.aribaud@free.fr> wrote:
>>> diff --git a/arch/arm/cpu/armv7/start.S b/arch/arm/cpu/armv7/start.S
>>> index 684f2d2..4eeb12a 100644
>>> --- a/arch/arm/cpu/armv7/start.S
>>> +++ b/arch/arm/cpu/armv7/start.S
>>> @@ -195,6 +195,8 @@ copy_loop:
>>>      add     r3, r3, r0              /* r3<- rel dyn end in FLASH */
>>> fixloop:
>>>      ldr     r0, [r2]                /* r0<- location to fix up, IN FLASH! */
>>> +    cmp     r0, #0
>>> +    beq     fixskip
>>
>> I doubt this is correct. In my investigations for 'NULL fixup' (->  see http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/89864/focus=89906) pointed out that only symbols in 'absolute fixup' loop could be 'NULL' if there is a not aliased/empty weakly linked symbol. I did never see a 'NULL' symbol for 'relative fixup' loop!
>>
>> Therefore I doubt it is correct to check the location at this place. Can you please give an example?
>>
>> regards
>>
>> Andreas Bießmann
>
> Oops. Thanks Andreas for pointing this out. I second the question.
>

I found another case.
Please see commit message.
"There is possibility that pointers set to NULL before relocation"

Simple test.
Declared function pointer.

int (*test_func)(void);

And then, set to NULL at arch_cpu_init()

I got follow result.

___address____|________0________4________8________C
  ZSD:44833440|>00000000 00000017 44800024 00000017
  ZSD:44833450| 44800028 00000017 4480002C 00000017

44833440(test_func) is zero and relative fixup (0x17).

In this case, because of try to load the data from r0 (line 216),
system is hang.
So, we have to skip the fixup.

Thanks
Minkyu Kang.
Wolfgang Denk Jan. 10, 2011, 10:20 a.m. UTC | #9
Dear Minkyu Kang,

In message <AANLkTimvXwEBjUXQuwFyRwHb8y-oX-goCvpFcL1_1CCu@mail.gmail.com> you wrote:
> 
> Declared function pointer.
> 
> int (*test_func)(void);

This results in a symbol in bss segment, right?

> And then, set to NULL at arch_cpu_init()

Such an assignment is illegal then. Bss has not been initalized before
relocation, and must not be accessed (neither read nor write).

Best regards,

Wolfgang Denk
Minkyu Kang Jan. 10, 2011, 11:30 a.m. UTC | #10
Dear Wolfgang Denk,

On 10 January 2011 19:20, Wolfgang Denk <wd@denx.de> wrote:
> Dear Minkyu Kang,
>
> In message <AANLkTimvXwEBjUXQuwFyRwHb8y-oX-goCvpFcL1_1CCu@mail.gmail.com> you wrote:
>>
>> Declared function pointer.
>>
>> int (*test_func)(void);
>
> This results in a symbol in bss segment, right?
>
>> And then, set to NULL at arch_cpu_init()
>
> Such an assignment is illegal then. Bss has not been initalized before
> relocation, and must not be accessed (neither read nor write).

Illegal? as a result, yes.
But we do many things before the reloaction as arch init, board init and so on.
There is possibility that the system is hang.
Because of there is no protection of access the memory.
This patch is for prevent it.
In any case, the system must be go on.

And one more thing.
How about lcd_setmem function?
panel_info is located at bss area, but lcd_setmem access this structure.
Is it illegal?

Thanks
Minkyu Kang.
Wolfgang Denk Jan. 10, 2011, 12:14 p.m. UTC | #11
Dear Minkyu Kang,

In message <AANLkTimgBjOSE8c+_6AsUon5KnNR1_UKNqc=Wf_UD2+J@mail.gmail.com> you wrote:
> 
> >> int (*test_func)(void);
> >
> > This results in a symbol in bss segment, right?
> >
> >> And then, set to NULL at arch_cpu_init()
> >
> > Such an assignment is illegal then. Bss has not been initalized before
> > relocation, and must not be accessed (neither read nor write).
> 
> Illegal? as a result, yes.

No, illegal as an action.  You MUST NOT access any symbols in BSS
before relocation (more precisely, before bss has been initialized).

And you MUST NOT write any symbols in data segment before relocation,
either.

In both cases, the result of such actions is undefined behaviour.

> But we do many things before the reloaction as arch init, board init and so on.

Of course, but as mentioned we must not read or write to symbols in
bss, and we must not write to symbols in data segment.

> How about lcd_setmem function?
> panel_info is located at bss area, but lcd_setmem access this structure.
> Is it illegal?

This must not be done before relocation.

Best regards,

Wolfgang Denk
Minkyu Kang Jan. 10, 2011, 2:04 p.m. UTC | #12
Dear Wolfgang Denk,

On 10 January 2011 21:14, Wolfgang Denk <wd@denx.de> wrote:
> Dear Minkyu Kang,
>
> In message <AANLkTimgBjOSE8c+_6AsUon5KnNR1_UKNqc=Wf_UD2+J@mail.gmail.com> you wrote:
>>
>> >> int (*test_func)(void);
>> >
>> > This results in a symbol in bss segment, right?
>> >
>> >> And then, set to NULL at arch_cpu_init()
>> >
>> > Such an assignment is illegal then. Bss has not been initalized before
>> > relocation, and must not be accessed (neither read nor write).
>>
>> Illegal? as a result, yes.
>
> No, illegal as an action.  You MUST NOT access any symbols in BSS
> before relocation (more precisely, before bss has been initialized).
>
> And you MUST NOT write any symbols in data segment before relocation,
> either.
>
> In both cases, the result of such actions is undefined behaviour.
>
>> But we do many things before the reloaction as arch init, board init and so on.
>
> Of course, but as mentioned we must not read or write to symbols in
> bss, and we must not write to symbols in data segment.
>
>> How about lcd_setmem function?
>> panel_info is located at bss area, but lcd_setmem access this structure.
>> Is it illegal?
>
> This must not be done before relocation.
>

No, please see 360 line of arch/arm/lib/board.c
This function is called before relocation.

And how about init_func_i2c()?
This function is called twice, before the relocation and after relocation.
When we use board_i2c_init function then, there is possibility that
use symbols in bss because of this function is called after
relocation.

If we ignore this exception, it will be a big constraint.

btw, there are any side effects on my patch?
I think.. It is just a little safety feature.

Thanks
Minkyu Kang
Albert ARIBAUD Jan. 10, 2011, 5:21 p.m. UTC | #13
Le 10/01/2011 15:04, Minkyu Kang a écrit :

>>> How about lcd_setmem function?
>>> panel_info is located at bss area, but lcd_setmem access this structure.
>>> Is it illegal?
>>
>> This must not be done before relocation.
>
> No, please see 360 line of arch/arm/lib/board.c
> This function is called before relocation.

Then it cannot access panel_info, which is "not there yet" at the time 
lcd_setmem() executes.

You must either move the call to lcd_setmem() to after relocation, or 
find a way not to depend on BSS.

> And how about init_func_i2c()?
> This function is called twice, before the relocation and after relocation.
> When we use board_i2c_init function then, there is possibility that
> use symbols in bss because of this function is called after
> relocation.

If it is used both before and after relocation, then it has to respect 
the strictest case, which is before relocation, and not access BSS.

> If we ignore this exception, it will be a big constraint.
>
> btw, there are any side effects on my patch?
> I think.. It is just a little safety feature.

Regardless of the patch, if your code writes to panel_info or any other 
BSS variable before relocation it will trash the relocation tables that 
exist at BSS location at this point.

IOW, accessing BSS before relocation is forbidden, not just out of 
fancy, but for a serious reason.

> Thanks
> Minkyu Kang

Amicalement,
Minkyu Kang Jan. 11, 2011, 10:57 a.m. UTC | #14
Dear Albert ARIBAUD,

On 11 January 2011 02:21, Albert ARIBAUD <albert.aribaud@free.fr> wrote:
> Le 10/01/2011 15:04, Minkyu Kang a écrit :
>
>>>> How about lcd_setmem function?
>>>> panel_info is located at bss area, but lcd_setmem access this structure.
>>>> Is it illegal?
>>>
>>> This must not be done before relocation.
>>
>> No, please see 360 line of arch/arm/lib/board.c
>> This function is called before relocation.
>
> Then it cannot access panel_info, which is "not there yet" at the time
> lcd_setmem() executes.

No, access for reserve the memory for LCD and we got wrong values.
Did you test about it?

>
> You must either move the call to lcd_setmem() to after relocation, or find a
> way not to depend on BSS.

This problem is not belong to my code.
Move after relocation? it's easy.
but, how we can reserve the memory for LCD?

>
>> And how about init_func_i2c()?
>> This function is called twice, before the relocation and after relocation.
>> When we use board_i2c_init function then, there is possibility that
>> use symbols in bss because of this function is called after
>> relocation.
>
> If it is used both before and after relocation, then it has to respect the
> strictest case, which is before relocation, and not access BSS.
>
>> If we ignore this exception, it will be a big constraint.
>>
>> btw, there are any side effects on my patch?
>> I think.. It is just a little safety feature.
>
> Regardless of the patch, if your code writes to panel_info or any other BSS
> variable before relocation it will trash the relocation tables that exist at
> BSS location at this point.
>
> IOW, accessing BSS before relocation is forbidden, not just out of fancy,
> but for a serious reason.

This patch is not for accessing BSS before relocation,
it's for prevent exceptions.

Thanks
Minkyu Kang
Wolfgang Denk Jan. 11, 2011, 11:03 a.m. UTC | #15
Dear Minkyu Kang,

In message <AANLkTikj4LCUUukDKVCG8Shbi6ZHKU9vzGDz0fNG=87b@mail.gmail.com> you wrote:
> 
> This problem is not belong to my code.
> Move after relocation? it's easy.
> but, how we can reserve the memory for LCD?

Reservation of video memory is a standard task in the init sequence.
See this section in "arch/arm/lib/board.c":

358 #ifdef CONFIG_LCD
359         /* reserve memory for LCD display (always full pages) */
360         addr = lcd_setmem (addr);
361         gd->fb_base = addr;
362 #endif /* CONFIG_LCD */


Best regards,

Wolfgang Denk
Minkyu Kang Jan. 11, 2011, 11:13 a.m. UTC | #16
On 11 January 2011 20:03, Wolfgang Denk <wd@denx.de> wrote:
> Dear Minkyu Kang,
>
> In message <AANLkTikj4LCUUukDKVCG8Shbi6ZHKU9vzGDz0fNG=87b@mail.gmail.com> you wrote:
>>
>> This problem is not belong to my code.
>> Move after relocation? it's easy.
>> but, how we can reserve the memory for LCD?
>
> Reservation of video memory is a standard task in the init sequence.
> See this section in "arch/arm/lib/board.c":
>
> 358 #ifdef CONFIG_LCD
> 359         /* reserve memory for LCD display (always full pages) */
> 360         addr = lcd_setmem (addr);
> 361         gd->fb_base = addr;
> 362 #endif /* CONFIG_LCD */
>
>

Yes I know...
This init sequence is run before the relocation, right?

Please see lcd_setmem function.
This function access panel_info that is uninitialized.
Is it correct?

 438 ulong lcd_setmem (ulong addr)
 439 {
 440         ulong size;
 441         int line_length = (panel_info.vl_col * NBITS
(panel_info.vl_bpix)) / 8;
 442
 443         debug ("LCD panel info: %d x %d, %d bit/pix\n",
 444                 panel_info.vl_col, panel_info.vl_row, NBITS
(panel_info.vl_bpix) );
 445
 446         size = line_length * panel_info.vl_row;
 447
 448         /* Round up to nearest full page */
 449         size = (size + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1);
 450
 451         /* Allocate pages for the frame buffer. */
 452         addr -= size;
 453
 454         debug ("Reserving %ldk for LCD Framebuffer at: %08lx\n",
size>>10, addr);
 455
 456         return (addr);
 457 }

Thanks
Minkyu Kang.
Wolfgang Denk Jan. 11, 2011, 11:23 a.m. UTC | #17
Dear Minkyu Kang,

In message <AANLkTi=Dfipyi4-Qxs5sfCzsOyWjsUydUaY4_C6r+_Wv@mail.gmail.com> you wrote:
>
> > Reservation of video memory is a standard task in the init sequence.
> > See this section in "arch/arm/lib/board.c":
> >
> > 358 #ifdef CONFIG_LCD
> > 359         /* reserve memory for LCD display (always full pages)>  */
> > 360         addr = lcd_setmem (addr);
> > 361         gd->fb_base = addr;
> > 362 #endif /* CONFIG_LCD */
>
> Yes I know...
> This init sequence is run before the relocation, right?

Yes, immediately preceeding it: the reservation of the video memory is
part of the calculation of the relocation address.

> Please see lcd_setmem function.
> This function access panel_info that is uninitialized.
> Is it correct?

This is not correct, if panel_info is really not initialized.

Normally panel_info should be initialized; see for example here:

"drivers/video/mx3fb.c":

...
110 vidinfo_t panel_info = {
111         .vl_col         = XRES,
112         .vl_row         = YRES,
113         .vl_bpix        = LCD_COLOR_IPU,
114         .cmap           = colormap,
115 };
...

Here panel_info is initialized and located in the data segment; this
is still read-only here, but that is sufficient.

Best regards,

Wolfgang Denk
Andreas Bießmann Jan. 11, 2011, 1 p.m. UTC | #18
Dear Minkyu Kang,

Am 11.01.2011 11:57, schrieb Minkyu Kang:

>> Regardless of the patch, if your code writes to panel_info or any other BSS
>> variable before relocation it will trash the relocation tables that exist at
>> BSS location at this point.
>>
>> IOW, accessing BSS before relocation is forbidden, not just out of fancy,
>> but for a serious reason.
> 
> This patch is not for accessing BSS before relocation,
> it's for prevent exceptions.

The real error is writing to BSS before relocation. This leads to a
corrupted .rel.dyn section which is placed at the same address as .bss
at this moment (bss is overloaded to save space).

If you look in your ELF (e.g. readelf -R .rel.dyn u-boot) you may see,
that the .rel.dyn section does _not_ include a pointer to 0x0 with
relative relocation (0x17) as you showed in a previous post.
If you look in your u-boot.map you may find the function in question
(test_func() was it in your example) is placed in .bss section. Setting
the function pointer to 0 (e.g. test_func() = NULL, as described in
previous mail) before relocation will destroy your .rel.dyn section and
then you will see a zero in .rel.dyn section at some place ... please
investigate the ELF and do not step through the code to find those issues.

I may be wrong, please show it to us.

regards

Andreas Bießmann
Andreas Bießmann Jan. 11, 2011, 1:07 p.m. UTC | #19
Dear Minkyu Kang,

Am 11.01.2011 14:00, schrieb Andreas Bießmann:
> Dear Minkyu Kang,
> 
> Am 11.01.2011 11:57, schrieb Minkyu Kang:
> 
>>> Regardless of the patch, if your code writes to panel_info or any other BSS
>>> variable before relocation it will trash the relocation tables that exist at
>>> BSS location at this point.
>>>
>>> IOW, accessing BSS before relocation is forbidden, not just out of fancy,
>>> but for a serious reason.
>>
>> This patch is not for accessing BSS before relocation,
>> it's for prevent exceptions.
> 
> The real error is writing to BSS before relocation. This leads to a
> corrupted .rel.dyn section which is placed at the same address as .bss
> at this moment (bss is overloaded to save space).
> 
> If you look in your ELF (e.g. readelf -R .rel.dyn u-boot) you may see,
> that the .rel.dyn section does _not_ include a pointer to 0x0 with
> relative relocation (0x17) as you showed in a previous post.
> If you look in your u-boot.map you may find the function in question
> (test_func() was it in your example) is placed in .bss section. Setting
> the function pointer to 0 (e.g. test_func() = NULL, as described in
> previous mail) before relocation will destroy your .rel.dyn section and
> then you will see a zero in .rel.dyn section at some place ... please
> investigate the ELF and do not step through the code to find those issues.

You may have a look at http://patchwork.ozlabs.org/patch/73760 for an
TEST approach to see the damaged .rel.dyn, if you like this hackish
approach.

regards

Andreas Bießmann
diff mbox

Patch

diff --git a/arch/arm/cpu/armv7/start.S b/arch/arm/cpu/armv7/start.S
index 684f2d2..4eeb12a 100644
--- a/arch/arm/cpu/armv7/start.S
+++ b/arch/arm/cpu/armv7/start.S
@@ -195,6 +195,8 @@  copy_loop:
 	add	r3, r3, r0		/* r3 <- rel dyn end in FLASH */
 fixloop:
 	ldr	r0, [r2]		/* r0 <- location to fix up, IN FLASH! */
+	cmp	r0, #0
+	beq	fixskip
 	add	r0, r0, r9		/* r0 <- location to fix up in RAM */
 	ldr	r1, [r2, #4]
 	and	r7, r1, #0xff
@@ -217,6 +219,7 @@  fixrel:
 	add	r1, r1, r9
 fixnext:
 	str	r1, [r0]
+fixskip:
 	add	r2, r2, #8		/* each rel.dyn entry is 8 bytes */
 	cmp	r2, r3
 	blo	fixloop