diff mbox

[U-Boot] avr32: fix relocation address calculation

Message ID 1368005117-8624-1-git-send-email-andreas.devel@googlemail.com
State Accepted, archived
Delegated to: Andreas Bießmann
Headers show

Commit Message

Andreas Bießmann May 8, 2013, 9:25 a.m. UTC
Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link
section.h symbol files) changed the __bss_end symbol type from char[] to
ulong. This led to wrong relocation parameters which ended up in a not working
u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we
may get a 'half-working' u-boot then.

Fix this by dereferencing the __bss_end symbol where needed.

Signed-off-by: Andreas Bießmann <andreas.devel@googlemail.com>
---
 arch/avr32/lib/board.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Albert ARIBAUD May 10, 2013, 9:24 a.m. UTC | #1
Hi Andreas,

On Wed,  8 May 2013 11:25:17 +0200, Andreas Bießmann
<andreas.devel@googlemail.com> wrote:

> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link
> section.h symbol files) changed the __bss_end symbol type from char[] to
> ulong. This led to wrong relocation parameters which ended up in a not working
> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we
> may get a 'half-working' u-boot then.
> 
> Fix this by dereferencing the __bss_end symbol where needed.

(cc:ing Simon and Tom)

The dereferencing is correct, so this patch seems good per se (it could
actually have applied when __bss_end was still a char[]).

But the definition of __bss_end to being an ulong, as introduced by
18652864, is plain wrong. Just because it is a linker defined symbol
does not mean the object it represents is a 32-bit quantity -- it is
not. It still is a non-object, a pure address.

Ditto for __data_end, __rel_dyn_start, __rel_dyn_end and pretty much any
symbol in sections.h which is not an offset.

Sorry for not spotting this before it was merged in; but now this must
be fixed. I'm ok with the wrongly-ulong symbols being changed back to
char[], though I would prefer their type to be char[0] if possible, as
this is (admittedly marginally) more likely to help the compiler catch
accidental dereferencings.

Amicalement,
Andreas Bießmann May 10, 2013, 10:57 a.m. UTC | #2
Hi Albert,

On 05/10/2013 11:24 AM, Albert ARIBAUD wrote:
> Hi Andreas,
> 
> On Wed,  8 May 2013 11:25:17 +0200, Andreas Bießmann
> <andreas.devel@googlemail.com> wrote:
> 
>> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link
>> section.h symbol files) changed the __bss_end symbol type from char[] to
>> ulong. This led to wrong relocation parameters which ended up in a not working
>> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we
>> may get a 'half-working' u-boot then.
>>
>> Fix this by dereferencing the __bss_end symbol where needed.
> 
> (cc:ing Simon and Tom)
> 
> The dereferencing is correct, so this patch seems good per se (it could
> actually have applied when __bss_end was still a char[]).

well, as I understood this the __bss_end being a char[] did implicitly
take the address when accessing __bss_end (as we do when we have a
definition of char foo[2] and we take just 'foo'). But you say here we
should reference the address of __bss_end while it was still of type
char[]. Sorry, I do not understand that, can you please clarify?

Best regards

Andreas Bießmann
Albert ARIBAUD May 10, 2013, 3:09 p.m. UTC | #3
Hi Andreas,

On Fri, 10 May 2013 12:57:21 +0200, "Andreas Bießmann"
<andreas.devel@googlemail.com> wrote:

> Hi Albert,
> 
> On 05/10/2013 11:24 AM, Albert ARIBAUD wrote:
> > Hi Andreas,
> > 
> > On Wed,  8 May 2013 11:25:17 +0200, Andreas Bießmann
> > <andreas.devel@googlemail.com> wrote:
> > 
> >> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link
> >> section.h symbol files) changed the __bss_end symbol type from char[] to
> >> ulong. This led to wrong relocation parameters which ended up in a not working
> >> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we
> >> may get a 'half-working' u-boot then.
> >>
> >> Fix this by dereferencing the __bss_end symbol where needed.
> > 
> > (cc:ing Simon and Tom)
> > 
> > The dereferencing is correct, so this patch seems good per se (it could
> > actually have applied when __bss_end was still a char[]).
> 
> well, as I understood this the __bss_end being a char[] did implicitly
> take the address when accessing __bss_end (as we do when we have a
> definition of char foo[2] and we take just 'foo'). But you say here we
> should reference the address of __bss_end while it was still of type
> char[]. Sorry, I do not understand that, can you please clarify?

There are several concepts here, some pertaining to the compiler, some
to the linker.

From the linker viewpoint, a symbol is *always* and *only* an address,
the first address of the object corresponding to the symbol, and an
object is just some area in the addressable space.

From the compiler viewpoint, an object has a C type, possibly with an
initial value, and a name, which is the symbol. The compiler considers
the name/symbol to be value, not the address of the corresponding
object... at least most of the time: as you indicate, when the symbol
denotes a C array, then the C compiler understand the symbol as the
address of the array.

The __bss_end symbol does not actually correspond to an object in the
usual sense, since the BSS contains all sorts of data: any C global,
either uninitialized or initialized with zeroes, whatever its type,
ends up in BSS. The most general way one can assign a type to BSS
itself is by considering it as a shapeless array of bytes -- hence the
char[] definition.

Thus, the C compiler considered the name __bss_end to denote the
address of the BSS "object", and the C code for AVR32 was correct as it
was actually referring to the BSS "object"'s address.

When the __bss_end symbol's C type was changed to 'ulong', this changed
the way the compiler understood the symbol: it now thinks __bss_end is
the BSS' "value", which has no true sense, and certainly does not mean
'the first 4 bytes of BSS considered as a 32-bit value'.

To compensate this, the AVR32 code has to add an & to find the address
of __bss_end, but the original error is to have changed the "type" of
the BSS.

IOW, we should *always* take the address of __bss_end, since this is
the only thing it was defined for. We should never give it a chance to
even *have* a value at the C level, because we don't want to read, or
worse, write, the BSS itself; we only want to access C globals in the
BSS.

HTH,

> Best regards
> 
> Andreas Bießmann

Amicalement,
Michael Cashwell May 10, 2013, 5:02 p.m. UTC | #4
On May 10, 2013, at 11:09 AM, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote:

> The compiler considers the name/symbol to be value, not the address of the
> corresponding object... at least most of the time: as you indicate, when
> the symbol denotes a C array, then the C compiler understand the symbol as
> the address of the array.

I ran into a case one on Codewarrior Mac PPC tools where there was a subtle
different here. In an existing body of code there was originally a global
char[] defined with a matching extern char[] declared in a header.

At some point the definition was changed to char* because the size of the
array grew and we wanted it out of the global section. I traced an obscure
crash to some assembly code that had worked when the definition was char[]
but needed an extra level of indirection when it was char *.

During that debugging I found that the declaration had not been changed to
char * but the C compiler hadn't cared. It handled the mixed forms just fine
despite the clear difference in the code.

I surmised it was some subtle issue around PPC's handling of global data
(or the Codewarrior PPC ABI) but still don't really know.

-Mike
Albert ARIBAUD May 10, 2013, 5:15 p.m. UTC | #5
Hi Michael,

On Fri, 10 May 2013 13:02:57 -0400, Michael Cashwell
<mboards@prograde.net> wrote:

> On May 10, 2013, at 11:09 AM, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote:
> 
> > The compiler considers the name/symbol to be value, not the address of the
> > corresponding object... at least most of the time: as you indicate, when
> > the symbol denotes a C array, then the C compiler understand the symbol as
> > the address of the array.
> 
> I ran into a case one on Codewarrior Mac PPC tools where there was a subtle
> different here. In an existing body of code there was originally a global
> char[] defined with a matching extern char[] declared in a header.
> 
> At some point the definition was changed to char* because the size of the
> array grew and we wanted it out of the global section. I traced an obscure
> crash to some assembly code that had worked when the definition was char[]
> but needed an extra level of indirection when it was char *.

Well, of course it did! char* is a pointer to char, with syntactic
facilities to use it as a pointer to char array, but char* is not an
array. The value of a pointer to char is a (probably 32-bit) address,
and you need to dereferenc it to get a char.

> During that debugging I found that the declaration had not been changed to
> char * but the C compiler hadn't cared. It handled the mixed forms just fine
> despite the clear difference in the code.

Well, the compiler would not care that one C module would know the
symbol as char* and another one would know it as char[], since the
compiler treats compilation units completely independently. You would
end up using the same address space area for two different objects: an
array of chars, overlapped with a (probably 32-bit) pointer to char.

Where I worked up until recently, we had a 'coding rule' that required
us to always #include a module's .h file (its public interface) from
within its .c file (its implementation) if only to make sure the
compiler saw both the declarations and the definitions in a single
compilation unit, and would thus bark at us for screwing up between
declaration and definition.

> I surmised it was some subtle issue around PPC's handling of global data
> (or the Codewarrior PPC ABI) but still don't really know.

This has nothing to do with PPC or CW; this is just C working as
designed.

> -Mike

Amicalement,
Andreas Bießmann May 13, 2013, 8:35 a.m. UTC | #6
Hi Albert,

On 05/10/2013 05:09 PM, Albert ARIBAUD wrote:
> Hi Andreas,
> 
> On Fri, 10 May 2013 12:57:21 +0200, "Andreas Bießmann"
> <andreas.devel@googlemail.com> wrote:
> 
>> Hi Albert,
>>
>> On 05/10/2013 11:24 AM, Albert ARIBAUD wrote:
>>> Hi Andreas,
>>>
>>> On Wed,  8 May 2013 11:25:17 +0200, Andreas Bießmann
>>> <andreas.devel@googlemail.com> wrote:
>>>
>>>> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link
>>>> section.h symbol files) changed the __bss_end symbol type from char[] to
>>>> ulong. This led to wrong relocation parameters which ended up in a not working
>>>> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we
>>>> may get a 'half-working' u-boot then.
>>>>
>>>> Fix this by dereferencing the __bss_end symbol where needed.
>>>
>>> (cc:ing Simon and Tom)
>>>
>>> The dereferencing is correct, so this patch seems good per se (it could
>>> actually have applied when __bss_end was still a char[]).
>>
>> well, as I understood this the __bss_end being a char[] did implicitly
>> take the address when accessing __bss_end (as we do when we have a
>> definition of char foo[2] and we take just 'foo'). But you say here we
>> should reference the address of __bss_end while it was still of type
>> char[]. Sorry, I do not understand that, can you please clarify?
> 
> There are several concepts here, some pertaining to the compiler, some
> to the linker.
> 
> From the linker viewpoint, a symbol is *always* and *only* an address,
> the first address of the object corresponding to the symbol, and an
> object is just some area in the addressable space.
> 
> From the compiler viewpoint, an object has a C type, possibly with an
> initial value, and a name, which is the symbol. The compiler considers
> the name/symbol to be value, not the address of the corresponding
> object... at least most of the time: as you indicate, when the symbol
> denotes a C array, then the C compiler understand the symbol as the
> address of the array.
> 
> The __bss_end symbol does not actually correspond to an object in the
> usual sense, since the BSS contains all sorts of data: any C global,
> either uninitialized or initialized with zeroes, whatever its type,
> ends up in BSS. The most general way one can assign a type to BSS
> itself is by considering it as a shapeless array of bytes -- hence the
> char[] definition.
> 
> Thus, the C compiler considered the name __bss_end to denote the
> address of the BSS "object", and the C code for AVR32 was correct as it
> was actually referring to the BSS "object"'s address.
> 
> When the __bss_end symbol's C type was changed to 'ulong', this changed
> the way the compiler understood the symbol: it now thinks __bss_end is
> the BSS' "value", which has no true sense, and certainly does not mean
> 'the first 4 bytes of BSS considered as a 32-bit value'.
>
> To compensate this, the AVR32 code has to add an & to find the address
> of __bss_end, but the original error is to have changed the "type" of
> the BSS.
> 
> IOW, we should *always* take the address of __bss_end, since this is
> the only thing it was defined for. We should never give it a chance to
> even *have* a value at the C level, because we don't want to read, or
> worse, write, the BSS itself; we only want to access C globals in the
> BSS.

thank you for your detailed explanation. So now its clear why referring
the address of an object of type char[] will also work.
Another question, wouldn't it make sense to declare these C globals as
const then?

Best regards

Andreas Bießmann
Andreas Bießmann May 13, 2013, 8:38 a.m. UTC | #7
Dear Andreas Bießmann,

Andreas Bießmann <andreas.devel@googlemail.com> writes:
>Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link
>section.h symbol files) changed the __bss_end symbol type from char[] to
>ulong. This led to wrong relocation parameters which ended up in a not working
>u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we
>may get a 'half-working' u-boot then.
>
>Fix this by dereferencing the __bss_end symbol where needed.
>
>Signed-off-by: Andreas Bießmann <andreas.devel@googlemail.com>
>
>---
>arch/avr32/lib/board.c |    4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)

applied to u-boot-atmel/master, thanks!

Best regards,
Andreas Bießmann
Andreas Bießmann May 13, 2013, 8:42 a.m. UTC | #8
On 05/13/2013 10:38 AM, Andreas Bießmann wrote:
> Dear Andreas Bießmann,
> 
> Andreas Bießmann <andreas.devel@googlemail.com> writes:
>> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link
>> section.h symbol files) changed the __bss_end symbol type from char[] to
>> ulong. This led to wrong relocation parameters which ended up in a not working
>> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we
>> may get a 'half-working' u-boot then.
>>
>> Fix this by dereferencing the __bss_end symbol where needed.
>>
>> Signed-off-by: Andreas Bießmann <andreas.devel@googlemail.com>
>>
>> ---
>> arch/avr32/lib/board.c |    4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> applied to u-boot-atmel/master, thanks!

damn scripts ... have to rework them ;)
Albert ARIBAUD May 13, 2013, 11:34 a.m. UTC | #9
Hi Andreas,

On Mon, 13 May 2013 10:35:12 +0200, "Andreas Bießmann"
<andreas.devel@googlemail.com> wrote:

> Hi Albert,
> 
> On 05/10/2013 05:09 PM, Albert ARIBAUD wrote:
> > Hi Andreas,
> > 
> > On Fri, 10 May 2013 12:57:21 +0200, "Andreas Bießmann"
> > <andreas.devel@googlemail.com> wrote:
> > 
> >> Hi Albert,
> >>
> >> On 05/10/2013 11:24 AM, Albert ARIBAUD wrote:
> >>> Hi Andreas,
> >>>
> >>> On Wed,  8 May 2013 11:25:17 +0200, Andreas Bießmann
> >>> <andreas.devel@googlemail.com> wrote:
> >>>
> >>>> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link
> >>>> section.h symbol files) changed the __bss_end symbol type from char[] to
> >>>> ulong. This led to wrong relocation parameters which ended up in a not working
> >>>> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we
> >>>> may get a 'half-working' u-boot then.
> >>>>
> >>>> Fix this by dereferencing the __bss_end symbol where needed.
> >>>
> >>> (cc:ing Simon and Tom)
> >>>
> >>> The dereferencing is correct, so this patch seems good per se (it could
> >>> actually have applied when __bss_end was still a char[]).
> >>
> >> well, as I understood this the __bss_end being a char[] did implicitly
> >> take the address when accessing __bss_end (as we do when we have a
> >> definition of char foo[2] and we take just 'foo'). But you say here we
> >> should reference the address of __bss_end while it was still of type
> >> char[]. Sorry, I do not understand that, can you please clarify?
> > 
> > There are several concepts here, some pertaining to the compiler, some
> > to the linker.
> > 
> > From the linker viewpoint, a symbol is *always* and *only* an address,
> > the first address of the object corresponding to the symbol, and an
> > object is just some area in the addressable space.
> > 
> > From the compiler viewpoint, an object has a C type, possibly with an
> > initial value, and a name, which is the symbol. The compiler considers
> > the name/symbol to be value, not the address of the corresponding
> > object... at least most of the time: as you indicate, when the symbol
> > denotes a C array, then the C compiler understand the symbol as the
> > address of the array.
> > 
> > The __bss_end symbol does not actually correspond to an object in the
> > usual sense, since the BSS contains all sorts of data: any C global,
> > either uninitialized or initialized with zeroes, whatever its type,
> > ends up in BSS. The most general way one can assign a type to BSS
> > itself is by considering it as a shapeless array of bytes -- hence the
> > char[] definition.
> > 
> > Thus, the C compiler considered the name __bss_end to denote the
> > address of the BSS "object", and the C code for AVR32 was correct as it
> > was actually referring to the BSS "object"'s address.
> > 
> > When the __bss_end symbol's C type was changed to 'ulong', this changed
> > the way the compiler understood the symbol: it now thinks __bss_end is
> > the BSS' "value", which has no true sense, and certainly does not mean
> > 'the first 4 bytes of BSS considered as a 32-bit value'.
> >
> > To compensate this, the AVR32 code has to add an & to find the address
> > of __bss_end, but the original error is to have changed the "type" of
> > the BSS.
> > 
> > IOW, we should *always* take the address of __bss_end, since this is
> > the only thing it was defined for. We should never give it a chance to
> > even *have* a value at the C level, because we don't want to read, or
> > worse, write, the BSS itself; we only want to access C globals in the
> > BSS.
> 
> thank you for your detailed explanation. So now its clear why referring
> the address of an object of type char[] will also work.
> Another question, wouldn't it make sense to declare these C globals as
> const then?

Indeed, const may help prevent these symbols from being accidentally
written into, at least in the most expected cases such as passing
&__bss_end to a function expecting a non-const char*.

There is, however, a much better way of preventing this and more:
just give these symbols a C type of 'struct {}' (empty struct).

Since this type has absolutely no field which could be written into or
read from, it is completely protected from direct write but also
from direct read; and a pointer to it has type "struct {} *" which
is incompatible with any other pointer, so any inconsiderate use of it
is detected at compile time.

I had thought of the 'struct {}' method for linker lists refactoring,
when I needed a zero-size type; I finally turned to char[0] instead
(and comment on this at line 150 of include/linker_lists.h) because the
struct method would cause gcc 4.4 and earlier, such as eldk42, to throw
diagnostics like "warning: dereferencing type-punned pointer will break
strict-aliasing rules" -- that is the incompatibility I am talking
about.

Note that the diagnostics did not stem from the empty struct variable
declarations as such, but type-casting the address of an empty struct
into a pointer to a known, non-empty, struct; I just checked now, and
doing an intermediate cast to char* or void* prevents the warnings.
Why I failed to find this when I was refactoring linker lists, I'll
never know.

Of course, there is always a possibility that the area around BSS end
be accidentally read or written into if &__bss_end is force-cast into,
say, a char*, but that's true whatever type you give __bss_end. Aside
from this force-casting, the empty struct approach is IMO the most
protective.

Thus if one wants to protect symbols in sections.h, I suggest giving
them a type of 'struct {}', then building U-Boot with e.g. ELDK42, and
fixing all type-punning diagnostics.

(and I will 'fix' include/linker_lists.h myself to use empty structs)

> Best regards
> 
> Andreas Bießmann

Amicalement,
Albert ARIBAUD May 13, 2013, 11:43 a.m. UTC | #10
On Fri, 10 May 2013 11:24:44 +0200, Albert ARIBAUD
<albert.u.boot@aribaud.net> wrote:

> Sorry for not spotting this before it was merged in; but now this must
> be fixed. I'm ok with the wrongly-ulong symbols being changed back to
> char[], though I would prefer their type to be char[0] if possible, as
> this is (admittedly marginally) more likely to help the compiler catch
> accidental dereferencings.

Following the latest exchange with Andreas, and after some tests with
eldk42 and a gcc 4.7 toolchain, I amend the previous paragraph: rather
than 'char[0]', I suggest using 'struct {}'.

Amicalement,
diff mbox

Patch

diff --git a/arch/avr32/lib/board.c b/arch/avr32/lib/board.c
index ccf862a..2e79e98 100644
--- a/arch/avr32/lib/board.c
+++ b/arch/avr32/lib/board.c
@@ -116,7 +116,7 @@  static int display_banner (void)
 	printf ("\n\n%s\n\n", version_string);
 	printf ("U-Boot code: %08lx -> %08lx  data: %08lx -> %08lx\n",
 		(unsigned long)_text, (unsigned long)_etext,
-		(unsigned long)_data, (unsigned long)__bss_end);
+		(unsigned long)_data, (unsigned long)(&__bss_end));
 	return 0;
 }
 
@@ -183,7 +183,7 @@  void board_init_f(ulong board_type)
 	 *  - stack
 	 */
 	addr = CONFIG_SYS_SDRAM_BASE + sdram_size;
-	monitor_len = (char *)__bss_end - _text;
+	monitor_len = (char *)(&__bss_end) - _text;
 
 	/*
 	 * Reserve memory for u-boot code, data and bss.