Message ID | 1368005117-8624-1-git-send-email-andreas.devel@googlemail.com |
---|---|
State | Accepted, archived |
Delegated to: | Andreas Bießmann |
Headers | show |
Hi Andreas, On Wed, 8 May 2013 11:25:17 +0200, Andreas Bießmann <andreas.devel@googlemail.com> wrote: > Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link > section.h symbol files) changed the __bss_end symbol type from char[] to > ulong. This led to wrong relocation parameters which ended up in a not working > u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we > may get a 'half-working' u-boot then. > > Fix this by dereferencing the __bss_end symbol where needed. (cc:ing Simon and Tom) The dereferencing is correct, so this patch seems good per se (it could actually have applied when __bss_end was still a char[]). But the definition of __bss_end to being an ulong, as introduced by 18652864, is plain wrong. Just because it is a linker defined symbol does not mean the object it represents is a 32-bit quantity -- it is not. It still is a non-object, a pure address. Ditto for __data_end, __rel_dyn_start, __rel_dyn_end and pretty much any symbol in sections.h which is not an offset. Sorry for not spotting this before it was merged in; but now this must be fixed. I'm ok with the wrongly-ulong symbols being changed back to char[], though I would prefer their type to be char[0] if possible, as this is (admittedly marginally) more likely to help the compiler catch accidental dereferencings. Amicalement,
Hi Albert, On 05/10/2013 11:24 AM, Albert ARIBAUD wrote: > Hi Andreas, > > On Wed, 8 May 2013 11:25:17 +0200, Andreas Bießmann > <andreas.devel@googlemail.com> wrote: > >> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link >> section.h symbol files) changed the __bss_end symbol type from char[] to >> ulong. This led to wrong relocation parameters which ended up in a not working >> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we >> may get a 'half-working' u-boot then. >> >> Fix this by dereferencing the __bss_end symbol where needed. > > (cc:ing Simon and Tom) > > The dereferencing is correct, so this patch seems good per se (it could > actually have applied when __bss_end was still a char[]). well, as I understood this the __bss_end being a char[] did implicitly take the address when accessing __bss_end (as we do when we have a definition of char foo[2] and we take just 'foo'). But you say here we should reference the address of __bss_end while it was still of type char[]. Sorry, I do not understand that, can you please clarify? Best regards Andreas Bießmann
Hi Andreas, On Fri, 10 May 2013 12:57:21 +0200, "Andreas Bießmann" <andreas.devel@googlemail.com> wrote: > Hi Albert, > > On 05/10/2013 11:24 AM, Albert ARIBAUD wrote: > > Hi Andreas, > > > > On Wed, 8 May 2013 11:25:17 +0200, Andreas Bießmann > > <andreas.devel@googlemail.com> wrote: > > > >> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link > >> section.h symbol files) changed the __bss_end symbol type from char[] to > >> ulong. This led to wrong relocation parameters which ended up in a not working > >> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we > >> may get a 'half-working' u-boot then. > >> > >> Fix this by dereferencing the __bss_end symbol where needed. > > > > (cc:ing Simon and Tom) > > > > The dereferencing is correct, so this patch seems good per se (it could > > actually have applied when __bss_end was still a char[]). > > well, as I understood this the __bss_end being a char[] did implicitly > take the address when accessing __bss_end (as we do when we have a > definition of char foo[2] and we take just 'foo'). But you say here we > should reference the address of __bss_end while it was still of type > char[]. Sorry, I do not understand that, can you please clarify? There are several concepts here, some pertaining to the compiler, some to the linker. From the linker viewpoint, a symbol is *always* and *only* an address, the first address of the object corresponding to the symbol, and an object is just some area in the addressable space. From the compiler viewpoint, an object has a C type, possibly with an initial value, and a name, which is the symbol. The compiler considers the name/symbol to be value, not the address of the corresponding object... at least most of the time: as you indicate, when the symbol denotes a C array, then the C compiler understand the symbol as the address of the array. The __bss_end symbol does not actually correspond to an object in the usual sense, since the BSS contains all sorts of data: any C global, either uninitialized or initialized with zeroes, whatever its type, ends up in BSS. The most general way one can assign a type to BSS itself is by considering it as a shapeless array of bytes -- hence the char[] definition. Thus, the C compiler considered the name __bss_end to denote the address of the BSS "object", and the C code for AVR32 was correct as it was actually referring to the BSS "object"'s address. When the __bss_end symbol's C type was changed to 'ulong', this changed the way the compiler understood the symbol: it now thinks __bss_end is the BSS' "value", which has no true sense, and certainly does not mean 'the first 4 bytes of BSS considered as a 32-bit value'. To compensate this, the AVR32 code has to add an & to find the address of __bss_end, but the original error is to have changed the "type" of the BSS. IOW, we should *always* take the address of __bss_end, since this is the only thing it was defined for. We should never give it a chance to even *have* a value at the C level, because we don't want to read, or worse, write, the BSS itself; we only want to access C globals in the BSS. HTH, > Best regards > > Andreas Bießmann Amicalement,
On May 10, 2013, at 11:09 AM, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote: > The compiler considers the name/symbol to be value, not the address of the > corresponding object... at least most of the time: as you indicate, when > the symbol denotes a C array, then the C compiler understand the symbol as > the address of the array. I ran into a case one on Codewarrior Mac PPC tools where there was a subtle different here. In an existing body of code there was originally a global char[] defined with a matching extern char[] declared in a header. At some point the definition was changed to char* because the size of the array grew and we wanted it out of the global section. I traced an obscure crash to some assembly code that had worked when the definition was char[] but needed an extra level of indirection when it was char *. During that debugging I found that the declaration had not been changed to char * but the C compiler hadn't cared. It handled the mixed forms just fine despite the clear difference in the code. I surmised it was some subtle issue around PPC's handling of global data (or the Codewarrior PPC ABI) but still don't really know. -Mike
Hi Michael, On Fri, 10 May 2013 13:02:57 -0400, Michael Cashwell <mboards@prograde.net> wrote: > On May 10, 2013, at 11:09 AM, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote: > > > The compiler considers the name/symbol to be value, not the address of the > > corresponding object... at least most of the time: as you indicate, when > > the symbol denotes a C array, then the C compiler understand the symbol as > > the address of the array. > > I ran into a case one on Codewarrior Mac PPC tools where there was a subtle > different here. In an existing body of code there was originally a global > char[] defined with a matching extern char[] declared in a header. > > At some point the definition was changed to char* because the size of the > array grew and we wanted it out of the global section. I traced an obscure > crash to some assembly code that had worked when the definition was char[] > but needed an extra level of indirection when it was char *. Well, of course it did! char* is a pointer to char, with syntactic facilities to use it as a pointer to char array, but char* is not an array. The value of a pointer to char is a (probably 32-bit) address, and you need to dereferenc it to get a char. > During that debugging I found that the declaration had not been changed to > char * but the C compiler hadn't cared. It handled the mixed forms just fine > despite the clear difference in the code. Well, the compiler would not care that one C module would know the symbol as char* and another one would know it as char[], since the compiler treats compilation units completely independently. You would end up using the same address space area for two different objects: an array of chars, overlapped with a (probably 32-bit) pointer to char. Where I worked up until recently, we had a 'coding rule' that required us to always #include a module's .h file (its public interface) from within its .c file (its implementation) if only to make sure the compiler saw both the declarations and the definitions in a single compilation unit, and would thus bark at us for screwing up between declaration and definition. > I surmised it was some subtle issue around PPC's handling of global data > (or the Codewarrior PPC ABI) but still don't really know. This has nothing to do with PPC or CW; this is just C working as designed. > -Mike Amicalement,
Hi Albert, On 05/10/2013 05:09 PM, Albert ARIBAUD wrote: > Hi Andreas, > > On Fri, 10 May 2013 12:57:21 +0200, "Andreas Bießmann" > <andreas.devel@googlemail.com> wrote: > >> Hi Albert, >> >> On 05/10/2013 11:24 AM, Albert ARIBAUD wrote: >>> Hi Andreas, >>> >>> On Wed, 8 May 2013 11:25:17 +0200, Andreas Bießmann >>> <andreas.devel@googlemail.com> wrote: >>> >>>> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link >>>> section.h symbol files) changed the __bss_end symbol type from char[] to >>>> ulong. This led to wrong relocation parameters which ended up in a not working >>>> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we >>>> may get a 'half-working' u-boot then. >>>> >>>> Fix this by dereferencing the __bss_end symbol where needed. >>> >>> (cc:ing Simon and Tom) >>> >>> The dereferencing is correct, so this patch seems good per se (it could >>> actually have applied when __bss_end was still a char[]). >> >> well, as I understood this the __bss_end being a char[] did implicitly >> take the address when accessing __bss_end (as we do when we have a >> definition of char foo[2] and we take just 'foo'). But you say here we >> should reference the address of __bss_end while it was still of type >> char[]. Sorry, I do not understand that, can you please clarify? > > There are several concepts here, some pertaining to the compiler, some > to the linker. > > From the linker viewpoint, a symbol is *always* and *only* an address, > the first address of the object corresponding to the symbol, and an > object is just some area in the addressable space. > > From the compiler viewpoint, an object has a C type, possibly with an > initial value, and a name, which is the symbol. The compiler considers > the name/symbol to be value, not the address of the corresponding > object... at least most of the time: as you indicate, when the symbol > denotes a C array, then the C compiler understand the symbol as the > address of the array. > > The __bss_end symbol does not actually correspond to an object in the > usual sense, since the BSS contains all sorts of data: any C global, > either uninitialized or initialized with zeroes, whatever its type, > ends up in BSS. The most general way one can assign a type to BSS > itself is by considering it as a shapeless array of bytes -- hence the > char[] definition. > > Thus, the C compiler considered the name __bss_end to denote the > address of the BSS "object", and the C code for AVR32 was correct as it > was actually referring to the BSS "object"'s address. > > When the __bss_end symbol's C type was changed to 'ulong', this changed > the way the compiler understood the symbol: it now thinks __bss_end is > the BSS' "value", which has no true sense, and certainly does not mean > 'the first 4 bytes of BSS considered as a 32-bit value'. > > To compensate this, the AVR32 code has to add an & to find the address > of __bss_end, but the original error is to have changed the "type" of > the BSS. > > IOW, we should *always* take the address of __bss_end, since this is > the only thing it was defined for. We should never give it a chance to > even *have* a value at the C level, because we don't want to read, or > worse, write, the BSS itself; we only want to access C globals in the > BSS. thank you for your detailed explanation. So now its clear why referring the address of an object of type char[] will also work. Another question, wouldn't it make sense to declare these C globals as const then? Best regards Andreas Bießmann
Dear Andreas Bießmann, Andreas Bießmann <andreas.devel@googlemail.com> writes: >Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link >section.h symbol files) changed the __bss_end symbol type from char[] to >ulong. This led to wrong relocation parameters which ended up in a not working >u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we >may get a 'half-working' u-boot then. > >Fix this by dereferencing the __bss_end symbol where needed. > >Signed-off-by: Andreas Bießmann <andreas.devel@googlemail.com> > >--- >arch/avr32/lib/board.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) applied to u-boot-atmel/master, thanks! Best regards, Andreas Bießmann
On 05/13/2013 10:38 AM, Andreas Bießmann wrote: > Dear Andreas Bießmann, > > Andreas Bießmann <andreas.devel@googlemail.com> writes: >> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link >> section.h symbol files) changed the __bss_end symbol type from char[] to >> ulong. This led to wrong relocation parameters which ended up in a not working >> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we >> may get a 'half-working' u-boot then. >> >> Fix this by dereferencing the __bss_end symbol where needed. >> >> Signed-off-by: Andreas Bießmann <andreas.devel@googlemail.com> >> >> --- >> arch/avr32/lib/board.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) > > applied to u-boot-atmel/master, thanks! damn scripts ... have to rework them ;)
Hi Andreas, On Mon, 13 May 2013 10:35:12 +0200, "Andreas Bießmann" <andreas.devel@googlemail.com> wrote: > Hi Albert, > > On 05/10/2013 05:09 PM, Albert ARIBAUD wrote: > > Hi Andreas, > > > > On Fri, 10 May 2013 12:57:21 +0200, "Andreas Bießmann" > > <andreas.devel@googlemail.com> wrote: > > > >> Hi Albert, > >> > >> On 05/10/2013 11:24 AM, Albert ARIBAUD wrote: > >>> Hi Andreas, > >>> > >>> On Wed, 8 May 2013 11:25:17 +0200, Andreas Bießmann > >>> <andreas.devel@googlemail.com> wrote: > >>> > >>>> Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link > >>>> section.h symbol files) changed the __bss_end symbol type from char[] to > >>>> ulong. This led to wrong relocation parameters which ended up in a not working > >>>> u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we > >>>> may get a 'half-working' u-boot then. > >>>> > >>>> Fix this by dereferencing the __bss_end symbol where needed. > >>> > >>> (cc:ing Simon and Tom) > >>> > >>> The dereferencing is correct, so this patch seems good per se (it could > >>> actually have applied when __bss_end was still a char[]). > >> > >> well, as I understood this the __bss_end being a char[] did implicitly > >> take the address when accessing __bss_end (as we do when we have a > >> definition of char foo[2] and we take just 'foo'). But you say here we > >> should reference the address of __bss_end while it was still of type > >> char[]. Sorry, I do not understand that, can you please clarify? > > > > There are several concepts here, some pertaining to the compiler, some > > to the linker. > > > > From the linker viewpoint, a symbol is *always* and *only* an address, > > the first address of the object corresponding to the symbol, and an > > object is just some area in the addressable space. > > > > From the compiler viewpoint, an object has a C type, possibly with an > > initial value, and a name, which is the symbol. The compiler considers > > the name/symbol to be value, not the address of the corresponding > > object... at least most of the time: as you indicate, when the symbol > > denotes a C array, then the C compiler understand the symbol as the > > address of the array. > > > > The __bss_end symbol does not actually correspond to an object in the > > usual sense, since the BSS contains all sorts of data: any C global, > > either uninitialized or initialized with zeroes, whatever its type, > > ends up in BSS. The most general way one can assign a type to BSS > > itself is by considering it as a shapeless array of bytes -- hence the > > char[] definition. > > > > Thus, the C compiler considered the name __bss_end to denote the > > address of the BSS "object", and the C code for AVR32 was correct as it > > was actually referring to the BSS "object"'s address. > > > > When the __bss_end symbol's C type was changed to 'ulong', this changed > > the way the compiler understood the symbol: it now thinks __bss_end is > > the BSS' "value", which has no true sense, and certainly does not mean > > 'the first 4 bytes of BSS considered as a 32-bit value'. > > > > To compensate this, the AVR32 code has to add an & to find the address > > of __bss_end, but the original error is to have changed the "type" of > > the BSS. > > > > IOW, we should *always* take the address of __bss_end, since this is > > the only thing it was defined for. We should never give it a chance to > > even *have* a value at the C level, because we don't want to read, or > > worse, write, the BSS itself; we only want to access C globals in the > > BSS. > > thank you for your detailed explanation. So now its clear why referring > the address of an object of type char[] will also work. > Another question, wouldn't it make sense to declare these C globals as > const then? Indeed, const may help prevent these symbols from being accidentally written into, at least in the most expected cases such as passing &__bss_end to a function expecting a non-const char*. There is, however, a much better way of preventing this and more: just give these symbols a C type of 'struct {}' (empty struct). Since this type has absolutely no field which could be written into or read from, it is completely protected from direct write but also from direct read; and a pointer to it has type "struct {} *" which is incompatible with any other pointer, so any inconsiderate use of it is detected at compile time. I had thought of the 'struct {}' method for linker lists refactoring, when I needed a zero-size type; I finally turned to char[0] instead (and comment on this at line 150 of include/linker_lists.h) because the struct method would cause gcc 4.4 and earlier, such as eldk42, to throw diagnostics like "warning: dereferencing type-punned pointer will break strict-aliasing rules" -- that is the incompatibility I am talking about. Note that the diagnostics did not stem from the empty struct variable declarations as such, but type-casting the address of an empty struct into a pointer to a known, non-empty, struct; I just checked now, and doing an intermediate cast to char* or void* prevents the warnings. Why I failed to find this when I was refactoring linker lists, I'll never know. Of course, there is always a possibility that the area around BSS end be accidentally read or written into if &__bss_end is force-cast into, say, a char*, but that's true whatever type you give __bss_end. Aside from this force-casting, the empty struct approach is IMO the most protective. Thus if one wants to protect symbols in sections.h, I suggest giving them a type of 'struct {}', then building U-Boot with e.g. ELDK42, and fixing all type-punning diagnostics. (and I will 'fix' include/linker_lists.h myself to use empty structs) > Best regards > > Andreas Bießmann Amicalement,
On Fri, 10 May 2013 11:24:44 +0200, Albert ARIBAUD <albert.u.boot@aribaud.net> wrote: > Sorry for not spotting this before it was merged in; but now this must > be fixed. I'm ok with the wrongly-ulong symbols being changed back to > char[], though I would prefer their type to be char[0] if possible, as > this is (admittedly marginally) more likely to help the compiler catch > accidental dereferencings. Following the latest exchange with Andreas, and after some tests with eldk42 and a gcc 4.7 toolchain, I amend the previous paragraph: rather than 'char[0]', I suggest using 'struct {}'. Amicalement,
diff --git a/arch/avr32/lib/board.c b/arch/avr32/lib/board.c index ccf862a..2e79e98 100644 --- a/arch/avr32/lib/board.c +++ b/arch/avr32/lib/board.c @@ -116,7 +116,7 @@ static int display_banner (void) printf ("\n\n%s\n\n", version_string); printf ("U-Boot code: %08lx -> %08lx data: %08lx -> %08lx\n", (unsigned long)_text, (unsigned long)_etext, - (unsigned long)_data, (unsigned long)__bss_end); + (unsigned long)_data, (unsigned long)(&__bss_end)); return 0; } @@ -183,7 +183,7 @@ void board_init_f(ulong board_type) * - stack */ addr = CONFIG_SYS_SDRAM_BASE + sdram_size; - monitor_len = (char *)__bss_end - _text; + monitor_len = (char *)(&__bss_end) - _text; /* * Reserve memory for u-boot code, data and bss.
Commit 1865286466a5d0c7f2e3c37632da56556c838e9e (Introduce generic link section.h symbol files) changed the __bss_end symbol type from char[] to ulong. This led to wrong relocation parameters which ended up in a not working u-boot. Unfortunately this is not clear to see cause due to RAM aliasing we may get a 'half-working' u-boot then. Fix this by dereferencing the __bss_end symbol where needed. Signed-off-by: Andreas Bießmann <andreas.devel@googlemail.com> --- arch/avr32/lib/board.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)