
[U-Boot] Relocation issue - need help!

Message ID 20141022123929.5FD0B3834D1@gemini.denx.de
State Not Applicable
Delegated to: Tom Rini

Commit Message

Wolfgang Denk Oct. 22, 2014, 12:39 p.m. UTC
Hi,

I'm trying to track down a "syntax error" issue that gets triggered
when erasing the U-Boot image in NOR flash.  Symptoms look like this:

	=> print update
	update=protect off 0xfc000000 +${filesize};erase 0xfc000000 +${filesize};cp.b 200000 0xfc000000 ${filesize};protect on 0xfc000000 +${filesize}
	=> run update
	Un-Protected 2 sectors

	.. done
	Erased 2 sectors
	syntax error
	Protected 2 sectors
	=> run update
	syntax error

git bisect found commit 199adb6 "common/misc: sparse fixes" as
culprit; breaking this down further showed a single line in
common/cli_hush.c to trigger the problem. This patch fixes it:

---
 common/cli_hush.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dirk Eibach Oct. 22, 2014, 1:29 p.m. UTC | #1
I had exactly the same behaviour some time ago and tracked it down to
this (and posted it on the mailing list, but sadly got no feedback):

In my latest u-boot builds I had some strange behaviour that I finally
tracked down to flash addresses that are not fixed up in the relocated
u-boot. These addresses come from symbols in the .data.rel.ro.local
section, which the u-boot linker scripts do not handle at the moment.

Some background on relro: http://www.airs.com/blog/archives/189

Joerg Albert already inquired about this on the gcc ML:
https://gcc.gnu.org/ml/gcc-help/2014-02/msg00017.html and he already
suggested a solution:
https://gcc.gnu.org/ml/gcc-help/2014-02/msg00054.html

So there are three things to notice:
1. Do not use gcc 4.8 and u-boot at the moment.
2. You might not notice that you have a problem until you erase u-boot
from flash (and get your cache flushed).
3. Handling relro properly should be on the TODO-List

Maybe this is already common knowledge and maybe somebody is already
working on this - but I did not notice yet. So in this case: sorry for
the noise :)

2014-10-22 14:39 GMT+02:00 Wolfgang Denk <wd@denx.de>:
> Hi,
>
> I'm trying to track down a "syntax error" issue that gets triggered
> when erasing the U-Boot image in NOR flash.  Symptoms look like this:
>
>         => print update
>         update=protect off 0xfc000000 +${filesize};erase 0xfc000000 +${filesize};cp.b 200000 0xfc000000 ${filesize};protect on 0xfc000000 +${filesize}
>         => run update
>         Un-Protected 2 sectors
>
>         .. done
>         Erased 2 sectors
>         syntax error
>         Protected 2 sectors
>         => run update
>         syntax error
>
> git bisect found commit 199adb6 "common/misc: sparse fixes" as
> culprit; breaking this down further showed a single line in
> common/cli_hush.c to trigger the problem. This patch fixes it:
>
> ---
>  common/cli_hush.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/common/cli_hush.c b/common/cli_hush.c
> index 38da5a0..5bbcfe6 100644
> --- a/common/cli_hush.c
> +++ b/common/cli_hush.c
> @@ -3127,7 +3127,7 @@ static void mapset(const unsigned char *set, int code)
>         for (s=set; *s; s++) map[*s] = code;
>  }
>
> -static void update_ifs_map(void)
> +void update_ifs_map(void)
>  {
>         /* char *ifs and char map[256] are both globals. */
>         ifs = (uchar *)getenv("IFS");
> --
> 1.8.3.1
>
> But I still have bad feelings - symptoms indicate that this is
> actually a relocation issue, as it only gets triggered when erasing
> the U-Boot image in NOR flash, so probably there are still pointers to
> data in NOR being used.  This patch here is not suited to fix the
> original cause of this issue.  But then, I do not see where there
> might be a relocation problem.  To be sure I even verified that "ifs"
> and "map[]" are really in RAM all the time.
>
> Has anybody an idea how to further track this down?  Or is the patch
> above actually a real fix?  If so, why?
>
> Best regards,
>
> Wolfgang Denk
>
> --
> DENX Software Engineering GmbH,     MD: Wolfgang Denk & Detlev Zundel
> HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
> Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
> Old programmers never die, they just branch to a new address.
> _______________________________________________
> U-Boot mailing list
> U-Boot@lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot
Wolfgang Denk Oct. 22, 2014, 4:56 p.m. UTC | #2
Dear Dirk,

In message <CANVMifLGzKz+=-K-E9_sSXBxpYPdG1YqEXc-tsCApi7WVxQAHg@mail.gmail.com> you wrote:
> I had exactly the same behaviour some time ago and tracked it down to
> this (and posted it on the mailing list, but sadly got no feedback):

Thanks a lot for this pointer.

> So there are three things to notice:
> 1. Do not use gcc 4.8 and u-boot at the moment.
> 2. You might not notice that you have a problem until you erase u-boot
> from flash (and get your cache flushed).
> 3. Handling relro properly should be on the TODO-List

I confirm that the problem is in my case with gcc 4.8.1, too.  I did
not try another compiler yet.

> Maybe this is already common knowledge and maybe somebody is already
> working on this - but I did not notice yet. So in this case: sorry for
> the noise :)

I highly appreciate your hint, it was definitely very useful. Thanks!

Best regards,

Wolfgang Denk
Fabio Estevam Oct. 22, 2014, 5:26 p.m. UTC | #3
On Wed, Oct 22, 2014 at 2:56 PM, Wolfgang Denk <wd@denx.de> wrote:

>> So there are three things to notice:
>> 1. Do not use gcc 4.8 and u-boot at the moment.
>> 2. You might not notice that you have a problem until you erase u-boot
>> from flash (and get your cache flushed).
>> 3. Handling relro properly should be on the TODO-List
>
> I confirm that the problem is in my case with gcc 4.8.1, too.  I did
> not try another compiler yet.

Yes, there have been reported issues when using gcc 4.8.1 for building
an ARM kernel as well:
https://lkml.org/lkml/2014/10/10/272

Regards,

Fabio Estevam
Tom Rini Oct. 22, 2014, 5:28 p.m. UTC | #4
On Wed, Oct 22, 2014 at 06:56:11PM +0200, Wolfgang Denk wrote:
> Dear Dirk,
> 
> In message <CANVMifLGzKz+=-K-E9_sSXBxpYPdG1YqEXc-tsCApi7WVxQAHg@mail.gmail.com> you wrote:
> > I had exactly the same behaviour some time ago and tracked it down to
> > this (and posted it on the mailing list, but sadly got no feedback):
> 
> Thanks a lot for this pointer.
> 
> > So there are three things to notice:
> > 1. Do not use gcc 4.8 and u-boot at the moment.
> > 2. You might not notice that you have a problem until you erase u-boot
> > from flash (and get your cache flushed).
> > 3. Handling relro properly should be on the TODO-List
> 
> I confirm that the problem is in my case with gcc 4.8.1, too.  I did
> not try another compiler yet.
> 
> > Maybe this is already common knowledge and maybe somebody is already
> > working on this - but I did not notice yet. So in this case: sorry for
> > the noise :)
> 
> I highly appreciate your hint, it was definitely very useful. Thanks!

Is this ARM or PowerPC?  The kernel has blacklisted 4.8.x for ARM in
some cases, and this may or may not be related (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854)
Wolfgang Denk Oct. 22, 2014, 5:39 p.m. UTC | #5
Dear Tom,

In message <20141022172811.GD25506@bill-the-cat> you wrote:
> 
> Is this ARM or PowerPC?  The kernel has blacklisted 4.8.x for ARM in
> some cases, and this may or may not be related (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854)

This is on PowerPC (MPC5200, i.e. the TQM5200S board I had in my
hands yesterday for other reasons).

Best regards,

Wolfgang Denk
Pavel Machek Oct. 22, 2014, 9:27 p.m. UTC | #6
Hi!

> > In message <CANVMifLGzKz+=-K-E9_sSXBxpYPdG1YqEXc-tsCApi7WVxQAHg@mail.gmail.com> you wrote:
> > > I had exactly the same behaviour some time ago and tracked it down to
> > > this (and posted it on the mailing list, but sadly got no feedback):
> > 
> > Thanks a lot for this pointer.
> > 
> > > So there are three things to notice:
> > > 1. Do not use gcc 4.8 and u-boot at the moment.
> > > 2. You might not notice that you have a problem until you erase u-boot
> > > from flash (and get your cache flushed).
> > > 3. Handling relro properly should be on the TODO-List
> > 
> > I confirm that the problem is in my case with gcc 4.8.1, too.  I did
> > not try another compiler yet.
> > 
> > > Maybe this is already common knowledge and maybe somebody is already
> > > working on this - but I did not notice yet. So in this case: sorry for
> > > the noise :)
> > 
> > I highly appreciate your hint, it was definitely very useful. Thanks!
> 
> Is this ARM or PowerPC?  The kernel has blacklisted 4.8.x for ARM in
> some cases, and this may or may not be related (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854)

Just for the record, I also see strange issues with 4.8.1 on
arm/socfpga. u-boot works ok when compiled with 4.7.2, and behaviour
on 4.8.1 seems to change based on compiler flags (-Os vs. -O2).

If anyone knows some kind of workaround, that would be nice... I'm
using bitbake with eldk-5.5, and changing that would not be too easy
:-(.

Best regards,
									Pavel
Marek Vasut Oct. 22, 2014, 9:46 p.m. UTC | #7
On Wednesday, October 22, 2014 at 11:27:42 PM, Pavel Machek wrote:
> Hi!
> 
> > > In message <CANVMifLGzKz+=-K-E9_sSXBxpYPdG1YqEXc-tsCApi7WVxQAHg@mail.gmail.com> you wrote:
> > > > I had exactly the same behaviour some time ago and tracked it down to
> > > > this (and posted it on the mailing list, but sadly got no feedback):
> > > 
> > > Thanks a lot for this pointer.
> > > 
> > > > So there are three things to notice:
> > > > 1. Do not use gcc 4.8 and u-boot at the moment.
> > > > 2. You might not notice that you have a problem until you erase
> > > > u-boot from flash (and get your cache flushed).
> > > > 3. Handling relro properly should be on the TODO-List
> > > 
> > > I confirm that the problem is in my case with gcc 4.8.1, too.  I did
> > > not try another compiler yet.
> > > 
> > > > Maybe this is already common knowledge and maybe somebody is already
> > > > working on this - but I did not notice yet. So in this case: sorry
> > > > for the noise :)
> > > 
> > > I highly appreciate your hint, it was definitely very useful. Thanks!
> > 
> > Is this ARM or PowerPC?  The kernel has blacklisted 4.8.x for ARM in
> > some cases, and this may or may not be related (see
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854)
> 
> Just for the record, I also see strange issues with 4.8.1 on
> arm/socfpga. u-boot works ok when compiled with 4.7.2, and behaviour
> on 4.8.1 seems to change based on compiler flags (-Os vs. -O2).
> 
> If anyone knows some kind of workaround, that would be nice... I'm
> using bitbake with eldk-5.5, and changing that would not be too easy

What is the issue that you see and I do not? What are the symptoms?

Best regards,
Marek Vasut
Pavel Machek Oct. 22, 2014, 9:57 p.m. UTC | #8
On Wed 2014-10-22 23:46:45, Marek Vasut wrote:
> On Wednesday, October 22, 2014 at 11:27:42 PM, Pavel Machek wrote:
> > Hi!
> > 
> > > > In message <CANVMifLGzKz+=-K-E9_sSXBxpYPdG1YqEXc-tsCApi7WVxQAHg@mail.gmail.com> you wrote:
> > > > > I had exactly the same behaviour some time ago and tracked it down to
> > > > > this (and posted it on the mailing list, but sadly got no feedback):
> > > > 
> > > > Thanks a lot for this pointer.
> > > > 
> > > > > So there are three things to notice:
> > > > > 1. Do not use gcc 4.8 and u-boot at the moment.
> > > > > 2. You might not notice that you have a problem until you erase
> > > > > u-boot from flash (and get your cache flushed).
> > > > > 3. Handling relro properly should be on the TODO-List
> > > > 
> > > > I confirm that the problem is in my case with gcc 4.8.1, too.  I did
> > > > not try another compiler yet.
> > > > 
> > > > > Maybe this is already common knowledge and maybe somebody is already
> > > > > working on this - but I did not notice yet. So in this case: sorry
> > > > > for the noise :)
> > > > 
> > > > I highly appreciate your hint, it was definitely very useful. Thanks!
> > > 
> > > Is this ARM or PowerPC?  The kernel has blacklisted 4.8.x for ARM in
> > > some cases, and this may or may not be related (see
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854)
> > 
> > Just for the record, I also see strange issues with 4.8.1 on
> > arm/socfpga. u-boot works ok when compiled with 4.7.2, and behaviour
> > on 4.8.1 seems to change based on compiler flags (-Os vs. -O2).
> > 
> > If anyone knows some kind of workaround, that would be nice... I'm
> > using bitbake with eldk-5.5, and changing that would not be too easy
> 
> What is the issue that you do see and I do not see ? What are the symptoms?

I'm not sure whether you should be seeing this issue; are you using gcc
4.8.1?

I get a hang during MMC init. If I comment that out, it hangs at setenv
of a random variable. With changed compiler flags, commenting out MMC
init does not get me to the prompt.

I'll try to update to gcc-4.8.3 as gcc-4.8.1 has known issues.
									Pavel
Marek Vasut Oct. 22, 2014, 10:06 p.m. UTC | #9
On Wednesday, October 22, 2014 at 11:57:39 PM, Pavel Machek wrote:
> On Wed 2014-10-22 23:46:45, Marek Vasut wrote:
> > On Wednesday, October 22, 2014 at 11:27:42 PM, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > > In message <CANVMifLGzKz+=-K-E9_sSXBxpYPdG1YqEXc-tsCApi7WVxQAHg@mail.gmail.com> you wrote:
> > > > > > I had exactly the same behaviour some time ago and tracked it
> > > > > > down to this (and posted it on the mailing list, but sadly got
> > > > > > no feedback):
> > > > > 
> > > > > Thanks a lot for this pointer.
> > > > > 
> > > > > > So there are three things to notice:
> > > > > > 1. Do not use gcc 4.8 and u-boot at the moment.
> > > > > > 2. You might not notice that you have a problem until you erase
> > > > > > u-boot from flash (and get your cache flushed).
> > > > > > 3. Handling relro properly should be on the TODO-List
> > > > > 
> > > > > I confirm that the problem is in my case with gcc 4.8.1, too.  I
> > > > > did not try another compiler yet.
> > > > > 
> > > > > > Maybe this is already common knowledge and maybe somebody is
> > > > > > already working on this - but I did not notice yet. So in this
> > > > > > case: sorry for the noise :)
> > > > > 
> > > > > I highly appreciate your hint, it was definitely very useful.
> > > > > Thanks!
> > > > 
> > > > Is this ARM or PowerPC?  The kernel has blacklisted 4.8.x for ARM in
> > > > some cases, and this may or may not be related (see
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854)
> > > 
> > > Just for the record, I also see strange issues with 4.8.1 on
> > > arm/socfpga. u-boot works ok when compiled with 4.7.2, and behaviour
> > > on 4.8.1 seems to change based on compiler flags (-Os vs. -O2).
> > > 
> > > If anyone knows some kind of workaround, that would be nice... I'm
> > > using bitbake with eldk-5.5, and changing that would not be too easy
> > 
> > What is the issue that you do see and I do not see ? What are the
> > symptoms?
> 
> I'm not sure if you should be seeing this issue, are you using gcc
> 4.8.1?
> 
> I get hang during MMC init. If I comment it out, it hangs at setenv of
> random variable. With changed compiler flags, commenting out MMC init
> does not get me to prompt.
> 
> I'll try to update to gcc-4.8.3 as gcc-4.8.1 has known issues.

Actually, 4.8.2 from ELDK 5.6. I recall there were some fixes for GCC, but this
should be fixed in ELDK 5.5.2. See:
https://www.mail-archive.com/eldk@lists.denx.de/msg00908.html

Best regards,
Marek Vasut
Dirk Eibach Oct. 23, 2014, 6:01 a.m. UTC | #10
Hello Wolfgang,

2014-10-22 18:56 GMT+02:00 Wolfgang Denk <wd@denx.de>:
> Dear Dirk,
>
> In message <CANVMifLGzKz+=-K-E9_sSXBxpYPdG1YqEXc-tsCApi7WVxQAHg@mail.gmail.com> you wrote:
>> I had exactly the same behaviour some time ago and tracked it down to
>> this (and posted it on the mailing list, but sadly got no feedback):
>
> Thanks a lot for this pointer.

I am really glad this was helpful. It was very nasty to track down, so
I was really concerned when I found it. For that reason I chose "u-boot
ppc does not work with gcc 4.8" as the subject when I reported it to the
U-Boot mailing list and put you on CC on August 5th. But maybe I
should have been more explicit, something like "APOCALYPSE NOW: u-boot
ppc does not work with gcc 4.8" ;)

This problem is *not* addressed by the fixes Marek linked to.

Just a quick explanation of what is going on:
Since gcc 4.8 we have new sections .data.rel.ro and
.data.rel.ro.local. They contain absolute addresses that should really
be fixed up in our relocation process but are not considered yet.
In your case you were running u-boot with references to addresses that
had not been fixed up, which worked perfectly as long as they still
pointed to valid content. But as soon as you erased flash this was no
longer the case. To make debugging even more fun, behaviour also
depends on cache contents.

In my original mail I referenced this potential solution, at least it
worked for me:
https://gcc.gnu.org/ml/gcc-help/2014-02/msg00054.html

Cheers
Dirk
Joakim Tjernlund Oct. 23, 2014, 6:42 a.m. UTC | #11
> 
> Hello Wolfgang,
> 
> 2014-10-22 18:56 GMT+02:00 Wolfgang Denk <wd@denx.de>:
> > Dear Dirk,
> >
> > In message <CANVMifLGzKz+=-K-E9_sSXBxpYPdG1YqEXc-tsCApi7WVxQAHg@mail.gmail.com> you wrote:
> >> I had exactly the same behaviour some time ago and tracked it down to
> >> this (and posted it on the mailing list, but sadly got no feedback):
> >
> > Thanks a lot for this pointer.
> 
> I am really glad this was helpful. It was very nasty to track down, so
> I was really concerned when I found it. For that reason I chose "u-boot
> ppc does not work with gcc 4.8" as a topic when I reported it to
> U-Boot mailing list and put you on CC on august 5th. But maybe I
> should have been more explicit, something like "APOCALYPSE NOW: u-boot
> ppc does not work with gcc 4.8" ;)
> 
> This problem is *not* fixed by the links Marek addressed.
> 
> Just a quick explanation of what is going on:
> Since gcc 4.8 we have new sections .data.rel.ro and
> .data.rel.ro.local. They contain absolute addresses that should really
> be fixed up in our relocation process but are not considered yet.
> In your case you were running u-boot referencing the not fixed-up
> addresses which worked perfectly as long as they still pointed to
> valid content. But as soon as you erased flash this was no longer the
> case. To make debugging even more fun, behaviour also depends on cache
> contents.

Ouch, that was a nasty surprise.

> 
> In my original mail I referenced this potential solution, at least it
> worked for me:
> https://gcc.gnu.org/ml/gcc-help/2014-02/msg00054.html

That looks like the correct fix, but I presume both .data.rel.ro and
.data.rel.ro.local should be added?

 Jocke

Patch

diff --git a/common/cli_hush.c b/common/cli_hush.c
index 38da5a0..5bbcfe6 100644
--- a/common/cli_hush.c
+++ b/common/cli_hush.c
@@ -3127,7 +3127,7 @@  static void mapset(const unsigned char *set, int code)
 	for (s=set; *s; s++) map[*s] = code;
 }
 
-static void update_ifs_map(void)
+void update_ifs_map(void)
 {
 	/* char *ifs and char map[256] are both globals. */
 	ifs = (uchar *)getenv("IFS");