diff mbox

[U-Boot,01/21] Define new system_restart() and emergency_restart()

Message ID 1299519462-25320-2-git-send-email-Kyle.D.Moffett@boeing.com
State Superseded, archived
Headers show

Commit Message

Kyle Moffett March 7, 2011, 5:37 p.m. UTC
In preparation for making system restart use a generic set of hooks for
boards and architectures, we define some wrappers and weak stubs.

The new wrapper functions are:
  system_restart()     -  Normal system reboot (IE: user request)
  emergency_restart()  -  Critical error response (IE: panic(), etc)

During the process of converting all architectures to use the new hooks,
the system_restart() and emergency_restart() code will fall back to call
the existing do_reset() function.

The handler for the "reset" shell command is now do_generic_reset(),
which is a trivial wrapper around system_restart().

The new hooks (for architecture and board-support code to define) are:
  __arch_emergency_restart()
  __arch_restart()
  __board_emergency_restart()
  __board_restart()

By default the __(arch|board)_emergency_restart() functions just call
the corresponding __(arch|board)_restart() hook.  This works for all
hardware platforms which have a properly defined hardware reset
capability.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Cc: Wolfgang Denk <wd@denx.de>
---
 api/api.c                 |    3 +-
 common/cmd_boot.c         |  128 ++++++++++++++++++++++++++++++++++++++++++++-
 common/cmd_bootm.c        |    8 +--
 common/hush.c             |    2 +-
 common/main.c             |    2 +-
 examples/api/libgenwrap.c |    5 +-
 include/_exports.h        |    2 +-
 include/command.h         |   13 +++++
 lib/vsprintf.c            |    2 +-
 9 files changed, 152 insertions(+), 13 deletions(-)

Comments

Mike Frysinger March 7, 2011, 9:40 p.m. UTC | #1
On Monday, March 07, 2011 12:37:22 Kyle Moffett wrote:
> +__attribute__((__noreturn__))
> +void emergency_restart(void)
> +{
> +	__board_emergency_restart();
> +	__arch_emergency_restart();
> +
> +	/* Fallback to the old do_reset() until everything is converted. */
> +	do_reset(NULL, 0, 0, NULL);
> +
> +	printf("EMERGENCY RESTART: All attempts to reboot failed!");

missing a trailing newline, and i'd just call puts() to bypass useless format 
string parsing

> +int system_restart(void)
> +{
> +	int err;
> +
> +	/*
> +	 * Print a nice message and wait a bit to make sure it goes out the
> +	 * console properly.
> +	 */
> +	printf ("Restarting...\n");

no space before the "(", and i'd use puts() here

> +	udelay(50000);

this doesnt sit well with me.  i dont see why this matters ... we dont have 
any delays today, and i dont think we should introduce any.

> +	printf("*** SYSTEM RESTART FAILED ***\n");

puts()

> +__attribute__((__weak__))
> +int __board_restart(void)
> +{
> +	/* Fallthrough to architecture-specific code */
> +	return 0;
> +}
> +
> +__attribute__((__weak__))
> +int __arch_restart(void)
> +{
> +	/* Fallthrough to legacy do_reset() code */
> +	return 0;
> +}

what use is there in having a return value ?  the do_reset() funcs today dont 
return, which means they have no meaningful reset value.
-mike
Kyle Moffett March 7, 2011, 9:56 p.m. UTC | #2
On Mar 07, 2011, at 16:40, Mike Frysinger wrote:
> On Monday, March 07, 2011 12:37:22 Kyle Moffett wrote:
>> +__attribute__((__noreturn__))
>> +void emergency_restart(void)
>> +{
>> +	__board_emergency_restart();
>> +	__arch_emergency_restart();
>> +
>> +	/* Fallback to the old do_reset() until everything is converted. */
>> +	do_reset(NULL, 0, 0, NULL);
>> +
>> +	printf("EMERGENCY RESTART: All attempts to reboot failed!");
> 
> missing a trailing newline, and i'd just call puts() to bypass useless format 
> string parsing

Ok, I'll go check for this in all the patches, thanks.


>> +	udelay(50000);
> 
> this doesnt sit well with me.  i dont see why this matters ... we dont have 
> any delays today, and i dont think we should introduce any.

Actually, several architectures had a printf()+udelay() already.  (i386 and ARM, IIRC).  Since I was moving the printf() to common code I figured I'd better move the udelay() as well.

I believe they had comments to the effect of "This udelay() prevents garbage on the serial console while rebooting", I would guess because it gives the FIFO time to finish flushing.

I can move it back to the individual architectures if you'd like.

>> +__attribute__((__weak__))
>> +int __board_restart(void)
>> +{
>> +	/* Fallthrough to architecture-specific code */
>> +	return 0;
>> +}
>> +
>> +__attribute__((__weak__))
>> +int __arch_restart(void)
>> +{
>> +	/* Fallthrough to legacy do_reset() code */
>> +	return 0;
>> +}
> 
> what use is there in having a return value ?  the do_reset() funcs today dont 
> return, which means they have no meaningful reset value.

Well, actually, a surprising number of them *do* return (at least on some board ports).

In particular I'm doing this for a custom board of ours (HWW-1U-1A) which *must* synchronize with Linux firmware running on another physical processor prior to asserting the reset line, otherwise it probably won't boot up correctly.  Since the synchronization may take an arbitrary amount of time (IE: Waiting for the other Linux to finish initializing), my board's __board_restart() function checks for Ctrl-C on the serial console.  If I actually *do* get a Ctrl-C I need to prevent the __arch_restart() hook from being run, by returning an error code.

The "emergency_restart()" best-effort reboot is obviously a special case, as the system is already wedged and there's nowhere useful to return *to*.

Cheers,
Kyle Moffett
Mike Frysinger March 7, 2011, 10:10 p.m. UTC | #3
On Monday, March 07, 2011 16:56:31 Moffett, Kyle D wrote:
> On Mar 07, 2011, at 16:40, Mike Frysinger wrote:
> > On Monday, March 07, 2011 12:37:22 Kyle Moffett wrote:
> >> +	udelay(50000);
> > 
> > this doesnt sit well with me.  i dont see why this matters ... we dont
> > have any delays today, and i dont think we should introduce any.
> 
> Actually, several architectures had a printf()+udelay() already.  (i386 and
> ARM, IIRC).  Since I was moving the printf() to common code I figured I'd
> better move the udelay() as well.
> 
> I believe they had comments to the effect of "This udelay() prevents
> garbage on the serial console while rebooting", I would guess because it
> gives the FIFO time to finish flushing.

Blackfin specifically does not do this.  personally, a half baked uart char is 
preferable to a (imo) useless hardcoded delay.  i wonder if they picked a 
number that seems to work, or actually calculated it based on the slowest baud 
times the deepest fifo times the extra bits (parity/stop/etc...).  i'd suspect 
the former which means this is even dumber.

> I can move it back to the individual architectures if you'd like.

if we cant agree to simply punt it, then yes, please do.  or make it a config 
option so board porters can opt out, and default it to on for only the 
existing arches.

> >> +__attribute__((__weak__))
> >> +int __board_restart(void)
> >> +{
> >> +	/* Fallthrough to architecture-specific code */
> >> +	return 0;
> >> +}
> >> +
> >> +__attribute__((__weak__))
> >> +int __arch_restart(void)
> >> +{
> >> +	/* Fallthrough to legacy do_reset() code */
> >> +	return 0;
> >> +}
> > 
> > what use is there in having a return value ?  the do_reset() funcs today
> > dont return, which means they have no meaningful reset value.
> 
> Well, actually, a surprising number of them *do* return (at least on some
> board ports).
> 
> In particular I'm doing this for a custom board of ours (HWW-1U-1A) which
> *must* synchronize with Linux firmware running on another physical
> processor prior to asserting the reset line, otherwise it probably won't
> boot up correctly.  Since the synchronization may take an arbitrary amount
> of time (IE: Waiting for the other Linux to finish initializing), my
> board's __board_restart() function checks for Ctrl-C on the serial
> console.  If I actually *do* get a Ctrl-C I need to prevent the
> __arch_restart() hook from being run, by returning an error code.

hrm, sounds dumb to me.  i guess we cant localize this though, and it isnt 
that much overhead, so i wont complain too much about the board part.

what about the arch hook ?
-mike
Graeme Russ March 7, 2011, 11:09 p.m. UTC | #4
On Tue, Mar 8, 2011 at 9:10 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Monday, March 07, 2011 16:56:31 Moffett, Kyle D wrote:
>> On Mar 07, 2011, at 16:40, Mike Frysinger wrote:
>> > On Monday, March 07, 2011 12:37:22 Kyle Moffett wrote:
>> >> +  udelay(50000);
>> >
>> > this doesnt sit well with me.  i dont see why this matters ... we dont
>> > have any delays today, and i dont think we should introduce any.
>>
>> Actually, several architectures had a printf()+udelay() already.  (i386 and
>> ARM, IIRC).  Since I was moving the printf() to common code I figured I'd
>> better move the udelay() as well.
>>
>> I believe they had comments to the effect of "This udelay() prevents
>> garbage on the serial console while rebooting", I would guess because it
>> gives the FIFO time to finish flushing.
>
> Blackfin specifically does not do this.  personally, a half baked uart char is
> preferable to a (imo) useless hardcoded delay.  i wonder if they picked a
> number that seems to work, or actually calculated it based on the slowest baud
> times the deepest fifo times the extra bits (parity/stop/etc...).  i'd suspect
> the former which means this is even dumber.
>
>> I can move it back to the individual architectures if you'd like.
>
> if we cant agree to simply punt it, then yes, please do.  or make it a config
> option so board porters can opt out, and default it to on for only the
> existing arches.
>

Is there any way we could detect that the UART FIFO is empty and wait on
that condition (with a timeout in case the FIFO is never emptied)?

Regards,

Graeme
Mike Frysinger March 8, 2011, 2:45 a.m. UTC | #5
On Monday, March 07, 2011 18:09:25 Graeme Russ wrote:
> On Tue, Mar 8, 2011 at 9:10 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> > On Monday, March 07, 2011 16:56:31 Moffett, Kyle D wrote:
> >> On Mar 07, 2011, at 16:40, Mike Frysinger wrote:
> >> > On Monday, March 07, 2011 12:37:22 Kyle Moffett wrote:
> >> >> +  udelay(50000);
> >> > 
> >> > this doesnt sit well with me.  i dont see why this matters ... we dont
> >> > have any delays today, and i dont think we should introduce any.
> >> 
> >> Actually, several architectures had a printf()+udelay() already.  (i386
> >> and ARM, IIRC).  Since I was moving the printf() to common code I
> >> figured I'd better move the udelay() as well.
> >> 
> >> I believe they had comments to the effect of "This udelay() prevents
> >> garbage on the serial console while rebooting", I would guess because it
> >> gives the FIFO time to finish flushing.
> > 
> > Blackfin specifically does not do this.  personally, a half baked uart
> > char is preferable to a (imo) useless hardcoded delay.  i wonder if they
> > picked a number that seems to work, or actually calculated it based on
> > the slowest baud times the deepest fifo times the extra bits
> > (parity/stop/etc...).  i'd suspect the former which means this is even
> > dumber.
> > 
> >> I can move it back to the individual architectures if you'd like.
> > 
> > if we cant agree to simply punt it, then yes, please do.  or make it a
> > config option so board porters can opt out, and default it to on for
> > only the existing arches.
> 
> Is there any way we could detect that the UART FIFO is empty and wait on
> that condition (with a timeout in case the FIFO is never emptied)?

that would require extending the serial API to add a new func, have each 
serial driver implement (i'm pretty sure most serial devices out there expose 
a bit to poll such status), and then having the code call it.  i'm not saying 
it cant be done, just outlining the requirements.

although i vaguely recall that Wolfgang is against this sort of thing since it 
kind of goes against the KISS principle ...
-mike
Wolfgang Denk March 13, 2011, 7:24 p.m. UTC | #6
Dear Kyle Moffett,

In message <1299519462-25320-2-git-send-email-Kyle.D.Moffett@boeing.com> you wrote:
> In preparation for making system restart use a generic set of hooks for
> boards and architectures, we define some wrappers and weak stubs.
> 
> The new wrapper functions are:
>   system_restart()     -  Normal system reboot (IE: user request)
>   emergency_restart()  -  Critical error response (IE: panic(), etc)

What is the difference between these two - and why do we need
different functions at all?

A reset is a reset is a reset, isn't it?

...
> +/*
> + * This MUST be called from normal interrupts-on context.  If it requires
> + * extended interaction with external hardware it SHOULD react to Ctrl-C.

??? Why such complicated restrictions for a simple command that is
just supposed to reset the board?

> + * If this function fails to guarantee a clean reboot or receives a Ctrl-C
> + * keystroke it SHOULD return with an error (-1).

A "reset" is supposed to take place immediately, and unconditionally.
If you need delays and ^C handling and other bells and whistles,
please add these to your own code, but not here.

> +int system_restart(void)
> +{
> +	int err;
> +
> +	/*
> +	 * Print a nice message and wait a bit to make sure it goes out the
> +	 * console properly.
> +	 */
> +	printf ("Restarting...\n");
> +	udelay(50000);
> +
> +	/* First let the board code try to reboot */
> +	err = __board_restart();
> +	if (err)
> +		goto failed;
> +
> +	/* Now call into the architecture-specific code */
> +	err = __arch_restart();
> +	if (err)
> +		goto failed;
> +
> +	/* Fallback to the old do_reset() until everything is converted. */
> +	err = do_reset(NULL, 0, 0, NULL);
> +
> +failed:
> +	printf("*** SYSTEM RESTART FAILED ***\n");
> +	return err;
> +}

You are making a simple thing pretty much complicated.  This adds lots
of code and I cannot see any benefits.

My initial feeling is a plain NAK, for this and the rest of the patch
series.  Why would we want all this?

Best regards,

Wolfgang Denk
Kyle Moffett March 14, 2011, 4:23 p.m. UTC | #7
Hi!

On Mar 13, 2011, at 15:24, Wolfgang Denk wrote:
> In message <1299519462-25320-2-git-send-email-Kyle.D.Moffett@boeing.com> you wrote:
>> In preparation for making system restart use a generic set of hooks for
>> boards and architectures, we define some wrappers and weak stubs.
>> 
>> The new wrapper functions are:
>>  system_restart()     -  Normal system reboot (IE: user request)
>>  emergency_restart()  -  Critical error response (IE: panic(), etc)
> 
> What is the difference between these two - and why do we need
> different functions at all?
> 
> A reset is a reset is a reset, isn't it?

That might be true *IF* all boards could actually perform a real hardware reset.

Some can't, and instead they just jump to their reset vector (Nios-II) or to flash (some ppc 74xx/7xx systems).

If the board just panic()ed or got an unhandled trap or exception, then you don't want to do a soft-reset that assumes everything is OK.  A startup in a bad environment like that could corrupt FLASH or worse.  Right now there is no way to tell the difference, but the lower-level arch-specific code really should care.

So system_restart() is what you use when the system is in a good normal operating condition.  The emergency_restart() is what gets called from panic() or in other places where a crash has happened.


> ...
>> +/*
>> + * This MUST be called from normal interrupts-on context.  If it requires
>> + * extended interaction with external hardware it SHOULD react to Ctrl-C.
> 
> ??? Why such complicated restrictions for a simple command that is
> just supposed to reset the board?

This is just documenting what the underlying hardware-specific code is guaranteed.  On some hardware a "system_restart()" may fail and return -1, on others the board needs to cleanly respond to interrupts while polling external hardware, etc.  This is analogous to the normal "sys_reboot()" code under Linux, which requires interrupts-on, etc.

This is to contrast with the emergency_restart() function which has no such API constraints.  It can be called from any context whatsoever and will do its best to either perform a full hardware reset or hang while trying.


>> + * If this function fails to guarantee a clean reboot or receives a Ctrl-C
>> + * keystroke it SHOULD return with an error (-1).
> 
> A "reset" is supposed to take place immediately, and unconditionally.
> If you need delays and ^C handling and other bells and whistles,
> please add these to your own code, but not here.

There's no Ctrl-C handling anywhere in this code, it will all be in my own __board_restart() hook.  As above, this documentation is just describing the guarantees provided to underlying __board_restart() and __arch_restart() hooks; if they check for Ctrl-C while polling external hardware and return an error then that's fine.

>> +int system_restart(void)
>> +{
>> +	int err;
>> +
>> +	/*
>> +	 * Print a nice message and wait a bit to make sure it goes out the
>> +	 * console properly.
>> +	 */
>> +	printf ("Restarting...\n");
>> +	udelay(50000);
>> +
>> +	/* First let the board code try to reboot */
>> +	err = __board_restart();
>> +	if (err)
>> +		goto failed;
>> +
>> +	/* Now call into the architecture-specific code */
>> +	err = __arch_restart();
>> +	if (err)
>> +		goto failed;
>> +
>> +	/* Fallback to the old do_reset() until everything is converted. */
>> +	err = do_reset(NULL, 0, 0, NULL);
>> +
>> +failed:
>> +	printf("*** SYSTEM RESTART FAILED ***\n");
>> +	return err;
>> +}
> 
> You are making a simple thing pretty much complicated.  This adds lots
> of code and I cannot see any benefits.

No, it's actually a net removal of code, the diffstat is:
  90 files changed, 341 insertions(+), 374 deletions(-)

The basic hook structure from system_restart() already existed individually in 6 different architectures (incl. PowerPC, Blackfin, ARM, MIPS...), each of which had its own "board_reset()" or "_machine_restart()" or similar.  This code makes it entirely generic, so as a new board implementor you can simply define a "__board_restart()" function to override (or enhance) the normal architecture-specific code.


> My initial feeling is a plain NAK, for this and the rest of the patch
> series.  Why would we want all this?

While I was going through the hooks I noticed that several of them were explicitly NOT safe if the board was in the middle of a panic() for whatever reason, so I split off the __*_emergency_restart() hooks separately to allow architectures to handle them cleanly.

My own board needs both processor modules to synchronize resets to allow them to come back up at all, which means that a "reset" may block for an arbitrary amount of time waiting for the other module to cleanly shut down and restart (or waiting for somebody to type "reset" on the other U-Boot).  If someone just types "reset" on the console, I want to allow them to hit Ctrl-C to interrupt the process.

Cheers,
Kyle Moffett
Wolfgang Denk March 14, 2011, 6:59 p.m. UTC | #8
Dear "Moffett, Kyle D",

In message <A60AEA13-1206-4699-9302-0DF9C0F9DE28@boeing.com> you wrote:
> 
> >> The new wrapper functions are:
> >>  system_restart()     -  Normal system reboot (IE: user request)
> >>  emergency_restart()  -  Critical error response (IE: panic(), etc)
> > 
> > What is the difference between these two - and why do we need
> > different functions at all?
> > 
> > A reset is a reset is a reset, isn't it?
>
> That might be true *IF* all boards could actually perform a real hardware reset.
>
> Some can't, and instead they just jump to their reset vector (Nios-II) or to flash (some ppc 74xx/7xx systems).

So this is the "reset" on these boards, then.

> If the board just panic()ed or got an unhandled trap or exception, then you
> don't want to do a soft-reset that assumes everything is OK.  A startup in
> a bad environment like that could corrupt FLASH or worse.  Right now there
> is no way to tell the difference, but the lower-level arch-specific code
> really should care.

I don't understand your chain of arguments.

If there really is no better way to implement the reset on such
boards, then what else can we do?

And if there are more things that could be done to provide a "better"
reset, then why should we not always do these?

> So system_restart() is what you use when the system is in a good normal
> operating condition.  The emergency_restart() is what gets called from panic()
> or in other places where a crash has happened.

Why?  What's the difference?

> >> + * If this function fails to guarantee a clean reboot or receives a Ctrl-C
> >> + * keystroke it SHOULD return with an error (-1).
> > 
> > A "reset" is supposed to take place immediately, and unconditionally.
> > If you need delays and ^C handling and other bells and whistles,
> > please add these to your own code, but not here.
>
> There's no Ctrl-C handling anywhere in this code, it will all be in my own
> __board_restart() hook.  As above, this documentation is just describing the

There is no ^C handling supposed to be in any reset hook.

You are changing user interfaces to very low-level and intentinally
simple commands in a complicated way, and I don;t see any advantage of
either this complexity nor your changes.

> guarantees provided to underlying __board_restart() and __arch_restart()
> hooks; if they check for Ctrl-C while polling external hardware and return
> an error then that's fine.

No, it is not, because it is not supposed to be done.

You could as well implement a "reset" cpmmand that actually turns on a
fan and the LCD backlight - that would be similarly useful.

> > My initial feeling is a plain NAK, for this and the rest of the patch
> > series.  Why would we want all this?
>
> While I was going through the hooks I noticed that several of them were
> explicitly NOT safe if the board was in the middle of a panic() for whatever

Can you please peovide some specific xamples?  I don't understand what
you are talking about.

> reason, so I split off the __*_emergency_restart() hooks separately to allow
> architectures to handle them cleanly.
>
> My own board needs both processor modules to synchronize resets to allow
> them to come back up at all, which means that a "reset" may block for an
> arbitrary amount of time waiting for the other module to cleanly shut down
> and restart (or waiting for somebody to type "reset" on the other U-Boot).
> If someone just types "reset" on the console, I want to allow them to hit
> Ctrl-C to interrupt the process.

This is not what the "reset" command is supposed to do.  The reset
command is supposed to be the software equivalent of someone pressing
the reset button on your board - to the extend possible to be
implemented in software.

Best regards,

Wolfgang Denk
Kyle Moffett March 14, 2011, 7:52 p.m. UTC | #9
On Mar 14, 2011, at 14:59, Wolfgang Denk wrote:
> In message <A60AEA13-1206-4699-9302-0DF9C0F9DE28@boeing.com> you wrote:
>> My own board needs both processor modules to synchronize resets to allow
>> them to come back up at all, which means that a "reset" may block for an
>> arbitrary amount of time waiting for the other module to cleanly shut down
>> and restart (or waiting for somebody to type "reset" on the other U-Boot).
>> If someone just types "reset" on the console, I want to allow them to hit
>> Ctrl-C to interrupt the process.
> 
> This is not what the "reset" command is supposed to do.  The reset
> command is supposed to be the software equivalent of someone pressing
> the reset button on your board - to the extend possible to be
> implemented in software.

On our boards, when the "reset" button is pressed in hardware, both processor modules on the board and all the attached hardware reset at the same time.

If just *one* of the 2 CPUs triggers the reset then only *some* of the attached hardware will be properly reset due to a hardware errata, and as a result the board will sometimes hang or corrupt DMA transfers to the SSDs shortly after reset.

The only way to reset either chip safely is by resetting both at the same time, which requires them to communicate before the reset and wait (possibly a long time) for the other board to agree to reset.  Yes, it's a royal pain, but we're stuck with this hardware for the time being, and if the board can't communicate then it might as well hang() anyways.

This same logic is also implemented in my Linux board-support code, so when one CPU requests a reset the other treats it as a Ctrl-Alt-Del.


>>> What is the difference between these two - and why do we need
>>> different functions at all?
>>> 
>>> A reset is a reset is a reset, isn't it?
>> 
>> That might be true *IF* all boards could actually perform a real hardware reset.
>> 
>> Some can't, and instead they just jump to their reset vector (Nios-II) or to flash (some ppc 74xx/7xx systems).
> 
> So this is the "reset" on these boards, then.
> 
>> If the board just panic()ed or got an unhandled trap or exception, then you
>> don't want to do a soft-reset that assumes everything is OK.  A startup in
>> a bad environment like that could corrupt FLASH or worse.  Right now there
>> is no way to tell the difference, but the lower-level arch-specific code
>> really should care.
> 
> I don't understand your chain of arguments.
> 
> If there really is no better way to implement the reset on such
> boards, then what else can we do?
> 
> And if there are more things that could be done to provide a "better"
> reset, then why should we not always do these?

If the board is in a panic() state it may well have still-running DMA transfers (such as USB URBs), or be in the middle of writing to FLASH.

Performing a jump to early-boot code which is only ever tested when everything is OK and devices are properly initialized is a great way to cause data corruption.

I know for a fact that our boards would rather hang forever than try to reset without cooperation from the other CPU.

>>> My initial feeling is a plain NAK, for this and the rest of the patch
>>> series.  Why would we want all this?
>> 
>> While I was going through the hooks I noticed that several of them were
>> explicitly NOT safe if the board was in the middle of a panic() for whatever
> 
> Can you please peovide some specific examples?  I don't understand what
> you are talking about.

Ok, using the ppmc7xx board as an example:

        /* Disable and invalidate cache */
        icache_disable();
        dcache_disable();

        /* Jump to cold reset point (in RAM) */
        _start();

        /* Should never get here */
        while(1)
                ;

This board uses the EEPRO100 driver, which appears to set up statically allocated TX and RX rings which the device performs DMA to/from.

If this board starts receiving packets and then panic()s, it will disable address translation and immediately re-relocate U-Boot into RAM, then zero the BSS.  If the network card tries to receive a packet after BSS is zeroed, it will read a packet buffer address of (probably) 0x0 from the RX ring and promptly overwrite part of U-Boot's memory at that address.

Depending on the initialization process and memory layout, other similar boards could start writing garbage values to FLASH control registers and potentially corrupt data.

Since the panic() path is so infrequently used and tested, it's better to be safe and hang() on the boards which do not have a reliable hardware-level reset than it is to cause undefined behavior or potentially corrupt data.

Cheers,
Kyle Moffett
Wolfgang Denk March 14, 2011, 8:38 p.m. UTC | #10
Dear "Moffett, Kyle D",

In message <613C8F89-3CE5-4C28-A48E-D5C3E8143A4C@boeing.com> you wrote:
>
> On our boards, when the "reset" button is pressed in hardware, both
> processor modules on the board and all the attached hardware reset at
> the same time.

OK.  So a sane design would provide a way for both of the processors
to do the same, for example by toggeling some GPIO or similar.

> If just *one* of the 2 CPUs triggers the reset then only *some* of
> the attached hardware will be properly reset due to a hardware
> errata, and as a result the board will sometimes hang or corrupt DMA
> transfers to the SSDs shortly after reset.
...
> Yes, it's a royal pain, but we're stuck with this hardware for the
> time being, and if the board can't communicate then it might as well
> hang() anyways.

Do you agree that this is a highly board-specific problem (I would
call it a hardware bug, but I don't insist you agree on that term),
and while there is a the need for you to work around such behaviour
there is little or no reason to do this, or anything like that, in
common code ?


> > And if there are more things that could be done to provide a "better"
> > reset, then why should we not always do these?
> 
> If the board is in a panic() state it may well have still-running DMA
> transfers (such as USB URBs), or be in the middle of writing to
> FLASH.

The same (at least having USB or other drivers still being enabled,
and USB writing it's SOF counters to RAM) can happen for any call to
the reset() function.  I see no reason for assuming there would be
better or worse conditions to perform a reset.

> Performing a jump to early-boot code which is only ever tested when
> everything is OK and devices are properly initialized is a great way
> to cause data corruption.

If there is a software way to prevent such issues, then these steps
should always be performed.

> I know for a fact that our boards would rather hang forever than try
> to reset without cooperation from the other CPU.

As mentioned above, this is a board specific issue that should not
influence common code design.

> >> While I was going through the hooks I noticed that several of them were
> >> explicitly NOT safe if the board was in the middle of a panic() for whatever
> > 
> > Can you please peovide some specific examples?  I don't understand what
> > you are talking about.
> 
> Ok, using the ppmc7xx board as an example:
> 
>         /* Disable and invalidate cache */
>         icache_disable();
>         dcache_disable();
> 
>         /* Jump to cold reset point (in RAM) */
>         _start();
> 
>         /* Should never get here */
>         while(1)
>                 ;
> 
> This board uses the EEPRO100 driver, which appears to set up
> statically allocated TX and RX rings which the device performs DMA
> to/from.
> 
> If this board starts receiving packets and then panic()s, it will
> disable address translation and immediately re-relocate U-Boot into
> RAM, then zero the BSS. If the network card tries to receive a packet
> after BSS is zeroed, it will read a packet buffer address of
> (probably) 0x0 from the RX ring and promptly overwrite part of
> U-Boot's memory at that address.

Agreed.  So this should be fixed.  One clean way to fix it would be to
help improving the driver model for U-Boot (read: create one) and
making sure drivers get deinitialized in such a case.

> Since the panic() path is so infrequently used and tested, it's
> better to be safe and hang() on the boards which do not have a
> reliable hardware-level reset than it is to cause undefined behavior
> or potentially corrupt data.

I disagree.  Instead of adding somewhat obscure alternate code paths
(which get tested even less frequently) we should focus oin fixing
such problems where we run into them.

Best regards,

Wolfgang Denk
Kyle Moffett March 14, 2011, 9:20 p.m. UTC | #11
On Mar 14, 2011, at 16:38, Wolfgang Denk wrote:
> In message <613C8F89-3CE5-4C28-A48E-D5C3E8143A4C@boeing.com> you wrote:
>> 
>> If just *one* of the 2 CPUs triggers the reset then only *some* of
>> the attached hardware will be properly reset due to a hardware
>> errata, and as a result the board will sometimes hang or corrupt DMA
>> transfers to the SSDs shortly after reset.
> ...
>> Yes, it's a royal pain, but we're stuck with this hardware for the
>> time being, and if the board can't communicate then it might as well
>> hang() anyways.
> 
> Do you agree that this is a highly board-specific problem (I would
> call it a hardware bug, but I don't insist you agree on that term),
> and while there is a the need for you to work around such behaviour
> there is little or no reason to do this, or anything like that, in
> common code ?

Oh, absolutely.  I do think there still needs to be a separation between a "normal user-initiated restart" and an "panic-time emergency restart" though, see further on in this email.

The comment about Ctrl-C was simply because our board restart can be aborted by someone at the console because it may take a while for the other CPU to cleanly shut down and acknowledge.  Our __board_restart() watches for Ctrl-C to handle this nicely, and then returns an error, which makes do_reset() return to the user or script with an error.  Obviously in the middle of a panic() there is nothing useful to return to, so the code keeps polling regardless.

Such decisions on what is and is not "acceptable" to run on a panic() are better left to the individual boards and architectures.  Specifically, the separate board and arch hooks for regular and "emergency" restarts that I included in the patch:

  __arch_restart()
  __board_restart()
  __arch_emergency_restart()
  __board_emergency_restart()

>>> And if there are more things that could be done to provide a "better"
>>> reset, then why should we not always do these?
>> 
>> If the board is in a panic() state it may well have still-running DMA
>> transfers (such as USB URBs), or be in the middle of writing to
>> FLASH.
> 
> The same (at least having USB or other drivers still being enabled,
> and USB writing it's SOF counters to RAM) can happen for any call to
> the reset() function.  I see no reason for assuming there would be
> better or worse conditions to perform a reset.

I would argue that is a bug to be fixed.  Regardless of how various boards and architectures implement "reset", U-Boot should provide generic functionality to drivers and library code to allow them to indicate what they want:

  (1) A safe normal operational restart, with all hardware shut down (as much as is considered necessary on the platform).  Depending on the platform this may fail or take some time.

  (2) A critical error restart, where system state may be undefined and the calling code does not expect the function to ever return.

Linux has *both* of those cases in the kernel:  sys_reboot() and emergency_restart().

>> If this board starts receiving packets and then panic()s, it will
>> disable address translation and immediately re-relocate U-Boot into
>> RAM, then zero the BSS. If the network card tries to receive a packet
>> after BSS is zeroed, it will read a packet buffer address of
>> (probably) 0x0 from the RX ring and promptly overwrite part of
>> U-Boot's memory at that address.
> 
> Agreed.  So this should be fixed.  One clean way to fix it would be to
> help improving the driver model for U-Boot (read: create one) and
> making sure drivers get deinitialized in such a case.

This another excellent reason to have separate system_restart() and emergency_restart().  The "system_restart()" function would hook into the driver model and perform device shutdown (just like Linux), while the emergency_restart() function (and therefore panic()) would not.

The "jump to _start" case is very similar to Linux kexec().  There are two specific use-cases:
  (1) Safe reliable run-time handoff from one kernel to another
  (2) Emergency panic() call into another kernel to record the error and reboot safely

In the former case, the kernel runs all of its normal shutdown handlers and quiesces all DMA before passing control to the new kernel.  Under U-Boot this is analogous to the "system_restart()" function I added.  The "system_restart()" function would be the ideal central place to hook driver-model device shut-down.

In the latter case (similar to "emergency_restart()"), the new kernel runs in a sub-section of memory *completely-preallocated* at boot time, and explicitly ignores all other memory because there might be ongoing DMA transactions.  The "emergency kexec" case intentionally avoids running device shutdown handlers because the system state is not well defined.  Under U-Boot there is no way to run completely from a preallocated chunk of memory, which means that a panic()-time jump to "_start" is not safe.

Cheers,
Kyle Moffett
Wolfgang Denk March 14, 2011, 10:01 p.m. UTC | #12
Dear "Moffett, Kyle D",

In message <44A75130-ED4F-46D6-B0E4-12433CC15142@boeing.com> you wrote:
> 
> Oh, absolutely. I do think there still needs to be a separation
> between a "normal user-initiated restart" and an "panic-time
> emergency restart" though, see further on in this email.

These terms refer to another software level, though.

do_reset() is supposed to provide a hard reset function, nothing else
- as mentioned, the software way of pressing the reset button (or
pulling the plug).

A "normal user-initiated restart" would be the equivalent of a
"shutdown -r" in Linux context. We don't offer such a service in
U-Boot, as it has never been needed yet. If implemented, it would
probably call do_reset() as last of it's actions.

A "panic-time emergency restart" can be anything, too - again, it
would probably call do_reset() at the end.

> Such decisions on what is and is not "acceptable" to run on a panic()
> are better left to the individual boards and architectures.

This is a completely new topic now.   We have not been discussing the
implementation or function of panic() before.  This has nothing to do
with what do_reset() is supposed to do - do_reset() may (or may not)
be one action called by panic() [and if so, it should be the last
thing that panic() has been doing.]

> Specifically, the separate board and arch hooks for regular and
> "emergency" restarts that I included in the patch:
> 
>   __arch_restart()
>   __board_restart()
>   __arch_emergency_restart()
>   __board_emergency_restart()

Yes, and I object against such unneeded (for everybody else)
complexity.


> I would argue that is a bug to be fixed. Regardless of how various
> boards and architectures implement "reset", U-Boot should provide
> generic functionality to drivers and library code to allow them to
> indicate what they want:
> 
>   (1) A safe normal operational restart, with all hardware shut down
> (as much as is considered necessary on the platform). Depending on
> the platform this may fail or take some time.
> 
>   (2) A critical error restart, where system state may be undefined
> and the calling code does not expect the function to ever return.

This is overkill for a boot loader.  We assume that when anything goes
wrong we do the best we can to perform a hard reset.  Any halfway sane
hardware design will allow you do do that.  There is a zillion of ways
to do that - from causing a machine check, using a hardware watchdog,
messing limits of system monitor chips, etc. etc.  Or there is a
simple GPIO pin that triggers an external hard reset.

If some hardware provides no such option I will not hesitate to call
this a hardare bug and blame the hardware designer.

Workarounds for such bugs should be dealt with in board specific code.

> Linux has *both* of those cases in the kernel: sys_reboot() and
> emergency_restart().

Linux is an OS.  U-Boot is a boot loader.

Linux offers many things and services that are not available in
U-Boot.

I am even tempted to recommend you to boot Linux as part of your reset
sequence ;-)

> The "jump to _start" case is very similar to Linux kexec().  There are two specific use-cases:

The "jump to _start" case is something I consider a bug that should be
fixed.  I will not accept the existence of such code as reason to
build arbitrarily complex layers on top of it.  It may be the best
possible solution in the given case (which I actually doubt), but even
then it's just that: the best possible approximation to what
actuallyis needed.

>   (1) Safe reliable run-time handoff from one kernel to another
>   (2) Emergency panic() call into another kernel to record the error and reboot safely

U-Boot just provides "reset".


I think I understand what you have in mind, but I'm not going to
accept that, especially not in common code.  Sorry.

Best regards,

Wolfgang Denk
Graeme Russ March 21, 2011, 11:43 a.m. UTC | #13
On 15/03/11 09:01, Wolfgang Denk wrote:
> Dear "Moffett, Kyle D",
> 
> In message <44A75130-ED4F-46D6-B0E4-12433CC15142@boeing.com> you wrote:
>>

[Snip]

I kind of like the idea of different reset sources (CPU exception, hardware
failure, user initiated) but agree copying the linux architecture is over
the top.

Is there any reason reset() could not take a 'reason' parameter? It could
be a bit-mask with CPU, SOC and arch reserved bits (unhandled exception,
user initiated, panic etc) and board specific bits

Board or arch specific code could handle different reasons however they
please (like logging it in NVRAM prior to restart, gracefully shutting down
multiple CPU's, clearing DMA buffers etc)

All 'hang', 'panic', 'reset' etc code can be simplified into a single code
path (although calling 'reset' to 'hang' is a bit odd)

Just a thought

Regards,

Graeme
Wolfgang Denk March 21, 2011, noon UTC | #14
Dear Graeme Russ,

In message <4D8739F6.5040805@gmail.com> you wrote:
>
> I kind of like the idea of different reset sources (CPU exception, hardware
> failure, user initiated) but agree copying the linux architecture is over
> the top.

What's the difference as far as do_reset() is concenred?  It shall
just (hard) reset the system, nothing else.

> Is there any reason reset() could not take a 'reason' parameter? It could
> be a bit-mask with CPU, SOC and arch reserved bits (unhandled exception,
> user initiated, panic etc) and board specific bits

What for? To perform the intended purpose, no parameter is needed.

> Board or arch specific code could handle different reasons however they
> please (like logging it in NVRAM prior to restart, gracefully shutting down
> multiple CPU's, clearing DMA buffers etc)

That would be a layer higher than do_reset() (for example, in
panic()).

> All 'hang', 'panic', 'reset' etc code can be simplified into a single code
> path (although calling 'reset' to 'hang' is a bit odd)

hang() and reset() are intentionally very different things.  A call to
hang() is supposed to hang (surprise, surprise!) infinitely.  It must
not cause a reset.

panic() is a higher software layer. It probably results in calling
reset() in the end.

Best regards,

Wolfgang Denk
Graeme Russ March 22, 2011, 12:05 p.m. UTC | #15
On 21/03/11 23:00, Wolfgang Denk wrote:
> Dear Graeme Russ,
> 
> In message <4D8739F6.5040805@gmail.com> you wrote:
>>
>> I kind of like the idea of different reset sources (CPU exception, hardware
>> failure, user initiated) but agree copying the linux architecture is over
>> the top.
> 
> What's the difference as far as do_reset() is concenred?  It shall
> just (hard) reset the system, nothing else.
> 
>> Is there any reason reset() could not take a 'reason' parameter? It could
>> be a bit-mask with CPU, SOC and arch reserved bits (unhandled exception,
>> user initiated, panic etc) and board specific bits
> 
> What for? To perform the intended purpose, no parameter is needed.
> 
>> Board or arch specific code could handle different reasons however they
>> please (like logging it in NVRAM prior to restart, gracefully shutting down
>> multiple CPU's, clearing DMA buffers etc)
> 
> That would be a layer higher than do_reset() (for example, in
> panic()).

Hmmm, but panic() is defined in lib/vsprintf.c with no possibility for it
to be overridden in any arch or board specific way

> 
>> All 'hang', 'panic', 'reset' etc code can be simplified into a single code
>> path (although calling 'reset' to 'hang' is a bit odd)
> 
> hang() and reset() are intentionally very different things.  A call to
> hang() is supposed to hang (surprise, surprise!) infinitely.  It must
> not cause a reset.

As I said, calling reset() to hang is odd :)

> 
> panic() is a higher software layer. It probably results in calling
> reset() in the end.
> 

Unless CONFIG_PANIC_HANG is defined...

Looking into the code

panic():
  - ~130 call sites
  - Implemented in lib/vsprintf.c
  - Calls do_reset() if CONFIG_PANIC_HANG is not defined
  - Calls hang() if CONFIG_PANIC_HANG is defined

hang():
  - ~180 call sites using hang() and hang ()
  - Implemented in arch\lib\board.c
  - ARM, i386, m68k, microblaze, mips,  prints "### ERROR ### Please RESET
the board ###\n" and loops
  - avr32 prints nothing and loops
  - Blackfin can set status LEDs based on #define, prints "### ERROR ###
Please RESET the board ###\n" and triggers a breakpoint if a JTAG debugger
is connected
  - nios2 disables interrupts then does the same as ARM et all
  - powerpc is similar to ARM et al but also calls show_boot_progress(-30)
  - sh - same as ARM et al but prints "Board ERROR\n"
  - sparc - if CONFIG_SHOW_BOOT_PROGRESS defined, same as powerpc, else
same as ARM et al

So hang() could easily be weakly implemented in common\ which would allow
arch and board specific overrides

do_reset():
  - ~70 call sites
  - 39 implemetations
  - Implemented all over the place (arch/cpu/, arch/lib, board/)
  - Only ARM is in arch/lib which could likely be moved to arch/cpu
  - Is U_BOOT_CMD
  - m68k has a number of very similar/identical implementations
  - Is not weak - Cannot be overridden at the board level
  - Ouch, PPC is real ugly:
    #if !defined(CONFIG_PCIPPC2) && \
        !defined(CONFIG_BAB7xx)  && \
        !defined(CONFIG_ELPPC)   && \
        !defined(CONFIG_PPMC7XX)
    /* no generic way to do board reset. simply call soft_reset. */
    int do_reset(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
    {
    ...
    Boards can thus provide their own do_reset()
  - Wow, m68k/cpu/mcf52x2/cpu.c is even uglier - 7 definitions selectable
    by #ifdef's
  - Because do_reset() is U_BOOT_CMD, practically every call is:
    do_reset (NULL, 0, 0, NULL) - do_bootm() does pass it's args onto
    do_reset()
  - No implementation of do_reset() uses any args anyway

reset():
  - does not exist (as far as I can tell)

So, maybe we could replace do_reset() with a weak reset() function defined
in arch/cpu which can be overridden at the board level. All existing calls
to do_reset() can be converted to simply reset()

The U_BOOT_CMD do_reset() would simply call reset()

Could many of the current calls to do_reset() be replaced with calls to
panic()?

I still like the idea of passing a 'reason' to reset() / panic() - Could we
change panic() to:

void panic(u32 reason, const char *fmt, ...)
{
	va_list	args;
	va_start(args, fmt);
	vprintf(fmt, args);
	putc('\n');
	va_end(args);

	if (reason |= PANIC_FLAG_HANG)
		hang(reason);
	else
		reset(reason);
}

Anywhere in the code that needed to hang has a choice - hang(reason) or
panic(reason | PANIC_FLAG_HANG)

Default implementations of hang() and reset() would just ignore reason.
Board specific code can use reason to do one last boot_progress(), set LED
states etc.

Regards,

Graeme
Wolfgang Denk March 22, 2011, 1:28 p.m. UTC | #16
Dear Graeme Russ,

In message <4D88909A.80508@gmail.com> you wrote:
>
> > That would be a layer higher than do_reset() (for example, in
> > panic()).
> 
> Hmmm, but panic() is defined in lib/vsprintf.c with no possibility for it
> to be overridden in any arch or board specific way

I guess that could be helped.

> > panic() is a higher software layer. It probably results in calling
> > reset() in the end.
> 
> Unless CONFIG_PANIC_HANG is defined...

> reset():
>   - does not exist (as far as I can tell)

reset() is used as symbol in many arm, mips and sparc start.S files

> I still like the idea of passing a 'reason' to reset() / panic() - Could we
> change panic() to:
...
> Anywhere in the code that needed to hang has a choice - hang(reason) or
> panic(reason | PANIC_FLAG_HANG)

I don't think you resonably decide which to use in common code.

Most calls to panic() appear to come from proprietary flash drivers
anyway - most of which should be dropped as they could use the CFI
driver instead. [And if you look at the actual code, the tests for
these panic()s can easily be computed at compile time, so these are
stupid aniway.]

Others 

Now, assume for things like this:

	panic("No working SDRAM available\n");

or like handling undefined instructions or that - what would be more
useful - to hang() to to reset()? ;-)


Can you please show me a specific case where you would use such
different arguments to panic() in the existing code?

> Default implementations of hang() and reset() would just ignore reason.
> Board specific code can use reason to do one last boot_progress(), set LED
> states etc.

That should be done at a higher software layer.

Best regards,

Wolfgang Denk
Graeme Russ March 23, 2011, 12:19 a.m. UTC | #17
On Wed, Mar 23, 2011 at 12:28 AM, Wolfgang Denk <wd@denx.de> wrote:
> Dear Graeme Russ,
>
> In message <4D88909A.80508@gmail.com> you wrote:
>>
>> > That would be a layer higher than do_reset() (for example, in
>> > panic()).
>>
>> Hmmm, but panic() is defined in lib/vsprintf.c with no possibility for it
>> to be overridden in any arch or board specific way
>
> I guess that could be helped.
>
>> > panic() is a higher software layer. It probably results in calling
>> > reset() in the end.
>>
>> Unless CONFIG_PANIC_HANG is defined...
>
>> reset():
>>   - does not exist (as far as I can tell)
>
> reset() is used as symbol in many arm, mips and sparc start.S files

Good point

>
>> I still like the idea of passing a 'reason' to reset() / panic() - Could we
>> change panic() to:
> ...
>> Anywhere in the code that needed to hang has a choice - hang(reason) or
>> panic(reason | PANIC_FLAG_HANG)
>
> I don't think you resonably decide which to use in common code.

My point was that everything can be piped through panic()

>
> Most calls to panic() appear to come from proprietary flash drivers
> anyway - most of which should be dropped as they could use the CFI
> driver instead. [And if you look at the actual code, the tests for
> these panic()s can easily be computed at compile time, so these are
> stupid aniway.]
>
> Others
>
> Now, assume for things like this:
>
>        panic("No working SDRAM available\n");
>
> or like handling undefined instructions or that - what would be more
> useful - to hang() to to reset()? ;-)

Replace with:
        panic(PANIC_FLAG_HANG, "No working SDRAM available\n");

or:
        panic(PANIC_REASON_NO_RAM, "No working SDRAM available\n");

And all of the current do_reset(NULL, 0, 0, NULL) can be changed to:
        panic(PANIC_FLAG_RESET, NULL);

or if the print a message before the call to reset then:
        panic(PANIC_FLAG_RESET, "Whatever message was printed\n");

And panic() becomes:

void panic(u32 reason, const char *fmt, ...)
{
       if(fmt) {
              va_list args;
              va_start(args, fmt);
              vprintf(fmt, args);
              putc('\n');
              va_end(args);
       }

       if (reason |= PANIC_FLAG_HANG)
               hang(reason);
       else,
               reset(reason);
}

>
>
> Can you please show me a specific case where you would use such
> different arguments to panic() in the existing code?

My reasoning is cleaning up the reset()/hang()/panic() API.

Also, consider devices which do not normally have any device attached to
log serial output, but you may want to log reset/hang reasons for diagnosis
later. Board defined hang() and reset() can log the reason in NVRAM and at
next bootup (with a serial console attached) part of the startup message
could be 'Last Reset Reason'

>
>> Default implementations of hang() and reset() would just ignore reason.
>> Board specific code can use reason to do one last boot_progress(), set LED
>> states etc.
>
> That should be done at a higher software layer.
>

How? For example, if an Ethernet device which the board uses to tftp a file
from fails to initialise, that failure is detected in the common driver
code and as a consequence hang(), reset(), or panic() is called. The driver
can print out a message before calling hang() or reset() (useless if you
have no serial console attached) and by the time any arch or board specific
code gets called, all information regarding the failure has been lost. Why
should a common driver decide if the board should hang or reset? What if
the board has an alternative method of retrieving the file it was going to
tftp (maybe a fail-safe backup in Flash). In which case, the board can
just go

OK, I may be bluring the line towards what might be reasonably handled by
loading an OS (but maybe the latest OS image was being tfp'd) but my point
is that a good reset/panic/hang API gives the _board_ ultimate control over
what do do. Currently, ulitimate control of what to do in a particular
failure scenario is hard-coded by drivers and CPU/SOC/arch specific code
with little to no control offered up to the board. What control there is
has been provided by ugly #ifdef's

I am suggesting an API that goes along the lines of:

 - driver/common/board specific code encounters a failure and calls panic()
   There may be cases where the call site _knows_ a hang or reset must be
   performed and would thus provide PANIC_FLAG_HANG or PANIC_FLAG_RESET
 - panic() prints a message (if provided)
 - panic() calls weak hang or reset functions was default implementations
   in arch/soc/cpu
 - do_reset() from the command line calls straight into reset() with
   PANIC_FLAG_USER_RESET
 - boards can override hang() and reset() in order to provide better
   control of the shutdown processes (release DMA buffers etc) or to
   log the reason in non-volatile storage
 - arch hang() and reset() can be called by the board's override to
   perform shutdown of multi-CPU's etc
 - etc

Regards,

Graeme
Wolfgang Denk April 11, 2011, 6:31 p.m. UTC | #18
Dear Graeme Russ,

sorry for the delay.

In message <AANLkTikjCJ6TuJ49TRJWHMh3y=OhFjCKMZd=XxNLvuUD@mail.gmail.com> you wrote:
>
> My point was that everything can be piped through panic()

Yes, it can.  But I don't think that makes sense.

> > Can you please show me a specific case where you would use such
> > different arguments to panic() in the existing code?
> 
> My reasoning is cleaning up the reset()/hang()/panic() API.

Then please keep up with good old Unix philosophy: use small building
block, where each of them fulfils a single purpose, and this done
well.

I seriously dislike the idea of a multifunction panic()
implementation.

> Also, consider devices which do not normally have any device attached to
> log serial output, but you may want to log reset/hang reasons for diagnosis
> later. Board defined hang() and reset() can log the reason in NVRAM and at
> next bootup (with a serial console attached) part of the startup message
> could be 'Last Reset Reason'

Please re-read what I wrote.  Things like hang() or reset() are
supposed to hang or reset _only_.  Any logging is another layer.

> How? For example, if an Ethernet device which the board uses to tftp a file
> from fails to initialise, that failure is detected in the common driver
> code and as a consequence hang(), reset(), or panic() is called. The driver
> can print out a message before calling hang() or reset() (useless if you
> have no serial console attached) and by the time any arch or board specific
> code gets called, all information regarding the failure has been lost. Why
> should a common driver decide if the board should hang or reset? What if

OK, you just proed your own argument wrong.  I agree, a driver should
never just hang(), reset(), or panic() as long as there is a
reasonable way to continue normal operation.


> I am suggesting an API that goes along the lines of:

I understand what you are proposing, and I do not want to accept that.
It is IMO a wrong approach. Functions hang(), reset(), or panic() are
the lowest layer of the implementation, they are function promitives
that are useful as is, and they do exactly what you expect them to do,
without any magic stuff.  Feel free to build your own error handling
and repostiong and logging functions on top of them. If they are
generally useful these may then be reused in more code. But don't try
to put any such stuff into the function primitives.

Best regards,

Wolfgang Denk
diff mbox

Patch

diff --git a/api/api.c b/api/api.c
index 853f010..b65fabd 100644
--- a/api/api.c
+++ b/api/api.c
@@ -131,7 +131,8 @@  static int API_puts(va_list ap)
  */
 static int API_reset(va_list ap)
 {
-	do_reset(NULL, 0, 0, NULL);
+	if (system_restart())
+		return API_ENODEV;
 
 	/* NOT REACHED */
 	return 0;
diff --git a/common/cmd_boot.c b/common/cmd_boot.c
index 7b603d3..c0f26fc 100644
--- a/common/cmd_boot.c
+++ b/common/cmd_boot.c
@@ -58,6 +58,132 @@  int do_go (cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 	return rcode;
 }
 
+/*
+ * emergency_restart()  -  Restart function to call from panic(), etc.
+ *
+ * This may be called from any context when something has gone terribly wrong
+ * and the system needs a best-effort reboot.
+ *
+ * This function will never return.
+ */
+__attribute__((__noreturn__))
+void emergency_restart(void)
+{
+	__board_emergency_restart();
+	__arch_emergency_restart();
+
+	/* Fallback to the old do_reset() until everything is converted. */
+	do_reset(NULL, 0, 0, NULL);
+
+	printf("EMERGENCY RESTART: All attempts to reboot failed!");
+	hang();
+}
+
+/*
+ * Architecture/board-specific emergency-restart hooks.
+ *
+ * These hooks default to the same as the normal restart hooks.
+ *
+ * WARNING: If your platform's "restart" hooks are not a "safe" then you
+ * must redefine these functions in your port as empty stubs.
+ *
+ * Specifically, the following assumptions must be true for these hooks:
+ *   (1) Safe to call from any context (interrupts, etc).
+ *   (2) Must use a hardware mechanism to reset *ALL* system state.
+ *   (3) They must not be interruptible (EG: with Ctrl-C).
+ */
+__attribute__((__weak__))
+void __board_emergency_restart(void)
+{
+	__board_restart();
+}
+
+__attribute__((__weak__))
+void __arch_emergency_restart(void)
+{
+	__arch_restart();
+}
+
+/*
+ * This MUST be called from normal interrupts-on context.  If it requires
+ * extended interaction with external hardware it SHOULD react to Ctrl-C.
+ *
+ * If this function fails to guarantee a clean reboot or receives a Ctrl-C
+ * keystroke it SHOULD return with an error (-1).
+ */
+int system_restart(void)
+{
+	int err;
+
+	/*
+	 * Print a nice message and wait a bit to make sure it goes out the
+	 * console properly.
+	 */
+	printf ("Restarting...\n");
+	udelay(50000);
+
+	/* First let the board code try to reboot */
+	err = __board_restart();
+	if (err)
+		goto failed;
+
+	/* Now call into the architecture-specific code */
+	err = __arch_restart();
+	if (err)
+		goto failed;
+
+	/* Fallback to the old do_reset() until everything is converted. */
+	err = do_reset(NULL, 0, 0, NULL);
+
+failed:
+	printf("*** SYSTEM RESTART FAILED ***\n");
+	return err;
+}
+
+/*
+ * Architecture/board-specific restart hooks.
+ *
+ * If your implementation of these functions does not meet the requirements
+ * for the emergency_restart() hooks described above then you MUST separately
+ * implement those hooks.
+ */
+__attribute__((__weak__))
+int __board_restart(void)
+{
+	/* Fallthrough to architecture-specific code */
+	return 0;
+}
+
+__attribute__((__weak__))
+int __arch_restart(void)
+{
+	/* Fallthrough to legacy do_reset() code */
+	return 0;
+}
+
+/*
+ * Generic reset command implementation
+ *
+ * This is what you get when you type "reset" at the command line.
+ */
+int do_generic_reset(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
+{
+	return system_restart();
+}
+
+/*
+ * Empty legacy "do_reset" stub.
+ *
+ * This allows a platform using the new __board_restart() and
+ * __arch_restart() hooks to completely omit the old do_reset() function.
+ */
+int do_reset_stub(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
+{
+	return 0;
+}
+__attribute__((__weak__,__alias__("do_reset_stub")))
+int do_reset(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]);
+
 /* -------------------------------------------------------------------- */
 
 U_BOOT_CMD(
@@ -68,7 +194,7 @@  U_BOOT_CMD(
 );
 
 U_BOOT_CMD(
-	reset, 1, 0,	do_reset,
+	reset, 1, 0,	do_generic_reset,
 	"Perform RESET of the CPU",
 	""
 );
diff --git a/common/cmd_bootm.c b/common/cmd_bootm.c
index 18019d6..60e4983 100644
--- a/common/cmd_bootm.c
+++ b/common/cmd_bootm.c
@@ -645,7 +645,7 @@  int do_bootm (cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 
 	if (ret < 0) {
 		if (ret == BOOTM_ERR_RESET)
-			do_reset (cmdtp, flag, argc, argv);
+			emergency_restart();
 		if (ret == BOOTM_ERR_OVERLAP) {
 			if (images.legacy_hdr_valid) {
 				if (image_get_type (&images.legacy_hdr_os_copy) == IH_TYPE_MULTI)
@@ -655,7 +655,7 @@  int do_bootm (cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 				puts ("ERROR: new format image overwritten - "
 					"must RESET the board to recover\n");
 				show_boot_progress (-113);
-				do_reset (cmdtp, flag, argc, argv);
+				emergency_restart();
 			}
 		}
 		if (ret == BOOTM_ERR_UNIMPLEMENTED) {
@@ -702,9 +702,7 @@  int do_bootm (cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 #ifdef DEBUG
 	puts ("\n## Control returned to monitor - resetting...\n");
 #endif
-	do_reset (cmdtp, flag, argc, argv);
-
-	return 1;
+	emergency_restart();
 }
 
 /**
diff --git a/common/hush.c b/common/hush.c
index 8021a68..f9e57c9 100644
--- a/common/hush.c
+++ b/common/hush.c
@@ -1036,7 +1036,7 @@  static void get_user_input(struct in_str *i)
 	if (n == -2) {
 	  puts("\nTimeout waiting for command\n");
 #  ifdef CONFIG_RESET_TO_RETRY
-	  do_reset(NULL, 0, 0, NULL);
+	  system_restart();
 #  else
 #	error "This currently only works with CONFIG_RESET_TO_RETRY enabled"
 #  endif
diff --git a/common/main.c b/common/main.c
index dcbacc9..abf77a5 100644
--- a/common/main.c
+++ b/common/main.c
@@ -450,7 +450,7 @@  void main_loop (void)
 			puts ("\nTimed out waiting for command\n");
 # ifdef CONFIG_RESET_TO_RETRY
 			/* Reinit board to run initialization code again */
-			do_reset (NULL, 0, 0, NULL);
+			system_restart();
 # else
 			return;		/* retry autoboot */
 # endif
diff --git a/examples/api/libgenwrap.c b/examples/api/libgenwrap.c
index 873cf34..31bfcf5 100644
--- a/examples/api/libgenwrap.c
+++ b/examples/api/libgenwrap.c
@@ -81,9 +81,10 @@  void __udelay(unsigned long usec)
 	ub_udelay(usec);
 }
 
-int do_reset(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
+int system_restart(void)
 {
-	ub_reset();
+	if (ub_reset())
+		return -1;
 	return 0;
 }
 
diff --git a/include/_exports.h b/include/_exports.h
index d89b65b..3d3e511 100644
--- a/include/_exports.h
+++ b/include/_exports.h
@@ -15,7 +15,7 @@  EXPORT_FUNC(free)
 EXPORT_FUNC(udelay)
 EXPORT_FUNC(get_timer)
 EXPORT_FUNC(vprintf)
-EXPORT_FUNC(do_reset)
+EXPORT_FUNC(system_restart)
 EXPORT_FUNC(getenv)
 EXPORT_FUNC(setenv)
 EXPORT_FUNC(simple_strtoul)
diff --git a/include/command.h b/include/command.h
index 8310fe5..ad8c915 100644
--- a/include/command.h
+++ b/include/command.h
@@ -101,6 +101,19 @@  extern int do_bootd(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]);
 extern int do_bootm(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]);
 extern int do_reset(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[]);
 
+/* Generic system restart functions */
+__attribute__((__noreturn__))
+void emergency_restart(void);
+int system_restart(void);
+
+/* Architecture/board-specific emergency-restart hooks. */
+void __board_emergency_restart(void);
+void __arch_emergency_restart(void);
+
+/* Architecture/board-specific restart hooks. */
+int __board_restart(void);
+int __arch_restart(void);
+
 #endif	/* __ASSEMBLY__ */
 
 /*
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 61e6f0d..295b60b 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -674,6 +674,6 @@  void panic(const char *fmt, ...)
 	hang();
 #else
 	udelay (100000);	/* allow messages to go out */
-	do_reset (NULL, 0, 0, NULL);
+	emergency_restart();
 #endif
 }