
mtd: spi-nor: only apply reset hacks to broken hardware

Message ID 20180727183313.137943-1-computersforpeace@gmail.com
State Accepted
Series mtd: spi-nor: only apply reset hacks to broken hardware

Commit Message

Brian Norris July 27, 2018, 6:33 p.m. UTC
Commit 59b356ffd0b0 ("mtd: m25p80: restore the status of SPI flash when
exiting") is the latest in a long history of attempts to add reboot
handling for stateful addressing modes on SPI flash. Some prior,
mostly related discussions:

http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
[PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands

http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
[RFC] MTD m25p80 3-byte addressing and boot problem

http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
[PATCH 2/2] m25p80: if supported put chip to deep power down if not used

Previously, attempts to add reboot-time software reset handling were
rejected, but the latest attempt was not.

Quick summary of the problem:
Some systems (e.g., boot ROM or bootloader) assume that they can read
initial boot code from their SPI flash using 3-byte addressing. If the
flash is left in 4-byte mode after reset, these systems won't boot. That
commit added a shutdown/remove hook that attempts to reset the
addressing mode before we reboot. Notably, this approach cannot help
with large classes of unexpected reboots (e.g., crashes, watchdog resets).

Unfortunately, it is essentially impossible to solve this problem 100%:
if your system doesn't know how to reset the SPI flash to power-on
defaults at initialization time, no amount of software can really rescue
you -- there will always be a chance of some unexpected reset that
leaves your flash in an addressing mode that your boot sequence didn't
expect.

While it is not directly harmful to perform hacks like the
aforementioned commit on all 4-byte addressing flash, a
properly-designed system should not need the hack -- and in fact,
providing this hack may mask the fact that a given system is indeed
broken. So this patch attempts to apply this unsound hack more narrowly,
providing a strong suggestion to developers and system designers that
this is truly a hack. With luck, system designers can catch their errors
early on in their development cycle, rather than applying this hack long
term. But apparently enough systems are out in the wild that we still
have to provide this hack.

Document a new device tree property to denote systems that do not have a
proper hardware (or software) reset mechanism, and apply the hack (with
a loud warning) only in this case.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
---
Note that I intentionally didn't split out the documentation patch. It seems
clearer to do these together IMO, but if it's *really* important to
someone, I can resend.
---
 .../devicetree/bindings/mtd/jedec,spi-nor.txt  |  9 +++++++++
 drivers/mtd/spi-nor/spi-nor.c                  | 18 ++++++++++++++++--
 include/linux/mtd/spi-nor.h                    |  1 +
 3 files changed, 26 insertions(+), 2 deletions(-)

Comments

Guenter Roeck July 27, 2018, 7:06 p.m. UTC | #1
On Fri, Jul 27, 2018 at 11:33:13AM -0700, Brian Norris wrote:
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>

Reviewed-by: Guenter Roeck <linux@roeck-us.net>

> diff --git a/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt b/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
> index 956bb046e599..f03be904d3c2 100644
> --- a/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
> +++ b/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
> @@ -69,6 +69,15 @@ Optional properties:
>                     all chips and support for it can not be detected at runtime.
>                     Refer to your chips' datasheet to check if this is supported
>                     by your chip.
> +- broken-flash-reset : Some flash devices utilize stateful addressing modes
> +		   (e.g., for 32-bit addressing) which need to be managed
> +		   carefully by a system. Because these sorts of flash don't
> +		   have a standardized software reset command, and because some
> +		   systems don't toggle the flash RESET# pin upon system reset
> +		   (if the pin even exists at all), there are systems which
> +		   cannot reboot properly if the flash is left in the "wrong"
> +		   state. This boolean flag can be used on such systems, to
> +		   denote the absence of a reliable reset mechanism.
>  
>  Example:
>  
> diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
> index d9c368c44194..f028277fb1ce 100644
> --- a/drivers/mtd/spi-nor/spi-nor.c
> +++ b/drivers/mtd/spi-nor/spi-nor.c
> @@ -2757,8 +2757,18 @@ static int spi_nor_init(struct spi_nor *nor)
>  
>  	if ((nor->addr_width == 4) &&
>  	    (JEDEC_MFR(nor->info) != SNOR_MFR_SPANSION) &&
> -	    !(nor->info->flags & SPI_NOR_4B_OPCODES))
> +	    !(nor->info->flags & SPI_NOR_4B_OPCODES)) {
> +		/*
> +		 * If the RESET# pin isn't hooked up properly, or the system
> +		 * otherwise doesn't perform a reset command in the boot
> +		 * sequence, it's impossible to 100% protect against unexpected
> +		 * reboots (e.g., crashes). Warn the user (or hopefully, system
> +		 * designer) that this is bad.
> +		 */
> +		WARN_ONCE(nor->flags & SNOR_F_BROKEN_RESET,
> +			  "enabling reset hack; may not recover from unexpected reboots\n");
>  		set_4byte(nor, nor->info, 1);
> +	}
>  
>  	return 0;
>  }
> @@ -2781,7 +2791,8 @@ void spi_nor_restore(struct spi_nor *nor)
>  	/* restore the addressing mode */
>  	if ((nor->addr_width == 4) &&
>  	    (JEDEC_MFR(nor->info) != SNOR_MFR_SPANSION) &&
> -	    !(nor->info->flags & SPI_NOR_4B_OPCODES))
> +	    !(nor->info->flags & SPI_NOR_4B_OPCODES) &&
> +	    (nor->flags & SNOR_F_BROKEN_RESET))
>  		set_4byte(nor, nor->info, 0);
>  }
>  EXPORT_SYMBOL_GPL(spi_nor_restore);
> @@ -2911,6 +2922,9 @@ int spi_nor_scan(struct spi_nor *nor, const char *name,
>  		params.hwcaps.mask |= SNOR_HWCAPS_READ_FAST;
>  	}
>  
> +	if (of_property_read_bool(np, "broken-flash-reset"))
> +		nor->flags |= SNOR_F_BROKEN_RESET;
> +
>  	/* Some devices cannot do fast-read, no matter what DT tells us */
>  	if (info->flags & SPI_NOR_NO_FR)
>  		params.hwcaps.mask &= ~SNOR_HWCAPS_READ_FAST;
> diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h
> index e60da0d34cc1..c922e97f205a 100644
> --- a/include/linux/mtd/spi-nor.h
> +++ b/include/linux/mtd/spi-nor.h
> @@ -235,6 +235,7 @@ enum spi_nor_option_flags {
>  	SNOR_F_S3AN_ADDR_DEFAULT = BIT(3),
>  	SNOR_F_READY_XSR_RDY	= BIT(4),
>  	SNOR_F_USE_CLSR		= BIT(5),
> +	SNOR_F_BROKEN_RESET	= BIT(6),
>  };
>  
>  /**
Boris Brezillon July 27, 2018, 8:03 p.m. UTC | #2
On Fri, 27 Jul 2018 11:33:13 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> Note that I intentionally didn't split out the documentation patch. It seems
> clearer to do these together IMO, but if it's *really* important to
> someone, I can resend.

I'm fine with that.

I'll leave Neil some time to review/test/comment on the patch before
queuing it, but it looks good to me.

Thanks,

Boris

NeilBrown July 31, 2018, 1:05 a.m. UTC | #3
On Fri, Jul 27 2018, Boris Brezillon wrote:

> On Fri, 27 Jul 2018 11:33:13 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
>
>> Note that I intentionally didn't split out the documentation patch. It seems
>> clearer to do these together IMO, but if it's *really* important to
>> someone, I can resend.
>
> I'm fine with that.
>
> I'll leave Neil some time to review/test/comment on the patch before
> queuing it, but it looks good to me.

Thanks.
I can confirm that if I apply this patch, my system won't reboot
properly (as expected), and if I then add

		broken-flash-reset;

to the jedec,spi-nor device, it starts functioning correctly again.

I don't like the pejorative "broken", and it also suggests that a thing
used to work, but something happened to break it - this is not
accurate.
I would prefer something like "reset-not-connected" which is an accurate
description of the state of the hardware.

I also think that having a WARN_ON is an over-reaction.  Certainly a
warning could be appropriate, but just one pr_warn() should be enough.
The "problem" is unlikely in practice, and loudly warning people that an
asteroid might kill them isn't particularly helpful.

I genuinely think that if the system fails to reboot, then Linux is at
fault. I accept that changing Linux to be completely robust might be
more trouble than it is worth, but I don't accept that it is impossible.

But I don't intend to fight either of these battles.

Thanks,
NeilBrown

Boris Brezillon July 31, 2018, 8:12 p.m. UTC | #4
On Tue, 31 Jul 2018 11:05:11 +1000
NeilBrown <neilb@suse.com> wrote:

> Thanks.
> I can confirm that if I apply this patch, my system won't reboot
> properly (as expected), and if I then add
> 
> 		broken-flash-reset;
> 
> to the jedec,spi-nor device, it starts functioning correctly again.
> 
> I don't like the pejorative "broken", and it also suggests that a thing
> used to work, but something happened to break it - this is not
> accurate.
> I would prefer something like "reset-not-connected" which is an accurate
> description of the state of the hardware.
> 
> I also think that having a WARN_ON is an over-reaction.  Certainly a
> warning could be appropriate, but just one pr_warn() should be enough.
> The "problem" is unlikely in practice, and loudly warning people that an
> asteroid might kill them isn't particularly helpful.
> 
> I genuinely think that if the system fails to reboot, then Linux is at
> fault. I accept that changing Linux to be completely robust might be
> more trouble than it is worth, but I don't accept that it is impossible.
> 
> But I don't intend to fight either of these battles.

Does that mean you're accepting this change? Brian, any comment on what
Neil said?

To be honest, I hate being in the middle of this discussion without
having been involved in the first decision to accept such workarounds.
I keep thinking that making boards that do not have reset properly
wired less likely to fail rebooting is a wise decision, but I also
agree with Brian when he says we should inform people that their design
is unreliable.
The main problem I see here is that adding this prop won't help people
figure out what is wrong with their design; it will just help them
work around the problem once they find out, and by then it might already
be too late to fix the HW design. But maybe that's not what we're trying
to do here. Maybe we just want to warn users that rebooting such boards
is a risky procedure.
Marek Vasut July 31, 2018, 10:15 p.m. UTC | #5
On 07/31/2018 10:12 PM, Boris Brezillon wrote:
> On Tue, 31 Jul 2018 11:05:11 +1000
> NeilBrown <neilb@suse.com> wrote:
> 
>> On Fri, Jul 27 2018, Boris Brezillon wrote:
>>
>>> On Fri, 27 Jul 2018 11:33:13 -0700
>>> Brian Norris <computersforpeace@gmail.com> wrote:
>>>  
>>>> Commit 59b356ffd0b0 ("mtd: m25p80: restore the status of SPI flash when
>>>> exiting") is the latest from a long history of attempts to add reboot
>>>> handling to handle stateful addressing modes on SPI flash. Some prior
>>>> mostly-related discussions:
>>>>
>>>> http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
>>>> [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
>>>>
>>>> http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
>>>> [RFC] MTD m25p80 3-byte addressing and boot problem
>>>>
>>>> http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
>>>> [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
>>>>
>>>> Previously, attempts to add reboot-time software reset handling were
>>>> rejected, but the latest attempt was not.
>>>>
>>>> Quick summary of the problem:
>>>> Some systems (e.g., boot ROM or bootloader) assume that they can read
>>>> initial boot code from their SPI flash using 3-byte addressing. If the
>>>> flash is left in 4-byte mode after reset, these systems won't boot. The
>>>> above patch provided a shutdown/remove hook to attempt to reset the
>>>> addressing mode before we reboot. Notably, this patch misses out on
>>>> huge classes of unexpected reboots (e.g., crashes, watchdog resets).
>>>>
>>>> Unfortunately, it is essentially impossible to solve this problem 100%:
>>>> if your system doesn't know how to reset the SPI flash to power-on
>>>> defaults at initialization time, no amount of software can really rescue
>>>> you -- there will always be a chance of some unexpected reset that
>>>> leaves your flash in an addressing mode that your boot sequence didn't
>>>> expect.
>>>>
>>>> While it is not directly harmful to perform hacks like the
>>>> aforementioned commit on all 4-byte addressing flash, a
>>>> properly-designed system should not need the hack -- and in fact,
>>>> providing this hack may mask the fact that a given system is indeed
>>>> broken. So this patch attempts to apply this unsound hack more narrowly,
>>>> providing a strong suggestion to developers and system designers that
>>>> this is truly a hack. With luck, system designers can catch their errors
>>>> early on in their development cycle, rather than applying this hack long
>>>> term. But apparently enough systems are out in the wild that we still
>>>> have to provide this hack.
>>>>
>>>> Document a new device tree property to denote systems that do not have a
>>>> proper hardware (or software) reset mechanism, and apply the hack (with
>>>> a loud warning) only in this case.
>>>>
>>>> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
>>>> ---
>>>> Note that I intentionally didn't split the documentation patch. It seems
>>>> clearer to do these together IMO, but if it's *really* important to
>>>> someone...I can resend  
>>>
>>> I'm fine with that.
>>>
>>> I'll leave Neil some time to review/test/comment on the patch before
>>> queuing it, but it looks good to me.  
>>
>> Thanks.
>> I can confirm that if I apply this patch, my system won't reboot
>> properly (as expected), and if I then add
>>
>> 		broken-flash-reset;
>>
>> to the jedec,spi-nor device, it starts functioning correctly again.
>>
>> I don't like the pejorative "broken", and it also suggests that a thing
>> used to work, but something happened to break it - this is not
>> accurate.
>> I would prefer something like "reset-not-connected" which is an accurate
>> description of the state of the hardware.
>>
>> I also think that having a WARN_ON is an over-reaction.  Certainly a
>> warning could be appropriate, but just one pr_warn() should be enough.
>> The "problem" is unlikely in practice, and loudly warning people that an
>> asteroid might kill them isn't particularly helpful.
>>
>> I genuinely think that if the system fails to reboot, then Linux is at
>> fault. I accept that changing Linux to be completely robust might be
>> more trouble than it is worth, but I don't accept that it is impossible.
>>
>> But I don't intend to fight either of these battles.
> 
> Does that mean you're accepting this change? Brian, any comment on what
> Neil said?
> 
> To be honest, I hate being in the middle of this discussion without
> having been involved in the first decision to accept such workarounds.
> I keep thinking that making boards that do not have reset properly
> wired less likely to fail rebooting is a wise decision, but I also
> agree with Brian when he says we should inform people that their design
> is unreliable.

Hiding the issue in most cases only leads to vendors making more such
crippled boards and never learning.

> The main problem I see here is that adding this prop won't help people
> figure out what is wrong with their design; it will just help them
> work around the problem when they find out, and it might already be too
> late to fix the HW design. But maybe it's not what we're trying to do
> here. Maybe we just want to warn users that rebooting such boards is a
> risky procedure.

The thing is, this is not a workaround, it's just a way of hiding the
problem because the problem does not go away completely. There are still
scenarios in which the system will fail.
Brian Norris July 31, 2018, 10:35 p.m. UTC | #6
Hi Neil, Boris,

On Tue, Jul 31, 2018 at 10:12:55PM +0200, Boris Brezillon wrote:
> On Tue, 31 Jul 2018 11:05:11 +1000
> NeilBrown <neilb@suse.com> wrote:
> > On Fri, Jul 27 2018, Boris Brezillon wrote:
> > > On Fri, 27 Jul 2018 11:33:13 -0700
> > > I'll leave Neil some time to review/test/comment on the patch before
> > > queuing it, but it looks good to me.  
> > 
> > Thanks.
> > I can confirm that if I apply this patch, my system won't reboot
> > properly (as expected), and if I then add
> > 
> > 		broken-flash-reset;
> > 
> > to the jedec,spi-nor device, it starts functioning correctly again.
> > 
> > I don't like the pejorative "broken", and it also suggests that a thing
> > used to work, but something happened to break it - this is not
> > accurate.
> > I would prefer something like "reset-not-connected" which is an accurate
> > description of the state of the hardware.

One reason I didn't specifically say something like "not connected", is
because IIUC it's actually *possible* to have a robust boot sequence
without the RESET# pin -- e.g., if your boot ROM hardcoded a software
reset command (just because it's not really standardized doesn't mean
one can't do it).
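
Brian's point can be made concrete with a toy model. The sketch below is a userspace simulation, not driver or boot ROM code, and every name in it is hypothetical. It assumes the commonly used opcodes 0xB7 (enter 4-byte mode), 0x66/0x99 (reset enable / reset memory) and 0x03 (READ), and it assumes the part's nonvolatile configuration defaults to 3-byte addressing. Check the datasheet, since soft reset is not implemented uniformly across vendors. The model shows why a ROM that sends three address bytes misreads a flash left in 4-byte mode, and how a hardcoded reset pair avoids that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Commonly used SPI NOR opcodes; availability varies by vendor and part. */
#define OP_READ   0x03 /* READ: address width follows the current mode */
#define OP_EN4B   0xB7 /* enter 4-byte address mode */
#define OP_RSTEN  0x66 /* reset enable */
#define OP_RST    0x99 /* reset memory (only honoured right after RSTEN) */

/* Hypothetical model of flash state that survives a CPU-only reset,
 * i.e. one that neither toggles RESET# nor cycles power. */
struct flash_model {
	bool addr4;       /* 4-byte addressing latched */
	bool reset_armed; /* previous command was OP_RSTEN */
};

/* Apply a single-byte command to the model. */
static void flash_cmd(struct flash_model *f, uint8_t op)
{
	bool armed = f->reset_armed;

	f->reset_armed = false;
	switch (op) {
	case OP_EN4B:
		f->addr4 = true;
		break;
	case OP_RSTEN:
		f->reset_armed = true;
		break;
	case OP_RST:
		if (armed)
			f->addr4 = false; /* back to the power-on default */
		break;
	default:
		break;
	}
}

/*
 * Address the flash decodes from a READ frame. The frame holds the opcode,
 * three address bytes and one byte clocked for the data phase. In 4-byte
 * mode the chip swallows that extra byte as part of the address.
 */
static uint32_t decoded_read_addr(const struct flash_model *f,
				  const uint8_t frame[5])
{
	int n = f->addr4 ? 4 : 3;
	uint32_t addr = 0;

	for (int i = 1; i <= n; i++)
		addr = (addr << 8) | frame[i];
	return addr;
}

/* Hypothetical boot ROM that assumes 3-byte addressing and reads 0x001000,
 * optionally issuing the soft-reset pair first. */
static uint32_t rom_boot_read(struct flash_model *f, bool soft_reset_first)
{
	const uint8_t frame[5] = { OP_READ, 0x00, 0x10, 0x00, 0x00 };

	if (soft_reset_first) {
		flash_cmd(f, OP_RSTEN);
		flash_cmd(f, OP_RST);
	}
	return decoded_read_addr(f, frame);
}
```

After Linux has switched the part to 4-byte mode (`flash_cmd(&f, OP_EN4B)`), a ROM read aimed at 0x001000 decodes as 0x100000; issuing the reset pair first restores the expected address. A real ROM would also have to respect the part's reset recovery time, which the model ignores.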

> > I also think that having a WARN_ON is an over-reaction.  Certainly a
> > warning could be appropriate, but just one pr_warn() should be enough.
> > The "problem" is unlikely in practice, and loudly warning people that an
> > asteroid might kill them isn't particularly helpful.
> > 
> > I genuinely think that if the system fails to reboot, then Linux is at
> > fault. I accept that changing Linux to be completely robust might be
> > more trouble than it is worth, but I don't accept that it is impossible.

Did you read my last response on the original thread? In my
understanding, there's always a way to, e.g., b0rk your exception
handlers, etc., such that you cannot guarantee your software fallbacks
will work. Normally, one would rely on a (hardware) watchdog to do your
last resort reset for you, but if said reset cannot also reset your boot
flash, then...you're stuck.

IOW, it's impossible.

Is that not an accurate description?

> > But I don't intend to fight either of these battles.
> 
> Does that mean you're accepting this change? Brian, any comment on what
> Neil said?
> 
> To be honest, I hate being in the middle of this discussion without
> having been involved in the first decision to accept such workarounds.
> I keep thinking that making boards that do not have reset properly
> wired less likely to fail rebooting is a wise decision, but I also
> agree with Brian when he says we should inform people that their design
> is unreliable.
> The main problem I see here is that adding this prop won't help people
> figure out what is wrong with their design; it will just help them

How else would we help someone figure out what's wrong with their
design? My best attempt is to make it quite obvious, as long as they're
using vanilla mainline: if their system hangs on reboot (without this
property), then it's probably a bad design.

And if instead someone has already added this DT property, the loud
warning might suggest the reader look at the DT binding doc or code
comments, where I elaborated.

> work around the problem when they find out, and it might already be too
> late to fix the HW design. But maybe it's not what we're trying to do
> here. Maybe we just want to warn users that rebooting such boards is a
> risky procedure.

Brian
NeilBrown Aug. 1, 2018, 12:38 a.m. UTC | #7
On Tue, Jul 31 2018, Boris Brezillon wrote:

> On Tue, 31 Jul 2018 11:05:11 +1000
> NeilBrown <neilb@suse.com> wrote:
>
>> On Fri, Jul 27 2018, Boris Brezillon wrote:
>> 
>> > On Fri, 27 Jul 2018 11:33:13 -0700
>> > Brian Norris <computersforpeace@gmail.com> wrote:
>> >  
>> >> Commit 59b356ffd0b0 ("mtd: m25p80: restore the status of SPI flash when
>> >> exiting") is the latest from a long history of attempts to add reboot
>> >> handling to handle stateful addressing modes on SPI flash. Some prior
>> >> mostly-related discussions:
>> >> 
>> >> http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
>> >> [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
>> >> 
>> >> http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
>> >> [RFC] MTD m25p80 3-byte addressing and boot problem
>> >> 
>> >> http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
>> >> [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
>> >> 
>> >> Previously, attempts to add reboot-time software reset handling were
>> >> rejected, but the latest attempt was not.
>> >> 
>> >> Quick summary of the problem:
>> >> Some systems (e.g., boot ROM or bootloader) assume that they can read
>> >> initial boot code from their SPI flash using 3-byte addressing. If the
>> >> flash is left in 4-byte mode after reset, these systems won't boot. The
>> >> above patch provided a shutdown/remove hook to attempt to reset the
>> >> addressing mode before we reboot. Notably, this patch misses out on
>> >> huge classes of unexpected reboots (e.g., crashes, watchdog resets).
>> >> 
>> >> Unfortunately, it is essentially impossible to solve this problem 100%:
>> >> if your system doesn't know how to reset the SPI flash to power-on
>> >> defaults at initialization time, no amount of software can really rescue
>> >> you -- there will always be a chance of some unexpected reset that
>> >> leaves your flash in an addressing mode that your boot sequence didn't
>> >> expect.
>> >> 
>> >> While it is not directly harmful to perform hacks like the
>> >> aforementioned commit on all 4-byte addressing flash, a
>> >> properly-designed system should not need the hack -- and in fact,
>> >> providing this hack may mask the fact that a given system is indeed
>> >> broken. So this patch attempts to apply this unsound hack more narrowly,
>> >> providing a strong suggestion to developers and system designers that
>> >> this is truly a hack. With luck, system designers can catch their errors
>> >> early on in their development cycle, rather than applying this hack long
>> >> term. But apparently enough systems are out in the wild that we still
>> >> have to provide this hack.
>> >> 
>> >> Document a new device tree property to denote systems that do not have a
>> >> proper hardware (or software) reset mechanism, and apply the hack (with
>> >> a loud warning) only in this case.
>> >> 
>> >> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
>> >> ---
>> >> Note that I intentionally didn't split the documentation patch. It seems
>> >> clearer to do these together IMO, but if it's *really* important to
>> >> someone...I can resend  
>> >
>> > I'm fine with that.
>> >
>> > I'll leave Neil some time to review/test/comment on the patch before
>> > queuing it, but it looks good to me.  
>> 
>> Thanks.
>> I can confirm that if I apply this patch, my system won't reboot
>> properly (as expected), and if I then add
>> 
>> 		broken-flash-reset;
>> 
>> to the jedec,spi-nor device, it starts functioning correctly again.
>> 
>> I don't like the pejorative "broken", and it also suggests that a thing
>> used to work, but something happened to break it - this is not
>> accurate.
>> I would prefer something like "reset-not-connected" which is an accurate
>> description of the state of the hardware.
>> 
>> I also think that having a WARN_ON is an over-reaction.  Certainly a
>> warning could be appropriate, but just one pr_warn() should be enough.
>> The "problem" is unlikely in practice, and loudly warning people that an
>> asteroid might kill them isn't particularly helpful.
>> 
>> I genuinely think that if the system fails to reboot, then Linux is at
>> fault. I accept that changing Linux to be completely robust might be
>> more trouble than it is worth, but I don't accept that it is impossible.
>> 
>> But I don't intend to fight either of these battles.
>
> Does that mean you're accepting this change? Brian, any comment on what
> Neil said?

I don't see that it is my place to accept or reject the change.
I don't particularly like it, but I hope never to look at this code
again, so you shouldn't put too much weight on what I like.

>
> To be honest, I hate being in the middle of this discussion without
> having been involved in the first decision to accept such workarounds.
> I keep thinking that making boards that do not have reset properly
> wired less likely to fail rebooting is a wise decision, but I also
> agree with Brian when he says we should inform people that their design
> is unreliable.
> The main problem I see here is that adding this prop won't help people
> figure out what is wrong with their design; it will just help them
> work around the problem when they find out, and it might already be too
> late to fix the HW design. But maybe it's not what we're trying to do
> here. Maybe we just want to warn users that rebooting such boards is a
> risky procedure.

Simply rebooting the board is not a risky procedure.
The risk is that if something causes Linux to "crash", it may not reboot
properly.

Thanks,
NeilBrown
NeilBrown Aug. 1, 2018, 12:40 a.m. UTC | #8
On Wed, Aug 01 2018, Marek Vasut wrote:

> On 07/31/2018 10:12 PM, Boris Brezillon wrote:
>> On Tue, 31 Jul 2018 11:05:11 +1000
>> NeilBrown <neilb@suse.com> wrote:
>> 
>>> On Fri, Jul 27 2018, Boris Brezillon wrote:
>>>
>>>> On Fri, 27 Jul 2018 11:33:13 -0700
>>>> Brian Norris <computersforpeace@gmail.com> wrote:
>>>>  
>>>>> Commit 59b356ffd0b0 ("mtd: m25p80: restore the status of SPI flash when
>>>>> exiting") is the latest from a long history of attempts to add reboot
>>>>> handling to handle stateful addressing modes on SPI flash. Some prior
>>>>> mostly-related discussions:
>>>>>
>>>>> http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
>>>>> [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
>>>>>
>>>>> http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
>>>>> [RFC] MTD m25p80 3-byte addressing and boot problem
>>>>>
>>>>> http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
>>>>> [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
>>>>>
>>>>> Previously, attempts to add reboot-time software reset handling were
>>>>> rejected, but the latest attempt was not.
>>>>>
>>>>> Quick summary of the problem:
>>>>> Some systems (e.g., boot ROM or bootloader) assume that they can read
>>>>> initial boot code from their SPI flash using 3-byte addressing. If the
>>>>> flash is left in 4-byte mode after reset, these systems won't boot. The
>>>>> above patch provided a shutdown/remove hook to attempt to reset the
>>>>> addressing mode before we reboot. Notably, this patch misses out on
>>>>> huge classes of unexpected reboots (e.g., crashes, watchdog resets).
>>>>>
>>>>> Unfortunately, it is essentially impossible to solve this problem 100%:
>>>>> if your system doesn't know how to reset the SPI flash to power-on
>>>>> defaults at initialization time, no amount of software can really rescue
>>>>> you -- there will always be a chance of some unexpected reset that
>>>>> leaves your flash in an addressing mode that your boot sequence didn't
>>>>> expect.
>>>>>
>>>>> While it is not directly harmful to perform hacks like the
>>>>> aforementioned commit on all 4-byte addressing flash, a
>>>>> properly-designed system should not need the hack -- and in fact,
>>>>> providing this hack may mask the fact that a given system is indeed
>>>>> broken. So this patch attempts to apply this unsound hack more narrowly,
>>>>> providing a strong suggestion to developers and system designers that
>>>>> this is truly a hack. With luck, system designers can catch their errors
>>>>> early on in their development cycle, rather than applying this hack long
>>>>> term. But apparently enough systems are out in the wild that we still
>>>>> have to provide this hack.
>>>>>
>>>>> Document a new device tree property to denote systems that do not have a
>>>>> proper hardware (or software) reset mechanism, and apply the hack (with
>>>>> a loud warning) only in this case.
>>>>>
>>>>> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
>>>>> ---
>>>>> Note that I intentionally didn't split the documentation patch. It seems
>>>>> clearer to do these together IMO, but if it's *really* important to
>>>>> someone...I can resend  
>>>>
>>>> I'm fine with that.
>>>>
>>>> I'll leave Neil some time to review/test/comment on the patch before
>>>> queuing it, but it looks good to me.  
>>>
>>> Thanks.
>>> I can confirm that if I apply this patch, my system won't reboot
>>> properly (as expected), and if I then add
>>>
>>> 		broken-flash-reset;
>>>
>>> to the jedec,spi-nor device, it starts functioning correctly again.
>>>
>>> I don't like the pejorative "broken", and it also suggests that a thing
>>> used to work, but something happened to break it - this is not
>>> accurate.
>>> I would prefer something like "reset-not-connected" which is an accurate
>>> description of the state of the hardware.
>>>
>>> I also think that having a WARN_ON is an over-reaction.  Certainly a
>>> warning could be appropriate, but just one pr_warn() should be enough.
>>> The "problem" is unlikely in practice, and loudly warning people that an
>>> asteroid might kill them isn't particularly helpful.
>>>
>>> I genuinely think that if the system fails to reboot, then Linux is at
>>> fault. I accept that changing Linux to be completely robust might be
>>> more trouble than it is worth, but I don't accept that it is impossible.
>>>
>>> But I don't intend to fight either of these battles.
>> 
>> Does that mean you're accepting this change? Brian, any comment on what
>> Neil said?
>> 
>> To be honest, I hate being in the middle of this discussion without
>> having been involved in the first decision to accept such workarounds.
>> I keep thinking that making boards that do not have reset properly
>> wired less likely to fail rebooting is a wise decision, but I also
>> agree with Brian when he says we should inform people that their design
>> is unreliable.
>
> Hiding the issue in most cases only leads to vendors making more such
> crippled boards and never learning.

And you think that printing a loud warning would be likely to get vendors
to make fewer crappy boards?
I think it would just annoy people who aren't in a position to do
anything about it.

NeilBrown


>
>> The main problem I see here is that adding this prop won't help people
>> figure out what is wrong with their design; it will just help them
>> work around the problem when they find out, and it might already be too
>> late to fix the HW design. But maybe it's not what we're trying to do
>> here. Maybe we just want to warn users that rebooting such boards is a
>> risky procedure.
>
> The thing is, this is not a workaround, it's just a way of hiding the
> problem because the problem does not go away completely. There are still
> scenarios in which the system will fail.
>
> -- 
> Best regards,
> Marek Vasut
NeilBrown Aug. 1, 2018, 1:06 a.m. UTC | #9
On Tue, Jul 31 2018, Brian Norris wrote:

> Hi Neil, Boris,
>
> On Tue, Jul 31, 2018 at 10:12:55PM +0200, Boris Brezillon wrote:
>> On Tue, 31 Jul 2018 11:05:11 +1000
>> NeilBrown <neilb@suse.com> wrote:
>> > On Fri, Jul 27 2018, Boris Brezillon wrote:
>> > > On Fri, 27 Jul 2018 11:33:13 -0700
>> > > I'll leave Neil some time to review/test/comment on the patch before
>> > > queuing it, but it looks good to me.  
>> > 
>> > Thanks.
>> > I can confirm that if I apply this patch, my system won't reboot
>> > properly (as expected), and if I then add
>> > 
>> > 		broken-flash-reset;
>> > 
>> > to the jedec,spi-nor device, it starts functioning correctly again.
>> > 
>> > I don't like the pejorative "broken", and it also suggests that a thing
>> > used to work, but something happened to break it - this is not
>> > accurate.
>> > I would prefer something like "reset-not-connected" which is an accurate
>> > description of the state of the hardware.
>
> One reason I didn't specifically say something like "not connected", is
> because IIUC it's actually *possible* to have a robust boot sequence
> without the RESET# pin -- e.g., if your boot ROM hardcoded a software
> reset command (just because it's not really standardized doesn't mean
> one can't do it).

Yes, if we could change the hardware (ROM is hardware) there are various
things we could do to improve reliability.
What we want to do in devicetree is to describe the (unchangeable)
hardware so that Linux can work with it as well as possible.

If I have hardware that doesn't reset the flash on reset, then labeling
it
  doesnt-reset-flash-on-system-reset
is perfectly appropriate.  Labeling it "broken" is pejorative and unhelpful.

>
>> > I also think that having a WARN_ON is an over-reaction.  Certainly a
>> > warning could be appropriate, but just one pr_warn() should be enough.
>> > The "problem" is unlikely in practice, and loudly warning people that an
>> > asteroid might kill them isn't particularly helpful.
>> > 
>> > I genuinely think that if the system fails to reboot, then Linux is at
>> > fault. I accept that changing Linux to be completely robust might be
>> > more trouble than it is worth, but I don't accept that it is impossible.
>
> Did you read my last response on the original thread? In my
> understanding, there's always a way to, e.g., b0rk your exception
> handlers, etc., such that you cannot guarantee your software fallbacks
> will work. Normally, one would rely on a (hardware) watchdog to do your
> last resort reset for you, but if said reset cannot also reset your boot
> flash, then...you're stuck.
>
> IOW, it's impossible.

I cannot say for certain if I read your last response, but I've read
quite a few opinions while researching this and think I have a good
handle on the details.

I agree that if you want high reliability then you need a properly
configured hardware watchdog.  Not everyone needs that and not everyone
bothers with a watchdog.
If you do want a watchdog, you would (obviously?) make sure to buy
hardware that supports a watchdog.
But if you choose to buy hardware that doesn't have a watchdog, then it
isn't "broken", it simply doesn't have a watchdog and so can be expected
to freeze if something particularly bad happens.

Linux could get almost arbitrarily sophisticated in ensuring that
the panic-handling code was fully robust and was stored in
write-protected memory, and so be able to reboot cleanly after any
panic.
There will, of course, be situations where it cannot recover (it might
not panic...), but the fact that it needs to reset the flash as part of
recovery shouldn't increase the set of such situations noticeably.

>
> Is that not an accurate description?
>
>> > But I don't intend to fight either of these battles.
>> 
>> Does that mean you're accepting this change? Brian, any comment on what
>> Neil said?
>> 
>> To be honest, I hate being in the middle of this discussion without
>> having been involved in the first decision to accept such workarounds.
>> I keep thinking that making boards that do not have reset properly
>> wired less likely to fail rebooting is a wise decision, but I also
>> agree with Brian when he says we should inform people that their design
>> is unreliable.
>> The main problem I see here is that adding this prop won't help people
>> figure out what is wrong with their design; it will just help them
>
> How else would we help someone figure out what's wrong with their
> design? My best attempt is to make it quite obvious, as long as they're
> using vanilla mainline: if their system hangs on reboot (without this
> property), then it's probably a bad design.

Is it really our job to help people figure out what's wrong with their
designs (unless they ask)?
I see it as our job to make Linux work reliably.
If a system hangs on reboot, but we can fix reboot so that it doesn't, I
think we should.  Clearly you disagree.

To clearly state my position:
1/ A clean reboot should reboot cleanly, resetting any hardware that
   might need resetting.
2/ An unclean reboot is never guaranteed (though "best effort" is still
   a good goal).  If you need guaranteed unclean reboots, you need a
   properly configured hardware watchdog.

My hardware doesn't have a properly configured hardware watchdog, and I
don't expect it to handle an unclean reboot.  I do expect it to handle a
clean reboot.  I'd rather not be told the hardware is "broken" because
it isn't - it simply doesn't have watchdog support (it doesn't have
hardware floating point either - that doesn't make it 'broken').
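
Neil's two rules, and Brian's argument that a watchdog only helps if its reset also reaches the flash, can be summarized in a toy model. This is a hypothetical userspace sketch; none of the names are kernel APIs. It assumes that only an orderly reboot runs driver shutdown/remove callbacks, and it deliberately does not model a hardened panic path that talks to the flash, which Neil argues is possible.

```c
#include <stdbool.h>

enum reboot_kind { REBOOT_CLEAN, REBOOT_PANIC, REBOOT_WATCHDOG };

/* Hypothetical model of one board; none of these names come from the kernel. */
struct board_model {
	bool reset_wired;  /* system reset also resets the flash (RESET# or power) */
	bool restore_hook; /* driver leaves 4-byte mode on shutdown/remove */
	bool flash_addr4;  /* flash currently latched in 4-byte mode */
};

/* Driver probe on a >16 MiB part without 4-byte opcodes: enter 4-byte mode. */
static void board_probe(struct board_model *b)
{
	b->flash_addr4 = true;
}

/*
 * Returns true if the boot ROM, which assumes 3-byte addressing, finds the
 * flash in the mode it expects after a reboot of the given kind.
 */
static bool board_reboots_ok(struct board_model *b, enum reboot_kind kind)
{
	/* Only an orderly shutdown runs driver shutdown/remove callbacks. */
	if (kind == REBOOT_CLEAN && b->restore_hook)
		b->flash_addr4 = false;

	/* A reset line that reaches the flash restores power-on defaults. */
	if (b->reset_wired)
		b->flash_addr4 = false;

	return !b->flash_addr4;
}
```

A board with only the restore hook survives `REBOOT_CLEAN` but not `REBOOT_PANIC` or `REBOOT_WATCHDOG`; a board whose reset reaches the flash survives all three, with or without the hook.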

Thanks,
NeilBrown


>
> And if instead someone has already added this DT property, the loud
> warning might suggest the reader look at the DT binding doc or code
> comments, where I elaborated.
>
>> work around the problem when they find out, and it might already be too
>> late to fix the HW design. But maybe it's not what we're trying to do
>> here. Maybe we just want to warn users that rebooting such boards is a
>> risky procedure.
>
> Brian
Boris Brezillon Aug. 1, 2018, 7:15 a.m. UTC | #10
On Fri, 27 Jul 2018 11:33:13 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> Commit 59b356ffd0b0 ("mtd: m25p80: restore the status of SPI flash when
> exiting") is the latest from a long history of attempts to add reboot
> handling to handle stateful addressing modes on SPI flash. Some prior
> mostly-related discussions:
> 
> http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
> [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
> 
> http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
> [RFC] MTD m25p80 3-byte addressing and boot problem
> 
> http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
> [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
> 
> Previously, attempts to add reboot-time software reset handling were
> rejected, but the latest attempt was not.
> 
> Quick summary of the problem:
> Some systems (e.g., boot ROM or bootloader) assume that they can read
> initial boot code from their SPI flash using 3-byte addressing. If the
> flash is left in 4-byte mode after reset, these systems won't boot. The
> above patch provided a shutdown/remove hook to attempt to reset the
> addressing mode before we reboot. Notably, this patch misses out on
> huge classes of unexpected reboots (e.g., crashes, watchdog resets).
> 
> Unfortunately, it is essentially impossible to solve this problem 100%:
> if your system doesn't know how to reset the SPI flash to power-on
> defaults at initialization time, no amount of software can really rescue
> you -- there will always be a chance of some unexpected reset that
> leaves your flash in an addressing mode that your boot sequence didn't
> expect.
> 
> While it is not directly harmful to perform hacks like the
> aforementioned commit on all 4-byte addressing flash, a
> properly-designed system should not need the hack -- and in fact,
> providing this hack may mask the fact that a given system is indeed
> broken. So this patch attempts to apply this unsound hack more narrowly,
> providing a strong suggestion to developers and system designers that
> this is truly a hack. With luck, system designers can catch their errors
> early on in their development cycle, rather than applying this hack long
> term. But apparently enough systems are out in the wild that we still
> have to provide this hack.
> 
> Document a new device tree property to denote systems that do not have a
> proper hardware (or software) reset mechanism, and apply the hack (with
> a loud warning) only in this case.
> 
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>

Queued to spi-nor/next.

Thanks,

Boris

> ---
> Note that I intentionally didn't split the documentation patch. It seems
> clearer to do these together IMO, but if it's *really* important to
> someone...I can resend
> ---
>  .../devicetree/bindings/mtd/jedec,spi-nor.txt  |  9 +++++++++
>  drivers/mtd/spi-nor/spi-nor.c                  | 18 ++++++++++++++++--
>  include/linux/mtd/spi-nor.h                    |  1 +
>  3 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt b/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
> index 956bb046e599..f03be904d3c2 100644
> --- a/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
> +++ b/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
> @@ -69,6 +69,15 @@ Optional properties:
>                     all chips and support for it can not be detected at runtime.
>                     Refer to your chips' datasheet to check if this is supported
>                     by your chip.
> +- broken-flash-reset : Some flash devices utilize stateful addressing modes
> +		   (e.g., for 32-bit addressing) which need to be managed
> +		   carefully by a system. Because these sorts of flash don't
> +		   have a standardized software reset command, and because some
> +		   systems don't toggle the flash RESET# pin upon system reset
> +		   (if the pin even exists at all), there are systems which
> +		   cannot reboot properly if the flash is left in the "wrong"
> +		   state. This boolean flag can be used on such systems, to
> +		   denote the absence of a reliable reset mechanism.
>  
>  Example:
>  
> diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
> index d9c368c44194..f028277fb1ce 100644
> --- a/drivers/mtd/spi-nor/spi-nor.c
> +++ b/drivers/mtd/spi-nor/spi-nor.c
> @@ -2757,8 +2757,18 @@ static int spi_nor_init(struct spi_nor *nor)
>  
>  	if ((nor->addr_width == 4) &&
>  	    (JEDEC_MFR(nor->info) != SNOR_MFR_SPANSION) &&
> -	    !(nor->info->flags & SPI_NOR_4B_OPCODES))
> +	    !(nor->info->flags & SPI_NOR_4B_OPCODES)) {
> +		/*
> +		 * If the RESET# pin isn't hooked up properly, or the system
> +		 * otherwise doesn't perform a reset command in the boot
> +		 * sequence, it's impossible to 100% protect against unexpected
> +		 * reboots (e.g., crashes). Warn the user (or hopefully, system
> +		 * designer) that this is bad.
> +		 */
> +		WARN_ONCE(nor->flags & SNOR_F_BROKEN_RESET,
> +			  "enabling reset hack; may not recover from unexpected reboots\n");
>  		set_4byte(nor, nor->info, 1);
> +	}
>  
>  	return 0;
>  }
> @@ -2781,7 +2791,8 @@ void spi_nor_restore(struct spi_nor *nor)
>  	/* restore the addressing mode */
>  	if ((nor->addr_width == 4) &&
>  	    (JEDEC_MFR(nor->info) != SNOR_MFR_SPANSION) &&
> -	    !(nor->info->flags & SPI_NOR_4B_OPCODES))
> +	    !(nor->info->flags & SPI_NOR_4B_OPCODES) &&
> +	    (nor->flags & SNOR_F_BROKEN_RESET))
>  		set_4byte(nor, nor->info, 0);
>  }
>  EXPORT_SYMBOL_GPL(spi_nor_restore);
> @@ -2911,6 +2922,9 @@ int spi_nor_scan(struct spi_nor *nor, const char *name,
>  		params.hwcaps.mask |= SNOR_HWCAPS_READ_FAST;
>  	}
>  
> +	if (of_property_read_bool(np, "broken-flash-reset"))
> +		nor->flags |= SNOR_F_BROKEN_RESET;
> +
>  	/* Some devices cannot do fast-read, no matter what DT tells us */
>  	if (info->flags & SPI_NOR_NO_FR)
>  		params.hwcaps.mask &= ~SNOR_HWCAPS_READ_FAST;
> diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h
> index e60da0d34cc1..c922e97f205a 100644
> --- a/include/linux/mtd/spi-nor.h
> +++ b/include/linux/mtd/spi-nor.h
> @@ -235,6 +235,7 @@ enum spi_nor_option_flags {
>  	SNOR_F_S3AN_ADDR_DEFAULT = BIT(3),
>  	SNOR_F_READY_XSR_RDY	= BIT(4),
>  	SNOR_F_USE_CLSR		= BIT(5),
> +	SNOR_F_BROKEN_RESET	= BIT(6),
>  };
>  
>  /**
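
The net effect of the hunks above can be read as a small truth table. The following is a userspace sketch, not kernel code: the struct and helper names are hypothetical, and each boolean stands in for the kernel expression named in its comment. Note that the patch does not change whether spi_nor_init() enters 4-byte mode; it only decides whether a warning is printed and whether spi_nor_restore() later undoes the switch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the fields and flags the patch tests. */
struct nor_model {
	int addr_width;      /* nor->addr_width */
	bool is_spansion;    /* JEDEC_MFR(nor->info) == SNOR_MFR_SPANSION */
	bool has_4b_opcodes; /* nor->info->flags & SPI_NOR_4B_OPCODES */
	bool broken_reset;   /* nor->flags & SNOR_F_BROKEN_RESET */
};

/* The condition both spi_nor_init() and spi_nor_restore() start from. */
static bool uses_stateful_4byte(const struct nor_model *n)
{
	return n->addr_width == 4 && !n->is_spansion && !n->has_4b_opcodes;
}

/* spi_nor_init(): 4-byte mode is entered whenever the condition holds. */
static bool init_enters_4byte(const struct nor_model *n)
{
	return uses_stateful_4byte(n);
}

/* spi_nor_init(): WARN_ONCE() fires only when the DT flag is also set. */
static bool init_warns(const struct nor_model *n)
{
	return uses_stateful_4byte(n) && n->broken_reset;
}

/* spi_nor_restore(): after the patch, 4-byte mode is exited only when the
 * DT flag is set; before it, the flag was not consulted at all. */
static bool restore_exits_4byte(const struct nor_model *n)
{
	return uses_stateful_4byte(n) && n->broken_reset;
}
```

On a board without `broken-flash-reset`, a part larger than 16 MiB that lacks 4-byte opcodes therefore stays in 4-byte mode across shutdown, which matches Neil's report that his board stopped rebooting until the property was added.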
Marek Vasut Aug. 1, 2018, 8:24 a.m. UTC | #11
On 08/01/2018 02:40 AM, NeilBrown wrote:
> On Wed, Aug 01 2018, Marek Vasut wrote:
> 
>> On 07/31/2018 10:12 PM, Boris Brezillon wrote:
>>> On Tue, 31 Jul 2018 11:05:11 +1000
>>> NeilBrown <neilb@suse.com> wrote:
>>>
>>>> On Fri, Jul 27 2018, Boris Brezillon wrote:
>>>>
>>>>> On Fri, 27 Jul 2018 11:33:13 -0700
>>>>> Brian Norris <computersforpeace@gmail.com> wrote:
>>>>>  
>>>>>> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
>>>>>> ---
>>>>>> Note that I intentionally didn't split the documentation patch. It seems
>>>>>> clearer to do these together IMO, but if it's *really* important to
>>>>>> someone...I can resend
>>>>>
>>>>> I'm fine with that.
>>>>>
>>>>> I'll leave Neil some time to review/test/comment on the patch before
>>>>> queuing it, but it looks good to me.  
>>>>
>>>> Thanks.
>>>> I can confirm that if I apply this patch, my system won't reboot
>>>> properly (as expected), and if I then add
>>>>
>>>> 		broken-flash-reset;
>>>>
>>>> to the jedec,spi-nor device, it starts functioning correctly again.
>>>>
>>>> I don't like the pejorative "broken", and it also suggests that a thing
>>>> used to work, but something happened to break it - this is not
>>>> accurate.
>>>> I would prefer something like "reset-not-connected" which is an accurate
>>>> description of the state of the hardware.
>>>>
>>>> I also think that having a WARN_ON is an over-reaction.  Certainly a
>>>> warning could be appropriate, but just one pr_warn() should be enough.
>>>> The "problem" is unlikely in practice, and loudly warning people that an
>>>> asteroid might kill them isn't particularly helpful.
>>>>
>>>> I genuinely think that if the system fails to reboot, then Linux is at
>>>> fault. I accept that changing Linux to be completely robust might be
>>>> more trouble than it is worth, but I don't accept that it is impossible.
>>>>
>>>> But I don't intend to fight either of these battles.
>>>
>>> Does that mean you're accepting this change? Brian, any comment on what
>>> Neil said?
>>>
>>> To be honest, I hate being in the middle of this discussion without
>>> having been involved in the first decision to accept such workarounds.
>>> I keep thinking that making boards that do not have reset properly
>>> wired less likely to fail rebooting is a wise decision, but I also
>>> agree with Brian when he says we should inform people that their design
>>> is unreliable.
>>
>> Hiding the issue in most cases only leads to vendors making more such
>> crippled boards and never learning.
> 
> And you think that printing a loud warning would be likely to get vendors
> to make fewer crappy boards?
> I think it would just annoy people who aren't in a position to do
> anything about it.

If your hardware is broken and it cannot be properly worked around by
software, what do you do?
Rob Herring (Arm) Aug. 7, 2018, 6:33 p.m. UTC | #12
On Fri, Jul 27, 2018 at 11:33:13AM -0700, Brian Norris wrote:
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
> ---
> Note that I intentionally didn't split the documentation patch. It seems
> clearer to do these together IMO, but if it's *really* important to
> someone...I can resend

How is it cleaner in this specific case?

The reason to separate the binding, besides the fact that I (generally)
only review the binding part, is so that the DT-only repository we
generate[1] has a history and commit messages that make sense.

That being said, if there are no other changes I'm not going to ask for 
it to be split.

Acked-by: Rob Herring <robh@kernel.org>
 
> ---
>  .../devicetree/bindings/mtd/jedec,spi-nor.txt  |  9 +++++++++
>  drivers/mtd/spi-nor/spi-nor.c                  | 18 ++++++++++++++++--
>  include/linux/mtd/spi-nor.h                    |  1 +
>  3 files changed, 26 insertions(+), 2 deletions(-)

[1] https://git.kernel.org/pub/scm/linux/kernel/git/devicetree/devicetree-rebasing.git/
Rob Herring (Arm) Aug. 7, 2018, 6:39 p.m. UTC | #13
On Tue, Jul 31, 2018 at 03:35:50PM -0700, Brian Norris wrote:
> Hi Neil, Boris,
> 
> On Tue, Jul 31, 2018 at 10:12:55PM +0200, Boris Brezillon wrote:
> > On Tue, 31 Jul 2018 11:05:11 +1000
> > NeilBrown <neilb@suse.com> wrote:
> > > On Fri, Jul 27 2018, Boris Brezillon wrote:
> > > > On Fri, 27 Jul 2018 11:33:13 -0700
> > > > I'll leave Neil some time to review/test/comment on the patch before
> > > > queuing it, but it looks good to me.  
> > > 
> > > Thanks.
> > > I can confirm that if I apply this patch, my system won't reboot
> > > properly (as expected), and if I then add
> > > 
> > > 		broken-flash-reset;
> > > 
> > > to the jedec,spi-nor device, it starts functioning correctly again.
> > > 
> > > I don't like the pejorative "broken", and it also suggests that a thing
> > > used to work, but something happened to break it - this is not
> > > accurate.
> > > I would prefer something like "reset-not-connected" which is an accurate
> > > description of the state of the hardware.
> 
> One reason I didn't specifically say something like "not connected", is
> because IIUC it's actually *possible* to have a robust boot sequence
> without the RESET# pin -- e.g., if your boot ROM hardcoded a software
> reset command (just because it's not really standardized doesn't mean
> one can't do it).

Based on that, then it sounds like you need a specific compatible string 
so you too can know how to s/w reset the device.

I guess you are assuming a bootloader didn't leave the flash in an 
unknown addressing state?

Rob
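Brian's point that a boot ROM could hardcode a software reset, even without a fully standardized command, can be sketched as follows. This is a hedged illustration, not code from Linux or any real boot ROM: it assumes the 66h (Reset Enable) / 99h (Reset) pair that many SPI NOR vendors implement, and `send_opcode_fn`, `bootrom_soft_reset()` and the recording stub are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Opcodes implemented by many, but not all, SPI NOR parts. Check the
 * datasheet: this pair is a common vendor convention, not a guarantee.
 */
#define OP_RESET_ENABLE 0x66 /* Reset Enable */
#define OP_RESET        0x99 /* Reset (Memory/Device) */

/* Hypothetical transport hook: clock out one opcode, no address/data. */
typedef void (*send_opcode_fn)(void *ctx, uint8_t opcode);

/*
 * What a boot ROM could hardcode before its first 3-byte read. On parts
 * that support it, this soft reset is meant to return the device to its
 * power-on defaults, including the default addressing mode.
 */
static void bootrom_soft_reset(send_opcode_fn send, void *ctx)
{
	send(ctx, OP_RESET_ENABLE); /* arm the reset */
	send(ctx, OP_RESET);        /* perform it */
	/* Real hardware must then wait the datasheet's reset recovery time. */
}

/* Recording stub standing in for a real SPI controller, for testing. */
struct opcode_record {
	uint8_t ops[8];
	size_t n;
};

static void record_opcode(void *ctx, uint8_t opcode)
{
	struct opcode_record *rec = ctx;

	assert(rec->n < sizeof(rec->ops));
	rec->ops[rec->n++] = opcode;
}
```

Calling `bootrom_soft_reset(record_opcode, &rec)` records the two opcodes in order. A ROM that issues such a sequence at power-up does not depend on Linux having restored 3-byte mode before an unexpected reset; parts without this opcode pair would need their own vendor-specific sequence or a toggled RESET# pin.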
Brian Norris Aug. 7, 2018, 7:22 p.m. UTC | #14
On Tue, Aug 07, 2018 at 12:39:01PM -0600, Rob Herring wrote:
> On Tue, Jul 31, 2018 at 03:35:50PM -0700, Brian Norris wrote:
> > One reason I didn't specifically say something like "not connected", is
> > because IIUC it's actually *possible* to have a robust boot sequence
> > without the RESET# pin -- e.g., if your boot ROM hardcoded a software
> > reset command (just because it's not really standardized doesn't mean
> > one can't do it).
> 
> Based on that, then it sounds like you need a specific compatible string 
> so you too can know how to s/w reset the device.

We do also support compatible properties for these chips, where needed.
But I don't think that's really needed.

> I guess you are assuming a bootloader didn't leave the flash in an 
> unknown addressing state?

For some of these address states (at least, 3-byte vs. 4-byte
addressing), we can still identify the device independently; the READ ID
command works in either mode.

The problem is that a boot ROM is rarely as complex as a Linux driver
and probably can't be updated. And they usually do stupid things anyway.
So *that's* where you need a well-defined entry state for the flash.

Brian
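Brian's observation that READ ID works in either mode, while a simple boot ROM needs a well-defined entry state, can be illustrated with a simplified model of a legacy READ (03h) frame. READ ID (9Fh) has no address phase, so the addressing mode does not affect it; a READ does. Assumptions: the host drives 0xFF on MOSI while clocking in data, and `build_read_frame()` / `flash_latched_address()` are hypothetical helpers, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_READ    0x03 /* legacy READ: opcode, address, then data */
#define OP_READ_ID 0x9F /* READ ID: no address phase, so mode-independent */

/*
 * Build the MOSI bytes a host clocks out for a READ of `len` bytes:
 * opcode, `addr_bytes` address bytes (MSB first), then `len` filler
 * bytes while data is shifted in. Returns the frame length.
 */
static size_t build_read_frame(uint8_t *out, size_t cap, uint32_t addr,
			       int addr_bytes, size_t len, uint8_t filler)
{
	size_t n = 0;

	assert(addr_bytes == 3 || addr_bytes == 4);
	assert(cap >= 1 + (size_t)addr_bytes + len);
	out[n++] = OP_READ;
	for (int i = addr_bytes - 1; i >= 0; i--)
		out[n++] = (uint8_t)(addr >> (8 * i));
	for (size_t i = 0; i < len; i++)
		out[n++] = filler;
	return n;
}

/* Address the flash latches from a READ frame, given its current mode. */
static uint32_t flash_latched_address(const uint8_t *frame,
				      int flash_addr_bytes)
{
	uint32_t addr = 0;

	assert(frame[0] == OP_READ);
	assert(flash_addr_bytes == 3 || flash_addr_bytes == 4);
	for (int i = 1; i <= flash_addr_bytes; i++)
		addr = (addr << 8) | frame[i];
	return addr;
}
```

For example, a ROM that assumes 3-byte mode and reads offset 0x001000 produces the frame `03 00 10 00 FF ...`; a flash left in 4-byte mode latches 0x001000FF instead, and the returned bytes are also shifted by one because the flash spends the extra byte clocking in address.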

Patch

diff --git a/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt b/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
index 956bb046e599..f03be904d3c2 100644
--- a/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
+++ b/Documentation/devicetree/bindings/mtd/jedec,spi-nor.txt
@@ -69,6 +69,15 @@  Optional properties:
                    all chips and support for it can not be detected at runtime.
                    Refer to your chips' datasheet to check if this is supported
                    by your chip.
+- broken-flash-reset : Some flash devices utilize stateful addressing modes
+		   (e.g., for 32-bit addressing) which need to be managed
+		   carefully by a system. Because these sorts of flash don't
+		   have a standardized software reset command, and because some
+		   systems don't toggle the flash RESET# pin upon system reset
+		   (if the pin even exists at all), there are systems which
+		   cannot reboot properly if the flash is left in the "wrong"
+		   state. This boolean flag can be used on such systems, to
+		   denote the absence of a reliable reset mechanism.
 
 Example:
 
diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index d9c368c44194..f028277fb1ce 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -2757,8 +2757,18 @@  static int spi_nor_init(struct spi_nor *nor)
 
 	if ((nor->addr_width == 4) &&
 	    (JEDEC_MFR(nor->info) != SNOR_MFR_SPANSION) &&
-	    !(nor->info->flags & SPI_NOR_4B_OPCODES))
+	    !(nor->info->flags & SPI_NOR_4B_OPCODES)) {
+		/*
+		 * If the RESET# pin isn't hooked up properly, or the system
+		 * otherwise doesn't perform a reset command in the boot
+		 * sequence, it's impossible to 100% protect against unexpected
+		 * reboots (e.g., crashes). Warn the user (or hopefully, system
+		 * designer) that this is bad.
+		 */
+		WARN_ONCE(nor->flags & SNOR_F_BROKEN_RESET,
+			  "enabling reset hack; may not recover from unexpected reboots\n");
 		set_4byte(nor, nor->info, 1);
+	}
 
 	return 0;
 }
@@ -2781,7 +2791,8 @@  void spi_nor_restore(struct spi_nor *nor)
 	/* restore the addressing mode */
 	if ((nor->addr_width == 4) &&
 	    (JEDEC_MFR(nor->info) != SNOR_MFR_SPANSION) &&
-	    !(nor->info->flags & SPI_NOR_4B_OPCODES))
+	    !(nor->info->flags & SPI_NOR_4B_OPCODES) &&
+	    (nor->flags & SNOR_F_BROKEN_RESET))
 		set_4byte(nor, nor->info, 0);
 }
 EXPORT_SYMBOL_GPL(spi_nor_restore);
@@ -2911,6 +2922,9 @@  int spi_nor_scan(struct spi_nor *nor, const char *name,
 		params.hwcaps.mask |= SNOR_HWCAPS_READ_FAST;
 	}
 
+	if (of_property_read_bool(np, "broken-flash-reset"))
+		nor->flags |= SNOR_F_BROKEN_RESET;
+
 	/* Some devices cannot do fast-read, no matter what DT tells us */
 	if (info->flags & SPI_NOR_NO_FR)
 		params.hwcaps.mask &= ~SNOR_HWCAPS_READ_FAST;
diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h
index e60da0d34cc1..c922e97f205a 100644
--- a/include/linux/mtd/spi-nor.h
+++ b/include/linux/mtd/spi-nor.h
@@ -235,6 +235,7 @@  enum spi_nor_option_flags {
 	SNOR_F_S3AN_ADDR_DEFAULT = BIT(3),
 	SNOR_F_READY_XSR_RDY	= BIT(4),
 	SNOR_F_USE_CLSR		= BIT(5),
+	SNOR_F_BROKEN_RESET	= BIT(6),
 };
 
 /**
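To make the behavior change in the patch explicit, the sketch below restates its conditions as plain C predicates. `struct nor_model` and the helper functions are hypothetical simplifications; the real code tests `nor->addr_width`, `JEDEC_MFR(nor->info)`, `nor->info->flags` and `nor->flags` exactly as shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the fields the patch tests (not kernel types). */
struct nor_model {
	int addr_width;      /* nor->addr_width */
	bool is_spansion;    /* JEDEC_MFR(nor->info) == SNOR_MFR_SPANSION */
	bool has_4b_opcodes; /* nor->info->flags & SPI_NOR_4B_OPCODES */
	bool broken_reset;   /* nor->flags & SNOR_F_BROKEN_RESET (from DT) */
};

/* Condition copied from the diff: when set_4byte() is used at all. */
static bool uses_stateful_4b_mode(const struct nor_model *m)
{
	return m->addr_width == 4 && !m->is_spansion && !m->has_4b_opcodes;
}

/* spi_nor_init(): still enters 4-byte mode regardless of the DT flag. */
static bool init_enters_4b_mode(const struct nor_model *m)
{
	return uses_stateful_4b_mode(m);
}

/* spi_nor_init(): WARN_ONCE() fires only when the hack will be applied. */
static bool init_warns(const struct nor_model *m)
{
	return uses_stateful_4b_mode(m) && m->broken_reset;
}

/* spi_nor_restore(): after the patch, exits 4-byte mode only if flagged. */
static bool restore_exits_4b_mode(const struct nor_model *m)
{
	return uses_stateful_4b_mode(m) && m->broken_reset;
}
```

With these predicates, a board without `broken-flash-reset` still enters 4-byte mode in `spi_nor_init()` but is no longer switched back in `spi_nor_restore()`, which matches Neil's report that his board stopped rebooting until he added the property.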