diff mbox series

[PATCHv3,2/2] mtd: m25p80: restore the status of SPI flash when exiting

Message ID 20171206025342.7266-3-Zhiqiang.Hou@nxp.com
State Accepted
Delegated to: Cyrille Pitchen
Headers show
Series mtd: m25p80: restore the addressing mode when exiting | expand

Commit Message

Z.Q. Hou Dec. 6, 2017, 2:53 a.m. UTC
From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>

Restore the status to be compatible with legacy devices.
Take Freescale eSPI boot for example, it copies (in 3 Byte
addressing mode) the RCW and bootloader images from SPI flash
without firing a reset signal previously, so the reboot command
will fail without reseting the addressing mode of SPI flash.
This patch implement .shutdown function to restore the status
in reboot process, and add the same operation to the .remove
function.

Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
---
V3:
 - Modified the commit to make this patch specific.

 drivers/mtd/devices/m25p80.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Cyrille Pitchen Dec. 12, 2017, 1:19 p.m. UTC | #1
Le 06/12/2017 à 03:53, Zhiqiang Hou a écrit :
> From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> 
> Restore the status to be compatible with legacy devices.
> Take Freescale eSPI boot for example, it copies (in 3 Byte
> addressing mode) the RCW and bootloader images from SPI flash
> without firing a reset signal previously, so the reboot command
> will fail without reseting the addressing mode of SPI flash.
resetting

> This patch implement .shutdown function to restore the status
implements
> in reboot process, and add the same operation to the .remove
> function.
> 
> Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>

Applied to the spi-nor/next branch of l2-mtd

Thanks!

(corrected few mistakes in the commit message)
> ---
> V3:
>  - Modified the commit to make this patch specific.
> 
>  drivers/mtd/devices/m25p80.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
> index dbe6a1de2bb8..a4e18f6aaa33 100644
> --- a/drivers/mtd/devices/m25p80.c
> +++ b/drivers/mtd/devices/m25p80.c
> @@ -307,10 +307,18 @@ static int m25p_remove(struct spi_device *spi)
>  {
>  	struct m25p	*flash = spi_get_drvdata(spi);
>  
> +	spi_nor_restore(&flash->spi_nor);
> +
>  	/* Clean up MTD stuff. */
>  	return mtd_device_unregister(&flash->spi_nor.mtd);
>  }
>  
> +static void m25p_shutdown(struct spi_device *spi)
> +{
> +	struct m25p *flash = spi_get_drvdata(spi);
> +
> +	spi_nor_restore(&flash->spi_nor);
> +}
>  /*
>   * Do NOT add to this array without reading the following:
>   *
> @@ -386,6 +394,7 @@ static struct spi_driver m25p80_driver = {
>  	.id_table	= m25p_ids,
>  	.probe	= m25p_probe,
>  	.remove	= m25p_remove,
> +	.shutdown	= m25p_shutdown,
>  
>  	/* REVISIT: many of these chips have deep power-down modes, which
>  	 * should clearly be entered on suspend() to minimize power use.
>
Brian Norris July 23, 2018, 6:13 p.m. UTC | #2
Hello,

I noticed this got merged, but I wanted to put my 2 cents in here:

On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:
> From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> 
> Restore the status to be compatible with legacy devices.
> Take Freescale eSPI boot for example, it copies (in 3 Byte
> addressing mode) the RCW and bootloader images from SPI flash
> without firing a reset signal previously, so the reboot command
> will fail without reseting the addressing mode of SPI flash.
> This patch implement .shutdown function to restore the status
> in reboot process, and add the same operation to the .remove
> function.

We have previously rejected this patch multiple times, because the above
comment demonstrates a broken product. You cannot guarantee that all
reboots will invoke the .shutdown() method -- what about crashes? What
about watchdog resets? IIUC, those will hit the same broken behavior,
and have unexepcted behavior in your bootloader.

I suppose one could argue for doing this in remove(), but AIUI you're
just papering over system bugs by introducing the shutdown() function
here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
other users of this driver.

Brian

> Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> ---
> V3:
>  - Modified the commit to make this patch specific.
> 
>  drivers/mtd/devices/m25p80.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
> index dbe6a1de2bb8..a4e18f6aaa33 100644
> --- a/drivers/mtd/devices/m25p80.c
> +++ b/drivers/mtd/devices/m25p80.c
> @@ -307,10 +307,18 @@ static int m25p_remove(struct spi_device *spi)
>  {
>  	struct m25p	*flash = spi_get_drvdata(spi);
>  
> +	spi_nor_restore(&flash->spi_nor);
> +
>  	/* Clean up MTD stuff. */
>  	return mtd_device_unregister(&flash->spi_nor.mtd);
>  }
>  
> +static void m25p_shutdown(struct spi_device *spi)
> +{
> +	struct m25p *flash = spi_get_drvdata(spi);
> +
> +	spi_nor_restore(&flash->spi_nor);
> +}
>  /*
>   * Do NOT add to this array without reading the following:
>   *
> @@ -386,6 +394,7 @@ static struct spi_driver m25p80_driver = {
>  	.id_table	= m25p_ids,
>  	.probe	= m25p_probe,
>  	.remove	= m25p_remove,
> +	.shutdown	= m25p_shutdown,
>  
>  	/* REVISIT: many of these chips have deep power-down modes, which
>  	 * should clearly be entered on suspend() to minimize power use.
> -- 
> 2.14.1
>
Boris Brezillon July 23, 2018, 8:10 p.m. UTC | #3
Hi Brian,

On Mon, 23 Jul 2018 11:13:50 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> Hello,
> 
> I noticed this got merged, but I wanted to put my 2 cents in here:

I wish you had replied to this thread when it was posted (more than
6 months ago). Reverting the patch now implies making some people
unhappy because they'll have to resort to their old out-of-tree
hacks :-(.

> 
> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:
> > From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> > 
> > Restore the status to be compatible with legacy devices.
> > Take Freescale eSPI boot for example, it copies (in 3 Byte
> > addressing mode) the RCW and bootloader images from SPI flash
> > without firing a reset signal previously, so the reboot command
> > will fail without reseting the addressing mode of SPI flash.
> > This patch implement .shutdown function to restore the status
> > in reboot process, and add the same operation to the .remove
> > function.  
> 
> We have previously rejected this patch multiple times, because the above
> comment demonstrates a broken product.

If we were to only support working HW parts, I fear Linux would not
support a lot of HW (that's even more true when it comes to flashes :P).

> You cannot guarantee that all
> reboots will invoke the .shutdown() method -- what about crashes? What
> about watchdog resets? IIUC, those will hit the same broken behavior,
> and have unexepcted behavior in your bootloader.

Yes, there are corner cases that are not addressed with this approach,
but it still seems to improve things. Of course, that means the
user should try to re-route all HW reset sources to SW ones (RESET input
pin muxed to the GPIO controller, watchdog generating an interrupt
instead of directly asserting the RESET output pin), which is not always
possible, but even when it's not, isn't it better to have a setup that
works fine 99% of the time instead of 50% of the time?

> 
> I suppose one could argue for doing this in remove(), but AIUI you're
> just papering over system bugs by introducing the shutdown() function
> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
> other users of this driver.

I understand your point. But if the problem is about making sure people
designing new boards get that right, why not complaining at probe time
when things are wrong?

I mean, spi_nor_restore() seems to only do something on very specific
NORs (those on which a SW RESET does not resets the addressing
mode). So, how about adding a flag that says "my board has the NOR HW
RESET pin wired" (there would be a DT props to set that flag). Then you
add a WARN_ON() when this flag is not set and a NOR chip impacted by
this bug is detected. This way you make sure people are informed that
they're doing something wrong, and for those who can't change their HW
(because it's already widely deployed), you have a fix that improve
things.

Regards,

Boris
Brian Norris July 23, 2018, 10:06 p.m. UTC | #4
Hi Boris,

On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
> On Mon, 23 Jul 2018 11:13:50 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
>> I noticed this got merged, but I wanted to put my 2 cents in here:
>
> I wish you had replied to this thread when it was posted (more than
> 6 months ago). Reverting the patch now implies making some people
> unhappy because they'll have to resort to their old out-of-tree
> hacks :-(.

I'd say I'm sorry for not following things closely these days, but I'm
not really that sorry. There are plenty of other capable hands. And if
y'all shoot yourselves in the foot, so be it. This patch isn't going
to blow things up, but now that I did finally notice it (because it
happened to show up in a list of backports I was looking at), I
thought better late than never to remind you.

For way of notification: Marek already noticed that we've started down
a slippery slope months ago:

https://lkml.org/lkml/2018/4/8/141
Re: [PATCH] mtd: spi-nor: clear Extended Address Reg on switch to
3-byte addressing.

I'm not quite sure why that wasn't taken to its logical conclusion --
that the hack should be reverted.

This problem has been noted many times already, and we've always
stayed on the side of *avoiding* this hack. A few references from a
search of my email:

http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
[PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands

http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
[RFC] MTD m25p80 3-byte addressing and boot problem

http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
[PATCH 2/2] m25p80: if supported put chip to deep power down if not used

>> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:
>> > From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
>> >
>> > Restore the status to be compatible with legacy devices.
>> > Take Freescale eSPI boot for example, it copies (in 3 Byte
>> > addressing mode) the RCW and bootloader images from SPI flash
>> > without firing a reset signal previously, so the reboot command
>> > will fail without reseting the addressing mode of SPI flash.
>> > This patch implement .shutdown function to restore the status
>> > in reboot process, and add the same operation to the .remove
>> > function.
>>
>> We have previously rejected this patch multiple times, because the above
>> comment demonstrates a broken product.
>
> If we were to only support working HW parts, I fear Linux would not
> support a lot of HW (that's even more true when it comes to flashes :P).

You stopped allowing UBI to attach to MLC NAND recently, no? That
sounds like almost the same boat -- you've probably killed quite a few
shitty products, if they were to use mainline directly.

Anyway, that's derailing the issue. Supporting broken hardware isn't
something you try to do by applying the same hack to all systems. You
normally try to apply your hack as narrowly as possible. You seem to
imply that below. So maybe that's a solution to move forward with. But
I'd personally be just as happy to see the patch reverted.

>> You cannot guarantee that all
>> reboots will invoke the .shutdown() method -- what about crashes? What
>> about watchdog resets? IIUC, those will hit the same broken behavior,
>> and have unexepcted behavior in your bootloader.
>
> Yes, there are corner cases that are not addressed with this approach,

Is a system crash really a corner case? :D

> but it still seems to improve things. Of course, that means the
> user should try to re-route all HW reset sources to SW ones (RESET input
> pin muxed to the GPIO controller, watchdog generating an interrupt
> instead of directly asserting the RESET output pin), which is not always
> possible, but even when it's not, isn't it better to have a setup that
> works fine 99% of the time instead of 50% of the time?

Perhaps, but not at the expense of future development. And
realistically, no one is doing that if they have this hack. Most
people won't even know that this hack is protecting them at all (so
again, they won't try to mitigate the problem any further).

>> I suppose one could argue for doing this in remove(), but AIUI you're
>> just papering over system bugs by introducing the shutdown() function
>> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
>> other users of this driver.
>
> I understand your point. But if the problem is about making sure people
> designing new boards get that right, why not complaining at probe time
> when things are wrong?
>
> I mean, spi_nor_restore() seems to only do something on very specific
> NORs (those on which a SW RESET does not resets the addressing
> mode).

The point isn't that SW RESET doesn't reset the addressing mode -- it
does on any flash I've seen. The point is that most systems are built
around a stateless assumption in these flash. IIRC, there wasn't even
a SW RESET command at all until these "huge" flash came around and
stateful addressing modes came about. So boot ROMs and bootloaders
would have to be updated to start figuring out when/how to do this SW
RESET. And once two vendors start doing it differently (I'm not sure:
have they done this already? I think so) it's no longer something a
boot ROM will get right.

The only way to get this stuff right is to have a hardware reset, or
else to avoid all of the stateful modes in software.

> So, how about adding a flag that says "my board has the NOR HW
> RESET pin wired" (there would be a DT props to set that flag). Then you
> add a WARN_ON() when this flag is not set and a NOR chip impacted by
> this bug is detected.

I'd kinda prefer the reverse. There really isn't a need to document
anything for a working system (software usually can't control this
RESET pin). The burden should be on the b0rked system to document
where it needs unsound hacks to survive.

> This way you make sure people are informed that
> they're doing something wrong, and for those who can't change their HW
> (because it's already widely deployed), you have a fix that improve
> things.

Or even better: put this hack behind a DT flag, so that one has to
admit that their board design is broken before it will even do
anything. Proposal: "linux,badly-designed-flash-reset".

But, I'd prefer just (partially?) reverting this, and let the authors
submit something that works. We're not obligated to keep bad hacks in
the kernel.

Brian
NeilBrown July 23, 2018, 10:46 p.m. UTC | #5
On Mon, Jul 23 2018, Brian Norris wrote:

> Hi Boris,
>
> On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
>> On Mon, 23 Jul 2018 11:13:50 -0700
>> Brian Norris <computersforpeace@gmail.com> wrote:
>>> I noticed this got merged, but I wanted to put my 2 cents in here:
>>
>> I wish you had replied to this thread when it was posted (more than
>> 6 months ago). Reverting the patch now implies making some people
>> unhappy because they'll have to resort to their old out-of-tree
>> hacks :-(.
>
> I'd say I'm sorry for not following things closely these days, but I'm
> not really that sorry. There are plenty of other capable hands. And if
> y'all shoot yourselves in the foot, so be it. This patch isn't going
> to blow things up, but now that I did finally notice it (because it
> happened to show up in a list of backports I was looking at), I
> thought better late than never to remind you.
>
> For way of notification: Marek already noticed that we've started down
> a slippery slope months ago:
>
> https://lkml.org/lkml/2018/4/8/141
> Re: [PATCH] mtd: spi-nor: clear Extended Address Reg on switch to
> 3-byte addressing.
>
> I'm not quite sure why that wasn't taken to its logical conclusion --
> that the hack should be reverted.
>
> This problem has been noted many times already, and we've always
> stayed on the side of *avoiding* this hack. A few references from a
> search of my email:
>
> http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
> [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
>
> http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
> [RFC] MTD m25p80 3-byte addressing and boot problem
>
> http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
> [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
>
>>> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:
>>> > From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
>>> >
>>> > Restore the status to be compatible with legacy devices.
>>> > Take Freescale eSPI boot for example, it copies (in 3 Byte
>>> > addressing mode) the RCW and bootloader images from SPI flash
>>> > without firing a reset signal previously, so the reboot command
>>> > will fail without reseting the addressing mode of SPI flash.
>>> > This patch implement .shutdown function to restore the status
>>> > in reboot process, and add the same operation to the .remove
>>> > function.
>>>
>>> We have previously rejected this patch multiple times, because the above
>>> comment demonstrates a broken product.
>>
>> If we were to only support working HW parts, I fear Linux would not
>> support a lot of HW (that's even more true when it comes to flashes :P).
>
> You stopped allowing UBI to attach to MLC NAND recently, no? That
> sounds like almost the same boat -- you've probably killed quite a few
> shitty products, if they were to use mainline directly.
>
> Anyway, that's derailing the issue. Supporting broken hardware isn't
> something you try to do by applying the same hack to all systems. You
> normally try to apply your hack as narrowly as possible. You seem to
> imply that below. So maybe that's a solution to move forward with. But
> I'd personally be just as happy to see the patch reverted.
>
>>> You cannot guarantee that all
>>> reboots will invoke the .shutdown() method -- what about crashes? What
>>> about watchdog resets? IIUC, those will hit the same broken behavior,
>>> and have unexepcted behavior in your bootloader.
>>
>> Yes, there are corner cases that are not addressed with this approach,
>
> Is a system crash really a corner case? :D
>
>> but it still seems to improve things. Of course, that means the
>> user should try to re-route all HW reset sources to SW ones (RESET input
>> pin muxed to the GPIO controller, watchdog generating an interrupt
>> instead of directly asserting the RESET output pin), which is not always
>> possible, but even when it's not, isn't it better to have a setup that
>> works fine 99% of the time instead of 50% of the time?
>
> Perhaps, but not at the expense of future development. And
> realistically, no one is doing that if they have this hack. Most
> people won't even know that this hack is protecting them at all (so
> again, they won't try to mitigate the problem any further).
>
>>> I suppose one could argue for doing this in remove(), but AIUI you're
>>> just papering over system bugs by introducing the shutdown() function
>>> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
>>> other users of this driver.
>>
>> I understand your point. But if the problem is about making sure people
>> designing new boards get that right, why not complaining at probe time
>> when things are wrong?
>>
>> I mean, spi_nor_restore() seems to only do something on very specific
>> NORs (those on which a SW RESET does not resets the addressing
>> mode).
>
> The point isn't that SW RESET doesn't reset the addressing mode -- it
> does on any flash I've seen. The point is that most systems are built
> around a stateless assumption in these flash. IIRC, there wasn't even
> a SW RESET command at all until these "huge" flash came around and
> stateful addressing modes came about. So boot ROMs and bootloaders
> would have to be updated to start figuring out when/how to do this SW
> RESET. And once two vendors start doing it differently (I'm not sure:
> have they done this already? I think so) it's no longer something a
> boot ROM will get right.
>
> The only way to get this stuff right is to have a hardware reset, or
> else to avoid all of the stateful modes in software.
>
>> So, how about adding a flag that says "my board has the NOR HW
>> RESET pin wired" (there would be a DT props to set that flag). Then you
>> add a WARN_ON() when this flag is not set and a NOR chip impacted by
>> this bug is detected.
>
> I'd kinda prefer the reverse. There really isn't a need to document
> anything for a working system (software usually can't control this
> RESET pin). The burden should be on the b0rked system to document
> where it needs unsound hacks to survive.
>
>> This way you make sure people are informed that
>> they're doing something wrong, and for those who can't change their HW
>> (because it's already widely deployed), you have a fix that improve
>> things.
>
> Or even better: put this hack behind a DT flag, so that one has to
> admit that their board design is broken before it will even do
> anything. Proposal: "linux,badly-designed-flash-reset".
>
> But, I'd prefer just (partially?) reverting this, and let the authors
> submit something that works. We're not obligated to keep bad hacks in
> the kernel.
>
> Brian

One possibility that occurred to me when I was exploring this issue is
to revert to 3-byte mode whenever 4-byte was not actively in use.
So any access beyond 16Meg is:
 switch-to-4-byte ; perform IO ; switch to 3-byte
or similar.  On my hardware it would be more efficient to
use the 4-byte opcode to perform the IO, then reset the cached
4th address byte that the NOR chip transparently remembered.

This adds a little overhead, but should be fairly robust.
It doesn't help if something goes terribly wrong while IO is happening,
but I don't think any other software solution does either.

How would you see that approach?

Thanks,
NeilBrown
Boris Brezillon July 23, 2018, 11:18 p.m. UTC | #6
+Neil

On Mon, 23 Jul 2018 15:06:43 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> Hi Boris,
> 
> On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon
> <boris.brezillon@bootlin.com> wrote:
> > On Mon, 23 Jul 2018 11:13:50 -0700
> > Brian Norris <computersforpeace@gmail.com> wrote:  
> >> I noticed this got merged, but I wanted to put my 2 cents in here:  
> >
> > I wish you had replied to this thread when it was posted (more than
> > 6 months ago). Reverting the patch now implies making some people
> > unhappy because they'll have to resort to their old out-of-tree
> > hacks :-(.  
> 
> I'd say I'm sorry for not following things closely these days, but I'm
> not really that sorry. There are plenty of other capable hands. And if
> y'all shoot yourselves in the foot, so be it. This patch isn't going
> to blow things up, but now that I did finally notice it (because it
> happened to show up in a list of backports I was looking at), I
> thought better late than never to remind you.
> 
> For way of notification: Marek already noticed that we've started down
> a slippery slope months ago:
> 
> https://lkml.org/lkml/2018/4/8/141
> Re: [PATCH] mtd: spi-nor: clear Extended Address Reg on switch to
> 3-byte addressing.
> 
> I'm not quite sure why that wasn't taken to its logical conclusion --
> that the hack should be reverted.
> 
> This problem has been noted many times already, and we've always
> stayed on the side of *avoiding* this hack. A few references from a
> search of my email:
> 
> http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
> [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
> 
> http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
> [RFC] MTD m25p80 3-byte addressing and boot problem
> 
> http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
> [PATCH 2/2] m25p80: if supported put chip to deep power down if not used

To my defense, I was not actively following SPI NOR topics at this
time, but, in the light of those discussions, I still keep thinking we
should try our best to improve support for broken HW.
So, ideally, we should find a way to support getting back to 3-byte
addressing mode when resetting, while generating enough noise to make
people well aware that their board design is broken (I think the
proposal of the new DT prop + a WARN_ON() is a good idea).

> 
> >> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:  
> >> > From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> >> >
> >> > Restore the status to be compatible with legacy devices.
> >> > Take Freescale eSPI boot for example, it copies (in 3 Byte
> >> > addressing mode) the RCW and bootloader images from SPI flash
> >> > without firing a reset signal previously, so the reboot command
> >> > will fail without reseting the addressing mode of SPI flash.
> >> > This patch implement .shutdown function to restore the status
> >> > in reboot process, and add the same operation to the .remove
> >> > function.  
> >>
> >> We have previously rejected this patch multiple times, because the above
> >> comment demonstrates a broken product.  
> >
> > If we were to only support working HW parts, I fear Linux would not
> > support a lot of HW (that's even more true when it comes to flashes :P).  
> 
> You stopped allowing UBI to attach to MLC NAND recently, no? That
> sounds like almost the same boat -- you've probably killed quite a few
> shitty products, if they were to use mainline directly.

Well, that's a bit different in that it's purely a SW issue, and we
also try to encourage people to take over the work we started with
Richard to fix that problem. But I get your point, we broke setups that
were working in an unreliable way, pretty much what you're suggesting to
do here.

> 
> Anyway, that's derailing the issue. Supporting broken hardware isn't
> something you try to do by applying the same hack to all systems. You
> normally try to apply your hack as narrowly as possible.

This, I agree with.

> You seem to
> imply that below. So maybe that's a solution to move forward with. But
> I'd personally be just as happy to see the patch reverted.
> 
> >> You cannot guarantee that all
> >> reboots will invoke the .shutdown() method -- what about crashes? What
> >> about watchdog resets? IIUC, those will hit the same broken behavior,
> >> and have unexepcted behavior in your bootloader.  
> >
> > Yes, there are corner cases that are not addressed with this approach,  
> 
> Is a system crash really a corner case? :D

Well, nothing forces you to reset the platform using a HW reset when
that happens :P.

> 
> > but it still seems to improve things. Of course, that means the
> > user should try to re-route all HW reset sources to SW ones (RESET input
> > pin muxed to the GPIO controller, watchdog generating an interrupt
> > instead of directly asserting the RESET output pin), which is not always
> > possible, but even when it's not, isn't it better to have a setup that
> > works fine 99% of the time instead of 50% of the time?  
> 
> Perhaps, but not at the expense of future development. And
> realistically, no one is doing that if they have this hack. Most
> people won't even know that this hack is protecting them at all (so
> again, they won't try to mitigate the problem any further).

Unless we add a huge backtrace at probe time which forces them to look
closer at what they did wrong (like you seem to suggest below).

> 
> >> I suppose one could argue for doing this in remove(), but AIUI you're
> >> just papering over system bugs by introducing the shutdown() function
> >> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
> >> other users of this driver.  
> >
> > I understand your point. But if the problem is about making sure people
> > designing new boards get that right, why not complaining at probe time
> > when things are wrong?
> >
> > I mean, spi_nor_restore() seems to only do something on very specific
> > NORs (those on which a SW RESET does not resets the addressing
> > mode).  
> 
> The point isn't that SW RESET doesn't reset the addressing mode -- it
> does on any flash I've seen. The point is that most systems are built
> around a stateless assumption in these flash. IIRC, there wasn't even
> a SW RESET command at all until these "huge" flash came around and
> stateful addressing modes came about. So boot ROMs and bootloaders
> would have to be updated to start figuring out when/how to do this SW
> RESET. And once two vendors start doing it differently (I'm not sure:
> have they done this already? I think so) it's no longer something a
> boot ROM will get right.
> 
> The only way to get this stuff right is to have a hardware reset, or
> else to avoid all of the stateful modes in software.

Okay.

> 
> > So, how about adding a flag that says "my board has the NOR HW
> > RESET pin wired" (there would be a DT props to set that flag). Then you
> > add a WARN_ON() when this flag is not set and a NOR chip impacted by
> > this bug is detected.  
> 
> I'd kinda prefer the reverse. There really isn't a need to document
> anything for a working system (software usually can't control this
> RESET pin). The burden should be on the b0rked system to document
> where it needs unsound hacks to survive.

That's true.

> 
> > This way you make sure people are informed that
> > they're doing something wrong, and for those who can't change their HW
> > (because it's already widely deployed), you have a fix that improve
> > things.  
> 
> Or even better: put this hack behind a DT flag, so that one has to
> admit that their board design is broken before it will even do
> anything. Proposal: "linux,badly-designed-flash-reset".

I think we can remove the "linux," prefix. If it's badly designed, it
applies to all OSes, don't you think?

Regards,

Boris
Boris Brezillon July 23, 2018, 11:22 p.m. UTC | #7
On Tue, 24 Jul 2018 08:46:33 +1000
NeilBrown <neilb@suse.com> wrote:

> On Mon, Jul 23 2018, Brian Norris wrote:
> 
> > Hi Boris,
> >
> > On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon
> > <boris.brezillon@bootlin.com> wrote:  
> >> On Mon, 23 Jul 2018 11:13:50 -0700
> >> Brian Norris <computersforpeace@gmail.com> wrote:  
> >>> I noticed this got merged, but I wanted to put my 2 cents in here:  
> >>
> >> I wish you had replied to this thread when it was posted (more than
> >> 6 months ago). Reverting the patch now implies making some people
> >> unhappy because they'll have to resort to their old out-of-tree
> >> hacks :-(.  
> >
> > I'd say I'm sorry for not following things closely these days, but I'm
> > not really that sorry. There are plenty of other capable hands. And if
> > y'all shoot yourselves in the foot, so be it. This patch isn't going
> > to blow things up, but now that I did finally notice it (because it
> > happened to show up in a list of backports I was looking at), I
> > thought better late than never to remind you.
> >
> > For way of notification: Marek already noticed that we've started down
> > a slippery slope months ago:
> >
> > https://lkml.org/lkml/2018/4/8/141
> > Re: [PATCH] mtd: spi-nor: clear Extended Address Reg on switch to
> > 3-byte addressing.
> >
> > I'm not quite sure why that wasn't taken to its logical conclusion --
> > that the hack should be reverted.
> >
> > This problem has been noted many times already, and we've always
> > stayed on the side of *avoiding* this hack. A few references from a
> > search of my email:
> >
> > http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
> > [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
> >
> > http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
> > [RFC] MTD m25p80 3-byte addressing and boot problem
> >
> > http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
> > [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
> >  
> >>> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:  
> >>> > From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> >>> >
> >>> > Restore the status to be compatible with legacy devices.
> >>> > Take Freescale eSPI boot for example, it copies (in 3 Byte
> >>> > addressing mode) the RCW and bootloader images from SPI flash
> >>> > without firing a reset signal previously, so the reboot command
> >>> > will fail without reseting the addressing mode of SPI flash.
> >>> > This patch implement .shutdown function to restore the status
> >>> > in reboot process, and add the same operation to the .remove
> >>> > function.  
> >>>
> >>> We have previously rejected this patch multiple times, because the above
> >>> comment demonstrates a broken product.  
> >>
> >> If we were to only support working HW parts, I fear Linux would not
> >> support a lot of HW (that's even more true when it comes to flashes :P).  
> >
> > You stopped allowing UBI to attach to MLC NAND recently, no? That
> > sounds like almost the same boat -- you've probably killed quite a few
> > shitty products, if they were to use mainline directly.
> >
> > Anyway, that's derailing the issue. Supporting broken hardware isn't
> > something you try to do by applying the same hack to all systems. You
> > normally try to apply your hack as narrowly as possible. You seem to
> > imply that below. So maybe that's a solution to move forward with. But
> > I'd personally be just as happy to see the patch reverted.
> >  
> >>> You cannot guarantee that all
> >>> reboots will invoke the .shutdown() method -- what about crashes? What
> >>> about watchdog resets? IIUC, those will hit the same broken behavior,
> >>> and have unexepcted behavior in your bootloader.  
> >>
> >> Yes, there are corner cases that are not addressed with this approach,  
> >
> > Is a system crash really a corner case? :D
> >  
> >> but it still seems to improve things. Of course, that means the
> >> user should try to re-route all HW reset sources to SW ones (RESET input
> >> pin muxed to the GPIO controller, watchdog generating an interrupt
> >> instead of directly asserting the RESET output pin), which is not always
> >> possible, but even when it's not, isn't it better to have a setup that
> >> works fine 99% of the time instead of 50% of the time?  
> >
> > Perhaps, but not at the expense of future development. And
> > realistically, no one is doing that if they have this hack. Most
> > people won't even know that this hack is protecting them at all (so
> > again, they won't try to mitigate the problem any further).
> >  
> >>> I suppose one could argue for doing this in remove(), but AIUI you're
> >>> just papering over system bugs by introducing the shutdown() function
> >>> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
> >>> other users of this driver.  
> >>
> >> I understand your point. But if the problem is about making sure people
> >> designing new boards get that right, why not complaining at probe time
> >> when things are wrong?
> >>
> >> I mean, spi_nor_restore() seems to only do something on very specific
> >> NORs (those on which a SW RESET does not resets the addressing
> >> mode).  
> >
> > The point isn't that SW RESET doesn't reset the addressing mode -- it
> > does on any flash I've seen. The point is that most systems are built
> > around a stateless assumption in these flash. IIRC, there wasn't even
> > a SW RESET command at all until these "huge" flash came around and
> > stateful addressing modes came about. So boot ROMs and bootloaders
> > would have to be updated to start figuring out when/how to do this SW
> > RESET. And once two vendors start doing it differently (I'm not sure:
> > have they done this already? I think so) it's no longer something a
> > boot ROM will get right.
> >
> > The only way to get this stuff right is to have a hardware reset, or
> > else to avoid all of the stateful modes in software.
> >  
> >> So, how about adding a flag that says "my board has the NOR HW
> >> RESET pin wired" (there would be a DT props to set that flag). Then you
> >> add a WARN_ON() when this flag is not set and a NOR chip impacted by
> >> this bug is detected.  
> >
> > I'd kinda prefer the reverse. There really isn't a need to document
> > anything for a working system (software usually can't control this
> > RESET pin). The burden should be on the b0rked system to document
> > where it needs unsound hacks to survive.
> >  
> >> This way you make sure people are informed that
> >> they're doing something wrong, and for those who can't change their HW
> >> (because it's already widely deployed), you have a fix that improve
> >> things.  
> >
> > Or even better: put this hack behind a DT flag, so that one has to
> > admit that their board design is broken before it will even do
> > anything. Proposal: "linux,badly-designed-flash-reset".
> >
> > But, I'd prefer just (partially?) reverting this, and let the authors
> > submit something that works. We're not obligated to keep bad hacks in
> > the kernel.
> >
> > Brian  
> 
> One possibility that occurred to me when I was exploring this issue is
> to revert to 3-byte mode whenever 4-byte was not actively in use.
> So any access beyond 16Meg is:
>  switch-to-4-byte ; perform IO ; switch to 3-byte
> or similar.  On my hardware it would be more efficient to
> use the 4-byte opcode to perform the IO, then reset the cached
> 4th address byte that the NOR chip transparently remembered.
> 
> This adds a little overhead, but should be fairly robust.
> It doesn't help if something goes terribly wrong while IO is happening,
> but I don't think any other software solution does either.
> 
> How would you see that approach?

I think the problem stands: people that have proper HW mitigation for
this problem (NOR chip is reset when the Processor is reset) don't want
to pay the overhead. So, even if we go for this approach, we probably
want to only do that for broken HW.
NeilBrown July 24, 2018, 1:51 a.m. UTC | #8
On Tue, Jul 24 2018, Boris Brezillon wrote:

> On Tue, 24 Jul 2018 08:46:33 +1000
> NeilBrown <neilb@suse.com> wrote:
>
>> On Mon, Jul 23 2018, Brian Norris wrote:
>> 
>> > Hi Boris,
>> >
>> > On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon
>> > <boris.brezillon@bootlin.com> wrote:  
>> >> On Mon, 23 Jul 2018 11:13:50 -0700
>> >> Brian Norris <computersforpeace@gmail.com> wrote:  
>> >>> I noticed this got merged, but I wanted to put my 2 cents in here:  
>> >>
>> >> I wish you had replied to this thread when it was posted (more than
>> >> 6 months ago). Reverting the patch now implies making some people
>> >> unhappy because they'll have to resort to their old out-of-tree
>> >> hacks :-(.  
>> >
>> > I'd say I'm sorry for not following things closely these days, but I'm
>> > not really that sorry. There are plenty of other capable hands. And if
>> > y'all shoot yourselves in the foot, so be it. This patch isn't going
>> > to blow things up, but now that I did finally notice it (because it
>> > happened to show up in a list of backports I was looking at), I
>> > thought better late than never to remind you.
>> >
>> > For way of notification: Marek already noticed that we've started down
>> > a slippery slope months ago:
>> >
>> > https://lkml.org/lkml/2018/4/8/141
>> > Re: [PATCH] mtd: spi-nor: clear Extended Address Reg on switch to
>> > 3-byte addressing.
>> >
>> > I'm not quite sure why that wasn't taken to its logical conclusion --
>> > that the hack should be reverted.
>> >
>> > This problem has been noted many times already, and we've always
>> > stayed on the side of *avoiding* this hack. A few references from a
>> > search of my email:
>> >
>> > http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
>> > [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
>> >
>> > http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
>> > [RFC] MTD m25p80 3-byte addressing and boot problem
>> >
>> > http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
>> > [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
>> >  
>> >>> On Wed, Dec 06, 2017 at 10:53:42AM +0800, Zhiqiang Hou wrote:  
>> >>> > From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
>> >>> >
>> >>> > Restore the status to be compatible with legacy devices.
>> >>> > Take Freescale eSPI boot for example, it copies (in 3 Byte
>> >>> > addressing mode) the RCW and bootloader images from SPI flash
>> >>> > without firing a reset signal previously, so the reboot command
>> >>> > will fail without reseting the addressing mode of SPI flash.
>> >>> > This patch implement .shutdown function to restore the status
>> >>> > in reboot process, and add the same operation to the .remove
>> >>> > function.  
>> >>>
>> >>> We have previously rejected this patch multiple times, because the above
>> >>> comment demonstrates a broken product.  
>> >>
>> >> If we were to only support working HW parts, I fear Linux would not
>> >> support a lot of HW (that's even more true when it comes to flashes :P).  
>> >
>> > You stopped allowing UBI to attach to MLC NAND recently, no? That
>> > sounds like almost the same boat -- you've probably killed quite a few
>> > shitty products, if they were to use mainline directly.
>> >
>> > Anyway, that's derailing the issue. Supporting broken hardware isn't
>> > something you try to do by applying the same hack to all systems. You
>> > normally try to apply your hack as narrowly as possible. You seem to
>> > imply that below. So maybe that's a solution to move forward with. But
>> > I'd personally be just as happy to see the patch reverted.
>> >  
>> >>> You cannot guarantee that all
>> >>> reboots will invoke the .shutdown() method -- what about crashes? What
>> >>> about watchdog resets? IIUC, those will hit the same broken behavior,
>> >>> and have unexepcted behavior in your bootloader.  
>> >>
>> >> Yes, there are corner cases that are not addressed with this approach,  
>> >
>> > Is a system crash really a corner case? :D
>> >  
>> >> but it still seems to improve things. Of course, that means the
>> >> user should try to re-route all HW reset sources to SW ones (RESET input
>> >> pin muxed to the GPIO controller, watchdog generating an interrupt
>> >> instead of directly asserting the RESET output pin), which is not always
>> >> possible, but even when it's not, isn't it better to have a setup that
>> >> works fine 99% of the time instead of 50% of the time?  
>> >
>> > Perhaps, but not at the expense of future development. And
>> > realistically, no one is doing that if they have this hack. Most
>> > people won't even know that this hack is protecting them at all (so
>> > again, they won't try to mitigate the problem any further).
>> >  
>> >>> I suppose one could argue for doing this in remove(), but AIUI you're
>> >>> just papering over system bugs by introducing the shutdown() function
>> >>> here. Thus, I'd prefer we drop the shutdown() method to avoid misleading
>> >>> other users of this driver.  
>> >>
>> >> I understand your point. But if the problem is about making sure people
>> >> designing new boards get that right, why not complaining at probe time
>> >> when things are wrong?
>> >>
>> >> I mean, spi_nor_restore() seems to only do something on very specific
>> >> NORs (those on which a SW RESET does not resets the addressing
>> >> mode).  
>> >
>> > The point isn't that SW RESET doesn't reset the addressing mode -- it
>> > does on any flash I've seen. The point is that most systems are built
>> > around a stateless assumption in these flash. IIRC, there wasn't even
>> > a SW RESET command at all until these "huge" flash came around and
>> > stateful addressing modes came about. So boot ROMs and bootloaders
>> > would have to be updated to start figuring out when/how to do this SW
>> > RESET. And once two vendors start doing it differently (I'm not sure:
>> > have they done this already? I think so) it's no longer something a
>> > boot ROM will get right.
>> >
>> > The only way to get this stuff right is to have a hardware reset, or
>> > else to avoid all of the stateful modes in software.
>> >  
>> >> So, how about adding a flag that says "my board has the NOR HW
>> >> RESET pin wired" (there would be a DT props to set that flag). Then you
>> >> add a WARN_ON() when this flag is not set and a NOR chip impacted by
>> >> this bug is detected.  
>> >
>> > I'd kinda prefer the reverse. There really isn't a need to document
>> > anything for a working system (software usually can't control this
>> > RESET pin). The burden should be on the b0rked system to document
>> > where it needs unsound hacks to survive.
>> >  
>> >> This way you make sure people are informed that
>> >> they're doing something wrong, and for those who can't change their HW
>> >> (because it's already widely deployed), you have a fix that improve
>> >> things.  
>> >
>> > Or even better: put this hack behind a DT flag, so that one has to
>> > admit that their board design is broken before it will even do
>> > anything. Proposal: "linux,badly-designed-flash-reset".
>> >
>> > But, I'd prefer just (partially?) reverting this, and let the authors
>> > submit something that works. We're not obligated to keep bad hacks in
>> > the kernel.
>> >
>> > Brian  
>> 
>> One possibility that occurred to me when I was exploring this issue is
>> to revert to 3-byte mode whenever 4-byte was not actively in use.
>> So any access beyond 16Meg is:
>>  switch-to-4-byte ; perform IO ; switch to 3-byte
>> or similar.  On my hardware it would be more efficient to
>> use the 4-byte opcode to perform the IO, then reset the cached
>> 4th address byte that the NOR chip transparently remembered.
>> 
>> This adds a little overhead, but should be fairly robust.
>> It doesn't help if something goes terribly wrong while IO is happening,
>> but I don't think any other software solution does either.
>> 
>> How would you see that approach?
>
> I think the problem stands: people that have proper HW mitigation for
> this problem (NOR chip is reset when the Processor is reset) don't want
> to pay the overhead. So, even if we go for this approach, we probably
> want to only do that for broken HW.

I agree that a "my-hardware-is-suboptimal" flag is appropriate.
I was addressing the suggestion that the current approach doesn't handle
all corner cases and was suggesting a different approach that might
handle more corner-cases.  I should have been more explicit about that.

Thanks,
NeilBrown
Brian Norris July 24, 2018, 7:41 p.m. UTC | #9
Hi,

On Tue, Jul 24, 2018 at 11:51:49AM +1000, NeilBrown wrote:
> On Tue, Jul 24 2018, Boris Brezillon wrote:
> > On Tue, 24 Jul 2018 08:46:33 +1000
> > NeilBrown <neilb@suse.com> wrote:
> >> One possibility that occurred to me when I was exploring this issue is
> >> to revert to 3-byte mode whenever 4-byte was not actively in use.
> >> So any access beyond 16Meg is:
> >>  switch-to-4-byte ; perform IO ; switch to 3-byte
> >> or similar.  On my hardware it would be more efficient to
> >> use the 4-byte opcode to perform the IO, then reset the cached
> >> 4th address byte that the NOR chip transparently remembered.

Do these chips cache the last 4th-byte used? I don't recall reading
that, but that would be interesting. It also sounds like that would make
things even more difficult to do robustly.

> >> This adds a little overhead, but should be fairly robust.
> >> It doesn't help if something goes terribly wrong while IO is happening,
> >> but I don't think any other software solution does either.
> >> 
> >> How would you see that approach?
> >
> > I think the problem stands: people that have proper HW mitigation for
> > this problem (NOR chip is reset when the Processor is reset) don't want
> > to pay the overhead. So, even if we go for this approach, we probably
> > want to only do that for broken HW.

If it actually saves bytes on the wire to stay in 3-byte mode more
often, then that could be helpful to all systems. But otherwise, yes, it
doesn't really belong on a properly-designed system.

> I agree that a "my-hardware-is-suboptimal" flag is appropriate.
> I was addressing the suggestion that the current approach doesn't handle
> all corner cases and was suggesting a different approach that might
> handle more corner-cases.  I should have been more explicit about that.

If you want to talk about optimizing the broken-hardware hack, then
fine. That's not my interest at all.

Brian
Brian Norris July 24, 2018, 7:52 p.m. UTC | #10
Hi Boris,

On Tue, Jul 24, 2018 at 01:18:11AM +0200, Boris Brezillon wrote:
> On Mon, 23 Jul 2018 15:06:43 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
> > On Mon, Jul 23, 2018 at 1:10 PM, Boris Brezillon
> > <boris.brezillon@bootlin.com> wrote:
> > > but it still seems to improve things. Of course, that means the
> > > user should try to re-route all HW reset sources to SW ones (RESET input
> > > pin muxed to the GPIO controller, watchdog generating an interrupt
> > > instead of directly asserting the RESET output pin), which is not always
> > > possible, but even when it's not, isn't it better to have a setup that
> > > works fine 99% of the time instead of 50% of the time?  
> > 
> > Perhaps, but not at the expense of future development. And
> > realistically, no one is doing that if they have this hack. Most
> > people won't even know that this hack is protecting them at all (so
> > again, they won't try to mitigate the problem any further).
> 
> Unless we add a huge backtrace at probe time which forces them to look
> closer at what they did wrong (like you seem to suggest below).

Sure, if we can force a huge backtrace, then maybe that's an acceptable
situation. I just predict that someday, somebody will argue that the
warning should be downgraded, since there's nothing the average user can
do about it. (Well, except not buying from that manufacturer in the
future.)

> > Or even better: put this hack behind a DT flag, so that one has to
> > admit that their board design is broken before it will even do
> > anything. Proposal: "linux,badly-designed-flash-reset".
> 
> I think we can remove the "linux," prefix. If it's badly designed, it
> applies to all OSes, don't you think?

Sure.

Shall I send a patch?

Brian
Boris Brezillon July 24, 2018, 7:59 p.m. UTC | #11
On Tue, 24 Jul 2018 12:52:02 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> > > Or even better: put this hack behind a DT flag, so that one has to
> > > admit that their board design is broken before it will even do
> > > anything. Proposal: "linux,badly-designed-flash-reset".  
> > 
> > I think we can remove the "linux," prefix. If it's badly designed, it
> > applies to all OSes, don't you think?  
> 
> Sure.
> 
> Shall I send a patch?

Yep.
NeilBrown July 24, 2018, 9:48 p.m. UTC | #12
On Tue, Jul 24 2018, Brian Norris wrote:

> Hi,
>
> On Tue, Jul 24, 2018 at 11:51:49AM +1000, NeilBrown wrote:
>> On Tue, Jul 24 2018, Boris Brezillon wrote:
>> > On Tue, 24 Jul 2018 08:46:33 +1000
>> > NeilBrown <neilb@suse.com> wrote:
>> >> One possibility that occurred to me when I was exploring this issue is
>> >> to revert to 3-byte mode whenever 4-byte was not actively in use.
>> >> So any access beyond 16Meg is:
>> >>  switch-to-4-byte ; perform IO ; switch to 3-byte
>> >> or similar.  On my hardware it would be more efficient to
>> >> use the 4-byte opcode to perform the IO, then reset the cached
>> >> 4th address byte that the NOR chip transparently remembered.
>
> Do these chips cache the last 4th-byte used? I don't recall reading
> that, but that would be interesting. It also sounds like that would make
> things even more difficult to do robustly.

That is how the Winbond W25Q256FV appears to behave in my experiments.
Any time you use a 4-byte address, the high byte is saved.  Any time you
use a 3-byte address, the saved high-byte is used.
The data sheet seems to support this understanding, as detailed in

Commit: f134fbbb4ff8 ("mtd: spi-nor: clear Winbond Extended Address Reg on switch to 3-byte addressing.")


>
>> >> This adds a little overhead, but should be fairly robust.
>> >> It doesn't help if something goes terribly wrong while IO is happening,
>> >> but I don't think any other software solution does either.
>> >> 
>> >> How would you see that approach?
>> >
>> > I think the problem stands: people that have proper HW mitigation for
>> > this problem (NOR chip is reset when the Processor is reset) don't want
>> > to pay the overhead. So, even if we go for this approach, we probably
>> > want to only do that for broken HW.
>
> If it actually saves bytes on the wire to stay in 3-byte mode more
> often, then that could be helpful to all systems. But otherwise, yes, it
> doesn't really belong on a properly-designed system.

I'm not sure exactly what you encompass by the term "properly-designed",
but with the SOC that I have (mt7621) and the flash chip (Winbond
W25Q256FV) it is not possible to wire them up in any way that will work
reliably without software support for leaving 3-byte mode.

The SOC does not have an out-going reset line that signals a general
system reset - it only has one that signals watchdog reset.
The flash chip has an incoming reset line, but it shares a pin with
"hold", and that is the default use.  So we would need to program the
flash to listen for reset, and it would only catch a watchdog reset.
For any other reset, the CPU is (should be) in complete control (even
after a panic!) and it needs to reprogram the flash chip before
resetting the system.
But maybe you think either the SOC and/or the flash chip is not
"properly-designed" - and you may be correct..

Thanks,
NeilBrown


>
>> I agree that a "my-hardware-is-suboptimal" flag is appropriate.
>> I was addressing the suggestion that the current approach doesn't handle
>> all corner cases and was suggesting a different approach that might
>> handle more corner-cases.  I should have been more explicit about that.
>
> If you want to talk about optimizing the broken-hardware hack, then
> fine. That's not my interest at all.
>
> Brian
Brian Norris July 25, 2018, 11:24 p.m. UTC | #13
Hi,

On Wed, Jul 25, 2018 at 07:48:21AM +1000, NeilBrown wrote:
> On Tue, Jul 24 2018, Brian Norris wrote:
> > On Tue, Jul 24, 2018 at 11:51:49AM +1000, NeilBrown wrote:
> >> On Tue, Jul 24 2018, Boris Brezillon wrote:
> >> > On Tue, 24 Jul 2018 08:46:33 +1000
> >> > NeilBrown <neilb@suse.com> wrote:
> >> >> One possibility that occurred to me when I was exploring this issue is
> >> >> to revert to 3-byte mode whenever 4-byte was not actively in use.
> >> >> So any access beyond 16Meg is:
> >> >>  switch-to-4-byte ; perform IO ; switch to 3-byte
> >> >> or similar.  On my hardware it would be more efficient to
> >> >> use the 4-byte opcode to perform the IO, then reset the cached
> >> >> 4th address byte that the NOR chip transparently remembered.
> >
> > Do these chips cache the last 4th-byte used? I don't recall reading
> > that, but that would be interesting. It also sounds like that would make
> > things even more difficult to do robustly.
> 
> That is how the Winbond W25Q256FV appears to behave in my experiments.
> Any time you use a 4-byte address, the high byte is saved.  Any time you
> use a 3-byte address, the saved high-byte is used.
> The data sheet seems to support this understanding, as detailed in
> 
> Commit: f134fbbb4ff8 ("mtd: spi-nor: clear Winbond Extended Address Reg on switch to 3-byte addressing.")

Thanks for the notes. I never really paid attention to that section of
the datasheet, since I always relied on a hardware reset.

By the way, doesn't this part support the dedicated (non-stateful)
4-byte address commands? Why not use those and avoid the problem
entirely? Although, it's not actually clear to me from the datasheet
whether, e.g., Fast Read with 4-byte Address (0Ch) will set the Extended
Address Register.

> >> >> This adds a little overhead, but should be fairly robust.
> >> >> It doesn't help if something goes terribly wrong while IO is happening,
> >> >> but I don't think any other software solution does either.
> >> >> 
> >> >> How would you see that approach?
> >> >
> >> > I think the problem stands: people that have proper HW mitigation for
> >> > this problem (NOR chip is reset when the Processor is reset) don't want
> >> > to pay the overhead. So, even if we go for this approach, we probably
> >> > want to only do that for broken HW.
> >
> > If it actually saves bytes on the wire to stay in 3-byte mode more
> > often, then that could be helpful to all systems. But otherwise, yes, it
> > doesn't really belong on a properly-designed system.
> 
> I'm not sure exactly what you encompass by the term "properly-designed",

I consider all systems with >16MiB SPI flash that have stateful
addressing modes and don't have a useful HW RESET signal to be
improperly designed. These SPI flash never had a standardized (or
easily-determined) software reset, so it's nearly impossible to write
robust software for them that is compatible with a relatively generic
boot ROM and/or bootloader.

> but with the SOC that I have (mt7621) and the flash chip (Winbond
> W25Q256FV) it is not possible to wire them up in any way that will work
> reliably without software support for leaving 3-byte mode.

> The SOC does not have an out-going reset line that signals a general
> system reset - it only has one that signals watchdog reset.
> The flash chip has an incoming reset line, but it shares a pin with
> "hold", and that is the default use.  So we would need to program the

(BTW, you could probably choose to reprogram the HOLD# line to RESET#
before switching to 4-byte mode. That still doesn't resolve the SoC
reset line issue on its own.)

> flash to listen for reset, and it would only catch a watchdog reset.
> For any other reset, the CPU is (should be) in complete control (even
> after a panic!) and it needs to reprogram the flash chip before
> resetting the system.

"Even after a panic" is a bit dubious. That presumes that the cause of
the panic didn't also corrupt your exception code, driver code, driver
data structures, etc., that would be needed to reset the SPI chip
without double-faulting. And what about a panic in your SPI flash
driver?

Overall, my point is that the existence of such non-hardware-resettable
flash guarantees the possibility of a hard enough crash that you cannot
possibly reset it to 3-byte addressing before system reset.

> But maybe you think either the SOC and/or the flash chip is not
> "properly-designed" - and you may be correct..

Yes, I believe I do. If your watchdog reset pin was routed to the flash
HOLD/RESET pin, then I suppose it would be *possible* to have robust
software. But otherwise, I believe it's impossible.

Brian
diff mbox series

Patch

diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
index dbe6a1de2bb8..a4e18f6aaa33 100644
--- a/drivers/mtd/devices/m25p80.c
+++ b/drivers/mtd/devices/m25p80.c
@@ -307,10 +307,18 @@  static int m25p_remove(struct spi_device *spi)
 {
 	struct m25p	*flash = spi_get_drvdata(spi);
 
+	spi_nor_restore(&flash->spi_nor);
+
 	/* Clean up MTD stuff. */
 	return mtd_device_unregister(&flash->spi_nor.mtd);
 }
 
+static void m25p_shutdown(struct spi_device *spi)
+{
+	struct m25p *flash = spi_get_drvdata(spi);
+
+	spi_nor_restore(&flash->spi_nor);
+}
 /*
  * Do NOT add to this array without reading the following:
  *
@@ -386,6 +394,7 @@  static struct spi_driver m25p80_driver = {
 	.id_table	= m25p_ids,
 	.probe	= m25p_probe,
 	.remove	= m25p_remove,
+	.shutdown	= m25p_shutdown,
 
 	/* REVISIT: many of these chips have deep power-down modes, which
 	 * should clearly be entered on suspend() to minimize power use.