mtd: rawnand: gpmi: Fix setting busy timeout setting

Message ID 20220614083138.3455683-1-s.hauer@pengutronix.de
State Accepted
Series mtd: rawnand: gpmi: Fix setting busy timeout setting

Commit Message

Sascha Hauer June 14, 2022, 8:31 a.m. UTC
The DEVICE_BUSY_TIMEOUT value is described in the Reference Manual as:

| Timeout waiting for NAND Ready/Busy or ATA IRQ. Used in WAIT_FOR_READY
| mode. This value is the number of GPMI_CLK cycles multiplied by 4096.

So instead of multiplying the value in cycles by 4096, we have to
divide it by that value. Use DIV_ROUND_UP to make sure we are on the
safe side, especially when the calculated value in cycles is smaller
than 4096, as is typically the case.
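
A minimal sketch of the corrected encoding follows. It is plain C outside the kernel tree: `DIV_ROUND_UP` is redefined locally with the usual round-up formula, and the helper names are hypothetical. Rounding up guarantees that the programmed field times 4096 is never shorter than the requested cycle count. Plain truncating division would encode any value below 4096 as 0.

```c
#include <stdint.h>

/* Same round-up division as the kernel's DIV_ROUND_UP(), redefined so this
 * sketch builds outside the kernel tree. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Per the Reference Manual text: one DEVICE_BUSY_TIMEOUT step is 4096 GPMI_CLK cycles. */
#define BUSY_TIMEOUT_UNIT 4096u

/* Corrected encoding (hypothetical helper name): round up so that
 * field * 4096 >= busy_timeout_cycles, i.e. never a shorter timeout. */
static uint32_t busy_timeout_field(uint32_t busy_timeout_cycles)
{
	return DIV_ROUND_UP(busy_timeout_cycles, BUSY_TIMEOUT_UNIT);
}

/* Truncating division, for contrast only: every count below 4096
 * encodes to 0. */
static uint32_t busy_timeout_field_truncating(uint32_t busy_timeout_cycles)
{
	return busy_timeout_cycles / BUSY_TIMEOUT_UNIT;
}
```

For the 2384-cycle case described below, the rounded-up field is 1, which gives a wait of 4096 cycles. The truncating variant would program 0.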

This bug likely never triggered because any timeout != 0 usually will
do. In my case the busy timeout in cycles was originally calculated as
2408, which multiplied by 4096 is 0x968000. The lower 16 bits were
taken for the 16-bit wide register field, so the register value was
0x8000. With 2970bf5a32f0 ("mtd: rawnand: gpmi: fix controller timings
setting"), however, the value in cycles became 2384, which multiplied
by 4096 is 0x950000. The lower 16 bits are now 0x0, resulting in an
immediate timeout when reading from NAND.
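
The arithmetic in the paragraph above can be reproduced with a short sketch. It assumes, as the text states, that only the low 16 bits of the product reach the register field. The mask constant and function name are illustrative and are not the driver's own macros.

```c
#include <stdint.h>

/* DEVICE_BUSY_TIMEOUT is a 16-bit field; only the low 16 bits of the
 * value written survive (per the description above). */
#define BUSY_TIMEOUT_FIELD_MASK 0xFFFFu

/* Hypothetical helper reproducing the old, buggy encoding: the cycle count
 * is multiplied by 4096 instead of divided, then truncated to 16 bits.
 * Multiplying by 4096 is a left shift by 12, so only bits 0-3 of the
 * cycle count end up in the field. */
static uint32_t old_busy_timeout_field(uint32_t busy_timeout_cycles)
{
	uint32_t product = busy_timeout_cycles * 4096u;

	return product & BUSY_TIMEOUT_FIELD_MASK;
}
```

Because of that shift, the old code programmed 0 for every cycle count that is a multiple of 16, such as 2384 = 149 * 16. For 2408 it programmed 0x8000, since 2408 & 0xF is 8.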

Fixes: b1206122069aa ("mtd: rawnand: gpmi: use core timings instead of an empirical derivation")
Cc: stable@vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
---

Just a resend with +Cc: stable@vger.kernel.org

 drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Miquel Raynal June 16, 2022, 2:46 p.m. UTC | #1
On Tue, 2022-06-14 at 08:31:38 UTC, Sascha Hauer wrote:

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git mtd/fixes, thanks.

Miquel

Sascha Hauer July 1, 2022, 9:19 a.m. UTC | #2
On Tue, Jun 14, 2022 at 10:31:38AM +0200, Sascha Hauer wrote:
> 
> diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
> index 0b68d05846e18..889e403299568 100644
> --- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
> +++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
> @@ -890,7 +890,7 @@ static int gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
>  	hw->timing0 = BF_GPMI_TIMING0_ADDRESS_SETUP(addr_setup_cycles) |
>  		      BF_GPMI_TIMING0_DATA_HOLD(data_hold_cycles) |
>  		      BF_GPMI_TIMING0_DATA_SETUP(data_setup_cycles);
> -	hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);
> +	hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096));

This patch is broken. The change itself is fine, but busy_timeout_cycles
is calculated wrong.

busy_timeout_cycles is calculated based on sdr->tR_max, which is the page
read time. This timeout is also used for the page program and block
erase operations, whose timeouts are orders of magnitude longer. This
means that with this patch the controller times out on page programs and
block erase operations. With the current code this timeout will be
silent, as the timeout interrupt is not active.
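
The sketch below illustrates why a timeout sized from tR_max is too short for program and erase operations. It converts a time to GPMI_CLK cycles, encodes it with the corrected DIV_ROUND_UP scheme, and checks whether the resulting wait covers a longer operation. The clock period and all three timing values are hypothetical and were chosen only to show orders of magnitude. The saturation at 0xFFFF is also an assumption. Sizing the timeout from the erase time is included only for comparison and is not necessarily what the follow-up fix does.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BUSY_TIMEOUT_UNIT 4096u   /* cycles per field step, from the manual text */
#define BUSY_TIMEOUT_MAX  0xFFFFu /* 16-bit register field */

/* Hypothetical example values, not from any datasheet or from the driver. */
#define EXAMPLE_PERIOD_PS    10000ull       /* assumed 100 MHz GPMI_CLK */
#define EXAMPLE_tR_MAX_PS    25000000ull    /* assumed 25 us page read */
#define EXAMPLE_tPROG_MAX_PS 700000000ull   /* assumed 700 us page program */
#define EXAMPLE_tBERS_MAX_PS 10000000000ull /* assumed 10 ms block erase */

/* Convert a duration in picoseconds to GPMI_CLK cycles, rounding up. */
static uint64_t ps_to_cycles(uint64_t ps, uint64_t period_ps)
{
	return DIV_ROUND_UP(ps, period_ps);
}

/* Encode cycles into the busy-timeout field, saturating at the field width
 * (saturation is an assumption of this sketch, not taken from the patch). */
static uint32_t encode_busy_timeout(uint64_t cycles)
{
	uint64_t field = DIV_ROUND_UP(cycles, (uint64_t)BUSY_TIMEOUT_UNIT);

	return field > BUSY_TIMEOUT_MAX ? BUSY_TIMEOUT_MAX : (uint32_t)field;
}

/* How long the controller actually waits for a given field value, in ps. */
static uint64_t field_to_ps(uint32_t field, uint64_t period_ps)
{
	return (uint64_t)field * BUSY_TIMEOUT_UNIT * period_ps;
}

/* Does a timeout sized from basis_ps cover an operation lasting op_ps? */
static bool timeout_covers(uint64_t basis_ps, uint64_t op_ps, uint64_t period_ps)
{
	uint32_t field = encode_busy_timeout(ps_to_cycles(basis_ps, period_ps));

	return field_to_ps(field, period_ps) >= op_ps;
}
```

With these assumed numbers, the tR_max-based field is 1. That is a wait of 4096 cycles, about 41 us at the assumed 100 MHz, which is far short of the assumed 700 us program time and 10 ms erase time.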

** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **

Fortunately this patch hasn't been included in any mainline release, but
unfortunately it showed up in several stable kernels. Don't use
v5.4.202, v5.10.127, v5.15.51 or v5.18.8 on i.MX[678] or i.MX28 boards
with NAND.

I am sorry for the trouble I have likely caused. I am working on a fix
and will post it later today.

Sascha

Miquel Raynal July 1, 2022, 9:47 a.m. UTC | #3
Hi Sascha,

+ Richard

s.hauer@pengutronix.de wrote on Fri, 1 Jul 2022 11:19:09 +0200:

> On Tue, Jun 14, 2022 at 10:31:38AM +0200, Sascha Hauer wrote:
> 
> This patch is broken. The change itself is fine, but busy_timeout_cycles
> is calculated wrong.
> 
> busy_timeout_cycles is calculated based on sdr->tR_max, which is the page
> read time. This timeout is also used for the page program and block
> erase operations, whose timeouts are orders of magnitude longer. This
> means that with this patch the controller times out on page programs and
> block erase operations. With the current code this timeout will be
> silent, as the timeout interrupt is not active.
> 
> ** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **
> 
> Fortunately this patch hasn't been included in any mainline release, but
> unfortunately it showed up in several stable kernels. Don't use
> v5.4.202, v5.10.127, v5.15.51 or v5.18.8 on i.MX[678] or i.MX28 boards
> with NAND.
> 
> I am sorry for the trouble I have likely caused. I am working on a fix
> and will post it later today.

Oh crap. Just for the record, I'm leaving for several weeks today, so
please don't forget to keep Richard in copy; he will apply the fix of
the fix when ready.

Richard, looks like a pretty busy cycle, sorry for all this trouble :-/

Thanks,
Miquèl

Guenter Roeck July 15, 2022, 2:22 p.m. UTC | #4
On Tue, Jun 14, 2022 at 10:31:38AM +0200, Sascha Hauer wrote:

I see this patch was reverted in a set of rushed stable releases,
but I still see it in the mainline kernel. Is it going to be reverted
there as well?

Thanks,
Guenter

gregkh@linuxfoundation.org July 15, 2022, 3:04 p.m. UTC | #5
On Fri, Jul 15, 2022 at 07:22:09AM -0700, Guenter Roeck wrote:
> On Tue, Jun 14, 2022 at 10:31:38AM +0200, Sascha Hauer wrote:
> 
> I see this patch was reverted in a set of rushed stable releases,
> but I still see it in the mainline kernel. Is it going to be reverted
> there as well?

A fix has been sent, and it should hopefully be picked up next week:
	https://lore.kernel.org/all/20220701110341.3094023-1-s.hauer@pengutronix.de/

thanks,

greg k-h

Patch

diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
index 0b68d05846e18..889e403299568 100644
--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
@@ -890,7 +890,7 @@  static int gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
 	hw->timing0 = BF_GPMI_TIMING0_ADDRESS_SETUP(addr_setup_cycles) |
 		      BF_GPMI_TIMING0_DATA_HOLD(data_hold_cycles) |
 		      BF_GPMI_TIMING0_DATA_SETUP(data_setup_cycles);
-	hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);
+	hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096));
 
 	/*
 	 * Derive NFC ideal delay from {3}: