diff mbox series

mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times

Message ID 20220701110341.3094023-1-s.hauer@pengutronix.de
State Changes Requested
Headers show
Series mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times | expand

Commit Message

Sascha Hauer July 1, 2022, 11:03 a.m. UTC
06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
though: It is calculated based on the maximum page read time, but the
timeout is also used for page write and block erase operations which
require orders of magnitude bigger timeouts.

Fix this by calculating busy_timeout_cycles from the maximum of
tBERS_max and tPROG_max.

This is for now the easiest and most obvious way to fix the driver.
There's room for improvements though: The NAND_OP_WAITRDY_INSTR tells us
the desired timeout for the current operation, so we could program the
timeout dynamically for each operation instead of setting a fixed
timeout. Also we could wire up the interrupt handler to actually detect
and forward timeouts occurred when waiting for the chip being ready.

As a sidenote I verified that the change in 06781a5026350 is really
correct. I wired up the interrupt handler in my tree and measured the
time between starting the operation and the timeout interrupt handler
coming in. The time increases 41us with each step in the timeout
register which corresponds to 4096 clock cycles with the 99MHz clock
that I have.

Fixes: 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
Fixes: b1206122069aa ("mtd: rawniand: gpmi: use core timings instead of an empirical derivation")
Cc: stable@vger.kernel.org
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
---
 drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Han Xu July 1, 2022, 12:53 p.m. UTC | #1
>-----Original Message-----
>From: Sascha Hauer <s.hauer@pengutronix.de>
>Sent: Friday, July 1, 2022 6:04 AM
>To: linux-mtd@lists.infradead.org
>Cc: Han Xu <han.xu@nxp.com>; Miquel Raynal <miquel.raynal@bootlin.com>;
>kernel@pengutronix.de; Richard Weinberger <richard@nod.at>; Sascha Hauer
><s.hauer@pengutronix.de>; stable@vger.kernel.org
>Subject: [PATCH] mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on
>program/erase times
>
>06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register value
>from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
>though: It is calculated based on the maximum page read time, but the timeout is
>also used for page write and block erase operations which require orders of
>magnitude bigger timeouts.
>
>Fix this by calculating busy_timeout_cycles from the maximum of tBERS_max and
>tPROG_max.
>
>This is for now the easiest and most obvious way to fix the driver.
>There's room for improvements though: The NAND_OP_WAITRDY_INSTR tells us
>the desired timeout for the current operation, so we could program the timeout
>dynamically for each operation instead of setting a fixed timeout. Also we could
>wire up the interrupt handler to actually detect and forward timeouts occurred
>when waiting for the chip being ready.
>
>As a sidenote I verified that the change in 06781a5026350 is really correct. I wired
>up the interrupt handler in my tree and measured the time between starting the
>operation and the timeout interrupt handler coming in. The time increases 41us
>with each step in the timeout register which corresponds to 4096 clock cycles with
>the 99MHz clock that I have.
>
>Fixes: 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
>Fixes: b1206122069aa ("mtd: rawniand: gpmi: use core timings instead of an
>empirical derivation")
>Cc: stable@vger.kernel.org
>Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

Acked-by: Han Xu <han.xu@nxp.com>

>---
> drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
>b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
>index 889e403299568..93da23682d862 100644
>--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
>+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
>@@ -850,9 +850,10 @@ static int gpmi_nfc_compute_timings(struct
>gpmi_nand_data *this,
> 	unsigned int tRP_ps;
> 	bool use_half_period;
> 	int sample_delay_ps, sample_delay_factor;
>-	u16 busy_timeout_cycles;
>+	unsigned int busy_timeout_cycles;
> 	u8 wrn_dly_sel;
> 	unsigned long clk_rate, min_rate;
>+	u64 busy_timeout_ps;
>
> 	if (sdr->tRC_min >= 30000) {
> 		/* ONFI non-EDO modes [0-3] */
>@@ -885,7 +886,8 @@ static int gpmi_nfc_compute_timings(struct
>gpmi_nand_data *this,
> 	addr_setup_cycles = TO_CYCLES(sdr->tALS_min, period_ps);
> 	data_setup_cycles = TO_CYCLES(sdr->tDS_min, period_ps);
> 	data_hold_cycles = TO_CYCLES(sdr->tDH_min, period_ps);
>-	busy_timeout_cycles = TO_CYCLES(sdr->tWB_max + sdr->tR_max,
>period_ps);
>+	busy_timeout_ps = max(sdr->tBERS_max, sdr->tPROG_max);
>+	busy_timeout_cycles = TO_CYCLES(busy_timeout_ps, period_ps);
>
> 	hw->timing0 = BF_GPMI_TIMING0_ADDRESS_SETUP(addr_setup_cycles) |
> 		      BF_GPMI_TIMING0_DATA_HOLD(data_hold_cycles) |
>--
>2.30.2
Tomasz Moń July 11, 2022, 9:12 a.m. UTC | #2
On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
> 06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
> value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
> though: It is calculated based on the maximum page read time, but the
> timeout is also used for page write and block erase operations which
> require orders of magnitude bigger timeouts.
> 
> Fix this by calculating busy_timeout_cycles from the maximum of
> tBERS_max and tPROG_max.

06781a5026350 was merged in v5.19-rc4 and then was picked up by several
stable kernels, including v5.15.51. After we have upgraded to v5.15.51
we have observed the issue that Sascha mentioned in his email [1].

As the v5.19-rc6 was released yesterday and this fix is still not
applied, the v5.19-rc6 (and all stable kernels that picked up the
backport) causes NAND flash data loss on imx targets.

I have backported this patch to our internal v5.15.51 based kernel on
4th July 2022 and I can confirm that it does indeed solve the NAND data
loss on imx targets.

Is it possible for this patch to make it to the v5.19-rc7?

> Fixes: 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
> Fixes: b1206122069aa ("mtd: rawniand: gpmi: use core timings instead of an empirical derivation")
> Cc: stable@vger.kernel.org
> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>

Tested-by: Tomasz Moń <tomasz.mon@camlingroup.com>

[1] https://lore.kernel.org/linux-mtd/20220701091909.GE2387@pengutronix.de/
Tomasz Moń July 15, 2022, 5:27 a.m. UTC | #3
On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
> On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
> > 06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
> > value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
> > though: It is calculated based on the maximum page read time, but the
> > timeout is also used for page write and block erase operations which
> > require orders of magnitude bigger timeouts.
> > 
> > Fix this by calculating busy_timeout_cycles from the maximum of
> > tBERS_max and tPROG_max.
> 
> 06781a5026350 was merged in v5.19-rc4 and then was picked up by several
> stable kernels, including v5.15.51. After we have upgraded to v5.15.51
> we have observed the issue that Sascha mentioned in his email [1].
> 
> As the v5.19-rc6 was released yesterday and this fix is still not
> applied, the v5.19-rc6 (and all stable kernels that picked up the
> backport) causes NAND flash data loss on imx targets.
> 
> I have backported this patch to our internal v5.15.51 based kernel on
> 4th July 2022 and I can confirm that it does indeed solve the NAND data
> loss on imx targets.
> 
> Is it possible for this patch to make it to the v5.19-rc7?

No response, so sending the email to more people so the voice is heard.
Sorry if this is not the proper way, but I think the issue is serious.

Current prepatch kernels starting with v5.19-rc4 and stable kernels
starting with v5.4.202. v5.10.127, v5.15.51, v5.18.8 contain a
  "[PATCH] [REALLY REALLY BROKEN] mtd: rawnand: gpmi: Fix setting busy timeout setting"
that is wreaking havoc to i.MX[678] or i.MX28 devices with NAND
  "** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **" [1]

The solution is to either:
  * Revert 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout
setting") and all its cherry-picks to stable branches, *OR*
  * Apply the fix ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout
based on program/erase times") [2]

Please do whatever you see fit.

Best Regards,
Tomasz Moń

[1] https://lore.kernel.org/linux-mtd/20220701091909.GE2387@pengutronix.de/
[2] https://lore.kernel.org/linux-mtd/20220701110341.3094023-1-s.hauer@pengutronix.de/
Greg KH July 15, 2022, 5:49 a.m. UTC | #4
On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
> On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
> > On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
> > > 06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
> > > value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
> > > though: It is calculated based on the maximum page read time, but the
> > > timeout is also used for page write and block erase operations which
> > > require orders of magnitude bigger timeouts.
> > > 
> > > Fix this by calculating busy_timeout_cycles from the maximum of
> > > tBERS_max and tPROG_max.
> > 
> > 06781a5026350 was merged in v5.19-rc4 and then was picked up by several
> > stable kernels, including v5.15.51. After we have upgraded to v5.15.51
> > we have observed the issue that Sascha mentioned in his email [1].
> > 
> > As the v5.19-rc6 was released yesterday and this fix is still not
> > applied, the v5.19-rc6 (and all stable kernels that picked up the
> > backport) causes NAND flash data loss on imx targets.
> > 
> > I have backported this patch to our internal v5.15.51 based kernel on
> > 4th July 2022 and I can confirm that it does indeed solve the NAND data
> > loss on imx targets.
> > 
> > Is it possible for this patch to make it to the v5.19-rc7?
> 
> No response, so sending the email to more people so the voice is heard.
> Sorry if this is not the proper way, but I think the issue is serious.
> 
> Current prepatch kernels starting with v5.19-rc4 and stable kernels
> starting with v5.4.202. v5.10.127, v5.15.51, v5.18.8 contain a
>   "[PATCH] [REALLY REALLY BROKEN] mtd: rawnand: gpmi: Fix setting busy timeout setting"
> that is wreaking havoc to i.MX[678] or i.MX28 devices with NAND
>   "** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **" [1]
> 
> The solution is to either:
>   * Revert 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout
> setting") and all its cherry-picks to stable branches, *OR*
>   * Apply the fix ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout
> based on program/erase times") [2]
> 
> Please do whatever you see fit.

I can do do a stable release with this reverted, but I really expected
to see the fix in linux-next by now at the very least.  Does this driver
not have an active maintainer and subsystem maintainer for some reason?

thanks,

greg k-h
Sascha Hauer July 15, 2022, 7:46 a.m. UTC | #5
On Fri, Jul 15, 2022 at 07:49:40AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
> > On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
> > > On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
> > > > 06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
> > > > value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
> > > > though: It is calculated based on the maximum page read time, but the
> > > > timeout is also used for page write and block erase operations which
> > > > require orders of magnitude bigger timeouts.
> > > > 
> > > > Fix this by calculating busy_timeout_cycles from the maximum of
> > > > tBERS_max and tPROG_max.
> > > 
> > > 06781a5026350 was merged in v5.19-rc4 and then was picked up by several
> > > stable kernels, including v5.15.51. After we have upgraded to v5.15.51
> > > we have observed the issue that Sascha mentioned in his email [1].
> > > 
> > > As the v5.19-rc6 was released yesterday and this fix is still not
> > > applied, the v5.19-rc6 (and all stable kernels that picked up the
> > > backport) causes NAND flash data loss on imx targets.
> > > 
> > > I have backported this patch to our internal v5.15.51 based kernel on
> > > 4th July 2022 and I can confirm that it does indeed solve the NAND data
> > > loss on imx targets.
> > > 
> > > Is it possible for this patch to make it to the v5.19-rc7?
> > 
> > No response, so sending the email to more people so the voice is heard.
> > Sorry if this is not the proper way, but I think the issue is serious.
> > 
> > Current prepatch kernels starting with v5.19-rc4 and stable kernels
> > starting with v5.4.202. v5.10.127, v5.15.51, v5.18.8 contain a
> >   "[PATCH] [REALLY REALLY BROKEN] mtd: rawnand: gpmi: Fix setting busy timeout setting"
> > that is wreaking havoc to i.MX[678] or i.MX28 devices with NAND
> >   "** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **" [1]
> > 
> > The solution is to either:
> >   * Revert 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout
> > setting") and all its cherry-picks to stable branches, *OR*
> >   * Apply the fix ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout
> > based on program/erase times") [2]
> > 
> > Please do whatever you see fit.
> 
> I can do do a stable release with this reverted, but I really expected
> to see the fix in linux-next by now at the very least.  Does this driver
> not have an active maintainer and subsystem maintainer for some reason?

My IRC history doesn't go back far enough, but if I recall correctly
Miquel is on vacation, he would have picked up this patch for linux-next
otherwise.

Sascha
Greg KH July 15, 2022, 7:54 a.m. UTC | #6
On Fri, Jul 15, 2022 at 09:46:31AM +0200, Sascha Hauer wrote:
> On Fri, Jul 15, 2022 at 07:49:40AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
> > > On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
> > > > On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
> > > > > 06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
> > > > > value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
> > > > > though: It is calculated based on the maximum page read time, but the
> > > > > timeout is also used for page write and block erase operations which
> > > > > require orders of magnitude bigger timeouts.
> > > > > 
> > > > > Fix this by calculating busy_timeout_cycles from the maximum of
> > > > > tBERS_max and tPROG_max.
> > > > 
> > > > 06781a5026350 was merged in v5.19-rc4 and then was picked up by several
> > > > stable kernels, including v5.15.51. After we have upgraded to v5.15.51
> > > > we have observed the issue that Sascha mentioned in his email [1].
> > > > 
> > > > As the v5.19-rc6 was released yesterday and this fix is still not
> > > > applied, the v5.19-rc6 (and all stable kernels that picked up the
> > > > backport) causes NAND flash data loss on imx targets.
> > > > 
> > > > I have backported this patch to our internal v5.15.51 based kernel on
> > > > 4th July 2022 and I can confirm that it does indeed solve the NAND data
> > > > loss on imx targets.
> > > > 
> > > > Is it possible for this patch to make it to the v5.19-rc7?
> > > 
> > > No response, so sending the email to more people so the voice is heard.
> > > Sorry if this is not the proper way, but I think the issue is serious.
> > > 
> > > Current prepatch kernels starting with v5.19-rc4 and stable kernels
> > > starting with v5.4.202. v5.10.127, v5.15.51, v5.18.8 contain a
> > >   "[PATCH] [REALLY REALLY BROKEN] mtd: rawnand: gpmi: Fix setting busy timeout setting"
> > > that is wreaking havoc to i.MX[678] or i.MX28 devices with NAND
> > >   "** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **" [1]
> > > 
> > > The solution is to either:
> > >   * Revert 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout
> > > setting") and all its cherry-picks to stable branches, *OR*
> > >   * Apply the fix ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout
> > > based on program/erase times") [2]
> > > 
> > > Please do whatever you see fit.
> > 
> > I can do do a stable release with this reverted, but I really expected
> > to see the fix in linux-next by now at the very least.  Does this driver
> > not have an active maintainer and subsystem maintainer for some reason?
> 
> My IRC history doesn't go back far enough, but if I recall correctly
> Miquel is on vacation, he would have picked up this patch for linux-next
> otherwise.

Ok, let me do a round of stable releases so that people don't get hit by
this now...

Hopefully this gets fixed up by 5.19-final.

thanks,

greg k-h
Greg KH July 15, 2022, 7:54 a.m. UTC | #7
On Fri, Jul 15, 2022 at 09:54:10AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Jul 15, 2022 at 09:46:31AM +0200, Sascha Hauer wrote:
> > On Fri, Jul 15, 2022 at 07:49:40AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, Jul 15, 2022 at 07:27:02AM +0200, Tomasz Moń wrote:
> > > > On Mon, 2022-07-11 at 11:12 +0200, Tomasz Moń wrote:
> > > > > On Fri, 2022-07-01 at 13:03 +0200, Sascha Hauer wrote:
> > > > > > 06781a5026350 Fixes the calculation of the DEVICE_BUSY_TIMEOUT register
> > > > > > value from busy_timeout_cycles. busy_timeout_cycles is calculated wrong
> > > > > > though: It is calculated based on the maximum page read time, but the
> > > > > > timeout is also used for page write and block erase operations which
> > > > > > require orders of magnitude bigger timeouts.
> > > > > > 
> > > > > > Fix this by calculating busy_timeout_cycles from the maximum of
> > > > > > tBERS_max and tPROG_max.
> > > > > 
> > > > > 06781a5026350 was merged in v5.19-rc4 and then was picked up by several
> > > > > stable kernels, including v5.15.51. After we have upgraded to v5.15.51
> > > > > we have observed the issue that Sascha mentioned in his email [1].
> > > > > 
> > > > > As the v5.19-rc6 was released yesterday and this fix is still not
> > > > > applied, the v5.19-rc6 (and all stable kernels that picked up the
> > > > > backport) causes NAND flash data loss on imx targets.
> > > > > 
> > > > > I have backported this patch to our internal v5.15.51 based kernel on
> > > > > 4th July 2022 and I can confirm that it does indeed solve the NAND data
> > > > > loss on imx targets.
> > > > > 
> > > > > Is it possible for this patch to make it to the v5.19-rc7?
> > > > 
> > > > No response, so sending the email to more people so the voice is heard.
> > > > Sorry if this is not the proper way, but I think the issue is serious.
> > > > 
> > > > Current prepatch kernels starting with v5.19-rc4 and stable kernels
> > > > starting with v5.4.202. v5.10.127, v5.15.51, v5.18.8 contain a
> > > >   "[PATCH] [REALLY REALLY BROKEN] mtd: rawnand: gpmi: Fix setting busy timeout setting"
> > > > that is wreaking havoc to i.MX[678] or i.MX28 devices with NAND
> > > >   "** THIS PATCH WILL CAUSE DATA LOSS ON YOUR NAND!! **" [1]
> > > > 
> > > > The solution is to either:
> > > >   * Revert 06781a5026350 ("mtd: rawnand: gpmi: Fix setting busy timeout
> > > > setting") and all its cherry-picks to stable branches, *OR*
> > > >   * Apply the fix ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout
> > > > based on program/erase times") [2]
> > > > 
> > > > Please do whatever you see fit.
> > > 
> > > I can do do a stable release with this reverted, but I really expected
> > > to see the fix in linux-next by now at the very least.  Does this driver
> > > not have an active maintainer and subsystem maintainer for some reason?
> > 
> > My IRC history doesn't go back far enough, but if I recall correctly
> > Miquel is on vacation, he would have picked up this patch for linux-next
> > otherwise.
> 
> Ok, let me do a round of stable releases so that people don't get hit by
> this now...
> 
> Hopefully this gets fixed up by 5.19-final.

Note, one way this could get fixed up sooner if it was reported to the
regression bot, as Linus and other subsystem maintainers do monitor that
and will pick up patches that are dropped like this one.

thanks,

greg k-h
Richard Weinberger July 15, 2022, 7:59 a.m. UTC | #8
----- Ursprüngliche Mail -----
>> My IRC history doesn't go back far enough, but if I recall correctly
>> Miquel is on vacation, he would have picked up this patch for linux-next
>> otherwise.

Exactly.
 
> Ok, let me do a round of stable releases so that people don't get hit by
> this now...

Thanks a lot for doing so.
 
> Hopefully this gets fixed up by 5.19-final.

Sure, I'll pickup this patch.

Thanks,
//richard
Miquel Raynal Sept. 2, 2022, 7:02 a.m. UTC | #9
Hey folks,

richard@nod.at wrote on Fri, 15 Jul 2022 09:59:10 +0200 (CEST):

> ----- Ursprüngliche Mail -----
> >> My IRC history doesn't go back far enough, but if I recall correctly
> >> Miquel is on vacation, he would have picked up this patch for linux-next
> >> otherwise.  
> 
> Exactly.

Indeed, I was off for an extended period of time, I'm (very) slowly
catching up now.

>  
> > Ok, let me do a round of stable releases so that people don't get hit by
> > this now...  
> 
> Thanks a lot for doing so.
>  
> > Hopefully this gets fixed up by 5.19-final.  
> 
> Sure, I'll pickup this patch.

Thanks Greg & Richard for the handling of this issue.

Cheers,
Miquèl
Tim Harvey Oct. 21, 2022, 9:55 p.m. UTC | #10
On Fri, Sep 2, 2022 at 12:03 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hey folks,
>
> richard@nod.at wrote on Fri, 15 Jul 2022 09:59:10 +0200 (CEST):
>
> > ----- Ursprüngliche Mail -----
> > >> My IRC history doesn't go back far enough, but if I recall correctly
> > >> Miquel is on vacation, he would have picked up this patch for linux-next
> > >> otherwise.
> >
> > Exactly.
>
> Indeed, I was off for an extended period of time, I'm (very) slowly
> catching up now.
>
> >
> > > Ok, let me do a round of stable releases so that people don't get hit by
> > > this now...
> >
> > Thanks a lot for doing so.
> >
> > > Hopefully this gets fixed up by 5.19-final.
> >
> > Sure, I'll pickup this patch.
>
> Thanks Greg & Richard for the handling of this issue.
>
> Cheers,
> Miquèl
>

Hello All,

As Tomasz stated previously 06781a5026350 was merged in v5.19-rc4 and
then was picked up by several stable kernels. While this made it into
the 5.15 and 5.18 stable branches it did not make it into the
following which are thus the are currently broken:
5.10.y
5.17.y

How do we get this patch applied to those stable branches as well to
resolve this?

Best regards,

Tim
Tim Harvey Oct. 22, 2022, 12:37 a.m. UTC | #11
On Fri, Oct 21, 2022 at 2:55 PM Tim Harvey <tharvey@gateworks.com> wrote:
>
> On Fri, Sep 2, 2022 at 12:03 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hey folks,
> >
> > richard@nod.at wrote on Fri, 15 Jul 2022 09:59:10 +0200 (CEST):
> >
> > > ----- Ursprüngliche Mail -----
> > > >> My IRC history doesn't go back far enough, but if I recall correctly
> > > >> Miquel is on vacation, he would have picked up this patch for linux-next
> > > >> otherwise.
> > >
> > > Exactly.
> >
> > Indeed, I was off for an extended period of time, I'm (very) slowly
> > catching up now.
> >
> > >
> > > > Ok, let me do a round of stable releases so that people don't get hit by
> > > > this now...
> > >
> > > Thanks a lot for doing so.
> > >
> > > > Hopefully this gets fixed up by 5.19-final.
> > >
> > > Sure, I'll pickup this patch.
> >
> > Thanks Greg & Richard for the handling of this issue.
> >
> > Cheers,
> > Miquèl
> >
>
> Hello All,
>
> As Tomasz stated previously 06781a5026350 was merged in v5.19-rc4 and
> then was picked up by several stable kernels. While this made it into
> the 5.15 and 5.18 stable branches it did not make it into the
> following which are thus the are currently broken:
> 5.10.y
> 5.17.y
>
> How do we get this patch applied to those stable branches as well to
> resolve this?
>
> Best regards,
>

Some more details here as I also find that stable 5.4 is broken as
well 5.10 and 5.17:

The issue I see here is on a Gateworks imx6q-gw54xx board with a 2GiB
Cypress NAND part
[    2.703324] nand: device found, Manufacturer ID: 0x01, Chip ID: 0xd5
[    2.709815] nand: AMD/Spansion S34ML16G2
[    2.713758] nand: 2048 MiB, SLC, erase size: 128 KiB, page size:
2048, OOB size: 128
[    2.721990] Scanning device for bad blocks
[    2.727025] Bad eraseblock 0 at 0x000000000000
[    2.731628] Bad eraseblock 1 at 0x000000020000
[    2.736191] Bad eraseblock 2 at 0x000000040000
...

The original regression was from commit d99e7feaed4c: ("mtd: rawnand:
gpmi: fix controller timings setting") which was fixed by commit
156427b3123c ("mtd: rawnand: gpmi: Fix setting busy timeout setting")
but later deemed to be an invalid fix so that got reverted (thus
breaking things again) and fixed instead with commit 0fddf9ad06fd
("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on
program/erase times") which did not make it into all the stable
branches that had the original commit or the revert of the potential
fix.

Here is history of each stable branch affected as best I can tell
(I've actually built/booted all of these trying to understand this so
fixed/broken refers to the issue I see above):

v5.4:
 - v5.4.189 broken with commit 0d0ee651e72c: ("mtd: rawnand: gpmi: fix
controller timings setting")
 - v5.4.202 fixed with commit 71c76f56b97c: ("mtd: rawnand: gpmi: Fix
setting busy timeout setting")
 - v5.4.206 broken with commit 15a3adfe7593: ("Revert "mtd: rawnand:
gpmi: Fix setting busy timeout setting"")
 *** still broken as of v5.4.219 but can be fixed by commit
0fddf9ad06fd ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on
program/erase times")

v5.10:
 - v5.10.110 broken with commit d99e7feaed4c: ("mtd: rawnand: gpmi:
fix controller timings setting")
 - v5.10.127 fixed with commit 156427b3123c ("mtd: rawnand: gpmi: Fix
setting busy timeout setting")
 - v5.10.131 broken with commit cc5ee0e0eed0 Revert "mtd: rawnand:
gpmi: Fix setting busy timeout setting"
 *** still broken as of v5.10.149 but can be fixed by commit
0fddf9ad06fd ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on
program/erase times")

v5.15:
 - v5.15.33 broken with commit 8480efe815e5: ("mtd: rawnand: gpmi: fix
controller timings setting")
 - v5.15.51 fixed with commit 0af674e7a764: ("mtd: rawnand: gpmi: Fix
setting busy timeout setting")
 - v5.15.55 broken with commit c80b15105a08 ("Revert "mtd: rawnand:
gpmi: Fix setting busy timeout setting"")
 - v5.15.58 fixed with commit 212a5360ef40 ("mtd: rawnand: gpmi: Set
WAIT_FOR_READY timeout based on program/erase times")

v5.17:
 - v5.17.2 broken with commit 4b2eed1d1ee8 ("mtd: rawnand: gpmi: fix
controller timings setting")
 *** still broken as of v5.17.15 but can be fixed by commit
0fddf9ad06fd ("mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on
program/erase times")

v5.18:
 - v5.18-rc2 broken with commit 2970bf5a32f0 mtd: rawnand: gpmi: fix
controller timings setting
 - v5.18.8 fixed with commit 8d45b05f7eaf: ("mtd: rawnand: gpmi: Fix
setting busy timeout setting")
 - v5.18.12 broken with commit f8d01e0f004a: ("Revert "mtd: rawnand:
gpmi: Fix setting busy timeout setting"")
 - v5.18.15 fixed with commit c283223e769d: ("mtd: rawnand: gpmi: Set
WAIT_FOR_READY timeout based on program/erase times")

v5.19:
- v5.19-rc3 broken by commit 06781a502635: ("mtd: rawnand: gpmi: Fix
setting busy timeout setting")
- v5.19-rc7 fixed by commit 0fddf9ad06fd: ("mtd: rawnand: gpmi: Set
WAIT_FOR_READY timeout based on program/erase times")

Best Regards,

Tim
Miquel Raynal Oct. 24, 2022, 8:01 a.m. UTC | #12
Hi Tim,

tharvey@gateworks.com wrote on Fri, 21 Oct 2022 14:55:15 -0700:

> On Fri, Sep 2, 2022 at 12:03 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hey folks,
> >
> > richard@nod.at wrote on Fri, 15 Jul 2022 09:59:10 +0200 (CEST):
> >  
> > > ----- Ursprüngliche Mail -----  
> > > >> My IRC history doesn't go back far enough, but if I recall correctly
> > > >> Miquel is on vacation, he would have picked up this patch for linux-next
> > > >> otherwise.  
> > >
> > > Exactly.  
> >
> > Indeed, I was off for an extended period of time, I'm (very) slowly
> > catching up now.
> >  
> > >  
> > > > Ok, let me do a round of stable releases so that people don't get hit by
> > > > this now...  
> > >
> > > Thanks a lot for doing so.
> > >  
> > > > Hopefully this gets fixed up by 5.19-final.  
> > >
> > > Sure, I'll pickup this patch.  
> >
> > Thanks Greg & Richard for the handling of this issue.
> >
> > Cheers,
> > Miquèl
> >  
> 
> Hello All,
> 
> As Tomasz stated previously 06781a5026350 was merged in v5.19-rc4 and
> then was picked up by several stable kernels. While this made it into
> the 5.15 and 5.18 stable branches it did not make it into the
> following which are thus the are currently broken:
> 5.10.y
> 5.17.y
> 
> How do we get this patch applied to those stable branches as well to
> resolve this?

It is likely that the original patch (targeting a mainline kernel) did
not apply to those branches. In this case you can adapt the fix to the
concerned kernels and send it to stable@ (following the Documentation
guidelines for backports).

Thanks,
Miquèl
Tim Harvey Oct. 25, 2022, 10:02 p.m. UTC | #13
On Mon, Oct 24, 2022 at 1:01 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Tim,
>
> tharvey@gateworks.com wrote on Fri, 21 Oct 2022 14:55:15 -0700:
>
> > On Fri, Sep 2, 2022 at 12:03 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > Hey folks,
> > >
> > > richard@nod.at wrote on Fri, 15 Jul 2022 09:59:10 +0200 (CEST):
> > >
> > > > ----- Ursprüngliche Mail -----
> > > > >> My IRC history doesn't go back far enough, but if I recall correctly
> > > > >> Miquel is on vacation, he would have picked up this patch for linux-next
> > > > >> otherwise.
> > > >
> > > > Exactly.
> > >
> > > Indeed, I was off for an extended period of time, I'm (very) slowly
> > > catching up now.
> > >
> > > >
> > > > > Ok, let me do a round of stable releases so that people don't get hit by
> > > > > this now...
> > > >
> > > > Thanks a lot for doing so.
> > > >
> > > > > Hopefully this gets fixed up by 5.19-final.
> > > >
> > > > Sure, I'll pickup this patch.
> > >
> > > Thanks Greg & Richard for the handling of this issue.
> > >
> > > Cheers,
> > > Miquèl
> > >
> >
> > Hello All,
> >
> > As Tomasz stated previously 06781a5026350 was merged in v5.19-rc4 and
> > then was picked up by several stable kernels. While this made it into
> > the 5.15 and 5.18 stable branches it did not make it into the
> > following which are thus the are currently broken:
> > 5.10.y
> > 5.17.y
> >
> > How do we get this patch applied to those stable branches as well to
> > resolve this?
>
> It is likely that the original patch (targeting a mainline kernel) did
> not apply to those branches. In this case you can adapt the fix to the
> concerned kernels and send it to stable@ (following the Documentation
> guidelines for backports).
>
> Thanks,
> Miquèl

Miquèl,

Thanks for the pointer. You are correct that this patch which resolves
the regression does not apply directly to 5.4/5.10/5.17 stable trees.
I'm looking over
https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html
and I'm not clear what I need to put in the commit to make it clear
that it only applies to those specific trees. Do I simply adjust the
'Fixes' tag to address the commit from that specific stable branch and
send one for each stable branch (thus each would have a different sha
in the Fixes tag) while also adding the 'commit <sha> upstream' to the
top?

Best Regards.

Tim
Miquel Raynal Oct. 26, 2022, 8:18 a.m. UTC | #14
Hi Tim,

tharvey@gateworks.com wrote on Tue, 25 Oct 2022 15:02:27 -0700:

> On Mon, Oct 24, 2022 at 1:01 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Tim,
> >
> > tharvey@gateworks.com wrote on Fri, 21 Oct 2022 14:55:15 -0700:
> >  
> > > On Fri, Sep 2, 2022 at 12:03 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > > >
> > > > Hey folks,
> > > >
> > > > richard@nod.at wrote on Fri, 15 Jul 2022 09:59:10 +0200 (CEST):
> > > >  
> > > > > ----- Ursprüngliche Mail -----  
> > > > > >> My IRC history doesn't go back far enough, but if I recall correctly
> > > > > >> Miquel is on vacation, he would have picked up this patch for linux-next
> > > > > >> otherwise.  
> > > > >
> > > > > Exactly.  
> > > >
> > > > Indeed, I was off for an extended period of time, I'm (very) slowly
> > > > catching up now.
> > > >  
> > > > >  
> > > > > > Ok, let me do a round of stable releases so that people don't get hit by
> > > > > > this now...  
> > > > >
> > > > > Thanks a lot for doing so.
> > > > >  
> > > > > > Hopefully this gets fixed up by 5.19-final.  
> > > > >
> > > > > Sure, I'll pickup this patch.  
> > > >
> > > > Thanks Greg & Richard for the handling of this issue.
> > > >
> > > > Cheers,
> > > > Miquèl
> > > >  
> > >
> > > Hello All,
> > >
> > > As Tomasz stated previously 06781a5026350 was merged in v5.19-rc4 and
> > > then was picked up by several stable kernels. While this made it into
> > > the 5.15 and 5.18 stable branches it did not make it into the
> > > following which are thus the are currently broken:
> > > 5.10.y
> > > 5.17.y
> > >
> > > How do we get this patch applied to those stable branches as well to
> > > resolve this?  
> >
> > It is likely that the original patch (targeting a mainline kernel) did
> > not apply to those branches. In this case you can adapt the fix to the
> > concerned kernels and send it to stable@ (following the Documentation
> > guidelines for backports).
> >
> > Thanks,
> > Miquèl  
> 
> Miquèl,
> 
> Thanks for the pointer. You are correct that this patch which resolves
> the regression does not apply directly to 5.4/5.10/5.17 stable trees.
> I'm looking over
> https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html

Option 3 fits, right?

> and I'm not clear what I need to put in the commit to make it clear
> that it only applies to those specific trees. Do I simply adjust the
> 'Fixes' tag to address the commit from that specific stable branch and
> send one for each stable branch (thus each would have a different sha
> in the Fixes tag) while also adding the 'commit <sha> upstream' to the
> top?

Here are the failures:
https://lore.kernel.org/stable/165858381623360@kroah.com/
https://lore.kernel.org/stable/165858381472139@kroah.com/
https://lore.kernel.org/stable/165858381313218@kroah.com/

You should not adjust the Fixes tag, this tag should really
reflect what you are trying to fix, and not some kind of target for the
backport. However you're changing the commit content so you can also
change the commit log, with eg. the upstream original commit somewhere
there in plain text. You can target a kernel version on the Cc: stable
line (see the doc) and you can also use

	--subject-prefix="PATCH stable <version>"

during git-format-patch, even though I'm not sure this is actually
needed, it's more like courtesy to let reviewers know what you are
doing.

Here is an example, maybe not the best one (forget about the RESEND)
but a typical case:
https://lore.kernel.org/linux-mtd/20220223174431.1083-1-f.fainelli@gmail.com/

Thanks,
Miquèl
Greg KH Oct. 26, 2022, 10:50 a.m. UTC | #15
On Tue, Oct 25, 2022 at 03:02:27PM -0700, Tim Harvey wrote:
> On Mon, Oct 24, 2022 at 1:01 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Tim,
> >
> > tharvey@gateworks.com wrote on Fri, 21 Oct 2022 14:55:15 -0700:
> >
> > > On Fri, Sep 2, 2022 at 12:03 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > >
> > > > Hey folks,
> > > >
> > > > richard@nod.at wrote on Fri, 15 Jul 2022 09:59:10 +0200 (CEST):
> > > >
> > > > > ----- Ursprüngliche Mail -----
> > > > > >> My IRC history doesn't go back far enough, but if I recall correctly
> > > > > >> Miquel is on vacation, he would have picked up this patch for linux-next
> > > > > >> otherwise.
> > > > >
> > > > > Exactly.
> > > >
> > > > Indeed, I was off for an extended period of time, I'm (very) slowly
> > > > catching up now.
> > > >
> > > > >
> > > > > > Ok, let me do a round of stable releases so that people don't get hit by
> > > > > > this now...
> > > > >
> > > > > Thanks a lot for doing so.
> > > > >
> > > > > > Hopefully this gets fixed up by 5.19-final.
> > > > >
> > > > > Sure, I'll pickup this patch.
> > > >
> > > > Thanks Greg & Richard for the handling of this issue.
> > > >
> > > > Cheers,
> > > > Miquèl
> > > >
> > >
> > > Hello All,
> > >
> > > As Tomasz stated previously 06781a5026350 was merged in v5.19-rc4 and
> > > then was picked up by several stable kernels. While this made it into
> > > the 5.15 and 5.18 stable branches it did not make it into the
> > > following which are thus the are currently broken:
> > > 5.10.y
> > > 5.17.y
> > >
> > > How do we get this patch applied to those stable branches as well to
> > > resolve this?
> >
> > It is likely that the original patch (targeting a mainline kernel) did
> > not apply to those branches. In this case you can adapt the fix to the
> > concerned kernels and send it to stable@ (following the Documentation
> > guidelines for backports).
> >
> > Thanks,
> > Miquèl
> 
> Miquèl,
> 
> Thanks for the pointer. You are correct that this patch which resolves
> the regression does not apply directly to 5.4/5.10/5.17 stable trees.
> I'm looking over
> https://www.kernel.org/doc/html/v4.10/process/stable-kernel-rules.html
> and I'm not clear what I need to put in the commit to make it clear
> that it only applies to those specific trees. Do I simply adjust the
> 'Fixes' tag to address the commit from that specific stable branch and
> send one for each stable branch (thus each would have a different sha
> in the Fixes tag) while also adding the 'commit <sha> upstream' to the
> top?

Please send multiple patches and say below the --- line which stable
tree it should be applied to.

Note that 5.17 is long end-of-life, always check the front page of
kernel.org for the active kernel versions.

thanks,

greg k-h
diff mbox series

Patch

diff --git a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
index 889e403299568..93da23682d862 100644
--- a/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c
@@ -850,9 +850,10 @@  static int gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
 	unsigned int tRP_ps;
 	bool use_half_period;
 	int sample_delay_ps, sample_delay_factor;
-	u16 busy_timeout_cycles;
+	unsigned int busy_timeout_cycles;
 	u8 wrn_dly_sel;
 	unsigned long clk_rate, min_rate;
+	u64 busy_timeout_ps;
 
 	if (sdr->tRC_min >= 30000) {
 		/* ONFI non-EDO modes [0-3] */
@@ -885,7 +886,8 @@  static int gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
 	addr_setup_cycles = TO_CYCLES(sdr->tALS_min, period_ps);
 	data_setup_cycles = TO_CYCLES(sdr->tDS_min, period_ps);
 	data_hold_cycles = TO_CYCLES(sdr->tDH_min, period_ps);
-	busy_timeout_cycles = TO_CYCLES(sdr->tWB_max + sdr->tR_max, period_ps);
+	busy_timeout_ps = max(sdr->tBERS_max, sdr->tPROG_max);
+	busy_timeout_cycles = TO_CYCLES(busy_timeout_ps, period_ps);
 
 	hw->timing0 = BF_GPMI_TIMING0_ADDRESS_SETUP(addr_setup_cycles) |
 		      BF_GPMI_TIMING0_DATA_HOLD(data_hold_cycles) |