Message ID | 20190207235806.GA39580@dev-dsk-psobon-2c-1dd9f399.us-west-2.amazon.com |
---|---|
State | Superseded |
Delegated to: | Vignesh R |
Series | mtd: cfi: Fixed endless loop problem in CFI when value was written but corrupted. |
Hi Przemek-san,

I think that for the error case it should be done to retry at first.
It can be implemented separately but it is possible to be not enough.
Since the flash write error causes the user data corruption I think.
File systems and applications do not execute any recovery usually.

In the past I saw a similar write error actually and fixed as below.
dfeae1073583d ("mtd: cfi_cmdset_0002: Change write buffer to check correct value")

I am also seeing a similar flash write error for the word write case.
In the case the retry with the reset recovery does not work fully.
After the repeated retry with the reset the flash is not able to work.
There is a possibility for the buffer write also but sorry not sure.
Since there is a difference to execute the recovery command.

As Jocke-san mentioned I also think the chip_ready() does not work.
It is followed correctly basically the flash chip specification.
But actually it does not check the chip state correctly I think.
So for the flash write error cases I saw the chip_good() is necessary.

Regards,
Ikegami

> -----Original Message-----
> From: linux-mtd [mailto:linux-mtd-bounces@lists.infradead.org] On Behalf
> Of Przemyslaw Sobon
> Sent: Friday, February 8, 2019 8:58 AM
> To: bbrezillon@kernel.org; Joakim.Tjernlund@infinera.com;
> linux-mtd@lists.infradead.org; chris.packham@alliedtelesis.co.nz;
> fbettoni@gmail.com; ikegami@allied-telesis.co.jp; liujian56@huawei.com
> Cc: psobon@amazon.com
> Subject: [PATCH] mtd: cfi: Fixed endless loop problem in CFI when value
> was written but corrupted.
>
> Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to
> check correct value)
>
> There was an endless loop in CFI Flash driver when a value was written
> incorrectly. In such case chip_ready returns true but chip_good returns
> false and we never get out of the loop.
>
> The solution was to break the loop in 2 cases, either device is ready or
> device is not ready and timeout elapsed. The correctness of the write is
> checked after the loop ended. That way we ensure the loop always ends.
>
> Signed-off-by: Przemyslaw Sobon <psobon@amazon.com>
> ---
>  drivers/mtd/chips/cfi_cmdset_0002.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c
> b/drivers/mtd/chips/cfi_cmdset_0002.c
> index 72428b6bfc47..6cc31d2057e9 100644
> --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> @@ -1879,15 +1879,18 @@ static int __xipram do_write_buffer(struct map_info
> *map, struct flchip *chip,
>  		if (time_after(jiffies, timeo) && !chip_ready(map, adr))
>  			break;
>
> -		if (chip_good(map, adr, datum)) {
> -			xip_enable(map, chip, adr);
> -			goto op_done;
> -		}
> +		if (chip_ready(map, adr))
> +			break;
>
>  		/* Latency issues. Drop the lock, wait a while and retry
> */
>  		UDELAY(map, chip, adr, 1);
>  	}
>
> +	if (chip_good(map, adr, datum)) {
> +		xip_enable(map, chip, adr);
> +		goto op_done;
> +	}
> +
>  	/*
>  	 * Recovery from write-buffer programming failures requires
>  	 * the write-to-buffer-reset sequence. Since the last part
> --
> 2.16.5
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Hi All,

On 8/02/19 12:58 PM, Przemyslaw Sobon wrote:
> Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to
> check correct value)
>
> There was an endless loop in CFI Flash driver when a value was written
> incorrectly. In such case chip_ready returns true but chip_good returns
> false and we never get out of the loop.
>
> The solution was to break the loop in 2 cases, either device is ready or
> device is not ready and timeout elapsed. The correctness of the write is
> checked after the loop ended. That way we ensure the loop always ends.
>
> Signed-off-by: Przemyslaw Sobon <psobon@amazon.com>

Mark (cc'd) has done some testing here, and assuming he's happy with the
forgery.

Tested-by: Mark Tomlinson <Mark.Tomlinson@alliedtelesis.co.nz>

> ---
>  drivers/mtd/chips/cfi_cmdset_0002.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
> index 72428b6bfc47..6cc31d2057e9 100644
> --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> @@ -1879,15 +1879,18 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
>  		if (time_after(jiffies, timeo) && !chip_ready(map, adr))
>  			break;
>
> -		if (chip_good(map, adr, datum)) {
> -			xip_enable(map, chip, adr);
> -			goto op_done;
> -		}
> +		if (chip_ready(map, adr))
> +			break;
>
>  		/* Latency issues. Drop the lock, wait a while and retry */
>  		UDELAY(map, chip, adr, 1);
>  	}
>
> +	if (chip_good(map, adr, datum)) {
> +		xip_enable(map, chip, adr);
> +		goto op_done;
> +	}
> +
>  	/*
>  	 * Recovery from write-buffer programming failures requires
>  	 * the write-to-buffer-reset sequence. Since the last part
>
On Thu, 14 Feb 2019 00:39:09 +0000
Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:

> Hi All,
>
> On 8/02/19 12:58 PM, Przemyslaw Sobon wrote:
> > Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to
> > check correct value)
> >
> > There was an endless loop in CFI Flash driver when a value was written
> > incorrectly. In such case chip_ready returns true but chip_good returns
> > false and we never get out of the loop.
> >
> > The solution was to break the loop in 2 cases, either device is ready or
> > device is not ready and timeout elapsed. The correctness of the write is
> > checked after the loop ended. That way we ensure the loop always ends.
> >
> > Signed-off-by: Przemyslaw Sobon <psobon@amazon.com>
>
> Mark (cc'd) has done some testing here, and assuming he's happy with the
> forgery.
>
> Tested-by: Mark Tomlinson <Mark.Tomlinson@alliedtelesis.co.nz>

I'm a bit lost. Ikegami told us that checking for chip_ready() was not
enough and chip_good() could return true after a few tests even though
it initially returned false.

I'd really like to get that fixed, but it looks like you haven't reached
a consensus on what the appropriate fix is :-/.

>
> > ---
> >  drivers/mtd/chips/cfi_cmdset_0002.c | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
> > index 72428b6bfc47..6cc31d2057e9 100644
> > --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> > +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> > @@ -1879,15 +1879,18 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
> >  		if (time_after(jiffies, timeo) && !chip_ready(map, adr))
> >  			break;
> >
> > -		if (chip_good(map, adr, datum)) {
> > -			xip_enable(map, chip, adr);
> > -			goto op_done;
> > -		}
> > +		if (chip_ready(map, adr))
> > +			break;
> >
> >  		/* Latency issues. Drop the lock, wait a while and retry */
> >  		UDELAY(map, chip, adr, 1);
> >  	}
> >
> > +	if (chip_good(map, adr, datum)) {
> > +		xip_enable(map, chip, adr);
> > +		goto op_done;
> > +	}
> > +
> >  	/*
> >  	 * Recovery from write-buffer programming failures requires
> >  	 * the write-to-buffer-reset sequence. Since the last part
> >
On 19/02/19 9:00 PM, Boris Brezillon wrote:
> On Thu, 14 Feb 2019 00:39:09 +0000
> Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
>
>> Hi All,
>>
>> On 8/02/19 12:58 PM, Przemyslaw Sobon wrote:
>>> Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to
>>> check correct value)
>>>
>>> There was an endless loop in CFI Flash driver when a value was written
>>> incorrectly. In such case chip_ready returns true but chip_good returns
>>> false and we never get out of the loop.
>>>
>>> The solution was to break the loop in 2 cases, either device is ready or
>>> device is not ready and timeout elapsed. The correctness of the write is
>>> checked after the loop ended. That way we ensure the loop always ends.
>>>
>>> Signed-off-by: Przemyslaw Sobon <psobon@amazon.com>
>> Mark (cc'd) has done some testing here, and assuming he's happy with the
>> forgery.
>>
>> Tested-by: Mark Tomlinson <Mark.Tomlinson@alliedtelesis.co.nz>
> I'm a bit lost. Ikegami told us that checking for chip_ready() was not
> enough and chip_good() could return true after a few tests even though
> it initially returned false.
>
> I'd really like to get that fixed, but it looks like you haven't reached
> a consensus on what the appropriate fix is :-/.

I have done some further testing and this patch doesn't work 100%. It
appears at least some flash chips do not start toggling immediately, and
therefore chip_ready() can return true early. A timeout is reported,
even though that isn't what happened.

chip_good() makes an additional check over chip_ready() and is the call
I believe we should be using. I will submit a new patch which should fix
the infinite loop as well as not mis-reporting errors.
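The early-ready behaviour Mark describes can be illustrated outside the kernel with a small simulation. This is only a sketch under stated assumptions: `struct late_chip`, `late_read()` and both polling helpers are hypothetical names invented for the illustration, the toggle values are arbitrary, and "ready" and "good" are modelled simply as "two consecutive reads agree" and "two consecutive reads agree and match the written data". It is not the driver code and not the patch that was eventually chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a chip that is slow to start programming. */
struct late_chip {
	int reads;          /* reads issued so far */
	int start_delay;    /* initial reads that still return the old data */
	int busy_reads;     /* following reads during which the status toggles */
	unsigned old_value; /* data in the cell before the write */
	unsigned new_value; /* data in the cell once programming is done */
};

static unsigned late_read(struct late_chip *c)
{
	int n = c->reads++;

	if (n < c->start_delay)
		return c->old_value;            /* chip has not started toggling yet */
	if (n < c->start_delay + c->busy_reads)
		return (n & 1) ? 0x40 : 0x00;   /* a status bit flips on every read */
	return c->new_value;                    /* programming finished */
}

/* "Ready" in this model: two consecutive reads return the same value. */
static bool late_chip_ready(struct late_chip *c)
{
	unsigned a = late_read(c);
	unsigned b = late_read(c);

	return a == b;
}

/* "Good" in this model: stable reads that also match the written data. */
static bool late_chip_good(struct late_chip *c, unsigned expected)
{
	unsigned a = late_read(c);
	unsigned b = late_read(c);

	return a == b && b == expected;
}

/* Stop polling as soon as the chip looks ready, then verify exactly once. */
static bool write_ok_ready_then_good(struct late_chip *c, unsigned expected,
				     int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		if (late_chip_ready(c))
			break;
	}
	return late_chip_good(c, expected);
}

/* Keep polling until the expected data is stable or the budget runs out. */
static bool write_ok_poll_good(struct late_chip *c, unsigned expected,
			       int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		if (late_chip_good(c, expected))
			return true;
	}
	return false;
}
```

With a chip that returns the old data for its first four reads, the ready-then-verify variant gives up before programming has even begun and reports a failure for a write that would have completed, while the variant that polls for the expected data under a bounded budget sees it succeed.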
On Tue, 19 Feb 2019 20:02:37 +0000
Mark Tomlinson <Mark.Tomlinson@alliedtelesis.co.nz> wrote:

> On 19/02/19 9:00 PM, Boris Brezillon wrote:
> > On Thu, 14 Feb 2019 00:39:09 +0000
> > Chris Packham <Chris.Packham@alliedtelesis.co.nz> wrote:
> >
> >> Hi All,
> >>
> >> On 8/02/19 12:58 PM, Przemyslaw Sobon wrote:
> >>> Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to
> >>> check correct value)
> >>>
> >>> There was an endless loop in CFI Flash driver when a value was written
> >>> incorrectly. In such case chip_ready returns true but chip_good returns
> >>> false and we never get out of the loop.
> >>>
> >>> The solution was to break the loop in 2 cases, either device is ready or
> >>> device is not ready and timeout elapsed. The correctness of the write is
> >>> checked after the loop ended. That way we ensure the loop always ends.
> >>>
> >>> Signed-off-by: Przemyslaw Sobon <psobon@amazon.com>
> >> Mark (cc'd) has done some testing here, and assuming he's happy with the
> >> forgery.
> >>
> >> Tested-by: Mark Tomlinson <Mark.Tomlinson@alliedtelesis.co.nz>
> > I'm a bit lost. Ikegami told us that checking for chip_ready() was not
> > enough and chip_good() could return true after a few tests even though
> > it initially returned false.
> >
> > I'd really like to get that fixed, but it looks like you haven't reached
> > a consensus on what the appropriate fix is :-/.
> I have done some further testing and this patch doesn't work 100%. It
> appears at least some flash chips do not start toggling immediately, and
> therefore chip_ready() can return true early. A timeout is reported,
> even though that isn't what happened.
>
> chip_good() makes an additional check over chip_ready() and is the call
> I believe we should be using. I will submit a new patch which should fix
> the infinite loop as well as not mis-reporting errors.

No, please, don't do that. We already have 3 versions of the same fix
floating around (one from Ikegami, one from Liu Jian and another one
from Przemyslaw). Can you please sync and submit a single patch that
all of you agree on?
On 20/02/19 9:03 PM, Boris Brezillon wrote:
> On Tue, 19 Feb 2019 20:02:37 +0000
> Mark Tomlinson <Mark.Tomlinson@alliedtelesis.co.nz> wrote:
>
>> On 19/02/19 9:00 PM, Boris Brezillon wrote:
>>> I'm a bit lost. Ikegami told us that checking for chip_ready() was not
>>> enough and chip_good() could return true after a few tests even though
>>> it initially returned false.
>>>
>>> I'd really like to get that fixed, but it looks like you haven't reached
>>> a consensus on what the appropriate fix is :-/.
>> I have done some further testing and this patch doesn't work 100%. It
>> appears at least some flash chips do not start toggling immediately, and
>> therefore chip_ready() can return true early. A timeout is reported,
>> even though that isn't what happened.
>>
>> chip_good() makes an additional check over chip_ready() and is the call
>> I believe we should be using. I will submit a new patch which should fix
>> the infinite loop as well as not mis-reporting errors.
> No, please, don't do that. We already have 3 versions of the same fix
> floating around (one from Ikegami, one from Liu Jian and another one
> from Przemyslaw). Can you please sync and submit a single patch that
> all of you agree on?
>

Ikegami-san has pointed out Liu Jian's patch to me. That patch works fine
for me, so I won't be creating another one after all. Hope that reduces the
number of possible patches.
diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index 72428b6bfc47..6cc31d2057e9 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -1879,15 +1879,18 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
 		if (time_after(jiffies, timeo) && !chip_ready(map, adr))
 			break;
 
-		if (chip_good(map, adr, datum)) {
-			xip_enable(map, chip, adr);
-			goto op_done;
-		}
+		if (chip_ready(map, adr))
+			break;
 
 		/* Latency issues. Drop the lock, wait a while and retry */
 		UDELAY(map, chip, adr, 1);
 	}
 
+	if (chip_good(map, adr, datum)) {
+		xip_enable(map, chip, adr);
+		goto op_done;
+	}
+
 	/*
 	 * Recovery from write-buffer programming failures requires
 	 * the write-to-buffer-reset sequence. Since the last part
Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to
check correct value)

There was an endless loop in CFI Flash driver when a value was written
incorrectly. In such case chip_ready returns true but chip_good returns
false and we never get out of the loop.

The solution was to break the loop in 2 cases, either device is ready or
device is not ready and timeout elapsed. The correctness of the write is
checked after the loop ended. That way we ensure the loop always ends.

Signed-off-by: Przemyslaw Sobon <psobon@amazon.com>
---
 drivers/mtd/chips/cfi_cmdset_0002.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
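The loop behaviour described in the commit message can be reproduced in user space with a tiny model. This is a hedged sketch rather than the driver code: `struct sim_chip`, `poll_before()` and `poll_after()` are hypothetical names, the timeout is counted in iterations instead of jiffies, and an explicit iteration cap stands in for "never returns" so that the example terminates. The chip modelled here has finished programming (its reads are stable) but holds a corrupted value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical idle chip: every read returns the same stored value. */
struct sim_chip {
	unsigned value;
};

static unsigned sim_read(const struct sim_chip *c)
{
	return c->value;
}

/* "Ready" in this model: two consecutive reads agree. */
static bool sim_chip_ready(const struct sim_chip *c)
{
	unsigned a = sim_read(c);
	unsigned b = sim_read(c);

	return a == b;
}

/* "Good" in this model: stable reads that also equal the written data. */
static bool sim_chip_good(const struct sim_chip *c, unsigned expected)
{
	unsigned a = sim_read(c);
	unsigned b = sim_read(c);

	return a == b && b == expected;
}

enum poll_result { WRITE_OK, WRITE_FAILED, STILL_LOOPING };

/* Loop shape before the patch: leaves only on good data, or on timeout
 * while the chip is still busy. */
static enum poll_result poll_before(const struct sim_chip *c, unsigned expected,
				    int max_iter, int timeout_iter)
{
	assert(max_iter > 0);
	for (int i = 0; i < max_iter; i++) {
		if (i >= timeout_iter && !sim_chip_ready(c))
			return WRITE_FAILED;
		if (sim_chip_good(c, expected))
			return WRITE_OK;
	}
	return STILL_LOOPING;   /* cap reached: the real loop would spin forever */
}

/* Loop shape after the patch: leaves as soon as the chip is ready, then
 * checks the data once. */
static enum poll_result poll_after(const struct sim_chip *c, unsigned expected,
				   int max_iter, int timeout_iter)
{
	int i;

	assert(max_iter > 0);
	for (i = 0; i < max_iter; i++) {
		if (i >= timeout_iter && !sim_chip_ready(c))
			break;
		if (sim_chip_ready(c))
			break;
	}
	if (i == max_iter)
		return STILL_LOOPING;
	return sim_chip_good(c, expected) ? WRITE_OK : WRITE_FAILED;
}
```

For a chip holding 0xA4 when 0xA5 was written, `poll_before()` hits the iteration cap because neither exit condition can ever become true, while `poll_after()` leaves the loop on the first pass and reports the failed write. Both variants report success when the stored value matches.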