
cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer

Message ID 1548977439-318904-1-git-send-email-liujian56@huawei.com
State Superseded
Delegated to: Vignesh R

Commit Message

Liu Jian Jan. 31, 2019, 11:30 p.m. UTC
In do_write_buffer(), the for loop can reach a state where
chip_ready() returns 1 while chip_good() returns 0, so it never
breaks out of the loop.
To fix this, checking chip_good() alone is enough, and the loop should
time out if the chip stays bad for a while.

Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to check
correct value)
Signed-off-by: Yi Huaijie <yihuaijie@huawei.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Boris Brezillon Feb. 3, 2019, 8:26 a.m. UTC | #1
+Przemyslaw

On Fri, 1 Feb 2019 07:30:39 +0800
Liu Jian <liujian56@huawei.com> wrote:

> In function do_write_buffer(), in the for loop, there is a case
> chip_ready() returns 1 while chip_good() returns 0, so it never
> break the loop.
> To fix this, chip_good() is enough and it should timeout if it stay
> bad for a while.

Looks like Przemyslaw reported and fixed the same problem.

> 
> Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer to check
> correct value)

Can you put the Fixes tag on a single line? The format is

Fixes: <hash> ("message")

> Signed-off-by: Yi Huaijie <yihuaijie@huawei.com>
> Signed-off-by: Liu Jian <liujian56@huawei.com>

[1]http://patchwork.ozlabs.org/patch/1025566/

> ---
>  drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
> index 72428b6..818e94b 100644
> --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> @@ -1876,14 +1876,14 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
>  			continue;
>  		}
>  
> -		if (time_after(jiffies, timeo) && !chip_ready(map, adr))
> -			break;
> -
>  		if (chip_good(map, adr, datum)) {
>  			xip_enable(map, chip, adr);
>  			goto op_done;
>  		}
>  
> +		if (time_after(jiffies, timeo))
> +			break;
> +
>  		/* Latency issues. Drop the lock, wait a while and retry */
>  		UDELAY(map, chip, adr, 1);
>  	}
Boris Brezillon Feb. 3, 2019, 8:35 a.m. UTC | #2
BTW, the patch itself looks good to me. Ikegami, can you confirm it
does the right thing?

Thanks,

Boris
Przemyslaw Sobon Feb. 5, 2019, 10:28 p.m. UTC | #3
One comment on this patch: if a value is written incorrectly and the write
finishes quickly, we stay in the loop until the timeout even though nothing is
going to change. For example, if a value was written incorrectly after 1us and
the loop timeout was set to 1ms, the function only returns after 1ms, so this
solution is not optimized for performance. I considered the same approach when
working on this change and decided to do it a different way.

Regards,
Przemek
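
The cost described above can be put into numbers with a small user-space model. Everything here is an assumption made for illustration: `struct fake_write`, the `model_*` helpers and the two `ticks_*` functions are hypothetical, one tick stands for one poll interval, and the early-exit variant is only one possible reading of "a different way", not the patch referred to in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the chip settles at done_tick with right or wrong data. */
struct fake_write {
	unsigned done_tick;
	bool data_correct;
};

static bool model_ready(const struct fake_write *w, unsigned now)
{
	assert(w);	/* caller must pass a valid model */
	return now >= w->done_tick;
}

static bool model_good(const struct fake_write *w, unsigned now)
{
	return model_ready(w, now) && w->data_correct;
}

/* Patched loop shape: poll for good only, give up after the timeout. */
static unsigned ticks_good_only(const struct fake_write *w, unsigned timeout)
{
	unsigned now = 0;

	for (;;) {
		if (model_good(w, now))
			break;
		if (now > timeout)
			break;
		now++;
	}
	return now;	/* ticks spent polling */
}

/* Assumed alternative: also stop once the chip has settled on a wrong value. */
static unsigned ticks_early_exit(const struct fake_write *w, unsigned timeout)
{
	unsigned now = 0;

	for (;;) {
		if (model_good(w, now))
			break;
		if (model_ready(w, now))
			break;	/* settled but wrong: waiting longer changes nothing */
		if (now > timeout)
			break;
		now++;
	}
	return now;
}
```

For a write that settles on wrong data at tick 1 with a timeout of 1000 ticks, the first loop polls for 1001 ticks and the second for 1. The early exit is only safe if a settled but wrong value can never turn good later.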
Tokunori Ikegami Feb. 5, 2019, 11:03 p.m. UTC | #4
The patch looks good to me.
Regarding the performance issue, that behavior seems to be expected of this do_write_buffer() function.
If this patch is applied, I will rebase my patches on top of it.

Reviewed-by: Tokunori Ikegami <ikegami_to@yahoo.co.jp>


Boris Brezillon Feb. 7, 2019, 8:56 a.m. UTC | #5
Hi Sobon,

On Tue, 5 Feb 2019 22:28:44 +0000
"Sobon, Przemyslaw" <psobon@amazon.com> wrote:

> One comment to this patch. If value is written incorrectly quickly we will be
> stuck in the loop even though nothing is going to change. For example a value was
> written incorrectly after 1us, the loop was set to 1ms, function will return
> after 1ms, this solution is not optimized for performance. I considered same
> when working on this change and decided to do it different way.

Seems like you're right if we assume that checking for GOOD state does
not require a delay after the READY check, but if that's not the case
and an extra delay is actually required, you might end up with a BAD
status while it could have turned GOOD at some point with the 'check
only for GOOD state until we timeout' approach.

TBH, I don't know how CFI flashes work, so I'll let you guys sort this
out.

Regards,

Boris
Tokunori Ikegami Feb. 7, 2019, 10:59 p.m. UTC | #6
Hi Przemek-san,

Could you please explain in detail the case where the value is written incorrectly?
I think the value is always written correctly unless there is a bug.

Regards,
Ikegami

Przemyslaw Sobon Feb. 7, 2019, 11:50 p.m. UTC | #7
Hi Ikegami,

I have seen a case myself where a value was written and the chip changed
state to "ready", but when I read the value back it was incorrect.
This can happen as a result of an intermittent issue with the flash. It is
hard to hit this scenario when testing on a limited number of devices,
but with a large enough population you can see it. Another situation
is when a flash chip reaches its maximum number of writes. For
example, a chip is designed for 100k writes to a page. Once you
reach that number of writes you can have invalid data written to
flash, but the chip itself reports that everything was good and switches to
the "ready" state.

Hope this explanation is clear. Please let me know.

Regards,
Przemek
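
The difference between "ready" and "good" can be sketched with a toy model. It assumes, as I understand the driver at the time, that both checks read the location twice, that chip_ready() only requires the two reads to match, and that chip_good() also compares them with the expected datum. `struct fake_cell`, `cell_read()` and the `model_*` functions are hypothetical names, and the single toggling DQ6 bit is a simplification of real status polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DQ6 0x0040u	/* toggle bit position used by this toy model */

/* Hypothetical flash cell: returns toggling status while busy, data afterwards. */
struct fake_cell {
	uint16_t stored;	/* what actually ended up in the cell */
	unsigned busy_reads;	/* reads that still see the operation in flight */
	uint16_t toggle;	/* current toggling status word */
};

static uint16_t cell_read(struct fake_cell *c)
{
	assert(c);	/* caller must pass a valid cell */
	if (c->busy_reads > 0) {
		c->busy_reads--;
		c->toggle = (uint16_t)(c->toggle ^ DQ6);
		return c->toggle;
	}
	return c->stored;
}

/* "Ready": two consecutive reads agree, so nothing is toggling any more. */
static bool model_chip_ready(struct fake_cell *c)
{
	uint16_t first = cell_read(c);
	uint16_t second = cell_read(c);

	return first == second;
}

/* "Good": the reads agree and also match the value that was written. */
static bool model_chip_good(struct fake_cell *c, uint16_t expected)
{
	uint16_t first = cell_read(c);
	uint16_t second = cell_read(c);

	return first == second && second == expected;
}
```

A worn cell that settles on 0x12F4 when 0x1234 was written is "ready" under this model but never "good", which is the combination the patch has to handle.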

Joakim Tjernlund Feb. 8, 2019, 8:45 a.m. UTC | #8
On Thu, 2019-02-07 at 23:50 +0000, Sobon, Przemyslaw wrote:
> Hi Ikegami,
> 
> I have seen a case myself where a value was written, chip changed
> state to "ready" but when I was reading the value was incorrect.
> This can happen as result of intermittent issue with flash. It is
> hard to fall into scenario when testing on limited number of devices
> but with large enough population you can see that. Another situation
> is when a flash chip reaches its maximum number of writes. So for
> example a chip is designed for 100k writes to a page. Once you
> reach that number of writes you can have invalid data written to
> flash but chip itself reports everything was good and switches to
> "ready" state.

This makes perfect sense, but the AMD flash control I/F does not. You will
find that trying to do advanced things with "toggle" bits is very hard,
especially when you also need to scale it to interleaved flashes.

I think the odd delay when flash fails is quite OK. If you want to
fix this you need to move to the other control I/F (which mimics what Intel has).

 Jocke
Tokunori Ikegami Feb. 8, 2019, 2:23 p.m. UTC | #9
Hi Przemek-san,

Thank you so much for your explanation.

> I have seen a case myself where a value was written, chip changed
> state to "ready" but when I was reading the value was incorrect.

I also know of similar issues for both buffer write and word write.
In both cases the write error behavior could be reproduced.
  Note: The word write issue can still be reproduced now.

Those were resolved by using chip_good() to check the state instead.

> This can happen as result of intermittent issue with flash. It is
> hard to fall into scenario when testing on limited number of devices
> but with large enough population you can see that.

If possible, I would also like to know the details of the issue and its cause.

> Another situation
> is when a flash chip reaches its maximum number of writes. So for
> example a chip is designed for 100k writes to a page. Once you
> reach that number of writes you can have invalid data written to
> flash but chip itself reports everything was good and switches to
> "ready" state.

Yes I see.

Regards,
Ikegami

Liu Jian Feb. 14, 2019, 1:34 a.m. UTC | #10
Best Regards,
liujian


> -----Original Message-----
> From: Tokunori Ikegami [mailto:ikegami.t@gmail.com]
> Sent: Friday, February 08, 2019 10:24 PM
> To: 'Sobon, Przemyslaw' <psobon@amazon.com>; 'Boris Brezillon'
> <boris.brezillon@collabora.com>
> Cc: keescook@chromium.org; marek.vasut@gmail.com; richard@nod.at;
> linux-kernel@vger.kernel.org; joakim.tjernlund@infinera.com;
> linux-mtd@lists.infradead.org; computersforpeace@gmail.com;
> dwmw2@infradead.org; liujian (CE) <liujian56@huawei.com>;
> ikegami_to@yahoo.co.jp
> Subject: RE: Re: [PATCH] cfi: fix deadloop in cfi_cmdset_0002.c
> do_write_buffer
> 
> Hi Przemek-san,
> 
> Thank you so much for your explanation.
> 
> > I have seen a case myself where a value was written, chip changed
> > state to "ready" but when I was reading the value was incorrect.
> 
> I also know the similar issues for the both buffer and word write.
> Both issues were able to reproduce the write error behavior.
>   Note: The word write issue is able to reproduce now also.
> 
> Those were resolved by using chip_good() instead to check the state.
> 
> > This can happen as result of intermittent issue with flash. It is hard
> > to fall into scenario when testing on limited number of devices but
> > with large enough population you can see that.
> 
> If possible I would like to know the issue detail and its cause also.
> 
> > Another situation
> > is when a flash chip reaches its maximum number of writes. So for
> > example a chip is designed for 100k writes to a page. Once you reach
> > that number of writes you can have invalid data written to flash but
> > chip itself reports everything was good and switches to "ready" state.
> 
> Yes I see.
> 
> Regards,
> Ikegami
> 
> > -----Original Message-----
> > From: linux-mtd [mailto:linux-mtd-bounces@lists.infradead.org] On
> > Behalf Of Sobon, Przemyslaw
> > Sent: Friday, February 8, 2019 8:51 AM
> > To: ikegami_to@yahoo.co.jp; Boris Brezillon
> > Cc: keescook@chromium.org; marek.vasut@gmail.com;
> > ikegami@allied-telesis.co.jp; richard@nod.at;
> > linux-kernel@vger.kernel.org; joakim.tjernlund@infinera.com;
> > linux-mtd@lists.infradead.org; computersforpeace@gmail.com;
> > dwmw2@infradead.org; Liu Jian
> > Subject: RE: Re: [PATCH] cfi: fix deadloop in cfi_cmdset_0002.c
> > do_write_buffer
> >
> > Hi Ikegami,
> >
> > > I have seen a case myself where a value was written and the chip
> > > changed state to "ready", but when I read it back the value was
> > > incorrect. This can happen as a result of an intermittent issue with
> > > the flash. It is hard to hit this scenario when testing on a limited
> > > number of devices, but with a large enough population you can see it.
> > > Another situation is when a flash chip reaches its maximum number of
> > > writes. For example, a chip is designed for 100k writes to a page.
> > > Once you reach that number of writes, you can have invalid data
> > > written to flash while the chip itself reports that everything was
> > > good and switches to the "ready" state.
> >
> > Hope this explanation is clear. Please let me know.
> >
> > Regards,
> > Przemek
> >
> > > -----Original Message-----
> > > From: ikegami_to@yahoo.co.jp <ikegami_to@yahoo.co.jp>
> > > Sent: Thursday, February 7, 2019 3:00 PM
> > >
> > > Hi Przemek-san,
> > >
> > > > Could you please explain in detail the case where the value is
> > > > written incorrectly?
> > > > I think the value is always written correctly unless there is a bug.
> > >
> > > Regards,
> > > Ikegami
> > >
> > > --- boris.brezillon@collabora.com wrote --- :
> > > > Hi Sobon,
> > > >
> > > > On Tue, 5 Feb 2019 22:28:44 +0000
> > > > "Sobon, Przemyslaw" <psobon@amazon.com> wrote:
> > > >
> > > > > > From: Boris Brezillon <bbrezillon@kernel.org>
> > > > > > Sent: Sunday, February 3, 2019 12:35 AM
> > > > > > > +Przemyslaw
> > > > > > >
> > > > > > > On Fri, 1 Feb 2019 07:30:39 +0800 Liu Jian
> > > > > > > <liujian56@huawei.com> wrote:
> > > > > > >
> > > > > > > > > In function do_write_buffer(), in the for loop, there is a
> > > > > > > > > case chip_ready() returns 1 while chip_good() returns 0, so
> > > > > > > > > it never break the loop.
> > > > > > > > > To fix this, chip_good() is enough and it should timeout if
> > > > > > > > > it stay bad for a while.
> > > > > > >
> > > > > > > Looks like Przemyslaw reported and fixed the same problem.
> > > > > > >
> > > > > > > >
> > > > > > > > Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write
> > > > > > > > buffer to check correct value)
> > > > > > >
> > > > > > > > Can you put the Fixes tag on a single line? The format is
> > > > > > >
> > > > > > > Fixes: <hash> ("message")
> > > > > > >
> > > > > > > > Signed-off-by: Yi Huaijie <yihuaijie@huawei.com>
> > > > > > > > Signed-off-by: Liu Jian <liujian56@huawei.com>
> > > > > > >
> > > > > > > [1]http://patchwork.ozlabs.org/patch/1025566/
> > > > > > >

So, do I need to send a v2 patch? Or use Przemyslaw's new patch http://patchwork.ozlabs.org/patch/1038395/

> > > > > > > > ---
> > > > > > > >  drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++---
> > > > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > > b/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > > index 72428b6..818e94b 100644
> > > > > > > > --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > > +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > > @@ -1876,14 +1876,14 @@ static int __xipram
> > do_write_buffer(struct map_info *map, struct flchip *chip,
> > > > > > > >              continue;
> > > > > > > >          }
> > > > > > > >
> > > > > > > > > -        if (time_after(jiffies, timeo) && !chip_ready(map, adr))
> > > > > > > > -            break;
> > > > > > > > -
> > > > > > > >          if (chip_good(map, adr, datum)) {
> > > > > > > >              xip_enable(map, chip, adr);
> > > > > > > >              goto op_done;
> > > > > > > >          }
> > > > > > > >
> > > > > > > > +        if (time_after(jiffies, timeo))
> > > > > > > > +            break;
> > > > > > > > +
> > > > > > > > >          /* Latency issues. Drop the lock, wait a while and retry */
> > > > > > > >          UDELAY(map, chip, adr, 1);
> > > > > > > >      }
> > > > > > >
> > > > > >
> > > > > > BTW, the patch itself looks good to me. Ikegami, can you
> > > > > > confirm it does the right thing?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Boris
> > > > > >
> > > > >
> > > > > One comment on this patch. If a value is written incorrectly
> > > > > quickly, we will be stuck in the loop even though nothing is going
> > > > > to change. For example, if a value was written incorrectly after
> > > > > 1us and the loop timeout was set to 1ms, the function will return
> > > > > only after 1ms, so this solution is not optimized for performance.
> > > > > I considered the same when working on this change and decided to
> > > > > do it a different way.
> > > >
> > > > Seems like you're right if we assume that checking for GOOD state
> > > > does not require a delay after the READY check, but if that's not
> > > > the case and an extra delay is actually required, you might end up
> > > > with a BAD status while it could have turned GOOD at some point
> > > > with the 'check only for GOOD state until we timeout' approach.
> > > >
> > > > TBH, I don't know how CFI flashes work, so I'll let you guys sort
> > > > this out.
> > > >
> > > > Regards,
> > > >
> > > > Boris
> > > >
> > > > ______________________________________________________
> > > > Linux MTD discussion mailing list
> > > > http://lists.infradead.org/mailman/listinfo/linux-mtd/
> > > >
> > >
> > >

Patch

diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index 72428b6..818e94b 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -1876,14 +1876,14 @@  static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
 			continue;
 		}
 
-		if (time_after(jiffies, timeo) && !chip_ready(map, adr))
-			break;
-
 		if (chip_good(map, adr, datum)) {
 			xip_enable(map, chip, adr);
 			goto op_done;
 		}
 
+		if (time_after(jiffies, timeo))
+			break;
+
 		/* Latency issues. Drop the lock, wait a while and retry */
 		UDELAY(map, chip, adr, 1);
 	}
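
The loop behaviour discussed in this thread can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct sim_chip, sim_chip_ready(), sim_chip_good() and the three poll_*() functions are hypothetical names made up for the illustration, one loop iteration stands for the 1us UDELAY, and "now > timeout_us" stands for time_after(jiffies, timeo). poll_old() mirrors the removed condition, poll_patched() mirrors the ordering in the diff above, and poll_early() is a hypothetical variant that gives up as soon as the chip is ready but the data is wrong. It is not taken from Przemyslaw's patch and only shows the trade-off raised in the comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical chip model (assumption, not the kernel's struct flchip). */
struct sim_chip {
	unsigned busy_us;	/* time at which the chip stops being busy */
	bool data_ok;		/* does the data read back correctly once ready? */
};

enum poll_result { POLL_DONE, POLL_FAILED, POLL_STUCK };

static bool sim_chip_ready(const struct sim_chip *c, unsigned now)
{
	return now >= c->busy_us;
}

static bool sim_chip_good(const struct sim_chip *c, unsigned now)
{
	return sim_chip_ready(c, now) && c->data_ok;
}

/* Old logic: give up only if timed out AND the chip is still not ready. */
static enum poll_result poll_old(const struct sim_chip *c, unsigned timeout_us,
				 unsigned max_iter, unsigned *elapsed)
{
	assert(c && elapsed);
	for (unsigned now = 0; now < max_iter; now++) {
		*elapsed = now;
		if (now > timeout_us && !sim_chip_ready(c, now))
			return POLL_FAILED;
		if (sim_chip_good(c, now))
			return POLL_DONE;
	}
	*elapsed = max_iter;
	return POLL_STUCK;	/* the model's stand-in for the endless loop */
}

/* Patched logic: check for good data first, then the timeout alone. */
static enum poll_result poll_patched(const struct sim_chip *c, unsigned timeout_us,
				     unsigned max_iter, unsigned *elapsed)
{
	assert(c && elapsed);
	for (unsigned now = 0; now < max_iter; now++) {
		*elapsed = now;
		if (sim_chip_good(c, now))
			return POLL_DONE;
		if (now > timeout_us)
			return POLL_FAILED;
	}
	*elapsed = max_iter;
	return POLL_STUCK;
}

/* Hypothetical variant: fail as soon as the chip is ready but not good. */
static enum poll_result poll_early(const struct sim_chip *c, unsigned timeout_us,
				   unsigned max_iter, unsigned *elapsed)
{
	assert(c && elapsed);
	for (unsigned now = 0; now < max_iter; now++) {
		*elapsed = now;
		if (sim_chip_good(c, now))
			return POLL_DONE;
		if (sim_chip_ready(c, now) || now > timeout_us)
			return POLL_FAILED;
	}
	*elapsed = max_iter;
	return POLL_STUCK;
}
```

In this model, a chip that becomes ready after 1us with bad data never leaves poll_old(), leaves poll_patched() only once the 1ms timeout has passed, and leaves poll_early() right away. The early exit is only safe if a ready chip can never turn good later, which is the open question Boris raises above.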