diff mbox

[V3,2/2] mtd: brcmnand: Detect sticky ucorr ecc error on dma reads

Message ID 1465507075-9447-2-git-send-email-kdasu.kdev@gmail.com
State Accepted
Headers show

Commit Message

Kamal Dasu June 9, 2016, 9:17 p.m. UTC
This change provides a fix for controller bug where nand
controller could have a possible sticky error after a PIO
followed by a DMA read. The fix retries a read if we see
a uncorr_ecc after read to detect such sticky errors.
The fix applies to only controller version 7.0 and 7.1.

Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
---
V3 changes 
 none
V2 Changes 
 Restrict controller bug workaround to version 7.0 and 7.1
---
 drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Brian Norris July 11, 2016, 5:04 a.m. UTC | #1
Hi,

Looking through Boris's pull request, I noticed an issue:

On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
> This change provides a fix for controller bug where nand
> controller could have a possible sticky error after a PIO
> followed by a DMA read. The fix retries a read if we see
> a uncorr_ecc after read to detect such sticky errors.
> The fix applies to only controller version 7.0 and 7.1.
> 
> Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> ---
> V3 changes 
>  none
> V2 Changes 
>  Restrict controller bug workaround to version 7.0 and 7.1
> ---
>  drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> index 7ee9617..1156495 100644
> --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  	struct brcmnand_controller *ctrl = host->ctrl;
>  	u64 err_addr = 0;
>  	int err;
> +	static bool retry = true;

Is this intentionally static? That means your retry will only be
performed exactly once *ever*. Probably not what you expected?

Boris, if this is indeed unintended, would you rather fix this up in the
original patch and send me a new pull request? Or have my apply the
trivial fixup (i.e., s/static//) as a separate patch on top?

Brian

>  
>  	dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
>  
> +try_dmaread:
>  	brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
>  
>  	if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
> @@ -1626,6 +1628,22 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>  
>  	if (mtd_is_eccerr(err)) {
>  		/*
> +		 * On controller version and 7.0, 7.1 , DMA read after a
> +		 * prior PIO read that reported uncorrectable error,
> +		 * the DMA engine captures this error following DMA read
> +		 * cleared only on subsequent DMA read, so just retry once
> +		 * to clear a possible false error reported for current DMA
> +		 * read
> +		 */
> +		if ((ctrl->nand_version == 0x0700) ||
> +		    (ctrl->nand_version == 0x0701)) {
> +			if (retry) {
> +				retry = false;
> +				goto try_dmaread;
> +			}
> +		}
> +
> +		/*
>  		 * Controller version 7.2 has hw encoder to detect erased page
>  		 * bitflips, apply sw verification for older controllers only
>  		 */
> -- 
> 1.9.1
>
Boris Brezillon July 11, 2016, 6:49 a.m. UTC | #2
On Sun, 10 Jul 2016 22:04:52 -0700
Brian Norris <computersforpeace@gmail.com> wrote:

> Hi,
> 
> Looking through Boris's pull request, I noticed an issue:
> 
> On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
> > This change provides a fix for controller bug where nand
> > controller could have a possible sticky error after a PIO
> > followed by a DMA read. The fix retries a read if we see
> > a uncorr_ecc after read to detect such sticky errors.
> > The fix applies to only controller version 7.0 and 7.1.
> > 
> > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
> > ---
> > V3 changes 
> >  none
> > V2 Changes 
> >  Restrict controller bug workaround to version 7.0 and 7.1
> > ---
> >  drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
> > index 7ee9617..1156495 100644
> > --- a/drivers/mtd/nand/brcmnand/brcmnand.c
> > +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
> > @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
> >  	struct brcmnand_controller *ctrl = host->ctrl;
> >  	u64 err_addr = 0;
> >  	int err;
> > +	static bool retry = true;  
> 
> Is this intentionally static? That means your retry will only be
> performed exactly once *ever*. Probably not what you expected?
> 
> Boris, if this is indeed unintended, would you rather fix this up in the
> original patch and send me a new pull request?

I'll fix the initial commit and send a new PR once Kamal has confirmed
this variable should not be static (which is also my opinion).

> Or have my apply the
> trivial fixup (i.e., s/static//) as a separate patch on top?

Thanks,

Boris
Kamal Dasu July 11, 2016, 2:52 p.m. UTC | #3
It's ok to fix it up.

Kamal

On Mon, Jul 11, 2016 at 2:49 AM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> On Sun, 10 Jul 2016 22:04:52 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
>
>> Hi,
>>
>> Looking through Boris's pull request, I noticed an issue:
>>
>> On Thu, Jun 09, 2016 at 05:17:55PM -0400, Kamal Dasu wrote:
>> > This change provides a fix for controller bug where nand
>> > controller could have a possible sticky error after a PIO
>> > followed by a DMA read. The fix retries a read if we see
>> > a uncorr_ecc after read to detect such sticky errors.
>> > The fix applies to only controller version 7.0 and 7.1.
>> >
>> > Signed-off-by: Kamal Dasu <kdasu.kdev@gmail.com>
>> > ---
>> > V3 changes
>> >  none
>> > V2 Changes
>> >  Restrict controller bug workaround to version 7.0 and 7.1
>> > ---
>> >  drivers/mtd/nand/brcmnand/brcmnand.c | 18 ++++++++++++++++++
>> >  1 file changed, 18 insertions(+)
>> >
>> > diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
>> > index 7ee9617..1156495 100644
>> > --- a/drivers/mtd/nand/brcmnand/brcmnand.c
>> > +++ b/drivers/mtd/nand/brcmnand/brcmnand.c
>> > @@ -1602,9 +1602,11 @@ static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
>> >     struct brcmnand_controller *ctrl = host->ctrl;
>> >     u64 err_addr = 0;
>> >     int err;
>> > +   static bool retry = true;
>>
>> Is this intentionally static? That means your retry will only be
>> performed exactly once *ever*. Probably not what you expected?
>>
>> Boris, if this is indeed unintended, would you rather fix this up in the
>> original patch and send me a new pull request?
>
> I'll fix the initial commit and send a new PR once Kamal has confirmed
> this variable should not be static (which is also my opinion).
>
>> Or have my apply the
>> trivial fixup (i.e., s/static//) as a separate patch on top?
>
> Thanks,
>
> Boris
diff mbox

Patch

diff --git a/drivers/mtd/nand/brcmnand/brcmnand.c b/drivers/mtd/nand/brcmnand/brcmnand.c
index 7ee9617..1156495 100644
--- a/drivers/mtd/nand/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/brcmnand/brcmnand.c
@@ -1602,9 +1602,11 @@  static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 	struct brcmnand_controller *ctrl = host->ctrl;
 	u64 err_addr = 0;
 	int err;
+	static bool retry = true;
 
 	dev_dbg(ctrl->dev, "read %llx -> %p\n", (unsigned long long)addr, buf);
 
+try_dmaread:
 	brcmnand_write_reg(ctrl, BRCMNAND_UNCORR_COUNT, 0);
 
 	if (has_flash_dma(ctrl) && !oob && flash_dma_buf_ok(buf)) {
@@ -1626,6 +1628,22 @@  static int brcmnand_read(struct mtd_info *mtd, struct nand_chip *chip,
 
 	if (mtd_is_eccerr(err)) {
 		/*
+		 * On controller version and 7.0, 7.1 , DMA read after a
+		 * prior PIO read that reported uncorrectable error,
+		 * the DMA engine captures this error following DMA read
+		 * cleared only on subsequent DMA read, so just retry once
+		 * to clear a possible false error reported for current DMA
+		 * read
+		 */
+		if ((ctrl->nand_version == 0x0700) ||
+		    (ctrl->nand_version == 0x0701)) {
+			if (retry) {
+				retry = false;
+				goto try_dmaread;
+			}
+		}
+
+		/*
 		 * Controller version 7.2 has hw encoder to detect erased page
 		 * bitflips, apply sw verification for older controllers only
 		 */