[3/9] mtd: nand: qcom: erased page detection for uncorrectable errors only

Message ID: 1522845745-6624-4-git-send-email-absahu@codeaurora.org
State: Changes Requested
Series: Update for QCOM NAND driver

Commit Message

Abhishek Sahu April 4, 2018, 12:42 p.m. UTC
When reading a completely erased page, the NAND flash controller
first generates an ECC uncorrectable error. Currently the driver
also applies the erased page detection logic to other operational
errors. Fix this by running erased page detection only for
uncorrectable ECC errors and returning -EIO for other operational
errors.

Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
---
 drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Miquel Raynal April 10, 2018, 8:59 a.m. UTC | #1
Hi Abhishek,

On Wed,  4 Apr 2018 18:12:19 +0530, Abhishek Sahu
<absahu@codeaurora.org> wrote:

> The NAND flash controller generates ECC uncorrectable error
> first in case of completely erased page. Currently driver
> applies the erased page detection logic for other operation
> errors also so fix this and return EIO for other operational
> errors.

I am sorry, I don't fully understand the purpose of this patch. Could
you please explain it again?

Do you mean that you want to avoid raising ECC errors when you read
erased pages?

> 
> Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
> ---
>  drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> index 17321fc..57c16a6 100644
> --- a/drivers/mtd/nand/qcom_nandc.c
> +++ b/drivers/mtd/nand/qcom_nandc.c
> @@ -1578,6 +1578,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>  	struct nand_ecc_ctrl *ecc = &chip->ecc;
>  	unsigned int max_bitflips = 0;
>  	struct read_stats *buf;
> +	bool flash_op_err = false;
>  	int i;
>  
>  	buf = (struct read_stats *)nandc->reg_read_buf;
> @@ -1599,7 +1600,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>  		buffer = le32_to_cpu(buf->buffer);
>  		erased_cw = le32_to_cpu(buf->erased_cw);
>  
> -		if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> +		if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {

And later you have another "if (buffer & BS_UNCORRECTABLE_BIT)" which
is then redundant, unless that is not what you actually want to do?

Maybe you can add comments before the if ()/ else if () to explain in
which case you enter each branch.

>  			bool erased;
>  
>  			/* ignore erased codeword errors */
> @@ -1641,6 +1642,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>  						max_t(unsigned int, max_bitflips, ret);
>  				}
>  			}
> +		} else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> +			flash_op_err = true;
>  		} else {
>  			unsigned int stat;
>  
> @@ -1654,6 +1657,9 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>  			oob_buf += oob_len + ecc->bytes;
>  	}
>  
> +	if (flash_op_err)
> +		return -EIO;
> +

If you are propagating an error related to the controller, this is
fine, but I think you just want to raise the fact that a NAND
uncorrectable error occurred. In this case you should just increment
mtd->ecc_stats.failed and return 0 (returning max_bitflips here would be
fine too, as it would be 0 too).

>  	return max_bitflips;
>  }
>  

Thanks,
Miquèl
Abhishek Sahu April 12, 2018, 6:33 a.m. UTC | #2
On 2018-04-10 14:29, Miquel Raynal wrote:
> Hi Abhishek,
> 
> On Wed,  4 Apr 2018 18:12:19 +0530, Abhishek Sahu
> <absahu@codeaurora.org> wrote:
> 
>> The NAND flash controller generates ECC uncorrectable error
>> first in case of completely erased page. Currently driver
>> applies the erased page detection logic for other operation
>> errors also so fix this and return EIO for other operational
>> errors.
> 
> I am sorry I don't understand very well what is the purpose of this
> patch, could you please explain it again?
> 
> Do you mean that you want to avoid having rising ECC errors when you
> read erased pages?
> 
  Thanks Miquel for your review.

  The QCOM NAND flash controller has built-in erased page
  detection HW. The flow in the HW when the controller reads
  an erased page is as follows:

  1. The ECC engine first generates an ECC uncorrectable error,
     since it calculates the ECC over the all-0xff data and
     compares it with the ECC code in OOB (which is also all
     0xff), and the two do not match.
  2. After the ECC error, the erased CW detection HW checks
     whether all the bytes in the page are 0xff and updates the
     status in a separate register, NAND_ERASED_CW_DETECT_STATUS.

  So the erased CW detect status should be checked only if
  the ECC engine generated the uncorrectable error.

  Currently the erased CW detect register is also checked for
  all other operational errors (like TIMEOUT, MPU errors etc).
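
The branch selection described above can be sketched as a small standalone classifier. This is only an illustration, not the driver code: the bit positions are placeholders, and `enum cw_status` and `classify_codeword()` are hypothetical names introduced here. The real masks live in qcom_nandc.c and may differ.

```c
#include <stdint.h>

/* Placeholder bit positions, for illustration only (assumptions). */
#define FS_OP_ERR            (1u << 4)
#define FS_MPU_ERR           (1u << 8)
#define BS_UNCORRECTABLE_BIT (1u << 8)
#define ERASED_CW            (1u << 6)

/* Hypothetical classification of one codeword's read status. */
enum cw_status {
    CW_OK,         /* no error flagged */
    CW_ERASED,     /* ECC failed and HW reports the codeword as erased */
    CW_ECC_FAILED, /* ECC failed and the codeword is not reported erased */
    CW_OP_ERROR,   /* timeout, MPU error etc.: erased status is not consulted */
};

static enum cw_status classify_codeword(uint32_t flash, uint32_t buffer,
                                        uint32_t erased_cw)
{
    /* Only an uncorrectable ECC error makes the erased status meaningful. */
    if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
        if ((erased_cw & ERASED_CW) == ERASED_CW)
            return CW_ERASED;
        return CW_ECC_FAILED;
    }

    /* Any other operational error is reported without erased detection. */
    if (flash & (FS_OP_ERR | FS_MPU_ERR))
        return CW_OP_ERROR;

    return CW_OK;
}
```

Note that an operational error with a stale erased-CW status is classified as CW_OP_ERROR, which is the case the patch is meant to separate out.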

>> 
>> Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
>> ---
>>  drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
>> index 17321fc..57c16a6 100644
>> --- a/drivers/mtd/nand/qcom_nandc.c
>> +++ b/drivers/mtd/nand/qcom_nandc.c
>> @@ -1578,6 +1578,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>>  	struct nand_ecc_ctrl *ecc = &chip->ecc;
>>  	unsigned int max_bitflips = 0;
>>  	struct read_stats *buf;
>> +	bool flash_op_err = false;
>>  	int i;
>> 
>>  	buf = (struct read_stats *)nandc->reg_read_buf;
>> @@ -1599,7 +1600,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>>  		buffer = le32_to_cpu(buf->buffer);
>>  		erased_cw = le32_to_cpu(buf->erased_cw);
>> 
>> -		if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
>> +		if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
> 
> And later you have another "if (buffer & BS_UNCORRECTABLE_BIT)" which
> is then redundant, unless that is not what you actually want to do?

  Yes. That check seems to be redundant. I will fix that.

> 
> Maybe you can add comments before the if ()/ else if () to explain in
> which case you enter each branch.

  Sure. That would be better. Will add the same in next patch set.

> 
>>  			bool erased;
>> 
>>  			/* ignore erased codeword errors */
>> @@ -1641,6 +1642,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>>  						max_t(unsigned int, max_bitflips, ret);
>>  				}
>>  			}
>> +		} else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
>> +			flash_op_err = true;
>>  		} else {
>>  			unsigned int stat;
>> 
>> @@ -1654,6 +1657,9 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>>  			oob_buf += oob_len + ecc->bytes;
>>  	}
>> 
>> +	if (flash_op_err)
>> +		return -EIO;
>> +
> 
> In you are propagating an error related to the controller, this is
> fine, but I think you just want to raise the fact that a NAND
> uncorrectable error occurred, in this case you should just increment
> mtd->ecc_stats.failed and return 0 (returning max_bitflips here would be
> fine too has it would be 0 too).

   The flash_op_err flag is set only for other operational errors (like
   timeout, MPU error, device failure etc). For ECC errors, the existing
   code already increments mtd->ecc_stats.failed:

   ret = nand_check_erased_ecc_chunk(data_buf,
                           data_len, eccbuf, ecclen, oob_buf,
                           extraooblen, ecc->strength);
   if (ret < 0) {
           mtd->ecc_stats.failed++;
   } else {
           mtd->ecc_stats.corrected += ret;

  Thanks,
  Abhishek
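
The return-value split discussed above (ECC failures go into the statistics, operational errors become -EIO) can be sketched in isolation as follows. This is a simplified model under stated assumptions: `struct cw_result`, `struct ecc_stats` and `summarize_read()` are hypothetical stand-ins introduced for illustration, not structures or functions from the driver or the MTD core.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-codeword outcome; not a type from the driver. */
enum cw_kind { CW_CLEAN, CW_ECC_FAILED, CW_OP_ERROR };

struct cw_result {
    enum cw_kind kind;
    unsigned int bitflips; /* corrected bitflips, meaningful for CW_CLEAN */
};

/* Minimal stand-in for the two mtd->ecc_stats counters mentioned above. */
struct ecc_stats {
    unsigned int corrected;
    unsigned int failed;
};

/*
 * Walk all codewords of a page. ECC outcomes only update the statistics;
 * an operational error on any codeword makes the whole read return -EIO.
 */
static int summarize_read(const struct cw_result *cw, size_t n,
                          struct ecc_stats *stats)
{
    unsigned int max_bitflips = 0;
    int op_err = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (cw[i].kind) {
        case CW_OP_ERROR:
            op_err = 1;
            break;
        case CW_ECC_FAILED:
            stats->failed++;
            break;
        case CW_CLEAN:
            stats->corrected += cw[i].bitflips;
            if (cw[i].bitflips > max_bitflips)
                max_bitflips = cw[i].bitflips;
            break;
        }
    }

    if (op_err)
        return -EIO;

    return (int)max_bitflips;
}
```

As in the patch, the error flag is only evaluated after the loop, so the remaining codewords are still processed before -EIO is returned.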
Miquel Raynal April 12, 2018, 6:49 a.m. UTC | #3
Hi Abhishek,

On Thu, 12 Apr 2018 12:03:58 +0530, Abhishek Sahu
<absahu@codeaurora.org> wrote:

> On 2018-04-10 14:29, Miquel Raynal wrote:
> > Hi Abhishek,
> >
> > On Wed,  4 Apr 2018 18:12:19 +0530, Abhishek Sahu
> > <absahu@codeaurora.org> wrote:
> >
> >> The NAND flash controller generates ECC uncorrectable error
> >> first in case of completely erased page. Currently driver
> >> applies the erased page detection logic for other operation
> >> errors also so fix this and return EIO for other operational
> >> errors.
> >
> > I am sorry I don't understand very well what is the purpose of this
> > patch, could you please explain it again?
> >
> > Do you mean that you want to avoid having rising ECC errors when you
> > read erased pages?
> >
>   Thanks Miquel for your review.
> 
>   QCOM NAND flash controller has in built erased page
>   detection HW.
>   Following is the flow in the HW if controller tries
>   to read erased page
> 
>   1. First ECC uncorrectable error will be generated from
>      ECC engine since ECC engine first calculates the ECC with
>      all 0xff and match the calculated ECC with ECC code in OOB
>      (which is again all 0xff).
>   2. After getting ECC error, erased CW detection HW checks if
>      all the bytes in page are 0xff and then it updates the
>      status in separate register NAND_ERASED_CW_DETECT_STATUS
> 
>   So the erased CW detect status should be checked only if
>   ECC engine generated the uncorrectable error.
> 
>   Currently for all other operational errors also (like TIMEOUT,
>   MPU errors etc), the erased CW detect register was being
>   checked.

This is very clear, thanks. I don't know this controller very well, so
I think you should add this information to the commit message for
future reference.

> 
> >>
> >> Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
> >> ---
> >>  drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> >> index 17321fc..57c16a6 100644
> >> --- a/drivers/mtd/nand/qcom_nandc.c
> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >> @@ -1578,6 +1578,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> >>  	struct nand_ecc_ctrl *ecc = &chip->ecc;
> >>  	unsigned int max_bitflips = 0;
> >>  	struct read_stats *buf;
> >> +	bool flash_op_err = false;
> >>  	int i;
> >>
> >>  	buf = (struct read_stats *)nandc->reg_read_buf;
> >> @@ -1599,7 +1600,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> >>  		buffer = le32_to_cpu(buf->buffer);
> >>  		erased_cw = le32_to_cpu(buf->erased_cw);
> >>
> >> -		if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> >> +		if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
> >
> > And later you have another "if (buffer & BS_UNCORRECTABLE_BIT)" which
> > is then redundant, unless that is not what you actually want to do?
>
>   Yes. That check seems to be redundant. I will fix that.
>
> >
> > Maybe you can add comments before the if ()/ else if () to explain in
> > which case you enter each branch.
>
>   Sure. That would be better. Will add the same in next patch set.
>
> >
> >>  			bool erased;
> >>
> >>  			/* ignore erased codeword errors */
> >> @@ -1641,6 +1642,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> >>  						max_t(unsigned int, max_bitflips, ret);
> >>  				}
> >>  			}
> >> +		} else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> >> +			flash_op_err = true;
> >>  		} else {
> >>  			unsigned int stat;
> >>
> >> @@ -1654,6 +1657,9 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> >>  			oob_buf += oob_len + ecc->bytes;
> >>  	}
> >>
> >> +	if (flash_op_err)
> >> +		return -EIO;
> >> +
> >
> > In you are propagating an error related to the controller, this is
> > fine, but I think you just want to raise the fact that a NAND
> > uncorrectable error occurred, in this case you should just increment
> > mtd->ecc_stats.failed and return 0 (returning max_bitflips here would be
> > fine too has it would be 0 too).
> 
>    The flash_op_err will be for other operational errors only (like timeout,
>    MPU error, device failure etc). For correctable errors,
> 
>    ret = nand_check_erased_ecc_chunk(data_buf,
>                            data_len, eccbuf, ecclen, oob_buf,
>                            extraooblen, ecc->strength);

Why do you need nand_check_erased_ecc_chunk() if the blank page check
is done in hw?

Thanks,
Miquèl

>                    if (ret < 0) {
>                            mtd->ecc_stats.failed++;
>                    } else {
>                            mtd->ecc_stats.corrected += ret;
> 
>   Already, it is incrementing mtd->ecc_stats.failed
> 
>   Thanks,
>   Abhishek
Abhishek Sahu April 12, 2018, 6:58 a.m. UTC | #4
On 2018-04-12 12:19, Miquel Raynal wrote:
> Hi Abhishek,
> 
> On Thu, 12 Apr 2018 12:03:58 +0530, Abhishek Sahu
> <absahu@codeaurora.org> wrote:
> 
>> On 2018-04-10 14:29, Miquel Raynal wrote:
>> > Hi Abhishek,
>> >
>> > On Wed,  4 Apr 2018 18:12:19 +0530, Abhishek Sahu
>> > <absahu@codeaurora.org> wrote:
>> >
>> >> The NAND flash controller generates ECC uncorrectable error
>> >> first in case of completely erased page. Currently driver
>> >> applies the erased page detection logic for other operation
>> >> errors also so fix this and return EIO for other operational
>> >> errors.
>> >
>> > I am sorry I don't understand very well what is the purpose of this
>> > patch, could you please explain it again?
>> >
>> > Do you mean that you want to avoid having rising ECC errors when you
>> > read erased pages?
>> >
>>   Thanks Miquel for your review.
>> 
>>   QCOM NAND flash controller has in built erased page
>>   detection HW.
>>   Following is the flow in the HW if controller tries
>>   to read erased page
>> 
>>   1. First ECC uncorrectable error will be generated from
>>      ECC engine since ECC engine first calculates the ECC with
>>      all 0xff and match the calculated ECC with ECC code in OOB
>>      (which is again all 0xff).
>>   2. After getting ECC error, erased CW detection HW checks if
>>      all the bytes in page are 0xff and then it updates the
>>      status in separate register NAND_ERASED_CW_DETECT_STATUS
>> 
>>   So the erased CW detect status should be checked only if
>>   ECC engine generated the uncorrectable error.
>> 
>>   Currently for all other operational errors also (like TIMEOUT,
>>   MPU errors etc), the erased CW detect register was being
>>   checked.
> 
> This is very clear, thanks. I don't know very much this controller so I
> think you can add this information in the commit message for future
> reference.
> 

  Sure Miquel.
  I will update the commit message to include more detail.

>> 
>> >>
>> >> Signed-off-by: Abhishek Sahu <absahu@codeaurora.org>
>> >> ---
>> >>  drivers/mtd/nand/qcom_nandc.c | 8 +++++++-
>> >>  1 file changed, 7 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
>> >> index 17321fc..57c16a6 100644
>> >> --- a/drivers/mtd/nand/qcom_nandc.c
>> >> +++ b/drivers/mtd/nand/qcom_nandc.c
>> >> @@ -1578,6 +1578,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>> >>  	struct nand_ecc_ctrl *ecc = &chip->ecc;
>> >>  	unsigned int max_bitflips = 0;
>> >>  	struct read_stats *buf;
>> >> +	bool flash_op_err = false;
>> >>  	int i;
>> >>
>> >>  	buf = (struct read_stats *)nandc->reg_read_buf;
>> >> @@ -1599,7 +1600,7 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>> >>  		buffer = le32_to_cpu(buf->buffer);
>> >>  		erased_cw = le32_to_cpu(buf->erased_cw);
>> >>
>> >> -		if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
>> >> +		if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
>> >
>> > And later you have another "if (buffer & BS_UNCORRECTABLE_BIT)" which
>> > is then redundant, unless that is not what you actually want to do?
>>
>>   Yes. That check seems to be redundant. I will fix that.
>>
>> >
>> > Maybe you can add comments before the if ()/ else if () to explain in
>> > which case you enter each branch.
>>
>>   Sure. That would be better. Will add the same in next patch set.
>>
>> >
>> >>  			bool erased;
>> >>
>> >>  			/* ignore erased codeword errors */
>> >> @@ -1641,6 +1642,8 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>> >>  						max_t(unsigned int, max_bitflips, ret);
>> >>  				}
>> >>  			}
>> >> +		} else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
>> >> +			flash_op_err = true;
>> >>  		} else {
>> >>  			unsigned int stat;
>> >>
>> >> @@ -1654,6 +1657,9 @@ static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
>> >>  			oob_buf += oob_len + ecc->bytes;
>> >>  	}
>> >>
>> >> +	if (flash_op_err)
>> >> +		return -EIO;
>> >> +
>> >
>> > In you are propagating an error related to the controller, this is
>> > fine, but I think you just want to raise the fact that a NAND
>> > uncorrectable error occurred, in this case you should just increment
>> > mtd->ecc_stats.failed and return 0 (returning max_bitflips here would be
>> > fine too has it would be 0 too).
>>
>>    The flash_op_err will be for other operational errors only (like timeout,
>>    MPU error, device failure etc). For correctable errors,
>> 
>>    ret = nand_check_erased_ecc_chunk(data_buf,
>>                            data_len, eccbuf, ecclen, oob_buf,
>>                            extraooblen, ecc->strength);
> 
> Why do you need nand_check_erased_ecc_chunk() if the blank page check
> is done in hw?
> 

  HW blank page detection is only available with the BCH algorithm.
  IPQ806x uses an RS code for 4-bit ECC, which does not have HW blank
  page detection.

  You can find more detail in the function comment of
  erased_chunk_check_and_fixup():

   /*
    * when using BCH ECC, the HW flags an error in NAND_FLASH_STATUS if it read
    * an erased CW, and reports an erased CW in NAND_ERASED_CW_DETECT_STATUS.
    *
    * when using RS ECC, the HW reports the same errors when reading an erased CW,
    * but it notifies that it is an erased CW by placing special characters at
    * certain offsets in the buffer.
    *
    * verify if the page is erased or not, and fix up the page for RS ECC by
    * replacing the special characters with 0xff.
    */
  static bool erased_chunk_check_and_fixup(u8 *data_buf, int data_len)
  {

  Thanks,
  Abhishek
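
The RS fixup described in the function comment above can be sketched in userspace C as follows. This is an illustration only, not the driver's implementation: the thread does not state the marker byte or its offsets, so `ERASED_MARKER`, `MARKER_OFFSET_A` and `MARKER_OFFSET_B` are assumed values, and `rs_erased_check_and_fixup()` and `all_ff()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the thread does not specify them. */
#define ERASED_MARKER   0x54
#define MARKER_OFFSET_A 3
#define MARKER_OFFSET_B 175

/* Return true if every byte in the buffer is 0xff. */
static bool all_ff(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0xff)
            return false;
    return true;
}

/*
 * Check whether a codeword read with RS ECC is erased. If a marker byte
 * is found, it is tentatively replaced with 0xff; if the chunk then turns
 * out not to be all 0xff, the original bytes are restored.
 */
static bool rs_erased_check_and_fixup(uint8_t *data_buf, size_t data_len)
{
    uint8_t saved_a, saved_b;

    assert(data_buf != NULL && data_len > MARKER_OFFSET_B);

    saved_a = data_buf[MARKER_OFFSET_A];
    saved_b = data_buf[MARKER_OFFSET_B];

    if ((saved_a == ERASED_MARKER && saved_b == 0xff) ||
        (saved_a == 0xff && saved_b == ERASED_MARKER)) {
        data_buf[MARKER_OFFSET_A] = 0xff;
        data_buf[MARKER_OFFSET_B] = 0xff;
    }

    if (!all_ff(data_buf, data_len)) {
        /* Not an erased chunk after all: undo the tentative fixup. */
        data_buf[MARKER_OFFSET_A] = saved_a;
        data_buf[MARKER_OFFSET_B] = saved_b;
        return false;
    }

    return true;
}
```

Restoring the saved bytes on failure keeps the buffer untouched for real data, so a later software check such as nand_check_erased_ecc_chunk() still sees what was actually read.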
Patch

diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
index 17321fc..57c16a6 100644
--- a/drivers/mtd/nand/qcom_nandc.c
+++ b/drivers/mtd/nand/qcom_nandc.c
@@ -1578,6 +1578,7 @@  static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
 	struct nand_ecc_ctrl *ecc = &chip->ecc;
 	unsigned int max_bitflips = 0;
 	struct read_stats *buf;
+	bool flash_op_err = false;
 	int i;
 
 	buf = (struct read_stats *)nandc->reg_read_buf;
@@ -1599,7 +1600,7 @@  static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
 		buffer = le32_to_cpu(buf->buffer);
 		erased_cw = le32_to_cpu(buf->erased_cw);
 
-		if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
+		if ((flash & FS_OP_ERR) && (buffer & BS_UNCORRECTABLE_BIT)) {
 			bool erased;
 
 			/* ignore erased codeword errors */
@@ -1641,6 +1642,8 @@  static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
 						max_t(unsigned int, max_bitflips, ret);
 				}
 			}
+		} else if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
+			flash_op_err = true;
 		} else {
 			unsigned int stat;
 
@@ -1654,6 +1657,9 @@  static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
 			oob_buf += oob_len + ecc->bytes;
 	}
 
+	if (flash_op_err)
+		return -EIO;
+
 	return max_bitflips;
 }