Patchwork mtd: nand: ignore ecc errors during bbt reads

login
register
mail settings
Submitter Mike Dunn
Date June 7, 2012, 6:32 p.m.
Message ID <1339093928-17545-1-git-send-email-mikedunn@newsguy.com>
Download mbox | patch
Permalink /patch/163657/
State New
Headers show

Comments

Mike Dunn - June 7, 2012, 6:32 p.m.
Ignore ecc errors in the scan_read_raw_oob() function.  Also removed code that
is now redundant.

Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
---
 drivers/mtd/nand/nand_bbt.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)
Shmulik Ladkani - June 10, 2012, 11:50 a.m.
Hi Mike,

On Thu,  7 Jun 2012 11:32:08 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> Ignore ecc errors in the scan_read_raw_oob() function.  Also removed code that
> is now redundant.
> 
> Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
> ---
>  drivers/mtd/nand/nand_bbt.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
> index 30d1319..585da44 100644
> --- a/drivers/mtd/nand/nand_bbt.c
> +++ b/drivers/mtd/nand/nand_bbt.c
> @@ -319,7 +319,8 @@ static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
>  
>  		res = mtd_read_oob(mtd, offs, &ops);
>  
> -		if (res)
> +		/* Ignore ECC errors when checking for BBM */
> +		if (res && !mtd_is_bitflip_or_eccerr(res))
>  			return res;

IMO this is not necessary.
Note the 'ops.mode' is initialized to MTD_OPS_RAW;  meaning, no ECC is
performed during read ('ecc.read_page_raw' will be invoked).
Thus, EUCLEAN/EBADMSG should not be reported.
Am I missing something here?

> @@ -406,8 +407,7 @@ static int scan_block_full(struct mtd_info *mtd, struct nand_bbt_descr *bd,
>  	int ret, j;
>  
>  	ret = scan_read_raw_oob(mtd, buf, offs, readlen);
> -	/* Ignore ECC errors when checking for BBM */
> -	if (ret && !mtd_is_bitflip_or_eccerr(ret))
> +	if (ret)
>  		return ret;

Don't understand why 'mtd_is_bitflip_or_eccerr' is originally tested.
As noted above, 'scan_read_raw_oob' won't return EUCLEAN/EBADMSG, as it
uses MTD_OPS_RAW.

And what's even stranger is the code in 'scan_block_fast()', quote:

	ops.datbuf = NULL;
	ops.mode = MTD_OPS_PLACE_OOB;

	for (j = 0; j < len; j++) {
		ret = mtd_read_oob(mtd, offs, &ops);
		/* Ignore ECC errors when checking for BBM */
		if (ret && !mtd_is_bitflip_or_eccerr(ret))
			return ret;

Here, 'MTD_OPS_PLACE_OOB' is explicitly used - so *theoretically*
'mtd_read_oob' might return EUCLEAN/EBADMSG, see 'nand_do_read_oob',
(although NO driver YET reports maxbitflips from its 'chip->ecc.read_oob'
method).

Why not use 'MTD_OPS_RAW' in 'scan_block_fast()' as well (as done in
'scan_block_full')?

Regards,
Shmulik
Brian Norris - June 11, 2012, 5:45 a.m.
Hi,

Friendly comment: it would be helpful to include me on these types of
threads, since I am responsible for much of the code in discussion
here. (Try 'git blame'). There are some things I apparently got wrong
in the first place and some new issues since the bitflips_threshold
introduction. Also, there's some incomplete information, since (as
noted below) some things are not used in-tree, but they are in use in
my out-of-tree driver for which I submitted the original code. I
believe that the changes are generally useful, though, if understood
and utilized properly.

On Sun, Jun 10, 2012 at 4:50 AM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> On Thu,  7 Jun 2012 11:32:08 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
>> Ignore ecc errors in the scan_read_raw_oob() function.  Also removed code that
>> is now redundant.
>>
>> Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
>> ---
>>  drivers/mtd/nand/nand_bbt.c |    6 +++---
>>  1 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
>> index 30d1319..585da44 100644
>> --- a/drivers/mtd/nand/nand_bbt.c
>> +++ b/drivers/mtd/nand/nand_bbt.c
>> @@ -319,7 +319,8 @@ static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
>>
>>               res = mtd_read_oob(mtd, offs, &ops);
>>
>> -             if (res)
>> +             /* Ignore ECC errors when checking for BBM */
>> +             if (res && !mtd_is_bitflip_or_eccerr(res))
>>                       return res;
>
> IMO this is not necessary.
> Note the 'ops.mode' is initialized to MTD_OPS_RAW;  meaning, no ECC is
> performed during read ('ecc.read_page_raw' will be invoked).
> Thus, EUCLEAN/EBADMSG should not be reported.
> Am I missing something here?

Shmulik is correct. A check for bitflips on a MTD_OPS_RAW operation is
unnecessary.

>> @@ -406,8 +407,7 @@ static int scan_block_full(struct mtd_info *mtd, struct nand_bbt_descr *bd,
>>       int ret, j;
>>
>>       ret = scan_read_raw_oob(mtd, buf, offs, readlen);
>> -     /* Ignore ECC errors when checking for BBM */
>> -     if (ret && !mtd_is_bitflip_or_eccerr(ret))
>> +     if (ret)
>>               return ret;
>
> Don't understand why 'mtd_is_bitflip_or_eccerr' is originally tested.
> As noted above, 'scan_read_raw_oob' won't return EUCLEAN/EBADMSG, as it
> uses MTD_OPS_RAW.

FWIW, I originally was *trying* to mirror functionality in
scan_block_full() and scan_block_fast(), but my code was wrong
apparently. See below.

> And what's even stranger is the code in 'scan_block_fast()', quote:
>
>        ops.datbuf = NULL;
>        ops.mode = MTD_OPS_PLACE_OOB;
>
>        for (j = 0; j < len; j++) {
>                ret = mtd_read_oob(mtd, offs, &ops);
>                /* Ignore ECC errors when checking for BBM */
>                if (ret && !mtd_is_bitflip_or_eccerr(ret))
>                        return ret;
>
> Here, 'MTD_OPS_PLACE_OOB' is explicitly used - so *theoretically*
> 'mtd_read_oob' might return EUCLEAN/EBADMSG, see 'nand_do_read_oob',
> (although NO driver YET reports maxbitflips from its 'chip->ecc.read_oob'
> method).

I think we discussed the "read_oob" case and max_bitflips previously,
where I noted that my out-of-tree driver uses ECC that covers OOB and
page data and sets ecc_stats.corrected and ecc_stats.failed according
to the entire page+OOB when ecc.read_oob is called. So, this call to
mtd_read_oob() may return EUCLEAN/EBADMSG when used my driver. This
isn't a problem and should be left alone I think.

> Why not use 'MTD_OPS_RAW' in 'scan_block_fast()' as well (as done in
> 'scan_block_full')?

Because when ECC is available on OOB, it is good to utilize it, so
that bitflips don't cause good blocks to be marked bad. Less
importantly, when bad block markers are written by Linux (with ECC),
then these markers can be read back more reliably. (I note that there
is a nand_chip.badblockbits option that could theoretically alleviate
the first issue, but I see it is not actually implemented throughout
nand_bbt.c - only in nand_base.c)

IMO, 'scan_block_full' contains the erroneous code for now. When I
changed scan_block_fast(), I half-heartedly changed scan_block_full()
as well, but it does not appear on my code path, so I didn't notice
the discrepancy.

This brings up a side issue: I think many of the "_raw" function names
in nand_bbt.c are misleading. They do not all use MTD_OPS_RAW (and
shouldn't). Effectively, I would prefer that *all* the calls in
nand_bbt.c use the non-RAW version of the MTD/NAND interfaces, and
then ignore the errors if sensible. e.g., when reading a bad block
marker. But nand_bbt.c does a lot more (reading BBTs from flash,
checking for "Bbt0" and "1tbB" table marker, etc.) that *must* use ECC
to provide any robustness.

So, I can send a patch that straightens out naming and brings
scan_block_fast() and scan_block_full() into alignment on using
MTD_OPS_PLACE_OOB. Then I think that we should continue using this
code in both:
                /* Ignore ECC errors when checking for BBM */
                if (ret && !mtd_is_bitflip_or_eccerr(ret))
And we can (as agreed previously?) avoid using/checking max_bitflips
in mtd_read_oob(), although there is the datbuf+oob case that calls
nand_do_read_ops...

Brian
Shmulik Ladkani - June 11, 2012, 6:41 a.m.
Thanks Brian for your comments,

On Sun, 10 Jun 2012 22:45:50 -0700 Brian Norris <computersforpeace@gmail.com> wrote:
> Friendly comment: it would be helpful to include me on these types of
> threads, since I am responsible for much of the code in discussion
> here. (Try 'git blame').

Sure, will do.
I usually do a 'git blame', for some reason was careless this time...

> > Here, 'MTD_OPS_PLACE_OOB' is explicitly used - so *theoretically*
> > 'mtd_read_oob' might return EUCLEAN/EBADMSG, see 'nand_do_read_oob',
> > (although NO driver YET reports maxbitflips from its 'chip->ecc.read_oob'
> > method).
> 
> I think we discussed the "read_oob" case and max_bitflips previously,
> where I noted that my out-of-tree driver uses ECC that covers OOB and
> page data and sets ecc_stats.corrected and ecc_stats.failed according
> to the entire page+OOB when ecc.read_oob is called. So, this call to
> mtd_read_oob() may return EUCLEAN/EBADMSG when used my driver. This
> isn't a problem and should be left alone I think.

Bad phrasing on my side, should have said "although NO in-tree
driver".

> > Why not use 'MTD_OPS_RAW' in 'scan_block_fast()' as well (as done in
> > 'scan_block_full')?
> 
> Because when ECC is available on OOB, it is good to utilize it, so
> that bitflips don't cause good blocks to be marked bad. Less
> importantly, when bad block markers are written by Linux (with ECC),
> then these markers can be read back more reliably.

Ok (due to the second reason, as we're discussing the BBM read
implementation).

> So, I can send a patch that straightens out naming and brings
> scan_block_fast() and scan_block_full() into alignment on using
> MTD_OPS_PLACE_OOB. Then I think that we should continue using this
> code in both:
>                 /* Ignore ECC errors when checking for BBM */
>                 if (ret && !mtd_is_bitflip_or_eccerr(ret))

Sounds reasonable.

> And we can (as agreed previously?) avoid using/checking max_bitflips
> in mtd_read_oob(), although there is the datbuf+oob case that calls
> nand_do_read_ops...

This is a separate thing that needs to be addressed.
See my findings in [1].

Regards,
Shmulik

[1] http://lists.infradead.org/pipermail/linux-mtd/2012-June/042194.html
Brian Norris - June 11, 2012, 6:53 a.m.
On 06/10/2012 11:41 PM, Shmulik Ladkani wrote:
> Thanks Brian for your comments,

You're welcome. And thanks for the interest in reviewing the code. Few 
people actually delve into nand_bbt.c, it seems :)

> On Sun, 10 Jun 2012 22:45:50 -0700 Brian Norris<computersforpeace@gmail.com>  wrote:
>>> Why not use 'MTD_OPS_RAW' in 'scan_block_fast()' as well (as done in
>>> 'scan_block_full')?
>>
>> Because when ECC is available on OOB, it is good to utilize it, so
>> that bitflips don't cause good blocks to be marked bad. Less
>> importantly, when bad block markers are written by Linux (with ECC),
>> then these markers can be read back more reliably.
>
> Ok (due to the second reason, as we're discussing the BBM read
> implementation).

I guess I worded my 'first' reason a little wrong. I meant: so that 
bitflips don't cause good blocks (0xff) to be read as bad blocks 
(bitflip => non-0xff). I think this would be far more significant, as a 
single bitflip in the BBM byte could cause errors.

>> So, I can send a patch that straightens out naming and brings
>> scan_block_fast() and scan_block_full() into alignment on using
>> MTD_OPS_PLACE_OOB. Then I think that we should continue using this
>> code in both:
>>                  /* Ignore ECC errors when checking for BBM */
>>                  if (ret&&  !mtd_is_bitflip_or_eccerr(ret))
>
> Sounds reasonable.

OK, I'll do that this week.

>> And we can (as agreed previously?) avoid using/checking max_bitflips
>> in mtd_read_oob(), although there is the datbuf+oob case that calls
>> nand_do_read_ops...
>
> This is a separate thing that needs to be addressed.
> See my findings in [1].

Right, I saw both threads kinda simultaneously. It looks like Mike may 
need to add some max_bitflips logic to mtd_read_oob after all.

Brian
Artem Bityutskiy - June 12, 2012, 10:37 a.m.
On Sun, 2012-06-10 at 22:45 -0700, Brian Norris wrote:
> 
> This brings up a side issue: I think many of the "_raw" function names
> in nand_bbt.c are misleading. They do not all use MTD_OPS_RAW (and
> shouldn't). Effectively, I would prefer that *all* the calls in
> nand_bbt.c use the non-RAW version of the MTD/NAND interfaces, and
> then ignore the errors if sensible. e.g., when reading a bad block
> marker. But nand_bbt.c does a lot more (reading BBTs from flash,
> checking for "Bbt0" and "1tbB" table marker, etc.) that *must* use ECC
> to provide any robustness.
> 
> So, I can send a patch that straightens out naming and brings
> scan_block_fast() and scan_block_full() into alignment on using
> MTD_OPS_PLACE_OOB.

Sounds like a good idea to me.
Mike Dunn - June 12, 2012, 1:23 p.m.
On 06/10/2012 10:45 PM, Brian Norris wrote:
> 
> On Sun, Jun 10, 2012 at 4:50 AM, Shmulik Ladkani
> <shmulik.ladkani@gmail.com> wrote:
>> On Thu,  7 Jun 2012 11:32:08 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
>>> Ignore ecc errors in the scan_read_raw_oob() function.  Also removed code that
>>> is now redundant.
>>>
>>> Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
>>> ---
>>>  drivers/mtd/nand/nand_bbt.c |    6 +++---
>>>  1 files changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
>>> index 30d1319..585da44 100644
>>> --- a/drivers/mtd/nand/nand_bbt.c
>>> +++ b/drivers/mtd/nand/nand_bbt.c
>>> @@ -319,7 +319,8 @@ static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
>>>
>>>               res = mtd_read_oob(mtd, offs, &ops);
>>>
>>> -             if (res)
>>> +             /* Ignore ECC errors when checking for BBM */
>>> +             if (res && !mtd_is_bitflip_or_eccerr(res))
>>>                       return res;
>>
>> IMO this is not necessary.
>> Note the 'ops.mode' is initialized to MTD_OPS_RAW;  meaning, no ECC is
>> performed during read ('ecc.read_page_raw' will be invoked).
>> Thus, EUCLEAN/EBADMSG should not be reported.
>> Am I missing something here?
> 
> Shmulik is correct. A check for bitflips on a MTD_OPS_RAW operation is
> unnecessary.


Yes, I missed the ops mode raw.  Obviously my analysis was wrong.

Good that nand_bbt is getting cleaned up some.

Thanks again Brian and Shmulik.

Mike

Patch

diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
index 30d1319..585da44 100644
--- a/drivers/mtd/nand/nand_bbt.c
+++ b/drivers/mtd/nand/nand_bbt.c
@@ -319,7 +319,8 @@  static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
 
 		res = mtd_read_oob(mtd, offs, &ops);
 
-		if (res)
+		/* Ignore ECC errors when checking for BBM */
+		if (res && !mtd_is_bitflip_or_eccerr(res))
 			return res;
 
 		buf += mtd->oobsize + mtd->writesize;
@@ -406,8 +407,7 @@  static int scan_block_full(struct mtd_info *mtd, struct nand_bbt_descr *bd,
 	int ret, j;
 
 	ret = scan_read_raw_oob(mtd, buf, offs, readlen);
-	/* Ignore ECC errors when checking for BBM */
-	if (ret && !mtd_is_bitflip_or_eccerr(ret))
+	if (ret)
 		return ret;
 
 	for (j = 0; j < len; j++, buf += scanlen) {