Patchwork [RFC,v2] mtd: nand: add erased-page bitflip correction

login
register
mail settings
Submitter pekon gupta
Date March 14, 2014, 1:28 p.m.
Message ID <1394803700-29467-1-git-send-email-pekon@ti.com>
Download mbox | patch
Permalink /patch/330306/
State New
Headers show

Comments

pekon gupta - March 14, 2014, 1:28 p.m.
From: Brian Norris <computersforpeace@gmail.com>


*changes v1 -> v2*
Tried optimizing for performance by making following updates
 - *read_data and *read_oob passed as arguments to nand_verify_erased_page()
   to avoid re-reading the page from device chip->ecc.read_page_raw()
 - exit nand_verify_erased_page() as-soon-as bitflips > ecc.strength
 - using hweightN() only for non-0xff data
 - if erased-page is detected then clean only read_data
   ?? need to see if OOB cleaning affects file-system metadata like JFFS2 cleanmarker

Further possible enhancements
 - compare data and OOB in chunks of 32 bits(WORD)
	if (*(*u32 *)&oob[j] != 0xffffffff)
		bitflips += hweight32(~(* (u32 *)&oob[j]));
	if (*(*u32 *)&data[j] != 0xffffffff)
		bitflips += hweight32(~(*(u32 *)&data[j]));

Testing Procedure
NAND Device= Block-size=256K, Page-size=4K, OOBsize=224B
 - 8-bits bit-flips introduced at each 512 bytes (64 bit-flips/page)
 - Test Partition=502 (completely blank), ECC-schme=BCH16
 - with existing patches [2]
	LOG: TIME_TAKEN=94 THROUGHPUT=5.59 MB/s
	LOG: TIME_TAKEN=93 THROUGHPUT=5.66 MB/s
 - with current patch
	LOG: TIME_TAKEN=96 THROUGHPUT=5.48 MB/s


*original v1*
Upper layers (e.g., UBI/UBIFS) expect that pages that have been erased
will return all 1's (0xff). However, modern NAND (MLC, and even some
SLC) experience bitflips in unprogrammed pages, and so they may not read
back all 1's. This is problematic for drivers whose ECC cannot correct
bitflips in an all-0xff page, as they will report an ECC error
(-EBADMSG) when they come across such a page. This appears to UBIFS as
"corrupt empty space".

Several others [1][2] are attempting to solve this problem, but none is
generically useful, and most (all?) have subtle bugs in their reasoning. Let's
implement a simple, generic, and correct solution instead.

To handle such situations, we should implement the following software
workaround for drivers that need it: when the hardware driver reports an
ECC error, nand_base should "verify" this error by

 * Re-reading the page without ECC
 * counting the number of 0 bits in each ECC sector (N[i], for i = 0, 1,
   ..., chip->ecc.steps)
 * If any N[i] exceeds the ECC strength (and thus, the maximum allowable
   bitflips) then we consider this to be an uncorrectable sector.
 * If all N[i] are less than the ECC strength, then we "correct" the
   output to all-0xff and report max-bitflips appropriately

Obviously, this sequence can be compute intensive if applied heavily.
Ideally, MTD users should not need to read un-programmed pages very
often and would require this software check, for instance, only during
attach/mount checks or certain recovery situations.

Drivers can request support for this new feature by OR'ing
NAND_ECC_NEED_CHECK_FF into their chip->options.

[1] http://lists.infradead.org/pipermail/linux-mtd/2014-January/051513.html
[2] http://lists.infradead.org/pipermail/linux-mtd/2014-January/051585.html

Signed-off-by: Pekon Gupta <pekon@ti.com>
---
 drivers/mtd/nand/nand_base.c | 64 +++++++++++++++++++++++++++++++++++++++++++-
 include/linux/mtd/nand.h     |  5 ++++
 2 files changed, 68 insertions(+), 1 deletion(-)
Brian Norris - March 17, 2014, 6:41 p.m.
On Fri, Mar 14, 2014 at 06:58:20PM +0530, Pekon Gupta wrote:
> From: Brian Norris <computersforpeace@gmail.com>
> 
> 
> *changes v1 -> v2*
> Tried optimizing for performance by making following updates
>  - *read_data and *read_oob passed as arguments to nand_verify_erased_page()
>    to avoid re-reading the page from device chip->ecc.read_page_raw()

This is not valid for all drivers. See the 'oob_required' parameter to
read_page(). This means that if we can't verify for sure that it's
uncorrectable without looking at the OOB, we have to re-read to include
the OOB.

>  - exit nand_verify_erased_page() as-soon-as bitflips > ecc.strength

Thanks, that's probably good.

>  - using hweightN() only for non-0xff data

I'm not convinced that's a great optimization. I would write the simpler
version without the 0xff check, then add it if it really makes sense.
hweightN() is actually a pretty simple computation, and adding extra
branches might be more harmful than helpful.

>  - if erased-page is detected then clean only read_data
>    ?? need to see if OOB cleaning affects file-system metadata like JFFS2 cleanmarker

Nak. We must clean both OOB and data buffers if we are doing this
"correction". We don't put FS-specific hacks here. I don't know the
specifics of JFFS2 cleanmarkers, but JFFS2 should have valid ECC bytes
present whenever it expects a (non-raw) read to retain a few 0 bits. Or
else it should be using a raw read mode.

> 
> Further possible enhancements
>  - compare data and OOB in chunks of 32 bits(WORD)
> 	if (*(*u32 *)&oob[j] != 0xffffffff)
> 		bitflips += hweight32(~(* (u32 *)&oob[j]));
> 	if (*(*u32 *)&data[j] != 0xffffffff)
> 		bitflips += hweight32(~(*(u32 *)&data[j]));

I certainly agree that hweight8() is a bad choice. Either
hweight32(), hweight64(), or hweight_long().

> Testing Procedure
> NAND Device= Block-size=256K, Page-size=4K, OOBsize=224B
>  - 8-bits bit-flips introduced at each 512 bytes (64 bit-flips/page)
>  - Test Partition=502 (completely blank), ECC-schme=BCH16
>  - with existing patches [2]
> 	LOG: TIME_TAKEN=94 THROUGHPUT=5.59 MB/s
> 	LOG: TIME_TAKEN=93 THROUGHPUT=5.66 MB/s
>  - with current patch
> 	LOG: TIME_TAKEN=96 THROUGHPUT=5.48 MB/s

Your patch in [2] is incorrect. Please do not make any performance
comparisons with [2].

Can you compare against the current omap2.c head for some
configurations? It seems like there are still some platform-specific
hacks in omap2.c that likely give it a performance advantage.

> *original v1*
> Upper layers (e.g., UBI/UBIFS) expect that pages that have been erased
> will return all 1's (0xff). However, modern NAND (MLC, and even some
> SLC) experience bitflips in unprogrammed pages, and so they may not read
> back all 1's. This is problematic for drivers whose ECC cannot correct
> bitflips in an all-0xff page, as they will report an ECC error
> (-EBADMSG) when they come across such a page. This appears to UBIFS as
> "corrupt empty space".
> 
> Several others [1][2] are attempting to solve this problem, but none is
> generically useful, and most (all?) have subtle bugs in their reasoning. Let's
> implement a simple, generic, and correct solution instead.
> 
> To handle such situations, we should implement the following software
> workaround for drivers that need it: when the hardware driver reports an
> ECC error, nand_base should "verify" this error by
> 
>  * Re-reading the page without ECC
>  * counting the number of 0 bits in each ECC sector (N[i], for i = 0, 1,
>    ..., chip->ecc.steps)
>  * If any N[i] exceeds the ECC strength (and thus, the maximum allowable
>    bitflips) then we consider this to be an uncorrectable sector.
>  * If all N[i] are less than the ECC strength, then we "correct" the
>    output to all-0xff and report max-bitflips appropriately
> 
> Obviously, this sequence can be compute intensive if applied heavily.
> Ideally, MTD users should not need to read un-programmed pages very
> often and would require this software check, for instance, only during
> attach/mount checks or certain recovery situations.
> 
> Drivers can request support for this new feature by OR'ing
> NAND_ECC_NEED_CHECK_FF into their chip->options.
> 
> [1] http://lists.infradead.org/pipermail/linux-mtd/2014-January/051513.html
> [2] http://lists.infradead.org/pipermail/linux-mtd/2014-January/051585.html
> 
> Signed-off-by: Pekon Gupta <pekon@ti.com>
> ---
>  drivers/mtd/nand/nand_base.c | 64 +++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/mtd/nand.h     |  5 ++++
>  2 files changed, 68 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index c4f2cc9..305ba03 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -1483,6 +1483,50 @@ static int nand_setup_read_retry(struct mtd_info *mtd, int retry_mode)
>  }
>  
>  /**
> + * nand_verify_erased_page - [INTERN] Check a page for 0xff + bitflips
> + * @mtd: MTD device structure
> + * @data: should contain read_data (either raw or "corrected")
> + * @oob:  should contain read_oob  (either raw or "corrected")
> + *
> + * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
> + * error, by counting number of zero-bits in both data and oob buffers.
> + * if number of bitflips in any ecc-step exceeds ecc.strength then it returns
> + * without checking the complete page, because its a genuine error
> + */
> +static int nand_verify_erased_page(struct mtd_info *mtd, struct nand_chip *chip,
> +					const uint8_t *data, const uint8_t *oob)
> +{
> +	unsigned int max_bitflips = 0, bitflips;
> +	int oob_step = mtd->oobsize / chip->ecc.steps;
> +	int ecc_strength = chip->ecc.strength;
> +	int ecc_size = chip->ecc.size;
> +	int i, j;
> +
> +	/* Check each ECC step individually */
> +	for (i = 0; i < chip->ecc.steps; i++) {
> +		bitflips = 0;
> +		for (j = 0; bitflips <= ecc_strength && j < oob_step; j++)
> +			if (oob[j] != 0xff)
> +				bitflips += hweight8(~oob[j]);
> +
> +		for (j = 0; bitflips <= ecc_strength && j < ecc_size; j++)
> +			if (data[j] != 0xff)
> +				bitflips += hweight8(~data[j]);
> +
> +		/* Too many bitflips */
> +		if (bitflips > ecc_strength)
> +			return bitflips;
> +
> +		max_bitflips = max(max_bitflips, bitflips);
> +
> +		data += ecc_size;
> +		oob += oob_step;
> +	}
> +
> +	return max_bitflips;
> +}
> +
> +/**
>   * nand_do_read_ops - [INTERN] Read data with ECC
>   * @mtd: MTD device structure
>   * @from: offset to read from
> @@ -1544,9 +1588,26 @@ read_retry:
>  				ret = chip->ecc.read_subpage(mtd, chip,
>  							col, bytes, bufpoi,
>  							page);
> -			else
> +			else {
>  				ret = chip->ecc.read_page(mtd, chip, bufpoi,
>  							  oob_required, page);
> +				/* Check if its an erased-page with bit-flips */
> +				if ((chip->options & NAND_ECC_NEED_CHECK_FF) &&
> +					mtd->ecc_stats.failed - ecc_failures) {
> +					ret = nand_verify_erased_page(mtd, chip,
> +							bufpoi, chip->oob_poi);

You have corrupted my patch so that now it does not handle the subpage
case.

Also, it's not safe to use data from chip->oob_poi if
oob_required==false.

> +					if (ret <= chip->ecc.strength) {

It's preferable if we don't have to duplicate the ecc.strength check
here and inside nand_verify_erased_page(). Huang and others have pointed
out that ecc.strength is not the best threshold, so I'd like to keep
this "threshold" defined in exactly one place.

> +						/* it's an erased-page */
> +						mtd->ecc_stats.failed =
> +								ecc_failures;
> +						memset(bufpoi, 0xff,
> +								mtd->writesize);
> +					} else {
> +						pr_warn("found uncorrectable bit-flips\n");
> +						ret = 0;
> +					}
> +				}
> +			}
>  			if (ret < 0) {
>  				if (!aligned)
>  					/* Invalidate page cache */
> @@ -1554,6 +1615,7 @@ read_retry:
>  				break;
>  			}
>  
> +
>  			max_bitflips = max_t(unsigned int, max_bitflips, ret);
>  
>  			/* Transfer not aligned data */

Anyway, I have already heard comments from several people (including
you), but you have only addressed some of them in your version of my
patch. Please, let me send my own patches, and you can provide your
feedback there.

If you want to do several patch iterations with performance comparisons
(NOT against patches that are performing unsound optimizations, like
[2]), then post a summary email with links to the patches you test
(e.g., with and without the "break early" checks; with hweight8() vs.
hweight_long(); etc.). You don't have to formally submit every patch you
test for performance, as this adds confusion. You could place them on
github, for instance, and we can integrate feeback based on the best
approach.

Brian
pekon gupta - March 18, 2014, 6:23 a.m.
Hi Brain,


>From: Brian Norris [mailto:computersforpeace@gmail.com]
>>On Fri, Mar 14, 2014 at 06:58:20PM +0530, Pekon Gupta wrote:
>>> From: Brian Norris <computersforpeace@gmail.com>
>>
>>
>> *changes v1 -> v2*
>> Tried optimizing for performance by making following updates
>>  - *read_data and *read_oob passed as arguments to nand_verify_erased_page()
>>    to avoid re-reading the page from device chip->ecc.read_page_raw()
>
>This is not valid for all drivers. See the 'oob_required' parameter to
>read_page(). This means that if we can't verify for sure that it's
>uncorrectable without looking at the OOB, we have to re-read to include
>the OOB.
>
Ok.. But then your v1 patch does not, work as-it-is with OMAP NAND driver.
I think the problem is same as that of GPMI driver, there is difference in
Page layout when using ecc.read_page_raw() and ecc.read_page().


>>  - exit nand_verify_erased_page() as-soon-as bitflips > ecc.strength
>
>Thanks, that's probably good.
>
>>  - using hweightN() only for non-0xff data
>
>I'm not convinced that's a great optimization. I would write the simpler
>version without the 0xff check, then add it if it really makes sense.
>hweightN() is actually a pretty simple computation, and adding extra
>branches might be more harmful than helpful.
>
Yes, I'll try to get number to check this.


>>  - if erased-page is detected then clean only read_data
>>    ?? need to see if OOB cleaning affects file-system metadata like JFFS2 cleanmarker
>
>Nak. We must clean both OOB and data buffers if we are doing this
>"correction". We don't put FS-specific hacks here. I don't know the
>specifics of JFFS2 cleanmarkers, but JFFS2 should have valid ECC bytes
>present whenever it expects a (non-raw) read to retain a few 0 bits. Or
>else it should be using a raw read mode.
>
This is important, we cannot just ignore it.
JFFS2 writes a signature just after erase in oobfree[] area [a], this helps
to identify that block-erase was completed successfully. Now there are two
issues here:

(1) If you count complete spare area (which includes JFFS2 cleanmarker) then
you would always see more lot of bit-flips > ecc.strength.

(2) If after detection of erased-page if you memset(OOB, oob, mtd->oobsize)
Then you are clearing off this JFFS2 marker also, and the JFFS2 FS *may* crib
(however, I think JFFS2 uses chip->ecc.read_oob_raw() to read OOB, so it
 Should not be a problem).
- Therefore in my earlier patch [2] I was just counting bit-flips in read_ecc[]
  *not* read_oob[].
- Also, 'oob_required=1' may not be used by user-space utilities, unless 
   in specific  conditions like 'nanddump' || 'nandread --raw'.

 [...]

>> +			else {
>>  				ret = chip->ecc.read_page(mtd, chip, bufpoi,
>>  							  oob_required, page);
>> +				/* Check if its an erased-page with bit-flips */
>> +				if ((chip->options & NAND_ECC_NEED_CHECK_FF) &&
>> +					mtd->ecc_stats.failed - ecc_failures) {
>> +					ret = nand_verify_erased_page(mtd, chip,
>> +							bufpoi, chip->oob_poi);
>
>You have corrupted my patch so that now it does not handle the subpage
>case.
>
I think handling sub-page reads should be handled separately, because using
chip->ecc.read_page_raw() when sub-page is enabled, may not be a good.
 
[...]

>
>Anyway, I have already heard comments from several people (including
>you), but you have only addressed some of them in your version of my
>patch. Please, let me send my own patches, and you can provide your
>feedback there.
>
Yes for sure.. I was just trying to show the changes which worked for
OMAP NAND. I had anyways reverted the changes, NAKed by you,
before taking the performance numbers.
However, lets first get this working for all other drivers as well.
(And please do pay attention to JFFS2 clean-marker query).


[a] http://www.linux-mtd.infradead.org/faq/jffs2.html#L_clmarker

with regards, pekon
Brian Norris - March 18, 2014, 9:05 a.m.
On Tue, Mar 18, 2014 at 06:23:10AM +0000, Pekon Gupta wrote:
> >From: Brian Norris [mailto:computersforpeace@gmail.com]
> >>On Fri, Mar 14, 2014 at 06:58:20PM +0530, Pekon Gupta wrote:
> >>> From: Brian Norris <computersforpeace@gmail.com>
> >>
> >>
> >> *changes v1 -> v2*
> >> Tried optimizing for performance by making following updates
> >>  - *read_data and *read_oob passed as arguments to nand_verify_erased_page()
> >>    to avoid re-reading the page from device chip->ecc.read_page_raw()
> >
> >This is not valid for all drivers. See the 'oob_required' parameter to
> >read_page(). This means that if we can't verify for sure that it's
> >uncorrectable without looking at the OOB, we have to re-read to include
> >the OOB.
> >
> Ok.. But then your v1 patch does not, work as-it-is with OMAP NAND driver.

Why, exactly? What ECC modes did you test? And what did you see?

> I think the problem is same as that of GPMI driver, there is difference in
> Page layout when using ecc.read_page_raw() and ecc.read_page().

Are you sure? omap2.c looks like it either uses nand_base defaults, or
it uses omap_read_page_bch(), which pulls data directly into the data
'buf' -- it doesn't do any major concatenation/rearranging of data. So
at most, I think there would be a small (probably negligible)
inconsistency between the layout that my patch assumes and the layout
that omap2.c uses.

> >>  - if erased-page is detected then clean only read_data
> >>    ?? need to see if OOB cleaning affects file-system metadata like JFFS2 cleanmarker
> >
> >Nak. We must clean both OOB and data buffers if we are doing this
> >"correction". We don't put FS-specific hacks here. I don't know the
> >specifics of JFFS2 cleanmarkers, but JFFS2 should have valid ECC bytes
> >present whenever it expects a (non-raw) read to retain a few 0 bits. Or
> >else it should be using a raw read mode.
> >
> This is important, we cannot just ignore it.
> JFFS2 writes a signature just after erase in oobfree[] area [a], this helps
> to identify that block-erase was completed successfully. Now there are two
> issues here:
> 
> (1) If you count complete spare area (which includes JFFS2 cleanmarker) then
> you would always see more lot of bit-flips > ecc.strength.

That's not a problem with our MTD-layer solution; it's a problem with
JFFS2. After you've programmed anything to the flash (without proper ECC
parity), you cannot expect to ever "correct" data.

> (2) If after detection of erased-page if you memset(OOB, oob, mtd->oobsize)
> Then you are clearing off this JFFS2 marker also, and the JFFS2 FS *may* crib
> (however, I think JFFS2 uses chip->ecc.read_oob_raw() to read OOB, so it
>  Should not be a problem).

It does not use read_oob_raw(). The cleanmarker checks use
mtd_read_oob() with MTD_OPS_AUTO_OOB, which filters down to
nand_do_read_oob() and ecc.read_oob(). So incidentally, it never touches
the code I'm adding, although it would probably be good to add
protection in nand_do_read_oob() as well (which then might cause the
problem you are theorizing).

But here's the crucial point: JFFS2 assumes that drivers do not do error
correction on the spare bytes!!! This is a bad assumption today, and I'm
not really interested in correcting it. If this assumption becomes a
problem, then we have an either-or: either a driver chooses to support
the legacy of JFFS2 (it's already broken in many respects on many NAND),
or it chooses to support erased-page correction.

> - Therefore in my earlier patch [2] I was just counting bit-flips in read_ecc[]
>   *not* read_oob[].

You can't have your cake and eat it too. You're trying to support
intentionally programming bitflips (i.e., the 'cleanmarker') to spare
area, but you're also trying to correct bits in such a page. Pick one.
Or improve the JFFS2 / MTD API so that JFFS2 can communicate when it
does and doesn't want error correction, rather than making assumptions.

> - Also, 'oob_required=1' may not be used by user-space utilities, unless 
>    in specific  conditions like 'nanddump' || 'nandread --raw'.

I don't understand what you're saying here.

>  [...]
> 
> >> +			else {
> >>  				ret = chip->ecc.read_page(mtd, chip, bufpoi,
> >>  							  oob_required, page);
> >> +				/* Check if its an erased-page with bit-flips */
> >> +				if ((chip->options & NAND_ECC_NEED_CHECK_FF) &&
> >> +					mtd->ecc_stats.failed - ecc_failures) {
> >> +					ret = nand_verify_erased_page(mtd, chip,
> >> +							bufpoi, chip->oob_poi);
> >
> >You have corrupted my patch so that now it does not handle the subpage
> >case.
> >
> I think handling sub-page reads should be handled separately, because using
> chip->ecc.read_page_raw() when sub-page is enabled, may not be a good.

Why not? nand_do_read_ops() is written so that it can handle either
full- or sub-page reads, so there doesn't seem to be a fundamental
limitation.


> [...]
> 
> >
> >Anyway, I have already heard comments from several people (including
> >you), but you have only addressed some of them in your version of my
> >patch. Please, let me send my own patches, and you can provide your
> >feedback there.
> >
> Yes for sure.. I was just trying to show the changes which worked for
> OMAP NAND. I had anyways reverted the changes, NAKed by you,
> before taking the performance numbers.

But why did you say you were reporting numbers for [2], when I nak'd it?

> However, lets first get this working for all other drivers as well.
> (And please do pay attention to JFFS2 clean-marker query).

Brian

[2] http://lists.infradead.org/pipermail/linux-mtd/2014-January/051585.html
pekon gupta - March 18, 2014, 10:40 a.m.
Hi Brian,

>From: Brian Norris [mailto:computersforpeace@gmail.com]
>On Tue, Mar 18, 2014 at 06:23:10AM +0000, Pekon Gupta wrote:
>> >From: Brian Norris [mailto:computersforpeace@gmail.com]

Just last piece about clearing OOB, on which I'm not convinced ...

 [...]

>> >>  - if erased-page is detected then clean only read_data
>> >>    ?? need to see if OOB cleaning affects file-system metadata like JFFS2 cleanmarker
>> >
>> >Nak. We must clean both OOB and data buffers if we are doing this
>> >"correction". We don't put FS-specific hacks here. I don't know the
>> >specifics of JFFS2 cleanmarkers, but JFFS2 should have valid ECC bytes
>> >present whenever it expects a (non-raw) read to retain a few 0 bits. Or
>> >else it should be using a raw read mode.
>> >
>> This is important, we cannot just ignore it.
>> JFFS2 writes a signature just after erase in oobfree[] area [a], this helps
>> to identify that block-erase was completed successfully. Now there are two
>> issues here:
>>
>> (1) If you count complete spare area (which includes JFFS2 cleanmarker) then
>> you would always see more lot of bit-flips > ecc.strength.
>
>That's not a problem with our MTD-layer solution; it's a problem with
>JFFS2. After you've programmed anything to the flash (without proper ECC
>parity), you cannot expect to ever "correct" data.
>
>> (2) If after detection of erased-page if you memset(OOB, oob, mtd->oobsize)
>> Then you are clearing off this JFFS2 marker also, and the JFFS2 FS *may* crib
>> (however, I think JFFS2 uses chip->ecc.read_oob_raw() to read OOB, so it
>>  Should not be a problem).
>
>It does not use read_oob_raw(). The cleanmarker checks use
>mtd_read_oob() with MTD_OPS_AUTO_OOB, which filters down to
>nand_do_read_oob() and ecc.read_oob(). So incidentally, it never touches
>the code I'm adding, although it would probably be good to add
>protection in nand_do_read_oob() as well (which then might cause the
>problem you are theorizing).
>
>But here's the crucial point: JFFS2 assumes that drivers do not do error
>correction on the spare bytes!!! This is a bad assumption today, and I'm
>not really interested in correcting it. If this assumption becomes a
>problem, then we have an either-or: either a driver chooses to support
>the legacy of JFFS2 (it's already broken in many respects on many NAND),
>or it chooses to support erased-page correction.
>
Sorry, I cannot deprecate support for JFFS2 in lieu of NAND_NEED_FF_ECC_CHECK
because still there are many legacy device users who use JFFS2.

Also, having bit-flips in ecc.layout->oobfree[] should be of an "erased-page"
should be acceptable, because:
(1) UBIFS does not store any metadata in OOB, so any bit-flips there don't
   affect the file-system at all.
(2) JFFS2  which uses ecc.layout->oobfree[] to store its clean-marker also
  does not intend it to be protected by ECC. So, any corruption in clean-marker
  may just cause a re-erase of the already erased-block [b].
(3) YAFFS2 uses oobfree[] to store tags, and probably those are expected
  to be present only for programmed-pages, and are protected by ECC [c].
(4) LOGFS (don't know much about it).

I'll wait for your next revision of patch,  but please re-think on my proposal
to leave out ecc.layout->oobfree[], and just reset ecc.laout->ecc.pos[].
as it keeps compatibility with other file-systems which use OOB to store
some metadata.

(may be you can put resetting complete OOB, as separate patch later).


>> - Therefore in my earlier patch [2] I was just counting bit-flips in read_ecc[]
>>   *not* read_oob[].
>
>You can't have your cake and eat it too. You're trying to support
>intentionally programming bitflips (i.e., the 'cleanmarker') to spare
>area, but you're also trying to correct bits in such a page. Pick one.
>Or improve the JFFS2 / MTD API so that JFFS2 can communicate when it
>does and doesn't want error correction, rather than making assumptions.
>

[b] http://www.linux-mtd.infradead.org/faq/jffs2.html#L_clmarker
[c] http://www.yaffs.net/documents/how-yaffs-works
      http://www.yaffs.net/documents/faq

with regards, pekon

Patch

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index c4f2cc9..305ba03 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -1483,6 +1483,50 @@  static int nand_setup_read_retry(struct mtd_info *mtd, int retry_mode)
 }
 
 /**
+ * nand_verify_erased_page - [INTERN] Check a page for 0xff + bitflips
+ * @mtd: MTD device structure
+ * @data: should contain read_data (either raw or "corrected")
+ * @oob:  should contain read_oob  (either raw or "corrected")
+ *
+ * Check a page to see if it is erased (w/ bitflips) after an uncorrectable ECC
+ * error, by counting number of zero-bits in both data and oob buffers.
+ * if number of bitflips in any ecc-step exceeds ecc.strength then it returns
+ * without checking the complete page, because its a genuine error
+ */
+static int nand_verify_erased_page(struct mtd_info *mtd, struct nand_chip *chip,
+					const uint8_t *data, const uint8_t *oob)
+{
+	unsigned int max_bitflips = 0, bitflips;
+	int oob_step = mtd->oobsize / chip->ecc.steps;
+	int ecc_strength = chip->ecc.strength;
+	int ecc_size = chip->ecc.size;
+	int i, j;
+
+	/* Check each ECC step individually */
+	for (i = 0; i < chip->ecc.steps; i++) {
+		bitflips = 0;
+		for (j = 0; bitflips <= ecc_strength && j < oob_step; j++)
+			if (oob[j] != 0xff)
+				bitflips += hweight8(~oob[j]);
+
+		for (j = 0; bitflips <= ecc_strength && j < ecc_size; j++)
+			if (data[j] != 0xff)
+				bitflips += hweight8(~data[j]);
+
+		/* Too many bitflips */
+		if (bitflips > ecc_strength)
+			return bitflips;
+
+		max_bitflips = max(max_bitflips, bitflips);
+
+		data += ecc_size;
+		oob += oob_step;
+	}
+
+	return max_bitflips;
+}
+
+/**
  * nand_do_read_ops - [INTERN] Read data with ECC
  * @mtd: MTD device structure
  * @from: offset to read from
@@ -1544,9 +1588,26 @@  read_retry:
 				ret = chip->ecc.read_subpage(mtd, chip,
 							col, bytes, bufpoi,
 							page);
-			else
+			else {
 				ret = chip->ecc.read_page(mtd, chip, bufpoi,
 							  oob_required, page);
+				/* Check if its an erased-page with bit-flips */
+				if ((chip->options & NAND_ECC_NEED_CHECK_FF) &&
+					mtd->ecc_stats.failed - ecc_failures) {
+					ret = nand_verify_erased_page(mtd, chip,
+							bufpoi, chip->oob_poi);
+					if (ret <= chip->ecc.strength) {
+						/* it's an erased-page */
+						mtd->ecc_stats.failed =
+								ecc_failures;
+						memset(bufpoi, 0xff,
+								mtd->writesize);
+					} else {
+						pr_warn("found uncorrectable bit-flips\n");
+						ret = 0;
+					}
+				}
+			}
 			if (ret < 0) {
 				if (!aligned)
 					/* Invalidate page cache */
@@ -1554,6 +1615,7 @@  read_retry:
 				break;
 			}
 
+
 			max_bitflips = max_t(unsigned int, max_bitflips, ret);
 
 			/* Transfer not aligned data */
diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
index 0747fef..515f61f 100644
--- a/include/linux/mtd/nand.h
+++ b/include/linux/mtd/nand.h
@@ -182,6 +182,11 @@  typedef enum {
  * before calling nand_scan_tail.
  */
 #define NAND_BUSWIDTH_AUTO      0x00080000
+/*
+ * Driver's ECC does not correct erased pages that have bitflips (i.e.,
+ * "mostly" 0xff data), so ECC errors should be checked for mostly-0xff
+ */
+#define NAND_ECC_NEED_CHECK_FF	0x00100000
 
 /* Options set by nand scan */
 /* Nand scan has allocated controller struct */