diff mbox

[8/8] mtd: nand: use ECC, if present, when scanning OOB

Message ID 1340408145-24531-9-git-send-email-computersforpeace@gmail.com
State New, archived
Headers show

Commit Message

Brian Norris June 22, 2012, 11:35 p.m. UTC
scan_read_raw_oob() is used in only in places where the MTD_OPS_PLACE_OOB
mode is preferable MTD_OPS_RAW mode, so use MTD_OPS_PLACE_OOB instead.
MTD_OPS_PLACE_OOB provides the same functionality with the potential[1]
added bonus of error correction.

This brings scan_block_full() in line with scan_block_fast() so that they
both read bad block markers with MTD_OPS_PLACE_OOB. This can help in
preventing 0xff markers (in good blocks) from being interpreted as bad
block indicators in the presence of a single bitflip.

Note that ECC error codes (EUCLEAN or EBADMSG) are already silently
ignored in all users of scan_read_raw_oob().

[1] Few  drivers perform proper error correction on OOB data. In those
    cases, the use of MTD_OPS_RAW vs. MTD_OPS_PLACE_OOB is not
    significant.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Mike Dunn <mikedunn@newsguy.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/mtd/nand/nand_bbt.c |   27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

Comments

Shmulik Ladkani June 26, 2012, 2:09 p.m. UTC | #1
Hi,

On Fri, 22 Jun 2012 16:35:45 -0700 Brian Norris <computersforpeace@gmail.com> wrote:
> scan_read_raw_oob() is used in only in places where the MTD_OPS_PLACE_OOB
> mode is preferable MTD_OPS_RAW mode, so use MTD_OPS_PLACE_OOB instead.

Please consider rephrasing, text is bit confusing.

> + * Scan read data from data+OOB. May traverse multiple pages, interleaving
> + * page,OOB,page,OOB,... in buf.

In the case 'len' is less than 'mtd->writesize' (possible from a
'scan_block_full' call), the read oob will be placed immediately after
the partially read page, quoting scan_read_raw_oob:

		ops.len = min(len, (size_t)mtd->writesize);
		ops.oobbuf = buf + ops.len;

Any idea if this is intentional or a bug?

Regards,
Shmulik
Brian Norris July 10, 2012, 2:39 a.m. UTC | #2
On Tue, Jun 26, 2012 at 7:09 AM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> On Fri, 22 Jun 2012 16:35:45 -0700 Brian Norris <computersforpeace@gmail.com> wrote:
>> scan_read_raw_oob() is used in only in places where the MTD_OPS_PLACE_OOB
>> mode is preferable MTD_OPS_RAW mode, so use MTD_OPS_PLACE_OOB instead.
>
> Please consider rephrasing, text is bit confusing.

I realized I was missing the word "to". So the edit would read:

scan_read_raw_oob() is used in only in places where the
MTD_OPS_PLACE_OOB mode is preferable to MTD_OPS_RAW mode, so use
MTD_OPS_PLACE_OOB instead.

I can rephrase more, regarding what each mode does, but I'm wondering
if this clears up your understanding.

>> + * Scan read data from data+OOB. May traverse multiple pages, interleaving
>> + * page,OOB,page,OOB,... in buf.
>
> In the case 'len' is less than 'mtd->writesize' (possible from a
> 'scan_block_full' call), the read oob will be placed immediately after
> the partially read page, quoting scan_read_raw_oob:
>
>                 ops.len = min(len, (size_t)mtd->writesize);
>                 ops.oobbuf = buf + ops.len;
>
> Any idea if this is intentional or a bug?

Not really. I honestly don't fully understand many of the options in
NAND_BBT, but I'm trying to work through some of it. I think there's a
lot of half-baked stuff in here with not-very-good code structure.
Some of the code is written as if it were generic, but it's only meant
for some very specific circumstances.

With that said, here's a thought: it looks like your hypothetical "len
< mtd->writesize" would only occur with NAND_BBT_SCANEMPTY disabled
and NAND_BBT_SCANALLPAGES enabled. But I only see them used together:
NAND_BBT_SCANEMPTY | NAND_BBT_SCANALLPAGES (omap2.c and with
agand_flashbased in nand_bbt.c). So maybe it's a kind of "bug" for a
case that was never really thought out. And if they must be used
together, maybe we should document that.

Brian
Matthieu CASTET July 10, 2012, 7:45 a.m. UTC | #3
Hi Brian,

Brian Norris a écrit :
> scan_read_raw_oob() is used in only in places where the MTD_OPS_PLACE_OOB
> mode is preferable MTD_OPS_RAW mode, so use MTD_OPS_PLACE_OOB instead.
> MTD_OPS_PLACE_OOB provides the same functionality with the potential[1]
> added bonus of error correction.
> 
> This brings scan_block_full() in line with scan_block_fast() so that they
> both read bad block markers with MTD_OPS_PLACE_OOB. This can help in
> preventing 0xff markers (in good blocks) from being interpreted as bad
> block indicators in the presence of a single bitflip.
As far I understand the code, this work when "chip->ecc.read_oob" (used in
nand_do_read_oob) correct bit flip.

But I see no "chip->ecc.read_oob" implementation that can return bit flip. Is
that expected ?

This can also work when nand_do_read_ops is used (ops->datbuf != NULL). But it
is hard to see case where it can correct bit flip in bad block marker. Do you
have any exemple ?

For example the default chip->ecc.read_page (nand_read_page_hwecc) doesn't read
oob with ecc [1].




Matthieu

PS : Did you have any comment on
http://thread.gmane.org/gmane.linux.drivers.mtd/42243 ?

[1]
   for (i = 0; eccsteps; eccsteps--, i += eccbytes, p += eccsize) {
        chip->ecc.hwctl(mtd, NAND_ECC_READ);
        chip->read_buf(mtd, p, eccsize);
        chip->ecc.calculate(mtd, p, &ecc_calc[i]);
    }
    chip->read_buf(mtd, chip->oob_poi, mtd->oobsize);

> 
> Note that ECC error codes (EUCLEAN or EBADMSG) are already silently
> ignored in all users of scan_read_raw_oob().
> 
> [1] Few  drivers perform proper error correction on OOB data. In those
>     cases, the use of MTD_OPS_RAW vs. MTD_OPS_PLACE_OOB is not
>     significant.
> 
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
> Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> Cc: Mike Dunn <mikedunn@newsguy.com>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  drivers/mtd/nand/nand_bbt.c |   27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
> index 45cdb97..7b9a0a7 100644
> --- a/drivers/mtd/nand/nand_bbt.c
> +++ b/drivers/mtd/nand/nand_bbt.c
> @@ -289,14 +289,24 @@ static int scan_read_data(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
>  	return mtd_read(mtd, offs, len, &retlen, buf);
>  }
>  
> -/* Scan read data from flash */
> +/**
> + * scan_read_oob - [GENERIC] Scan data+OOB region to buffer
> + * @mtd: MTD device structure
> + * @buf: temporary buffer
> + * @offs: offset at which to scan
> + * @len: length of data region to read
> + *
> + * Scan read data from data+OOB. May traverse multiple pages, interleaving
> + * page,OOB,page,OOB,... in buf. Completes transfer and returns the "strongest"
> + * ECC condition (error or bitflip). May quit on the first (non-ECC) error.
> + */
>  static int scan_read_oob(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
>  			 size_t len)
>  {
>  	struct mtd_oob_ops ops;
> -	int res;
> +	int res, ret = 0;
>  
> -	ops.mode = MTD_OPS_RAW;
> +	ops.mode = MTD_OPS_PLACE_OOB;
>  	ops.ooboffs = 0;
>  	ops.ooblen = mtd->oobsize;
>  
> @@ -306,15 +316,18 @@ static int scan_read_oob(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
>  		ops.oobbuf = buf + ops.len;
>  
>  		res = mtd_read_oob(mtd, offs, &ops);
> -
> -		if (res)
> -			return res;
> +		if (res) {
> +			if (!mtd_is_bitflip_or_eccerr(res))
> +				return res;
> +			else if (mtd_is_eccerr(res) || !ret)
> +				ret = res;
> +		}
>  
>  		buf += mtd->oobsize + mtd->writesize;
>  		len -= mtd->writesize;
>  		offs += mtd->writesize;
>  	}
> -	return 0;
> +	return ret;
>  }
>  
>  static int scan_read(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
Brian Norris July 13, 2012, 5:39 p.m. UTC | #4
On Tue, Jul 10, 2012 at 12:45 AM, Matthieu CASTET
<matthieu.castet@parrot.com> wrote:
> Brian Norris a écrit :
>> scan_read_raw_oob() is used in only in places where the MTD_OPS_PLACE_OOB
>> mode is preferable MTD_OPS_RAW mode, so use MTD_OPS_PLACE_OOB instead.
>> MTD_OPS_PLACE_OOB provides the same functionality with the potential[1]
>> added bonus of error correction.
>>
>> This brings scan_block_full() in line with scan_block_fast() so that they
>> both read bad block markers with MTD_OPS_PLACE_OOB. This can help in
>> preventing 0xff markers (in good blocks) from being interpreted as bad
>> block indicators in the presence of a single bitflip.
>
> As far I understand the code, this work when "chip->ecc.read_oob" (used in
> nand_do_read_oob) correct bit flip.
>
> But I see no "chip->ecc.read_oob" implementation that can return bit flip. Is
> that expected ?

I have an out-of-tree driver that corrects OOB bitflips. Is there
really no other HW out there that corrects OOB errors?

Anyway, I understand that my driver is an outlier here, but I don't
see a real disadvantage in these changes. But on the positive side, I
expect that in the future, more drivers/HW will either want to stop
using OOB for anything at all or will want ECC protection for OOB.

> This can also work when nand_do_read_ops is used (ops->datbuf != NULL). But it
> is hard to see case where it can correct bit flip in bad block marker. Do you
> have any exemple ?

First of all, this has no effect if the driver does not protect OOB
with ECC (i.e., for OOB-only reads, MTD_OPS_PLACE_OOB == MTD_OPS_RAW).
So the following argument only applies when OOB is ECC-protected.

Consider a *good* block that is written with filesystem data. On
bootup, Linux may scan this block's BBM to check if it is bad. If a
bitflip occurs in the bad block marker, then it may be erroneously
considered bad.

Similarly, if a block was marked bad from wear (not factory-marked),
then its BBM may be written along with ECC protection. Then, when we
scan for bad blocks, it will be protected from bitflips that could
possibly cause 0x00 to appear non-zero. (This is not a big issue,
since 'non-zero' is still bad, as long as 0x00 didn't flip to 0xff -
quite unlikely...)

> PS : Did you have any comment on
> http://thread.gmane.org/gmane.linux.drivers.mtd/42243 ?

I read it, and it seems promising. I agree with much of the premise
(that nand_bbt.c is ugly and repetitive at times) but haven't had
enough time to review properly. Sorry. I'm a bit backlogged and will
be for a few weeks, I think. But I'll see what I can do.

Thanks,
Brian
Mike Dunn July 15, 2012, 8:01 p.m. UTC | #5
On 07/13/2012 10:39 AM, Brian Norris wrote:
> 
> I have an out-of-tree driver that corrects OOB bitflips. Is there
> really no other HW out there that corrects OOB errors?


The diskonchip g3 and g4 devices use a hw-generated hamming byte to protect the
first eight oob bytes (those not used for page data ecc) with ecc of one bit
strength, but neither driver implements the correcting algorithm yet.


> 
> Anyway, I understand that my driver is an outlier here, but I don't
> see a real disadvantage in these changes. But on the positive side, I
> expect that in the future, more drivers/HW will either want to stop
> using OOB for anything at all or will want ECC protection for OOB.
> 
>> This can also work when nand_do_read_ops is used (ops->datbuf != NULL). But it
>> is hard to see case where it can correct bit flip in bad block marker. Do you
>> have any exemple ?
> 
> First of all, this has no effect if the driver does not protect OOB
> with ECC (i.e., for OOB-only reads, MTD_OPS_PLACE_OOB == MTD_OPS_RAW).
> So the following argument only applies when OOB is ECC-protected.
> 
> Consider a *good* block that is written with filesystem data. On
> bootup, Linux may scan this block's BBM to check if it is bad. If a
> bitflip occurs in the bad block marker, then it may be erroneously
> considered bad.


Yeah, this is a strong argument for ecc on oob-only reads.

Thanks,
Mike
Ivan Djelic July 16, 2012, 2:01 p.m. UTC | #6
On Sun, Jul 15, 2012 at 10:01:24PM +0200, Mike Dunn wrote:
> On 07/13/2012 10:39 AM, Brian Norris wrote:
> > 
> > I have an out-of-tree driver that corrects OOB bitflips. Is there
> > really no other HW out there that corrects OOB errors?
> 
> 
> The diskonchip g3 and g4 devices use a hw-generated hamming byte to protect the
> first eight oob bytes (those not used for page data ecc) with ecc of one bit
> strength, but neither driver implements the correcting algorithm yet.
> 
> 
> > 
> > Anyway, I understand that my driver is an outlier here, but I don't
> > see a real disadvantage in these changes. But on the positive side, I
> > expect that in the future, more drivers/HW will either want to stop
> > using OOB for anything at all or will want ECC protection for OOB.
> > 
> >> This can also work when nand_do_read_ops is used (ops->datbuf != NULL). But it
> >> is hard to see case where it can correct bit flip in bad block marker. Do you
> >> have any exemple ?
> > 
> > First of all, this has no effect if the driver does not protect OOB
> > with ECC (i.e., for OOB-only reads, MTD_OPS_PLACE_OOB == MTD_OPS_RAW).
> > So the following argument only applies when OOB is ECC-protected.
> > 
> > Consider a *good* block that is written with filesystem data. On
> > bootup, Linux may scan this block's BBM to check if it is bad. If a
> > bitflip occurs in the bad block marker, then it may be erroneously
> > considered bad.
> 
> 
> Yeah, this is a strong argument for ecc on oob-only reads.

Hello Mike,

I think it is a strong argument for a robust reading of BBM, rather than an argument
for ECC on OOB-only reads. By "robust reading", I mean simply looking at the Hamming weight of the
marker (the number of 1s in the BBM) rather than its value, as done in nand_block_bad() by setting chip->badblockbits.

This robust reading is trivially implemented, does not depend on OOB ecc availability,
and benefits all drivers. Even if your driver implements OOB ECC, it may not work
on an erased block with a bitflip in its BBM (because erased data may not have a valid
ECC). Moreover, reading just the OOB region with ECC may require a full page read on some drivers
(when OOB and data are parts of the same codeword).

To me, the only strong reason for wanting OOB ECC is the implementation of YAFFS2 or similar filesystems
which require OOB metadata protection. But maybe I'm missing some other use cases ?

What do you think ?

BR,
Mike Dunn July 16, 2012, 6:36 p.m. UTC | #7
Hi Ivan, thanks again for the comments.

On 07/16/2012 07:01 AM, Ivan Djelic wrote:
> On Sun, Jul 15, 2012 at 10:01:24PM +0200, Mike Dunn wrote:
>
>> Yeah, this is a strong argument for ecc on oob-only reads.
> 
> Hello Mike,
> 
> I think it is a strong argument for a robust reading of BBM, rather than an argument
> for ECC on OOB-only reads. By "robust reading", I mean simply looking at the Hamming weight of the
> marker (the number of 1s in the BBM) rather than its value, as done in nand_block_bad() by setting chip->badblockbits.
> 
> This robust reading is trivially implemented, does not depend on OOB ecc availability,
> and benefits all drivers. Even if your driver implements OOB ECC, it may not work
> on an erased block with a bitflip in its BBM (because erased data may not have a valid
> ECC). 


This is a certainty, no?  Erased, by definition, includes any ecc bytes.


> Moreover, reading just the OOB region with ECC may require a full page read on some drivers
> (when OOB and data are parts of the same codeword).
> 
> To me, the only strong reason for wanting OOB ECC is the implementation of YAFFS2 or similar filesystems
> which require OOB metadata protection. But maybe I'm missing some other use cases ?
> 
> What do you think ?


If we assume the oob bytes on the first page of a good block can contain
anything, won't simply counting the bits make the risk of falsly identifying a
bb marker unacceptably high?

Thanks,
Mike
Ivan Djelic July 16, 2012, 9:34 p.m. UTC | #8
On Mon, Jul 16, 2012 at 08:36:56PM +0200, Mike Dunn wrote:
> Hi Ivan, thanks again for the comments.
> 
> On 07/16/2012 07:01 AM, Ivan Djelic wrote:
> > On Sun, Jul 15, 2012 at 10:01:24PM +0200, Mike Dunn wrote:
> >
> >> Yeah, this is a strong argument for ecc on oob-only reads.
> > 
> > Hello Mike,
> > 
> > I think it is a strong argument for a robust reading of BBM, rather than an argument
> > for ECC on OOB-only reads. By "robust reading", I mean simply looking at the Hamming weight of the
> > marker (the number of 1s in the BBM) rather than its value, as done in nand_block_bad() by setting chip->badblockbits.
> > 
> > This robust reading is trivially implemented, does not depend on OOB ecc availability,
> > and benefits all drivers. Even if your driver implements OOB ECC, it may not work
> > on an erased block with a bitflip in its BBM (because erased data may not have a valid
> > ECC). 
> 
> 
> This is a certainty, no?  Erased, by definition, includes any ecc bytes.

If you add the right polynomial[1] to your Hamming or BCH code, then you can make
sure the ECC of an erased page (possibly with OOB bytes) is actually a sequence of 0xff bytes.
This trick enables bitflip correction on data that has never been programmed.
Take a look at nand_ecc.c:63 (or nand_bch.c:62) for an example.
This trick is not possible on all hardware ECC engines.
 
> 
> > Moreover, reading just the OOB region with ECC may require a full page read on some drivers
> > (when OOB and data are parts of the same codeword).
> > 
> > To me, the only strong reason for wanting OOB ECC is the implementation of YAFFS2 or similar filesystems
> > which require OOB metadata protection. But maybe I'm missing some other use cases ?
> > 
> > What do you think ?
> 
> 
> If we assume the oob bytes on the first page of a good block can contain
> anything, won't simply counting the bits make the risk of falsly identifying a
> bb marker unacceptably high?

Counting bits on a BBM marker byte basically gives a 4-bitflip protection on a _single_ byte, which is
*extremely* unlikely. If you assume an erased good block can contain garbage in its OOB region, then
it can indeed be wrongly identified as a bad block; but this is also true if you use OOB ECC
(e.g. exceeding correction capacity).
In practice, since the bad block marker byte is normally never programmed to anything other than 0xff,
there is no reason why we should find garbage in it (even if a power failure occurs during an
erase/program operation).
BR,
Mike Dunn July 17, 2012, 6:10 p.m. UTC | #9
On 07/16/2012 02:34 PM, Ivan Djelic wrote:
> On Mon, Jul 16, 2012 at 08:36:56PM +0200, Mike Dunn wrote:
>>
>> This is a certainty, no?  Erased, by definition, includes any ecc bytes.
> 
> If you add the right polynomial[1] to your Hamming or BCH code, then you can make
> sure the ECC of an erased page (possibly with OOB bytes) is actually a sequence of 0xff bytes.
> This trick enables bitflip correction on data that has never been programmed.
> Take a look at nand_ecc.c:63 (or nand_bch.c:62) for an example.
> This trick is not possible on all hardware ECC engines.


Ah, yes, I forgot.  You explained this when you were helping me with ecc on the
diskonchip.


>  
>>
>>> Moreover, reading just the OOB region with ECC may require a full page read on some drivers
>>> (when OOB and data are parts of the same codeword).
>>>
>>> To me, the only strong reason for wanting OOB ECC is the implementation of YAFFS2 or similar filesystems
>>> which require OOB metadata protection. But maybe I'm missing some other use cases ?
>>>
>>> What do you think ?
>>
>>
>> If we assume the oob bytes on the first page of a good block can contain
>> anything, won't simply counting the bits make the risk of falsly identifying a
>> bb marker unacceptably high?
> 
> Counting bits on a BBM marker byte basically gives a 4-bitflip protection on a _single_ byte, which is
> *extremely* unlikely. If you assume an erased good block can contain garbage in its OOB region, then
> it can indeed be wrongly identified as a bad block; but this is also true if you use OOB ECC
> (e.g. exceeding correction capacity).
> In practice, since the bad block marker byte is normally never programmed to anything other than 0xff,
> there is no reason why we should find garbage in it (even if a power failure occurs during an
> erase/program operation).


This is what I was missing.  I was assuming that the bb marker byte could be
programmed to an arbitrary value by the user.  I see now that e.g., davinci
defines the bb marker position in its ecc.layout definition, so it will never be
written as long as MTD_OPS_AUTO_OOB mode is used.  I need to fix that in the docg4.

Are you trying to gently nudge me on this? :)  I am loathe to touch that bbm
code.  OTOH, the docg4 would benefit, since it uses an oob bb marker.

Thanks again Ivan,
Mike
Artem Bityutskiy Aug. 15, 2012, 12:05 p.m. UTC | #10
On Fri, 2012-06-22 at 16:35 -0700, Brian Norris wrote:
> scan_read_raw_oob() is used in only in places where the MTD_OPS_PLACE_OOB
> mode is preferable MTD_OPS_RAW mode, so use MTD_OPS_PLACE_OOB instead.
> MTD_OPS_PLACE_OOB provides the same functionality with the potential[1]
> added bonus of error correction.

Amended the commit message to satisfy Smulik's request, and pushed to
l2-mtd.git, thanks Brian!

You are doing great job de-mystifying the NAND/BBT code! Sometimes I
feel like you are playing too safe and you could be a bit more
aggressive removing half-baked or obviously broken stuff.
Shmulik Ladkani Aug. 15, 2012, 2:31 p.m. UTC | #11
Hi Artem,

On Wed, 15 Aug 2012 15:05:17 +0300 Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Fri, 2012-06-22 at 16:35 -0700, Brian Norris wrote:
> > scan_read_raw_oob() is used in only in places where the MTD_OPS_PLACE_OOB
> > mode is preferable MTD_OPS_RAW mode, so use MTD_OPS_PLACE_OOB instead.
> > MTD_OPS_PLACE_OOB provides the same functionality with the potential[1]
> > added bonus of error correction.
> 
> Amended the commit message to satisfy Smulik's request, and pushed to
> l2-mtd.git, thanks Brian!

I had some second thoughts WRT the move from MTD_OPS_RAW to
MTD_OPS_PLACE_OOB.

When reading BBT content, MTD_OPS_PLACE_OOB is preferred.

But when scanning the OOBs for BBMs (when creating the table), I guess
MTD_OPS_RAW should be used - since manufacturer bad blocks are not
protected by ECC (as Matthieu noted in [1], and I elaborated in [2]).

Can you please review and comment regarding the ideas discussed there?

Thanks,
Shmulik

[1]
http://thread.gmane.org/gmane.linux.drivers.mtd/42243/focus=42365
[2]
http://thread.gmane.org/gmane.linux.drivers.mtd/42243/focus=42669
Artem Bityutskiy Aug. 16, 2012, 10:40 a.m. UTC | #12
Hi Shmulik,

On Wed, 2012-08-15 at 17:31 +0300, Shmulik Ladkani wrote:
> > Amended the commit message to satisfy Smulik's request, and pushed to
> > l2-mtd.git, thanks Brian!
> 
> I had some second thoughts WRT the move from MTD_OPS_RAW to
> MTD_OPS_PLACE_OOB.
> 
> When reading BBT content, MTD_OPS_PLACE_OOB is preferred.
> 
> But when scanning the OOBs for BBMs (when creating the table), I guess
> MTD_OPS_RAW should be used - since manufacturer bad blocks are not
> protected by ECC (as Matthieu noted in [1], and I elaborated in [2]).
> 
> Can you please review and comment regarding the ideas discussed there?
> 
> Thanks,
> Shmulik
> 
> [1]
> http://thread.gmane.org/gmane.linux.drivers.mtd/42243/focus=42365
> [2]
> http://thread.gmane.org/gmane.linux.drivers.mtd/42243/focus=42669

I cannot spend much time reading the threads carefully, so I went
through the discussion only very briefly. Apologies. But I took the
patches because:

1. They do not break existing (in-tree) systems which do not have OOB
protected with ECC. This means the patches are harmless and while useful
because they clean up the code.

2. For systems like Brian has where OOB is protected with ECC, the BBM
should just not be covered by the OOB ECC. And the OOB scanning code
should not read non-BBM bytes, and we are done. This seems to be the
only logical setup for me.

If it is not the case, I just consider it is Brian's problem, he'll deal
with it and send us another set of patches.

Or to put it differently: he sent a good clean-up, which does not seem t
break anything, why would we not take it?

If you object merging these patches, I'd appreciate if you could
elaborate why they would be _harmful_ to have.

Thanks!
Shmulik Ladkani Aug. 20, 2012, 1:12 p.m. UTC | #13
Hi Artem,

On Thu, 16 Aug 2012 13:40:02 +0300 Artem Bityutskiy <dedekind1@gmail.com> wrote:
> If you object merging these patches, I'd appreciate if you could
> elaborate why they would be _harmful_ to have.

Thanks. I understand your point.

Took another look, and no, these are _not_ harmful to have.

Regards,
Shmulik
Angus CLARK Nov. 7, 2013, 2:56 p.m. UTC | #14
Hi Brian,

Firstly, apologies for dragging up an issue that dates back well over a year!

While preparing to upstream an out-of-tree NAND driver, we have fallen foul of
the change from MTD_OPS_RAW to MTD_OPS_PLACE_OOB in nand_bbt.c:scan_read_oob().

One issue relates to what is at best a hack within our own driver, and that is
for us to deal with :-)  However, I also have a concern that the patch could
result in genuine bad blocks escaping detection.

As I understand it, the patch was attempting to address the following situation:
    - NAND-resident BBTs are not used.
    - The BBT is re-created on each boot by scanning for MBBM.
    - A page write yields one or more bit-flips in the location used for the
MBBM, resulting in non-0xff data being present.
    - The non-0xff data is then misinterpreted as a MBBM on a subsequent boot,
giving a false bad-block.

In cases where the ECC scheme covers the MBBM location, then I can see that
enabling the ECC would cause the non-0xff data to be corrected, and therefore
avoid the block being falsely identified as bad.

However, I can also construct a situation where a genuine MBBM gets "corrected"
to 0xff.  Consider, for example, an 8-bit ECC scheme covering the MBBM location,
where the ECC for a sector of all 0xff data is also all 0xff.  In this case, a
MBBM of 0x00, with the remaining data all 0xff, would get "corrected" to 0xff.
Although perhaps a slightly contrived example, the S/W BCH ECC included in the
kernel scheme can be driven in this way, and I have seen blocks marked as bad
with this pattern in the past.

It is difficult to know if your particular system could suffer in this way.  It
all depends on the nature of your ECC scheme.  I guess my concern is that the
patch deviates from what is recommended by the NAND manufacturers, and that it
makes certain assumptions on how the ECC scheme operates.

My own view is that the only safe way to record and track bad blocks is to use
NAND-resident BBTs; after all, if a block is bad then there is no guarantee that
an attempt to write a MBBM would succeed.  NAND-resident BBTs would also avoid
the problem the patch was attempting fix in the first place.

Cheers,

Angus


On 07/13/2012 06:39 PM, Brian Norris wrote:
> On Tue, Jul 10, 2012 at 12:45 AM, Matthieu CASTET
> <matthieu.castet@parrot.com> wrote:
>> Brian Norris a écrit :
>>> scan_read_raw_oob() is used in only in places where the MTD_OPS_PLACE_OOB
>>> mode is preferable MTD_OPS_RAW mode, so use MTD_OPS_PLACE_OOB instead.
>>> MTD_OPS_PLACE_OOB provides the same functionality with the potential[1]
>>> added bonus of error correction.
>>>
>>> This brings scan_block_full() in line with scan_block_fast() so that they
>>> both read bad block markers with MTD_OPS_PLACE_OOB. This can help in
>>> preventing 0xff markers (in good blocks) from being interpreted as bad
>>> block indicators in the presence of a single bitflip.
>>
>> As far I understand the code, this work when "chip->ecc.read_oob" (used in
>> nand_do_read_oob) correct bit flip.
>>
>> But I see no "chip->ecc.read_oob" implementation that can return bit flip. Is
>> that expected ?
> 
> I have an out-of-tree driver that corrects OOB bitflips. Is there
> really no other HW out there that corrects OOB errors?
> 
> Anyway, I understand that my driver is an outlier here, but I don't
> see a real disadvantage in these changes. But on the positive side, I
> expect that in the future, more drivers/HW will either want to stop
> using OOB for anything at all or will want ECC protection for OOB.
> 
>> This can also work when nand_do_read_ops is used (ops->datbuf != NULL). But it
>> is hard to see case where it can correct bit flip in bad block marker. Do you
>> have any exemple ?
> 
> First of all, this has no effect if the driver does not protect OOB
> with ECC (i.e., for OOB-only reads, MTD_OPS_PLACE_OOB == MTD_OPS_RAW).
> So the following argument only applies when OOB is ECC-protected.
> 
> Consider a *good* block that is written with filesystem data. On
> bootup, Linux may scan this block's BBM to check if it is bad. If a
> bitflip occurs in the bad block marker, then it may be erroneously
> considered bad.
> 
> Similarly, if a block was marked bad from wear (not factory-marked),
> then its BBM may be written along with ECC protection. Then, when we
> scan for bad blocks, it will be protected from bitflips that could
> possibly cause 0x00 to appear non-zero. (This is not a big issue,
> since 'non-zero' is still bad, as long as 0x00 didn't flip to 0xff -
> quite unlikely...)
> 
>> PS : Did you have any comment on
>> http://thread.gmane.org/gmane.linux.drivers.mtd/42243 ?
> 
> I read it, and it seems promising. I agree with much of the premise
> (that nand_bbt.c is ugly and repetitive at times) but haven't had
> enough time to review properly. Sorry. I'm a bit backlogged and will
> be for a few weeks, I think. But I'll see what I can do.
> 
> Thanks,
> Brian
> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
>
Brian Norris Nov. 18, 2013, 6:36 p.m. UTC | #15
Hi Angus,

On Thu, Nov 07, 2013 at 02:56:50PM +0000, Angus Clark wrote:
> Firstly, apologies for dragging up an issue that dates back well over a year!

No problem. Some of these same issues have been nagging at me in the
back of my head, actually. I don't think this problem is solved
completely correctly, so it's good you bring it up!

> While preparing to upstream an out-of-tree NAND driver, we have fallen foul of
> the change from MTD_OPS_RAW to MTD_OPS_PLACE_OOB in nand_bbt.c:scan_read_oob().
> 
> One issue relates to what is at best a hack within our own driver, and that is
> for us to deal with :-)  However, I also have a concern that the patch could
> result in genuine bad blocks escaping detection.
> 
> As I understand it, the patch was attempting to address the following situation:
>     - NAND-resident BBTs are not used.
>     - The BBT is re-created on each boot by scanning for MBBM.
>     - A page write yields one or more bit-flips in the location used for the
> MBBM, resulting in non-0xff data being present.
>     - The non-0xff data is then misinterpreted as a MBBM on a subsequent boot,
> giving a false bad-block.
> 
> In cases where the ECC scheme covers the MBBM location, then I can see that
> enabling the ECC would cause the non-0xff data to be corrected, and therefore
> avoid the block being falsely identified as bad.

This is more or less the situation that was being addressed. But my
particular situation was actually targeting situations in which a BBT is
used "most of the time", but sometimes (for various reasons) the BBT may
be erased/corrupted and need rebuilt on an unclean flash. But it's a
similar scenario either way, IMO.

> However, I can also construct a situation where a genuine MBBM gets "corrected"
> to 0xff.  Consider, for example, an 8-bit ECC scheme covering the MBBM location,
> where the ECC for a sector of all 0xff data is also all 0xff.  In this case, a
> MBBM of 0x00, with the remaining data all 0xff, would get "corrected" to 0xff.
> Although perhaps a slightly contrived example, the S/W BCH ECC included in the
> kernel scheme can be driven in this way, and I have seen blocks marked as bad
> with this pattern in the past.

Yes, I suspected that this may be possible. But I never observed it
myself (I don't use SW ECC; I only use my HW ECC).

> It is difficult to know if your particular system could suffer in this way.  It
> all depends on the nature of your ECC scheme.  I guess my concern is that the
> patch deviates from what is recommended by the NAND manufacturers, and that it
> makes certain assumptions on how the ECC scheme operates.

My system cannot suffer in this way, because (unfortunately) my HW ECC
does not consider all-0xff to be valid ECC. So it cannot correct
bitflips in erased pages, nor can it "correct" a bad block marker from
0x00 to 0xff.

I'm not sure it's a deviation from manufacturer recommendations, since
manufacturers make no statement about what to do if the BBT fails. But
they do seem to imply that you should build the BBT once and never
revisit the bad block markers.

I agree that my patch does make a few assumptions about the ECC scheme.

> My own view is that the only safe way to record and track bad blocks is to use
> NAND-resident BBTs; after all, if a block is bad then there is no guarantee that
> an attempt to write a MBBM would succeed.  NAND-resident BBTs would also avoid
> the problem the patch was attempting fix in the first place.

I agree that BBTs should be used (I use them). However, they do not
solve my original problem completely: how can a BBT be rebuilt robustly?

I think that my original approach (use ECC while scanning BBMs) is not
100% correct, but I don't think dropping it is 100% correct either.
Rather, I would prefer taking some form of Matthieu's patch (linked in
the reply that you're bringing up from > 1 year ago!) so that we use the
'badblockbits' parameter appropriately. Then I would be fine with
scanning BBMs in RAW mode, and I think we can satisfy both my use case
and yours safely.

> On 07/13/2012 06:39 PM, Brian Norris wrote:
> > On Tue, Jul 10, 2012 at 12:45 AM, Matthieu CASTET <matthieu.castet@parrot.com> wrote:
> >> http://thread.gmane.org/gmane.linux.drivers.mtd/42243 ?

BTW, this patch came into discussion again recently for other reasons:

  http://lists.infradead.org/pipermail/linux-mtd/2013-November/049692.html

Anything more you can contribute to this general conversation would be
more than welcome! I think Ezequiel and I are still interested in
unifying some of nand_bbt and nand_base, regarding BBM scanning, and I
think we can resolve several pending issues by doing so.

Thanks,
Brian
diff mbox

Patch

diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
index 45cdb97..7b9a0a7 100644
--- a/drivers/mtd/nand/nand_bbt.c
+++ b/drivers/mtd/nand/nand_bbt.c
@@ -289,14 +289,24 @@  static int scan_read_data(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
 	return mtd_read(mtd, offs, len, &retlen, buf);
 }
 
-/* Scan read data from flash */
+/**
+ * scan_read_oob - [GENERIC] Scan data+OOB region to buffer
+ * @mtd: MTD device structure
+ * @buf: temporary buffer
+ * @offs: offset at which to scan
+ * @len: length of data region to read
+ *
+ * Scan read data from data+OOB. May traverse multiple pages, interleaving
+ * page,OOB,page,OOB,... in buf. Completes transfer and returns the "strongest"
+ * ECC condition (error or bitflip). May quit on the first (non-ECC) error.
+ */
 static int scan_read_oob(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
 			 size_t len)
 {
 	struct mtd_oob_ops ops;
-	int res;
+	int res, ret = 0;
 
-	ops.mode = MTD_OPS_RAW;
+	ops.mode = MTD_OPS_PLACE_OOB;
 	ops.ooboffs = 0;
 	ops.ooblen = mtd->oobsize;
 
@@ -306,15 +316,18 @@  static int scan_read_oob(struct mtd_info *mtd, uint8_t *buf, loff_t offs,
 		ops.oobbuf = buf + ops.len;
 
 		res = mtd_read_oob(mtd, offs, &ops);
-
-		if (res)
-			return res;
+		if (res) {
+			if (!mtd_is_bitflip_or_eccerr(res))
+				return res;
+			else if (mtd_is_eccerr(res) || !ret)
+				ret = res;
+		}
 
 		buf += mtd->oobsize + mtd->writesize;
 		len -= mtd->writesize;
 		offs += mtd->writesize;
 	}
-	return 0;
+	return ret;
 }
 
 static int scan_read(struct mtd_info *mtd, uint8_t *buf, loff_t offs,