diff mbox

flash bbt broken due to unitialized bitflip_threshold?

Message ID 4FD0E632.2080905@newsguy.com
State New, archived
Headers show

Commit Message

Mike Dunn June 7, 2012, 5:34 p.m. UTC
On 06/07/2012 12:36 AM, Artem Bityutskiy wrote:
> On Wed, 2012-06-06 at 19:55 +0200, Ivan Djelic wrote:
>> 1. on legacy systems with 1-bit nand and strength = 1, default bitflip_threshold is 1
>> 2. on legacy systems with 1-bit nand and strength > 1, default bitflip_threshold is 'strength'
>> 3. on new systems   with 2+ bit nand and strength > 1, default bitflip_threshold is 'strength'
> 
> Ivan, Shmulik,
> 
> I've gave this another though, and I think it is OK to leave this as it
> is now. Your replies were very helpful, thanks.


Yes, many thanks guys.  It seems I picked the wrong couple days to be away from
email.

This is not an issue on my docg4 because it does not use a flash-based BBT, but
instead scans the whole device for blocks that are marked bad in oob. EUCLEAN is
ignored in this case.  The following code is present in both scan_block_full()
and scan_block_fast():

	/* Ignore ECC errors when checking for BBM */
	if (ret && !mtd_is_bitflip_or_eccerr(ret))
		return ret;

Digging into this, it turns out this is a problem only in the case of:

  (1) nand->td != NULL (flash-based BBT present)
  (2) NAND_BBT_NO_OOB is not set

Here's the call stack for the above case, and with NAND_BBT_ABSPAGE not set
(this is true for the mxc_nand controller).  The problem occurs in
scan_read_raw_oob()...

nand_scan_bbt()
|
+-> search_read_bbts()                ignores return code
    |
    +-> search_bbt()                  always returns 1
        |
        +-> scan_read_raw()           -EUCLEAN propagated up
            |
            +-> scan_read_raw_oob()   returns without updating buf, len, offs
                |
                +-> mtd_read_oob()    -EUCLEAN returned


I addition to the patch suggested by Shmulik, I would also suggest the
following, in the interest of consistency with the bad block scanning code, and
also thoroughness:

                buf += mtd->oobsize + mtd->writesize;

Shmulik, please let me know if yuo'd like me to submit the patch you suggested,
and I will do so promptly.  Otherwise, thanks again!

More gory details... by comparison, here's the call stack for the same case,
except NAND_BBT_NO_OOB is set.  Here, there's no problem.

nand_scan_bbt()
|
+-> search_read_bbts()                ignores return code
    |
    +-> search_bbt()                  always returns 1
        |
        +-> scan_read_raw()           -EUCLEAN propagated up
            |
            +-> scan_read_raw_data()  -EUCLEAN propagated up
                |
                +-> mtd_read_oob()    -EUCLEAN returned


Thanks, and sorry for the oversight,
Mike

Comments

Shmulik Ladkani June 7, 2012, 9:07 p.m. UTC | #1
Hi Mike,

On Thu, 07 Jun 2012 10:34:42 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> I addition to the patch suggested by Shmulik, I would also suggest the
> following, in the interest of consistency with the bad block scanning code, and
> also thoroughness:
> 
> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
> index 30d1319..ed59aa8 100644
> --- a/drivers/mtd/nand/nand_bbt.c
> +++ b/drivers/mtd/nand/nand_bbt.c
> @@ -319,7 +319,7 @@ static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t
> *buf, loff_t offs,
> 
>                 res = mtd_read_oob(mtd, offs, &ops);
> 
> -               if (res)
> +               if (res && !mtd_is_bitflip_or_eccerr(res))
>                         return res;
> 
>                 buf += mtd->oobsize + mtd->writesize;

Saw you submitted it.

I'll review it thoroughly later on, didn't grasp it on first glance.

> Shmulik, please let me know if yuo'd like me to submit the patch you suggested,
> and I will do so promptly.  Otherwise, thanks again!

Thanks Mike.

I'll work on the patch, as I already digged further, and found two
more problematic spots where same "threshold initialization" is needed.
Obviously, your review would be much appreciated.
I'll probably submit during Friday or Sunday.

Artem, is the ETA ok with your merge plans?

Regards,
Shmulik
Shmulik Ladkani June 10, 2012, 7:08 a.m. UTC | #2
Hi,

On Thu, 07 Jun 2012 10:34:42 -0700 Mike Dunn <mikedunn@newsguy.com> wrote:
> I addition to the patch suggested by Shmulik, I would also suggest the
> following, in the interest of consistency with the bad block scanning code, and
> also thoroughness:
> 
> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
> index 30d1319..ed59aa8 100644
> --- a/drivers/mtd/nand/nand_bbt.c
> +++ b/drivers/mtd/nand/nand_bbt.c
> @@ -319,7 +319,7 @@ static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t
> *buf, loff_t offs,
> 
>                 res = mtd_read_oob(mtd, offs, &ops);
> 
> -               if (res)
> +               if (res && !mtd_is_bitflip_or_eccerr(res))
>                         return res;
> 
>                 buf += mtd->oobsize + mtd->writesize;

While I'm trying to figure out the issue this patch adresses, I found
out we have an inconsistent API, WRT reporting -EUCLEAN.

The 'mtd_read()' API is currently responsible for returning EUCLEAN,
after examining return value of the 'mtd->_read()' method, quote:

	ret_code = mtd->_read(mtd, from, len, retlen, buf);
	if (unlikely(ret_code < 0))
		return ret_code;
	if (mtd->ecc_strength == 0)
		return 0;	/* device lacks ecc */
	return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;

However, 'mtd_read_oob()' does NOT have the 'ret_code >= bitflip_threshold'
comparison, quote:

static inline int mtd_read_oob(struct mtd_info *mtd, loff_t from,
			       struct mtd_oob_ops *ops)
{
	ops->retlen = ops->oobretlen = 0;
	if (!mtd->_read_oob)
		return -EOPNOTSUPP;
	return mtd->_read_oob(mtd, from, ops);
}

Meaning, 'mtd_read_oob()' returns the direct result of the '_read_oob'
call to the MTD user.

Now let's examine what '_read_oob' returns in the nand case, implemented
in 'nand_read_oob'; There are 2 possible uses of 'mtd_read_oob':

1. Supplying NULL 'ops->datbuf', to indicate only the OOB needs to be
   read.
   Execution goes into 'nand_do_read_oob()', which may return EUCLEAN
   (independently of the bitflip_threshold).
   OK. This was discussed in [1].

2. Supplying a non-null 'ops->datbuf', to indicate read the page content
   along with its OOB.
   Execution goes into 'nand_do_read_ops', which returns max_bitflips.
   Not OK.

To summarize, return value of the 'mtd_read_oob()' API is
inconsistent (sometimes max_bitflips, other times EUCLEAN), and does not
adhere to return code policy of 'mtd_read()' - return EUCLEAN to the users,
they're not interested with internals such as max_bitflips.

This might affect users of 'mtd_read_oob()' which might falsely think
the OOB read has failed.

Mike, care to take a look and amend if necessary?
I think this needs to be fixed pre v3.5 as well.

Regards,
Shmulik

[1] http://lists.infradead.org/pipermail/linux-mtd/2012-May/041407.html
diff mbox

Patch

diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
index 30d1319..ed59aa8 100644
--- a/drivers/mtd/nand/nand_bbt.c
+++ b/drivers/mtd/nand/nand_bbt.c
@@ -319,7 +319,7 @@  static int scan_read_raw_oob(struct mtd_info *mtd, uint8_t
*buf, loff_t offs,

                res = mtd_read_oob(mtd, offs, &ops);

-               if (res)
+               if (res && !mtd_is_bitflip_or_eccerr(res))
                        return res;