diff mbox

mtd: nand: default bitflip-reporting threshold to 75% of correction strength

Message ID 1421095889-12717-1-git-send-email-computersforpeace@gmail.com
State Accepted
Commit 240181fd0ffa69cac08d6b06c94e843707370f5f
Headers show

Commit Message

Brian Norris Jan. 12, 2015, 8:51 p.m. UTC
The MTD API reports -EUCLEAN only if the maximum number of bitflips
found in any ECC block exceeds a certain threshold. This is done to
avoid excessive -EUCLEAN reports to MTD users, which may induce
additional scrubbing of data, even when the ECC algorithm in use is
perfectly capable of handling the bitflips.

This threshold can be controlled by user-space (via sysfs), to allow
users to determine what they are willing to tolerate in their
application. But it still helps to have sane defaults.

In recent discussion [1], it was pointed out that our default threshold
is equal to the correction strength. That means that we won't actually
report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
are almost too many to handle. It was determined that 3/4 of the
correction strength is probably a better default.

[1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
---
 drivers/mtd/nand/nand_base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Huang Shijie Jan. 13, 2015, 2:01 a.m. UTC | #1
On Mon, Jan 12, 2015 at 12:51:29PM -0800, Brian Norris wrote:
> The MTD API reports -EUCLEAN only if the maximum number of bitflips
> found in any ECC block exceeds a certain threshold. This is done to
> avoid excessive -EUCLEAN reports to MTD users, which may induce
> additional scrubbing of data, even when the ECC algorithm in use is
> perfectly capable of handling the bitflips.
> 
> This threshold can be controlled by user-space (via sysfs), to allow
> users to determine what they are willing to tolerate in their
> application. But it still helps to have sane defaults.
> 
> In recent discussion [1], it was pointed out that our default threshold
> is equal to the correction strength. That means that we won't actually
> report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
> are almost too many to handle. It was determined that 3/4 of the
> correction strength is probably a better default.
> 
> [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
> 
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
> ---
>  drivers/mtd/nand/nand_base.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 816b5c1fd416..3f24b587304f 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -4171,7 +4171,7 @@ int nand_scan_tail(struct mtd_info *mtd)
>  	 * properly set.
>  	 */
>  	if (!mtd->bitflip_threshold)
> -		mtd->bitflip_threshold = mtd->ecc_strength;
> +		mtd->bitflip_threshold = DIV_ROUND_UP(mtd->ecc_strength * 3, 4);
After this patch, we have to change the bitflip_threshold to
ecc_strength manually when we do the mtd_biterrors.ko test.
Anyway, I think this patch makes sense.

>  
>  	/* Check, if we should skip the bad block table scan */
>  	if (chip->options & NAND_SKIP_BBTSCAN)
> -- 
Acked-by: Huang Shijie <shijie.huang@intel.com>
Brian Norris Jan. 13, 2015, 2:38 a.m. UTC | #2
On Mon, Jan 12, 2015 at 6:01 PM, Huang Shijie <shijie.huang@intel.com> wrote:
> On Mon, Jan 12, 2015 at 12:51:29PM -0800, Brian Norris wrote:
>> --- a/drivers/mtd/nand/nand_base.c
>> +++ b/drivers/mtd/nand/nand_base.c
>> @@ -4171,7 +4171,7 @@ int nand_scan_tail(struct mtd_info *mtd)
>>        * properly set.
>>        */
>>       if (!mtd->bitflip_threshold)
>> -             mtd->bitflip_threshold = mtd->ecc_strength;
>> +             mtd->bitflip_threshold = DIV_ROUND_UP(mtd->ecc_strength * 3, 4);
> After this patch, we have to change the bitflip_threshold to
> ecc_strength manually when we do the mtd_biterrors.ko test.

Why?

Brian
Huang Shijie Jan. 13, 2015, 2:56 a.m. UTC | #3
On Mon, Jan 12, 2015 at 06:38:49PM -0800, Brian Norris wrote:
> On Mon, Jan 12, 2015 at 6:01 PM, Huang Shijie <shijie.huang@intel.com> wrote:
> > On Mon, Jan 12, 2015 at 12:51:29PM -0800, Brian Norris wrote:
> >> --- a/drivers/mtd/nand/nand_base.c
> >> +++ b/drivers/mtd/nand/nand_base.c
> >> @@ -4171,7 +4171,7 @@ int nand_scan_tail(struct mtd_info *mtd)
> >>        * properly set.
> >>        */
> >>       if (!mtd->bitflip_threshold)
> >> -             mtd->bitflip_threshold = mtd->ecc_strength;
> >> +             mtd->bitflip_threshold = DIV_ROUND_UP(mtd->ecc_strength * 3, 4);
> > After this patch, we have to change the bitflip_threshold to
> > ecc_strength manually when we do the mtd_biterrors.ko test.
> 
> Why?
sorry, I misunderstood the mtd_biterrors code.
It is fine after this patch.

thanks
Huang Shijie
Richard Weinberger Jan. 13, 2015, 1:25 p.m. UTC | #4
Am 12.01.2015 um 21:51 schrieb Brian Norris:
> The MTD API reports -EUCLEAN only if the maximum number of bitflips
> found in any ECC block exceeds a certain threshold. This is done to
> avoid excessive -EUCLEAN reports to MTD users, which may induce
> additional scrubbing of data, even when the ECC algorithm in use is
> perfectly capable of handling the bitflips.
> 
> This threshold can be controlled by user-space (via sysfs), to allow
> users to determine what they are willing to tolerate in their
> application. But it still helps to have sane defaults.
> 
> In recent discussion [1], it was pointed out that our default threshold
> is equal to the correction strength. That means that we won't actually
> report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
> are almost too many to handle. It was determined that 3/4 of the
> correction strength is probably a better default.
> 
> [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html

I like this change but I have one question.

UBI will treat a block as bad if it shows bitflips (EUCLEAN) right
after erasure. For SLC NAND this works very well.
Does this also hold for MLC NAND? If one or two bit flips are okay
even for a freshly erased MLC NAND this change could cause UBI to
mark good blocks as bad depending on the ECC strength.

Thanks,
//richard
Brian Norris Jan. 13, 2015, 6:48 p.m. UTC | #5
Hi Richard,

On Tue, Jan 13, 2015 at 02:25:30PM +0100, Richard Weinberger wrote:
> Am 12.01.2015 um 21:51 schrieb Brian Norris:
> > The MTD API reports -EUCLEAN only if the maximum number of bitflips
> > found in any ECC block exceeds a certain threshold. This is done to
> > avoid excessive -EUCLEAN reports to MTD users, which may induce
> > additional scrubbing of data, even when the ECC algorithm in use is
> > perfectly capable of handling the bitflips.
> > 
> > This threshold can be controlled by user-space (via sysfs), to allow
> > users to determine what they are willing to tolerate in their
> > application. But it still helps to have sane defaults.
> > 
> > In recent discussion [1], it was pointed out that our default threshold
> > is equal to the correction strength. That means that we won't actually
> > report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
> > are almost too many to handle. It was determined that 3/4 of the
> > correction strength is probably a better default.
> > 
> > [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
> 
> I like this change but I have one question.
> 
> UBI will treat a block as bad if it shows bitflips (EUCLEAN) right
> after erasure.

Can you elaborate? When "after erasure"? The closest I see is that UBI
will mark a block bad if it sees an -EIO failure from sync_erase() in
erase_worker(). If you have extra debug checks on, then
ubi_self_check_all_ff() could potentially give you bitflip problems
after the erase, but that's an odd corner case anyway, which many
drivers have been handling in hacked together ad-hoc ways anyway (search
for "bitflips in erase pages").

So I can't pinpoint what you're talking about, exactly.

> For SLC NAND this works very well.
> Does this also hold for MLC NAND? If one or two bit flips are okay
> even for a freshly erased MLC NAND this change could cause UBI to
> mark good blocks as bad depending on the ECC strength.

I would typically assume that MLC NAND users must be using significantly
stronger ECC (e.g., 12-bit / 512-byte, at least), so "one or two
bitflips" would still fall well under 75% of 12 bits.

Brian
Richard Weinberger Jan. 13, 2015, 6:51 p.m. UTC | #6
Brian,

Am 13.01.2015 um 19:48 schrieb Brian Norris:
> Hi Richard,
> 
> On Tue, Jan 13, 2015 at 02:25:30PM +0100, Richard Weinberger wrote:
>> Am 12.01.2015 um 21:51 schrieb Brian Norris:
>>> The MTD API reports -EUCLEAN only if the maximum number of bitflips
>>> found in any ECC block exceeds a certain threshold. This is done to
>>> avoid excessive -EUCLEAN reports to MTD users, which may induce
>>> additional scrubbing of data, even when the ECC algorithm in use is
>>> perfectly capable of handling the bitflips.
>>>
>>> This threshold can be controlled by user-space (via sysfs), to allow
>>> users to determine what they are willing to tolerate in their
>>> application. But it still helps to have sane defaults.
>>>
>>> In recent discussion [1], it was pointed out that our default threshold
>>> is equal to the correction strength. That means that we won't actually
>>> report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
>>> are almost too many to handle. It was determined that 3/4 of the
>>> correction strength is probably a better default.
>>>
>>> [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
>>
>> I like this change but I have one question.
>>
>> UBI will treat a block as bad if it shows bitflips (EUCLEAN) right
>> after erasure.
> 
> Can you elaborate? When "after erasure"? The closest I see is that UBI
> will mark a block bad if it sees an -EIO failure from sync_erase() in
> erase_worker(). If you have extra debug checks on, then
> ubi_self_check_all_ff() could potentially give you bitflip problems
> after the erase, but that's an odd corner case anyway, which many
> drivers have been handling in hacked together ad-hoc ways anyway (search
> for "bitflips in erase pages").
> 
> So I can't pinpoint what you're talking about, exactly.

See torture_peb()
out:
        mutex_unlock(&ubi->buf_mutex);
        if (err == UBI_IO_BITFLIPS || mtd_is_eccerr(err)) {
                /*
                 * If a bit-flip or data integrity error was detected, the test
                 * has not passed because it happened on a freshly erased
                 * physical eraseblock which means something is wrong with it.
                 */
                ubi_err(ubi, "read problems on freshly erased PEB %d, must be bad",
                        pnum);
                err = -EIO;
        }

>> For SLC NAND this works very well.
>> Does this also hold for MLC NAND? If one or two bit flips are okay
>> even for a freshly erased MLC NAND this change could cause UBI to
>> mark good blocks as bad depending on the ECC strength.
> 
> I would typically assume that MLC NAND users must be using significantly
> stronger ECC (e.g., 12-bit / 512-byte, at least), so "one or two
> bitflips" would still fall well under 75% of 12 bits.

Same here. I just want to make sure that UBI does not assume a perfect NAND world. :)

Thanks,
//richard
Brian Norris Jan. 13, 2015, 7:51 p.m. UTC | #7
Hi Richard,

On Tue, Jan 13, 2015 at 07:51:40PM +0100, Richard Weinberger wrote:
> Am 13.01.2015 um 19:48 schrieb Brian Norris:
> > On Tue, Jan 13, 2015 at 02:25:30PM +0100, Richard Weinberger wrote:
> >> Am 12.01.2015 um 21:51 schrieb Brian Norris:
> >>> The MTD API reports -EUCLEAN only if the maximum number of bitflips
> >>> found in any ECC block exceeds a certain threshold. This is done to
> >>> avoid excessive -EUCLEAN reports to MTD users, which may induce
> >>> additional scrubbing of data, even when the ECC algorithm in use is
> >>> perfectly capable of handling the bitflips.
> >>>
> >>> This threshold can be controlled by user-space (via sysfs), to allow
> >>> users to determine what they are willing to tolerate in their
> >>> application. But it still helps to have sane defaults.
> >>>
> >>> In recent discussion [1], it was pointed out that our default threshold
> >>> is equal to the correction strength. That means that we won't actually
> >>> report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
> >>> are almost too many to handle. It was determined that 3/4 of the
> >>> correction strength is probably a better default.
> >>>
> >>> [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
> >>
> >> I like this change but I have one question.
> >>
> >> UBI will treat a block as bad if it shows bitflips (EUCLEAN) right
> >> after erasure.
> > 
> > Can you elaborate? When "after erasure"? The closest I see is that UBI
> > will mark a block bad if it sees an -EIO failure from sync_erase() in
> > erase_worker(). If you have extra debug checks on, then
> > ubi_self_check_all_ff() could potentially give you bitflip problems
> > after the erase, but that's an odd corner case anyway, which many
> > drivers have been handling in hacked together ad-hoc ways anyway (search
> > for "bitflips in erase pages").
> > 
> > So I can't pinpoint what you're talking about, exactly.
> 
> See torture_peb()
> out:
>         mutex_unlock(&ubi->buf_mutex);
>         if (err == UBI_IO_BITFLIPS || mtd_is_eccerr(err)) {
>                 /*
>                  * If a bit-flip or data integrity error was detected, the test
>                  * has not passed because it happened on a freshly erased
>                  * physical eraseblock which means something is wrong with it.
>                  */
>                 ubi_err(ubi, "read problems on freshly erased PEB %d, must be bad",
>                         pnum);
>                 err = -EIO;
>         }

Thanks for refreshing my memory.

This is actually the same function that people have been tripping over
already, especially this:

		/* Make sure the PEB contains only 0xFF bytes */
		err = ubi_io_read(ubi, ubi->peb_buf, pnum, 0, ubi->peb_size);
		if (err)
			goto out;

		err = ubi_check_pattern(ubi->peb_buf, 0xFF, ubi->peb_size);
		if (err == 0) {
			ubi_err(ubi, "erased PEB %d, but a non-0xFF byte found",
				pnum);
			err = -EIO;
			goto out;
		}

The problems here are mostly with drivers that don't (or do a bad job
of) correcting bitflips in blank pages. But there's never been an issue
of more than a few bitflips, I don't think.

So I guess we're focusing on this chunk?

		/* Write a pattern and check it */
		memset(ubi->peb_buf, patterns[i], ubi->peb_size);
		err = ubi_io_write(ubi, ubi->peb_buf, pnum, 0, ubi->peb_size);
		if (err)
			goto out;

		memset(ubi->peb_buf, ~patterns[i], ubi->peb_size);
		err = ubi_io_read(ubi, ubi->peb_buf, pnum, 0, ubi->peb_size);
		if (err)
			goto out;

I wouldn't expect that even MLC NAND would see 75% of its spec'd
bitflips immediately after erase/program/read-verify. But I'll admit I
haven't heavily tested this.

Also interesting to consider: should we make use of the knowledge of the
manufacturer-specified minimum correction info, when available? See
nand_chip::ecc_strength_ds and nand_chip::ecc_step_ds. These are pulled
from the parmeter page (ONFI or JEDEC) or the full-ID table. Note that
some systems might overprovision their ECC strength, but it may still be
worth reporting based on the datasheet specifications. Not sure of all
the implications of doing that automatically, though.

> >> For SLC NAND this works very well.
> >> Does this also hold for MLC NAND? If one or two bit flips are okay
> >> even for a freshly erased MLC NAND this change could cause UBI to
> >> mark good blocks as bad depending on the ECC strength.
> > 
> > I would typically assume that MLC NAND users must be using significantly
> > stronger ECC (e.g., 12-bit / 512-byte, at least), so "one or two
> > bitflips" would still fall well under 75% of 12 bits.
> 
> Same here. I just want to make sure that UBI does not assume a perfect NAND world. :)

At some point the responsibility is on the user. Except for the open
problems on bitflips in erased pages [1], torture_peb() still seems to look
OK. But if there are further issues related to this threshold, we might
need to adjust UBI for a less-perfect world. e.g., notice the difference
between MTD_NANDFLASH and MTD_MLCNANDFLASH.

Recall this threshold parameter is already user-settable, and paranoid
users are likely to already set it to 75% or less anyway. So UBI should
not try to inflict too much unnecessary damage.

Brian

[1] Mostly a MTD/NAND layer problem, although it's arguably a UBI issue
    that it cares about something flash manufacturers never guaranteed;
    that an erased block contains all 0xff.
Boris Brezillon Jan. 17, 2015, 7:01 p.m. UTC | #8
Hi Brian,

On Mon, 12 Jan 2015 12:51:29 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> The MTD API reports -EUCLEAN only if the maximum number of bitflips
> found in any ECC block exceeds a certain threshold. This is done to
> avoid excessive -EUCLEAN reports to MTD users, which may induce
> additional scrubbing of data, even when the ECC algorithm in use is
> perfectly capable of handling the bitflips.
> 
> This threshold can be controlled by user-space (via sysfs), to allow
> users to determine what they are willing to tolerate in their
> application. But it still helps to have sane defaults.
> 
> In recent discussion [1], it was pointed out that our default threshold
> is equal to the correction strength. That means that we won't actually
> report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
> are almost too many to handle. It was determined that 3/4 of the
> correction strength is probably a better default.
> 
> [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
> 
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
> ---
>  drivers/mtd/nand/nand_base.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 816b5c1fd416..3f24b587304f 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -4171,7 +4171,7 @@ int nand_scan_tail(struct mtd_info *mtd)
>  	 * properly set.
>  	 */
>  	if (!mtd->bitflip_threshold)
> -		mtd->bitflip_threshold = mtd->ecc_strength;
> +		mtd->bitflip_threshold = DIV_ROUND_UP(mtd->ecc_strength * 3, 4);

Just sharing my experience with MLC NANDs requiring read-retry: the
number of reported bitflips often raise ecc_strength value (at least
with the current read-retry approach).
This patch will definitely make UBI move NAND blocks over and over
again considering the threshold has been raised and the block is not
reliable anymore.

While I like the idea of limiting the threshold to something smaller
than what's recommended on the datasheet (or reported by ONFI) I wonder
if it won't make things worst in some cases.

Regarding the read-retry code, it currently stops retrying reading the
page once the page has been successfully retrieved (or in other terms
all bitflips have been fixed). But it might stop to soon, because by
changing the bit level threshold (in other term retrying one more time)
it might successfully read the page with less bitflips than the
previous attempt (these are just supposition, I haven't tested it yet).
If we can achieve that we could retry until we reach something below
the bitflips threshold value, and if we fail to find any, just consider
the lower number of bitflips found during those read-retry operations.

Best Regards,

Boris
Richard Weinberger Jan. 17, 2015, 7:26 p.m. UTC | #9
Am 17.01.2015 um 20:01 schrieb Boris Brezillon:
> Just sharing my experience with MLC NANDs requiring read-retry: the
> number of reported bitflips often raise ecc_strength value (at least
> with the current read-retry approach).
> This patch will definitely make UBI move NAND blocks over and over
> again considering the threshold has been raised and the block is not
> reliable anymore.

Within the last 6 months I had to face a lot of strange UBI/MTD issues.
All showed one "flaw" in UBI, namely that it was designed with good SLC
NANDs in mind.
Even some modern SLC NANDs show bad behavior like read disturb after
less than 100000 reads.
I think it is time to bite the bullet and improve UBI wrt. MLC NAND.
This is not an easy task as it needs some hardware to play with and
time/budget. But I think it is worth the effort.

Thanks,
//richard
Boris Brezillon Jan. 17, 2015, 7:42 p.m. UTC | #10
On Sat, 17 Jan 2015 20:26:44 +0100
Richard Weinberger <richard@nod.at> wrote:

> Am 17.01.2015 um 20:01 schrieb Boris Brezillon:
> > Just sharing my experience with MLC NANDs requiring read-retry: the
> > number of reported bitflips often raise ecc_strength value (at least
> > with the current read-retry approach).
> > This patch will definitely make UBI move NAND blocks over and over
> > again considering the threshold has been raised and the block is not
> > reliable anymore.
> 
> Within the last 6 months I had to face a lot of strange UBI/MTD issues.
> All showed one "flaw" in UBI, namely that it was designed with good SLC
> NANDs in mind.
> Even some modern SLC NANDs show bad behavior like read disturb after
> less than 100000 reads.
> I think it is time to bite the bullet and improve UBI wrt. MLC NAND.
> This is not an easy task as it needs some hardware to play with and
> time/budget. But I think it is worth the effort.

I do all my MLC tests on a cubietruck (embedding an Allwinner A20 SoC
and a Micron MLC NAND).
I already started to work on randomizer/scrambler support (which are
needed on some MLC chips), and added support for read-retry on a Micron
non-ONFI NAND (you can find my work here [1], but it's not ready to be
mainlined yet).
But these are all things we can handle in the NAND layer.

Then comes trickier parts, like improved bitflips robustness (as
you stated), paired pages handling (you cannot reliably write on one
page without risking to corrupt the page it is paired with, which
implies specific handling for such cases in upper layers: UBI/UBIFS ?),
and surely other things I don't remember :-).

Anyway, I'd be happy to help with any of these tasks.

Best Regards,

Boris

[1]https://github.com/bbrezillon/linux-sunxi/commits/sunxi-nand-next
Richard Weinberger Jan. 17, 2015, 7:54 p.m. UTC | #11
Am 17.01.2015 um 20:42 schrieb Boris Brezillon:
> On Sat, 17 Jan 2015 20:26:44 +0100
> Richard Weinberger <richard@nod.at> wrote:
> 
>> Am 17.01.2015 um 20:01 schrieb Boris Brezillon:
>>> Just sharing my experience with MLC NANDs requiring read-retry: the
>>> number of reported bitflips often raise ecc_strength value (at least
>>> with the current read-retry approach).
>>> This patch will definitely make UBI move NAND blocks over and over
>>> again considering the threshold has been raised and the block is not
>>> reliable anymore.
>>
>> Within the last 6 months I had to face a lot of strange UBI/MTD issues.
>> All showed one "flaw" in UBI, namely that it was designed with good SLC
>> NANDs in mind.
>> Even some modern SLC NANDs show bad behavior like read disturb after
>> less than 100000 reads.
>> I think it is time to bite the bullet and improve UBI wrt. MLC NAND.
>> This is not an easy task as it needs some hardware to play with and
>> time/budget. But I think it is worth the effort.
> 
> I do all my MLC tests on a cubietruck (embedding an Allwinner A20 SoC
> and a Micron MLC NAND).

Maybe I should get me one of these boards.
Despite I'm not really a fan of sunxi.

> I already started to work on randomizer/scrambler support (which are
> needed on some MLC chips), and added support for read-retry on a Micron
> non-ONFI NAND (you can find my work here [1], but it's not ready to be
> mainlined yet).
> But these are all things we can handle in the NAND layer.

Yep.

> Then comes trickier parts, like improved bitflips robustness (as
> you stated), paired pages handling (you cannot reliably write on one
> page without risking to corrupt the page it is paired with, which
> implies specific handling for such cases in upper layers: UBI/UBIFS ?),
> and surely other things I don't remember :-).

Unstable bits for example need also handling.
I really would like to get some input from NAND vendors what they want
us to solve in software.

I'm currently working on a solution for UBI to deal better with
read disturb. Within the next few week I hopefully have something sane
to share. :)

> Anyway, I'd be happy to help with any of these tasks.

Good to know!

Thanks,
//richard
Brian Norris Jan. 21, 2015, 7:45 a.m. UTC | #12
On Mon, Jan 12, 2015 at 12:51:29PM -0800, Brian Norris wrote:
> The MTD API reports -EUCLEAN only if the maximum number of bitflips
> found in any ECC block exceeds a certain threshold. This is done to
> avoid excessive -EUCLEAN reports to MTD users, which may induce
> additional scrubbing of data, even when the ECC algorithm in use is
> perfectly capable of handling the bitflips.
> 
> This threshold can be controlled by user-space (via sysfs), to allow
> users to determine what they are willing to tolerate in their
> application. But it still helps to have sane defaults.
> 
> In recent discussion [1], it was pointed out that our default threshold
> is equal to the correction strength. That means that we won't actually
> report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
> are almost too many to handle. It was determined that 3/4 of the
> correction strength is probably a better default.
> 
> [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
> 
> Signed-off-by: Brian Norris <computersforpeace@gmail.com>

Pushed to l2-mtd.git.

Brian
Brian Norris Jan. 21, 2015, 8:22 a.m. UTC | #13
I'm sorry, I just noticed your reply as I was applying the patch. I can
back it out if we find a real objection.

On Sat, Jan 17, 2015 at 08:01:37PM +0100, Boris Brezillon wrote:
> On Mon, 12 Jan 2015 12:51:29 -0800
> Brian Norris <computersforpeace@gmail.com> wrote:
> > The MTD API reports -EUCLEAN only if the maximum number of bitflips
> > found in any ECC block exceeds a certain threshold. This is done to
> > avoid excessive -EUCLEAN reports to MTD users, which may induce
> > additional scrubbing of data, even when the ECC algorithm in use is
> > perfectly capable of handling the bitflips.
> > 
> > This threshold can be controlled by user-space (via sysfs), to allow
> > users to determine what they are willing to tolerate in their
> > application. But it still helps to have sane defaults.
> > 
> > In recent discussion [1], it was pointed out that our default threshold
> > is equal to the correction strength. That means that we won't actually
> > report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
> > are almost too many to handle. It was determined that 3/4 of the
> > correction strength is probably a better default.
> > 
> > [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
> > 
> > Signed-off-by: Brian Norris <computersforpeace@gmail.com>
> > ---
> >  drivers/mtd/nand/nand_base.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> > index 816b5c1fd416..3f24b587304f 100644
> > --- a/drivers/mtd/nand/nand_base.c
> > +++ b/drivers/mtd/nand/nand_base.c
> > @@ -4171,7 +4171,7 @@ int nand_scan_tail(struct mtd_info *mtd)
> >  	 * properly set.
> >  	 */
> >  	if (!mtd->bitflip_threshold)
> > -		mtd->bitflip_threshold = mtd->ecc_strength;
> > +		mtd->bitflip_threshold = DIV_ROUND_UP(mtd->ecc_strength * 3, 4);
> 
> Just sharing my experience with MLC NANDs requiring read-retry: the
> number of reported bitflips often raise ecc_strength value (at least
> with the current read-retry approach).

I did not have fun when testing a few MLC NAND which required read
retry. I ended up recommending that most of our customers move off of it
onto another solution where possible. But yes, I can understand your
issue.

> This patch will definitely make UBI move NAND blocks over and over
> again considering the threshold has been raised and the block is not
> reliable anymore.

Personally, we found that we just needed to lower the
MTD_UBI_WL_THRESHOLD and let UBI move away from blocks with high bitflip
counts. I suppose we essentially treat the block as bad.

> While I like the idea of limiting the threshold to something smaller
> than what's recommended on the datasheet (or reported by ONFI) I wonder
> if it won't make things worst in some cases.

I wouldn't exactly say that there is any threshold recommended on the
datasheet or in ONFI. They simply specify a required correction
strength, with no word about any intermediate handling -- what do they
expect software to do when bitflips exceed the ECC strength? We
immediately lose data. So we need to preemptively move such data.

So I don't think it's a good idea at all to say that
threshold == ecc_strength. That renders your ECC nearly useless w.r.t.
the original design of UBI. UBI intends to use -EUCLEAN as a signal that
high # of bitflips. I would suggest that 23 bitflips on a 24-bit
correction algorithm is a "high # of bitflips." So as I see it, this
patch is just restoring that UBI assumption.

> Regarding the read-retry code, it currently stops retrying reading the
> page once the page has been successfully retrieved (or in other terms
> all bitflips have been fixed). But it might stop to soon, because by
> changing the bit level threshold (in other term retrying one more time)
> it might successfully read the page with less bitflips than the
> previous attempt (these are just supposition, I haven't tested it yet).
> If we can achieve that we could retry until we reach something below
> the bitflips threshold value, and if we fail to find any, just consider
> the lower number of bitflips found during those read-retry operations.

I believe I suggested scenarios like this to some flash vendors when
speaking to reps in person, but they didn't seem to consider that
likely. I think they were implying that there would be only one read
retry mode that gives a reasonable result. I'm not sure if they were
really the experts on that particular topic, though, or if they were
just giving me an answer to make me happy.

Honestly, there's a lot of work that goes into MLC NAND for use by
serious applications. For instance, SSD controller vendors, and even
eMMC makers, use plenty of low-level knowledge that we simply do not
have. From whatever I've gleaned about Toshiba MLC NAND (I don't think
they even try to sell much raw NAND outside of eMMC and SSD solutions),
they actually expose fine-grained voltage threshold controls through
their (non-standard, obviously) command interface, and their clients
likely use this to chart a targeted plan on how to adjust thresholds
over the lifetime of a block, rather than just using a dumb sequential
search like we do.

So anyway, that drifted a little away from the topic at hand. Are you
suggesting that we should not change this default in nand_base, because
of potential issues with highly-unreliable NAND that need read-retry? I
can back out the patch while we finish this discussion, although I'm not
very convinced so far.

Brian
Boris Brezillon Jan. 21, 2015, 8:42 a.m. UTC | #14
Hi Brian,

On Wed, 21 Jan 2015 00:22:57 -0800
Brian Norris <computersforpeace@gmail.com> wrote:

> I'm sorry, I just noticed your reply as I was applying the patch. I can
> back it out if we find a real objection.

No, I'm fine with this patch, I was just trying to draw attention on
several problems I noticed while working on MLC NANDs.
We'll need to sort this out at some point, but I won't prevent sane
SLC/MLC NANDs from having this ECC threshold changed for something that
is not well supported yet ;-).

> 
> On Sat, Jan 17, 2015 at 08:01:37PM +0100, Boris Brezillon wrote:
> > On Mon, 12 Jan 2015 12:51:29 -0800
> > Brian Norris <computersforpeace@gmail.com> wrote:
> > > The MTD API reports -EUCLEAN only if the maximum number of bitflips
> > > found in any ECC block exceeds a certain threshold. This is done to
> > > avoid excessive -EUCLEAN reports to MTD users, which may induce
> > > additional scrubbing of data, even when the ECC algorithm in use is
> > > perfectly capable of handling the bitflips.
> > > 
> > > This threshold can be controlled by user-space (via sysfs), to allow
> > > users to determine what they are willing to tolerate in their
> > > application. But it still helps to have sane defaults.
> > > 
> > > In recent discussion [1], it was pointed out that our default threshold
> > > is equal to the correction strength. That means that we won't actually
> > > report any -EUCLEAN (i.e., "bitflips were corrected") errors until there
> > > are almost too many to handle. It was determined that 3/4 of the
> > > correction strength is probably a better default.
> > > 
> > > [1] http://lists.infradead.org/pipermail/linux-mtd/2015-January/057259.html
> > > 
> > > Signed-off-by: Brian Norris <computersforpeace@gmail.com>
> > > ---
> > >  drivers/mtd/nand/nand_base.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> > > index 816b5c1fd416..3f24b587304f 100644
> > > --- a/drivers/mtd/nand/nand_base.c
> > > +++ b/drivers/mtd/nand/nand_base.c
> > > @@ -4171,7 +4171,7 @@ int nand_scan_tail(struct mtd_info *mtd)
> > >  	 * properly set.
> > >  	 */
> > >  	if (!mtd->bitflip_threshold)
> > > -		mtd->bitflip_threshold = mtd->ecc_strength;
> > > +		mtd->bitflip_threshold = DIV_ROUND_UP(mtd->ecc_strength * 3, 4);
> > 
> > Just sharing my experience with MLC NANDs requiring read-retry: the
> > number of reported bitflips often raise ecc_strength value (at least
> > with the current read-retry approach).
> 
> I did not have fun when testing a few MLC NAND which required read
> retry. I ended up recommending that most of our customers move off of it
> onto another solution where possible. But yes, I can understand your
> issue.

Neither did I, but that's the world we live in, and I guess board
manufacturers will keep using crapy MLC chips so this is definitely
something we have to work on.

> 
> > This patch will definitely make UBI move NAND blocks over and over
> > again considering the threshold has been raised and the block is not
> > reliable anymore.
> 
> Personally, we found that we just needed to lower the
> MTD_UBI_WL_THRESHOLD and let UBI move away from blocks with high bitflip
> counts. I suppose we essentially treat the block as bad.

The problem I have is that a lot of them reach this ECC threshold limit
(at least with the current implementation).

> 
> > While I like the idea of limiting the threshold to something smaller
> > than what's recommended on the datasheet (or reported by ONFI) I wonder
> > if it won't make things worst in some cases.
> 
> I wouldn't exactly say that there is any threshold recommended on the
> datasheet or in ONFI. They simply specify a required correction
> strength,

Yes, that's what I was talking about.

> with no word about any intermediate handling -- what do they
> expect software to do when bitflips exceed the ECC strength? We
> immediately lose data. So we need to preemptively move such data.

Yep, I agree.

> 
> So I don't think it's a good idea at all to say that
> threshold == ecc_strength. That renders your ECC nearly useless w.r.t.
> the original design of UBI. UBI intends to use -EUCLEAN as a signal that
> high # of bitflips. I would suggest that 23 bitflips on a 24-bit
> correction algorithm is a "high # of bitflips." So as I see it, this
> patch is just restoring that UBI assumption.

Yes.

> 
> > Regarding the read-retry code, it currently stops retrying reading the
> > page once the page has been successfully retrieved (or in other terms
> > all bitflips have been fixed). But it might stop to soon, because by
> > changing the bit level threshold (in other term retrying one more time)
> > it might successfully read the page with less bitflips than the
> > previous attempt (these are just supposition, I haven't tested it yet).
> > If we can achieve that we could retry until we reach something below
> > the bitflips threshold value, and if we fail to find any, just consider
> > the lower number of bitflips found during those read-retry operations.
> 
> I believe I suggested scenarios like this to some flash vendors when
> speaking to reps in person, but they didn't seem to consider that
> likely. I think they were implying that there would be only one read
> retry mode that gives a reasonable result. I'm not sure if they were
> really the experts on that particular topic, though, or if they were
> just giving me an answer to make me happy.

Okay, good to know. I'll try to do some more testing to verify that.

> 
> Honestly, there's a lot of work that goes into MLC NAND for use by
> serious applications. For instance, SSD controller vendors, and even
> eMMC makers, use plenty of low-level knowledge that we simply do not
> have. From whatever I've gleaned about Toshiba MLC NAND (I don't think
> they even try to sell much raw NAND outside of eMMC and SSD solutions),
> they actually expose fine-grained voltage threshold controls through
> their (non-standard, obviously) command interface, and their clients
> likely use this to chart a targeted plan on how to adjust thresholds
> over the lifetime of a block, rather than just using a dumb sequential
> search like we do.

Yep, but they do sell these unreliable MLC chips (at least Micron and
Hynix do) to board manufacturers, and a lot of people would really like
to use a mainline kernel on these boards.

> 
> So anyway, that drifted a little away from the topic at hand. Are you
> suggesting that we should not change this default in nand_base, because
> of potential issues with highly-unreliable NAND that need read-retry? I
> can back out the patch while we finish this discussion, although I'm not
> very convinced so far.

As I said I don't expect you to back it out, just discussing the MLC
chip problems in an almost unrelated thread ;-).

Best Regards,

Boris
Boris Brezillon Feb. 10, 2015, 1:50 p.m. UTC | #15
On Wed, 21 Jan 2015 09:42:57 +0100
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> Hi Brian,
> 
> On Wed, 21 Jan 2015 00:22:57 -0800
> Brian Norris <computersforpeace@gmail.com> wrote:

[...]

> 
> > 
> > > Regarding the read-retry code, it currently stops retrying reading the
> > > page once the page has been successfully retrieved (or in other terms
> > > all bitflips have been fixed). But it might stop to soon, because by
> > > changing the bit level threshold (in other term retrying one more time)
> > > it might successfully read the page with less bitflips than the
> > > previous attempt (these are just supposition, I haven't tested it yet).
> > > If we can achieve that we could retry until we reach something below
> > > the bitflips threshold value, and if we fail to find any, just consider
> > > the lower number of bitflips found during those read-retry operations.
> > 
> > I believe I suggested scenarios like this to some flash vendors when
> > speaking to reps in person, but they didn't seem to consider that
> > likely. I think they were implying that there would be only one read
> > retry mode that gives a reasonable result. I'm not sure if they were
> > really the experts on that particular topic, though, or if they were
> > just giving me an answer to make me happy.
> 
> Okay, good to know. I'll try to do some more testing to verify that.

I did some more test on my cubietruck, trying other read-retry if the
threshold limit is reached (here is the code [1]), and it seems that
better read-retry mode are found in most cases (actually in all the
cases I encountered: see those traces [2]).

Note that I configured the bitflips_threshold to 3/4 of the
ecc-strength (exactly what you're doing in this patch).

Given these results I really think we should consider testing other
'read modes' if the succeeding one exceed the threshold value.

Best Regards,

Boris

[1]http://code.bulix.org/lvcs9x-87859
[2]http://code.bulix.org/xii8nw-87860
diff mbox

Patch

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 816b5c1fd416..3f24b587304f 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -4171,7 +4171,7 @@  int nand_scan_tail(struct mtd_info *mtd)
 	 * properly set.
 	 */
 	if (!mtd->bitflip_threshold)
-		mtd->bitflip_threshold = mtd->ecc_strength;
+		mtd->bitflip_threshold = DIV_ROUND_UP(mtd->ecc_strength * 3, 4);
 
 	/* Check, if we should skip the bad block table scan */
 	if (chip->options & NAND_SKIP_BBTSCAN)