diff mbox

UBIFS seeing corrupt blank pages when image flashed via u-boot

Message ID 20980858CB6D3A4BAE95CA194937D5E73EA620BD@DBDE04.ent.ti.com
State Rejected
Headers show

Commit Message

pekon gupta Jan. 15, 2014, 9:29 p.m. UTC
Hi Artem,

>From: Artem Bityutskiy [mailto:artem.bityutskiy@linux.intel.com]
<snip>
>Conclusion: all UBIFS needs is a way to ask the driver - is this NAND
>page blank or not? UBIFS does not really has to compare to all 0xFFs.
>
Thanks for details. Yes, I understand the concept in general that you
want to recover last bit of user-data written on NAND (without corruption).

Now, as NAND driver itself does differentiation between and erased-page
v/s programmed-page. Can we use different error codes to pass this
information to upper layers like;

*For MTD layer*
0: data valid, length of data is determined by 'read_len'  (currently)
-EUCLEAN: correctable bit-flips found, data is valid
-EBADMSG: un-correctable bit-flips, data *may-be* invalid.
-ENODATA: detected erased-page. *Actual* data determined by read_len.
-ENOMSG:  detected erased-page with bit-flips. *Actual* data determined by read_len.

*For UBI layer*
We can convert the MTD error-codes into UBI specific error-codes at
ubi_io_read() interface. This way many checks in other parts of UBI
layer can be simplified, and even slightly speed-up scan_peb().

--------------------------
of MTD devices (NOR and OneNAND) ?


with regards, pekon

Comments

Artem Bityutskiy Jan. 16, 2014, 7:19 a.m. UTC | #1
On Wed, 2014-01-15 at 21:29 +0000, Gupta, Pekon wrote:
> Hi Artem,
> 
> >From: Artem Bityutskiy [mailto:artem.bityutskiy@linux.intel.com]
> <snip>
> >Conclusion: all UBIFS needs is a way to ask the driver - is this NAND
> >page blank or not? UBIFS does not really has to compare to all 0xFFs.
> >
> Thanks for details. Yes, I understand the concept in general that you
> want to recover last bit of user-data written on NAND (without corruption).
> 
> Now, as NAND driver itself does differentiation between and erased-page
> v/s programmed-page. Can we use different error codes to pass this
> information to upper layers like;

I thought the ECC is something which could be used to differentiate.


> *For MTD layer*
> 0: data valid, length of data is determined by 'read_len'  (currently)
> -EUCLEAN: correctable bit-flips found, data is valid
> -EBADMSG: un-correctable bit-flips, data *may-be* invalid.
> -ENODATA: detected erased-page. *Actual* data determined by read_len.
> -ENOMSG:  detected erased-page with bit-flips. *Actual* data determined by read_len.

Not sure this is a good idea. If NAND driver cannot do the
differentiation, then it should not be done by the MTD layer, I think.

Then just improve UBI and UBIFS and make the function which compares
buffers with all 0xFFs allow for bit-flips. We know the maximum possible
bit-flips per min. I/O unit, right? Just allow for that amount.
pekon gupta Jan. 16, 2014, 7:44 a.m. UTC | #2
Hi Artem,

>From: Artem Bityutskiy [mailto:artem.bityutskiy@linux.intel.com]
>>On Wed, 2014-01-15 at 21:29 +0000, Gupta, Pekon wrote:
>> Hi Artem,
>>
>> >From: Artem Bityutskiy [mailto:artem.bityutskiy@linux.intel.com]
>> <snip>
>> >Conclusion: all UBIFS needs is a way to ask the driver - is this NAND
>> >page blank or not? UBIFS does not really has to compare to all 0xFFs.
>> >
>> Thanks for details. Yes, I understand the concept in general that you
>> want to recover last bit of user-data written on NAND (without corruption).
>>
>> Now, as NAND driver itself does differentiation between and erased-page
>> v/s programmed-page. Can we use different error codes to pass this
>> information to upper layers like;
>
>I thought the ECC is something which could be used to differentiate.
>
(I think you confused this thread with other mail thread going with Brian,
 that one is purely about how to implement detection of erased-page).

However, *assuming NAND driver can identify erased-page correctly*,
I don't want  UBI/UBIFS to re-check the read_buf for 0xff again, because
 underlying NAND driver has already identified as erased-page, and
fixed the data before passing it to above MTD layer.

So, I'm proposing that NAND driver returns following error-codes
to indicate the results of its finding to MTD and UBI layers, so they
don't have to re-check data again.

>
>> *For MTD layer*
>> 0: data valid, length of data is determined by 'read_len'  (currently)
>> -EUCLEAN: correctable bit-flips found, data is valid
>> -EBADMSG: un-correctable bit-flips, data *may-be* invalid.
>> -ENODATA: detected erased-page. *Actual* data determined by read_len.
>> -ENOMSG:  detected erased-page with bit-flips. *Actual* data determined by read_len.
>
>Not sure this is a good idea. If NAND driver cannot do the
>differentiation, then it should not be done by the MTD layer, I think.
>

[...]

>Then just improve UBI and UBIFS and make the function which compares
>buffers with all 0xFFs allow for bit-flips. We know the maximum possible
>bit-flips per min. I/O unit, right? Just allow for that amount.
>
I think UBI/UBIFS layer should be kept independent of ECC details.
So, adding checks for (bit-flip count < ecc.strength) in UBI/UBIFS is not good.

My aim is that UBI/UBIFS should be able to classify its ubi_io_read() in 
any of the following baskets with least possible CPU cycles.
- no data (with or without correctable bit-flips)
- valid data (with or without correctable bit-flips)
- invalid data (un-correctable bit-flips)

with regards, pekon
Artem Bityutskiy Jan. 16, 2014, 8:22 a.m. UTC | #3
On Thu, 2014-01-16 at 07:44 +0000, Gupta, Pekon wrote:
> However, *assuming NAND driver can identify erased-page correctly*,
> I don't want  UBI/UBIFS to re-check the read_buf for 0xff again, because
>  underlying NAND driver has already identified as erased-page, and
> fixed the data before passing it to above MTD layer.

I would agree only if this identification costs nothing, or very little.
If for _some_ setups this would be about comparing buffers against 0xFF
for every single read - then I am so sure.

Indeed, UBIFS only needs to check for blank areas at the recovery time.
Normal reads do not need this. So _if_ there are measurable costs, I'd
argue that there is no need for people to pay it for nothing during
normal file I/O.

Hmm?

Thanks!
diff mbox

Patch

diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
index bf79def..9b011b9 100644
--- a/drivers/mtd/ubi/io.c
+++ b/drivers/mtd/ubi/io.c
@@ -168,18 +168,24 @@  retry:
        if (err) {
                const char *errstr = mtd_is_eccerr(err) ? " (ECC error)" : "";

-               if (mtd_is_bitflip(err)) {
-                       /*
-                        * -EUCLEAN is reported if there was a bit-flip which
-                        * was corrected, so this is harmless.
-                        *
-                        * We do not report about it here unless debugging is
-                        * enabled. A corresponding message will be printed
-                        * later, when it is has been scrubbed.
-                        */
+               switch (err) {
+               case -EUCLEAN:
                        ubi_msg("fixable bit-flip detected at PEB %d", pnum);
                        ubi_assert(len == read);
                        return UBI_IO_BITFLIPS;
+               case -ENODATA:
+                       if (read == 0)
+                               return UBI_IO_FF;
+                       else
+                               return 0;
+               case -ENOMSG:
+                       if (read == 0)
+                               return UBI_IO_FF_BITFLIPS;
+                       else
+                               return UBI_IO_BITFLIPS;
+               case -EBADMSG:
+               default:
+                       continue;
                }
--------------------------

Also, please consider, if this approach is even feasible for other types