Message ID | 1287396125-1890-1-git-send-email-dedekind1@gmail.com |
---|---|
State | New, archived |
On Mon, 2010-10-18 at 13:02 +0300, Artem Bityutskiy wrote:
> From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
>
> Describe a problem reported by Matthieu CASTET which is currently
> not handled by UBIFS.
>
> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>

Matthieu, are you happy with this description? Does it properly reflect
your findings? Could you please correct it, if not?

I'm starting to work on your problem. Since I do not have much time,
I'll do a little every day, but I hope to come up with some patches this
week already. The thing is that it is a lot of work. We need to go
through a lot of UBI/UBIFS subsystems and analyze them.

Why a lot of work? Because we assumed everywhere that we can rely on the
CRC - if it is correct, we are safe. However, according to you this is not
reliable for unstable pages - you have no guarantee that the next time
you read the page you will get correct data.

Also, I do not have HW to test this, so I expect you to help by testing.
Are your testing set-ups kept ready? :-)

Hi Artem,

Artem Bityutskiy wrote:
> On Mon, 2010-10-18 at 13:02 +0300, Artem Bityutskiy wrote:
>> From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
>>
>> Describe a problem reported by Matthieu CASTET which is currently
>> not handled by UBIFS.
>>
>> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
>
> Matthiew, are you happy with this description? Does it properly reflect
> your findings? Could you please correct, if not?

Yes, that seems correct.

>
> I'm starting working on your problem. Since I do not have much time,
> I'll do a little everyday, but hope to come up with some patches this
> week already. The thing is that it is a lot of work. We need to go
> through a lot of UBI/UBIFS subsystems and analyze them.
>
> Why a lot of work? Because we assumed everywhere we can rely on CRC - if
> it is correct, we are safe. However, according to you this is not
> reliable for unstable pages - you do not have guarantee that next time
> you read it you will get correct data.
>
> Also, I do not have HW to test this, so I expect you to help by testing,
> are your testing set-ups kept ready? :-)
>

Yes, our boards are ready to test things. We can also send you flash
chips or boards that show the problem.

What flash/board do you have on your side? Could you swap the NAND on
your board (via a TSOP socket)?
We could send one of our boards, but the update side can be
complex/tricky.

Some BeagleBoards may have the problem, but I am unable to test it. On
the BeagleBoard I have, I got strange ECC errors [1] even without a
reboot. Also, the driver looks strange (for example, it doesn't do bad
block scanning [2]). I ended up with an unusable NAND [3].
Do you know if there is a better version of the NAND driver for the
Beagle (I use the one from ubi-2.6)?


Matthieu


[1]
UBI error: ubi_io_read: error -74 (ECC error) while reading 4144 bytes from PEB 3:45056, read 4144 bytes
[...]
UBI error: do_sync_erase: cannot erase PEB 137, error -5

[2] for each format I got
ubiformat: formatting eraseblock 137 -- 53 % complete
ubiformat: error!: failed to erase eraseblock 137 error 5 (Input/output error)
ubiformat: marking block 137 bad

[3]
# ubiformat /dev/mtd3 -y
ubiformat: mtd3 (nand), size 33554432 bytes (32.0 MiB), 256 eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
libscan: scanning eraseblock 255 -- 100 % complete
ubiformat: 255 eraseblocks have valid erase counter, mean value is 10
ubiformat: 1 bad eraseblocks found, numbers: 137
ubiformat: warning!: VID header and data offsets on flash are 2048 and 4096, which is different to requested offsets 512 and 28
ubiformat: use new offsets 512 and 2048? (yes/no) yes
ubiformat: use offsets 512 and 2048
ubiformat: formatting eraseblock 255 -- 100 % complete
# ubiattach /dev/ubi_ctrl -m 3 -d 3
[ 166.922119] UBI: attaching mtd3 to ubi3
[ 166.926177] UBI: physical eraseblock size: 131072 bytes (128 KiB)
[ 166.932495] UBI: logical eraseblock size: 129024 bytes
[ 166.937927] UBI: smallest flash I/O unit: 2048
[ 166.942657] UBI: sub-page size: 512
[ 166.947326] UBI: VID header offset: 512 (aligned 512)
[ 166.953186] UBI: data offset: 2048
[ 166.958740] Correcting single bit ECC error at offset: 389, bit: 3
[ 167.137695] UBI: max. sequence number: 0
[ 167.142883] Correcting single bit ECC error at offset: 340, bit: 6
[ 167.149108] ecc failure
[ 167.151580] Correcting single bit ECC error at offset: 12, bit: 6
[ 167.158325] ecc failure
[ 167.160797] ecc failure
[ 167.163269] Correcting single bit ECC error at offset: 44, bit: 6
[ 167.170013] ecc failure
[ 167.172485] ecc failure
[ 167.175567] ecc failure
[ 167.178039] ecc failure
[ 167.181121] ecc failure
[ 167.183593] ecc failure
[ 167.186645] ecc failure
[ 167.189147] ecc failure
[ 167.192199] Correcting single bit ECC error at offset: 188, bit: 6
[ 167.198455] Correcting single bit ECC error at offset: 196, bit: 6
[ 167.205291] Correcting single bit ECC error at offset: 220, bit: 6
[ 167.211517] Correcting single bit ECC error at offset: 228, bit: 6
[ 167.218353] Correcting single bit ECC error at offset: 252, bit: 6
[ 167.224578] Correcting single bit ECC error at offset: 260, bit: 6
[ 167.231445] Correcting single bit ECC error at offset: 284, bit: 6
[ 167.237670] Correcting single bit ECC error at offset: 292, bit: 6
[ 167.244537] Correcting single bit ECC error at offset: 316, bit: 6
[ 167.250762] Correcting single bit ECC error at offset: 324, bit: 6
[ 167.256988] UBI error: ubi_io_read: error -74 (ECC error) while reading 22528 bytes from PEB 0:2048, read 22528 bytes
[ 167.267700] [<c0034d5c>] (unwind_backtrace+0x0/0xf4) from [<c01db4dc>] (ubi_io_read+0x1b0/0x340)
[ 167.276580] [<c01db4dc>] (ubi_io_read+0x1b0/0x340) from [<c01d1728>] (ubi_read_volume_table+0xbc/0xa44)
[ 167.286071] [<c01d1728>] (ubi_read_volume_table+0xbc/0xa44) from [<c01d537c>] (ubi_attach_mtd_dev+0x674/0xcd0)
[ 167.296173] [<c01d537c>] (ubi_attach_mtd_dev+0x674/0xcd0) from [<c01d5b80>] (ctrl_cdev_ioctl+0xec/0x164)
[ 167.305725] [<c01d5b80>] (ctrl_cdev_ioctl+0xec/0x164) from [<c00d63d0>] (do_vfs_ioctl+0x7c/0x5f8)
[ 167.314697] [<c00d63d0>] (do_vfs_ioctl+0x7c/0x5f8) from [<c00d6984>] (sys_ioctl+0x38/0x60)
[ 167.323028] [<c00d6984>] (sys_ioctl+0x38/0x60) from [<c00300c0>] (ret_fast_syscall+0x0/0x30)
[ 167.332214] Correcting single bit ECC error at offset: 340, bit: 6
[ 167.338470] ecc failure
[ 167.340942] Correcting single bit ECC error at offset: 12, bit: 6
[ 167.347686] ecc failure
[ 167.350158] ecc failure
[ 167.352630] Correcting single bit ECC error at offset: 44, bit: 6
[ 167.359375] ecc failure
[ 167.361846] ecc failure
[ 167.364929] ecc failure
[ 167.367401] ecc failure
[ 167.370452] ecc failure
[ 167.372955] ecc failure
[ 167.376007] ecc failure
[ 167.378479] ecc failure
[ 167.381561] Correcting single bit ECC error at offset: 188, bit: 6
[ 167.387786] Correcting single bit ECC error at offset: 196, bit: 6
[ 167.394653] Correcting single bit ECC error at offset: 220, bit: 6
[ 167.400848] Correcting single bit ECC error at offset: 228, bit: 6
[ 167.407714] Correcting single bit ECC error at offset: 252, bit: 6
[ 167.413940] Correcting single bit ECC error at offset: 260, bit: 6
[ 167.420806] Correcting single bit ECC error at offset: 284, bit: 6
[ 167.427032] Correcting single bit ECC error at offset: 292, bit: 6
[ 167.433868] Correcting single bit ECC error at offset: 316, bit: 6
[ 167.440124] Correcting single bit ECC error at offset: 324, bit: 6
[ 167.446350] UBI error: ubi_io_read: error -74 (ECC error) while reading 22528 bytes from PEB 1:2048, read 22528 bytes
[ 167.457031] [<c0034d5c>] (unwind_backtrace+0x0/0xf4) from [<c01db4dc>] (ubi_io_read+0x1b0/0x340)
[ 167.465911] [<c01db4dc>] (ubi_io_read+0x1b0/0x340) from [<c01d1728>] (ubi_read_volume_table+0xbc/0xa44)
[ 167.475402] [<c01d1728>] (ubi_read_volume_table+0xbc/0xa44) from [<c01d537c>] (ubi_attach_mtd_dev+0x674/0xcd0)
[ 167.485473] [<c01d537c>] (ubi_attach_mtd_dev+0x674/0xcd0) from [<c01d5b80>] (ctrl_cdev_ioctl+0xec/0x164)
[ 167.495056] [<c01d5b80>] (ctrl_cdev_ioctl+0xec/0x164) from [<c00d63d0>] (do_vfs_ioctl+0x7c/0x5f8)
[ 167.503997] [<c00d63d0>] (do_vfs_ioctl+0x7c/0x5f8) from [<c00d6984>] (sys_ioctl+0x38/0x60)
[ 167.512329] [<c00d6984>] (sys_ioctl+0x38/0x60) from [<c00300c0>] (ret_fast_syscall+0x0/0x30)
[ 167.520874] UBI error: vtbl_check: bad CRC at record 1: 0xf116c36b, not 0xb116c36b
[ 167.528594] UBI error: vtbl_check: bad CRC at record 1: 0xf116c36b, not 0xb116c36b
[ 167.536285] UBI error: process_lvol: both volume tables are corrupted
[ 167.542877] UBI error: ubi_attach_mtd_dev: failed to attach by scanning, error -22
ubiattach: error!: cannot attach mtd3 error 22 (Invalid argument)

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index eed0fcf..e04d74a 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -32,6 +32,28 @@
  * larger is the journal, the more memory its index may consume.
  */
 
+/*
+ * Problem description: unstable pages after unclean power cut on NAND flashes.
+ *
+ * If a power cut happens when we have ongoing NAND page program, this page
+ * becomes unstable. The following situations are possible when we mount this
+ * flash next time and UBIFS reads the page.
+ *   o The page may look like it is empty, i.e., it contains only 0xFFs, but
+ *     we write data there, the data becomes corrupted. I.e., when the data are
+ *     read, we may get a ECC errors. Moreover, the page may be read with no
+ *     errors sometimes, with an ECC error next time, with a bit-flip next
+ *     time, etc.
+ *   o The page may have bit-flip, but when it is read next time, it may have
+ *     ECC errors or no errors at all.
+ *   o An UBIFS node may have correct CRC, but when it is read next time, it
+ *     may have CRC error.
+ *
+ * IOW, these unstable pages are disaster. UBIFS has to handle them correctly:
+ * never write to them and never rely on their contents.
+ *
+ * TODO: handle this for buds, log, orphan area, and master area.
+ */
+
 #include "ubifs.h"
 
 /*
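
As an illustration of the rule stated in the comment above (never write to unstable pages and never rely on their contents), here is a minimal C sketch. It is not UBIFS code and not part of the patch: `looks_empty()`, `safe_to_program()` and the `maybe_interrupted` flag are hypothetical names invented for this example. The only point it tries to make is that a page reading back as all 0xFFs should not, on its own, be taken as proof that the page can be programmed when a page program may have been cut short by a power cut.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if every byte of the buffer is 0xFF, i.e. the page looks erased. */
bool looks_empty(const uint8_t *buf, size_t len)
{
	assert(buf != NULL);

	for (size_t i = 0; i < len; i++)
		if (buf[i] != 0xFF)
			return false;
	return true;
}

/*
 * Hypothetical policy helper (an assumption for illustration, not UBIFS code).
 *
 * 'maybe_interrupted' is true when a program operation on this page could
 * have been in flight during an unclean power cut. In that case the page is
 * treated as unstable and never programmed, even if it reads back as all
 * 0xFFs, because the next read may return different data.
 */
bool safe_to_program(const uint8_t *buf, size_t len, bool maybe_interrupted)
{
	if (maybe_interrupted)
		return false;
	return looks_empty(buf, len);
}
```

How the caller would know that a page may have been interrupted is deliberately left out here; that is the part which requires going through the individual UBI/UBIFS subsystems.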