Message ID | 4C61223F.30100@parrot.com |
---|---|
State | New, archived |
On Tue, 2010-08-10 at 11:56 +0200, Matthieu CASTET wrote:
> Hi,
>
>
> when running test with ubifs I found the following crash.
> One block is instable (some read fails with ecc error correctable or
> not) after a power cut. This is due to interrupted write or erase.
>
> Our test do first a read of the ubi volume (cat /dev/ubi3_0 > /dev/null)
> to force complete read of it.
>
> In this case ecc correctable is detected, and scrubbing is scheduled
> But ubi_eba_copy_leb: the block become uncorrectable and added to
> erroneous list.
> When mounting ubifs read doesn't check that it is erroneous and return data.
> It is added again for scrubbing, but prot_queue_del crash because we
> already remove it in the first scrubbing try.
>
> Here an attempt to fix the problem. This is ugly. I didn't try it yet. I
> erased my corrupted flash by accident.
>
> One other solution could be to add the test in ubi_wl_scrub_peb, but I
> don't think it is ok to return data on erroneous block.
>
> An other solution could be to unmap the block (read will return 0xff),
> but this may break upper layer ?

Matthieu, unfortunately I'm on holidays so cannot really look at this.
And I already have a lot of UBI/UBIFS issues waiting for me to look at.
I think I'll start looking at the things only in mid-September/October.
Sorry for this. But may be Adrian could take a look at this, if he has
some time? :-)

Artem.
Hi Artem,

Artem Bityutskiy wrote:
> On Tue, 2010-08-10 at 11:56 +0200, Matthieu CASTET wrote:
>> Hi,
>>
>
> Matthieu, unfortunately I'm on holidays so cannot really look at this.
> And I already have a lot of UBI/UBIFS issues waiting for me to look at.
> I think I'll start looking at the things only in mid-September/October.
> Sorry for this. But may be Adrian could take a look at this, if he has
> some time? :-)

I don't know if you have returned from holidays, but since you are posting on the ML I will post my further investigation.

I have done more tests on these flashes and got other failures.

The problem seems to be in the handling of interrupted writes. On some NAND we use, the page becomes unstable and reads can return unstable values. The manufacturer told us we should not use pages where a write was interrupted; they need an erase cycle before they can be used again.

On mounting, for pages where a write was interrupted by a power cut:
- I saw ECC errors. In this case ubifs should reject the page in recovery handling and everything should be fine.
- I saw correctable errors. In this case ubi moves the block, unless the next read in copy_page returns an ECC error. In case of an ECC error in the copy we see it too late: ubifs recovery is already done.
- In this case ubifs recovery can reject it if the data is not OK (bad CRC, ...). Note that in this case we did the scrubbing move for nothing.
- I saw pages that return correct data (ECC and CRC OK), but later return (un)correctable errors. Again this is too late [1]: recovery is already done.

It seems ubi/ubifs doesn't identify interrupted-write pages on scanning/mount ATM. It relies only on ECC/CRC, but this is not enough for unstable pages: they can be good (or have a 1-bit error) on one read and bad on the next read.

So the problem is to identify interrupted-write pages on scanning/mount.

For static volumes it should be easy with the interrupted flags.
There is the tricky case of data moves (for wear leveling or scrubbing): if the sqnum of the copy is the biggest, we should ignore it/copy it.

But for dynamic volumes/ubifs that's another story. Maybe using the ubi sqnum + the ubifs journal it would be possible to do something.

Matthieu

PS: the same story happens for erase, but ubi should handle that correctly.

[1]
[ 12.720244] UBIFS: un-mount UBI device 3, volume 0
[ 12.760056] UBIFS: mounted UBI device 3, volume 0, name "system"
[ 12.765919] UBIFS: file system size: 30601216 bytes (29884 KiB, 29 MiB, 241 LEBs)
[ 12.773642] UBIFS: journal size: 1523712 bytes (1488 KiB, 1 MiB, 12 LEBs)
[ 12.780868] UBIFS: media format: w4/r0 (latest is w4/r0)
[ 12.786668] UBIFS: default compressor: none
[ 12.790852] UBIFS: reserved for root: 1445370 bytes (1411 KiB)
writing file '//mnt/dir06/file0046.bin' num=70, size=147120
writing file '//mnt/dir0c/file006c.bin' num=108, size=288146
[ 13.491407] UBI error: ubi_io_read: error -74 while reading 60 bytes from PEB 106:129480, read 60 bytes
[ 13.500785] [<c00279f0>] (dump_stack+0x0/0x14) from [<c0161040>] (ubi_io_read+0xf0/0x258)
[ 13.508952] [<c0160f50>] (ubi_io_read+0x0/0x258) from [<c01603a0>] (ubi_eba_read_leb+0x1b4/0x490)
[ 13.517791] [<c01601ec>] (ubi_eba_read_leb+0x0/0x490) from [<c015e3f0>] (ubi_leb_read+0xe8/0x138)
[ 13.526649] [<c015e308>] (ubi_leb_read+0x0/0x138) from [<c00d0c48>] (ubifs_read_node+0x40/0x190)
[ 13.535423] r7:00000002 r6:00000000 r5:c78489a0 r4:c78489a0
[ 13.541065] [<c00d0c08>] (ubifs_read_node+0x0/0x190) from [<c00d18b8>] (ubifs_read_node_wbuf+0x4c/0x204)
[ 13.550547] [<c00d186c>] (ubifs_read_node_wbuf+0x0/0x204) from [<c00e6b60>] (ubifs_tnc_read_node+0x5c/0xf8)
[ 13.560274] [<c00e6b04>] (ubifs_tnc_read_node+0x0/0xf8) from [<c00d32a8>] (matches_name+0x94/0xdc)
[ 13.569218] [<c00d3214>] (matches_name+0x0/0xdc) from [<c00d3334>] (resolve_collision+0x44/0x204)
[ 13.578074] [<c00d32f0>] (resolve_collision+0x0/0x204) from [<c00d45e4>] (ubifs_tnc_remove_nm+0xf0/0x108)
[ 13.587615] [<c00d44f4>] (ubifs_tnc_remove_nm+0x0/0x108) from [<c00c7f08>] (ubifs_jnl_rename+0x4f8/0x70c)
[ 13.597169] [<c00c7a10>] (ubifs_jnl_rename+0x0/0x70c) from [<c00caaf8>] (ubifs_rename+0x2b0/0x5e4)
[ 13.606117] [<c00ca848>] (ubifs_rename+0x0/0x5e4) from [<c008581c>] (vfs_rename+0x238/0x270)
[ 13.614538] [<c00855e4>] (vfs_rename+0x0/0x270) from [<c0086e54>] (sys_renameat+0x1b8/0x1cc)
[ 13.622965] [<c0086c9c>] (sys_renameat+0x0/0x1cc) from [<c0086e8c>] (sys_rename+0x24/0x28)
[ 13.631213] [<c0086e68>] (sys_rename+0x0/0x28) from [<c0023c00>] (ret_fast_syscall+0x0/0x2c)
[ 13.639670] UBIFS error (pid 273): ubifs_read_node: bad node type (0 but expected 2)
[ 13.647371] UBIFS error (pid 273): ubifs_read_node: bad node at LEB 47:125384
[ 13.654514] UBIFS warning (pid 273): ubifs_ro_mode: switched to read-only mode, error -22
/endurance: endurance.c: 197: create_file: Assertion `status == 0' failed.
[ 46.357586] UBIFS error (pid 101): make_reservation: cannot reserve 160 bytes in jhead 1, error -30
[ 46.366503] UBIFS error (pid 101): ubifs_write_inode: can't write inode 19507, error -30
diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c
index 7fbe0d7..289c003 100644
--- a/drivers/mtd/ubi/eba.c
+++ b/drivers/mtd/ubi/eba.c
@@ -367,6 +367,7 @@ out_unlock:
  * returned for any volume type if an ECC error was detected by the MTD device
  * driver. Other negative error cored may be returned in case of other errors.
  */
+int in_wl_tree(struct ubi_wl_entry *e, struct rb_root *root);
 int ubi_eba_read_leb(struct ubi_device *ubi, struct ubi_volume *vol, int lnum,
 		     void *buf, int offset, int len, int check)
 {
@@ -392,6 +393,19 @@ int ubi_eba_read_leb(struct ubi_device *ubi, struct ubi_volume *vol, int lnum,
 		memset(buf, 0xFF, len);
 		return 0;
 	}
+	{
+		struct ubi_wl_entry *e;
+		int bad;
+
+		spin_lock(&ubi->wl_lock);
+		e = ubi->lookuptbl[pnum];
+		bad = in_wl_tree(e, &ubi->erroneous);
+		spin_unlock(&ubi->wl_lock);
+		/* we should not append to read bad block */
+		if (bad) {
+			return -EBADMSG;
+		}
+	}
 
 	dbg_eba("read %d bytes from offset %d of LEB %d:%d, PEB %d",
 		len, offset, vol_id, lnum, pnum);
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
index 10b6100..3c4c3ed 100644
--- a/drivers/mtd/ubi/wl.c
+++ b/drivers/mtd/ubi/wl.c
@@ -292,7 +292,7 @@ static int produce_free_peb(struct ubi_device *ubi)
  * This function returns non-zero if @e is in the @root RB-tree and zero if it
  * is not.
  */
-static int in_wl_tree(struct ubi_wl_entry *e, struct rb_root *root)
+/*static */int in_wl_tree(struct ubi_wl_entry *e, struct rb_root *root)
 {
 	struct rb_node *p;