Patchwork UBIFS Corrupt during power failure

Submitter Artem Bityutskiy
Date July 24, 2009, noon
Message ID <1248436825.24676.70.camel@localhost>
Permalink /patch/30202/
State New, archived

Comments

Artem Bityutskiy - July 24, 2009, noon
On Fri, 2009-07-24 at 09:49 +0300, Artem Bityutskiy wrote:
> Further tests revealed that the fix needs some tweaks.
> 
> I get the following error with the latest UBI containing that "write before
> erase" change:
> 
> UBI: background thread "ubi_bgt0d" started, PID 736
> MTD do_write_buffer(): software timeout
> UBI error: nor_erase_prepare: error -5 while writing 4 bytes to PEB 365:64, written 0 bytes
> Call Trace:
> [cfa07ec0] [c0006cf8] show_stack+0x44/0x16c (unreliable)
> [cfa07f00] [c01b8004] ubi_io_sync_erase+0x234/0x39c
> [cfa07f40] [c01b92d4] erase_worker+0x8c/0x4b8
> [cfa07f70] [c01b894c] do_work+0xac/0x124
> [cfa07f90] [c01b9b90] ubi_thread+0xc0/0x138
> [cfa07fd0] [c0038ab0] kthread+0x4c/0x88
> [cfa07ff0] [c000eb38] kernel_thread+0x4c/0x68
> 
> So apparently it is possible to put NOR into a state where you cannot write
> to it. I wonder if this is a SW bug or a HW feature.

OK, I've fixed this by:
1. Not panicking when the UBI headers cannot be invalidated because of
   write failures.
2. Reading the flash back and making sure the VID header is corrupted anyway.
3. If it is corrupted, everything is fine and we proceed.
4. If it is not, we panic. Presumably this should not happen, but I'm
   not sure. Anyway, these changes let me proceed further. The patches
   I used are inlined. I've also pushed them to ubi-2.6.git/master.

From 5088eafafc6c948a29b8371731efe373d034d989 Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Fri, 24 Jul 2009 15:31:33 +0300
Subject: [PATCH] UBI: introduce flash dump helper

Useful for debugging problems, compiled in only if UBI debugging
is enabled. This patch also makes the UBI writing function dump
the flash if it fails to write.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
 drivers/mtd/ubi/debug.c |   32 ++++++++++++++++++++++++++++++++
 drivers/mtd/ubi/debug.h |    2 ++
 drivers/mtd/ubi/io.c    |    1 +
 3 files changed, 35 insertions(+), 0 deletions(-)
Eric Holmberg - July 24, 2009, 1:39 p.m.
> More testing of NOR flash against power cuts showed that sometimes
> eraseblocks may be unwritable, and we cannot really invalidate
> them before erasure. But in this case the eraseblock probably
> contains garbage anyway, and we do not have to invalidate the
> headers. This assumption might not be true, but it is at least
> what I have observed. So if we cannot invalidate the headers,
> we make sure that the PEB does not contain a valid VID header.
> If this is true, everything is fine, otherwise we panic.

I haven't seen this case with my flash chips, and it seems like an odd
thing to happen, but it may be possible with other NOR chips.

I will apply your patches and run some more tests over the weekend.

Power Cycling Results
---------------------------
I just finished a cycle test using 2 boards and the latest patches from
2.6.27 and both boards were still running after a combined power-cycle
count of 11,178 while running a memory torture test - so the robustness
has been improved dramatically.

Random LEB 1 Corruption
--------------------------------
Out of the 11,178 power cycles, I received 12 random corruptions of LEB
1.  The file system recovers the block.  The offset into the LEB where
the error occurs changes.  This looks like it may be the remaining half
of the problem I used to have where I would get two LEB 1's and one
would be corrupt.  Artem's fix of corrupting the header may have solved
the fatal issue, but I would be interested to know what is causing this.

The kernel mounts the UBIFS file system read-only initially and then
remounts it for write access later in the boot process.  This is the
initial scan in read-only mode.

[42949375.180000] UBIFS error (pid 1): ubifs_check_node: bad CRC:
calculated 0xb6a46e9c, read 0xd46336ef
[42949375.190000] UBIFS error (pid 1): ubifs_check_node: bad node at LEB
1:72704
[42949375.200000] UBIFS error (pid 1): ubifs_scanned_corruption:
corruption at LEB 1:72704
[42949375.250000] UBIFS error (pid 1): ubifs_scan: LEB 1 scanning failed
[42949375.310000] UBIFS: recovery needed
[42949375.430000] UBIFS: recovery deferred
[42949375.440000] UBIFS: mounted UBI device 0, volume 0, name "rootfs"
[42949375.440000] UBIFS: mounted read-only
[42949375.450000] UBIFS: file system size:   27498240 bytes (26853 KiB,
26 MiB, 210 LEBs)
[42949375.450000] UBIFS: journal size:       1178497 bytes (1150 KiB, 1
MiB, 9 LEBs)
[42949375.460000] UBIFS: media format:       w4/r0 (latest is w4/r0)
[42949375.470000] UBIFS: default compressor: lzo
[42949375.470000] UBIFS: reserved for root:  0 bytes (0 KiB)

Truncated Files
------------------
I keep a boot count in two separate files on the file system and update
them for every boot.  This count is getting reset when it gets up to 100
or so.  Here is a stripped down version of the boot count script.  Do
you see anything obvious?  I will do more analysis soon to determine the
exact cause, but just wanted to get everybody's opinion.

# set count = max(/var/boot_count_a, /var/boot_count_b)

count=$(($count+1))

# Write out the new boot count (do files that don't exist first)
if [ ! -e /var/boot_count_a ]; then
  # a doesn't exist, write it first
  echo $count > /var/boot_count_a
  sync
  echo $count > /var/boot_count_b
  sync
else
  echo $count > /var/boot_count_b
  sync
  echo $count > /var/boot_count_a
  sync
fi
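For completeness, the elided "count = max(a, b)" step could be reconstructed roughly as follows. This is a sketch of my own, not Eric's actual script: the FILE_A/FILE_B variables stand in for /var/boot_count_a and /var/boot_count_b, and the read-with-default logic is an assumption.

```shell
#!/bin/sh
# Sketch of the elided max() step; FILE_A/FILE_B stand in for
# /var/boot_count_a and /var/boot_count_b (paths are assumptions).
FILE_A=${FILE_A:-/var/boot_count_a}
FILE_B=${FILE_B:-/var/boot_count_b}

# Read both counters, defaulting to 0 if a file is missing or empty.
a=$(cat "$FILE_A" 2>/dev/null); a=${a:-0}
b=$(cat "$FILE_B" 2>/dev/null); b=${b:-0}

# Take the larger of the two, so a single truncated or missing file
# cannot reset the boot count on its own.
if [ "$a" -gt "$b" ]; then
  count=$a
else
  count=$b
fi
count=$((count + 1))
echo "boot count: $count"
```

The point of the two-file scheme is that at least one copy should survive any single interrupted write; taking the max then picks the survivor.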
 

-Eric
Artem Bityutskiy - July 24, 2009, 2:55 p.m.
On 07/24/2009 04:39 PM, Eric Holmberg wrote:
> Power Cycling Results
> ---------------------------
> I just finished a cycle test using 2 boards and the latest patches from
> 2.6.27 and both boards were still running after a combined power-cycle
> count of 11,178 while running a memory torture test - so the robustness
> has been improved dramatically.

Good news!

> Random LEB 1 Corruption
> --------------------------------
> Out of the 11,178 power cycles, I received 12 random corruptions of LEB
> 1.  The file system recovers the block.  The offset into the LEB where
> the error occurs changes.  This looks like it may be the remaining half
> of the problem I used to have where I would get two LEB 1's and one
> would be corrupt.  Artem's fix of corrupting the header may have solved
> the fatal issue, but I would be interested to know what is causing this.

Well, if you cut the power while UBIFS is in the middle of writing a node,
you have a half-written node, and this is expected.

> The kernel mounts the UBIFS file system read-only initially and then
> remounts it for write access later in the boot process.  This is the
> initial scan in read-only mode.

UBIFS yells about the corrupted node in the journal. If the corrupted node
is in the tail of the journal, this is an expected result of an unclean
reboot. Probably UBIFS should be smarter and print something less scary
in this case. I've just put this on my TODO list.

Anyway, you are mounting read-only, so it yells and goes forward.
>
> [42949375.180000] UBIFS error (pid 1): ubifs_check_node: bad CRC:
> calculated 0xb6a46e9c, read 0xd46336ef
> [42949375.190000] UBIFS error (pid 1): ubifs_check_node: bad node at LEB
> 1:72704
> [42949375.200000] UBIFS error (pid 1): ubifs_scanned_corruption:
> corruption at LEB 1:72704
> [42949375.250000] UBIFS error (pid 1): ubifs_scan: LEB 1 scanning failed
> [42949375.310000] UBIFS: recovery needed
> [42949375.430000] UBIFS: recovery deferred
> [42949375.440000] UBIFS: mounted UBI device 0, volume 0, name "rootfs"
> [42949375.440000] UBIFS: mounted read-only
> [42949375.450000] UBIFS: file system size:   27498240 bytes (26853 KiB,
> 26 MiB, 210 LEBs)
> [42949375.450000] UBIFS: journal size:       1178497 bytes (1150 KiB, 1
> MiB, 9 LEBs)
> [42949375.460000] UBIFS: media format:       w4/r0 (latest is w4/r0)
> [42949375.470000] UBIFS: default compressor: lzo
> [42949375.470000] UBIFS: reserved for root:  0 bytes (0 KiB)

But now it is important to see what happens when you try to remount it R/W.
At this stage it will check whether the corruption is at the end or not.
If it is at the end, UBIFS will clean it up and go forward. If it is not
at the end, UBIFS will yell and refuse remounting.

IOW, if it remounts R/W, then this corruption is an expected result of
unclean reboots, nothing alarming.

Of course, to be absolutely sure, you should check your files after you
have mounted. This is tricky, though, and you should be very careful with
doing syncs, etc. But at least checking R/O files would not hurt.
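One purely illustrative way of doing that read-only check is to keep a checksum manifest of files that should never change at run time and verify it on each boot. The directory, file list, and manifest path below are my assumptions, not anything from this thread:

```shell
#!/bin/sh
# Illustrative post-mount integrity check for files that are never
# written at run time. RO_DIR, FILES, and MANIFEST are assumptions.
RO_DIR=${RO_DIR:-/etc}
FILES=${FILES:-fstab}
MANIFEST=${MANIFEST:-/tmp/ro-manifest.$$.md5}

if [ ! -f "$MANIFEST" ]; then
  # First boot: record checksums of the protected files.
  (cd "$RO_DIR" && md5sum $FILES 2>/dev/null) > "$MANIFEST"
  status=recorded
elif (cd "$RO_DIR" && md5sum -c "$MANIFEST" >/dev/null 2>&1); then
  status=ok
else
  # A mismatch here would mean corruption survived UBIFS recovery.
  status=corrupted
fi
echo "integrity: $status"
```

Since the protected files are never written, no special care with syncs is needed for this check; it only catches corruption of data that was already safely on the media.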

> Truncated Files
> ------------------
> I keep a boot count in two separate files on the file system and update
> them for every boot.  This count is getting reset when it gets up to 100
> or so.  Here is a stripped down version of the boot count script.  Do
> you see anything obvious?  I will do more analysis soon to determine the
> exact cause, but just wanted to get everybody's opinion.

Err, you have not really told us what the problem is.

> # set count = max(/var/boot_count_a, /var/boot_count_b)
>
> count=$(($count+1))
>
> # Write out the new boot count (do files that don't exist first)
> if [ ! -e /var/boot_count_a ]; then
>    # a doesn't exist, write it first
>    echo $count>  /var/boot_count_a
>    sync
>    echo $count>  /var/boot_count_b
>    sync
> else
>    echo $count>  /var/boot_count_b
>    sync
>    echo $count>  /var/boot_count_a
>    sync
> fi

If you never cut the power before sync has finished, then you should never
lose the contents of these files. If you do, please bug us.

BTW, the other alternative to try would be to set the "sync" flag for
these files; then you would not have to call 'sync', which is a bit
slower because it also causes a commit. I mean

if [ ! -e /var/boot_count_a ]; then
    # a doesn't exist, write it first
    echo $count>  /var/boot_count_a
    echo $count>  /var/boot_count_b
    chattr +S /var/boot_count_a
    chattr +S /var/boot_count_b
else
    echo $count>  /var/boot_count_b
    echo $count>  /var/boot_count_a
fi

I think this should behave the same, but be a bit more efficient. It would
be interesting to check whether it really works; I verified this, but a
very long time ago.

And just in case, here is some info:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_writeback

Patch

diff --git a/drivers/mtd/ubi/debug.c b/drivers/mtd/ubi/debug.c
index 54b0186..4876977 100644
--- a/drivers/mtd/ubi/debug.c
+++ b/drivers/mtd/ubi/debug.c
@@ -196,4 +196,36 @@  void ubi_dbg_dump_mkvol_req(const struct ubi_mkvol_req *req)
 	printk(KERN_DEBUG "\t1st 16 characters of name: %s\n", nm);
 }
 
+/**
+ * ubi_dbg_dump_flash - dump a region of flash.
+ * @ubi: UBI device description object
+ * @pnum: the physical eraseblock number to dump
+ * @offset: the starting offset within the physical eraseblock to dump
+ * @len: the length of the region to dump
+ */
+void ubi_dbg_dump_flash(struct ubi_device *ubi, int pnum, int offset, int len)
+{
+	int err;
+	size_t read;
+	void *buf;
+	loff_t addr = (loff_t)pnum * ubi->peb_size + offset;
+
+	buf = vmalloc(len);
+	if (!buf)
+		return;
+	err = ubi->mtd->read(ubi->mtd, addr, len, &read, buf);
+	if (err && err != -EUCLEAN) {
+		ubi_err("error %d while reading %d bytes from PEB %d:%d, "
+			"read %zd bytes", err, len, pnum, offset, read);
+		goto out;
+	}
+
+	dbg_msg("dumping %d bytes of data from PEB %d, offset %d",
+		len, pnum, offset);
+	print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_OFFSET, 32, 1, buf, len, 1);
+out:
+	vfree(buf);
+	return;
+}
+
 #endif /* CONFIG_MTD_UBI_DEBUG */
diff --git a/drivers/mtd/ubi/debug.h b/drivers/mtd/ubi/debug.h
index a4da7a0..f30bcb3 100644
--- a/drivers/mtd/ubi/debug.h
+++ b/drivers/mtd/ubi/debug.h
@@ -55,6 +55,7 @@  void ubi_dbg_dump_vtbl_record(const struct ubi_vtbl_record *r, int idx);
 void ubi_dbg_dump_sv(const struct ubi_scan_volume *sv);
 void ubi_dbg_dump_seb(const struct ubi_scan_leb *seb, int type);
 void ubi_dbg_dump_mkvol_req(const struct ubi_mkvol_req *req);
+void ubi_dbg_dump_flash(struct ubi_device *ubi, int pnum, int offset, int len);
 
 #ifdef CONFIG_MTD_UBI_DEBUG_MSG
 /* General debugging messages */
@@ -167,6 +168,7 @@  static inline int ubi_dbg_is_erase_failure(void)
 #define ubi_dbg_dump_sv(sv)              ({})
 #define ubi_dbg_dump_seb(seb, type)      ({})
 #define ubi_dbg_dump_mkvol_req(req)      ({})
+#define ubi_dbg_dump_flash(ubi, pnum, offset, len) ({})
 
 #define UBI_IO_DEBUG               0
 #define DBG_DISABLE_BGT            0
diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
index 4e7bcb2..b693138 100644
--- a/drivers/mtd/ubi/io.c
+++ b/drivers/mtd/ubi/io.c
@@ -269,6 +269,7 @@  int ubi_io_write(struct ubi_device *ubi, const void *buf, int pnum, int offset,
 		ubi_err("error %d while writing %d bytes to PEB %d:%d, written "
 			"%zd bytes", err, len, pnum, offset, written);
 		ubi_dbg_dump_stack();
+		ubi_dbg_dump_flash(ubi, pnum, offset, len);
 	} else
 		ubi_assert(written == len);
 
-- 
1.6.0.6


From d89852b4f75eec9d270d446b164ce12e77b50e50 Mon Sep 17 00:00:00 2001
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Date: Fri, 24 Jul 2009 16:18:04 +0300
Subject: [PATCH] UBI: improve NOR flash erasure quirk

More testing of NOR flash against power cuts showed that sometimes
eraseblocks may be unwritable, and we cannot really invalidate
them before erasure. But in this case the eraseblock probably
contains garbage anyway, and we do not have to invalidate the
headers. This assumption might not be true, but it is at least
what I have observed. So if we cannot invalidate the headers,
we make sure that the PEB does not contain a valid VID header.
If this is true, everything is fine, otherwise we panic.
---
 drivers/mtd/ubi/io.c |   48 ++++++++++++++++++++++++++++++++----------------
 1 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
index b693138..8aa51e7 100644
--- a/drivers/mtd/ubi/io.c
+++ b/drivers/mtd/ubi/io.c
@@ -476,30 +476,46 @@  out:
  */
 static int nor_erase_prepare(struct ubi_device *ubi, int pnum)
 {
-	int err;
+	int err, err1;
 	size_t written;
 	loff_t addr;
 	uint32_t data = 0;
+	struct ubi_vid_hdr vid_hdr;
 
 	addr = (loff_t)pnum * ubi->peb_size + ubi->vid_hdr_aloffset;
 	err = ubi->mtd->write(ubi->mtd, addr, 4, &written, (void *)&data);
-	if (err) {
-		ubi_err("error %d while writing 4 bytes to PEB %d:%d, written "
-			"%zd bytes", err, pnum, ubi->vid_hdr_aloffset, written);
-		ubi_dbg_dump_stack();
-		return err;
+	if (!err) {
+		addr -= ubi->vid_hdr_aloffset;
+		err = ubi->mtd->write(ubi->mtd, addr, 4, &written,
+				      (void *)&data);
+		if (!err)
+			return 0;
 	}
 
-	addr -= ubi->vid_hdr_aloffset;
-	err = ubi->mtd->write(ubi->mtd, addr, 4, &written, (void *)&data);
-	if (err) {
-		ubi_err("error %d while writing 4 bytes to PEB %d:%d, written "
-			"%zd bytes", err, pnum, 0, written);
-		ubi_dbg_dump_stack();
-		return err;
-	}
+	/*
+	 * We failed to write to the media. This was observed with Spansion
+	 * S29GL512N NOR flash. Most probably the eraseblock erasure was
+	 * interrupted at a very inappropriate moment, so it became unwritable.
+	 * In this case we probably anyway have garbage in this PEB.
+	 */
+	err1 = ubi_io_read_vid_hdr(ubi, pnum, &vid_hdr, 0);
+	if (err1 == UBI_IO_BAD_VID_HDR)
+		/*
+		 * The VID header is corrupted, so we can safely erase this
+		 * PEB and not afraid that it will be treated as a valid PEB in
+		 * case of an unclean reboot.
+		 */
+		return 0;
 
-	return 0;
+	/*
+	 * The PEB contains a valid VID header, but we cannot invalidate it.
+	 * Supposedly the flash media or the driver is screwed up, so return an
+	 * error.
+	 */
+	ubi_err("cannot invalidate PEB %d, write returned %d read returned %d",
+		pnum, err, err1);
+	ubi_dbg_dump_flash(ubi, pnum, 0, ubi->peb_size);
+	return -EIO;
 }
 
 /**