diff mbox series

ubifs: Don't discard nodes in recovery when ecc err detected

Message ID 1582293853-136727-1-git-send-email-chengzhihao1@huawei.com
State Not Applicable
Headers show
Series ubifs: Don't discard nodes in recovery when ecc err detected | expand

Commit Message

Zhihao Cheng Feb. 21, 2020, 2:04 p.m. UTC
The following process will lead TNC to find no corresponding inode node
(Reproduce method see Link):

  1. Garbage collection.
     1) move valid inode nodes from leb A to leb B
        (The leb number of B has been written as GC type bud node in log)
     2) unmap leb A, and corresponding peb is erased
        (GCed inode nodes exist only on leb B)
  2. Poweroff. A node near the end of the LEB is corrupted before power
     on, which is uncorrectable error of ECC.
  3. Replay and return success.
     1) replay_buds -> bud for leb B is the last GC type record in log
     2) ubifs_recover_leb:
        For SCANNED_A_CORRUPT_NODE error caused by node_B, UBIFS will
        discard node_A ... node_C, such nodes are in the same max_io unit.

               corrupt
                  ↓
        node_A  node_B ... node_C
        XXXXXXXXXXXXXXXXXXXXXXXXX  XXXXXX...   ← Replay GC LEB
        |←   max_write_size    →|
   4. Finding a missing inode node triggers an error, as follows:
      [ 2276.992557] UBIFS error (ubi0:0 pid 2509): ubifs_read_node
      [ubifs]: bad node type (255 but expected 2)
      [ 2276.996814] UBIFS error (ubi0:0 pid 2509): ubifs_read_node
      [ubifs]: bad node at LEB 15:73728, LEB mapping status 0
      [ 2276.998439] Not a node, first 24 bytes:
      [ 2276.998442] 00000000: ff ff ff ff ff ff ff ff ff ff ff ff

Fix this by returning fail when scan bad data with ecc error detected in
ubifs_recover_leb(). This reduces the fault tolerance of the file system.
In the case of pad node ecc error at the end of LEB without affecting the
file system data, it also returns fail, but the probability is very low.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: <Stable@vger.kernel.org>
Fixes: 1e51764a3c2ac05a23a22b2a95 ("UBIFS: add new flash file system")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206625
---
 fs/ubifs/recovery.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

Comments

Richard Weinberger March 1, 2020, 8:46 p.m. UTC | #1
Zhihao Cheng,

On Fri, Feb 21, 2020 at 2:57 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
>
> The following process will lead TNC to find no corresponding inode node
> (Reproduce method see Link):

Please help me to understand what exactly is going on.

>   1. Garbage collection.
>      1) move valid inode nodes from leb A to leb B
>         (The leb number of B has been written as GC type bud node in log)
>      2) unmap leb A, and corresponding peb is erased
>         (GCed inode nodes exist only on leb B)

At this point all valid nodes are written to LEB B, right?

>   2. Poweroff. A node near the end of the LEB is corrupted before power
>      on, which is uncorrectable error of ECC.

If writing nodes to B has finished, these pages should be stable.
How can a power-cut affect the pages where these valid nodes sit?
Zhihao Cheng March 2, 2020, 3:58 a.m. UTC | #2
在 2020/3/2 4:46, Richard Weinberger 写道:
> Zhihao Cheng,
>
> On Fri, Feb 21, 2020 at 2:57 PM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
>> The following process will lead TNC to find no corresponding inode node
>> (Reproduce method see Link):
> Please help me to understand what exactly is going on.
>
>>    1. Garbage collection.
>>       1) move valid inode nodes from leb A to leb B
>>          (The leb number of B has been written as GC type bud node in log)
>>       2) unmap leb A, and corresponding peb is erased
>>          (GCed inode nodes exist only on leb B)
> At this point all valid nodes are written to LEB B, right?
Yes.
>
>>    2. Poweroff. A node near the end of the LEB is corrupted before power
>>       on, which is uncorrectable error of ECC.
> If writing nodes to B has finished, these pages should be stable.
> How can a power-cut affect the pages where these valid nodes sit?
I mean, the uncorrectable ECC error is caused by hardware which may lead 
to corrupted nodes detected in UBIFS. I found uncorretable ECC errors on 
my NAND, in the environment of high temperature and humidity.

At present, UBIFS ignores all EBADMSG errors, so the corrupted node is 
only considered in being caused by unfinished writing. I think UBIFS 
should consider the corrupted area caused by ECC errors in process 
ubifs_recover_leb(). no_more_nodes() will skip a read-write unit. Maybe 
the corrupted area is skipped.
Richard Weinberger March 2, 2020, 9:14 p.m. UTC | #3
On Mon, Mar 2, 2020 at 4:58 AM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
> I mean, the uncorrectable ECC error is caused by hardware which may lead
> to corrupted nodes detected in UBIFS. I found uncorretable ECC errors on
> my NAND, in the environment of high temperature and humidity.
>
> At present, UBIFS ignores all EBADMSG errors, so the corrupted node is
> only considered in being caused by unfinished writing. I think UBIFS
> should consider the corrupted area caused by ECC errors in process
> ubifs_recover_leb(). no_more_nodes() will skip a read-write unit. Maybe
> the corrupted area is skipped.

Well, if your NAND data is corrupted by your environment UBIFS cannot
do much. Sure, we can paper over some places but at the end of the day
you will always lose.

What if the UBI VID header becomes unreadable or the root node of the
index tree?
Zhihao Cheng March 3, 2020, 6:13 a.m. UTC | #4
在 2020/3/3 5:14, Richard Weinberger 写道:
> On Mon, Mar 2, 2020 at 4:58 AM Zhihao Cheng <chengzhihao1@huawei.com> wrote:
>> I mean, the uncorrectable ECC error is caused by hardware which may lead
>> to corrupted nodes detected in UBIFS. I found uncorretable ECC errors on
>> my NAND, in the environment of high temperature and humidity.
>>
>> At present, UBIFS ignores all EBADMSG errors, so the corrupted node is
>> only considered in being caused by unfinished writing. I think UBIFS
>> should consider the corrupted area caused by ECC errors in process
>> ubifs_recover_leb(). no_more_nodes() will skip a read-write unit. Maybe
>> the corrupted area is skipped.
> Well, if your NAND data is corrupted by your environment UBIFS cannot
> do much. Sure, we can paper over some places but at the end of the day
> you will always lose.
>
> What if the UBI VID header becomes unreadable or the root node of the
> index tree?
>
Agree, thanks for reminding.
diff mbox series

Patch

diff --git a/fs/ubifs/recovery.c b/fs/ubifs/recovery.c
index f116f7b3f9e5..e60f23ea4575 100644
--- a/fs/ubifs/recovery.c
+++ b/fs/ubifs/recovery.c
@@ -624,14 +624,28 @@  struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
 {
 	int ret = 0, err, len = c->leb_size - offs, start = offs, min_io_unit;
 	int grouped = jhead == -1 ? 0 : c->jheads[jhead].grouped;
+	int ecc_err = 0;
 	struct ubifs_scan_leb *sleb;
 	void *buf = sbuf + offs;
 
 	dbg_rcvry("%d:%d, jhead %d, grouped %d", lnum, offs, jhead, grouped);
 
-	sleb = ubifs_start_scan(c, lnum, offs, sbuf);
-	if (IS_ERR(sleb))
-		return sleb;
+	sleb = kzalloc(sizeof(struct ubifs_scan_leb), GFP_NOFS);
+	if (!sleb)
+		return ERR_PTR(-ENOMEM);
+
+	sleb->lnum = lnum;
+	INIT_LIST_HEAD(&sleb->nodes);
+	sleb->buf = sbuf;
+
+	err = ubifs_leb_read(c, lnum, sbuf + offs, offs, c->leb_size - offs, 0);
+	if (err && err != -EBADMSG) {
+		ubifs_err(c, "cannot read %d bytes from LEB %d:%d, error %d",
+			  c->leb_size - offs, lnum, offs, err);
+		kfree(sleb);
+		return ERR_PTR(err);
+	} else if (err == -EBADMSG)
+		ecc_err = 1;
 
 	ubifs_assert(c, len >= 8);
 	while (len >= 8) {
@@ -677,10 +691,24 @@  struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
 	}
 
 	if (ret == SCANNED_GARBAGE || ret == SCANNED_A_BAD_PAD_NODE) {
-		if (!is_last_write(c, buf, offs))
+		/*
+		 * If the garbage area or bad pad node is caused by ecc
+		 * error, skipping valid nodes in the aligned max write
+		 * size unit will lead tnc to mismatch node (inode,
+		 * data, etc.). But if the ecc error infects only a pad
+		 * node, which doesn't corrupt data nodes on UBIFS, still
+		 * failed. We choose robustness here at the cost of
+		 * giving up tolerating some small probability mistakes.
+		 */
+		if (!is_last_write(c, buf, offs) || ecc_err)
 			goto corrupted_rescan;
 	} else if (ret == SCANNED_A_CORRUPT_NODE) {
-		if (!no_more_nodes(c, buf, len, lnum, offs))
+		/*
+		 * If the corrupt node is caused by ecc error, skipping
+		 * valid nodes in the aligned max write size unit will
+		 * lead tnc to mismatch node (inode, data, etc.).
+		 */
+		if (!no_more_nodes(c, buf, len, lnum, offs) || ecc_err)
 			goto corrupted_rescan;
 	} else if (!is_empty(buf, len)) {
 		if (!is_last_write(c, buf, offs)) {
@@ -782,7 +810,8 @@  struct ubifs_scan_leb *ubifs_recover_leb(struct ubifs_info *c, int lnum,
 	ubifs_scanned_corruption(c, lnum, offs, buf);
 	err = -EUCLEAN;
 error:
-	ubifs_err(c, "LEB %d scanning failed", lnum);
+	ubifs_err(c, "LEB %d scanning failed%s", lnum,
+		  ecc_err ? ", ecc error detected!" : "");
 	ubifs_scan_destroy(sleb);
 	return ERR_PTR(err);
 }