From patchwork Thu Feb 5 09:47:34 2015
X-Patchwork-Submitter: hujianyang
X-Patchwork-Id: 436728
Message-ID: <54D33C36.9060805@huawei.com>
Date: Thu, 5 Feb 2015 17:47:34 +0800
From: hujianyang
To: Artem Bityutskiy
Subject: [RFC] UBIFS recovery
Cc: Richard Weinberger, linux-mtd, Sheng Yong
List-Id: Linux MTD discussion mailing list

UBIFS currently lacks a recovery method: once a UBIFS partition refuses
to mount, all data on that partition may be lost. The existing recovery
mechanism can deal with a corrupted master node and with cleanup after a
power cut, but that is not enough. UBIFS on flash can suffer other kinds
of data corruption, the most common being ECC errors.

I searched the mailing list archive and found that a recovery method was
requested once before (sorry, I can't find the link). Artem suggested
introducing a new repair mount option instead of writing a new userspace
repair tool, but it seems no further work has been done since.

There are two ways to do UBIFS recovery. One is to repair the UBIFS image
from userspace through the UBI interfaces; the other is to repair the
corrupted data at mount time, either by default or via a special mount
option.

The userspace tool is the most effective way to repair a partition.
It has enough time and resources to scan the whole target and clean up
the corrupted data while the file-system is offline. But it is hard to
write: many kernel structures and functions would need to be copied into
the utility, the current ubi-utils focus mostly on UBI devices rather
than UBIFS, and every later change to the file-system would have to take
the userspace tool into account. It's too complicated.

The other way is to extend the existing recovery methods in recovery.c.
Adding a new recovery method this way is easy; a few lines of change can
improve reliability in some areas. But it is hard to keep a global view
of these recovery features and control them, because they are scattered
along the mount path. Piling up many recovery methods would also make it
harder to add new features later.

I can't say which way is better; it depends on what we expect from UBIFS.
I'm actually working on a userspace tool, ubidump, which can now print
the on-flash format of a specified LEB; features such as file-system
repair could be added to it. I'm also working on extending the UBIFS
recovery methods in the kernel, e.g. cleaning up all the logs if an error
occurs while replaying buds and reverting the file-system to the last
commit state instead of failing the mount.

Regardless of how we fix a corrupt partition, the first thing to do is
add a method that tries to mount the file-system R/O instead of giving
up, so that users get a chance to copy their valid data out of the
corrupt image.

Thanks!
Hu

Buds replay patch for Linux 3.10 stable:

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index 3187925..e2208a2 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -706,14 +706,35 @@ static int replay_buds(struct ubifs_info *c)
 
 	list_for_each_entry(b, &c->replay_buds, list) {
 		err = replay_bud(c, b);
-		if (err)
-			return err;
+		if (err) {
+			ubifs_err("error %d during buds replay, try to revert\n",
+				  err);
+			goto revert;
+		}
 
 		ubifs_assert(b->sqnum > prev_sqnum);
 		prev_sqnum = b->sqnum;
 	}
 
 	return 0;
+
+revert:
+	prev_sqnum = 0;
+
+	list_for_each_entry(b, &c->replay_buds, list) {
+		/*
+		 * Revert to last commit state, update lprops by setting
+		 * the state of space used by buds to dirty.
+		 */
+		b->free = c->leb_size % c->min_io_size;
+		b->dirty = c->leb_size - b->bud->start - b->free;
+
+		ubifs_assert(b->sqnum > prev_sqnum);
+		prev_sqnum = b->sqnum;
+	}
+	ubifs_warn("revert to last commit state with data lost\n");
+
+	return 1;
 }
 
 /**
@@ -1036,13 +1057,15 @@ int ubifs_replay_journal(struct ubifs_info *c)
 		lnum = ubifs_next_log_lnum(c, lnum);
 	} while (lnum != c->ltail_lnum);
 
-	err = replay_buds(c);
-	if (err)
-		goto out;
-
-	err = apply_replay_list(c);
-	if (err)
-		goto out;
+	/*
+	 * If an error occur during buds replay, try to revert filesystem
+	 * to last commit state. Should not apply corrupt replay list.
+	 */
+	if (!replay_buds(c)) {
+		err = apply_replay_list(c);
+		if (err)
+			goto out;
+	}
 
 	err = set_buds_lprops(c);
 	if (err)
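
For readers who want to check the space accounting in the revert path,
here is a minimal standalone sketch of the two assignments made for each
bud. It is not kernel code: struct bud_space and revert_bud_space() are
hypothetical names used only for illustration, and the geometry values
in the note below are assumed examples, not taken from a real device.

```c
#include <assert.h>

/* Hypothetical stand-in for the two bud_entry fields the revert path sets. */
struct bud_space {
	int free;	/* bytes accounted as free in the bud LEB */
	int dirty;	/* bytes accounted as dirty in the bud LEB */
};

/*
 * Mirror the arithmetic from the revert loop in the patch:
 *   free  = leb_size % min_io_size
 *   dirty = leb_size - bud_start - free
 * Everything from the start of the bud up to the free tail becomes dirty.
 */
static struct bud_space revert_bud_space(int leb_size, int min_io_size,
					 int bud_start)
{
	struct bud_space s;

	/* Sanity checks added for this sketch only. */
	assert(min_io_size > 0);
	assert(bud_start >= 0);
	assert(bud_start <= leb_size - leb_size % min_io_size);

	s.free = leb_size % min_io_size;
	s.dirty = leb_size - bud_start - s.free;
	return s;
}
```

With an assumed LEB size of 126976 bytes, a min I/O size of 2048 bytes
and a bud starting at offset 4096, this gives free = 0 and
dirty = 122880, because the LEB size is an exact multiple of the min I/O
size. With made-up values of 1000, 64 and 128 the remainder shows up:
free = 40 and dirty = 832.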