From patchwork Wed Jul 17 03:08:32 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Theodore Ts'o <tytso@mit.edu>
X-Patchwork-Id: 259592
Return-Path: <linux-ext4-owner@vger.kernel.org>
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 997062C016A
	for <patchwork-incoming@ozlabs.org>;
	Wed, 17 Jul 2013 13:08:38 +1000 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751199Ab3GQDIh (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);
	Tue, 16 Jul 2013 23:08:37 -0400
Received: from li9-11.members.linode.com ([67.18.176.11]:38454 "EHLO
	imap.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
	ESMTP
	id S1750892Ab3GQDIg (ORCPT <rfc822;linux-ext4@vger.kernel.org>);
	Tue, 16 Jul 2013 23:08:36 -0400
Received: from root (helo=closure.thunk.org)
	by imap.thunk.org with local-esmtp (Exim 4.80)
	(envelope-from <tytso@thunk.org>)
	id 1UzID4-0000GO-4o; Wed, 17 Jul 2013 03:15:22 +0000
Received: by closure.thunk.org (Postfix, from userid 15806)
	id A914958071E; Tue, 16 Jul 2013 23:08:32 -0400 (EDT)
Date: Tue, 16 Jul 2013 23:08:32 -0400
From: Theodore Ts'o <tytso@mit.edu>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH] ext4: Prevent massive fs corruption if verifying the
	block bitmap fails
Message-ID: <20130717030832.GB12908@thunk.org>
References: <20130717020209.GB5785@blackbox.djwong.org>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20130717020209.GB5785@blackbox.djwong.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: tytso@thunk.org
X-SA-Exim-Scanned: No (on imap.thunk.org); SAEximRunCond expanded to false
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-ext4.vger.kernel.org>
X-Mailing-List: linux-ext4@vger.kernel.org

On Tue, Jul 16, 2013 at 07:02:09PM -0700, Darrick J. Wong wrote:
> The block bitmap verification code assumes that calling ext4_error() either
> panics the system or makes the fs readonly.  However, this is not always true:
> when 'errors=continue' is specified, an error is printed but we don't return
> any indication of error to the caller, which is (probably) the block allocator,
> which pretends that the crud we read in off the disk is a usable bitmap.  Yuck.
> 
> A block bitmap that fails the check should at least return no bitmap to the
> caller.  The block allocator should be told to go look in a different group,
> but that's a separate issue.
> 
> The easiest way to reproduce this is to modify bg_block_bitmap (on a ^flex_bg
> fs) to point to a block outside the block group; or you can create a
> metadata_csum filesystem and zero out the block bitmaps after copying files.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Funny that you sent this.  I was just thinking about forward porting
the following patch which allows us to better recover when a file
system is mounted errors=continue, and we notice a file system
corruption when freeing an already freed block.  We've had this in our
tree for a while but we had never gotten around to get it upstream.
This patch doesn't deal metadata checksum verification, but if we
combine this patch and yours, I think it might be peanut butter and
chocolate.  :-)

One thing I like about Aditya's patch is that if we do detect a
problem, we don't reflect an error (which in some cases with your
patch would be the somewhat confusing errno of ENOMEM, although that's
a pre-existing problem with the ext4_read_block_bitmap_nowait
interface).  Instead, we just skip that block group and try to
allocate somewhere else.

						- Ted

commit 1298d9e3d96aedde30d0302d696a540b72f61709
Author: Aditya Kali <adityakali@google.com>
Date:   Fri Nov 2 13:51:35 2012 -0700

    ext4: Mark block group as corrupt on bitmap error
    
    When we notice a block-bitmap corruption (because of device
    failure or something else), we should mark this group as
    corrupt and prevent further block allocations/deallocations
    from it. Currently, we end up generating one error message
    for every block in the bitmap. This potentially could make
    the system unstable as noticed in some bugs. With this patch,
    the error will be printed only the first time and mark the
    entire block group as corrupted. This prevents future access
    allocations/deallocaitons from it.
    
    Tested: FS-Smoke test & performance presubmit test.
    Ran 3 test(s): 3 passing, 0 flaky, 0 failing, 0 with other errors; 0 test(s) were skipped
    Also tested by corrupting the block
    bitmap and forcefully introducing the mb_free_blocks error:
    (1) create a largefile (2Gb)
    $ dd if=/dev/zero of=largefile oflag=direct bs=10485760 count=200
    (2) umount filesystem. use dumpe2fs to see which block-bitmaps
    are in use by largefile and note their block numbers
    (3) use dd to zero-out the used block bitmaps
    $ dd if=/dev/zero of=/dev/hdc4 bs=4096 seek=14 count=8 oflag=direct
    (4) mount the FS and delete the largefile.
    (5) recreate the largefile. verify that the new largefile does not
    get any blocks from the groups marked as bad.
    Without the patch, we will see mb_free_blocks error for each bit in
    each zero'ed out bitmap at (4). With the patch, we only see the error
    once per blockgroup:
    [  309.706803] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 15: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
    [  309.720824] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 14: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
    [  309.732858] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
    [  309.748321] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 13: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
    [  309.760331] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
    [  309.769695] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 12: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
    [  309.781721] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
    [  309.798166] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 11: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
    [  309.810184] EXT4-fs error (device sdb4) in ext4_free_blocks:4802: IO failure
    [  309.819532] EXT4-fs error (device sdb4): ext4_mb_generate_buddy:735: group 10: 32768 clusters in bitmap, 0 in gd. blk grp corrupted.
    
    Google-Bug-Id: 7258357
    
    Conflicts:
    
    	fs/ext4/ext4.h
    
    Rebase-Tested-v3.3: As Tested. PerfPresubmit:
    Ran 3 test(s): 3 passing, 0 flaky, 0 failing, 0 with other errors; 0 test(s) were skipped
    Effort: fs/ext4
    Origin-2.6.34-SHA1: 684e8895c7aaa742657cfc9a5204f8ccdcd8e2e4
    Change-Id: I9aa96598386d1c964afcd97231786237e5ea4915
---
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ee8e4f3..12d4be3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2289,9 +2289,12 @@ struct ext4_group_info {
 
 #define EXT4_GROUP_INFO_NEED_INIT_BIT		0
 #define EXT4_GROUP_INFO_WAS_TRIMMED_BIT		1
+#define EXT4_GROUP_INFO_CORRUPT_BIT		2
 
 #define EXT4_MB_GRP_NEED_INIT(grp)	\
 	(test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state)))
+#define EXT4_MB_GRP_CORRUPT(grp)	\
+	(test_bit(EXT4_GROUP_INFO_CORRUPT_BIT, &((grp)->bb_state)))
 
 #define EXT4_MB_GRP_WAS_TRIMMED(grp)	\
 	(test_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state)))
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5b8eebc..31bbe44 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -742,13 +742,15 @@ void ext4_mb_generate_buddy(struct super_block *sb,
 
 	if (free != grp->bb_free) {
 		ext4_grp_locked_error(sb, group, 0, 0,
-				      "%u clusters in bitmap, %u in gd",
+				      "%u clusters in bitmap, %u in gd. "
+				      "blk grp corrupted.",
 				      free, grp->bb_free);
 		/*
 		 * If we intent to continue, we consider group descritor
 		 * corrupt and update bb_free using bitmap value
 		 */
 		grp->bb_free = free;
+		set_bit(EXT4_GROUP_INFO_CORRUPT_BIT, &grp->bb_state);
 	}
 	mb_set_largest_free_order(sb, grp);
 
@@ -1276,6 +1278,10 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
 
 	BUG_ON(first + count > (sb->s_blocksize << 3));
 	assert_spin_locked(ext4_group_lock_ptr(sb, e4b->bd_group));
+	/* Don't bother if the block group is corrupt. */
+	if (unlikely(EXT4_MB_GRP_CORRUPT(e4b->bd_info)))
+		return;
+
 	mb_check_buddy(e4b);
 	mb_free_blocks_double(inode, e4b, first, count);
 
@@ -1307,7 +1313,12 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
 					      inode ? inode->i_ino : 0,
 					      blocknr,
 					      "freeing already freed block "
-					      "(bit %u)", block);
+					      "(bit %u). blk group corrupted.",
+					      block);
+			/* Mark the block group as corrupt. */
+			set_bit(EXT4_GROUP_INFO_CORRUPT_BIT,
+				&e4b->bd_info->bb_state);
+			return;
 		}
 		mb_clear_bit(block, EXT4_MB_BITMAP(e4b));
 		e4b->bd_info->bb_counters[order]++;
@@ -1681,6 +1692,11 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
 	if (err)
 		return err;
 
+	if (unlikely(EXT4_MB_GRP_CORRUPT(e4b->bd_info))) {
+		ext4_mb_unload_buddy(e4b);
+		return 0;
+	}
+
 	ext4_lock_group(ac->ac_sb, group);
 	max = mb_find_extent(e4b, 0, ac->ac_g_ex.fe_start,
 			     ac->ac_g_ex.fe_len, &ex);
@@ -1872,6 +1888,9 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
 
 	BUG_ON(cr < 0 || cr >= 4);
 
+	if (unlikely(EXT4_MB_GRP_CORRUPT(grp)))
+		return 0;
+
 	/* We only do this if the grp has never been initialized */
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
 		int ret = ext4_mb_init_group(ac->ac_sb, group);
@@ -4600,6 +4619,10 @@ do_more:
 	overflow = 0;
 	ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
 
+	if (unlikely(EXT4_MB_GRP_CORRUPT(
+			ext4_get_group_info(sb, block_group))))
+		return;
+
 	/*
 	 * Check to see if we are freeing blocks across a group
 	 * boundary.