Ext4: Set BBITMAP_CORRUPT_BIT when failing to read the allocation bitmap

Message ID 1523673237-112405-1-git-send-email-bo.liu@linux.alibaba.com
State Superseded, archived

Commit Message

Liu Bo April 14, 2018, 2:33 a.m. UTC
When reading the block allocation bitmap fails because of disk read
errors, ext4 can end up with an in-memory buddy bitmap that is
inconsistent with the on-disk block bitmap.  The resulting behavior is
unpredictable; one of the failures I hit was

[943102.751298] ------------[ cut here ]------------
[943102.751299] kernel BUG at fs/ext4/mballoc.c:1911!
[943102.751300] invalid opcode: 0000 [#1] SMP
...
[943102.751353] Call Trace:
[943102.751363]  [<ffffffffa05340a6>] ext4_mb_regular_allocator+0x356/0x460 [ext4]
[943102.751371]  [<ffffffffa0535b9c>] ext4_mb_new_blocks+0x5ec/0xaf0 [ext4]
[943102.751379]  [<ffffffffa052434d>] ? __read_extent_tree_block+0x5d/0x1f0 [ext4]
[943102.751386]  [<ffffffffa0525633>] ? ext4_find_extent+0x143/0x2d0 [ext4]
[943102.751394]  [<ffffffffa052a7de>] ext4_ext_map_blocks+0xb5e/0xf30 [ext4]
[943102.751397]  [<ffffffff811bb75c>] ? node_dirty_ok+0x12c/0x170
[943102.751403]  [<ffffffffa04f7802>] ext4_map_blocks+0x172/0x600 [ext4]
[943102.751406]  [<ffffffff8127a8c1>] ? alloc_buffer_head+0x21/0x60
[943102.751407]  [<ffffffff81233601>] ? mem_cgroup_commit_charge+0x91/0x530
[943102.751413]  [<ffffffffa04f7d22>] _ext4_get_block+0x92/0x100 [ext4]
[943102.751419]  [<ffffffffa04f7da6>] ext4_get_block+0x16/0x20 [ext4]
[943102.751420]  [<ffffffff8127d357>] __block_write_begin_int+0x197/0x5e0
[943102.751425]  [<ffffffffa04f7d90>] ? _ext4_get_block+0x100/0x100 [ext4]
[943102.751432]  [<ffffffffa04fcb56>] ? ext4_write_begin+0x126/0x5b0 [ext4]
[943102.751433]  [<ffffffff8127d7b1>] __block_write_begin+0x11/0x20
[943102.751439]  [<ffffffffa04fcbdc>] ext4_write_begin+0x1ac/0x5b0 [ext4]
[943102.751446]  [<ffffffffa052cfdd>] ? __ext4_journal_stop+0x3d/0xa0 [ext4]
[943102.751449]  [<ffffffff811ab578>] generic_perform_write+0xc8/0x1c0
[943102.751451]  [<ffffffff8125f37e>] ? file_update_time+0x5e/0x110
[943102.751452]  [<ffffffff811adbb5>] __generic_file_write_iter+0x185/0x1d0
[943102.751458]  [<ffffffffa04f1a6b>] ext4_file_write_iter+0x8b/0x380 [ext4]
[943102.751460]  [<ffffffff81247429>] ? vfs_getattr_nosec+0x29/0x40
[943102.751462]  [<ffffffff81247c6f>] ? cp_new_stat+0x14f/0x180
[943102.751463]  [<ffffffff81241115>] __vfs_write+0xe5/0x160
[943102.751464]  [<ffffffff812423b5>] vfs_write+0xb5/0x1a0
[943102.751465]  [<ffffffff81243875>] SyS_write+0x55/0xc0
[943102.751468]  [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
[943102.751476] Code: 39 44 24 3c 75 27 49 8b 85 60 <0f> 0b 0f 0b e8 3b 57 b5 e0 90 66
[943102.751484] RIP  [<ffffffffa05336dc>] ext4_mb_simple_scan_group+0x14c/0x160 [ext4]
[943102.751484]  RSP <ffffc90037997820>

To avoid the above, EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT should be set
in order to prevent any further allocations from this block group.

Suggested-by: Theodore Y. Ts'o <tytso@mit.edu>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 fs/ext4/balloc.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Theodore Ts'o April 16, 2018, 3:14 p.m. UTC | #1
On Mon, Apr 16, 2018 at 01:37:23AM +0000, Wang Shilong wrote:
>   Yup, that patch have been used in Lustre deployment from RHEL6 and RHEL7 for years.
> I think what Liu Bo tried to fix is a BUG.
>   If you agreed, I could send our improved patch for bitmap corruptions handling  to upstream.

Yes, please send your patch out to the list so we can look at the two
and take the best one.

(For one thing, we probably want to do the same thing for the inode
allocation bitmap.)

Thanks,

					- Ted
Liu Bo April 16, 2018, 5:01 p.m. UTC | #2
On Mon, Apr 16, 2018 at 01:37:23AM +0000, Wang Shilong wrote:
>   Yup, that patch have been used in Lustre deployment from RHEL6 and RHEL7 for years.
> I think what Liu Bo tried to fix is a BUG.
>   If you agreed, I could send our improved patch for bitmap corruptions handling  to upstream.

The patch looks good to me, but it also needs to cover
ext4_wait_block_bitmap() with ext4_corrupted_block_group().

thanks,
-liubo
Wang Shilong April 18, 2018, 12:12 a.m. UTC | #3
Hi,

 I just sent out some of my patches; they are cleanups and some fixes that I
found, which should have been sent earlier.

 I am trying to cook another one, which will take a bit of time.

Thanks,
Shilong

Patch

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index f9b3e0a83526..f0a5ed3df5fa 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -499,9 +499,19 @@  int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group,
 		return -EFSCORRUPTED;
 	wait_on_buffer(bh);
 	if (!buffer_uptodate(bh)) {
+		struct ext4_group_info *grp =
+			ext4_get_group_info(sb, block_group);
+		struct ext4_sb_info *sbi = EXT4_SB(sb);
+
 		ext4_error(sb, "Cannot read block bitmap - "
 			   "block_group = %u, block_bitmap = %llu",
 			   block_group, (unsigned long long) bh->b_blocknr);
+
+		if (!EXT4_MB_GRP_BBITMAP_CORRUPT(grp))
+			percpu_counter_sub(&sbi->s_freeclusters_counter,
+					   grp->bb_free);
+		set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT, &grp->bb_state);
+
 		return -EIO;
 	}
 	clear_buffer_new(bh);
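
The accounting in the hunk above can be modelled in user space to show why
the subtraction is guarded by the corrupt-bit check. This is only a sketch
under assumptions, not kernel code: struct group_model, struct fs_model,
MODEL_BBITMAP_CORRUPT_BIT (an arbitrary bit number chosen for the model) and
all three helpers are hypothetical stand-ins for grp->bb_state, grp->bb_free
and sbi->s_freeclusters_counter. The "skip corrupt groups" check reflects the
intent stated in the commit message rather than a specific allocator code
path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number for this model only. */
#define MODEL_BBITMAP_CORRUPT_BIT 2

/* Stand-in for the per-group fields the patch touches. */
struct group_model {
	unsigned long state;	/* models grp->bb_state */
	long free_clusters;	/* models grp->bb_free */
};

/* Stand-in for the filesystem-wide free cluster counter. */
struct fs_model {
	long free_clusters;	/* models sbi->s_freeclusters_counter */
};

static bool group_is_corrupt(const struct group_model *g)
{
	return (g->state >> MODEL_BBITMAP_CORRUPT_BIT) & 1UL;
}

/*
 * Mirror of the error path in the patch: drop the group's free clusters
 * from the global count only the first time the group is marked, then
 * set the corrupt bit unconditionally.
 */
static void mark_group_corrupt(struct fs_model *fs, struct group_model *g)
{
	if (!group_is_corrupt(g)) {
		/* Sanity check for the model: the global count covers the group. */
		assert(fs->free_clusters >= g->free_clusters);
		fs->free_clusters -= g->free_clusters;
	}
	g->state |= 1UL << MODEL_BBITMAP_CORRUPT_BIT;
}

/* An allocator in this model refuses groups whose bitmap is marked corrupt. */
static bool group_usable(const struct group_model *g)
{
	return !group_is_corrupt(g) && g->free_clusters > 0;
}
```

Calling mark_group_corrupt() twice on the same group leaves the global count
unchanged after the first call, which is the property the
EXT4_MB_GRP_BBITMAP_CORRUPT() check in the patch is there to preserve when
the same bitmap read fails repeatedly.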