diff mbox series

[6/8] ext4: add a helper function to validate metadata block

Message ID 1604764698-4269-6-git-send-email-brookxu@tencent.com
State Accepted
Headers show
Series [RESEND,1/8] ext4: use ext4_assert() to replace J_ASSERT() | expand

Commit Message

brookxu Nov. 7, 2020, 3:58 p.m. UTC
From: Chunguang Xu <brookxu@tencent.com>

There is a need to check whether a block or a segment overlaps
with metadata, since information of system_zone is incomplete,
we need a more accurate function. Now we check whether it
overlaps with block bitmap, inode bitmap, and inode table.
Perhaps it is better to add a check of super_block and block
group descriptor and provide a helper function.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
---
 fs/ext4/ext4.h    |  5 ++++-
 fs/ext4/mballoc.c | 57 +++++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 49 insertions(+), 13 deletions(-)

Comments

Theodore Ts'o Dec. 9, 2020, 4:55 a.m. UTC | #1
On Sat, Nov 07, 2020 at 11:58:16PM +0800, Chunguang Xu wrote:
> From: Chunguang Xu <brookxu@tencent.com>
> 
> There is a need to check whether a block or a segment overlaps
> with metadata, since information of system_zone is incomplete,
> we need a more accurate function. Now we check whether it
> overlaps with block bitmap, inode bitmap, and inode table.
> Perhaps it is better to add a check of super_block and block
> group descriptor and provide a helper function.

The original code was valid only for file systems that are not using
flex_bg.  I suspect the Lustre folks who implemented mballoc.c did so
before flex_bg, and fortunately, on flex_bg we the check is simply
going to have more false negaties, but not any false positives, so no
one noticed.

> +/*
> + * Returns 1 if the passed-in block region (block, block+count)
> + * overlaps with some other filesystem metadata blocks. Others,
> + * return 0.
> + */
> +int ext4_metadata_block_overlaps(struct super_block *sb,
> +				 ext4_group_t block_group,
> +				 ext4_fsblk_t block,
> +				 unsigned long count)
> +{
> +	struct ext4_sb_info *sbi = EXT4_SB(sb);
> +	struct ext4_group_desc *gdp;
> +	int gd_first = ext4_group_first_block_no(sb, block_group);
> +	int itable, gd_blk;
> +	int ret = 0;
> +
> +	gdp = ext4_get_group_desc(sb, block_group, NULL);
> +	// check block bitmap and inode bitmap
> +	if (in_range(ext4_block_bitmap(sb, gdp), block, count) ||
> +	    in_range(ext4_inode_bitmap(sb, gdp), block, count))

We are only checking a single block group descriptor; this is fine if
the allocation bitmaps and inode table are guaranteed to be located in
their own block group.  But this is no longer true when flex_bg is
enabled.

I think what we should do is to rely on the rb tree maintained by
block_validity.c (if the inode number is zero, then the entry refers
to blocks in the "system zone"); that's going to be a much more
complete check.

What do you think?

						- Ted
brookxu Dec. 9, 2020, 12:12 p.m. UTC | #2
Theodore Y. Ts'o wrote on 2020/12/9 12:55:
> On Sat, Nov 07, 2020 at 11:58:16PM +0800, Chunguang Xu wrote:
>> From: Chunguang Xu <brookxu@tencent.com>
>>
>> There is a need to check whether a block or a segment overlaps
>> with metadata, since information of system_zone is incomplete,
>> we need a more accurate function. Now we check whether it
>> overlaps with block bitmap, inode bitmap, and inode table.
>> Perhaps it is better to add a check of super_block and block
>> group descriptor and provide a helper function.
> 
> The original code was valid only for file systems that are not using
> flex_bg.  I suspect the Lustre folks who implemented mballoc.c did so
> before flex_bg, and fortunately, on flex_bg we the check is simply
> going to have more false negaties, but not any false positives, so no
> one noticed.
> 
>> +/*
>> + * Returns 1 if the passed-in block region (block, block+count)
>> + * overlaps with some other filesystem metadata blocks. Others,
>> + * return 0.
>> + */
>> +int ext4_metadata_block_overlaps(struct super_block *sb,
>> +				 ext4_group_t block_group,
>> +				 ext4_fsblk_t block,
>> +				 unsigned long count)
>> +{
>> +	struct ext4_sb_info *sbi = EXT4_SB(sb);
>> +	struct ext4_group_desc *gdp;
>> +	int gd_first = ext4_group_first_block_no(sb, block_group);
>> +	int itable, gd_blk;
>> +	int ret = 0;
>> +
>> +	gdp = ext4_get_group_desc(sb, block_group, NULL);
>> +	// check block bitmap and inode bitmap
>> +	if (in_range(ext4_block_bitmap(sb, gdp), block, count) ||
>> +	    in_range(ext4_inode_bitmap(sb, gdp), block, count))
> 
> We are only checking a single block group descriptor; this is fine if
> the allocation bitmaps and inode table are guaranteed to be located in
> their own block group.  But this is no longer true when flex_bg is
> enabled.

 Right, the check of bb and ib here is not very correct.

> I think what we should do is to rely on the rb tree maintained by
> block_validity.c (if the inode number is zero, then the entry refers
> to blocks in the "system zone"); that's going to be a much more
> complete check.
> 
> What do you think?

This is a good idea. After we merge ext4: add the gdt block of
meta_bg to system_zone, the metadata information of system_zone
is relatively complete. Using system_zone makes the logic
clearer. However, due to the need for additional tree search,
there is a performance risk. I will try this method later and
test the performance overhead.

> 						- Ted
>
diff mbox series

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index bd88b4a..73d59cb 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2801,7 +2801,10 @@  extern ext4_group_t ext4_mb_prefetch(struct super_block *sb,
 				     unsigned int nr, int *cnt);
 extern void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group,
 				  unsigned int nr);
-
+extern int ext4_metadata_block_overlaps(struct super_block *sb,
+					ext4_group_t block_group,
+					ext4_fsblk_t block,
+					unsigned long count);
 extern void ext4_free_blocks(handle_t *handle, struct inode *inode,
 			     struct buffer_head *bh, ext4_fsblk_t block,
 			     unsigned long count, int flags);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 29dfeb04..d8704fe 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5061,6 +5061,49 @@  static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi,
 	kmem_cache_free(ext4_free_data_cachep, entry);
 }
 
+/*
+ * Returns 1 if the passed-in block region (block, block+count)
+ * overlaps with some other filesystem metadata blocks. Others,
+ * return 0.
+ */
+int ext4_metadata_block_overlaps(struct super_block *sb,
+				 ext4_group_t block_group,
+				 ext4_fsblk_t block,
+				 unsigned long count)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_group_desc *gdp;
+	int gd_first = ext4_group_first_block_no(sb, block_group);
+	int itable, gd_blk;
+	int ret = 0;
+
+	gdp = ext4_get_group_desc(sb, block_group, NULL);
+	// check block bitmap and inode bitmap
+	if (in_range(ext4_block_bitmap(sb, gdp), block, count) ||
+	    in_range(ext4_inode_bitmap(sb, gdp), block, count))
+		ret = 1;
+
+	// check inode table
+	itable = ext4_inode_table(sb, gdp);
+	if (!(block >= itable + sbi->s_itb_per_group ||
+	      block + count - 1  < itable))
+		ret = 1;
+
+	/* check super_block and block group descriptor table, the
+	 * reserved space of the block group descriptor is managed
+	 * by resize_inode, it may not be processed now due to
+	 * performance.
+	 */
+	gd_blk = ext4_bg_has_super(sb, block_group) +
+		ext4_bg_num_gdb(sb, block_group);
+	if (gd_blk) {
+		if (!(block >= gd_first + gd_blk ||
+		      block + count - 1 < gd_first))
+			ret = 1;
+	}
+	return ret;
+}
+
 static noinline_for_stack int
 ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
 		      struct ext4_free_data *new_entry)
@@ -5360,13 +5403,7 @@  void ext4_free_blocks(handle_t *handle, struct inode *inode,
 		goto error_return;
 	}
 
-	if (in_range(ext4_block_bitmap(sb, gdp), block, count) ||
-	    in_range(ext4_inode_bitmap(sb, gdp), block, count) ||
-	    in_range(block, ext4_inode_table(sb, gdp),
-		     sbi->s_itb_per_group) ||
-	    in_range(block + count - 1, ext4_inode_table(sb, gdp),
-		     sbi->s_itb_per_group)) {
-
+	if (ext4_metadata_block_overlaps(sb, block_group, block, count)) {
 		ext4_error(sb, "Freeing blocks in system zone - "
 			   "Block = %llu, count = %lu", block, count);
 		/* err = 0. ext4_std_error should be a no op */
@@ -5552,11 +5589,7 @@  int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		goto error_return;
 	}
 
-	if (in_range(ext4_block_bitmap(sb, desc), block, count) ||
-	    in_range(ext4_inode_bitmap(sb, desc), block, count) ||
-	    in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
-	    in_range(block + count - 1, ext4_inode_table(sb, desc),
-		     sbi->s_itb_per_group)) {
+	if (ext4_metadata_block_overlaps(sb, block_group, block, count)) {
 		ext4_error(sb, "Adding blocks in system zones - "
 			   "Block = %llu, count = %lu",
 			   block, count);