Message ID | 1604764698-4269-6-git-send-email-brookxu@tencent.com |
---|---|
State | Accepted |
Headers | show |
Series | [RESEND,1/8] ext4: use ext4_assert() to replace J_ASSERT() | expand |
On Sat, Nov 07, 2020 at 11:58:16PM +0800, Chunguang Xu wrote: > From: Chunguang Xu <brookxu@tencent.com> > > There is a need to check whether a block or a segment overlaps > with metadata, since information of system_zone is incomplete, > we need a more accurate function. Now we check whether it > overlaps with block bitmap, inode bitmap, and inode table. > Perhaps it is better to add a check of super_block and block > group descriptor and provide a helper function. The original code was valid only for file systems that are not using flex_bg. I suspect the Lustre folks who implemented mballoc.c did so before flex_bg, and fortunately, on flex_bg we the check is simply going to have more false negaties, but not any false positives, so no one noticed. > +/* > + * Returns 1 if the passed-in block region (block, block+count) > + * overlaps with some other filesystem metadata blocks. Others, > + * return 0. > + */ > +int ext4_metadata_block_overlaps(struct super_block *sb, > + ext4_group_t block_group, > + ext4_fsblk_t block, > + unsigned long count) > +{ > + struct ext4_sb_info *sbi = EXT4_SB(sb); > + struct ext4_group_desc *gdp; > + int gd_first = ext4_group_first_block_no(sb, block_group); > + int itable, gd_blk; > + int ret = 0; > + > + gdp = ext4_get_group_desc(sb, block_group, NULL); > + // check block bitmap and inode bitmap > + if (in_range(ext4_block_bitmap(sb, gdp), block, count) || > + in_range(ext4_inode_bitmap(sb, gdp), block, count)) We are only checking a single block group descriptor; this is fine if the allocation bitmaps and inode table are guaranteed to be located in their own block group. But this is no longer true when flex_bg is enabled. I think what we should do is to rely on the rb tree maintained by block_validity.c (if the inode number is zero, then the entry refers to blocks in the "system zone"); that's going to be a much more complete check. What do you think? - Ted
Theodore Y. Ts'o wrote on 2020/12/9 12:55: > On Sat, Nov 07, 2020 at 11:58:16PM +0800, Chunguang Xu wrote: >> From: Chunguang Xu <brookxu@tencent.com> >> >> There is a need to check whether a block or a segment overlaps >> with metadata, since information of system_zone is incomplete, >> we need a more accurate function. Now we check whether it >> overlaps with block bitmap, inode bitmap, and inode table. >> Perhaps it is better to add a check of super_block and block >> group descriptor and provide a helper function. > > The original code was valid only for file systems that are not using > flex_bg. I suspect the Lustre folks who implemented mballoc.c did so > before flex_bg, and fortunately, on flex_bg we the check is simply > going to have more false negaties, but not any false positives, so no > one noticed. > >> +/* >> + * Returns 1 if the passed-in block region (block, block+count) >> + * overlaps with some other filesystem metadata blocks. Others, >> + * return 0. >> + */ >> +int ext4_metadata_block_overlaps(struct super_block *sb, >> + ext4_group_t block_group, >> + ext4_fsblk_t block, >> + unsigned long count) >> +{ >> + struct ext4_sb_info *sbi = EXT4_SB(sb); >> + struct ext4_group_desc *gdp; >> + int gd_first = ext4_group_first_block_no(sb, block_group); >> + int itable, gd_blk; >> + int ret = 0; >> + >> + gdp = ext4_get_group_desc(sb, block_group, NULL); >> + // check block bitmap and inode bitmap >> + if (in_range(ext4_block_bitmap(sb, gdp), block, count) || >> + in_range(ext4_inode_bitmap(sb, gdp), block, count)) > > We are only checking a single block group descriptor; this is fine if > the allocation bitmaps and inode table are guaranteed to be located in > their own block group. But this is no longer true when flex_bg is > enabled. Right, the check of bb and ib here is not very correct. > I think what we should do is to rely on the rb tree maintained by > block_validity.c (if the inode number is zero, then the entry refers > to blocks in the "system zone"); that's going to be a much more > complete check. > > What do you think? This is a good idea. After we merge ext4: add the gdt block of meta_bg to system_zone, the metadata information of system_zone is relatively complete. Using system_zone makes the logic clearer. However, due to the need for additional tree search, there is a performance risk. I will try this method later and test the performance overhead. > - Ted >
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index bd88b4a..73d59cb 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2801,7 +2801,10 @@ extern ext4_group_t ext4_mb_prefetch(struct super_block *sb, unsigned int nr, int *cnt); extern void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group, unsigned int nr); - +extern int ext4_metadata_block_overlaps(struct super_block *sb, + ext4_group_t block_group, + ext4_fsblk_t block, + unsigned long count); extern void ext4_free_blocks(handle_t *handle, struct inode *inode, struct buffer_head *bh, ext4_fsblk_t block, unsigned long count, int flags); diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 29dfeb04..d8704fe 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -5061,6 +5061,49 @@ static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi, kmem_cache_free(ext4_free_data_cachep, entry); } +/* + * Returns 1 if the passed-in block region (block, block+count) + * overlaps with some other filesystem metadata blocks. Others, + * return 0. + */ +int ext4_metadata_block_overlaps(struct super_block *sb, + ext4_group_t block_group, + ext4_fsblk_t block, + unsigned long count) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_group_desc *gdp; + int gd_first = ext4_group_first_block_no(sb, block_group); + int itable, gd_blk; + int ret = 0; + + gdp = ext4_get_group_desc(sb, block_group, NULL); + // check block bitmap and inode bitmap + if (in_range(ext4_block_bitmap(sb, gdp), block, count) || + in_range(ext4_inode_bitmap(sb, gdp), block, count)) + ret = 1; + + // check inode table + itable = ext4_inode_table(sb, gdp); + if (!(block >= itable + sbi->s_itb_per_group || + block + count - 1 < itable)) + ret = 1; + + /* check super_block and block group descriptor table, the + * reserved space of the block group descriptor is managed + * by resize_inode, it may not be processed now due to + * performance. + */ + gd_blk = ext4_bg_has_super(sb, block_group) + + ext4_bg_num_gdb(sb, block_group); + if (gd_blk) { + if (!(block >= gd_first + gd_blk || + block + count - 1 < gd_first)) + ret = 1; + } + return ret; +} + static noinline_for_stack int ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b, struct ext4_free_data *new_entry) @@ -5360,13 +5403,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, goto error_return; } - if (in_range(ext4_block_bitmap(sb, gdp), block, count) || - in_range(ext4_inode_bitmap(sb, gdp), block, count) || - in_range(block, ext4_inode_table(sb, gdp), - sbi->s_itb_per_group) || - in_range(block + count - 1, ext4_inode_table(sb, gdp), - sbi->s_itb_per_group)) { - + if (ext4_metadata_block_overlaps(sb, block_group, block, count)) { ext4_error(sb, "Freeing blocks in system zone - " "Block = %llu, count = %lu", block, count); /* err = 0. ext4_std_error should be a no op */ @@ -5552,11 +5589,7 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, goto error_return; } - if (in_range(ext4_block_bitmap(sb, desc), block, count) || - in_range(ext4_inode_bitmap(sb, desc), block, count) || - in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) || - in_range(block + count - 1, ext4_inode_table(sb, desc), - sbi->s_itb_per_group)) { + if (ext4_metadata_block_overlaps(sb, block_group, block, count)) { ext4_error(sb, "Adding blocks in system zones - " "Block = %llu, count = %lu", block, count);