diff mbox series

[v3,1/3] ext4: fix bug_on ext4_mb_use_inode_pa

Message ID 20220528110017.354175-2-libaokun1@huawei.com
State Awaiting Upstream
Headers show
Series ext4: fix two bugs in ext4_mb_normalize_request | expand

Commit Message

Baokun Li May 28, 2022, 11 a.m. UTC
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
 ext4_mb_new_blocks+0x9df/0x5d30
 ext4_ext_map_blocks+0x1803/0x4d80
 ext4_map_blocks+0x3a4/0x1a10
 ext4_writepages+0x126d/0x2c30
 do_writepages+0x7f/0x1b0
 __filemap_fdatawrite_range+0x285/0x3b0
 file_write_and_wait_range+0xb1/0x140
 ext4_sync_file+0x1aa/0xca0
 vfs_fsync_range+0xfb/0x260
 do_fsync+0x48/0xa0
[...]
==================================================================

Above issue may happen as follows:
-------------------------------------
do_fsync
 vfs_fsync_range
  ext4_sync_file
   file_write_and_wait_range
    __filemap_fdatawrite_range
     do_writepages
      ext4_writepages
       mpage_map_and_submit_extent
        mpage_map_one_extent
         ext4_map_blocks
          ext4_mb_new_blocks
           ext4_mb_normalize_request
            >>> start + size <= ac->ac_o_ex.fe_logical
           ext4_mb_regular_allocator
            ext4_mb_simple_scan_group
             ext4_mb_use_best_found
              ext4_mb_new_preallocation
               ext4_mb_new_inode_pa
                ext4_mb_use_inode_pa
                 >>> set ac->ac_b_ex.fe_len <= 0
           ext4_mb_mark_diskspace_used
            >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);

we can easily reproduce this problem with the following commands:
	`fallocate -l100M disk`
	`mkfs.ext4 -b 1024 -g 256 disk`
	`mount disk /mnt`
	`fsstress -d /mnt -l 0 -n 1000 -p 1`

The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
when the size is truncated. So start should be the start position of
the group where ac_o_ex.fe_logical is located after alignment.
In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
is very large, the value calculated by start_off is more accurate.

Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
V1->V2:
	Replace round_down() with rounddown().
	Modified comments.
V2->V3:
	Convert EXT4_BLOCKS_PER_GROUP type to ext4_lblk_t
	to avoid compilation warnings.

 fs/ext4/mballoc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Ritesh Harjani (IBM) May 28, 2022, 3:10 p.m. UTC | #1
On 22/05/28 07:00PM, Baokun Li wrote:
> Hulk Robot reported a BUG_ON:
> ==================================================================
> kernel BUG at fs/ext4/mballoc.c:3211!
> [...]
> RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
> [...]
> Call Trace:
>  ext4_mb_new_blocks+0x9df/0x5d30
>  ext4_ext_map_blocks+0x1803/0x4d80
>  ext4_map_blocks+0x3a4/0x1a10
>  ext4_writepages+0x126d/0x2c30
>  do_writepages+0x7f/0x1b0
>  __filemap_fdatawrite_range+0x285/0x3b0
>  file_write_and_wait_range+0xb1/0x140
>  ext4_sync_file+0x1aa/0xca0
>  vfs_fsync_range+0xfb/0x260
>  do_fsync+0x48/0xa0
> [...]
> ==================================================================
>
> Above issue may happen as follows:
> -------------------------------------
> do_fsync
>  vfs_fsync_range
>   ext4_sync_file
>    file_write_and_wait_range
>     __filemap_fdatawrite_range
>      do_writepages
>       ext4_writepages
>        mpage_map_and_submit_extent
>         mpage_map_one_extent
>          ext4_map_blocks
>           ext4_mb_new_blocks
>            ext4_mb_normalize_request
>             >>> start + size <= ac->ac_o_ex.fe_logical
>            ext4_mb_regular_allocator
>             ext4_mb_simple_scan_group
>              ext4_mb_use_best_found
>               ext4_mb_new_preallocation
>                ext4_mb_new_inode_pa
>                 ext4_mb_use_inode_pa
>                  >>> set ac->ac_b_ex.fe_len <= 0
>            ext4_mb_mark_diskspace_used
>             >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
>
> we can easily reproduce this problem with the following commands:
> 	`fallocate -l100M disk`
> 	`mkfs.ext4 -b 1024 -g 256 disk`
> 	`mount disk /mnt`
> 	`fsstress -d /mnt -l 0 -n 1000 -p 1`
>
> The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
> Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
> when the size is truncated. So start should be the start position of
> the group where ac_o_ex.fe_logical is located after alignment.
> In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
> is very large, the value calculated by start_off is more accurate.
>
> Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
> V1->V2:
> 	Replace round_down() with rounddown().
> 	Modified comments.
> V2->V3:
> 	Convert EXT4_BLOCKS_PER_GROUP type to ext4_lblk_t
> 	to avoid compilation warnings.

Looks good to me. Feel free to add -

Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>

>
>  fs/ext4/mballoc.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
Baokun Li May 30, 2022, 2:12 a.m. UTC | #2
在 2022/5/28 23:10, Ritesh Harjani 写道:
> On 22/05/28 07:00PM, Baokun Li wrote:
>> Hulk Robot reported a BUG_ON:
>> ==================================================================
>> kernel BUG at fs/ext4/mballoc.c:3211!
>> [...]
>> RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
>> [...]
>> Call Trace:
>>   ext4_mb_new_blocks+0x9df/0x5d30
>>   ext4_ext_map_blocks+0x1803/0x4d80
>>   ext4_map_blocks+0x3a4/0x1a10
>>   ext4_writepages+0x126d/0x2c30
>>   do_writepages+0x7f/0x1b0
>>   __filemap_fdatawrite_range+0x285/0x3b0
>>   file_write_and_wait_range+0xb1/0x140
>>   ext4_sync_file+0x1aa/0xca0
>>   vfs_fsync_range+0xfb/0x260
>>   do_fsync+0x48/0xa0
>> [...]
>> ==================================================================
>>
>> Above issue may happen as follows:
>> -------------------------------------
>> do_fsync
>>   vfs_fsync_range
>>    ext4_sync_file
>>     file_write_and_wait_range
>>      __filemap_fdatawrite_range
>>       do_writepages
>>        ext4_writepages
>>         mpage_map_and_submit_extent
>>          mpage_map_one_extent
>>           ext4_map_blocks
>>            ext4_mb_new_blocks
>>             ext4_mb_normalize_request
>>              >>> start + size <= ac->ac_o_ex.fe_logical
>>             ext4_mb_regular_allocator
>>              ext4_mb_simple_scan_group
>>               ext4_mb_use_best_found
>>                ext4_mb_new_preallocation
>>                 ext4_mb_new_inode_pa
>>                  ext4_mb_use_inode_pa
>>                   >>> set ac->ac_b_ex.fe_len <= 0
>>             ext4_mb_mark_diskspace_used
>>              >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
>>
>> we can easily reproduce this problem with the following commands:
>> 	`fallocate -l100M disk`
>> 	`mkfs.ext4 -b 1024 -g 256 disk`
>> 	`mount disk /mnt`
>> 	`fsstress -d /mnt -l 0 -n 1000 -p 1`
>>
>> The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
>> Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
>> when the size is truncated. So start should be the start position of
>> the group where ac_o_ex.fe_logical is located after alignment.
>> In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
>> is very large, the value calculated by start_off is more accurate.
>>
>> Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> ---
>> V1->V2:
>> 	Replace round_down() with rounddown().
>> 	Modified comments.
>> V2->V3:
>> 	Convert EXT4_BLOCKS_PER_GROUP type to ext4_lblk_t
>> 	to avoid compilation warnings.
> Looks good to me. Feel free to add -
>
> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
>
>>   fs/ext4/mballoc.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
> .
>
Thank you for your review!
diff mbox series

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9f12f29bc346..4d3740fdff90 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4104,6 +4104,15 @@  ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	size = size >> bsbits;
 	start = start_off >> bsbits;
 
+	/*
+	 * For tiny groups (smaller than 8MB) the chosen allocation
+	 * alignment may be larger than group size. Make sure the
+	 * alignment does not move allocation to a different group which
+	 * makes mballoc fail assertions later.
+	 */
+	start = max(start, rounddown(ac->ac_o_ex.fe_logical,
+			(ext4_lblk_t)EXT4_BLOCKS_PER_GROUP(ac->ac_sb)));
+
 	/* don't cover already allocated blocks in selected range */
 	if (ar->pleft && start <= ar->lleft) {
 		size -= ar->lleft + 1 - start;