Message ID | 20220528110017.354175-2-libaokun1@huawei.com |
---|---|
State | Awaiting Upstream |
Series | ext4: fix two bugs in ext4_mb_normalize_request |
On 22/05/28 07:00PM, Baokun Li wrote:
> Hulk Robot reported a BUG_ON:
> ==================================================================
> kernel BUG at fs/ext4/mballoc.c:3211!
> [...]
> RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
> [...]
> Call Trace:
> ext4_mb_new_blocks+0x9df/0x5d30
> ext4_ext_map_blocks+0x1803/0x4d80
> ext4_map_blocks+0x3a4/0x1a10
> ext4_writepages+0x126d/0x2c30
> do_writepages+0x7f/0x1b0
> __filemap_fdatawrite_range+0x285/0x3b0
> file_write_and_wait_range+0xb1/0x140
> ext4_sync_file+0x1aa/0xca0
> vfs_fsync_range+0xfb/0x260
> do_fsync+0x48/0xa0
> [...]
> ==================================================================
>
> Above issue may happen as follows:
> -------------------------------------
> do_fsync
> vfs_fsync_range
> ext4_sync_file
> file_write_and_wait_range
> __filemap_fdatawrite_range
> do_writepages
> ext4_writepages
> mpage_map_and_submit_extent
> mpage_map_one_extent
> ext4_map_blocks
> ext4_mb_new_blocks
> ext4_mb_normalize_request
> >>> start + size <= ac->ac_o_ex.fe_logical
> ext4_mb_regular_allocator
> ext4_mb_simple_scan_group
> ext4_mb_use_best_found
> ext4_mb_new_preallocation
> ext4_mb_new_inode_pa
> ext4_mb_use_inode_pa
> >>> set ac->ac_b_ex.fe_len <= 0
> ext4_mb_mark_diskspace_used
> >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
>
> we can easily reproduce this problem with the following commands:
> `fallocate -l100M disk`
> `mkfs.ext4 -b 1024 -g 256 disk`
> `mount disk /mnt`
> `fsstress -d /mnt -l 0 -n 1000 -p 1`
>
> The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
> Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
> when the size is truncated. So start should be the start position of
> the group where ac_o_ex.fe_logical is located after alignment.
> In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
> is very large, the value calculated by start_off is more accurate.
>
> Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
> V1->V2:
> Replace round_down() with rounddown().
> Modified comments.
> V2->V3:
> Convert EXT4_BLOCKS_PER_GROUP type to ext4_lblk_t
> to avoid compilation warnings.

Looks good to me. Feel free to add -

Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>

>
> fs/ext4/mballoc.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
On 2022/5/28 23:10, Ritesh Harjani wrote:
> On 22/05/28 07:00PM, Baokun Li wrote:
>> Hulk Robot reported a BUG_ON:
>> [...]
> Looks good to me. Feel free to add -
>
> Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
>
>> fs/ext4/mballoc.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
> .

Thank you for your review!
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9f12f29bc346..4d3740fdff90 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4104,6 +4104,15 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac,
 	size = size >> bsbits;
 	start = start_off >> bsbits;
 
+	/*
+	 * For tiny groups (smaller than 8MB) the chosen allocation
+	 * alignment may be larger than group size. Make sure the
+	 * alignment does not move allocation to a different group which
+	 * makes mballoc fail assertions later.
+	 */
+	start = max(start, rounddown(ac->ac_o_ex.fe_logical,
+			(ext4_lblk_t)EXT4_BLOCKS_PER_GROUP(ac->ac_sb)));
+
 	/* don't cover already allocated blocks in selected range */
 	if (ar->pleft && start <= ar->lleft) {
 		size -= ar->lleft + 1 - start;
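The arithmetic behind the clamp can be modelled in user space. The sketch below is not kernel code: `lblk_t`, `struct norm_range`, `normalize_sketch()` and `covers()` are hypothetical names introduced for illustration, and the order of steps is simplified compared with `ext4_mb_normalize_request()`. It only assumes what the commit message states: the size is trimmed to the group size, and the fix raises `start` to the group-aligned position of the requested logical block.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ext4_lblk_t. */
typedef uint32_t lblk_t;

/* Hypothetical holder for a normalized range [start, start + size). */
struct norm_range {
	lblk_t start;
	lblk_t size;
};

/* Round x down to a multiple of y; like rounddown(), y need not be a power of two. */
static lblk_t rounddown_lblk(lblk_t x, lblk_t y)
{
	return x - (x % y);
}

/*
 * Model of the two steps discussed in the patch:
 *  - trim the size to at most blocks_per_group;
 *  - optionally (apply_fix != 0) raise start to the group-aligned
 *    position of the requested logical block.
 */
static struct norm_range normalize_sketch(lblk_t aligned_start,
					  lblk_t aligned_size,
					  lblk_t logical,
					  lblk_t blocks_per_group,
					  int apply_fix)
{
	struct norm_range r = { aligned_start, aligned_size };

	if (apply_fix) {
		lblk_t group_start = rounddown_lblk(logical, blocks_per_group);

		if (group_start > r.start)
			r.start = group_start;
	}
	if (r.size > blocks_per_group)
		r.size = blocks_per_group;
	return r;
}

/* Nonzero when the range still contains the requested logical block. */
static int covers(struct norm_range r, lblk_t logical)
{
	return r.start <= logical && logical < r.start + r.size;
}
```

With 256 blocks per group (as in the reproducer's `-g 256`) and a hypothetical request at logical block 1500 whose aligned window starts at block 1024 with 1024 blocks, the unfixed path yields `start = 1024`, `size = 256`, so `start + size = 1280 <= 1500` and the range misses the request. With the clamp, `start` becomes `rounddown(1500, 256) = 1280` and the range `[1280, 1536)` contains block 1500.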