diff mbox series

[4/4] ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt

Message ID 20231218141814.1477338-5-libaokun1@huawei.com
State Not Applicable
Headers show
Series ext4: fix divide error in mb_update_avg_fragment_size() | expand

Commit Message

Baokun Li Dec. 18, 2023, 2:18 p.m. UTC
When bb_free is not 0 but bb_fragments is 0, return directly to avoid
system crash due to division by zero.

Fixes: 83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
CC: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/mballoc.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jan Kara Dec. 18, 2023, 2:43 p.m. UTC | #1
On Mon 18-12-23 22:18:14, Baokun Li wrote:
> When bb_free is not 0 but bb_fragments is 0, return directly to avoid
> system crash due to division by zero.

How could this possibly happen? bb_fragments is the number of free space
extents and bb_free is the number of free blocks. No free space extents =>
no free blocks seems pretty obvious? You can see the logic in
ext4_mb_generate_buddy()...

								Honza

> 
> Fixes: 83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
> CC: stable@vger.kernel.org
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
>  fs/ext4/mballoc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 2fbee0f0f5c3..e2a167240335 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -845,6 +845,9 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
>  	if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_free == 0)
>  		return;
>  
> +	if (unlikely(grp->bb_fragments == 0))
> +		return;
> +
>  	new_order = mb_avg_fragment_size_order(sb,
>  					grp->bb_free / grp->bb_fragments);
>  	if (new_order == grp->bb_avg_fragment_size_order)
> -- 
> 2.31.1
>
Jan Kara Dec. 18, 2023, 3:09 p.m. UTC | #2
On Mon 18-12-23 15:43:42, Jan Kara wrote:
> On Mon 18-12-23 22:18:14, Baokun Li wrote:
> > When bb_free is not 0 but bb_fragments is 0, return directly to avoid
> > system crash due to division by zero.
> 
> How could this possibly happen? bb_fragments is the number of free space
> extents and bb_free is the number of free blocks. No free space extents =>
> no free blocks seems pretty obvious? You can see the logic in
> ext4_mb_generate_buddy()...

Oh, I see. This is probably about "bitmap corrupted case". But still both
allocation and freeing of blocks shouldn't operate on bitmaps marked as
corrupted so this should not happen?

								Honza
Baokun Li Dec. 19, 2023, 8:02 a.m. UTC | #3
On 2023/12/18 23:09, Jan Kara wrote:
> On Mon 18-12-23 15:43:42, Jan Kara wrote:
>> On Mon 18-12-23 22:18:14, Baokun Li wrote:
>>> When bb_free is not 0 but bb_fragments is 0, return directly to avoid
>>> system crash due to division by zero.
>> How could this possibly happen? bb_fragments is the number of free space
>> extents and bb_free is the number of free blocks. No free space extents =>
>> no free blocks seems pretty obvious? You can see the logic in
>> ext4_mb_generate_buddy()...
> Oh, I see. This is probably about "bitmap corrupted case". But still both
> allocation and freeing of blocks shouldn't operate on bitmaps marked as
> corrupted so this should not happen?
>
> 								Honza
Yes, we should make sure that we don't allocate or free blocks in
groups where the block bitmap has been marked as corrupt, but
there are still some issues here:

1. When a block bitmap is found to be corrupted, ext4_grp_locked_error()
is always called first, and only after that the block bitmap of the group
is marked as corrupted. In ext4_grp_locked_error(), the group may
be unlocked, and then other processes may be able to access the
corrupted bitmap. In this case, we can just put the marking of
corruption before ext4_grp_locked_error().

2. ext4_free_blocks() finds a corrupt bitmap can just return and do
nothing, because there is no problem with not freeing an exception
block. But mb_mark_used() has no logic for determining if a block
bitmap is corrupt, and its caller has no error handling logic, so
mb_mark_used() needs its caller to make sure that it doesn't allocate
blocks in a group with a corrupted block bitmap (which is why it
added the judgment in patch 2). However, it is possible to unlock group
between determining whether the group is corrupt and actually calling
mb_mark_used() to use those blocks. For example, when calling
mb_mark_used() in ext4_mb_try_best_found(), we are determining
whether the group's block bitmap is corrupted or not in the previous
ext4_mb_good_group(), but we are not determining it again when using
the blocks in ext4_mb_try_best_found(), at which point we may be
modifying the corrupted block bitmap.

3. Determine if a block bitmap is corrupted outside of a group lock
in ext4_mb_find_by_goal().

4. In ext4_mb_check_limits(), it may be possible to use the ac_b_ex
found in group 0 while holding a lock in group 1.

In addition to the above, there may be some corner cases that cause
inconsistencies, so here we determine if bb_fragments is 0 to avoid a
crash due to division by zero. Perhaps we could just replace
grp->bb_free == 0 with grp->bb_fragments == 0, which wouldn't look
so strange.

Thanks!
Baokun Li Dec. 20, 2023, 1:43 p.m. UTC | #4
On 2023/12/19 16:02, Baokun Li wrote:
> On 2023/12/18 23:09, Jan Kara wrote:
>> On Mon 18-12-23 15:43:42, Jan Kara wrote:
>>> On Mon 18-12-23 22:18:14, Baokun Li wrote:
>>>> When bb_free is not 0 but bb_fragments is 0, return directly to avoid
>>>> system crash due to division by zero.
>>> How could this possibly happen? bb_fragments is the number of free 
>>> space
>>> extents and bb_free is the number of free blocks. No free space 
>>> extents =>
>>> no free blocks seems pretty obvious? You can see the logic in
>>> ext4_mb_generate_buddy()...
>> Oh, I see. This is probably about "bitmap corrupted case". But still 
>> both
>> allocation and freeing of blocks shouldn't operate on bitmaps marked as
>> corrupted so this should not happen?
>>
>>                                 Honza
> Yes, we should make sure that we don't allocate or free blocks in
> groups where the block bitmap has been marked as corrupt, but
> there are still some issues here:
>
> 1. When a block bitmap is found to be corrupted, ext4_grp_locked_error()
> is always called first, and only after that the block bitmap of the group
> is marked as corrupted. In ext4_grp_locked_error(), the group may
> be unlocked, and then other processes may be able to access the
> corrupted bitmap. In this case, we can just put the marking of
> corruption before ext4_grp_locked_error().
>
> 2. ext4_free_blocks() finds a corrupt bitmap can just return and do
> nothing, because there is no problem with not freeing an exception
> block. But mb_mark_used() has no logic for determining if a block
> bitmap is corrupt, and its caller has no error handling logic, so
> mb_mark_used() needs its caller to make sure that it doesn't allocate
> blocks in a group with a corrupted block bitmap (which is why it
> added the judgment in patch 2). However, it is possible to unlock group
> between determining whether the group is corrupt and actually calling
> mb_mark_used() to use those blocks. For example, when calling
> mb_mark_used() in ext4_mb_try_best_found(), we are determining
> whether the group's block bitmap is corrupted or not in the previous
> ext4_mb_good_group(), but we are not determining it again when using
> the blocks in ext4_mb_try_best_found(), at which point we may be
> modifying the corrupted block bitmap.
>
> 3. Determine if a block bitmap is corrupted outside of a group lock
> in ext4_mb_find_by_goal().
>
> 4. In ext4_mb_check_limits(), it may be possible to use the ac_b_ex
> found in group 0 while holding a lock in group 1.

I'm very sorry that the fourth point was wrong, I read "||" as "&&" in
ext4_mb_check_limits() :

  if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan)

>
> In addition to the above, there may be some corner cases that cause
> inconsistencies, so here we determine if bb_fragments is 0 to avoid a
> crash due to division by zero. Perhaps we could just replace
> grp->bb_free == 0 with grp->bb_fragments == 0, which wouldn't look
> so strange.

Thanks!
diff mbox series

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 2fbee0f0f5c3..e2a167240335 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -845,6 +845,9 @@  mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
 	if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_free == 0)
 		return;
 
+	if (unlikely(grp->bb_fragments == 0))
+		return;
+
 	new_order = mb_avg_fragment_size_order(sb,
 					grp->bb_free / grp->bb_fragments);
 	if (new_order == grp->bb_avg_fragment_size_order)