diff mbox series

[v6,05/11] ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb()

Message ID 20230826155028.4019470-6-shikemeng@huaweicloud.com
State Superseded
Headers show
Series cleanups and unit test for mballoc | expand

Commit Message

Kemeng Shi Aug. 26, 2023, 3:50 p.m. UTC
This patch separates block bitmap and buddy bitmap freeing in order to
udpate block bitmap with ext4_mb_mark_context in following patch.

Separated freeing is safe with concurrent allocation as long as:
1. Firstly allocate block in buddy bitmap, and then in block bitmap.
2. Firstly free block in block bitmap, and then buddy bitmap.
Then freed block will only be available to allocation when both buddy
bitmap and block bitmap are updated by freeing.
Allocation obeys rule 1 already, just do sperated freeing with rule 2.

Separated freeing has no race with generate_buddy as:
Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
buddy page can be found in sbi->s_buddy_cache and no more buddy
initialization of the buddy page will be executed concurrently until
buddy page is unloaded. As we always do free in "load buddy, free,
unload buddy" sequence, separated freeing has no race with generate_buddy.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

Comments

Ritesh Harjani (IBM) Sept. 1, 2023, 9:34 a.m. UTC | #1
Kemeng Shi <shikemeng@huaweicloud.com> writes:

> This patch separates block bitmap and buddy bitmap freeing in order to
> udpate block bitmap with ext4_mb_mark_context in following patch.
^^^ update.

>
> Separated freeing is safe with concurrent allocation as long as:
> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
> 2. Firstly free block in block bitmap, and then buddy bitmap.
> Then freed block will only be available to allocation when both buddy
> bitmap and block bitmap are updated by freeing.
> Allocation obeys rule 1 already, just do sperated freeing with rule 2.

So we also don't need ext4_load_buddy_gfp() before freeing on-disk
bitmap right. Continue below...

>
> Separated freeing has no race with generate_buddy as:
> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
> buddy page can be found in sbi->s_buddy_cache and no more buddy
> initialization of the buddy page will be executed concurrently until
> buddy page is unloaded. As we always do free in "load buddy, free,
> unload buddy" sequence, separated freeing has no race with generate_buddy.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index e650eac22237..01ad36a1cc96 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>  	if (err)
>  		goto error_return;


Let me add the a piece of code before the added changes and continue...

	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
				     GFP_NOFS|__GFP_NOFAIL);
	if (err)
		goto error_return;
>  
> +	ext4_lock_group(sb, block_group);
> +	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
> +	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
> +	ext4_free_group_clusters_set(sb, gdp, ret);
> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> +	ext4_group_desc_csum_set(sb, block_group, gdp);
> +	ext4_unlock_group(sb, block_group);
> +

...Is it required for ext4_mb_load_buddy_gfp() to be called before
freeing on-disk bitmap blocks? Can you explain why please?
At least it is not very clear to me that why do we need
ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
needed then I think we should separate out bitmap freeing logic and
buddy freeing logic even further.

Thoughts?

>  	/*
>  	 * We need to make sure we don't reuse the freed block until after the
>  	 * transaction is committed. We make an exception if the inode is to be
> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>  		new_entry->efd_tid = handle->h_transaction->t_tid;
>  
>  		ext4_lock_group(sb, block_group);
> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>  		ext4_mb_free_metadata(handle, &e4b, new_entry);
>  	} else {
> -		/* need to update group_info->bb_free and bitmap
> -		 * with group lock held. generate_buddy look at
> -		 * them with group lock_held
> -		 */
>  		if (test_opt(sb, DISCARD)) {
>  			err = ext4_issue_discard(sb, block_group, bit,
>  						 count_clusters, NULL);
> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>  			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>  
>  		ext4_lock_group(sb, block_group);
> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>  		mb_free_blocks(inode, &e4b, bit, count_clusters);
>  	}
>  
> -	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
> -	ext4_free_group_clusters_set(sb, gdp, ret);
> -	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
> -	ext4_group_desc_csum_set(sb, block_group, gdp);
>  	ext4_unlock_group(sb, block_group);
>  
>  	if (sbi->s_log_groups_per_flex) {


Adding piece of code here...

	if (sbi->s_log_groups_per_flex) {
		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
		atomic64_add(count_clusters,
			     &sbi_array_rcu_deref(sbi, s_flex_groups,
						  flex_group)->free_clusters);
	}

<...>

	/* We dirtied the bitmap block */
	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
	/* And the group descriptor block */
	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
	if (!err)
		err = ret;

I was thinking even this can go around bitmap freeing logic above. So
the next patch becomes very clear that all of the bitmap freeing logic
is just simply moved into ext4_mb_mark_context() function.

-ritesh
Kemeng Shi Sept. 4, 2023, 3 a.m. UTC | #2
on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
> 
>> This patch separates block bitmap and buddy bitmap freeing in order to
>> udpate block bitmap with ext4_mb_mark_context in following patch.
> ^^^ update.
> 
>>
>> Separated freeing is safe with concurrent allocation as long as:
>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>> Then freed block will only be available to allocation when both buddy
>> bitmap and block bitmap are updated by freeing.
>> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
> 
> So we also don't need ext4_load_buddy_gfp() before freeing on-disk
> bitmap right. Continue below...
> 
>>
>> Separated freeing has no race with generate_buddy as:
>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>> initialization of the buddy page will be executed concurrently until
>> buddy page is unloaded. As we always do free in "load buddy, free,
>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  fs/ext4/mballoc.c | 18 ++++++++----------
>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e650eac22237..01ad36a1cc96 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>  	if (err)
>>  		goto error_return;
> 
> 
> Let me add the a piece of code before the added changes and continue...
> 
> 	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
> 				     GFP_NOFS|__GFP_NOFAIL);
> 	if (err)
> 		goto error_return;
>>  
>> +	ext4_lock_group(sb, block_group);
>> +	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> +	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> +	ext4_free_group_clusters_set(sb, gdp, ret);
>> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> +	ext4_group_desc_csum_set(sb, block_group, gdp);
>> +	ext4_unlock_group(sb, block_group);
>> +
> 
> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
> freeing on-disk bitmap blocks? Can you explain why please?
> At least it is not very clear to me that why do we need
> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
> needed then I think we should separate out bitmap freeing logic and
> buddy freeing logic even further.
Yes, ext4_mb_load_buddy_gfp is no needed for clearing of bitmap, put
it before bit clearing for catching error and aborting mark early
to reduce functional change.
> 
> Thoughts?
> 
>>  	/*
>>  	 * We need to make sure we don't reuse the freed block until after the
>>  	 * transaction is committed. We make an exception if the inode is to be
>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>  		new_entry->efd_tid = handle->h_transaction->t_tid;
>>  
>>  		ext4_lock_group(sb, block_group);
>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>  		ext4_mb_free_metadata(handle, &e4b, new_entry);
>>  	} else {
>> -		/* need to update group_info->bb_free and bitmap
>> -		 * with group lock held. generate_buddy look at
>> -		 * them with group lock_held
>> -		 */
>>  		if (test_opt(sb, DISCARD)) {
>>  			err = ext4_issue_discard(sb, block_group, bit,
>>  						 count_clusters, NULL);
>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>  			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>  
>>  		ext4_lock_group(sb, block_group);
>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>  		mb_free_blocks(inode, &e4b, bit, count_clusters);
>>  	}
>>  
>> -	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> -	ext4_free_group_clusters_set(sb, gdp, ret);
>> -	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> -	ext4_group_desc_csum_set(sb, block_group, gdp);
>>  	ext4_unlock_group(sb, block_group);
>>  
>>  	if (sbi->s_log_groups_per_flex) {
> 
> 
> Adding piece of code here...
> 
> 	if (sbi->s_log_groups_per_flex) {
> 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
> 		atomic64_add(count_clusters,
> 			     &sbi_array_rcu_deref(sbi, s_flex_groups,
> 						  flex_group)->free_clusters);
> 	}
> 
> <...>
> 
> 	/* We dirtied the bitmap block */
> 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> 	/* And the group descriptor block */
> 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
> 	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> 	if (!err)
> 		err = ret;
> 
> I was thinking even this can go around bitmap freeing logic above. So
> the next patch becomes very clear that all of the bitmap freeing logic
> is just simply moved into ext4_mb_mark_context() function.
> 
> -ritesh
>
Kemeng Shi Sept. 12, 2023, 7:02 a.m. UTC | #3
on 9/4/2023 11:00 AM, Kemeng Shi wrote:
> 
> 
> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>
>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>> ^^^ update.
>>
>>>
>>> Separated freeing is safe with concurrent allocation as long as:
>>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>>> Then freed block will only be available to allocation when both buddy
>>> bitmap and block bitmap are updated by freeing.
>>> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>>
>> So we also don't need ext4_load_buddy_gfp() before freeing on-disk
>> bitmap right. Continue below...
>>
>>>
>>> Separated freeing has no race with generate_buddy as:
>>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
>>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>>> initialization of the buddy page will be executed concurrently until
>>> buddy page is unloaded. As we always do free in "load buddy, free,
>>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>>  fs/ext4/mballoc.c | 18 ++++++++----------
>>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index e650eac22237..01ad36a1cc96 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>  	if (err)
>>>  		goto error_return;
>>
>>
>> Let me add the a piece of code before the added changes and continue...
>>
>> 	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>> 				     GFP_NOFS|__GFP_NOFAIL);
>> 	if (err)
>> 		goto error_return;
>>>  
>>> +	ext4_lock_group(sb, block_group);
>>> +	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>> +	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>> +	ext4_free_group_clusters_set(sb, gdp, ret);
>>> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>> +	ext4_group_desc_csum_set(sb, block_group, gdp);
>>> +	ext4_unlock_group(sb, block_group);
>>> +
>>
>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>> freeing on-disk bitmap blocks? Can you explain why please?
>> At least it is not very clear to me that why do we need
>> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
>> needed then I think we should separate out bitmap freeing logic and
>> buddy freeing logic even further.
> Yes, ext4_mb_load_buddy_gfp is no needed for clearing of bitmap, put
> it before bit clearing for catching error and aborting mark early
> to reduce functional change.
Hi Ritesh, sorry for bother. I'm going to push a new version and I perfer
to call ext4_mb_load_buddy_gfp early to catch potential error in advance.
Just wonder is this good to you. Like to hear any advice. Thanks!
>>
>> Thoughts?
>>
>>>  	/*
>>>  	 * We need to make sure we don't reuse the freed block until after the
>>>  	 * transaction is committed. We make an exception if the inode is to be
>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>  		new_entry->efd_tid = handle->h_transaction->t_tid;
>>>  
>>>  		ext4_lock_group(sb, block_group);
>>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>  		ext4_mb_free_metadata(handle, &e4b, new_entry);
>>>  	} else {
>>> -		/* need to update group_info->bb_free and bitmap
>>> -		 * with group lock held. generate_buddy look at
>>> -		 * them with group lock_held
>>> -		 */
>>>  		if (test_opt(sb, DISCARD)) {
>>>  			err = ext4_issue_discard(sb, block_group, bit,
>>>  						 count_clusters, NULL);
>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>  			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>  
>>>  		ext4_lock_group(sb, block_group);
>>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>  		mb_free_blocks(inode, &e4b, bit, count_clusters);
>>>  	}
>>>  
>>> -	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>> -	ext4_free_group_clusters_set(sb, gdp, ret);
>>> -	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>> -	ext4_group_desc_csum_set(sb, block_group, gdp);
>>>  	ext4_unlock_group(sb, block_group);
>>>  
>>>  	if (sbi->s_log_groups_per_flex) {
>>
>>
>> Adding piece of code here...
>>
>> 	if (sbi->s_log_groups_per_flex) {
>> 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>> 		atomic64_add(count_clusters,
>> 			     &sbi_array_rcu_deref(sbi, s_flex_groups,
>> 						  flex_group)->free_clusters);
>> 	}
>>
>> <...>
>>
>> 	/* We dirtied the bitmap block */
>> 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>> 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>> 	/* And the group descriptor block */
>> 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>> 	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>> 	if (!err)
>> 		err = ret;
>>
>> I was thinking even this can go around bitmap freeing logic above. So
>> the next patch becomes very clear that all of the bitmap freeing logic
>> is just simply moved into ext4_mb_mark_context() function.
>>
>> -ritesh
>>
> 
>
Ritesh Harjani (IBM) Sept. 12, 2023, 10:13 a.m. UTC | #4
Kemeng Shi <shikemeng@huaweicloud.com> writes:

> on 9/4/2023 11:00 AM, Kemeng Shi wrote:
>> 
>> 
>> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>>
>>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>>> ^^^ update.
>>>
>>>>
>>>> Separated freeing is safe with concurrent allocation as long as:
>>>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>>>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>>>> Then freed block will only be available to allocation when both buddy
>>>> bitmap and block bitmap are updated by freeing.
>>>> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>>>
>>> So we also don't need ext4_load_buddy_gfp() before freeing on-disk
>>> bitmap right. Continue below...
>>>
>>>>
>>>> Separated freeing has no race with generate_buddy as:
>>>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
>>>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>>>> initialization of the buddy page will be executed concurrently until
>>>> buddy page is unloaded. As we always do free in "load buddy, free,
>>>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>>  fs/ext4/mballoc.c | 18 ++++++++----------
>>>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>> index e650eac22237..01ad36a1cc96 100644
>>>> --- a/fs/ext4/mballoc.c
>>>> +++ b/fs/ext4/mballoc.c
>>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>  	if (err)
>>>>  		goto error_return;
>>>
>>>
>>> Let me add the a piece of code before the added changes and continue...
>>>
>>> 	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>>> 				     GFP_NOFS|__GFP_NOFAIL);
>>> 	if (err)
>>> 		goto error_return;
>>>>  
>>>> +	ext4_lock_group(sb, block_group);
>>>> +	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>> +	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>> +	ext4_free_group_clusters_set(sb, gdp, ret);
>>>> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>> +	ext4_group_desc_csum_set(sb, block_group, gdp);
>>>> +	ext4_unlock_group(sb, block_group);
>>>> +
>>>
>>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>>> freeing on-disk bitmap blocks? Can you explain why please?
>>> At least it is not very clear to me that why do we need
>>> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
>>> needed then I think we should separate out bitmap freeing logic and
>>> buddy freeing logic even further.
>> Yes, ext4_mb_load_buddy_gfp is no needed for clearing of bitmap, put

Earlier you mentioned here that there is no dependency.


>> it before bit clearing for catching error and aborting mark early
>> to reduce functional change.
> Hi Ritesh, sorry for bother. I'm going to push a new version and I perfer
> to call ext4_mb_load_buddy_gfp early to catch potential error in advance.
> Just wonder is this good to you. Like to hear any advice. Thanks!

Isn't there actually a dependency that we should first always call
ext4_mb_load_buddy_gfp() before clearing the bitmap? 

Because say if we take care of the bitmap handling first, i.e. we clear
the bitmap and also call ext4_handle_dirty_metadata() first. Now assume
we later call ext4_mb_load_buddy_gfp(). Now also assume the buddy pages
were not already loaded in memory (reclaimed due to memory pressure),
in that case we will read the block bitmap of that group from the
on-disk bitmap copy, which we already changed above.
So that means the buddy handling logic will already find the block
bitmap as cleared and should report problems right?

Then isn't there an actual dependency here, meaning
ext4_mb_load_buddy_gfp() should always be called first before bitmap
handling logic. Thoughts?

-ritesh


>>>
>>> Thoughts?
>>>
>>>>  	/*
>>>>  	 * We need to make sure we don't reuse the freed block until after the
>>>>  	 * transaction is committed. We make an exception if the inode is to be
>>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>  		new_entry->efd_tid = handle->h_transaction->t_tid;
>>>>  
>>>>  		ext4_lock_group(sb, block_group);
>>>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>  		ext4_mb_free_metadata(handle, &e4b, new_entry);
>>>>  	} else {
>>>> -		/* need to update group_info->bb_free and bitmap
>>>> -		 * with group lock held. generate_buddy look at
>>>> -		 * them with group lock_held
>>>> -		 */
>>>>  		if (test_opt(sb, DISCARD)) {
>>>>  			err = ext4_issue_discard(sb, block_group, bit,
>>>>  						 count_clusters, NULL);
>>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>  			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>>  
>>>>  		ext4_lock_group(sb, block_group);
>>>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>  		mb_free_blocks(inode, &e4b, bit, count_clusters);
>>>>  	}
>>>>  
>>>> -	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>> -	ext4_free_group_clusters_set(sb, gdp, ret);
>>>> -	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>> -	ext4_group_desc_csum_set(sb, block_group, gdp);
>>>>  	ext4_unlock_group(sb, block_group);
>>>>  
>>>>  	if (sbi->s_log_groups_per_flex) {
>>>
>>>
>>> Adding piece of code here...
>>>
>>> 	if (sbi->s_log_groups_per_flex) {
>>> 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>>> 		atomic64_add(count_clusters,
>>> 			     &sbi_array_rcu_deref(sbi, s_flex_groups,
>>> 						  flex_group)->free_clusters);
>>> 	}
>>>
>>> <...>
>>>
>>> 	/* We dirtied the bitmap block */
>>> 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>>> 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>>> 	/* And the group descriptor block */
>>> 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>>> 	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>>> 	if (!err)
>>> 		err = ret;
>>>
>>> I was thinking even this can go around bitmap freeing logic above. So
>>> the next patch becomes very clear that all of the bitmap freeing logic
>>> is just simply moved into ext4_mb_mark_context() function.
>>>
>>> -ritesh
>>>
>> 
>>
Kemeng Shi Sept. 12, 2023, 11:32 a.m. UTC | #5
on 9/12/2023 6:13 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@huaweicloud.com> writes:
> 
>> on 9/4/2023 11:00 AM, Kemeng Shi wrote:
>>>
>>>
>>> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>>>> Kemeng Shi <shikemeng@huaweicloud.com> writes:
>>>>
>>>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>>>> ^^^ update.
>>>>
>>>>>
>>>>> Separated freeing is safe with concurrent allocation as long as:
>>>>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>>>>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>>>>> Then freed block will only be available to allocation when both buddy
>>>>> bitmap and block bitmap are updated by freeing.
>>>>> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>>>>
>>>> So we also don't need ext4_load_buddy_gfp() before freeing on-disk
>>>> bitmap right. Continue below...
>>>>
>>>>>
>>>>> Separated freeing has no race with generate_buddy as:
>>>>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
>>>>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>>>>> initialization of the buddy page will be executed concurrently until
>>>>> buddy page is unloaded. As we always do free in "load buddy, free,
>>>>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>>  fs/ext4/mballoc.c | 18 ++++++++----------
>>>>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>>>> index e650eac22237..01ad36a1cc96 100644
>>>>> --- a/fs/ext4/mballoc.c
>>>>> +++ b/fs/ext4/mballoc.c
>>>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>>  	if (err)
>>>>>  		goto error_return;
>>>>
>>>>
>>>> Let me add the a piece of code before the added changes and continue...
>>>>
>>>> 	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>>>> 				     GFP_NOFS|__GFP_NOFAIL);
>>>> 	if (err)
>>>> 		goto error_return;
>>>>>  
>>>>> +	ext4_lock_group(sb, block_group);
>>>>> +	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>> +	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>>> +	ext4_free_group_clusters_set(sb, gdp, ret);
>>>>> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>>> +	ext4_group_desc_csum_set(sb, block_group, gdp);
>>>>> +	ext4_unlock_group(sb, block_group);
>>>>> +
>>>>
>>>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>>>> freeing on-disk bitmap blocks? Can you explain why please?
>>>> At least it is not very clear to me that why do we need
>>>> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
>>>> needed then I think we should separate out bitmap freeing logic and
>>>> buddy freeing logic even further.
>>> Yes, ext4_mb_load_buddy_gfp is no needed for clearing of bitmap, put
> 
> Earlier you mentioned here that there is no dependency.
> 
> 
>>> it before bit clearing for catching error and aborting mark early
>>> to reduce functional change.
>> Hi Ritesh, sorry for bother. I'm going to push a new version and I perfer
>> to call ext4_mb_load_buddy_gfp early to catch potential error in advance.
>> Just wonder is this good to you. Like to hear any advice. Thanks!
> 
> Isn't there actually a dependency that we should first always call
> ext4_mb_load_buddy_gfp() before clearing the bitmap? 
> 
> Because say if we take care of the bitmap handling first, i.e. we clear
> the bitmap and also call ext4_handle_dirty_metadata() first. Now assume
> we later call ext4_mb_load_buddy_gfp(). Now also assume the buddy pages
> were not already loaded in memory (reclaimed due to memory pressure),
> in that case we will read the block bitmap of that group from the
> on-disk bitmap copy, which we already changed above.
> So that means the buddy handling logic will already find the block
> bitmap as cleared and should report problems right?
> 
> Then isn't there an actual dependency here, meaning
> ext4_mb_load_buddy_gfp() should always be called first before bitmap
> handling logic. Thoughts?
> 
My fault, ext4_mb_load_buddy_gfp should be called before on-disk bitmap
handling if buddy page was not loaded...

> -ritesh
> 
> 
>>>>
>>>> Thoughts?
>>>>
>>>>>  	/*
>>>>>  	 * We need to make sure we don't reuse the freed block until after the
>>>>>  	 * transaction is committed. We make an exception if the inode is to be
>>>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>>  		new_entry->efd_tid = handle->h_transaction->t_tid;
>>>>>  
>>>>>  		ext4_lock_group(sb, block_group);
>>>>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>>  		ext4_mb_free_metadata(handle, &e4b, new_entry);
>>>>>  	} else {
>>>>> -		/* need to update group_info->bb_free and bitmap
>>>>> -		 * with group lock held. generate_buddy look at
>>>>> -		 * them with group lock_held
>>>>> -		 */
>>>>>  		if (test_opt(sb, DISCARD)) {
>>>>>  			err = ext4_issue_discard(sb, block_group, bit,
>>>>>  						 count_clusters, NULL);
>>>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>>>  			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>>>  
>>>>>  		ext4_lock_group(sb, block_group);
>>>>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>>>  		mb_free_blocks(inode, &e4b, bit, count_clusters);
>>>>>  	}
>>>>>  
>>>>> -	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>>>> -	ext4_free_group_clusters_set(sb, gdp, ret);
>>>>> -	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>>>> -	ext4_group_desc_csum_set(sb, block_group, gdp);
>>>>>  	ext4_unlock_group(sb, block_group);
>>>>>  
>>>>>  	if (sbi->s_log_groups_per_flex) {
>>>>
>>>>
>>>> Adding piece of code here...
>>>>
>>>> 	if (sbi->s_log_groups_per_flex) {
>>>> 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>>>> 		atomic64_add(count_clusters,
>>>> 			     &sbi_array_rcu_deref(sbi, s_flex_groups,
>>>> 						  flex_group)->free_clusters);
>>>> 	}
>>>>
>>>> <...>
>>>>
>>>> 	/* We dirtied the bitmap block */
>>>> 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>>>> 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>>>> 	/* And the group descriptor block */
>>>> 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>>>> 	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>>>> 	if (!err)
>>>> 		err = ret;
>>>>
>>>> I was thinking even this can go around bitmap freeing logic above. So
>>>> the next patch becomes very clear that all of the bitmap freeing logic
>>>> is just simply moved into ext4_mb_mark_context() function.
>>>>
>>>> -ritesh
>>>>
>>>
>>>
>
diff mbox series

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index e650eac22237..01ad36a1cc96 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6519,6 +6519,14 @@  static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 	if (err)
 		goto error_return;
 
+	ext4_lock_group(sb, block_group);
+	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
+	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
+	ext4_free_group_clusters_set(sb, gdp, ret);
+	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
+	ext4_unlock_group(sb, block_group);
+
 	/*
 	 * We need to make sure we don't reuse the freed block until after the
 	 * transaction is committed. We make an exception if the inode is to be
@@ -6541,13 +6549,8 @@  static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		new_entry->efd_tid = handle->h_transaction->t_tid;
 
 		ext4_lock_group(sb, block_group);
-		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
 		ext4_mb_free_metadata(handle, &e4b, new_entry);
 	} else {
-		/* need to update group_info->bb_free and bitmap
-		 * with group lock held. generate_buddy look at
-		 * them with group lock_held
-		 */
 		if (test_opt(sb, DISCARD)) {
 			err = ext4_issue_discard(sb, block_group, bit,
 						 count_clusters, NULL);
@@ -6560,14 +6563,9 @@  static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
 
 		ext4_lock_group(sb, block_group);
-		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
 		mb_free_blocks(inode, &e4b, bit, count_clusters);
 	}
 
-	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
-	ext4_free_group_clusters_set(sb, gdp, ret);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 
 	if (sbi->s_log_groups_per_flex) {