
[v2,09/12] ext4: Ensure ext4_mb_prefetch_fini() is called for all prefetched BGs

Message ID 05e648ae04ec5b754207032823e9c1de9a54f87a.1685449706.git.ojaswin@linux.ibm.com
State Awaiting Upstream
Series multiblock allocator improvements

Commit Message

Ojaswin Mujoo May 30, 2023, 12:33 p.m. UTC
Before this patch, the call stack in ext4_run_li_request is as follows:

  /*
   * nr = no. of BGs we want to fetch (=s_mb_prefetch)
   * prefetch_ios = no. of BGs not uptodate after
   * 		    ext4_read_block_bitmap_nowait()
   */
  next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
  ext4_mb_prefetch_fini(sb, next_group, prefetch_ios);

ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in
the range [next_group - prefetch_ios, next_group). This is incorrect,
since prefetch_ios can be less than nr, which causes
ext4_mb_prefetch_fini() to incorrectly ignore some of the BGs that might
need initialization. This issue is more notable now with the previous
patch enabling "fetching" of BLOCK_UNINIT BGs, which are marked
buffer_uptodate by default.

Fix this by passing nr to ext4_mb_prefetch_fini() instead of
prefetch_ios so that it considers the right range of groups.

Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in
ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT
groups that would need buddy initialization.
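
With this patch, the call sequence in ext4_run_li_request() becomes
(simplified; this mirrors the super.c hunk below):

  next_group = ext4_mb_prefetch(sb, group, nr, &prefetch_ios);
  ext4_mb_prefetch_fini(sb, next_group, nr);
  trace_ext4_prefetch_bitmaps(sb, group, next_group, nr);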

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c |  4 ----
 fs/ext4/super.c   | 11 ++++-------
 2 files changed, 4 insertions(+), 11 deletions(-)

Comments

Guoqing Jiang June 6, 2023, 2 p.m. UTC | #1
Hello,

On 5/30/23 20:33, Ojaswin Mujoo wrote:
> Before this patch, the call stack in ext4_run_li_request is as follows:
>
>    /*
>     * nr = no. of BGs we want to fetch (=s_mb_prefetch)
>     * prefetch_ios = no. of BGs not uptodate after
>     * 		    ext4_read_block_bitmap_nowait()
>     */
>    next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
>    ext4_mb_prefetch_fini(sb, next_group, prefetch_ios);
>
> ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in
> the range [next_group - prefetch_ios, next_group). This is incorrect,
> since prefetch_ios can be less than nr, which causes
> ext4_mb_prefetch_fini() to incorrectly ignore some of the BGs that
> might need initialization. This issue is more notable now with the
> previous patch enabling "fetching" of BLOCK_UNINIT BGs, which are
> marked buffer_uptodate by default.
>
> Fix this by passing nr to ext4_mb_prefetch_fini() instead of
> prefetch_ios so that it considers the right range of groups.

Thanks for the series.

> Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in
> ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT
> groups that would need buddy initialization.

Seems ext4_mb_prefetch_fini can't be called by ext4_mb_regular_allocator
if nr is 0.

https://elixir.bootlin.com/linux/v6.4-rc5/source/fs/ext4/mballoc.c#L2816

Am I missing something?

Thanks,
Guoqing

> Signed-off-by: Ojaswin Mujoo<ojaswin@linux.ibm.com>
> Reviewed-by: Ritesh Harjani (IBM)<ritesh.list@gmail.com>
> Reviewed-by: Jan Kara<jack@suse.cz>
> ---
>   fs/ext4/mballoc.c |  4 ----
>   fs/ext4/super.c   | 11 ++++-------
>   2 files changed, 4 insertions(+), 11 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 79455c7e645b..6775d73dfc68 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2735,8 +2735,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>   			if ((prefetch_grp == group) &&
>   			    (cr > CR1 ||
>   			     prefetch_ios < sbi->s_mb_prefetch_limit)) {
> -				unsigned int curr_ios = prefetch_ios;
> -
>   				nr = sbi->s_mb_prefetch;
>   				if (ext4_has_feature_flex_bg(sb)) {
>   					nr = 1 << sbi->s_log_groups_per_flex;
> @@ -2745,8 +2743,6 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>   				}
>   				prefetch_grp = ext4_mb_prefetch(sb, group,
>   							nr, &prefetch_ios);
> -				if (prefetch_ios == curr_ios)
> -					nr = 0;
>   			}
>   
>   			/* This now checks without needing the buddy page */
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 2da5476fa48b..27c1dabacd43 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3692,16 +3692,13 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
>   	ext4_group_t group = elr->lr_next_group;
>   	unsigned int prefetch_ios = 0;
>   	int ret = 0;
> +	int nr = EXT4_SB(sb)->s_mb_prefetch;
>   	u64 start_time;
>   
>   	if (elr->lr_mode == EXT4_LI_MODE_PREFETCH_BBITMAP) {
> -		elr->lr_next_group = ext4_mb_prefetch(sb, group,
> -				EXT4_SB(sb)->s_mb_prefetch, &prefetch_ios);
> -		if (prefetch_ios)
> -			ext4_mb_prefetch_fini(sb, elr->lr_next_group,
> -					      prefetch_ios);
> -		trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group,
> -					    prefetch_ios);
> +		elr->lr_next_group = ext4_mb_prefetch(sb, group, nr, &prefetch_ios);
> +		ext4_mb_prefetch_fini(sb, elr->lr_next_group, nr);
> +		trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group, nr);
>   		if (group >= elr->lr_next_group) {
>   			ret = 1;
>   			if (elr->lr_first_not_zeroed != ngroups &&
Ojaswin Mujoo June 27, 2023, 6:51 a.m. UTC | #2
On Tue, Jun 06, 2023 at 10:00:57PM +0800, Guoqing Jiang wrote:
> Hello,
> 
> On 5/30/23 20:33, Ojaswin Mujoo wrote:
> > Before this patch, the call stack in ext4_run_li_request is as follows:
> > 
> >    /*
> >     * nr = no. of BGs we want to fetch (=s_mb_prefetch)
> >     * prefetch_ios = no. of BGs not uptodate after
> >     * 		    ext4_read_block_bitmap_nowait()
> >     */
> >    next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
> >    ext4_mb_prefetch_fini(sb, next_group, prefetch_ios);
> > 
> > ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in
> > the range [next_group - prefetch_ios, next_group). This is incorrect,
> > since prefetch_ios can be less than nr, which causes
> > ext4_mb_prefetch_fini() to incorrectly ignore some of the BGs that
> > might need initialization. This issue is more notable now with the
> > previous patch enabling "fetching" of BLOCK_UNINIT BGs, which are
> > marked buffer_uptodate by default.
> > 
> > Fix this by passing nr to ext4_mb_prefetch_fini() instead of
> > prefetch_ios so that it considers the right range of groups.
> 
> Thanks for the series.
> 
> > Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in
> > ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT
> > groups that would need buddy initialization.
> 
> Seems ext4_mb_prefetch_fini can't be called by ext4_mb_regular_allocator
> if nr is 0.

Hi Guoqing,

Sorry I was on vacation so didn't get a chance to reply to this sooner.
Let me explain what I meant by that statement in the commit message.

So basically, the prefetch_ios output argument is incremented whenever
ext4_mb_prefetch() reads a block group with !buffer_uptodate(bh).
However, for BLOCK_UNINIT BGs the buffer is marked uptodate after
initialization and hence prefetch_ios is not incremented when such BGs
are prefetched. 

This leads to nr becoming 0 due to the following line (removed in this patch):

				if (prefetch_ios == curr_ios)
					nr = 0;

Hence ext4_mb_prefetch_fini() would never pre-initialize the
corresponding buddy structures. Instead, these structures would likely
get initialized at a later point, during the slower allocation criteria.
The motivation for making sure the BLOCK_UNINIT BGs' buddies are
pre-initialized is so that the faster allocation criteria can utilize
the data to make better decisions.
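
To illustrate, here is a rough sketch of the two pieces involved
(simplified, not verbatim kernel code):

  /* inside ext4_mb_prefetch(): prefetch_ios is only bumped when the
   * bitmap buffer was not already uptodate, i.e. real I/O was needed */
  bh = ext4_read_block_bitmap_nowait(sb, group, ...);
  if (!IS_ERR(bh) && !buffer_uptodate(bh))
          (*prefetch_ios)++;

  /* in ext4_mb_regular_allocator(), before this patch: if every
   * prefetched BG was BLOCK_UNINIT (already uptodate), no new I/O was
   * counted, so ... */
  if (prefetch_ios == curr_ios)
          nr = 0;       /* ... and ext4_mb_prefetch_fini() is skipped */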

Regards,
ojaswin

> 
> https://elixir.bootlin.com/linux/v6.4-rc5/source/fs/ext4/mballoc.c#L2816
> 
> Am I missing something?
> 
> Thanks,
> Guoqing
>
Guoqing Jiang June 28, 2023, 1:33 a.m. UTC | #3
Hi Ojaswin,

On 6/27/23 14:51, Ojaswin Mujoo wrote:
> On Tue, Jun 06, 2023 at 10:00:57PM +0800, Guoqing Jiang wrote:
>> Hello,
>>
>> On 5/30/23 20:33, Ojaswin Mujoo wrote:
>>> Before this patch, the call stack in ext4_run_li_request is as follows:
>>>
>>>     /*
>>>      * nr = no. of BGs we want to fetch (=s_mb_prefetch)
>>>      * prefetch_ios = no. of BGs not uptodate after
>>>      * 		    ext4_read_block_bitmap_nowait()
>>>      */
>>>     next_group = ext4_mb_prefetch(sb, group, nr, prefetch_ios);
>>>     ext4_mb_prefetch_fini(sb, next_group, prefetch_ios);
>>>
>>> ext4_mb_prefetch_fini() will only try to initialize buddies for BGs in
>>> the range [next_group - prefetch_ios, next_group). This is incorrect,
>>> since prefetch_ios can be less than nr, which causes
>>> ext4_mb_prefetch_fini() to incorrectly ignore some of the BGs that
>>> might need initialization. This issue is more notable now with the
>>> previous patch enabling "fetching" of BLOCK_UNINIT BGs, which are
>>> marked buffer_uptodate by default.
>>>
>>> Fix this by passing nr to ext4_mb_prefetch_fini() instead of
>>> prefetch_ios so that it considers the right range of groups.
>> Thanks for the series.
>>
>>> Similarly, make sure we don't pass nr=0 to ext4_mb_prefetch_fini() in
>>> ext4_mb_regular_allocator() since we might have prefetched BLOCK_UNINIT
>>> groups that would need buddy initialization.
>> Seems ext4_mb_prefetch_fini can't be called by ext4_mb_regular_allocator
>> if nr is 0.
> Hi Guoqing,
>
> Sorry I was on vacation so didn't get a chance to reply to this sooner.
> Let me explain what I meant by that statement in the commit message.
>
> So basically, the prefetch_ios output argument is incremented whenever
> ext4_mb_prefetch() reads a block group with !buffer_uptodate(bh).
> However, for BLOCK_UNINIT BGs the buffer is marked uptodate after
> initialization and hence prefetch_ios is not incremented when such BGs
> are prefetched.
>
> This leads to nr becoming 0 due to the following line (removed in this patch):
>
> 				if (prefetch_ios == curr_ios)
> 					nr = 0;
>
> Hence ext4_mb_prefetch_fini() would never pre-initialize the
> corresponding buddy structures. Instead, these structures would likely
> get initialized at a later point, during the slower allocation criteria.
> The motivation for making sure the BLOCK_UNINIT BGs' buddies are
> pre-initialized is so that the faster allocation criteria can utilize
> the data to make better decisions.

Got it, I appreciate the detailed explanation!

Thanks,
Guoqing

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 79455c7e645b..6775d73dfc68 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2735,8 +2735,6 @@  ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 			if ((prefetch_grp == group) &&
 			    (cr > CR1 ||
 			     prefetch_ios < sbi->s_mb_prefetch_limit)) {
-				unsigned int curr_ios = prefetch_ios;
-
 				nr = sbi->s_mb_prefetch;
 				if (ext4_has_feature_flex_bg(sb)) {
 					nr = 1 << sbi->s_log_groups_per_flex;
@@ -2745,8 +2743,6 @@  ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 				}
 				prefetch_grp = ext4_mb_prefetch(sb, group,
 							nr, &prefetch_ios);
-				if (prefetch_ios == curr_ios)
-					nr = 0;
 			}
 
 			/* This now checks without needing the buddy page */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2da5476fa48b..27c1dabacd43 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3692,16 +3692,13 @@  static int ext4_run_li_request(struct ext4_li_request *elr)
 	ext4_group_t group = elr->lr_next_group;
 	unsigned int prefetch_ios = 0;
 	int ret = 0;
+	int nr = EXT4_SB(sb)->s_mb_prefetch;
 	u64 start_time;
 
 	if (elr->lr_mode == EXT4_LI_MODE_PREFETCH_BBITMAP) {
-		elr->lr_next_group = ext4_mb_prefetch(sb, group,
-				EXT4_SB(sb)->s_mb_prefetch, &prefetch_ios);
-		if (prefetch_ios)
-			ext4_mb_prefetch_fini(sb, elr->lr_next_group,
-					      prefetch_ios);
-		trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group,
-					    prefetch_ios);
+		elr->lr_next_group = ext4_mb_prefetch(sb, group, nr, &prefetch_ios);
+		ext4_mb_prefetch_fini(sb, elr->lr_next_group, nr);
+		trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group, nr);
 		if (group >= elr->lr_next_group) {
 			ret = 1;
 			if (elr->lr_first_not_zeroed != ngroups &&