diff mbox series

[1/1] ext4: fallback to complex scan if aligned scan doesn't work

Message ID ee033f6dfa0a7f2934437008a909c3788233950f.1702455010.git.ojaswin@linux.ibm.com
State Awaiting Upstream
Series Fix for recent bugzilla reports related to long halts during block allocation

Commit Message

Ojaswin Mujoo Dec. 15, 2023, 11:19 a.m. UTC
Currently in case the goal length is a multiple of stripe size we use
ext4_mb_scan_aligned() to find the stripe size aligned physical blocks.
In case we are not able to find any, we again go back to calling
ext4_mb_choose_next_group() to search for a different suitable block
group. However, since the linear search always begins from the start,
most of the times we end up with the same BG and the cycle continues.

With large filesystems, the CPU can be stuck in this loop for hours,
which can slow down the whole system. Hence, until we figure out a
better way to continue the search (rather than starting from the
beginning) in ext4_mb_choose_next_group(), let's just fall back to
ext4_mb_complex_scan_group() in case the aligned scan fails, as it is
much more likely to find the needed blocks.

Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
 fs/ext4/mballoc.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
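
The condition the patch hoists into `is_stripe_aligned` is that the goal length, measured in clusters, is an exact multiple of the stripe size. A minimal userspace sketch of that check (illustrative only, not the kernel code; `EXT4_B2C()` is modeled here as a plain block-to-cluster division):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified userspace model of the stripe-alignment test from the
 * patch -- not the kernel implementation. EXT4_B2C() converts a block
 * count to clusters; here that is modeled as dividing by
 * blocks_per_cluster.
 */
static bool is_stripe_aligned(unsigned int goal_len_clusters,
			      unsigned int stripe_blocks,
			      unsigned int blocks_per_cluster)
{
	unsigned int stripe_clusters;

	/* No stripe configured (s_stripe == 0): aligned scan never applies. */
	if (stripe_blocks == 0)
		return false;

	stripe_clusters = stripe_blocks / blocks_per_cluster;
	return (goal_len_clusters % stripe_clusters) == 0;
}
```

With the patch applied, even when this test passes but ext4_mb_scan_aligned() finds nothing (ac_status still AC_STATUS_CONTINUE), the allocator proceeds to the complex scan in the same group instead of moving on.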

Comments

Jan Kara Jan. 4, 2024, 3:27 p.m. UTC | #1
On Fri 15-12-23 16:49:50, Ojaswin Mujoo wrote:
> Currently in case the goal length is a multiple of stripe size we use
> ext4_mb_scan_aligned() to find the stripe size aligned physical blocks.
> In case we are not able to find any, we again go back to calling
> ext4_mb_choose_next_group() to search for a different suitable block
> group. However, since the linear search always begins from the start,
> most of the time we end up with the same BG and the cycle continues.
> 
> With large filesystems, the CPU can be stuck in this loop for hours
> which can slow down the whole system. Hence, until we figure out a
> better way to continue the search (rather than starting from beginning)
> in ext4_mb_choose_next_group(), let's just fall back to
> ext4_mb_complex_scan_group() in case aligned scan fails, as it is much
> more likely to find the needed blocks.
> 
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

If I understand the difference right, the problem is that while
ext4_mb_choose_next_group() guarantees a large enough free space extent for
the CR_GOAL_LEN_FAST or CR_BEST_AVAIL_LEN passes, it does not guarantee a
large enough *aligned* free space extent. Thus for non-aligned allocations
we can fail only due to a race with another allocating process, but with
aligned allocations we can consistently fail in ext4_mb_scan_aligned() and
thus livelock in the allocation loop.
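
The livelock described above can be modeled with a toy retry loop (an illustrative sketch under simplifying assumptions; the group layout, iteration cap, and helper names here are invented for the example, not taken from the kernel):

```c
#include <assert.h>
#include <stdbool.h>

#define NGROUPS 4

/* Free space per group, in blocks (all groups look "good enough"). */
static const int free_blocks[NGROUPS] = { 64, 64, 64, 64 };
/* Whether the group actually contains a stripe-aligned free extent. */
static const bool has_aligned_extent[NGROUPS] = { false, false, true, false };

/*
 * Model of the linear group search that always restarts from group 0:
 * it returns the first group with enough free space for the goal.
 */
static int choose_next_group(int goal)
{
	for (int g = 0; g < NGROUPS; g++)
		if (free_blocks[g] >= goal)
			return g;
	return -1;
}

/*
 * Returns the iteration on which allocation succeeds, or -1 if we hit
 * the cap (the "livelock" from the bug reports). Without the fallback,
 * choose_next_group() keeps returning the same group whose aligned scan
 * deterministically fails, so progress is never made.
 */
static int allocate(int goal, bool fallback_to_complex)
{
	for (int iter = 1; iter <= 1000; iter++) {
		int g = choose_next_group(goal);

		if (g < 0)
			return -1;
		if (has_aligned_extent[g])
			return iter;	/* aligned scan succeeded */
		if (fallback_to_complex && free_blocks[g] >= goal)
			return iter;	/* complex scan succeeded */
		/* aligned scan failed; retry picks the same group again */
	}
	return -1;
}
```

In this model the pre-patch behavior spins until the cap, while the fallback succeeds on the first iteration, which is exactly the effect of letting ext4_mb_complex_scan_group() run when the aligned scan leaves ac_status at AC_STATUS_CONTINUE.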

If my understanding is correct, feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza



> ---
>  fs/ext4/mballoc.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index d72b5e3c92ec..63f12ec02485 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2895,14 +2895,19 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>  			ac->ac_groups_scanned++;
>  			if (cr == CR_POWER2_ALIGNED)
>  				ext4_mb_simple_scan_group(ac, &e4b);
> -			else if ((cr == CR_GOAL_LEN_FAST ||
> -				 cr == CR_BEST_AVAIL_LEN) &&
> -				 sbi->s_stripe &&
> -				 !(ac->ac_g_ex.fe_len %
> -				 EXT4_B2C(sbi, sbi->s_stripe)))
> -				ext4_mb_scan_aligned(ac, &e4b);
> -			else
> -				ext4_mb_complex_scan_group(ac, &e4b);
> +			else {
> +				bool is_stripe_aligned = sbi->s_stripe &&
> +					!(ac->ac_g_ex.fe_len %
> +					  EXT4_B2C(sbi, sbi->s_stripe));
> +
> +				if ((cr == CR_GOAL_LEN_FAST ||
> +				     cr == CR_BEST_AVAIL_LEN) &&
> +				    is_stripe_aligned)
> +					ext4_mb_scan_aligned(ac, &e4b);
> +
> +				if (ac->ac_status == AC_STATUS_CONTINUE)
> +					ext4_mb_complex_scan_group(ac, &e4b);
> +			}
>  
>  			ext4_unlock_group(sb, group);
>  			ext4_mb_unload_buddy(&e4b);
> -- 
> 2.39.3
>
Ojaswin Mujoo Jan. 9, 2024, 9:40 a.m. UTC | #2
On Thu, Jan 04, 2024 at 04:27:17PM +0100, Jan Kara wrote:
> On Fri 15-12-23 16:49:50, Ojaswin Mujoo wrote:
> > Currently in case the goal length is a multiple of stripe size we use
> > ext4_mb_scan_aligned() to find the stripe size aligned physical blocks.
> > In case we are not able to find any, we again go back to calling
> > ext4_mb_choose_next_group() to search for a different suitable block
> > group. However, since the linear search always begins from the start,
> > most of the time we end up with the same BG and the cycle continues.
> > 
> > With large filesystems, the CPU can be stuck in this loop for hours
> > which can slow down the whole system. Hence, until we figure out a
> > better way to continue the search (rather than starting from beginning)
> > in ext4_mb_choose_next_group(), let's just fall back to
> > ext4_mb_complex_scan_group() in case aligned scan fails, as it is much
> > more likely to find the needed blocks.
> > 
> > Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> 
> If I understand the difference right, the problem is that while
> ext4_mb_choose_next_group() guarantees a large enough free space extent for
> the CR_GOAL_LEN_FAST or CR_BEST_AVAIL_LEN passes, it does not guarantee a
> large enough *aligned* free space extent. Thus for non-aligned allocations
> we can fail only due to a race with another allocating process, but with
> aligned allocations we can consistently fail in ext4_mb_scan_aligned() and
> thus livelock in the allocation loop.
> 
> If my understanding is correct, feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> 								Honza

Hey Jan,

Yes you are correct, thanks for the review.

As you said, it's theoretically possible to livelock in non-stripe
scenarios as well, but the probability of getting stuck for any
significant amount of time is really low. I'm not sure that is enough
to justify adding logic to optimize the search for such scenarios, as
that might need more involved code changes.

Regards,
ojaswin
> 
> 
> 
> > ---
> >  fs/ext4/mballoc.c | 21 +++++++++++++--------
> >  1 file changed, 13 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> > index d72b5e3c92ec..63f12ec02485 100644
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -2895,14 +2895,19 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
> >  			ac->ac_groups_scanned++;
> >  			if (cr == CR_POWER2_ALIGNED)
> >  				ext4_mb_simple_scan_group(ac, &e4b);
> > -			else if ((cr == CR_GOAL_LEN_FAST ||
> > -				 cr == CR_BEST_AVAIL_LEN) &&
> > -				 sbi->s_stripe &&
> > -				 !(ac->ac_g_ex.fe_len %
> > -				 EXT4_B2C(sbi, sbi->s_stripe)))
> > -				ext4_mb_scan_aligned(ac, &e4b);
> > -			else
> > -				ext4_mb_complex_scan_group(ac, &e4b);
> > +			else {
> > +				bool is_stripe_aligned = sbi->s_stripe &&
> > +					!(ac->ac_g_ex.fe_len %
> > +					  EXT4_B2C(sbi, sbi->s_stripe));
> > +
> > +				if ((cr == CR_GOAL_LEN_FAST ||
> > +				     cr == CR_BEST_AVAIL_LEN) &&
> > +				    is_stripe_aligned)
> > +					ext4_mb_scan_aligned(ac, &e4b);
> > +
> > +				if (ac->ac_status == AC_STATUS_CONTINUE)
> > +					ext4_mb_complex_scan_group(ac, &e4b);
> > +			}
> >  
> >  			ext4_unlock_group(sb, group);
> >  			ext4_mb_unload_buddy(&e4b);
> > -- 
> > 2.39.3
> > 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index d72b5e3c92ec..63f12ec02485 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2895,14 +2895,19 @@  ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 			ac->ac_groups_scanned++;
 			if (cr == CR_POWER2_ALIGNED)
 				ext4_mb_simple_scan_group(ac, &e4b);
-			else if ((cr == CR_GOAL_LEN_FAST ||
-				 cr == CR_BEST_AVAIL_LEN) &&
-				 sbi->s_stripe &&
-				 !(ac->ac_g_ex.fe_len %
-				 EXT4_B2C(sbi, sbi->s_stripe)))
-				ext4_mb_scan_aligned(ac, &e4b);
-			else
-				ext4_mb_complex_scan_group(ac, &e4b);
+			else {
+				bool is_stripe_aligned = sbi->s_stripe &&
+					!(ac->ac_g_ex.fe_len %
+					  EXT4_B2C(sbi, sbi->s_stripe));
+
+				if ((cr == CR_GOAL_LEN_FAST ||
+				     cr == CR_BEST_AVAIL_LEN) &&
+				    is_stripe_aligned)
+					ext4_mb_scan_aligned(ac, &e4b);
+
+				if (ac->ac_status == AC_STATUS_CONTINUE)
+					ext4_mb_complex_scan_group(ac, &e4b);
+			}
 
 			ext4_unlock_group(sb, group);
 			ext4_mb_unload_buddy(&e4b);