diff mbox series

[2/4] ext4: skip non-loaded groups at cr=0/1 when scanning for good groups

Message ID 20200717155352.1053040-3-tytso@mit.edu
State Superseded
Headers show
Series ex4 block bitmap prefetching | expand

Commit Message

Theodore Ts'o July 17, 2020, 3:53 p.m. UTC
From: Alex Zhuravlev <azhuravlev@whamcloud.com>

cr=0 is supposed to be an optimization to save CPU cycles, but if
buddy data (in memory) is not initialized then all this makes no sense
as we have to do sync IO taking a lot of cycles.  also, at cr=0
mballoc doesn't store any avaibale chunk. cr=1 also skips groups using
heuristic based on avg. fragment size. it's more useful to skip such
groups and switch to cr=2 where groups will be scanned for available
chunks.

using sparse image and dm-slow virtual device of 120TB was
simulated. then the image was formatted and filled using debugfs to
mark ~85% of available space as busy.  mount process w/o the patch
couldn't complete in half an hour (according to vmstat it would take
~10-11 hours).  With the patch applied mount took ~20 seconds.

Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
---
 fs/ext4/mballoc.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Andreas Dilger July 21, 2020, 7:48 a.m. UTC | #1
> On Jul 17, 2020, at 9:53 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> 
> From: Alex Zhuravlev <azhuravlev@whamcloud.com>
> 
> cr=0 is supposed to be an optimization to save CPU cycles, but if
> buddy data (in memory) is not initialized then all this makes no sense
> as we have to do sync IO taking a lot of cycles.  also, at cr=0
> mballoc doesn't store any avaibale chunk. cr=1 also skips groups using

(typo) "available", but "doesn't store any available chunk" is still
a bit confusing.  Maybe "doesn't *choose* any available chunk"?

> heuristic based on avg. fragment size. it's more useful to skip such
> groups and switch to cr=2 where groups will be scanned for available
> chunks.
> 
> using sparse image and dm-slow virtual device of 120TB was
> simulated. then the image was formatted and filled using debugfs to
> mark ~85% of available space as busy.  mount process w/o the patch
> couldn't complete in half an hour (according to vmstat it would take
> ~10-11 hours).  With the patch applied mount took ~20 seconds.
> 
> Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988
> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

Purely cosmetic fix below, but still fine otherwise.

> ---
> fs/ext4/mballoc.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 8a1e6e03c088..172994349bf6 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2195,7 +2195,18 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
> 
> 	/* We only do this if the grp has never been initialized */
> 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> -		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> +		struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group,
> +								  NULL);
> +		int ret;

(style) this might be nicer to read like:

		struct ext4_group_desc *gdp =
			ext4_get_group_desc(sb, group, NULL);
		int ret;

> +		/* cr=0/1 is a very optimistic search to find large
> +		 * good chunks almost for free. if buddy data is
> +		 * not ready, then this optimization makes no sense */
> +		if (cr < 2 &&
> +		    !(ext4_has_group_desc_csum(sb) &&
> +		      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))))
> +			return 0;
> +		ret = ext4_mb_init_group(sb, group, GFP_NOFS);
> 		if (ret)
> 			return ret;
> 	}
> --
> 2.24.1
> 


Cheers, Andreas
Artem Blagodarenko July 24, 2020, 11:27 a.m. UTC | #2
Looks good.

Reviewed-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>

> On 17 Jul 2020, at 18:53, Theodore Ts'o <tytso@mit.edu> wrote:
> 
> From: Alex Zhuravlev <azhuravlev@whamcloud.com>
> 
> cr=0 is supposed to be an optimization to save CPU cycles, but if
> buddy data (in memory) is not initialized then all this makes no sense
> as we have to do sync IO taking a lot of cycles.  also, at cr=0
> mballoc doesn't store any avaibale chunk. cr=1 also skips groups using
> heuristic based on avg. fragment size. it's more useful to skip such
> groups and switch to cr=2 where groups will be scanned for available
> chunks.
> 
> using sparse image and dm-slow virtual device of 120TB was
> simulated. then the image was formatted and filled using debugfs to
> mark ~85% of available space as busy.  mount process w/o the patch
> couldn't complete in half an hour (according to vmstat it would take
> ~10-11 hours).  With the patch applied mount took ~20 seconds.
> 
> Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988
> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
> ---
> fs/ext4/mballoc.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 8a1e6e03c088..172994349bf6 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2195,7 +2195,18 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
> 
> 	/* We only do this if the grp has never been initialized */
> 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> -		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> +		struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group,
> +								  NULL);
> +		int ret;
> +
> +		/* cr=0/1 is a very optimistic search to find large
> +		 * good chunks almost for free. if buddy data is
> +		 * not ready, then this optimization makes no sense */
> +		if (cr < 2 &&
> +		    !(ext4_has_group_desc_csum(sb) &&
> +		      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))))
> +			return 0;
> +		ret = ext4_mb_init_group(sb, group, GFP_NOFS);
> 		if (ret)
> 			return ret;
> 	}
> -- 
> 2.24.1
>
diff mbox series

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 8a1e6e03c088..172994349bf6 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2195,7 +2195,18 @@  static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
 
 	/* We only do this if the grp has never been initialized */
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
+		struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group,
+								  NULL);
+		int ret;
+
+		/* cr=0/1 is a very optimistic search to find large
+		 * good chunks almost for free. if buddy data is
+		 * not ready, then this optimization makes no sense */
+		if (cr < 2 &&
+		    !(ext4_has_group_desc_csum(sb) &&
+		      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))))
+			return 0;
+		ret = ext4_mb_init_group(sb, group, GFP_NOFS);
 		if (ret)
 			return ret;
 	}