| Message ID | 20200717155352.1053040-3-tytso@mit.edu |
|---|---|
| State | Superseded |
| Series | ext4 block bitmap prefetching |
> On Jul 17, 2020, at 9:53 AM, Theodore Ts'o <tytso@mit.edu> wrote:
>
> From: Alex Zhuravlev <azhuravlev@whamcloud.com>
>
> cr=0 is supposed to be an optimization to save CPU cycles, but if
> buddy data (in memory) is not initialized then all this makes no sense
> as we have to do sync IO taking a lot of cycles. also, at cr=0
> mballoc doesn't store any avaibale chunk. cr=1 also skips groups using

(typo) "available", but "doesn't store any available chunk" is still a bit
confusing. Maybe "doesn't *choose* any available chunk"?

> heuristic based on avg. fragment size. it's more useful to skip such
> groups and switch to cr=2 where groups will be scanned for available
> chunks.
>
> using sparse image and dm-slow virtual device of 120TB was
> simulated. then the image was formatted and filled using debugfs to
> mark ~85% of available space as busy. mount process w/o the patch
> couldn't complete in half an hour (according to vmstat it would take
> ~10-11 hours). With the patch applied mount took ~20 seconds.
>
> Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988
> Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>

Purely cosmetic fix below, but still fine otherwise.

> ---
>  fs/ext4/mballoc.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 8a1e6e03c088..172994349bf6 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2195,7 +2195,18 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,
>
>  	/* We only do this if the grp has never been initialized */
>  	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
> -		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
> +		struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group,
> +								  NULL);
> +		int ret;

(style) this might be nicer to read like:

		struct ext4_group_desc *gdp =
			ext4_get_group_desc(sb, group, NULL);
		int ret;

> +
> +		/* cr=0/1 is a very optimistic search to find large
> +		 * good chunks almost for free. if buddy data is
> +		 * not ready, then this optimization makes no sense */
> +		if (cr < 2 &&
> +		    !(ext4_has_group_desc_csum(sb) &&
> +		      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))))
> +			return 0;
> +		ret = ext4_mb_init_group(sb, group, GFP_NOFS);
>  		if (ret)
>  			return ret;
>  	}
> --
> 2.24.1

Cheers, Andreas
Looks good.

Reviewed-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>

> On 17 Jul 2020, at 18:53, Theodore Ts'o <tytso@mit.edu> wrote:
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 8a1e6e03c088..172994349bf6 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2195,7 +2195,18 @@ static int ext4_mb_good_group_nolock(struct ext4_allocation_context *ac,

 	/* We only do this if the grp has never been initialized */
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-		ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS);
+		struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group,
+								  NULL);
+		int ret;
+
+		/* cr=0/1 is a very optimistic search to find large
+		 * good chunks almost for free. if buddy data is
+		 * not ready, then this optimization makes no sense */
+		if (cr < 2 &&
+		    !(ext4_has_group_desc_csum(sb) &&
+		      (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))))
+			return 0;
+		ret = ext4_mb_init_group(sb, group, GFP_NOFS);
 		if (ret)
 			return ret;
 	}
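The decision the patch adds can be summarized as a small standalone sketch. This is not the kernel code: the struct and its fields (`need_init`, `block_uninit`, `has_csum`) are illustrative stand-ins for `EXT4_MB_GRP_NEED_INIT()`, the `EXT4_BG_BLOCK_UNINIT` flag, and `ext4_has_group_desc_csum()`, and the function models only the new early-return path in `ext4_mb_good_group_nolock()`.

```c
#include <stdbool.h>

/* Simplified model of one block group, with illustrative field names. */
struct group_model {
	bool need_init;    /* buddy data not yet loaded (would need sync I/O) */
	bool block_uninit; /* bitmap is BLOCK_UNINIT: can be built in memory */
	bool has_csum;     /* fs has group descriptor checksums */
};

/*
 * Return true when the allocator should consider this group at the given
 * criteria level, false when it should skip it. Mirrors the patch: cr=0/1
 * are optimistic passes meant to be nearly free, so an uninitialized group
 * is skipped unless its bitmap can be constructed without reading the disk;
 * at cr=2 the init cost (ext4_mb_init_group()) is paid.
 */
static bool should_scan_group(const struct group_model *g, int cr)
{
	if (g->need_init) {
		if (cr < 2 && !(g->has_csum && g->block_uninit))
			return false;
		/* the real code calls ext4_mb_init_group() here */
	}
	return true;
}
```

This matches the commit message's rationale: at cr=0/1 the search is only worthwhile if it costs almost nothing, so groups whose buddy data would require synchronous I/O are deferred to cr=2.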