[2/5] ext4: Avoid unnecessary spreading of allocations among groups

Message ID 20220906152920.25584-2-jack@suse.cz
State Superseded
Series ext4: Fix performance regression with mballoc

Commit Message

Jan Kara Sept. 6, 2022, 3:29 p.m. UTC
mb_set_largest_free_order() updates the lists containing groups with the
largest chunk of free space of a given order. The way it updates them
always moves the group to the tail of its list. Thus allocations looking
for free space of a given order effectively end up cycling through all
groups (and, due to how the lists are initialized, in last-to-first
order). This spreads allocations among block groups, which reduces
performance for rotating disks or low-end flash media. Change
mb_set_largest_free_order() to update the lists only if the order of the
largest free chunk in the group has changed.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)
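
For orientation, a condensed sketch of the control flow after this patch
(types stubbed out for illustration; the real code operates on struct
ext4_group_info in fs/ext4/mballoc.c and takes per-order list locks,
which are elided here -- see the full diff at the end):

	#define NUM_ORDERS 14   /* stand-in for MB_NUM_ORDERS(sb) */

	struct group {
		int bb_largest_free_order;   /* order list the group is on */
		int bb_counters[NUM_ORDERS]; /* free extents per order */
	};

	static void set_largest_free_order(struct group *grp)
	{
		int i;

		/* Find the largest order that still has free extents. */
		for (i = NUM_ORDERS - 1; i >= 0; i--)
			if (grp->bb_counters[i] > 0)
				break;

		/*
		 * The key change: if the largest order is unchanged, leave
		 * the group where it is instead of moving it to the tail
		 * of its order list on every allocation or free.
		 */
		if (i == grp->bb_largest_free_order)
			return;

		/* Otherwise unlink from the old order list and append to
		 * the tail of the new one (elided in this sketch). */
		grp->bb_largest_free_order = i;
	}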

Comments

Ritesh Harjani (IBM) Sept. 7, 2022, 6:05 p.m. UTC | #1
On 22/09/06 05:29PM, Jan Kara wrote:
> mb_set_largest_free_order() updates the lists containing groups with the
> largest chunk of free space of a given order. The way it updates them
> always moves the group to the tail of its list. Thus allocations looking
> for free space of a given order effectively end up cycling through all
> groups (and, due to how the lists are initialized, in last-to-first
> order). This spreads allocations among block groups, which reduces
> performance for rotating disks or low-end flash media. Change
> mb_set_largest_free_order() to update the lists only if the order of the
> largest free chunk in the group has changed.

Nice and clear explanation. Thanks :)

This change also looks good to me.
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>


One other thought on further optimizing this -
Would it make a difference if, rather than adding the group to the tail of
the list, we added it to the head of sbi->s_mb_largest_free_orders[new_order]?

This group is the latest one from which blocks were allocated or freed,
and hence the next allocation should perhaps try this group first, to keep
the blocks of files/extents close to each other.
(That might sometimes help the disk firmware avoid discards, if a freed
block can be reused?)
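
For concreteness, that would mean replacing the tail insertion with a head
insertion at the same spot -- just an illustration of the idea, not part of
this patch:

	/* hypothetical: insert at the head so the most recently used
	 * group is scanned first by the next allocation */
	list_add(&grp->bb_largest_free_order_node,
		 &sbi->s_mb_largest_free_orders[grp->bb_largest_free_order]);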

Or will the goal block always cover that case by default, so that we never
require this? Maybe in the case of a new file within the same directory,
where the goal group has no free blocks, the last group attempted should be
retried first?

-ritesh
Jan Kara Sept. 8, 2022, 8:57 a.m. UTC | #2
On Wed 07-09-22 23:35:07, Ritesh Harjani (IBM) wrote:
> On 22/09/06 05:29PM, Jan Kara wrote:
> > mb_set_largest_free_order() updates the lists containing groups with the
> > largest chunk of free space of a given order. The way it updates them
> > always moves the group to the tail of its list. Thus allocations looking
> > for free space of a given order effectively end up cycling through all
> > groups (and, due to how the lists are initialized, in last-to-first
> > order). This spreads allocations among block groups, which reduces
> > performance for rotating disks or low-end flash media. Change
> > mb_set_largest_free_order() to update the lists only if the order of the
> > largest free chunk in the group has changed.
> 
> Nice and clear explanation. Thanks :)
> 
> This change also looks good to me.
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

Thanks for review!

> One other thought on further optimizing this -
> Would it make a difference if, rather than adding the group to the tail of
> the list, we added it to the head of sbi->s_mb_largest_free_orders[new_order]?
> 
> This group is the latest one from which blocks were allocated or freed,
> and hence the next allocation should perhaps try this group first, to keep
> the blocks of files/extents close to each other.
> (That might sometimes help the disk firmware avoid discards, if a freed
> block can be reused?)
> 
> Or will the goal block always cover that case by default, so that we never
> require this? Maybe in the case of a new file within the same directory,
> where the goal group has no free blocks, the last group attempted should be
> retried first?

So I was also wondering about this somewhat. I think the goal group will
take care of keeping file data together, so head/tail insertion should not
matter too much for one file. If the allocation comes from a different
inode, the head/tail insertion may matter, but then it is not certain
whether the allocation is actually related, and what its order is
(depending on that we might prefer the same or a different group), so I've
decided to just keep things as they are. I agree it might be interesting
to investigate and experiment with various workloads to see whether the
head/tail insertion makes a difference for some workload, but I think
that's a separate project.

								Honza
Ritesh Harjani (IBM) Sept. 8, 2022, 9:24 a.m. UTC | #3
On 22/09/08 10:57AM, Jan Kara wrote:
> On Wed 07-09-22 23:35:07, Ritesh Harjani (IBM) wrote:
> > On 22/09/06 05:29PM, Jan Kara wrote:
> > > mb_set_largest_free_order() updates the lists containing groups with the
> > > largest chunk of free space of a given order. The way it updates them
> > > always moves the group to the tail of its list. Thus allocations looking
> > > for free space of a given order effectively end up cycling through all
> > > groups (and, due to how the lists are initialized, in last-to-first
> > > order). This spreads allocations among block groups, which reduces
> > > performance for rotating disks or low-end flash media. Change
> > > mb_set_largest_free_order() to update the lists only if the order of the
> > > largest free chunk in the group has changed.
> > 
> > Nice and clear explanation. Thanks :)
> > 
> > This change also looks good to me.
> > Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> 
> Thanks for review!
> 
> > One other thought on further optimizing this -
> > Would it make a difference if, rather than adding the group to the tail of
> > the list, we added it to the head of sbi->s_mb_largest_free_orders[new_order]?
> > 
> > This group is the latest one from which blocks were allocated or freed,
> > and hence the next allocation should perhaps try this group first, to keep
> > the blocks of files/extents close to each other.
> > (That might sometimes help the disk firmware avoid discards, if a freed
> > block can be reused?)
> > 
> > Or will the goal block always cover that case by default, so that we never
> > require this? Maybe in the case of a new file within the same directory,
> > where the goal group has no free blocks, the last group attempted should be
> > retried first?
> 
> So I was also wondering about this somewhat. I think the goal group will
> take care of keeping file data together, so head/tail insertion should not
> matter too much for one file. If the allocation comes from a different
> inode, the head/tail insertion may matter, but then it is not certain
> whether the allocation is actually related, and what its order is
> (depending on that we might prefer the same or a different group), so I've
> decided to just keep things as they are. I agree it might be interesting
> to investigate and experiment with various workloads to see whether the
> head/tail insertion makes a difference for some workload, but I think
> that's a separate project.
> 

Sure. Makes sense.
Thanks for still sharing your thoughts on it.

-ritesh

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 41e1cfecac3b..6251b4a6cc63 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1077,23 +1077,25 @@  mb_set_largest_free_order(struct super_block *sb, struct ext4_group_info *grp)
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	int i;
 
-	if (test_opt2(sb, MB_OPTIMIZE_SCAN) && grp->bb_largest_free_order >= 0) {
+	for (i = MB_NUM_ORDERS(sb) - 1; i >= 0; i--)
+		if (grp->bb_counters[i] > 0)
+			break;
+	/* No need to move between order lists? */
+	if (!test_opt2(sb, MB_OPTIMIZE_SCAN) ||
+	    i == grp->bb_largest_free_order) {
+		grp->bb_largest_free_order = i;
+		return;
+	}
+
+	if (grp->bb_largest_free_order >= 0) {
 		write_lock(&sbi->s_mb_largest_free_orders_locks[
 					      grp->bb_largest_free_order]);
 		list_del_init(&grp->bb_largest_free_order_node);
 		write_unlock(&sbi->s_mb_largest_free_orders_locks[
 					      grp->bb_largest_free_order]);
 	}
-	grp->bb_largest_free_order = -1; /* uninit */
-
-	for (i = MB_NUM_ORDERS(sb) - 1; i >= 0; i--) {
-		if (grp->bb_counters[i] > 0) {
-			grp->bb_largest_free_order = i;
-			break;
-		}
-	}
-	if (test_opt2(sb, MB_OPTIMIZE_SCAN) &&
-	    grp->bb_largest_free_order >= 0 && grp->bb_free) {
+	grp->bb_largest_free_order = i;
+	if (grp->bb_largest_free_order >= 0 && grp->bb_free) {
 		write_lock(&sbi->s_mb_largest_free_orders_locks[
 					      grp->bb_largest_free_order]);
 		list_add_tail(&grp->bb_largest_free_order_node,