Message ID | 20220906152920.25584-2-jack@suse.cz
---|---
State | Superseded
Series | ext4: Fix performance regression with mballoc
On 22/09/06 05:29PM, Jan Kara wrote:
> mb_set_largest_free_order() updates lists containing groups with largest
> chunk of free space of given order. The way it updates it leads to
> always moving the group to the tail of the list. Thus allocations
> looking for free space of given order effectively end up cycling through
> all groups (and due to initialization in last to first order). This
> spreads allocations among block groups which reduces performance for
> rotating disks or low-end flash media. Change
> mb_set_largest_free_order() to only update lists if the order of the
> largest free chunk in the group changed.

Nice and clear explanation. Thanks :)

This change also looks good to me.
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

One other thought to further optimize -
Will it make a difference if, rather than adding the group to the tail of
the list, we add that group to the head of
sbi->s_mb_largest_free_orders[new_order]?

This is because this group is the latest from which blocks were
allocated/freed, and hence the next allocation should first try this group
in order to keep the files'/extents' blocks close to each other?
(That might sometimes help the disk firmware avoid doing discards if the
freed block can be reused.)

Or will the goal block always cover that case by default, so we might never
require this? Maybe in the case of a new file within the same directory
where the goal group has no free blocks, the last group attempted should be
retried first?

-ritesh
On Wed 07-09-22 23:35:07, Ritesh Harjani (IBM) wrote:
> On 22/09/06 05:29PM, Jan Kara wrote:
> > mb_set_largest_free_order() updates lists containing groups with largest
> > chunk of free space of given order. [...]
>
> Nice and clear explanation. Thanks :)
>
> This change also looks good to me.
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

Thanks for review!

> One other thought to further optimize -
> Will it make a difference if, rather than adding the group to the tail of
> the list, we add that group to the head of
> sbi->s_mb_largest_free_orders[new_order]?
>
> This is because this group is the latest from which blocks were
> allocated/freed, and hence the next allocation should first try this
> group in order to keep the files'/extents' blocks close to each other?
> (That might sometimes help the disk firmware avoid doing discards if the
> freed block can be reused.)
>
> Or will the goal block always cover that case by default, so we might
> never require this? Maybe in the case of a new file within the same
> directory where the goal group has no free blocks, the last group
> attempted should be retried first?

So I was also wondering about this somewhat. I think that the goal group
will take care of keeping file data together, so head/tail insertion should
not matter too much for one file. Maybe if the allocation comes from a
different inode, then the head/tail insertion matters, but then it is not
certain whether the allocation is actually related and what its order is
(depending on that we might prefer the same / a different group), so I've
decided to just keep things as they are. I agree it might be interesting to
investigate and experiment with various workloads and see whether head/tail
insertion makes a difference for some workload, but I think that's a
separate project.

								Honza
On 22/09/08 10:57AM, Jan Kara wrote:
> On Wed 07-09-22 23:35:07, Ritesh Harjani (IBM) wrote:
> > One other thought to further optimize -
> > Will it make a difference if, rather than adding the group to the tail
> > of the list, we add that group to the head of
> > sbi->s_mb_largest_free_orders[new_order]? [...]
>
> So I was also wondering about this somewhat. I think that the goal group
> will take care of keeping file data together, so head/tail insertion
> should not matter too much for one file. Maybe if the allocation comes
> from a different inode, then the head/tail insertion matters, but then it
> is not certain whether the allocation is actually related and what its
> order is (depending on that we might prefer the same / a different
> group), so I've decided to just keep things as they are. I agree it might
> be interesting to investigate and experiment with various workloads and
> see whether head/tail insertion makes a difference for some workload, but
> I think that's a separate project.

Sure. Makes sense. Thanks for sharing your thoughts on it.

-ritesh
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 41e1cfecac3b..6251b4a6cc63 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1077,23 +1077,25 @@ mb_set_largest_free_order(struct super_block *sb, struct ext4_group_info *grp)
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	int i;
 
-	if (test_opt2(sb, MB_OPTIMIZE_SCAN) && grp->bb_largest_free_order >= 0) {
+	for (i = MB_NUM_ORDERS(sb) - 1; i >= 0; i--)
+		if (grp->bb_counters[i] > 0)
+			break;
+	/* No need to move between order lists? */
+	if (!test_opt2(sb, MB_OPTIMIZE_SCAN) ||
+	    i == grp->bb_largest_free_order) {
+		grp->bb_largest_free_order = i;
+		return;
+	}
+
+	if (grp->bb_largest_free_order >= 0) {
 		write_lock(&sbi->s_mb_largest_free_orders_locks[
 					      grp->bb_largest_free_order]);
 		list_del_init(&grp->bb_largest_free_order_node);
 		write_unlock(&sbi->s_mb_largest_free_orders_locks[
 					      grp->bb_largest_free_order]);
 	}
-	grp->bb_largest_free_order = -1; /* uninit */
-
-	for (i = MB_NUM_ORDERS(sb) - 1; i >= 0; i--) {
-		if (grp->bb_counters[i] > 0) {
-			grp->bb_largest_free_order = i;
-			break;
-		}
-	}
-	if (test_opt2(sb, MB_OPTIMIZE_SCAN) &&
-	    grp->bb_largest_free_order >= 0 && grp->bb_free) {
+	grp->bb_largest_free_order = i;
+	if (grp->bb_largest_free_order >= 0 && grp->bb_free) {
 		write_lock(&sbi->s_mb_largest_free_orders_locks[
 					      grp->bb_largest_free_order]);
 		list_add_tail(&grp->bb_largest_free_order_node,
mb_set_largest_free_order() updates lists containing groups with largest
chunk of free space of given order. The way it updates it leads to always
moving the group to the tail of the list. Thus allocations looking for free
space of given order effectively end up cycling through all groups (and due
to initialization in last to first order). This spreads allocations among
block groups which reduces performance for rotating disks or low-end flash
media. Change mb_set_largest_free_order() to only update lists if the order
of the largest free chunk in the group changed.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/mballoc.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)