Message ID: AANLkTimUmo5v14oKodaBKFtZPEHHuR9kZS5OSVtEFj2c@mail.gmail.com
State: Superseded, archived
Hi Ted,

I have checked myself over and over again and I always end up at the same
conclusion: grp->alloc_sem is NOT NEEDED after the buddy of a non-last group
has been initialized!

I see you have been busy cleaning up the buffered write submission path,
removing unneeded locks among other improvements. I would help review your
patches, but I find following the delayed allocation code way more complicated
than following the mballoc code ;-).

Anyway, if you have some time now, please try to verify my (simple) patch,
which: "should improve performance and improve SMP scalability ... in the best
way possible --- don't take locks when they aren't needed!"

I believe that in some workloads the buddy cache can be frequently loaded
without eventually allocating blocks from that group (no matching extent
found, etc.), and it is in those workloads where we should see the most
benefit from not taking grp->alloc_sem in mb_load_buddy().

Amir.

On Wed, Feb 9, 2011 at 12:05 PM, Amir G. <amir73il@users.sourceforge.net> wrote:
> Hi Aneesh,
>
> As you are signed off on most of the recent alloc_sem related code changes,
> can you please comment on the patch below, which tries to avoid taking
> the read lock most of the time on a 4K block fs.
>
> Can anyone tell what performance impact (if any) will be caused by avoiding
> the read lock on most allocations? The group spin lock will still be taken,
> but for much shorter periods of time (cycles).
>
> Any ideas how this patch can be properly tested?
>
> Thanks,
> Amir.
>
> grp->alloc_sem is used to synchronize buddy cache users with buddy cache init
> of other groups that use the same buddy cache page, and with adding blocks to
> a group on online resize.
>
> When blocks_per_page <= 2, each group has its own private buddy cache page,
> so taking the read lock for every allocation is futile and can be avoided for
> every group but the last one.
>
> The write lock is taken in ext4_mb_init_group() and in ext4_add_groupblocks()
> to synchronize the buddy cache init of a group on first allocation after
> mount and after extending the last group.
>
> Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
> ---
>  fs/ext4/mballoc.c |   19 +++++++++++++++----
>  1 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 1b3256b..22a5251 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1160,7 +1160,15 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>  	e4b->bd_group = group;
>  	e4b->bd_buddy_page = NULL;
>  	e4b->bd_bitmap_page = NULL;
> -	e4b->alloc_semp = &grp->alloc_sem;
> +	/*
> +	 * We only need to take the read lock if other groups share the buddy
> +	 * page with this group or if blocks may be added to this (last) group
> +	 * by ext4_group_extend().
> +	 */
> +	if (blocks_per_page > 2 || group == sbi->s_groups_count - 1)
> +		e4b->alloc_semp = &grp->alloc_sem;
> +	else
> +		e4b->alloc_semp = NULL;
>
>  	/* Take the read lock on the group alloc
>  	 * sem. This would make sure a parallel
> @@ -1169,7 +1177,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>  	 * till we are done with allocation
>  	 */
>  repeat_load_buddy:
> -	down_read(e4b->alloc_semp);
> +	if (e4b->alloc_semp)
> +		down_read(e4b->alloc_semp);
>
>  	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
>  		/* we need to check for group need init flag
> @@ -1177,7 +1186,8 @@ repeat_load_buddy:
>  		 * that new blocks didn't get added to the group
>  		 * when we are loading the buddy cache
>  		 */
> -		up_read(e4b->alloc_semp);
> +		if (e4b->alloc_semp)
> +			up_read(e4b->alloc_semp);
>  		/*
>  		 * we need full data about the group
>  		 * to make a good selection
> @@ -1277,7 +1287,8 @@ err:
>  	e4b->bd_bitmap = NULL;
>
>  	/* Done with the buddy cache */
> -	up_read(e4b->alloc_semp);
> +	if (e4b->alloc_semp)
> +		up_read(e4b->alloc_semp);
>  	return ret;
>  }
>
> --
> 1.7.0.4
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 9 Feb 2011 12:05:11 +0200, "Amir G." <amir73il@users.sourceforge.net> wrote:
> Hi Aneesh,
>
> As you are signed off on most of the recent alloc_sem related code changes,
> can you please comment on the patch below, which tries to avoid taking
> the read lock most of the time on a 4K block fs.
>
> Can anyone tell what performance impact (if any) will be caused by avoiding
> the read lock on most allocations? The group spin lock will still be taken,
> but for much shorter periods of time (cycles).
>
> Any ideas how this patch can be properly tested?

A quick check says the changes are correct. But I am not sure whether we
want to conditionalize these locks unless they appear as highly
contended locks in a profile.

>
> Thanks,
> Amir.
>
> grp->alloc_sem is used to synchronize buddy cache users with buddy cache init
> of other groups that use the same buddy cache page, and with adding blocks to
> a group on online resize.
>
> When blocks_per_page <= 2, each group has its own private buddy cache page,
> so taking the read lock for every allocation is futile and can be avoided for
> every group but the last one.
>
> The write lock is taken in ext4_mb_init_group() and in ext4_add_groupblocks()
> to synchronize the buddy cache init of a group on first allocation after
> mount and after extending the last group.
>
> Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
> ---
>  fs/ext4/mballoc.c |   19 +++++++++++++++----
>  1 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 1b3256b..22a5251 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1160,7 +1160,15 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>  	e4b->bd_group = group;
>  	e4b->bd_buddy_page = NULL;
>  	e4b->bd_bitmap_page = NULL;
> -	e4b->alloc_semp = &grp->alloc_sem;
> +	/*
> +	 * We only need to take the read lock if other groups share the buddy
> +	 * page with this group or if blocks may be added to this (last) group
> +	 * by ext4_group_extend().
> +	 */
> +	if (blocks_per_page > 2 || group == sbi->s_groups_count - 1)

If we can say groups_per_page > 1 that would make it more clear.

> +		e4b->alloc_semp = &grp->alloc_sem;
> +	else
> +		e4b->alloc_semp = NULL;
>
>  	/* Take the read lock on the group alloc
>  	 * sem. This would make sure a parallel
> @@ -1169,7 +1177,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>  	 * till we are done with allocation
>  	 */
>  repeat_load_buddy:
> -	down_read(e4b->alloc_semp);
> +	if (e4b->alloc_semp)
> +		down_read(e4b->alloc_semp);
>
>  	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
>  		/* we need to check for group need init flag
> @@ -1177,7 +1186,8 @@ repeat_load_buddy:
>  		 * that new blocks didn't get added to the group
>  		 * when we are loading the buddy cache
>  		 */
> -		up_read(e4b->alloc_semp);
> +		if (e4b->alloc_semp)
> +			up_read(e4b->alloc_semp);
>  		/*
>  		 * we need full data about the group
>  		 * to make a good selection
> @@ -1277,7 +1287,8 @@ err:
>  	e4b->bd_bitmap = NULL;
>
>  	/* Done with the buddy cache */
> -	up_read(e4b->alloc_semp);
> +	if (e4b->alloc_semp)
> +		up_read(e4b->alloc_semp);
>  	return ret;
>  }

-aneesh
On Mon, Feb 14, 2011 at 11:34 AM, Aneesh Kumar K. V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Wed, 9 Feb 2011 12:05:11 +0200, "Amir G." <amir73il@users.sourceforge.net> wrote:
>> Hi Aneesh,
>>
>> As you are signed off on most of the recent alloc_sem related code changes,
>> can you please comment on the patch below, which tries to avoid taking
>> the read lock most of the time on a 4K block fs.
>>
>> Can anyone tell what performance impact (if any) will be caused by avoiding
>> the read lock on most allocations? The group spin lock will still be taken,
>> but for much shorter periods of time (cycles).
>>
>> Any ideas how this patch can be properly tested?
>
> A quick check says the changes are correct. But I am not sure whether we
> want to conditionalize these locks unless they appear as highly
> contended locks in a profile.

Hi Aneesh,

In general, I would agree with your statement about not conditionalizing
locks, but in this case, the condition for locking is not only unlikely(),
but 0, in the VERY common use case of 4K blocks. So keeping the lock for the
sake of generalization seems just wrong.

Unfortunately, I do not have a large scale SMP system at hand to find out
exactly how contended grp->alloc_sem is, but I expect it will be contended
when block groups are fragmented, where finding a good extent requires
searching over several block groups without ever taking ext4_lock_group().

Furthermore, I actually need this patch to resolve lock dependencies in my
snapshots implementation, so this is not just a "do the right thing"
argument.

Besides, e4b->alloc_semp is already tested in every other place in the code,
except for the 3 places I added in the patch, so it's not like the condition
is complicating the code.

For the case of the last block group, maybe it would be more readable to
change s_resize_lock to s_resize_sem and take down_read(&sb->s_resize_sem)
when loading the buddy of the last group, or more accurately, of an
extendable last group.

We can also have ext4_mb_good_group() only allow allocation from an
extendable last group for MB_HINT_DATA allocations, or make it available via
another allocation hint. (I don't intend to allow it for snapshot COW
allocations.)

Finally, I would consider renaming grp->alloc_sem to grp->init_sem, as the
current name misleads people into believing it protects against concurrent
allocations, while it really protects against concurrent init of the buddy
cache. grp->init_sem may also be a more appropriate name for its other use,
protecting concurrent lazy inode table init and inode allocations.

Thanks a lot for taking the time to review my patch,
Amir.

>>
>> Thanks,
>> Amir.
>>
>> grp->alloc_sem is used to synchronize buddy cache users with buddy cache init
>> of other groups that use the same buddy cache page, and with adding blocks to
>> a group on online resize.
>>
>> When blocks_per_page <= 2, each group has its own private buddy cache page,
>> so taking the read lock for every allocation is futile and can be avoided for
>> every group but the last one.
>>
>> The write lock is taken in ext4_mb_init_group() and in ext4_add_groupblocks()
>> to synchronize the buddy cache init of a group on first allocation after
>> mount and after extending the last group.
>>
>> Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
>> ---
>>  fs/ext4/mballoc.c |   19 +++++++++++++++----
>>  1 files changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 1b3256b..22a5251 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1160,7 +1160,15 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>>  	e4b->bd_group = group;
>>  	e4b->bd_buddy_page = NULL;
>>  	e4b->bd_bitmap_page = NULL;
>> -	e4b->alloc_semp = &grp->alloc_sem;
>> +	/*
>> +	 * We only need to take the read lock if other groups share the buddy
>> +	 * page with this group or if blocks may be added to this (last) group
>> +	 * by ext4_group_extend().
>> +	 */
>> +	if (blocks_per_page > 2 || group == sbi->s_groups_count - 1)
>
> If we can say groups_per_page > 1 that would make it more clear.

I agree. I just wanted to keep my patch 2 lines shorter ;-)

>> +		e4b->alloc_semp = &grp->alloc_sem;
>> +	else
>> +		e4b->alloc_semp = NULL;
>>
>>  	/* Take the read lock on the group alloc
>>  	 * sem. This would make sure a parallel
>> @@ -1169,7 +1177,8 @@ ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
>>  	 * till we are done with allocation
>>  	 */
>>  repeat_load_buddy:
>> -	down_read(e4b->alloc_semp);
>> +	if (e4b->alloc_semp)
>> +		down_read(e4b->alloc_semp);
>>
>>  	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
>>  		/* we need to check for group need init flag
>> @@ -1177,7 +1186,8 @@ repeat_load_buddy:
>>  		 * that new blocks didn't get added to the group
>>  		 * when we are loading the buddy cache
>>  		 */
>> -		up_read(e4b->alloc_semp);
>> +		if (e4b->alloc_semp)
>> +			up_read(e4b->alloc_semp);
>>  		/*
>>  		 * we need full data about the group
>>  		 * to make a good selection
>> @@ -1277,7 +1287,8 @@ err:
>>  	e4b->bd_bitmap = NULL;
>>
>>  	/* Done with the buddy cache */
>> -	up_read(e4b->alloc_semp);
>> +	if (e4b->alloc_semp)
>> +		up_read(e4b->alloc_semp);
>>  	return ret;
>>  }
>
> -aneesh
On 2011-02-14, at 0:52, "Amir G." <amir73il@users.sourceforge.net> wrote:
>> @@ -1160,7 +1160,15 @@ ext4_mb_load_buddy(struct super_block *sb,
>>
>> +	/*
>> +	 * We only need to take the read lock if other groups share the buddy
>> +	 * page with this group or if blocks may be added to this (last) group
>> +	 * by ext4_group_extend().
>> +	 */
>> +	if (blocks_per_page > 2 || group == sbi->s_groups_count - 1)
>> +		e4b->alloc_semp = &grp->alloc_sem;

No comment on whether this change is safe or not, but shouldn't this check be:

	if (blocks_per_page > 1 ||

Cheers, Andreas
On Mon, Feb 14, 2011 at 6:30 PM, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-02-14, at 0:52, "Amir G." <amir73il@users.sourceforge.net> wrote:
>>> @@ -1160,7 +1160,15 @@ ext4_mb_load_buddy(struct super_block *sb,
>>>
>>> +	/*
>>> +	 * We only need to take the read lock if other groups share the buddy
>>> +	 * page with this group or if blocks may be added to this (last) group
>>> +	 * by ext4_group_extend().
>>> +	 */
>>> +	if (blocks_per_page > 2 || group == sbi->s_groups_count - 1)
>>> +		e4b->alloc_semp = &grp->alloc_sem;
>
> No comment on whether this change is safe or not, but shouldn't this check be:
>
> 	if (blocks_per_page > 1 ||

No, it should be groups_per_page > 1, as Aneesh suggested, which translates
to blocks_per_page > 2, but this is obviously not clear from the code...