Patchwork [v2,5/5] ext4: remove alloc_semp

Submitter Amir G.
Date March 24, 2011, 4:58 p.m.
Message ID <1300985893-4371-6-git-send-email-amir73il@users.sourceforge.net>
Permalink /patch/88244/
State Superseded

Comments

Amir G. - March 24, 2011, 4:58 p.m.
From: Amir Goldstein <amir73il@users.sf.net>

After taking care of all group init races, all that remains is to
remove alloc_semp from ext4_allocation_context and ext4_buddy structs.

Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
---
 fs/ext4/mballoc.c |   31 +------------------------------
 fs/ext4/mballoc.h |    6 ------
 2 files changed, 1 insertions(+), 36 deletions(-)
Theodore Ts'o - May 10, 2011, 1:29 p.m.
On Thu, Mar 24, 2011 at 06:58:13PM +0200, amir73il@users.sourceforge.net wrote:
> From: Amir Goldstein <amir73il@users.sf.net>
> 
> After taking care of all group init races, all that remains is to
> remove alloc_semp from ext4_allocation_context and ext4_buddy structs.
> 
> Signed-off-by: Amir Goldstein <amir73il@users.sf.net>

Hey Amir,

What sort of testing have you done with this patch series?  In
particular, have you tried doing online resizes while the file system
is heavily loaded?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Amir G. - May 10, 2011, 6:20 p.m.
On Tue, May 10, 2011 at 4:29 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Thu, Mar 24, 2011 at 06:58:13PM +0200, amir73il@users.sourceforge.net wrote:
>> From: Amir Goldstein <amir73il@users.sf.net>
>>
>> After taking care of all group init races, all that remains is to
>> remove alloc_semp from ext4_allocation_context and ext4_buddy structs.
>>
>> Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
>
> Hey Amir,
>
> What sort of testing have you done with this patch series?  In
> particular, have you tried doing online resizes while the file system
> is heavily loaded?

I have tested many small resizes, not block-group aligned, and added
debug prints in mballoc.c to verify that the in-memory bb_free and the
group descriptor's bg_free_blocks_count remain in sync.

Sorry, I did not run the same tests under heavy load.
I would think that one would need to know exactly how to load the
system to make that load relevant to this test, rather than blindly
fsstressing the system.
If you have specific ideas for testing, please let us know.
I think Yongqiang could run these tests.

Regarding the "risk" of online resize with this patch:
if you look at ext4_add_groupblocks() in its new form,
you will surely notice that it is an exact replica of ext4_free_blocks()
with lots of irrelevant code removed, so with regards to buddy cache
init and locking, the code path of online resize is essentially the
same as the code path for freeing any range of blocks.

It would actually have been possible to alter ext4_free_blocks() a bit
and reuse it to do most of the work done by ext4_add_groupblocks(), but
I did not want to make those changes to ext4_free_blocks(). It's your
call whether you think that would be better.

Anyway, I was hoping for another pair of eyes to validate my changes
when I first posted these patches. So what do you say?
Does the core change, synchronizing ext4_mb_init_group() by locking
the buddy pages, look reasonable to you?
You were the one who suggested that approach in the first place.
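The synchronization scheme being asked about can be sketched in userspace: instead of holding a per-group rw_semaphore (the alloc_semp this patch removes), racing initializers serialize on the buddy-cache page lock and re-check the NEED_INIT flag under it, so only the first locker does the work. A pthread mutex stands in for lock_page(); all names here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for a buddy-cache page and its group-init state. */
struct buddy_page {
	pthread_mutex_t lock;  /* plays the role of lock_page() */
	bool need_init;        /* EXT4_MB_GRP_NEED_INIT analogue */
	int init_count;        /* how many times init actually ran */
};

/* Initialize the group at most once: take the page lock, then re-check
 * the flag under it.  A caller that raced and lost simply sees the
 * flag already cleared and does nothing. */
static void mb_init_group(struct buddy_page *pg)
{
	pthread_mutex_lock(&pg->lock);    /* lock_page() */
	if (pg->need_init) {              /* re-check under the lock */
		pg->init_count++;         /* ... build buddy bitmap ... */
		pg->need_init = false;
	}
	pthread_mutex_unlock(&pg->lock);  /* unlock_page() */
}
```

The check-under-lock is the whole trick: without it, two callers that both observed NEED_INIT before locking would initialize the group twice.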

Amir.

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1ddb92b..5449b1a 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1130,24 +1130,8 @@  ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 	e4b->bd_group = group;
 	e4b->bd_buddy_page = NULL;
 	e4b->bd_bitmap_page = NULL;
-	e4b->alloc_semp = &grp->alloc_sem;
-
-	/* Take the read lock on the group alloc
-	 * sem. This would make sure a parallel
-	 * ext4_mb_init_group happening on other
-	 * groups mapped by the page is blocked
-	 * till we are done with allocation
-	 */
-repeat_load_buddy:
-	down_read(e4b->alloc_semp);
 
 	if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) {
-		/* we need to check for group need init flag
-		 * with alloc_semp held so that we can be sure
-		 * that new blocks didn't get added to the group
-		 * when we are loading the buddy cache
-		 */
-		up_read(e4b->alloc_semp);
 		/*
 		 * we need full data about the group
 		 * to make a good selection
@@ -1155,7 +1139,6 @@  repeat_load_buddy:
 		ret = ext4_mb_init_group(sb, group);
 		if (ret)
 			return ret;
-		goto repeat_load_buddy;
 	}
 
 	/*
@@ -1245,9 +1228,6 @@  err:
 		page_cache_release(e4b->bd_buddy_page);
 	e4b->bd_buddy = NULL;
 	e4b->bd_bitmap = NULL;
-
-	/* Done with the buddy cache */
-	up_read(e4b->alloc_semp);
 	return ret;
 }
 
@@ -1257,9 +1237,6 @@  static void ext4_mb_unload_buddy(struct ext4_buddy *e4b)
 		page_cache_release(e4b->bd_bitmap_page);
 	if (e4b->bd_buddy_page)
 		page_cache_release(e4b->bd_buddy_page);
-	/* Done with the buddy cache */
-	if (e4b->alloc_semp)
-		up_read(e4b->alloc_semp);
 }
 
 
@@ -1572,9 +1549,6 @@  static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
 	get_page(ac->ac_bitmap_page);
 	ac->ac_buddy_page = e4b->bd_buddy_page;
 	get_page(ac->ac_buddy_page);
-	/* on allocation we use ac to track the held semaphore */
-	ac->alloc_semp =  e4b->alloc_semp;
-	e4b->alloc_semp = NULL;
 	/* store last allocated for subsequent stream allocation */
 	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
 		spin_lock(&sbi->s_md_lock);
@@ -4192,15 +4166,12 @@  static int ext4_mb_release_context(struct ext4_allocation_context *ac)
 			spin_unlock(&pa->pa_lock);
 		}
 	}
-	if (ac->alloc_semp)
-		up_read(ac->alloc_semp);
 	if (pa) {
 		/*
 		 * We want to add the pa to the right bucket.
 		 * Remove it from the list and while adding
 		 * make sure the list to which we are adding
-		 * doesn't grow big.  We need to release
-		 * alloc_semp before calling ext4_mb_add_n_trim()
+		 * doesn't grow big.
 		 */
 		if ((pa->pa_type == MB_GROUP_PA) && likely(pa->pa_free)) {
 			spin_lock(pa->pa_obj_lock);
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 22bd4d7..20b5e7b 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -193,11 +193,6 @@  struct ext4_allocation_context {
 	__u8 ac_op;		/* operation, for history only */
 	struct page *ac_bitmap_page;
 	struct page *ac_buddy_page;
-	/*
-	 * pointer to the held semaphore upon successful
-	 * block allocation
-	 */
-	struct rw_semaphore *alloc_semp;
 	struct ext4_prealloc_space *ac_pa;
 	struct ext4_locality_group *ac_lg;
 };
@@ -215,7 +210,6 @@  struct ext4_buddy {
 	struct super_block *bd_sb;
 	__u16 bd_blkbits;
 	ext4_group_t bd_group;
-	struct rw_semaphore *alloc_semp;
 };
 #define EXT4_MB_BITMAP(e4b)	((e4b)->bd_bitmap)
 #define EXT4_MB_BUDDY(e4b)	((e4b)->bd_buddy)