[-V4] ext4: Fix lockdep recursive locking warning

Message ID 20081123163349.GB17002@skywalker
State Accepted, archived

Commit Message

Aneesh Kumar K.V Nov. 23, 2008, 4:33 p.m. UTC
On Sat, Nov 22, 2008 at 09:49:11PM -0500, Theodore Tso wrote:
> On Sat, Nov 22, 2008 at 03:46:25PM -0500, Theodore Tso wrote:
> > On Fri, Nov 21, 2008 at 10:10:46PM +0530, Aneesh Kumar K.V wrote:
> > > Indicate that the group locks can be taken in loop.
> > 
> > I've been looking at this patch more closely, and I think there's a
> > major problem here.
> 
> OK, after looking at this in yet more detail (and having changed
> planes in Dallas :-), I am more than ever convinced this patch is not
> right.  We have an rw_sem for each block group, grp->alloc_sem, which
> is allocated in groups of meta blockgroups.  The whole reason why we
> should worry about keeping them in the same class is that if, for
> some reason, the multiblock allocator happens to take two block
> groups' alloc_sem's, and one path takes them out of order (say, bg 4,
> then bg 2, while another does bg 2, then 4), we would get a
> deadlock.
> 
> I'm guessing that what caused the problem for you was
> ext4_mb_init_group(), which if you are using 1k filesystems, tries to
> grab multiple grp->alloc_sem's.  In each place where we find those, we
> need to use down_write_nested --- see Documentation/lockdep-design.txt.  

Correct

> 
> If there are any other places in mballoc.c which grab multiple
> alloc_sem's at the same time, we'll have to define new subclasses.

No. That is the only call site.

How about the patch below?  We can have more than 2 groups in a page,
depending on the page size and blocksize.  So instead of using
single_depth, I guess we should use the relative group number?

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Theodore Ts'o Nov. 23, 2008, 6:32 p.m. UTC | #1
On Sun, Nov 23, 2008 at 10:03:49PM +0530, Aneesh Kumar K.V wrote:
> 
> How about the patch below?  We can have more than 2 groups in a page,
> depending on the page size and blocksize.  So instead of using
> single_depth, I guess we should use the relative group number?

That should work.  The maximum number of subclasses that we can have
by default is 8.  With 16k pages, that will barely be enough for 1k
blocksize file systems (since we lock alloc_sem for
page_size/(2*fs_block_size) block groups).  If we need more than that,
we might be better off just locking the entire filesystem against
block allocations, since after all this is a pretty rare case; it's
used only when we resize or when the filesystem is getting mounted.
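
The arithmetic above can be checked with a small userspace sketch.  It is
not kernel code: groups_per_page() and fits_subclasses() are hypothetical
helper names, and MAX_SUBCLASSES is simply assumed to be the default limit
of 8 mentioned above.  The formula is the page_size/(2*fs_block_size) one
from this message.

```c
#include <assert.h>

/* Assumed default number of lockdep subclasses, as quoted in the thread. */
#define MAX_SUBCLASSES 8

/*
 * Hypothetical helper: how many block groups have their alloc_sem taken
 * for one page, using page_size / (2 * fs_block_size).  When the block
 * size is PAGE_SIZE/2 or larger, only a single group is involved.
 */
static unsigned int groups_per_page(unsigned int page_size,
                                    unsigned int block_size)
{
    unsigned int n;

    assert(block_size > 0);
    n = page_size / (2 * block_size);
    return n ? n : 1;
}

/*
 * Hypothetical helper: non-zero if using the loop index as the subclass
 * (0 .. n-1) stays within the assumed subclass limit.
 */
static int fits_subclasses(unsigned int page_size, unsigned int block_size)
{
    return groups_per_page(page_size, block_size) <= MAX_SUBCLASSES;
}
```

With these definitions, groups_per_page(16384, 1024) is 8, which just fits
(subclasses 0 through 7), while groups_per_page(32768, 1024) is 16 and
does not.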

     	       	  	    	 - Ted
Theodore Ts'o Nov. 24, 2008, 5:02 a.m. UTC | #2
I've added your patch to the patch queue, using the following commit
comment, and using it to replace
aneesh-9-fix-lockdeep-recursive-locking-warning in the patch queue.

Please note that the commit description explains what problem you
were trying to solve, and includes some notes about why this works
and what the limitations of the approach might be.  This is the kind
of commit log we should strive for.  We've been complimented on the
clarity of our commit logs, and much of that is because I've been
rewriting the changelog messages.  If everyone who submits patches
could strive to meet similar standards, I'd greatly appreciate it.

						- Ted

ext4: Fix lockdep recursive locking warning

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

In ext4_mb_init_group(), if the filesystem block size is less than
PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
groups in a loop.  We need to allow for this by using
down_write_nested() and passing in the loop index as a lock subclass
number.  This works because no other code path needs to take multiple
alloc_sem's.  Note that lockdep will fail for a filesystem blocksize
smaller than PAGE_SIZE/16 (e.g., a 1k filesystem blocksize with a 32k
page size, or a 2k filesystem blocksize with a 64k page size, etc.).
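
As a rough illustration of why the loop index helps, here is a toy
userspace model.  It is only a sketch and not how lockdep is implemented:
every toy_* name is hypothetical, and the limit of 8 subclasses is assumed
from the discussion above.  The model flags a second acquisition in the
same class with the same subclass (as with plain down_write() in a loop)
and rejects a subclass beyond the limit.

```c
#include <stdbool.h>

/* Assumed default subclass limit; a toy constant, not the kernel's macro. */
#define TOY_MAX_SUBCLASSES 8

/* One lock class; tracks which subclasses are currently held. */
struct toy_lock_class {
    bool held[TOY_MAX_SUBCLASSES];
};

enum toy_result {
    TOY_OK = 0,
    TOY_RECURSIVE = -1,     /* same class and subclass already held */
    TOY_BAD_SUBCLASS = -2   /* subclass outside the supported range */
};

static enum toy_result toy_acquire(struct toy_lock_class *c,
                                   unsigned int subclass)
{
    if (subclass >= TOY_MAX_SUBCLASSES)
        return TOY_BAD_SUBCLASS;
    if (c->held[subclass])
        return TOY_RECURSIVE;
    c->held[subclass] = true;
    return TOY_OK;
}

/*
 * Take ngroups locks of one class in a loop.  use_index == false models
 * always using subclass 0; use_index == true models passing the loop
 * index as the subclass.
 */
static enum toy_result toy_lock_groups(unsigned int ngroups, bool use_index)
{
    struct toy_lock_class c = { { false } };
    unsigned int i;

    for (i = 0; i < ngroups; i++) {
        enum toy_result r = toy_acquire(&c, use_index ? i : 0);

        if (r != TOY_OK)
            return r;
    }
    return TOY_OK;
}
```

In this model, toy_lock_groups(2, false) reports TOY_RECURSIVE, while
toy_lock_groups(8, true) succeeds and toy_lock_groups(16, true) reports
TOY_BAD_SUBCLASS, mirroring the limitation noted in the commit message.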

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 1fa311c..891ce41 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1783,7 +1783,7 @@  static int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
 		 * no block allocation going on in any
 		 * of that groups
 		 */
-		down_write(&grp->alloc_sem);
+		down_write_nested(&grp->alloc_sem, i);
 	}
 	/*
 	 * make sure we look at only those groups