diff mbox

[-V4] ext4: Fix lockdep recursive locking warning

Message ID 1227285646-16263-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
State Superseded, archived

Commit Message

Aneesh Kumar K.V Nov. 21, 2008, 4:40 p.m. UTC
Indicate that the group locks can be taken in loop.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/mballoc.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

Comments

Aneesh Kumar K.V Nov. 21, 2008, 4:48 p.m. UTC | #1
Hi Ted,

Along with this change you can drop the patch
aneesh-8-fix-double-free-of-blocks from the patch queue.
Those changes are not needed. We were seeing the double free
because of a race in the uninit bg code, which I am fixing in the
series sent after this mail.

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Nov. 22, 2008, 8:46 p.m. UTC | #2
On Fri, Nov 21, 2008 at 10:10:46PM +0530, Aneesh Kumar K.V wrote:
> Indicate that the group locks can be taken in loop.

I've been looking at this patch more closely, and I think there's a
major problem here.  You've statically declared alloc_sem_key as an
array of NR_BG_LOCKS entries:

> +#ifdef CONFIG_LOCKDEP
> +static struct lock_class_key alloc_sem_key[NR_BG_LOCKS];
> +#endif

NR_BG_LOCKS is defined in include/linux/blockgroup_lock.h, and is 4 if
NR_CPUS is 1 or 2, 8 if NR_CPUS is 3, 16 if NR_CPUS is between 4 and
7, 32 if NR_CPUS is between 8 and 15, and so on.

It gets used this way:

> +#ifdef CONFIG_LOCKDEP
> +	__init_rwsem(&meta_group_info[i]->alloc_sem,
> +			"&meta_group_info[i]->alloc_sem",
> +			&alloc_sem_key[i]);

But i is set thusly:

    i = group & (EXT4_DESC_PER_BLOCK(sb) - 1);

which means i is between 0 and 127 if the filesystem uses a 4k block
size, so it can run past the end of alloc_sem_key whenever NR_BG_LOCKS
is smaller than 128.
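
To make the mismatch concrete, here is a small userspace sketch of the index arithmetic. It assumes 32-byte group descriptors (the figure that gives the 0..127 range for 4k blocks). The helpers desc_per_block, meta_group_index and key_index_in_bounds are hypothetical names for illustration, not kernel functions, and the key-array size is passed in rather than derived from NR_CPUS.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 32-byte group descriptors, which is what yields 128
 * descriptors per 4k block.  Hypothetical constant, not a kernel one. */
#define ASSUMED_DESC_SIZE 32u

/* Number of group descriptors that fit in one block; a userspace
 * stand-in for EXT4_DESC_PER_BLOCK(sb). */
static unsigned int desc_per_block(unsigned int block_size)
{
	return block_size / ASSUMED_DESC_SIZE;
}

/* Same masking as in ext4_mb_add_groupinfo():
 *     i = group & (EXT4_DESC_PER_BLOCK(sb) - 1);
 * This relies on the per-block count being a power of two. */
static unsigned int meta_group_index(unsigned int group, unsigned int per_block)
{
	return group & (per_block - 1);
}

/* True if i is a valid index into a key array with nr_keys entries. */
static bool key_index_in_bounds(unsigned int i, size_t nr_keys)
{
	return i < nr_keys;
}
```

Under these assumptions, group 127 on a 4k-block filesystem maps to i = 127, and key_index_in_bounds(127, 16) is false: with a 16-entry key array, &alloc_sem_key[i] would point past the end of the array.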

It's also not clear to me that this will do the right thing if there
are multiple ext4 filesystems mounted.  Since we are using a static
array for the lockdep class keys, that means that sb->s_group_info[x]
for one filesystem is considered in the same lockdep class as
sb->s_group_info[x] for another filesystem.  This could cause false
positives if there are multiple ext4 filesystems mounted and two CPUs
are simultaneously accessing the filesystems and then access the two
s_group_info structures in different orders.  Am I missing something?

	     		   	     	      	 - Ted

Theodore Ts'o Nov. 23, 2008, 2:49 a.m. UTC | #3
On Sat, Nov 22, 2008 at 03:46:25PM -0500, Theodore Tso wrote:
> On Fri, Nov 21, 2008 at 10:10:46PM +0530, Aneesh Kumar K.V wrote:
> > Indicate that the group locks can be taken in loop.
> 
> I've been looking at this patch more closely, and I think there's a
> major problem here.

OK, after looking at this in yet more detail (and having changed
planes in Dallas :-), I am more than ever convinced this patch is not
right.  We have an rw_sem for each block group, grp->alloc_sem, which
is allocated in groups of meta blockgroups.  The whole reason to keep
them in the same lockdep class is the following worry: if for some
reason the multiblock allocator takes two block groups' alloc_sems,
and one code path takes them in one order (say, bg 4, then bg 2)
while another takes them in the opposite order (bg 2, then bg 4), we
would get a deadlock.
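
The ordering hazard described here is commonly avoided by always taking the locks in one agreed order. The following is a minimal userspace sketch, assuming ascending group number as that order. The names lock_order and order_pair are hypothetical, and this is not a description of what mballoc.c does.

```c
/* Hypothetical illustration: pick a canonical acquisition order for
 * two block groups' locks so that every caller locks the
 * lower-numbered group first. */
struct lock_order {
	unsigned int first;   /* group whose lock is taken first */
	unsigned int second;  /* group whose lock is taken second */
};

static struct lock_order order_pair(unsigned int a, unsigned int b)
{
	struct lock_order o;

	if (a <= b) {
		o.first = a;
		o.second = b;
	} else {
		o.first = b;
		o.second = a;
	}
	return o;
}
```

A caller asking for (4, 2) and one asking for (2, 4) both end up locking bg 2 and then bg 4, so neither can hold one lock while waiting for the other in the opposite order.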

I'm guessing that what caused the problem for you was
ext4_mb_init_group(), which if you are using 1k filesystems, tries to
grab multiple grp->alloc_sem's.  In each place where we find those, we
need to use down_write_nested --- see Documentation/lockdep-design.txt.  
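
As a rough illustration of why a nesting annotation silences the warning, here is a toy userspace model of the check. It is not how lockdep is implemented, and toy_held, toy_acquire and toy_release_all are hypothetical names. The model captures only one rule: acquiring a lock whose (class, subclass) pair is already held is reported as possible recursive locking, whereas taking a second lock of the same class under a different subclass, which is what down_write_nested() expresses, is not.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* One held lock, identified only by its class and subclass. */
struct toy_held {
	const void *class_key;
	unsigned int subclass;
};

static struct toy_held toy_stack[TOY_MAX_HELD];
static size_t toy_depth;

/* Returns false (a "warning") if the same class/subclass pair is
 * already held or the toy stack is full; otherwise records the lock. */
static bool toy_acquire(const void *class_key, unsigned int subclass)
{
	size_t n;

	for (n = 0; n < toy_depth; n++) {
		if (toy_stack[n].class_key == class_key &&
		    toy_stack[n].subclass == subclass)
			return false;
	}
	if (toy_depth == TOY_MAX_HELD)
		return false;
	toy_stack[toy_depth].class_key = class_key;
	toy_stack[toy_depth].subclass = subclass;
	toy_depth++;
	return true;
}

/* Drop every held lock in the model. */
static void toy_release_all(void)
{
	toy_depth = 0;
}
```

With a single class key shared by all alloc_sems, two plain acquisitions (subclass 0 both times) trip the check, while subclass 0 followed by subclass 1 does not. The kernel counterpart of that second acquisition would be something like down_write_nested(&grp->alloc_sem, SINGLE_DEPTH_NESTING).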

If there are any other places in mballoc.c which grab multiple
alloc_sems at the same time, we'll have to define new subclasses.

						- Ted

Patch

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 7293209..1fa311c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2413,6 +2413,9 @@  ext4_mb_store_history(struct ext4_allocation_context *ac)
 #define ext4_mb_history_init(sb)
 #endif
 
+#ifdef CONFIG_LOCKDEP
+static struct lock_class_key alloc_sem_key[NR_BG_LOCKS];
+#endif
 
 /* Create and initialize ext4_group_info data for the given group. */
 int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
@@ -2473,8 +2476,14 @@  int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
 	}
 
 	INIT_LIST_HEAD(&meta_group_info[i]->bb_prealloc_list);
-	init_rwsem(&meta_group_info[i]->alloc_sem);
+#ifdef CONFIG_LOCKDEP
+	__init_rwsem(&meta_group_info[i]->alloc_sem,
+			"&meta_group_info[i]->alloc_sem",
+			&alloc_sem_key[i]);
 	meta_group_info[i]->bb_free_root.rb_node = NULL;;
+#else
+	init_rwsem(&meta_group_info[i]->alloc_sem);
+#endif
 
 #ifdef DOUBLE_CHECK
 	{