fix ext4_free_inode vs. ext4_claim_inode race

Message ID 49AE05D1.9050607@redhat.com
State Accepted, archived

Commit Message

Eric Sandeen March 4, 2009, 4:38 a.m. UTC
I was seeing fsck errors on inode bitmaps after a 4-thread
dbench run on a 4-CPU machine:

Inode bitmap differences: -50736 -(50752--50753) etc...

I believe this is because ext4_free_inode() uses atomic
bitops. ext4_new_inode() *used* to use atomic bitops for
synchronization as well, but commit
393418676a7602e1d7d3f6e560159c65c8cbd50e changed it to take
the sb_bgl_lock, so that we could also synchronize against
read_inode_bitmap and initialization of uninit inode tables.

However, that change left ext4_free_inode() using atomic bitops,
which I think leaves no synchronization between setting and
clearing bits in the inode bitmap.
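
To make the suspected interleaving concrete, here is a minimal userspace
sketch. It is a simplified model, not kernel code: the struct and all
function names are hypothetical, a single 32-bit word stands in for the
inode bitmap, and the non-atomic set is split into an explicit load and
store so that a lockless clear can be replayed in between. It assumes
the claim side does a plain read-modify-write under the lock while the
free side clears its bit without taking that lock.

```c
#include <stdint.h>

/* Hypothetical model: one 32-bit word of an inode bitmap. */
struct bitmap_model {
	uint32_t word;
};

/* First half of a non-atomic set: read the whole word. */
static uint32_t nonatomic_load(const struct bitmap_model *b)
{
	return b->word;
}

/* Second half of a non-atomic set: write back the snapshot plus the new bit. */
static void nonatomic_store_set(struct bitmap_model *b, uint32_t snapshot,
				unsigned int bit)
{
	b->word = snapshot | (1u << bit);
}

/* Models a clear that does not take the lock; returns the old bit value. */
static int lockless_clear(struct bitmap_model *b, unsigned int bit)
{
	uint32_t mask = 1u << bit;
	int old = (b->word & mask) != 0;

	b->word &= ~mask;
	return old;
}

/* Replays the bad interleaving and returns the final bitmap word. */
uint32_t replay_unsynchronized(void)
{
	struct bitmap_model b = { .word = 1u << 3 };	/* inode bit 3 in use */
	uint32_t snap;

	snap = nonatomic_load(&b);	/* claimer, lock held: reads the word */
	lockless_clear(&b, 3);		/* freer, no lock: clears bit 3 */
	nonatomic_store_set(&b, snap, 5); /* claimer writes stale word | bit 5 */
	return b.word;			/* bit 3 is set again: the clear is lost */
}

/* Same operations, but each side runs to completion before the other. */
uint32_t replay_serialized(void)
{
	struct bitmap_model b = { .word = 1u << 3 };
	uint32_t snap;

	lockless_clear(&b, 3);		/* freer: lock; clear; unlock */
	snap = nonatomic_load(&b);	/* claimer: lock; load ... */
	nonatomic_store_set(&b, snap, 5); /* ... store; unlock */
	return b.word;			/* only bit 5 remains set */
}
```

In the unsynchronized replay the freed bit ends up set again, which would
look to fsck like an inode marked in use in the bitmap that is not
actually in use. In the serialized replay only the newly claimed bit is
set.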

The below patch fixes it for me, although I wonder if we're 
getting at all heavy-handed with this spinlock...

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Aneesh Kumar K.V March 4, 2009, 7:06 p.m. UTC | #1
On Tue, Mar 03, 2009 at 10:38:41PM -0600, Eric Sandeen wrote:
> I was seeing fsck errors on inode bitmaps after a 4-thread
> dbench run on a 4-CPU machine:
> 
> Inode bitmap differences: -50736 -(50752--50753) etc...
> 
> I believe this is because ext4_free_inode() uses atomic
> bitops. ext4_new_inode() *used* to use atomic bitops for
> synchronization as well, but commit
> 393418676a7602e1d7d3f6e560159c65c8cbd50e changed it to take
> the sb_bgl_lock, so that we could also synchronize against
> read_inode_bitmap and initialization of uninit inode tables.
> 
> However, that change left ext4_free_inode() using atomic bitops,
> which I think leaves no synchronization between setting and
> clearing bits in the inode bitmap.
> 
> The below patch fixes it for me, although I wonder if we're 
> getting at all heavy-handed with this spinlock...
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


> ---
> 
> Index: linux-2.6/fs/ext4/ialloc.c
> ===================================================================
> --- linux-2.6.orig/fs/ext4/ialloc.c
> +++ linux-2.6/fs/ext4/ialloc.c
> @@ -188,7 +188,7 @@ void ext4_free_inode(handle_t *handle, s
>  	struct ext4_group_desc *gdp;
>  	struct ext4_super_block *es;
>  	struct ext4_sb_info *sbi;
> -	int fatal = 0, err, count;
> +	int fatal = 0, err, count, cleared;
>  	ext4_group_t flex_group;
> 
>  	if (atomic_read(&inode->i_count) > 1) {
> @@ -248,8 +248,10 @@ void ext4_free_inode(handle_t *handle, s
>  		goto error_return;
> 
>  	/* Ok, now we can actually update the inode bitmaps.. */
> -	if (!ext4_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
> -					bit, bitmap_bh->b_data))
> +	spin_lock(sb_bgl_lock(sbi, block_group));
> +	cleared = ext4_clear_bit(bit, bitmap_bh->b_data);
> +	spin_unlock(sb_bgl_lock(sbi, block_group));
> +	if (!cleared)
>  		ext4_error(sb, "ext4_free_inode",
>  			   "bit already cleared for inode %lu", ino);
>  	else {
> 
Theodore Ts'o March 5, 2009, 12:06 a.m. UTC | #2
On Thu, Mar 05, 2009 at 12:36:59AM +0530, Aneesh Kumar K.V wrote:
> On Tue, Mar 03, 2009 at 10:38:41PM -0600, Eric Sandeen wrote:
> > I was seeing fsck errors on inode bitmaps after a 4-thread
> > dbench run on a 4-CPU machine:
> > 
> > Inode bitmap differences: -50736 -(50752--50753) etc...
> > 
> > I believe this is because ext4_free_inode() uses atomic
> > bitops. ext4_new_inode() *used* to use atomic bitops for
> > synchronization as well, but commit
> > 393418676a7602e1d7d3f6e560159c65c8cbd50e changed it to take
> > the sb_bgl_lock, so that we could also synchronize against
> > read_inode_bitmap and initialization of uninit inode tables.
> > 
> > However, that change left ext4_free_inode() using atomic bitops,
> > which I think leaves no synchronization between setting and
> > clearing bits in the inode bitmap.
> > 
> > The below patch fixes it for me, although I wonder if we're 
> > getting at all heavy-handed with this spinlock...
> > 
> > Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> 
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Added to the ext4 patch queue.  I will push this to Linus after I do a
bit of testing.

                                        - Ted

Patch

Index: linux-2.6/fs/ext4/ialloc.c
===================================================================
--- linux-2.6.orig/fs/ext4/ialloc.c
+++ linux-2.6/fs/ext4/ialloc.c
@@ -188,7 +188,7 @@  void ext4_free_inode(handle_t *handle, s
 	struct ext4_group_desc *gdp;
 	struct ext4_super_block *es;
 	struct ext4_sb_info *sbi;
-	int fatal = 0, err, count;
+	int fatal = 0, err, count, cleared;
 	ext4_group_t flex_group;
 
 	if (atomic_read(&inode->i_count) > 1) {
@@ -248,8 +248,10 @@  void ext4_free_inode(handle_t *handle, s
 		goto error_return;
 
 	/* Ok, now we can actually update the inode bitmaps.. */
-	if (!ext4_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
-					bit, bitmap_bh->b_data))
+	spin_lock(sb_bgl_lock(sbi, block_group));
+	cleared = ext4_clear_bit(bit, bitmap_bh->b_data);
+	spin_unlock(sb_bgl_lock(sbi, block_group));
+	if (!cleared)
 		ext4_error(sb, "ext4_free_inode",
 			   "bit already cleared for inode %lu", ino);
 	else {