diff mbox

Apparent serious progressive ext4 data corruption bug in 3.6.3 (and other stable branches?)

Message ID 20121029010012.GE9161@thunk.org
State Superseded, archived
Headers show

Commit Message

Theodore Ts'o Oct. 29, 2012, 1 a.m. UTC
On Sat, Oct 27, 2012 at 05:42:07PM -0500, Eric Sandeen wrote:
> 
> It looks like the inode_bitmap_bh is being modified outside a transaction:
> 
>                 ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
> 
> It needs a get_write_access / handle_dirty_metadata pair around it.

Oops.   Nice catch!!

The patch isn't quite right, though.  We only want to call
ext4_journal_get_write_access() when we know that there is an available
bit in the bitmap.  (We could still lose the race, but in that case
the winner of the race probably grabbed the bitmap block first.)

Also, we only need to call ext4_handle_dirty_metadata() if we
successfully grab the bit in the bitmap.

So I suggest this patch instead:

commit 087eda81f1ac6a6a0394f781b44f1d555d8f64c6
Author: Eric Sandeen <sandeen@redhat.com>
Date:   Sun Oct 28 20:59:57 2012 -0400

    ext4: fix unjournaled inode bitmap modification
    
    commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 modified this function
    such that the inode bitmap was being modified outside a transaction,
    which could lead to corruption, and was discovered when journal_checksum
    found a bad checksum in the journal.
    
    Reported-by: Nix <nix@esperi.org.uk>
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@vger.kernel.org

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Nix Oct. 29, 2012, 1:04 a.m. UTC | #1
On 29 Oct 2012, Theodore Ts'o spake thusly:

>     commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 modified this function
>     such that the inode bitmap was being modified outside a transaction,
>     which could lead to corruption, and was discovered when journal_checksum
>     found a bad checksum in the journal.

Hm. If this could have caused corruption for non-checksum users, it must
be a pretty rare case if nobody's hit it in six months -- or maybe, I
suppose, they hit it and never noticed. (But, hey, this makes me happier
to have reported this despite all the flap, if it's found a genuine bug
that could have hit people not using wierdo mount options.)

Thanks for spending so much time on this fix. Much appreciated.
Eric Sandeen Oct. 29, 2012, 2:24 a.m. UTC | #2
On 10/28/12 8:00 PM, Theodore Ts'o wrote:
> On Sat, Oct 27, 2012 at 05:42:07PM -0500, Eric Sandeen wrote:
>>
>> It looks like the inode_bitmap_bh is being modified outside a transaction:
>>
>>                 ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
>>
>> It needs a get_write_access / handle_dirty_metadata pair around it.
> 
> Oops.   Nice catch!!
> 
> The patch isn't quite right, though.  

Yeah, I knew it wasn't ;)  I did resend 
[PATCH] ext4: fix unjournaled inode bitmap modification
which is a bit more involved.

> We only want to call
> ext4_journal_get_write_access() when we know that there is an available
> bit in the bitmap.  (We could still lose the race, but in that case
> the winner of the race probably grabbed the bitmap block first.)
> 
> Also, we only need to call ext4_handle_dirty_metadata() if we
> successfully grab the bit in the bitmap.
> 
> So I suggest this patch instead:

That'll get_write_access on the same buffer over and over, I suppose
it's ok, but the patch I sent tries to minimize that, and call
ext4_handle_release_buffer if we're not going to use it (which is
a no-op today anyway and not normally used I guess...)

If ext4_handle_release_buffer() is dead code now, and repeated calls
via repeat_in_this_group: are no big deal, then your version looks fine.

-Eric

> commit 087eda81f1ac6a6a0394f781b44f1d555d8f64c6
> Author: Eric Sandeen <sandeen@redhat.com>
> Date:   Sun Oct 28 20:59:57 2012 -0400
> 
>     ext4: fix unjournaled inode bitmap modification
>     
>     commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 modified this function
>     such that the inode bitmap was being modified outside a transaction,
>     which could lead to corruption, and was discovered when journal_checksum
>     found a bad checksum in the journal.
>     
>     Reported-by: Nix <nix@esperi.org.uk>
>     Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>     Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
>     Cc: stable@vger.kernel.org
> 
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 4facdd2..575afac 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -725,6 +725,10 @@ repeat_in_this_group:
>  				   "inode=%lu", ino + 1);
>  			continue;
>  		}
> +		BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
> +		err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
> +		if (err)
> +			goto fail;
>  		ext4_lock_group(sb, group);
>  		ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
>  		ext4_unlock_group(sb, group);
> @@ -738,6 +742,11 @@ repeat_in_this_group:
>  	goto out;
>  
>  got:
> +	BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
> +	err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
> +	if (err)
> +		goto fail;
> +
>  	/* We may have to initialize the block bitmap if it isn't already */
>  	if (ext4_has_group_desc_csum(sb) &&
>  	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Oct. 29, 2012, 2:34 a.m. UTC | #3
On Sun, Oct 28, 2012 at 09:24:19PM -0500, Eric Sandeen wrote:
> Yeah, I knew it wasn't ;)  I did resend 
> [PATCH] ext4: fix unjournaled inode bitmap modification
> which is a bit more involved.

Yeah, sorry, I didn't see your updated patch at first, since this mail
thread is one complicated tangle.  :-(

> That'll get_write_access on the same buffer over and over, I suppose
> it's ok, but the patch I sent tries to minimize that, and call
> ext4_handle_release_buffer if we're not going to use it (which is
> a no-op today anyway and not normally used I guess...)

Well, it's really rare that we will go through that loop more than
once; it only happens if we have multiple processes race against each
other trying to grab the same inode.

> If ext4_handle_release_buffer() is dead code now, and repeated calls
> via repeat_in_this_group: are no big deal, then your version looks fine.

Yeah, I think it's pretty much dead code.  At least, I can't think of
a good reason why we would want to actually try to handle
ext4_handle_release_buffer() to claw back the transaciton credit.  And
if we do, we'll have to do a sweep through the entire ext4 codebase
anyway.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Oct. 29, 2012, 2:35 a.m. UTC | #4
On 10/28/12 9:34 PM, Theodore Ts'o wrote:
> On Sun, Oct 28, 2012 at 09:24:19PM -0500, Eric Sandeen wrote:
>> Yeah, I knew it wasn't ;)  I did resend 
>> [PATCH] ext4: fix unjournaled inode bitmap modification
>> which is a bit more involved.
> 
> Yeah, sorry, I didn't see your updated patch at first, since this mail
> thread is one complicated tangle.  :-(
> 
>> That'll get_write_access on the same buffer over and over, I suppose
>> it's ok, but the patch I sent tries to minimize that, and call
>> ext4_handle_release_buffer if we're not going to use it (which is
>> a no-op today anyway and not normally used I guess...)
> 
> Well, it's really rare that we will go through that loop more than
> once; it only happens if we have multiple processes race against each
> other trying to grab the same inode.
> 
>> If ext4_handle_release_buffer() is dead code now, and repeated calls
>> via repeat_in_this_group: are no big deal, then your version looks fine.
> 
> Yeah, I think it's pretty much dead code.  At least, I can't think of
> a good reason why we would want to actually try to handle
> ext4_handle_release_buffer() to claw back the transaciton credit.  And
> if we do, we'll have to do a sweep through the entire ext4 codebase
> anyway.

Yeah, seems that way.

Then your simpler version is probably the way to go.

Thanks,
-Eric

> 						- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Oct. 29, 2012, 2:42 a.m. UTC | #5
On Sun, Oct 28, 2012 at 09:35:58PM -0500, Eric Sandeen wrote:
> Yeah, seems that way.
> 
> Then your simpler version is probably the way to go.

If you have a chance, could you do me a favor and test my -v3 version
of the patch?  It should work just as well as yours, but I'm getting
paranoid in my old age, and you seem to have a reliable way of testing
for this failure.  I still need to figure out why my kvm based
approach isn't showing the problem....

Thanks,

-- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 4facdd2..575afac 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -725,6 +725,10 @@  repeat_in_this_group:
 				   "inode=%lu", ino + 1);
 			continue;
 		}
+		BUFFER_TRACE(inode_bitmap_bh, "get_write_access");
+		err = ext4_journal_get_write_access(handle, inode_bitmap_bh);
+		if (err)
+			goto fail;
 		ext4_lock_group(sb, group);
 		ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
 		ext4_unlock_group(sb, group);
@@ -738,6 +742,11 @@  repeat_in_this_group:
 	goto out;
 
 got:
+	BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata");
+	err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh);
+	if (err)
+		goto fail;
+
 	/* We may have to initialize the block bitmap if it isn't already */
 	if (ext4_has_group_desc_csum(sb) &&
 	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {