Message ID | 20121029010012.GE9161@thunk.org |
---|---|
State | Superseded, archived |
Headers | show |
On 29 Oct 2012, Theodore Ts'o spake thusly: > commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 modified this function > such that the inode bitmap was being modified outside a transaction, > which could lead to corruption, and was discovered when journal_checksum > found a bad checksum in the journal. Hm. If this could have caused corruption for non-checksum users, it must be a pretty rare case if nobody's hit it in six months -- or maybe, I suppose, they hit it and never noticed. (But, hey, this makes me happier to have reported this despite all the flap, if it's found a genuine bug that could have hit people not using wierdo mount options.) Thanks for spending so much time on this fix. Much appreciated.
On 10/28/12 8:00 PM, Theodore Ts'o wrote: > On Sat, Oct 27, 2012 at 05:42:07PM -0500, Eric Sandeen wrote: >> >> It looks like the inode_bitmap_bh is being modified outside a transaction: >> >> ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data); >> >> It needs a get_write_access / handle_dirty_metadata pair around it. > > Oops. Nice catch!! > > The patch isn't quite right, though. Yeah, I knew it wasn't ;) I did resend [PATCH] ext4: fix unjournaled inode bitmap modification which is a bit more involved. > We only want to call > ext4_journal_get_write_access() when we know that there is an available > bit in the bitmap. (We could still lose the race, but in that case > the winner of the race probably grabbed the bitmap block first.) > > Also, we only need to call ext4_handle_dirty_metadata() if we > successfully grab the bit in the bitmap. > > So I suggest this patch instead: That'll get_write_access on the same buffer over and over, I suppose it's ok, but the patch I sent tries to minimize that, and call ext4_handle_release_buffer if we're not going to use it (which is a no-op today anyway and not normally used I guess...) If ext4_handle_release_buffer() is dead code now, and repeated calls via repeat_in_this_group: are no big deal, then your version looks fine. -Eric > commit 087eda81f1ac6a6a0394f781b44f1d555d8f64c6 > Author: Eric Sandeen <sandeen@redhat.com> > Date: Sun Oct 28 20:59:57 2012 -0400 > > ext4: fix unjournaled inode bitmap modification > > commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 modified this function > such that the inode bitmap was being modified outside a transaction, > which could lead to corruption, and was discovered when journal_checksum > found a bad checksum in the journal. > > Reported-by: Nix <nix@esperi.org.uk> > Signed-off-by: Eric Sandeen <sandeen@redhat.com> > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > Cc: stable@vger.kernel.org > > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c > index 4facdd2..575afac 100644 > --- a/fs/ext4/ialloc.c > +++ b/fs/ext4/ialloc.c > @@ -725,6 +725,10 @@ repeat_in_this_group: > "inode=%lu", ino + 1); > continue; > } > + BUFFER_TRACE(inode_bitmap_bh, "get_write_access"); > + err = ext4_journal_get_write_access(handle, inode_bitmap_bh); > + if (err) > + goto fail; > ext4_lock_group(sb, group); > ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data); > ext4_unlock_group(sb, group); > @@ -738,6 +742,11 @@ repeat_in_this_group: > goto out; > > got: > + BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); > + err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh); > + if (err) > + goto fail; > + > /* We may have to initialize the block bitmap if it isn't already */ > if (ext4_has_group_desc_csum(sb) && > gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) { > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Oct 28, 2012 at 09:24:19PM -0500, Eric Sandeen wrote: > Yeah, I knew it wasn't ;) I did resend > [PATCH] ext4: fix unjournaled inode bitmap modification > which is a bit more involved. Yeah, sorry, I didn't see your updated patch at first, since this mail thread is one complicated tangle. :-( > That'll get_write_access on the same buffer over and over, I suppose > it's ok, but the patch I sent tries to minimize that, and call > ext4_handle_release_buffer if we're not going to use it (which is > a no-op today anyway and not normally used I guess...) Well, it's really rare that we will go through that loop more than once; it only happens if we have multiple processes race against each other trying to grab the same inode. > If ext4_handle_release_buffer() is dead code now, and repeated calls > via repeat_in_this_group: are no big deal, then your version looks fine. Yeah, I think it's pretty much dead code. At least, I can't think of a good reason why we would want to actually try to handle ext4_handle_release_buffer() to claw back the transaciton credit. And if we do, we'll have to do a sweep through the entire ext4 codebase anyway. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/28/12 9:34 PM, Theodore Ts'o wrote: > On Sun, Oct 28, 2012 at 09:24:19PM -0500, Eric Sandeen wrote: >> Yeah, I knew it wasn't ;) I did resend >> [PATCH] ext4: fix unjournaled inode bitmap modification >> which is a bit more involved. > > Yeah, sorry, I didn't see your updated patch at first, since this mail > thread is one complicated tangle. :-( > >> That'll get_write_access on the same buffer over and over, I suppose >> it's ok, but the patch I sent tries to minimize that, and call >> ext4_handle_release_buffer if we're not going to use it (which is >> a no-op today anyway and not normally used I guess...) > > Well, it's really rare that we will go through that loop more than > once; it only happens if we have multiple processes race against each > other trying to grab the same inode. > >> If ext4_handle_release_buffer() is dead code now, and repeated calls >> via repeat_in_this_group: are no big deal, then your version looks fine. > > Yeah, I think it's pretty much dead code. At least, I can't think of > a good reason why we would want to actually try to handle > ext4_handle_release_buffer() to claw back the transaciton credit. And > if we do, we'll have to do a sweep through the entire ext4 codebase > anyway. Yeah, seems that way. Then your simpler version is probably the way to go. Thanks, -Eric > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Oct 28, 2012 at 09:35:58PM -0500, Eric Sandeen wrote: > Yeah, seems that way. > > Then your simpler version is probably the way to go. If you have a chance, could you do me a favor and test my -v3 version of the patch? It should work just as well as yours, but I'm getting paranoid in my old age, and you seem to have a reliable way of testing for this failure. I still need to figure out why my kvm based approach isn't showing the problem.... Thanks, -- Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 4facdd2..575afac 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -725,6 +725,10 @@ repeat_in_this_group: "inode=%lu", ino + 1); continue; } + BUFFER_TRACE(inode_bitmap_bh, "get_write_access"); + err = ext4_journal_get_write_access(handle, inode_bitmap_bh); + if (err) + goto fail; ext4_lock_group(sb, group); ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data); ext4_unlock_group(sb, group); @@ -738,6 +742,11 @@ repeat_in_this_group: goto out; got: + BUFFER_TRACE(inode_bitmap_bh, "call ext4_handle_dirty_metadata"); + err = ext4_handle_dirty_metadata(handle, NULL, inode_bitmap_bh); + if (err) + goto fail; + /* We may have to initialize the block bitmap if it isn't already */ if (ext4_has_group_desc_csum(sb) && gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {