[e2fsprogs] tune2fs: don't recover journal if device is busy.

Message ID 871shrrok4.fsf@notabene.neil.brown.name
State Accepted, archived
Headers show
Series
  • [e2fsprogs] tune2fs: don't recover journal if device is busy.
Related show

Commit Message

NeilBrown Feb. 12, 2018, 1:20 a.m.
tune2fs currently replays the journal if it needs
recovery and the filesystem isn't mounted.

The test for "is the filesystem mounted" isn't completely robust.
Lustre makes use of ext4 filesystems in a way that they are mounted
without being visible in /proc/mounts or similar.
This usage can easily be detected by attempting to open the device
with O_EXCL.  tune2fs already does this and the EXT2_MF_BUSY flag
is set if open(O_EXCL) fails.
Several uses other than lustre mounts could cause O_EXCL to fail,
but in any case it seems unwise to recover the journal when something
else is keeping the device busy.

So add an extra test to avoid journal recovery when the device
is busy.  This fixes some problems with lustre usage.

Signed-off-by: NeilBrown <neilb@suse.com>

--
Note: it seems wrong to recover the journal *after* making
changes to the superblock - there is a good chance that
recovering the journal will over-write those changes.
This is what was happening that lead me to this problem.
Shouldn't journal recovery happen *first*??

Thanks,
NeilBrown
---
 misc/tune2fs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Darrick J. Wong Feb. 12, 2018, 2:16 a.m. | #1
On Mon, Feb 12, 2018 at 12:20:43PM +1100, NeilBrown wrote:
> 
> tune2fs currently replays the journal if it needs
> recovery and the filesystem isn't mounted.
> 
> The test for "is the filesystem mounted" isn't completely robust.
> Lustre makes use of ext4 filesystems in a way that they are mounted
> without being visible in /proc/mounts or similar.
> This usage can easily be detected by attempting to open the device
> with O_EXCL.  tune2fs already does this and the EXT2_MF_BUSY flag
> is set if open(O_EXCL) fails.
> Several uses other than lustre mounts could cause O_EXCL to fail,
> but in any case it seems unwise to recover the journal when something
> else is keeping the device busy.
> 
> So add an extra test to avoid journal recovery when the device
> is busy.  This fixes some problems with lustre usage.
> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> 
> --
> Note: it seems wrong to recover the journal *after* making
> changes to the superblock - there is a good chance that
> recovering the journal will over-write those changes.
> This is what was happening that lead me to this problem.
> Shouldn't journal recovery happen *first*??

Yes.  Oops. :/

This whole hunk ought to move up to be right after
ext2fs_check_if_mounted, I think.

As for this patch itself,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D
Theodore Y. Ts'o Feb. 24, 2018, 10:23 p.m. | #2
On Sun, Feb 11, 2018 at 06:16:09PM -0800, Darrick J. Wong wrote:
> > Note: it seems wrong to recover the journal *after* making
> > changes to the superblock - there is a good chance that
> > recovering the journal will over-write those changes.
> > This is what was happening that lead me to this problem.
> > Shouldn't journal recovery happen *first*??
> 
> Yes.  Oops. :/
> 
> This whole hunk ought to move up to be right after
> ext2fs_check_if_mounted, I think.

After I did that, the test t_replay_and_set started failing.  The
problem is that the test deliberately corrupts all of the inode and
block bitmaps by writing bogus journal entries.  When tune2fs replays
the journal, it ends up smashing the inode and block bitmaps; but then
when it tries to rewrite checksums, the fact that inode bitmap is
completely zeroed out means all of the inode entries are also cleared
out.   Oops!

This normally isn't supposed to happen because check_fsck_needed() is
not supposed to allow us to do dangerous things that require rewriting
checksums unless the file system is freshly checked.

But the way the test was constructed the superblock's last mount time
is still 0, since the file system was never mounted.  By definition,
though, if there is a journal to be replayed, the file system has been
mounted, and hence must be checked.  Once fixed I this to force
s_lastcheck to be always less than s_mtimea after replaying the
journal, then the t_replay_and_set test would fail because it's not
safe to run "tune2fs -O ^metadata_csum" without running e2fsck first.

So I'll change the test to set the file system label instead doing
something more dangerous like clear the metadata checksum feature.
Was there someone who really wanted to be able to execute "tune2fs -O
^metadata_csum" on a file system with a dirty journal w/o running
e2fsck first?

						- Ted
Darrick J. Wong Feb. 26, 2018, 6:08 p.m. | #3
On Sat, Feb 24, 2018 at 05:23:52PM -0500, Theodore Ts'o wrote:
> On Sun, Feb 11, 2018 at 06:16:09PM -0800, Darrick J. Wong wrote:
> > > Note: it seems wrong to recover the journal *after* making
> > > changes to the superblock - there is a good chance that
> > > recovering the journal will over-write those changes.
> > > This is what was happening that lead me to this problem.
> > > Shouldn't journal recovery happen *first*??
> > 
> > Yes.  Oops. :/
> > 
> > This whole hunk ought to move up to be right after
> > ext2fs_check_if_mounted, I think.
> 
> After I did that, the test t_replay_and_set started failing.  The
> problem is that the test deliberately corrupts all of the inode and
> block bitmaps by writing bogus journal entries.  When tune2fs replays
> the journal, it ends up smashing the inode and block bitmaps; but then
> when it tries to rewrite checksums, the fact that inode bitmap is
> completely zeroed out means all of the inode entries are also cleared
> out.   Oops!
> 
> This normally isn't supposed to happen because check_fsck_needed() is
> not supposed to allow us to do dangerous things that require rewriting
> checksums unless the file system is freshly checked.
> 
> But the way the test was constructed the superblock's last mount time
> is still 0, since the file system was never mounted.  By definition,
> though, if there is a journal to be replayed, the file system has been
> mounted, and hence must be checked.  Once fixed I this to force
> s_lastcheck to be always less than s_mtimea after replaying the
> journal, then the t_replay_and_set test would fail because it's not
> safe to run "tune2fs -O ^metadata_csum" without running e2fsck first.
> 
> So I'll change the test to set the file system label instead doing
> something more dangerous like clear the metadata checksum feature.
> Was there someone who really wanted to be able to execute "tune2fs -O
> ^metadata_csum" on a file system with a dirty journal w/o running
> e2fsck first?

No, I didn't have a specific user in mind.  Frankly I doubt there will
be very many people who want to turn *off* that feature, but for those
who do so to a dirty fs we could replay the journal first.

(Though let's be honest, if you're going to tune2fs you really ought to
e2fsck before just to make sure the fs is ok...)

--D

> 
> 						- Ted

Patch

diff --git a/misc/tune2fs.c b/misc/tune2fs.c
index c33fb9d80b10..703e55b6b972 100644
--- a/misc/tune2fs.c
+++ b/misc/tune2fs.c
@@ -3337,7 +3337,7 @@  _("Warning: The journal is dirty. You may wish to replay the journal like:\n\n"
 	}
 #else
 	/* Recover the journal if possible. */
-	if ((open_flag & EXT2_FLAG_RW) && !(mount_flags & EXT2_MF_MOUNTED) &&
+	if ((open_flag & EXT2_FLAG_RW) && !(mount_flags & (EXT2_MF_BUSY | EXT2_MF_MOUNTED)) &&
 	    ext2fs_has_feature_journal_needs_recovery(fs->super)) {
 		errcode_t err;