Patchwork metadata_csum + unclean shutdown = failure to boot

login
register
mail settings
Submitter Theodore Ts'o
Date Oct. 8, 2012, 2:41 a.m.
Message ID <20121008024126.GC468@thunk.org>
Download mbox | patch
Permalink /patch/189881/
State Accepted, archived
Headers show

Comments

Theodore Ts'o - Oct. 8, 2012, 2:41 a.m.
I found the problem.  It turns out ext4_handle_dirty_super() was
completely FUBAR'ed and was calculating the checksum on the wrong data
(for all but 1k block file systems, sigh).

We just didn't notice because the checksum would be correctly set when
the file system was unmounted cleanly.  (Sigh).

The following patch should fix things.  Thanks for testing out the
metadata checksum on the root file system, and reporting this
problem!!!

						- Ted

From bdd7ed290bf12c2e9132fbe97208a1af79c7a29d Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Sun, 7 Oct 2012 22:18:56 -0400
Subject: [PATCH] ext4: fix metadata checksum calculation for the superblock

The function ext4_handle_dirty_super() was calculating the superblock
on the wrong block data.  As a result, when the superblock is modified
while it is mounted (most commonly, when inodes are added or removed
from the orphan list), the superblock checksum would be wrong.  We
didn't notice because the superblock *was* being correctly calculated
in ext4_commit_super(), and this would get called when the file system
was unmounted.  So the problem only became obvious if the system
crashed while the file system was mounted.

Fix this by removing the poorly designed function signature for
ext4_superblock_Csum_set(); if it only took a single argument, the
pointer to a struct superblock, the ambiguity which caused this
mistake would have been impossible.

Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
---
 fs/ext4/ext4.h      | 3 +--
 fs/ext4/ext4_jbd2.c | 8 ++------
 fs/ext4/super.c     | 7 ++++---
 3 files changed, 7 insertions(+), 11 deletions(-)
George Spelvin - Nov. 1, 2012, 1:05 a.m.
I'm currently running with two ext4 patches:

Author: Theodore Ts'o <tytso@mit.edu>
Date: Sun Oct 7 22:18:56 2012 -0400
Subject: ext4: fix metadata checksum calculation for the superblock

Author: Darrick J. Wong <darrick.wong@oracle.com>
Date: Wed Oct 17 12:51:30 2012 -0700
Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan

They appear to fix real problems.  I notice, that neither of these have
made it into 2.6.5.  Should they be sent to -stable at some point?

I'm not trying to overrule your judgements on the matter, just ensure that
the omission is actually a conscious decision rather than an oversight.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong - Nov. 1, 2012, 1:13 a.m.
On Wed, Oct 31, 2012 at 09:05:21PM -0400, George Spelvin wrote:
> I'm currently running with two ext4 patches:
> 
> Author: Theodore Ts'o <tytso@mit.edu>
> Date: Sun Oct 7 22:18:56 2012 -0400
> Subject: ext4: fix metadata checksum calculation for the superblock
> 
> Author: Darrick J. Wong <darrick.wong@oracle.com>
> Date: Wed Oct 17 12:51:30 2012 -0700
> Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan
> 
> They appear to fix real problems.  I notice, that neither of these have
> made it into 2.6.5.  Should they be sent to -stable at some point?
> 
> I'm not trying to overrule your judgements on the matter, just ensure that
> the omission is actually a conscious decision rather than an oversight.

<shrug> I was wondering too, but I figured Ted was probably busy dealing with
the corruption bug and such.

(Which itself doesn't seem to be in 3.6.x yet)

I suppose it's a good sign that it's been more than a week and you haven't hit
anything else...

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Nov. 1, 2012, 1:50 a.m.
On Wed, Oct 31, 2012 at 06:13:12PM -0700, Darrick J. Wong wrote:
> > Author: Theodore Ts'o <tytso@mit.edu>
> > Date: Sun Oct 7 22:18:56 2012 -0400
> > Subject: ext4: fix metadata checksum calculation for the superblock

This one was cc'ed to stable@vger.kernel.org.  But when you said "I
notice, that neither of thse have made it into 2.6.5", I assume you
meant 3.5?  The last 3.5 kernel is 3.5.7, and Greg K-H isn't
backporting fixes to 3.5.x any more.  (See http://www.kernel.org to
see which kernels are marked "EOL"; those are the ones which are no
longer getting updates.)

So that means it should eventually make it to the 3.4.x and 3.6.x
kernels.

> > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > Date: Wed Oct 17 12:51:30 2012 -0700
> > Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan

I missed this one because the subject line didn't have [PATCH] in it.
(Darrick, it really helps if you use git format-patch / git
send-email; you can use a message-id of the message you're replying to
in the mail thread to chain the message to the thread.)

I would have eventually found it in patchwork, but even in patchwork
the listing would have had a potentially misleading subject line,
since it grabs the patch title from the subject line of the e-mail.

> <shrug> I was wondering too, but I figured Ted was probably busy dealing with
> the corruption bug and such.
> 
> (Which itself doesn't seem to be in 3.6.x yet)

It isn't in 3.7-rc3 because I didn't see it before I sent the pull
request to Linus....

At this point I'll just include it in the patches to be sent to Linus
at the next merge window, mainly because I don't have the time to run
a separate regression test run just for this patch, and it's only a
cosmetic issue, right?

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong - Nov. 1, 2012, 3:22 a.m.
On Wed, Oct 31, 2012 at 09:50:15PM -0400, Theodore Ts'o wrote:
> On Wed, Oct 31, 2012 at 06:13:12PM -0700, Darrick J. Wong wrote:
> > > Author: Theodore Ts'o <tytso@mit.edu>
> > > Date: Sun Oct 7 22:18:56 2012 -0400
> > > Subject: ext4: fix metadata checksum calculation for the superblock
> 
> This one was cc'ed to stable@vger.kernel.org.  But when you said "I
> notice, that neither of thse have made it into 2.6.5", I assume you
> meant 3.5?  The last 3.5 kernel is 3.5.7, and Greg K-H isn't
> backporting fixes to 3.5.x any more.  (See http://www.kernel.org to
> see which kernels are marked "EOL"; those are the ones which are no
> longer getting updates.)
> 
> So that means it should eventually make it to the 3.4.x and 3.6.x
> kernels.

I thought he meant 3.6.5, but I haven't really been paying 3.6.x much attention.

> > > Author: Darrick J. Wong <darrick.wong@oracle.com>
> > > Date: Wed Oct 17 12:51:30 2012 -0700
> > > Subject: ext4: Don't verify checksums of dx non-leaf nodes during fallback linear scan
> 
> I missed this one because the subject line didn't have [PATCH] in it.
> (Darrick, it really helps if you use git format-patch / git
> send-email; you can use a message-id of the message you're replying to
> in the mail thread to chain the message to the thread.)
> 
> I would have eventually found it in patchwork, but even in patchwork
> the listing would have had a potentially misleading subject line,
> since it grabs the patch title from the subject line of the e-mail.

Oops, I guess I did forget the magic "[PATCH]".  Sorry about that.

> > <shrug> I was wondering too, but I figured Ted was probably busy dealing with
> > the corruption bug and such.
> > 
> > (Which itself doesn't seem to be in 3.6.x yet)
> 
> It isn't in 3.7-rc3 because I didn't see it before I sent the pull
> request to Linus....
> 
> At this point I'll just include it in the patches to be sent to Linus
> at the next merge window, mainly because I don't have the time to run
> a separate regression test run just for this patch, and it's only a
> cosmetic issue, right?

Yep.

--D
> 
> 						- Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
George Spelvin - Nov. 1, 2012, 6:12 a.m.
> This one was cc'ed to stable@vger.kernel.org.  But when you said "I
> notice, that neither of thse have made it into 2.6.5", I assume you
> meant 3.5?

Whoops, typo!  I meant 3.6.5, the very latest just-out-today stable
kernel.

Quite a few 3.6.x kernels have come out since that patch was Cc'ed,
and it keeps not being included.  So I wondered.

> So that means it should eventually make it to the 3.4.x and 3.6.x
> kernels.

That's what I thought, but I didn't want to pester Greg until I was sure
of your intentions.

> At this point I'll just include it in the patches to be sent to Linus
> at the next merge window, mainly because I don't have the time to run
> a separate regression test run just for this patch, and it's only a
> cosmetic issue, right?

Well, it causes the file system to be marked dirty and unnecessarily
checked on reboot, which I contend is a bug, but it's not a data-loss
bug.

I do worry that it could cause file lookup to fail when it shouldn't,
which *is* effectively a data-loss bug, even if the data reappears
on reboot.  But I'd have to understand the problem and fix better to
know if that actually happens; I haven't observed it.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong - Nov. 1, 2012, 6:49 a.m.
On Thu, Nov 01, 2012 at 02:12:12AM -0400, George Spelvin wrote:
> > This one was cc'ed to stable@vger.kernel.org.  But when you said "I
> > notice, that neither of thse have made it into 2.6.5", I assume you
> > meant 3.5?
> 
> Whoops, typo!  I meant 3.6.5, the very latest just-out-today stable
> kernel.
> 
> Quite a few 3.6.x kernels have come out since that patch was Cc'ed,
> and it keeps not being included.  So I wondered.
> 
> > So that means it should eventually make it to the 3.4.x and 3.6.x
> > kernels.
> 
> That's what I thought, but I didn't want to pester Greg until I was sure
> of your intentions.
> 
> > At this point I'll just include it in the patches to be sent to Linus
> > at the next merge window, mainly because I don't have the time to run
> > a separate regression test run just for this patch, and it's only a
> > cosmetic issue, right?
> 
> Well, it causes the file system to be marked dirty and unnecessarily
> checked on reboot, which I contend is a bug, but it's not a data-loss
> bug.
> 
> I do worry that it could cause file lookup to fail when it shouldn't,
> which *is* effectively a data-loss bug, even if the data reappears
> on reboot.  But I'd have to understand the problem and fix better to
> know if that actually happens; I haven't observed it.

Yes, it would be useful to know what's going on with this directory file, since
it seems to fallback to linear scan, yet e2fsck -D doesn't fix it.  What I was
/going/ for was that the kernel would notice a bad directory and flag it for
fsck on reboot.  Upon reboot, fsck would be run, notice the bad dir, and feed
it to the directory rebuilder to get it fixed for good.  However, there doesn't
seem to be any real checksum mismatch, so the rebuild doesn't happen.

Also ... refresh my memory -- some files have disappeared as a result of this
happening?

--D
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
George Spelvin - Nov. 1, 2012, 7:07 a.m.
> Yes, it would be useful to know what's going on with this directory file,
> since it seems to fallback to linear scan, yet e2fsck -D doesn't fix it.
> What I was /going/ for was that the kernel would notice a bad directory
> and flag it for fsck on reboot.  Upon reboot, fsck would be run, notice
> the bad dir, and feed it to the directory rebuilder to get it fixed
> for good.  However, there doesn't seem to be any real checksum mismatch,
> so the rebuild doesn't happen.

That's what confuses me.  I had already run e2fsck -D (which I assume
rebuilds all directories, even if unnecessary) before observing the
problem.  The other odd clue is that it's always nfsd that chokes;
other accesses to the directory (ls -U, ls -lU, grep -r) don't produce
the message.

> Also ... refresh my memory -- some files have disappeared as a result of this
> happening?

I haven't observed it, no.  But the nature of the symptoms suggests it
might be happening.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong - Nov. 1, 2012, 7:18 a.m.
On Thu, Nov 01, 2012 at 03:07:31AM -0400, George Spelvin wrote:
> > Yes, it would be useful to know what's going on with this directory file,
> > since it seems to fallback to linear scan, yet e2fsck -D doesn't fix it.
> > What I was /going/ for was that the kernel would notice a bad directory
> > and flag it for fsck on reboot.  Upon reboot, fsck would be run, notice
> > the bad dir, and feed it to the directory rebuilder to get it fixed
> > for good.  However, there doesn't seem to be any real checksum mismatch,
> > so the rebuild doesn't happen.
> 
> That's what confuses me.  I had already run e2fsck -D (which I assume
> rebuilds all directories, even if unnecessary) before observing the
> problem.  The other odd clue is that it's always nfsd that chokes;
> other accesses to the directory (ls -U, ls -lU, grep -r) don't produce
> the message.

Oh, so ... it's just nfsd that causes the linear fallback?  Regular (i.e.
non-nfs) users can see everything in the dir, no error messages?

Now *that* is odd. :)

You know, I was starting to wonder what on earth would even cause the fallback
in the first place.  It even looked like most of the "your dir is corrupt"
exits from that function would spit out an error or be somehow obviously
broken.

> > Also ... refresh my memory -- some files have disappeared as a result of this
> > happening?
> 
> I haven't observed it, no.  But the nature of the symptoms suggests it
> might be happening.

Hum.  When linear scan happens on a hashed dir, it's scanning the same blocks
that the hash scan sees.   The htree block looks like a regular directory block
with one huge "unused" dirent that wraps all the htree data.  So, the linear
scan should find the exact same files as a htree scan would.  If it doesn't,
something's wrong.  But you say it isn't, so I imagine it's fine.

<shrug> Another thing for me to ponder tomorrow. :)

--D
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
George Spelvin - Nov. 1, 2012, 7:28 a.m.
> Oh, so ... it's just nfsd that causes the linear fallback?  Regular (i.e.
> non-nfs) users can see everything in the dir, no error messages?

Yup.  After it survived one e2fsck -D, I poked at the directory a bit
to see if I could cause the error.  No success from local access.

It's also probably an NFSv2 client.  I wonder if it's doing something
odd with directory seeks that's causing problems; perhaps htree and the
32-bit seek cookie limit are not friends?

>> I haven't observed it, no.  But the nature of the symptoms suggests it
>> might be happening.

> Hum.  When linear scan happens on a hashed dir, it's scanning the same
> blocks that the hash scan sees.   The htree block looks like a regular
> directory block with one huge "unused" dirent that wraps all the htree
> data.  So, the linear scan should find the exact same files as a htree
> scan would.  If it doesn't, something's wrong.  But you say it isn't,
> so I imagine it's fine.

Maybe I was wrong.  I was worried that it was aborting the directory
scan due to the error and thus files would disappear.  If that doesn't
happen, no worries.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong - Nov. 2, 2012, 12:05 a.m.
On Thu, Nov 01, 2012 at 03:28:47AM -0400, George Spelvin wrote:
> > Oh, so ... it's just nfsd that causes the linear fallback?  Regular (i.e.
> > non-nfs) users can see everything in the dir, no error messages?
> 
> Yup.  After it survived one e2fsck -D, I poked at the directory a bit
> to see if I could cause the error.  No success from local access.
> 
> It's also probably an NFSv2 client.  I wonder if it's doing something
> odd with directory seeks that's causing problems; perhaps htree and the
> 32-bit seek cookie limit are not friends?

<shrug> I'm not nfs-wise, sadly.  I _am_ wondering if an ftrace of this might
be useful... or a gigantic glut of data that I'll never finish processing.

Just from a quick read of ext4_find_entry() it looks like the only thing that
results in fallback mode without a kernel message is ext4_bread() failing in
dx_probe()?

> >> I haven't observed it, no.  But the nature of the symptoms suggests it
> >> might be happening.
> 
> > Hum.  When linear scan happens on a hashed dir, it's scanning the same
> > blocks that the hash scan sees.   The htree block looks like a regular
> > directory block with one huge "unused" dirent that wraps all the htree
> > data.  So, the linear scan should find the exact same files as a htree
> > scan would.  If it doesn't, something's wrong.  But you say it isn't,
> > so I imagine it's fine.
> 
> Maybe I was wrong.  I was worried that it was aborting the directory
> scan due to the error and thus files would disappear.  If that doesn't
> happen, no worries.

Oh well, it'll run slowly but at least it won't be throwing up errors.

--D
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3ab2539..78971cf 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2063,8 +2063,7 @@  extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count);
 extern int ext4_calculate_overhead(struct super_block *sb);
 extern int ext4_superblock_csum_verify(struct super_block *sb,
 				       struct ext4_super_block *es);
-extern void ext4_superblock_csum_set(struct super_block *sb,
-				     struct ext4_super_block *es);
+extern void ext4_superblock_csum_set(struct super_block *sb);
 extern void *ext4_kvmalloc(size_t size, gfp_t flags);
 extern void *ext4_kvzalloc(size_t size, gfp_t flags);
 extern void ext4_kvfree(void *ptr);
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index bfa65b4..b4323ba 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -143,17 +143,13 @@  int __ext4_handle_dirty_super(const char *where, unsigned int line,
 	struct buffer_head *bh = EXT4_SB(sb)->s_sbh;
 	int err = 0;
 
+	ext4_superblock_csum_set(sb);
 	if (ext4_handle_valid(handle)) {
-		ext4_superblock_csum_set(sb,
-				(struct ext4_super_block *)bh->b_data);
 		err = jbd2_journal_dirty_metadata(handle, bh);
 		if (err)
 			ext4_journal_abort_handle(where, line, __func__,
 						  bh, handle, err);
-	} else {
-		ext4_superblock_csum_set(sb,
-				(struct ext4_super_block *)bh->b_data);
+	} else
 		mark_buffer_dirty(bh);
-	}
 	return err;
 }
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 982f6fc..5ededf1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -143,9 +143,10 @@  int ext4_superblock_csum_verify(struct super_block *sb,
 	return es->s_checksum == ext4_superblock_csum(sb, es);
 }
 
-void ext4_superblock_csum_set(struct super_block *sb,
-			      struct ext4_super_block *es)
+void ext4_superblock_csum_set(struct super_block *sb)
 {
+	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+
 	if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
 		EXT4_FEATURE_RO_COMPAT_METADATA_CSUM))
 		return;
@@ -4387,7 +4388,7 @@  static int ext4_commit_super(struct super_block *sb, int sync)
 		cpu_to_le32(percpu_counter_sum_positive(
 				&EXT4_SB(sb)->s_freeinodes_counter));
 	BUFFER_TRACE(sbh, "marking dirty");
-	ext4_superblock_csum_set(sb, es);
+	ext4_superblock_csum_set(sb);
 	mark_buffer_dirty(sbh);
 	if (sync) {
 		error = sync_dirty_buffer(sbh);