e2fsck: Fix incorrect interior node logical start values

Message ID 20121129133129.GC20413@thunk.org
State Not Applicable, archived

Commit Message

Theodore Ts'o Nov. 29, 2012, 1:31 p.m. UTC
Hmm, it looks like you didn't run the regression test script before
submitting this patch?

It looks like it's not a bug which your patch introduced, but rather,
it uncovered a bug.  This was a failure in the second run of
f_extent_bad_node, because in the fix we didn't make sure we updated
the starting block of parent node when we cleared a node in the extent
tree.  (see below)

This brings up another question.  Did you test file systems after
running punch on a number of different files to make sure that e2fsck
is happy with the file systems which current kernels might generate?

In this particular test case, even though the logical start didn't
match, it doesn't cause any problems because it's at the left-most
branch of the tree.  I want to make sure we aren't triggering failures
for file systems where the kernel is creating something which is technically
incorrect, but which isn't causing problems in practice...

	       	     	   	   - Ted

% ./test_script f_extent_bad_node
f_extent_bad_node: bad interior node in extent tree: failed
125 tests succeeded	1 tests failed
Tests failed: f_extent_bad_node 
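
The inconsistency described above (an interior node whose logical start
disagrees with the first entry at the next level down) can be modelled
with a toy extent tree. This is only an illustrative sketch: the names
Extent, IndexEntry, Node and check_and_fix are hypothetical, and none of
this is the actual e2fsck or libext2fs code.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    lblk: int      # first logical block covered by this extent
    length: int    # number of blocks covered


@dataclass
class IndexEntry:
    lblk: int      # logical start recorded in the interior (index) entry
    child: "Node"  # node at the next level down


@dataclass
class Node:
    entries: list  # Extent objects in a leaf, IndexEntry objects otherwise


def first_lblk(node):
    """Return the logical start of the node's first entry, or None if empty."""
    return node.entries[0].lblk if node.entries else None


def check_and_fix(node, level=0, fix=True):
    """Return (level, parent_start, child_start) for every mismatch found.

    Children are visited first, so a corrected start propagates upward.
    """
    problems = []
    for entry in node.entries:
        if not isinstance(entry, IndexEntry):
            continue
        problems += check_and_fix(entry.child, level + 1, fix)
        child_start = first_lblk(entry.child)
        if child_start is not None and entry.lblk != child_start:
            problems.append((level, entry.lblk, child_start))
            if fix:
                entry.lblk = child_start
    return problems


# A leaf whose first extent was cleared now starts at logical block 3,
# but the parent index entry still records a logical start of 0.
leaf = Node([Extent(3, 2)])
root = Node([IndexEntry(0, leaf)])
print(check_and_fix(root))  # first pass reports and repairs the mismatch
print(check_and_fix(root))  # second pass should find nothing left to fix
```

In this model, clearing a leaf entry without also updating the parent's
recorded start is exactly what leaves work for a second pass.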


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Sandeen Nov. 29, 2012, 3:22 p.m. UTC | #1
On 11/29/12 7:31 AM, Theodore Ts'o wrote:
> Hmm, it looks like you didn't run the regression test script before
> submitting this patch?

No :(  I ran it against several filesystems mangled by xfstests,
IIRC, but TBH I forgot to run the e2fsprogs suite.  My mistake,
I've gotten out of that habit.  Mea culpa.

> It looks like it's not a bug which your patch introduced, but rather,
> it uncovered a bug.  This was a failure in the second run of
> f_extent_bad_node, because in the fix we didn't make sure we updated
> the starting block of parent node when we cleared a node in the extent
> tree.  (see below)
> 
> This brings up another question.  Did you test file systems after
> running punch on a number of different files to make sure that e2fsck
> is happy with the file systems which current kernels might generate?

I'm pretty sure I did test it against punched files, but I can
do further testing if you have a particular concern ....

> In this particular test case, even though the logical start didn't
> match, it doesn't cause any problems because it's at the left-most
> branch of the tree.  I want to make sure we aren't triggering failures
> for file systems where the kernel is creating something which is technically
> incorrect, but which isn't causing problems in practice...

But it's a weird inconsistency isn't it, and fixing it up in fsck should
be the right thing to do anyway?

-Eric

> 	       	     	   	   - Ted
> 
> % ./test_script f_extent_bad_node
> f_extent_bad_node: bad interior node in extent tree: failed
> --- ../../tests/f_extent_bad_node/expect.2     2012-07-06 13:37:27.316253023 +0000
> +++ f_extent_bad_node.2.log		       2012-11-29 13:24:11.119306973 +0000
> @@ -1,7 +1,23 @@
>  Pass 1: Checking inodes, blocks, and sizes
> +Interior extent node level 0 of inode 12:
> +Logical start 0 does not match logical start 3 at next level.  Fix? yes
> +
> +Inode 12, i_blocks is 8, should be 6.  Fix? yes
> +
>  Pass 2: Checking directory structure
>  Pass 3: Checking directory connectivity
>  Pass 4: Checking reference counts
>  Pass 5: Checking group summary information
> -test_filesys: 12/16 files (0.0% non-contiguous), 25/100 blocks
> -Exit status is 0
> +Block bitmap differences:  -24
> +Fix? yes
> +
> +Free blocks count wrong for group #0 (75, counted=76).
> +Fix? yes
> +
> +Free blocks count wrong (75, counted=76).
> +Fix? yes
> +
> +
> +test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
> +test_filesys: 12/16 files (0.0% non-contiguous), 24/100 blocks
> +Exit status is 1
> 125 tests succeeded	1 tests failed
> Tests failed: f_extent_bad_node 
> 
> 
Theodore Ts'o Nov. 29, 2012, 4:40 p.m. UTC | #2
On Thu, Nov 29, 2012 at 09:22:31AM -0600, Eric Sandeen wrote:
> 
> But it's a weird inconsistency isn't it, and fixing it up in fsck should
> be the right thing to do anyway?

Oh, I agree, but basically, as a result I'm going to put this patch on
hold until we do a bit more testing.  I'm just not ready to push this
out on the maint branch just yet.....

(The general rule is that I want to keep the maint branch in a state
where someone who wants to take a snapshot for a production
environment should feel generally comfortable to do this --- modulo
rollout/integration testing, of course.  I'll keep it on an
es/fsck-int-node-fixup branch to make sure we don't lose it, but it's
something where I want to add some additional testing before I'm
comfortable rolling it out to the maint branch, just to make sure it
doesn't trigger any regression.)

BTW, while I was experimenting with test cases I found another related
bug (but not a regression) where e2fsck isn't able to fix up a
specific fs corruption (see attached).  It's unlikely to happen in
real life, but given how easily I was able to create something that
e2fsck can't fix, it's clear we were missing some synthetic test
cases.

						- Ted
Eric Sandeen Nov. 29, 2012, 4:43 p.m. UTC | #3
On 11/29/12 10:40 AM, Theodore Ts'o wrote:
> On Thu, Nov 29, 2012 at 09:22:31AM -0600, Eric Sandeen wrote:
>>
>> But it's a weird inconsistency isn't it, and fixing it up in fsck should
>> be the right thing to do anyway?
> 
> Oh, I agree, but basically, as a result I'm going to put this patch on
> hold until we do a bit more testing.  I'm just not ready to push this
> out on the maint branch just yet.....
> 
> (The general rule is that I want to keep the maint branch in a state
> where someone who wants to take a snapshot for a production
> environment should feel generally comfortable to do this --- modulo
> rollout/integration testing, of course.  I'll keep it on an
> es/fsck-int-node-fixup branch to make sure we don't lose it, but it's
> something where I want to add some additional testing before I'm
> comfortable rolling it out to the maint branch, just to make sure it
> doesn't trigger any regression.)

FWIW, I hacked xfstests to always check the scratch device after any
test uses it, too, and I'm re-running with this change to be sure
it'll run over every fs modification xfstests makes ...

I'll send that upstream, too.

> BTW, while I was experimenting with test cases I found another related
> bug (but not a regression) where e2fsck isn't able to fix up a
> specific fs corruption (see attached).  It's unlikely to happen in
> real life, but given how easily I was able to create something that
> e2fsck can't fix, it's clear we were missing some synthetic test
> cases.

At one point I turned fsfuzzer into fsckfuzzer, but it was a
"My God, it's full of bugs!" moment for most filesystems, IIRC.  ;)

But if anyone wants to generate some fsck bugs to fix . . .

-Eric

> 						- Ted
> 

Eric Sandeen Nov. 29, 2012, 6:56 p.m. UTC | #4
On 11/29/12 10:43 AM, Eric Sandeen wrote:
> On 11/29/12 10:40 AM, Theodore Ts'o wrote:
>> On Thu, Nov 29, 2012 at 09:22:31AM -0600, Eric Sandeen wrote:
>>>
>>> But it's a weird inconsistency isn't it, and fixing it up in fsck should
>>> be the right thing to do anyway?
>>
>> Oh, I agree, but basically, as a result I'm going to put this patch on
>> hold until we do a bit more testing.  I'm just not ready to push this
>> out on the maint branch just yet.....
>>
>> (The general rule is that I want to keep the maint branch in a state
>> where someone who wants to take a snapshot for a production
>> environment should feel generally comfortable to do this --- modulo
>> rollout/integration testing, of course.  I'll keep it on an
>> es/fsck-int-node-fixup branch to make sure we don't lose it, but it's
>> something where I want to add some additional testing before I'm
>> comfortable rolling it out to the maint branch, just to make sure it
>> doesn't trigger any regression.)
> 
> FWIW, I hacked xfstests to always check the scratch device after any
> test uses it, too, and I'm re-running with this change to be sure
> it'll run over every fs modification xfstests makes ...
> 
> I'll send that upstream, too.

FWIW, ./check -g auto w/ fsck of both devices after each test
didn't encounter any fs which triggered this fsck check.

-Eric

Patch

--- ../../tests/f_extent_bad_node/expect.2     2012-07-06 13:37:27.316253023 +0000
+++ f_extent_bad_node.2.log		       2012-11-29 13:24:11.119306973 +0000
@@ -1,7 +1,23 @@ 
 Pass 1: Checking inodes, blocks, and sizes
+Interior extent node level 0 of inode 12:
+Logical start 0 does not match logical start 3 at next level.  Fix? yes
+
+Inode 12, i_blocks is 8, should be 6.  Fix? yes
+
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Pass 5: Checking group summary information
-test_filesys: 12/16 files (0.0% non-contiguous), 25/100 blocks
-Exit status is 0
+Block bitmap differences:  -24
+Fix? yes
+
+Free blocks count wrong for group #0 (75, counted=76).
+Fix? yes
+
+Free blocks count wrong (75, counted=76).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/16 files (0.0% non-contiguous), 24/100 blocks
+Exit status is 1
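
The numbers in the log above are consistent with a single block (block
24) being released. A small sketch of that arithmetic follows; it
assumes the test image uses 1 KiB blocks, which the log does not state,
and relies on i_blocks being counted in 512-byte units. The helper name
after_freeing is hypothetical.

```python
SECTOR_SIZE = 512   # i_blocks is counted in 512-byte units
BLOCK_SIZE = 1024   # assumption: the test image uses 1 KiB blocks


def after_freeing(i_blocks, used_blocks, free_blocks, freed=1,
                  block_size=BLOCK_SIZE):
    """Return (i_blocks, used_blocks, free_blocks) after releasing blocks."""
    sectors = freed * block_size // SECTOR_SIZE
    return i_blocks - sectors, used_blocks - freed, free_blocks + freed


# Values from the log: i_blocks 8, 25/100 blocks in use, 75 free.
print(after_freeing(8, 25, 75))  # (6, 24, 76)
```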