Message ID | 20121129133129.GC20413@thunk.org |
---|---|
State | Not Applicable, archived |
On 11/29/12 7:31 AM, Theodore Ts'o wrote:
> Hmm, it looks like you didn't run the regression test script before
> submitting this patch?

No :( I ran it against several filesystems mangled by xfstests, IIRC,
but TBH I forgot to run the e2fsprogs suite. My mistake, I've gotten
out of that habit. Mea culpa.

> It looks like it's not a bug which your patch introduced, but rather,
> it uncovered a bug. This was a failure in the second run of
> f_extent_bad_node, because in the fix we didn't make sure we updated
> the starting block of parent node when we cleared a node in the extent
> tree. (see below)
>
> This brings up another question. Did you test file systems after
> running punch on a number of different files to make sure the e2fsck
> is happy with the file systems which current kernels might generate?

I'm pretty sure I did test it against punched files, but I can do
further testing if you have a particular concern ....

> In this particular test case, even though the logical start didn't
> match, it doesn't cause any problems because it's at the left-most
> branch of the tree. I want to make sure we aren't triggering failures
> for file systems where the kernel is creating which is technically
> incorrect, but which isn't causing problems in practice...

But it's a weird inconsistency isn't it, and fixing it up in fsck should
be the right thing to do anyway?

-Eric

> - Ted
>
> % ./test_script f_extent_bad_node
> f_extent_bad_node: bad interior node in extent tree: failed
> --- ../../tests/f_extent_bad_node/expect.2 2012-07-06 13:37:27.316253023 +0000
> +++ f_extent_bad_node.2.log 2012-11-29 13:24:11.119306973 +0000
> @@ -1,7 +1,23 @@
>  Pass 1: Checking inodes, blocks, and sizes
> +Interior extent node level 0 of inode 12:
> +Logical start 0 does not match logical start 3 at next level. Fix? yes
> +
> +Inode 12, i_blocks is 8, should be 6. Fix? yes
> +
>  Pass 2: Checking directory structure
>  Pass 3: Checking directory connectivity
>  Pass 4: Checking reference counts
>  Pass 5: Checking group summary information
> -test_filesys: 12/16 files (0.0% non-contiguous), 25/100 blocks
> -Exit status is 0
> +Block bitmap differences: -24
> +Fix? yes
> +
> +Free blocks count wrong for group #0 (75, counted=76).
> +Fix? yes
> +
> +Free blocks count wrong (75, counted=76).
> +Fix? yes
> +
> +
> +test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
> +test_filesys: 12/16 files (0.0% non-contiguous), 24/100 blocks
> +Exit status is 1
> 125 tests succeeded 1 tests failed
> Tests failed: f_extent_bad_node

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

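The consistency rule behind the new e2fsck message can be sketched in a few lines. The Python model below is only an illustration: `Node`, `find_mismatches`, and `fix_starts` are hypothetical names, the tree is a simplified in-memory structure rather than the on-disk ext4 extent format, and none of it is taken from the patch under discussion. It assumes that the logical start recorded in an interior entry should equal the logical start of the first entry one level down, and shows how clearing a child without updating the parent leaves the kind of mismatch reported for inode 12.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical model of one extent-tree entry (not the on-disk layout)."""
    start: int                                             # recorded logical start block
    children: List["Node"] = field(default_factory=list)   # empty for a leaf extent


def find_mismatches(node: Node, level: int = 0) -> List[Tuple[int, int, int]]:
    """Return (level, recorded_start, first_child_start) for every interior
    entry whose recorded start differs from the start of its first child."""
    problems = []
    if node.children:
        first = node.children[0]
        if node.start != first.start:
            problems.append((level, node.start, first.start))
        for child in node.children:
            problems.extend(find_mismatches(child, level + 1))
    return problems


def fix_starts(node: Node) -> None:
    """Repair bottom-up: each interior entry takes its first child's start."""
    if node.children:
        for child in node.children:
            fix_starts(child)
        node.start = node.children[0].start


# An interior entry covering two children that start at blocks 0 and 3.
root = Node(0, [Node(0), Node(3)])

# Simulate clearing the bad first child without touching the parent.
del root.children[0]
print(find_mismatches(root))   # [(0, 0, 3)]

# Updating the parent's starting block removes the inconsistency.
fix_starts(root)
print(find_mismatches(root))   # []
```

The reported tuple corresponds to the wording "Logical start 0 does not match logical start 3 at next level" for level 0. Whether a real checker should treat a smaller parent start on the left-most branch as an error is exactly the question raised in the thread; the sketch simply flags any difference.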
On Thu, Nov 29, 2012 at 09:22:31AM -0600, Eric Sandeen wrote:
>
> But it's a weird inconsistency isn't it, and fixing it up in fsck should
> be the right thing to do anyway?

Oh, I agree, but basically, as a result I'm going to put this patch on
hold until we do a bit more testing. I'm just not ready to push this
out on the maint branch just yet.....

(The general rule is that I want to keep the maint branch in a state
where someone who wants to take a snapshot for a production
environment should feel generally comfortable to do this --- modulo
rollout/integration testing, of course. I'll keep it on an
es/fsck-int-node-fixup branch to make sure we don't lose it, but it's
something where I want to add some additional testing before I'm
comfortable rolling it out to the maint branch, just to make sure it
doesn't trigger any regression.)

BTW, while I was experimenting with test cases I found another related
bug (but not a regression) where e2fsck isn't able to fix up a
specific fs corruption (see attached). It's unlikely to happen in
real life, but given how easily I was able to create something that
e2fsck can't fix, it's clear we were missing some synthetic test
cases.

- Ted

On 11/29/12 10:40 AM, Theodore Ts'o wrote:
> On Thu, Nov 29, 2012 at 09:22:31AM -0600, Eric Sandeen wrote:
>>
>> But it's a weird inconsistency isn't it, and fixing it up in fsck should
>> be the right thing to do anyway?
>
> Oh, I agree, but basically, as a result I'm going to put this patch on
> hold until we do a bit more testing. I'm just not ready to push this
> out on the maint branch just yet.....
>
> (The general rule is that I want to keep the maint branch in a state
> where someone who wants to take a snapshot for a production
> environment should feel generally comfortable to do this --- modulo
> rollout/integration testing, of course. I'll keep it on an
> es/fsck-int-node-fixup branch to make sure we don't lose it, but it's
> something where I want to add some additional testing before I'm
> comfortable rolling it out to the maint branch, just to make sure it
> doesn't trigger any regression.)

FWIW, I hacked xfstests to always check the scratch device after any
test uses it, too, and I'm re-running with this change to be sure
it'll run over every fs modification xfstests makes ...

I'll send that upstream, too.

> BTW, while I was experimenting with test cases I found another related
> bug (but not a regression) where e2fsck isn't able to fix up a
> specific fs corruption (see attached). It's unlikely to happen in
> real life, but given how easily I was able to create something that
> e2fsck can't fix, it's clear we were missing some synthetic test
> cases.

At one point I turned fsfuzzer into fsckfuzzer, but it was a "My God,
it's full of bugs!" moment for most filesystems, IIRC. ;)

But if anyone wants to generate some fsck bugs to fix . . .

-Eric

> - Ted
>

On 11/29/12 10:43 AM, Eric Sandeen wrote:
> On 11/29/12 10:40 AM, Theodore Ts'o wrote:
>> On Thu, Nov 29, 2012 at 09:22:31AM -0600, Eric Sandeen wrote:
>>>
>>> But it's a weird inconsistency isn't it, and fixing it up in fsck should
>>> be the right thing to do anyway?
>>
>> Oh, I agree, but basically, as a result I'm going to put this patch on
>> hold until we do a bit more testing. I'm just not ready to push this
>> out on the maint branch just yet.....
>>
>> (The general rule is that I want to keep the maint branch in a state
>> where someone who wants to take a snapshot for a production
>> environment should feel generally comfortable to do this --- modulo
>> rollout/integration testing, of course. I'll keep it on an
>> es/fsck-int-node-fixup branch to make sure we don't lose it, but it's
>> something where I want to add some additional testing before I'm
>> comfortable rolling it out to the maint branch, just to make sure it
>> doesn't trigger any regression.)
>
> FWIW, I hacked xfstests to always check the scratch device after any
> test uses it, too, and I'm re-running with this change to be sure
> it'll run over every fs modification xfstests makes ...
>
> I'll send that upstream, too.

FWIW, ./check -g auto w/ fsck of both devices after each test didn't
encounter any fs which triggered this fsck check.

-Eric

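The idea of checking the devices after every test can be sketched as a small driver loop. This is not the xfstests change Eric describes, whose details are not shown in the thread; `run_and_check`, `e2fsck_clean`, and the `run_test` callback are hypothetical names used for illustration. The only external behaviour relied on is that `e2fsck -f -n` forces a read-only check and exits with status 0 when no errors are found, and the sketch assumes the devices are unmounted when they are checked.

```python
import subprocess
from typing import Callable, Iterable, List


def e2fsck_clean(device: str) -> bool:
    """Forced, read-only check; exit status 0 means e2fsck found no errors."""
    result = subprocess.run(
        ["e2fsck", "-f", "-n", device],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_and_check(
    tests: Iterable[str],
    run_test: Callable[[str], None],
    devices: Iterable[str],
    is_clean: Callable[[str], bool] = e2fsck_clean,
) -> List[str]:
    """Run each test, then check every device; return the tests after which
    at least one device was reported as inconsistent."""
    devices = list(devices)
    dirty = []
    for name in tests:
        run_test(name)  # hypothetical hook that runs one test case
        if not all(is_clean(dev) for dev in devices):
            dirty.append(name)
    return dirty
```

An empty return value would correspond to the result reported above: no test left behind a file system that tripped the check. Passing `is_clean` as a parameter keeps the loop testable without real block devices.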
--- ../../tests/f_extent_bad_node/expect.2 2012-07-06 13:37:27.316253023 +0000
+++ f_extent_bad_node.2.log 2012-11-29 13:24:11.119306973 +0000
@@ -1,7 +1,23 @@
 Pass 1: Checking inodes, blocks, and sizes
+Interior extent node level 0 of inode 12:
+Logical start 0 does not match logical start 3 at next level. Fix? yes
+
+Inode 12, i_blocks is 8, should be 6. Fix? yes
+
 Pass 2: Checking directory structure
 Pass 3: Checking directory connectivity
 Pass 4: Checking reference counts
 Pass 5: Checking group summary information
-test_filesys: 12/16 files (0.0% non-contiguous), 25/100 blocks
-Exit status is 0
+Block bitmap differences: -24
+Fix? yes
+
+Free blocks count wrong for group #0 (75, counted=76).
+Fix? yes
+
+Free blocks count wrong (75, counted=76).
+Fix? yes
+
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 12/16 files (0.0% non-contiguous), 24/100 blocks
+Exit status is 1
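The numbers in this log are mutually consistent, which a short calculation makes visible. The sketch below rests on two assumptions that the log itself does not state: that `i_blocks` is counted in 512-byte units, and that the test image uses 1 KiB blocks. The helper name `blocks_released` is hypothetical.

```python
SECTOR_SIZE = 512    # assumption: i_blocks is counted in 512-byte units
BLOCK_SIZE = 1024    # assumption: the test image uses 1 KiB blocks


def blocks_released(i_blocks_before: int, i_blocks_after: int,
                    block_size: int = BLOCK_SIZE) -> int:
    """Convert a drop in i_blocks into a number of file system blocks."""
    return (i_blocks_before - i_blocks_after) * SECTOR_SIZE // block_size


total_blocks = 100
used_before, free_before = 25, 75

freed = blocks_released(8, 6)          # "i_blocks is 8, should be 6"
used_after = used_before - freed       # 24, as in "24/100 blocks"
free_after = free_before + freed       # 76, as in "counted=76"

assert used_after + free_after == total_blocks
print(freed, used_after, free_after)   # 1 24 76
```

Under those assumptions the `i_blocks` correction, the single block dropped from the bitmap (`-24`), and the free count moving from 75 to 76 all describe the same one block being released when the node is cleared.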