[RFC] fstest: regression test for ext4 crash consistency bug

Message ID 1503830683-21455-1-git-send-email-amir73il@gmail.com
State Not Applicable, archived

Commit Message

Amir Goldstein Aug. 27, 2017, 10:44 a.m. UTC
This test is motivated by a bug found in ext4 during random crash
consistency tests.

This test uses device mapper flakey target to demonstrate the bug
found using device mapper log-writes target.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Ted,

While working on crash consistency xfstests [1], I stumbled on what
appeared to be an ext4 crash consistency bug.

The tests I used rely on the log-writes dm target code written
by Josef Bacik, which had little exposure to the wide community
as far as I know.  I wanted to prove to myself that the found
inconsistency was not due to a test bug, so I bisected the failed
test to the minimal operations that trigger the failure and wrote
a small independent test to reproduce the issue using dm flakey target.

The following fsck error is reliably reproduced by replaying some fsx ops
on overlapping file regions, then emulating a crash, followed by mount,
umount and fsck -nf:

  ./ltp/fsx -d --replay-ops /tmp/8995.fsxops /mnt/scratch/testfile
  1 write 0x137dd thru    0x21445 (0xdc69 bytes)
  2 falloc        from 0xb531 to 0x16ade (0xb5ad bytes)
  3 collapse      from 0x1c000 to 0x20000, (0x4000 bytes)
  4 write 0x3e5ec thru    0x3ffff (0x1a14 bytes)
  5 zero  from 0x20fac to 0x27d48, (0x6d9c bytes)
  6 mapwrite      0x216ad thru    0x23dfb (0x274f bytes)
  All 7 operations completed A-OK!
  _check_generic_filesystem: filesystem on /dev/mapper/ssd-scratch is inconsistent
  *** fsck.ext4 output ***
  fsck from util-linux 2.27.1
  e2fsck 1.42.13 (17-May-2015)
  Pass 1: Checking inodes, blocks, and sizes
  Inode 12, end of extent exceeds allowed value
          (logical block 33, physical block 33441, len 7)
  Clear? no
  Inode 12, i_blocks is 184, should be 128.  Fix? no

Note that the inconsistency is "applied" by journal replay during mount.
fsck -nf before mount does not report any errors.
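
Roughly, the dm-flakey flow that the test automates looks like the sketch
below. This is a sketch only: the device/mount paths are placeholders and
the table lines follow the common/dmflakey conventions, not my exact setup:

  DEV=/dev/ssd/scratch
  MNT=/mnt/scratch
  SIZE=$(blockdev --getsz $DEV)

  mkfs.ext4 $DEV
  # pass-through flakey target (up forever, never down)
  dmsetup create flakey-scratch --table "0 $SIZE flakey $DEV 0 180 0"
  mount /dev/mapper/flakey-scratch $MNT

  ./ltp/fsx -d --replay-ops /tmp/8995.fsxops $MNT/testfile

  # emulate a crash: silently drop all writes, then unmount
  dmsetup suspend flakey-scratch
  dmsetup load flakey-scratch --table "0 $SIZE flakey $DEV 0 0 180 1 drop_writes"
  dmsetup resume flakey-scratch
  umount $MNT
  dmsetup remove flakey-scratch

  fsck.ext4 -nf $DEV   # clean at this point
  mount $DEV $MNT      # journal replay happens here
  umount $MNT
  fsck.ext4 -nf $DEV   # now reports the extent error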

I did not intend for this test to be merged as is, but rather to be used
by ext4 developers to analyze the problem and then re-write the test with
more comments and less arbitrary offset/length values.
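
For reference, my annotation of the replay-ops lines against the fsx log
above (columns are operation, offset, length, then a trailing size/flags
field whose exact semantics I am inferring here, so treat the comments as
a sketch):

  write          0x137dd 0xdc69 0x0       # [0x137dd..0x21445], extends EOF to 0x21446
  fallocate      0xb531  0xb5ad 0x21446   # falloc [0xb531..0x16ade), inside EOF
  collapse_range 0x1c000 0x4000 0x21446   # collapse [0x1c000..0x20000)
  write          0x3e5ec 0x1a14 0x21446   # [0x3e5ec..0x3ffff], extends EOF to 0x40000
  zero_range     0x20fac 0x6d9c 0x40000 keep_size   # zero [0x20fac..0x27d48)
  mapwrite       0x216ad 0x274f 0x40000   # mmap write [0x216ad..0x23dfb]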

P.S.: crash consistency tests also reliably reproduce a btrfs fsck error.
      a detailed report with I/O recording was sent to Josef.
P.S.2: crash consistency tests report file data checksum errors on xfs
       after fsync+crash, but I still need to prove the reliability of
       these reports.
 
[1] https://github.com/amir73il/xfstests/commits/dm-log-writes

 tests/generic/501     | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/501.out |  2 ++
 tests/generic/group   |  1 +
 3 files changed, 83 insertions(+)
 create mode 100755 tests/generic/501
 create mode 100644 tests/generic/501.out

Comments

Xiao Yang Sept. 25, 2017, 9:49 a.m. UTC | #1
On 2017/08/27 18:44, Amir Goldstein wrote:
> This test is motivated by a bug found in ext4 during random crash
> consistency tests.
>
> This test uses device mapper flakey target to demonstrate the bug
> found using device mapper log-writes target.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Ted,
>
> While working on crash consistency xfstests [1], I stumbled on what
> appeared to be an ext4 crash consistency bug.
>
> The tests I used rely on the log-writes dm target code written
> by Josef Bacik, which had little exposure to the wide community
> as far as I know.  I wanted to prove to myself that the found
> inconsistency was not due to a test bug, so I bisected the failed
> test to the minimal operations that trigger the failure and wrote
> a small independent test to reproduce the issue using dm flakey target.
>
> The following fsck error is reliably reproduced by replaying some fsx ops
> on overlapping file regions, then emulating a crash, followed by mount,
> umount and fsck -nf:
>
>   ./ltp/fsx -d --replay-ops /tmp/8995.fsxops /mnt/scratch/testfile
>   1 write 0x137dd thru    0x21445 (0xdc69 bytes)
>   2 falloc        from 0xb531 to 0x16ade (0xb5ad bytes)
>   3 collapse      from 0x1c000 to 0x20000, (0x4000 bytes)
>   4 write 0x3e5ec thru    0x3ffff (0x1a14 bytes)
>   5 zero  from 0x20fac to 0x27d48, (0x6d9c bytes)
>   6 mapwrite      0x216ad thru    0x23dfb (0x274f bytes)
>   All 7 operations completed A-OK!
>   _check_generic_filesystem: filesystem on /dev/mapper/ssd-scratch is inconsistent
>   *** fsck.ext4 output ***
>   fsck from util-linux 2.27.1
>   e2fsck 1.42.13 (17-May-2015)
>   Pass 1: Checking inodes, blocks, and sizes
>   Inode 12, end of extent exceeds allowed value
>           (logical block 33, physical block 33441, len 7)
>   Clear? no
>   Inode 12, i_blocks is 184, should be 128.  Fix? no
Hi Amir,

I always get the following output when running your xfstests test case 501.
---------------------------------------------------------------------------
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Inode 12, i_size is 147456, should be 163840. Fix? no
---------------------------------------------------------------------------

Could you tell me how to get the expected output as you reported?

Thanks,
Xiao Yang
> Note that the inconsistency is "applied" by journal replay during mount.
> fsck -nf before mount does not report any errors.
>
> I did not intend for this test to be merged as is, but rather to be used
> by ext4 developers to analyze the problem and then re-write the test with
> more comments and less arbitrary offset/length values.
>
> P.S.: crash consistency tests also reliably reproduce a btrfs fsck error.
>       a detailed report with I/O recording was sent to Josef.
> P.S.2: crash consistency tests report file data checksum errors on xfs
>        after fsync+crash, but I still need to prove the reliability of
>        these reports.
>  
> [1] https://github.com/amir73il/xfstests/commits/dm-log-writes
>
>  tests/generic/501     | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/501.out |  2 ++
>  tests/generic/group   |  1 +
>  3 files changed, 83 insertions(+)
>  create mode 100755 tests/generic/501
>  create mode 100644 tests/generic/501.out
>
> diff --git a/tests/generic/501 b/tests/generic/501
> new file mode 100755
> index 0000000..ccb513d
> --- /dev/null
> +++ b/tests/generic/501
> @@ -0,0 +1,80 @@
> +#! /bin/bash
> +# FS QA Test No. 501
> +#
> +# This test is motivated by a bug found in ext4 during random crash
> +# consistency tests.
> +#
> +#-----------------------------------------------------------------------
> +# Copyright (C) 2017 CTERA Networks. All Rights Reserved.
> +# Author: Amir Goldstein <amir73il@gmail.com>
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#-----------------------------------------------------------------------
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +
> +_cleanup()
> +{
> +	_cleanup_flakey
> +	cd /
> +	rm -f $tmp.*
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_dm_target flakey
> +_require_metadata_journaling $SCRATCH_DEV
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >> $seqres.full 2>&1
> +
> +_init_flakey
> +_mount_flakey
> +
> +fsxops=$tmp.fsxops
> +cat <<EOF > $fsxops
> +write 0x137dd 0xdc69 0x0
> +fallocate 0xb531 0xb5ad 0x21446
> +collapse_range 0x1c000 0x4000 0x21446
> +write 0x3e5ec 0x1a14 0x21446
> +zero_range 0x20fac 0x6d9c 0x40000 keep_size
> +mapwrite 0x216ad 0x274f 0x40000
> +EOF
> +run_check $here/ltp/fsx -d --replay-ops $fsxops $SCRATCH_MNT/testfile
> +
> +_flakey_drop_and_remount
> +_unmount_flakey
> +_cleanup_flakey
> +_check_scratch_fs
> +
> +echo "Silence is golden"
> +
> +status=0
> +exit
> diff --git a/tests/generic/501.out b/tests/generic/501.out
> new file mode 100644
> index 0000000..00133b6
> --- /dev/null
> +++ b/tests/generic/501.out
> @@ -0,0 +1,2 @@
> +QA output created by 501
> +Silence is golden
> diff --git a/tests/generic/group b/tests/generic/group
> index 2396b72..bb870f2 100644
> --- a/tests/generic/group
> +++ b/tests/generic/group
> @@ -454,3 +454,4 @@
>  449 auto quick acl enospc
>  450 auto quick rw
>  500 auto log replay
> +501 auto quick metadata
Amir Goldstein Sept. 25, 2017, 10:53 a.m. UTC | #2
On Mon, Sep 25, 2017 at 12:49 PM, Xiao Yang <yangx.jy@cn.fujitsu.com> wrote:
> On 2017/08/27 18:44, Amir Goldstein wrote:
>> This test is motivated by a bug found in ext4 during random crash
>> consistency tests.
>>
>> This test uses device mapper flakey target to demonstrate the bug
>> found using device mapper log-writes target.
>>
>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> ---
>>
>> Ted,
>>
>> While working on crash consistency xfstests [1], I stumbled on what
>> appeared to be an ext4 crash consistency bug.
>>
>> The tests I used rely on the log-writes dm target code written
>> by Josef Bacik, which had little exposure to the wide community
>> as far as I know.  I wanted to prove to myself that the found
>> inconsistency was not due to a test bug, so I bisected the failed
>> test to the minimal operations that trigger the failure and wrote
>> a small independent test to reproduce the issue using dm flakey target.
>>
>> The following fsck error is reliably reproduced by replaying some fsx ops
>> on overlapping file regions, then emulating a crash, followed by mount,
>> umount and fsck -nf:
>>
>>   ./ltp/fsx -d --replay-ops /tmp/8995.fsxops /mnt/scratch/testfile
>>   1 write 0x137dd thru    0x21445 (0xdc69 bytes)
>>   2 falloc        from 0xb531 to 0x16ade (0xb5ad bytes)
>>   3 collapse      from 0x1c000 to 0x20000, (0x4000 bytes)
>>   4 write 0x3e5ec thru    0x3ffff (0x1a14 bytes)
>>   5 zero  from 0x20fac to 0x27d48, (0x6d9c bytes)
>>   6 mapwrite      0x216ad thru    0x23dfb (0x274f bytes)
>>   All 7 operations completed A-OK!
>>   _check_generic_filesystem: filesystem on /dev/mapper/ssd-scratch is inconsistent
>>   *** fsck.ext4 output ***
>>   fsck from util-linux 2.27.1
>>   e2fsck 1.42.13 (17-May-2015)
>>   Pass 1: Checking inodes, blocks, and sizes
>>   Inode 12, end of extent exceeds allowed value
>>           (logical block 33, physical block 33441, len 7)
>>   Clear? no
>>   Inode 12, i_blocks is 184, should be 128.  Fix? no
> Hi Amir,
>
> I always get the following output when running your xfstests test case 501.

Now merged as test generic/456

> ---------------------------------------------------------------------------
> e2fsck 1.42.9 (28-Dec-2013)
> Pass 1: Checking inodes, blocks, and sizes
> Inode 12, i_size is 147456, should be 163840. Fix? no
> ---------------------------------------------------------------------------
>
> Could you tell me how to get the expected output as you reported?

I can't say I am doing anything special, but I can say that I get the
same output as you did when running the test inside kvm-xfstests.
Actually, I could not reproduce ANY of the crash consistency bugs
inside kvm-xfstests. Must be something to do with different timing of
IO with KVM+virtio disks??
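
If anyone wants to chase the timing theory, one knob that may be worth
varying is the host-side cache mode of the guest's scratch disk, e.g.
(a speculative sketch, and the image paths are made up):

  qemu-system-x86_64 -enable-kvm -m 2048 \
      -drive file=root.img,if=virtio \
      -drive file=scratch.img,if=virtio,cache=none   # vs cache=writeback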

When running on my laptop (Ubuntu 16.04 with latest kernel)
on a 10G SSD volume, I always get the error reported above.
I just re-verified with latest stable e2fsprogs (1.43.6).

Amir.
Xiao Yang Sept. 26, 2017, 10:45 a.m. UTC | #3
On 2017/09/25 18:53, Amir Goldstein wrote:
> On Mon, Sep 25, 2017 at 12:49 PM, Xiao Yang<yangx.jy@cn.fujitsu.com>  wrote:
>> On 2017/08/27 18:44, Amir Goldstein wrote:
>>> This test is motivated by a bug found in ext4 during random crash
>>> consistency tests.
>>>
>>> This test uses device mapper flakey target to demonstrate the bug
>>> found using device mapper log-writes target.
>>>
>>> Signed-off-by: Amir Goldstein<amir73il@gmail.com>
>>> ---
>>>
>>> Ted,
>>>
>>> While working on crash consistency xfstests [1], I stumbled on what
>>> appeared to be an ext4 crash consistency bug.
>>>
>>> The tests I used rely on the log-writes dm target code written
>>> by Josef Bacik, which had little exposure to the wide community
>>> as far as I know.  I wanted to prove to myself that the found
>>> inconsistency was not due to a test bug, so I bisected the failed
>>> test to the minimal operations that trigger the failure and wrote
>>> a small independent test to reproduce the issue using dm flakey target.
>>>
>>> The following fsck error is reliably reproduced by replaying some fsx ops
>>> on overlapping file regions, then emulating a crash, followed by mount,
>>> umount and fsck -nf:
>>>
>>>    ./ltp/fsx -d --replay-ops /tmp/8995.fsxops /mnt/scratch/testfile
>>>    1 write 0x137dd thru    0x21445 (0xdc69 bytes)
>>>    2 falloc        from 0xb531 to 0x16ade (0xb5ad bytes)
>>>    3 collapse      from 0x1c000 to 0x20000, (0x4000 bytes)
>>>    4 write 0x3e5ec thru    0x3ffff (0x1a14 bytes)
>>>    5 zero  from 0x20fac to 0x27d48, (0x6d9c bytes)
>>>    6 mapwrite      0x216ad thru    0x23dfb (0x274f bytes)
>>>    All 7 operations completed A-OK!
>>>    _check_generic_filesystem: filesystem on /dev/mapper/ssd-scratch is inconsistent
>>>    *** fsck.ext4 output ***
>>>    fsck from util-linux 2.27.1
>>>    e2fsck 1.42.13 (17-May-2015)
>>>    Pass 1: Checking inodes, blocks, and sizes
>>>    Inode 12, end of extent exceeds allowed value
>>>            (logical block 33, physical block 33441, len 7)
>>>    Clear? no
>>>    Inode 12, i_blocks is 184, should be 128.  Fix? no
>> Hi Amir,
>>
>> I always get the following output when running your xfstests test case 501.
> Now merged as test generic/456
>
>> ---------------------------------------------------------------------------
>> e2fsck 1.42.9 (28-Dec-2013)
>> Pass 1: Checking inodes, blocks, and sizes
>> Inode 12, i_size is 147456, should be 163840. Fix? no
>> ---------------------------------------------------------------------------
>>
>> Could you tell me how to get the expected output as you reported?
> I can't say I am doing anything special, but I can say that I get the
> same output as you did when running the test inside kvm-xfstests.
> Actually, I could not reproduce ANY of the the crash consistency bugs
> inside kvm-xfstests. Must be something to do with different timing of
> IO with KVM+virtio disks??
>
> When running on my laptop (Ubuntu 16.04 with latest kernel)
> on a 10G SSD volume, I always get the error reported above.
> I just re-verified with latest stable e2fsprogs (1.43.6).
Hi Amir,

I tested generic/456 with KVM+virtio disks and SATA volumes on some kernels
(including v3.10.0 and the latest kernel), but I still got the same output
as I reported.

Could you determine whether the two different outputs are caused by the
same bug or not?

Thanks,
Xiao Yang.
> Amir.
Amir Goldstein Sept. 26, 2017, 11:48 a.m. UTC | #4
On Tue, Sep 26, 2017 at 1:45 PM, Xiao Yang <yangx.jy@cn.fujitsu.com> wrote:
> On 2017/09/25 18:53, Amir Goldstein wrote:
>>
>> On Mon, Sep 25, 2017 at 12:49 PM, Xiao Yang<yangx.jy@cn.fujitsu.com>
>> wrote:
>>>
>>> On 2017/08/27 18:44, Amir Goldstein wrote:
>>>>
>>>> This test is motivated by a bug found in ext4 during random crash
>>>> consistency tests.
>>>>
>>>> This test uses device mapper flakey target to demonstrate the bug
>>>> found using device mapper log-writes target.
>>>>
>>>> Signed-off-by: Amir Goldstein<amir73il@gmail.com>
>>>> ---
>>>>
>>>> Ted,
>>>>
>>>> While working on crash consistency xfstests [1], I stumbled on what
>>>> appeared to be an ext4 crash consistency bug.
>>>>
>>>> The tests I used rely on the log-writes dm target code written
>>>> by Josef Bacik, which had little exposure to the wide community
>>>> as far as I know.  I wanted to prove to myself that the found
>>>> inconsistency was not due to a test bug, so I bisected the failed
>>>> test to the minimal operations that trigger the failure and wrote
>>>> a small independent test to reproduce the issue using dm flakey target.
>>>>
>>>> The following fsck error is reliably reproduced by replaying some fsx
>>>> ops
>>>> on overlapping file regions, then emulating a crash, followed by mount,
>>>> umount and fsck -nf:
>>>>
>>>>    ./ltp/fsx -d --replay-ops /tmp/8995.fsxops /mnt/scratch/testfile
>>>>    1 write 0x137dd thru    0x21445 (0xdc69 bytes)
>>>>    2 falloc        from 0xb531 to 0x16ade (0xb5ad bytes)
>>>>    3 collapse      from 0x1c000 to 0x20000, (0x4000 bytes)
>>>>    4 write 0x3e5ec thru    0x3ffff (0x1a14 bytes)
>>>>    5 zero  from 0x20fac to 0x27d48, (0x6d9c bytes)
>>>>    6 mapwrite      0x216ad thru    0x23dfb (0x274f bytes)
>>>>    All 7 operations completed A-OK!
>>>>    _check_generic_filesystem: filesystem on /dev/mapper/ssd-scratch is
>>>> inconsistent
>>>>    *** fsck.ext4 output ***
>>>>    fsck from util-linux 2.27.1
>>>>    e2fsck 1.42.13 (17-May-2015)
>>>>    Pass 1: Checking inodes, blocks, and sizes
>>>>    Inode 12, end of extent exceeds allowed value
>>>>            (logical block 33, physical block 33441, len 7)
>>>>    Clear? no
>>>>    Inode 12, i_blocks is 184, should be 128.  Fix? no
>>>
>>> Hi Amir,
>>>
>>> I always get the following output when running your xfstests test case
>>> 501.
>>
>> Now merged as test generic/456
>>
>>>
>>> ---------------------------------------------------------------------------
>>> e2fsck 1.42.9 (28-Dec-2013)
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Inode 12, i_size is 147456, should be 163840. Fix? no
>>>
>>> ---------------------------------------------------------------------------
>>>
>>> Could you tell me how to get the expected output as you reported?
>>
>> I can't say I am doing anything special, but I can say that I get the
>> same output as you did when running the test inside kvm-xfstests.
>> Actually, I could not reproduce ANY of the crash consistency bugs
>> inside kvm-xfstests. Must be something to do with different timing of
>> IO with KVM+virtio disks??
>>
>> When running on my laptop (Ubuntu 16.04 with latest kernel)
>> on a 10G SSD volume, I always get the error reported above.
>> I just re-verified with latest stable e2fsprogs (1.43.6).
>
> Hi Amir,
>
> I tested generic/456 with KVM+virtio disks and SATA volumes on some kernels

I don't understand. Did you also test without KVM?
Otherwise I suggest that you test without KVM/virtio.
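
That is, run xfstests directly on the host. A minimal local.config sketch,
assuming a bare-metal LVM scratch volume (the device names are placeholders):

  TEST_DEV=/dev/ssd/test
  TEST_DIR=/mnt/test
  SCRATCH_DEV=/dev/ssd/scratch
  SCRATCH_MNT=/mnt/scratch

and then run the single test from the xfstests tree with ./check generic/456.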

> (including v3.10.0 and the latest kernel), but I still got the same output
> as I reported.
>
> Could you determine whether the two different outputs are caused by the
> same bug or not?

No idea if those are two symptoms of the same bug or two different bugs;
I did not investigate the root cause.

Amir.
Ashlie Martinez Sept. 30, 2017, 2:15 p.m. UTC | #5
Hi Xiao,

I am a student at the University of Texas at Austin. Some researchers
in the computer science department at UT, myself included, have
recently been working to develop a file system crash consistency test
harness called CrashMonkey [1][2]. I have been working on the
CrashMonkey project since it was started late last year. With
CrashMonkey we have also been able to reproduce the incorrect i_size
error you noted but we have not been able to reproduce the other
output that Amir found. CrashMonkey works by logging and replaying
operations for a workload, so it should not be sensitive to
differences in timing that could be caused by things like KVM+virtio.
I also did a few experiments with Amir's new xfstests test 456 (both
with and without KVM and virtio) and I was unable to reproduce the
output noted in the xfstest. I have not spent a lot of time looking
into the cause of the bug that Amir found and it is rather unfortunate
that I was unable to reproduce it with either xfstests or CrashMonkey.

At any rate, CrashMonkey is still under development, so it does have
some caveats. First, we are running with a fixed random seed in our
default RandomPermuter (used to generate crash states) to aid
development. Second, the branch with the reproduction of this ext4
regression bug in CrashMonkey [3] will yield a few false positives due
to the way CrashMonkey works and how fsx runs. These false positives
are due to CrashMonkey generating crash states where the directories
for files used for the test have not been fsync-ed in the file system.
The top of the README in the CrashMonkey branch with this bug
reproduction outlines how we determined these were false positives

[1] https://github.com/utsaslab/crashmonkey
[2] https://www.usenix.org/conference/hotstorage17/program/presentation/martinez
[3] https://github.com/utsaslab/crashmonkey/tree/ext4_regression_bug
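
For anyone who wants to look at our reproduction, the branch in [3] can be
checked out directly (repository and branch name as in the links above):

  git clone https://github.com/utsaslab/crashmonkey.git
  cd crashmonkey
  git checkout ext4_regression_bug
  # the README at the top of this branch describes how we classified the
  # false positives mentioned above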


On Mon, Sep 25, 2017 at 5:53 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Sep 25, 2017 at 12:49 PM, Xiao Yang <yangx.jy@cn.fujitsu.com> wrote:
>> On 2017/08/27 18:44, Amir Goldstein wrote:
>>> This test is motivated by a bug found in ext4 during random crash
>>> consistency tests.
>>>
>>> This test uses device mapper flakey target to demonstrate the bug
>>> found using device mapper log-writes target.
>>>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>>
>>> Ted,
>>>
>>> While working on crash consistency xfstests [1], I stumbled on what
>>> appeared to be an ext4 crash consistency bug.
>>>
>>> The tests I used rely on the log-writes dm target code written
>>> by Josef Bacik, which had little exposure to the wide community
>>> as far as I know.  I wanted to prove to myself that the found
>>> inconsistency was not due to a test bug, so I bisected the failed
>>> test to the minimal operations that trigger the failure and wrote
>>> a small independent test to reproduce the issue using dm flakey target.
>>>
>>> The following fsck error is reliably reproduced by replaying some fsx ops
>>> on overlapping file regions, then emulating a crash, followed by mount,
>>> umount and fsck -nf:
>>>
>>>   ./ltp/fsx -d --replay-ops /tmp/8995.fsxops /mnt/scratch/testfile
>>>   1 write 0x137dd thru    0x21445 (0xdc69 bytes)
>>>   2 falloc        from 0xb531 to 0x16ade (0xb5ad bytes)
>>>   3 collapse      from 0x1c000 to 0x20000, (0x4000 bytes)
>>>   4 write 0x3e5ec thru    0x3ffff (0x1a14 bytes)
>>>   5 zero  from 0x20fac to 0x27d48, (0x6d9c bytes)
>>>   6 mapwrite      0x216ad thru    0x23dfb (0x274f bytes)
>>>   All 7 operations completed A-OK!
>>>   _check_generic_filesystem: filesystem on /dev/mapper/ssd-scratch is inconsistent
>>>   *** fsck.ext4 output ***
>>>   fsck from util-linux 2.27.1
>>>   e2fsck 1.42.13 (17-May-2015)
>>>   Pass 1: Checking inodes, blocks, and sizes
>>>   Inode 12, end of extent exceeds allowed value
>>>           (logical block 33, physical block 33441, len 7)
>>>   Clear? no
>>>   Inode 12, i_blocks is 184, should be 128.  Fix? no
>> Hi Amir,
>>
>> I always get the following output when running your xfstests test case 501.
>
> Now merged as test generic/456
>
>> ---------------------------------------------------------------------------
>> e2fsck 1.42.9 (28-Dec-2013)
>> Pass 1: Checking inodes, blocks, and sizes
>> Inode 12, i_size is 147456, should be 163840. Fix? no
>> ---------------------------------------------------------------------------
>>
>> Could you tell me how to get the expected output as you reported?
>
> I can't say I am doing anything special, but I can say that I get the
> same output as you did when running the test inside kvm-xfstests.
> Actually, I could not reproduce ANY of the crash consistency bugs
> inside kvm-xfstests. Must be something to do with different timing of
> IO with KVM+virtio disks??
>
> When running on my laptop (Ubuntu 16.04 with latest kernel)
> on a 10G SSD volume, I always get the error reported above.
> I just re-verified with latest stable e2fsprogs (1.43.6).
>
> Amir.
Xiao Yang Oct. 5, 2017, 7:27 a.m. UTC | #6
On 2017/09/30 22:15, Ashlie Martinez wrote:
> Hi Xiao,
>
> I am a student at the University of Texas at Austin. Some researchers
> in the computer science department at UT, myself included, have
> recently been working to develop a file system crash consistency test
> harness called CrashMonkey [1][2]. I have been working on the
> CrashMonkey project since it was started late last year. With
> CrashMonkey we have also been able to reproduce the incorrect i_size
> error you noted but we have not been able to reproduce the other
> output that Amir found. CrashMonkey works by logging and replaying
> operations for a workload, so it should not be sensitive to
> differences in timing that could be caused by things like KVM+virtio.
> I also did a few experiments with Amir's new xfstests test 456 (both
> with and without KVM and virtio) and I was unable to reproduce the
> output noted in the xfstest. I have not spent a lot of time looking
> into the cause of the bug that Amir found and it is rather unfortunate
> that I was unable to reproduce it with either xfstests or CrashMonkey.
Hi Ashlie,

Thanks for your detailed comments.

1) Do you think the output that Amir noted in xfstests is a false positive?

2) About the output that both you and I reproduced, did you look into
it and find its root cause?

Thanks,
Xiao Yang
> At any rate, CrashMonkey is still under development, so it does have
> some caveats. First, we are running with a fixed random seed in our
> default RandomPermuter (used to generate crash states) to aid
> development. Second, the branch with the reproduction of this ext4
> regression bug in CrashMonkey [3] will yield a few false positives due
> to the way CrashMonkey works and how fsx runs. These false positives
> are due to CrashMonkey generating crash states where the directories
> for files used for the test have not been fsync-ed in the file system.
> The top of the README in the CrashMonkey branch with this bug
> reproduction outlines how we determined these were false positives.
>
> [1] https://github.com/utsaslab/crashmonkey
> [2] https://www.usenix.org/conference/hotstorage17/program/presentation/martinez
> [3] https://github.com/utsaslab/crashmonkey/tree/ext4_regression_bug
>
>
> On Mon, Sep 25, 2017 at 5:53 AM, Amir Goldstein<amir73il@gmail.com>  wrote:
>> On Mon, Sep 25, 2017 at 12:49 PM, Xiao Yang<yangx.jy@cn.fujitsu.com>  wrote:
>>> On 2017/08/27 18:44, Amir Goldstein wrote:
>>>> This test is motivated by a bug found in ext4 during random crash
>>>> consistency tests.
>>>>
>>>> This test uses device mapper flakey target to demonstrate the bug
>>>> found using device mapper log-writes target.
>>>>
>>>> Signed-off-by: Amir Goldstein<amir73il@gmail.com>
>>>> ---
>>>>
>>>> Ted,
>>>>
>>>> While working on crash consistency xfstests [1], I stumbled on what
>>>> appeared to be an ext4 crash consistency bug.
>>>>
>>>> The tests I used rely on the log-writes dm target code written
>>>> by Josef Bacik, which had little exposure to the wide community
>>>> as far as I know.  I wanted to prove to myself that the found
>>>> inconsistency was not due to a test bug, so I bisected the failed
>>>> test to the minimal operations that trigger the failure and wrote
>>>> a small independent test to reproduce the issue using dm flakey target.
>>>>
>>>> The following fsck error is reliably reproduced by replaying some fsx ops
>>>> on overlapping file regions, then emulating a crash, followed by mount,
>>>> umount and fsck -nf:
>>>>
>>>>    ./ltp/fsx -d --replay-ops /tmp/8995.fsxops /mnt/scratch/testfile
>>>>    1 write 0x137dd thru    0x21445 (0xdc69 bytes)
>>>>    2 falloc        from 0xb531 to 0x16ade (0xb5ad bytes)
>>>>    3 collapse      from 0x1c000 to 0x20000, (0x4000 bytes)
>>>>    4 write 0x3e5ec thru    0x3ffff (0x1a14 bytes)
>>>>    5 zero  from 0x20fac to 0x27d48, (0x6d9c bytes)
>>>>    6 mapwrite      0x216ad thru    0x23dfb (0x274f bytes)
>>>>    All 7 operations completed A-OK!
>>>>    _check_generic_filesystem: filesystem on /dev/mapper/ssd-scratch is inconsistent
>>>>    *** fsck.ext4 output ***
>>>>    fsck from util-linux 2.27.1
>>>>    e2fsck 1.42.13 (17-May-2015)
>>>>    Pass 1: Checking inodes, blocks, and sizes
>>>>    Inode 12, end of extent exceeds allowed value
>>>>            (logical block 33, physical block 33441, len 7)
>>>>    Clear? no
>>>>    Inode 12, i_blocks is 184, should be 128.  Fix? no
>>> Hi Amir,
>>>
>>> I always get the following output when running your xfstests test case 501.
>> Now merged as test generic/456
>>
>>> ---------------------------------------------------------------------------
>>> e2fsck 1.42.9 (28-Dec-2013)
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Inode 12, i_size is 147456, should be 163840. Fix? no
>>> ---------------------------------------------------------------------------
>>>
>>> Could you tell me how to get the expected output as you reported?
>> I can't say I am doing anything special, but I can say that I get the
>> same output as you did when running the test inside kvm-xfstests.
>> Actually, I could not reproduce ANY of the crash consistency bugs
>> inside kvm-xfstests. Must be something to do with different timing of
>> IO with KVM+virtio disks??
>>
>> When running on my laptop (Ubuntu 16.04 with latest kernel)
>> on a 10G SSD volume, I always get the error reported above.
>> I just re-verified with latest stable e2fsprogs (1.43.6).
>>
>> Amir.
Ashlie Martinez Oct. 5, 2017, 3:04 p.m. UTC | #7
On Thu, Oct 5, 2017 at 2:27 AM, Xiao Yang <yangx.jy@cn.fujitsu.com> wrote:
> On 2017/09/30 22:15, Ashlie Martinez wrote:
>>
>> Hi Xiao,
>>
>> I am a student at the University of Texas at Austin. Some researchers
>> in the computer science department at UT, myself included, have
>> recently been working to develop a file system crash consistency test
>> harness called CrashMonkey [1][2]. I have been working on the
>> CrashMonkey project since it was started late last year. With
>> CrashMonkey we have also been able to reproduce the incorrect i_size
>> error you noted but we have not been able to reproduce the other
>> output that Amir found. CrashMonkey works by logging and replaying
>> operations for a workload, so it should not be sensitive to
>> differences in timing that could be caused by things like KVM+virtio.
>> I also did a few experiments with Amir's new xfstests test 456 (both
>> with and without KVM and virtio) and I was unable to reproduce the
>> output noted in the xfstest. I have not spent a lot of time looking
>> into the cause of the bug that Amir found and it is rather unfortunate
>> that I was unable to reproduce it with either xfstests or CrashMonkey.
>
> Hi Ashlie,
>
> Thanks for your detailed comments.
>
> 1) Do you think the output that Amir noted in xfstests is a false positive?

It almost seems that way, though to be honest, I don't think I know
enough about 1. the setup used by Amir and 2. all the internal workings
of KVM+virtio to say for sure. One thing I have identified as being a
potential source of false positives is code in the kernel to remap the
block number sent to device drivers to something relative to the start
of a block device, not the start of a partition. I'm thinking this
could cause trouble if 1. a partition higher than the first partition
is monitored by dm-log-writes, 2. the block numbers are recorded
verbatim in dm-log-writes, and 3. the dm-log-writes log is replayed on
a different device with different partitions (or possibly the same
device, but a different partition). I know some other undergrad
students at UT on the CrashMonkey team are looking into this right
now, but I have not had time to test this myself. The offending code
in the kernel is in the completely misnamed
`generic_make_request_checks` function which has given the CrashMonkey
team trouble in the past. Links to that function and the remapping
function it calls are below.

http://elixir.free-electrons.com/linux/latest/source/block/blk-core.c#L2041
http://elixir.free-electrons.com/linux/latest/source/block/blk-core.c#L1902
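
A quick way to check whether this remapping could matter on a given setup is
to compare the sector numbers recorded in the log against the start sector
of the monitored partition, e.g. (the sysfs path assumes an sdX-style disk;
adjust for your device):

  cat /sys/class/block/sda2/start   # partition start, in 512-byte sectors
  blockdev --report /dev/sda2       # also shows StartSec, among other fields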

>
> 2) About the output that both you and I reproduced, did you look into it
> and find its root cause?

I have not looked to find the root cause of the issue that we both
reliably reproduce.

>
> Thanks,
> Xiao Yang
>>
>> At any rate, CrashMonkey is still under development, so it does have
>> some caveats. First, we are running with a fixed random seed in our
>> default RandomPermuter (used to generate crash states) to aid
>> development. Second, the branch with the reproduction of this ext4
>> regression bug in CrashMonkey [3] will yield a few false positives due
>> to the way CrashMonkey works and how fsx runs. These false positives
>> are due to CrashMonkey generating crash states where the directories
>> for files used for the test have not been fsync-ed in the file system.
>> The top of the README in the CrashMonkey branch with this bug
>> reproduction outlines how we determined these were false positives.
>>
>> [1] https://github.com/utsaslab/crashmonkey
>> [2]
>> https://www.usenix.org/conference/hotstorage17/program/presentation/martinez
>> [3] https://github.com/utsaslab/crashmonkey/tree/ext4_regression_bug
>>
>>
>> On Mon, Sep 25, 2017 at 5:53 AM, Amir Goldstein<amir73il@gmail.com>
>> wrote:
>>>
>>> On Mon, Sep 25, 2017 at 12:49 PM, Xiao Yang<yangx.jy@cn.fujitsu.com>
>>> wrote:
>>>>
>>>> On 2017/08/27 18:44, Amir Goldstein wrote:
>>>>>
>>>>> This test is motivated by a bug found in ext4 during random crash
>>>>> consistency tests.
>>>>>
>>>>> This test uses device mapper flakey target to demonstrate the bug
>>>>> found using device mapper log-writes target.
>>>>>
>>>>> Signed-off-by: Amir Goldstein<amir73il@gmail.com>
>>>>> ---
>>>>>
>>>>> Ted,
>>>>>
>>>>> While working on crash consistency xfstests [1], I stumbled on what
>>>>> appeared to be an ext4 crash consistency bug.
>>>>>
>>>>> The tests I used rely on the log-writes dm target code written
>>>>> by Josef Bacik, which had little exposure to the wide community
>>>>> as far as I know.  I wanted to prove to myself that the found
>>>>> inconsistency was not due to a test bug, so I bisected the failed
>>>>> test to the minimal operations that trigger the failure and wrote
>>>>> a small independent test to reproduce the issue using dm flakey target.
>>>>>
>>>>> The following fsck error is reliably reproduced by replaying some fsx
>>>>> ops
>>>>> on overlapping file regions, then emulating a crash, followed by mount,
>>>>> umount and fsck -nf:
>>>>>
>>>>>    ./ltp/fsx -d --replay-ops /tmp/8995.fsxops /mnt/scratch/testfile
>>>>>    1 write 0x137dd thru    0x21445 (0xdc69 bytes)
>>>>>    2 falloc        from 0xb531 to 0x16ade (0xb5ad bytes)
>>>>>    3 collapse      from 0x1c000 to 0x20000, (0x4000 bytes)
>>>>>    4 write 0x3e5ec thru    0x3ffff (0x1a14 bytes)
>>>>>    5 zero  from 0x20fac to 0x27d48, (0x6d9c bytes)
>>>>>    6 mapwrite      0x216ad thru    0x23dfb (0x274f bytes)
>>>>>    All 7 operations completed A-OK!
>>>>>    _check_generic_filesystem: filesystem on /dev/mapper/ssd-scratch is
>>>>> inconsistent
>>>>>    *** fsck.ext4 output ***
>>>>>    fsck from util-linux 2.27.1
>>>>>    e2fsck 1.42.13 (17-May-2015)
>>>>>    Pass 1: Checking inodes, blocks, and sizes
>>>>>    Inode 12, end of extent exceeds allowed value
>>>>>            (logical block 33, physical block 33441, len 7)
>>>>>    Clear? no
>>>>>    Inode 12, i_blocks is 184, should be 128.  Fix? no
>>>>
>>>> Hi Amir,
>>>>
>>>> I always get the following output when running your xfstests test case
>>>> 501.
>>>
>>> Now merged as test generic/456
>>>
>>>>
>>>> ---------------------------------------------------------------------------
>>>> e2fsck 1.42.9 (28-Dec-2013)
>>>> Pass 1: Checking inodes, blocks, and sizes
>>>> Inode 12, i_size is 147456, should be 163840. Fix? no
>>>>
>>>> ---------------------------------------------------------------------------
>>>>
>>>> Could you tell me how to get the expected output as you reported?
>>>
>>> I can't say I am doing anything special, but I can say that I get the
>>> same output as you did when running the test inside kvm-xfstests.
>>> Actually, I could not reproduce ANY of the crash consistency bugs
>>> inside kvm-xfstests. Must be something to do with different timing of
>>> IO with KVM+virtio disks??
>>>
>>> When running on my laptop (Ubuntu 16.04 with latest kernel)
>>> on a 10G SSD volume, I always get the error reported above.
>>> I just re-verified with latest stable e2fsprogs (1.43.6).
>>>
>>> Amir.
Amir Goldstein Oct. 5, 2017, 7:10 p.m. UTC | #8
On Thu, Oct 5, 2017 at 6:04 PM, Ashlie Martinez <ashmrtn@utexas.edu> wrote:
> On Thu, Oct 5, 2017 at 2:27 AM, Xiao Yang <yangx.jy@cn.fujitsu.com> wrote:
>> On 2017/09/30 22:15, Ashlie Martinez wrote:
>>>
>>> Hi Xiao,
>>>
>>> I am a student at the University of Texas at Austin. Some researchers
>>> in the computer science department at UT, myself included, have
>>> recently been working to develop a file system crash consistency test
>>> harness called CrashMonkey [1][2]. I have been working on the
>>> CrashMonkey project since it was started late last year. With
>>> CrashMonkey we have also been able to reproduce the incorrect i_size
>>> error you noted but we have not been able to reproduce the other
>>> output that Amir found. CrashMonkey works by logging and replaying
>>> operations for a workload, so it should not be sensitive to
>>> differences in timing that could be caused by things like KVM+virtio.
>>> I also did a few experiments with Amir's new xfstests test 456 (both
>>> with and without KVM and virtio) and I was unable to reproduce the
>>> output noted in the xfstest. I have not spent a lot of time looking
>>> into the cause of the bug that Amir found and it is rather unfortunate
>>> that I was unable to reproduce it with either xfstests or CrashMonkey.
>>
>> Hi Ashlie,
>>
>> Thanks for your detailed comments.
>>
>> 1) Do you think the output that Amir noted in xfstests is a false positive?
>
> It almost seems that way, though to be honest, I don't think I know
> enough about 1. the setup used by Amir and 2. all the internal workings
> of KVM+virtio to say for sure.

I believe you misread my email.
The problem was NOT reproduced on KVM+virtio.
The problem *is* reproduced on a 10G LVM volume over SSD on
Ubuntu 16.04 with latest kernel and latest e2fsprogs.

There is no use in speculating why the problem doesn't happen on your
systems. If the issue reproduces on A system (two systems, actually, that I tested)
then it is a problem.

Attached is an e2image dump of the inconsistent file system following test
generic/456 on my system.  I would have attached the image sooner,
but since on my system the problem reproduces 100% of the time, I did not
think there was a need.
really see the point in further discussion.

It would be interesting to get more test samples from people running the test
on other systems. If only some people see the error
"end of extent exceeds allowed value"
it would be interesting to find the commonality between those setups.
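
For reference, the attached dump can be examined without restoring it to a
device; a sketch, assuming the attachment is a normal e2image file named
ext4-456.e2i (the file name here is made up):

  dumpe2fs -i ext4-456.e2i   # superblock and group summary from the image
  debugfs -i ext4-456.e2i    # then e.g. "stat <12>" to inspect the extent
                             # tree of the corrupted inode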

> One thing I have identified as being a
> potential source of false positives is code in the kernel to remap the
> block number sent to device drivers to something relative to the start
> of a block device, not the start of a partition. I'm thinking this
> could cause trouble if 1. a partition higher than the first partition
> is monitored by dm-log-writes, 2. the block numbers are recorded
> verbatim in dm-log-writes, and 3. the dm-log-writes log is replayed on
> a different device with different partitions (or possibly the same
> device, but a different partition).

You do realize that the test generic/456 is not using dm-log-writes at all.
I intentionally took it out of the equation and used the even dumber dm-flakey
target to demonstrate the crash inconsistency.

> I know some other undergrad
> students at UT on the CrashMonkey team are looking into this right
> now, but I have not had time to test this myself. The offending code
> in the kernel is in the completely misnamed
> `generic_make_request_checks` function which has given the CrashMonkey
> team trouble in the past. Links to that function and the remapping
> function it calls are below.
>
> http://elixir.free-electrons.com/linux/latest/source/block/blk-core.c#L2041
> http://elixir.free-electrons.com/linux/latest/source/block/blk-core.c#L1902
>
>>
>> 2) About the output that both you and I reproduced, did you look into it
>> and find its root cause?
>
> I have not looked to find the root cause of the issue that we both
> reliably reproduce.
>

Now you have a broken file system image and the exact set of operations
that led to it. That should be a pretty big lead for investigation.

Amir.
Ashlie Martinez Oct. 6, 2017, 12:34 a.m. UTC | #9
On Thu, Oct 5, 2017 at 2:10 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Oct 5, 2017 at 6:04 PM, Ashlie Martinez <ashmrtn@utexas.edu> wrote:
>> On Thu, Oct 5, 2017 at 2:27 AM, Xiao Yang <yangx.jy@cn.fujitsu.com> wrote:
>>> On 2017/09/30 22:15, Ashlie Martinez wrote:
>>>>
>>>> Hi Xiao,
>>>>
>>>> I am a student at the University of Texas at Austin. Some researchers
>>>> in the computer science department at UT, myself included, have
>>>> recently been working to develop a file system crash consistency test
>>>> harness called CrashMonkey [1][2]. I have been working on the
>>>> CrashMonkey project since it was started late last year. With
>>>> CrashMonkey we have also been able to reproduce the incorrect i_size
>>>> error you noted but we have not been able to reproduce the other
>>>> output that Amir found. CrashMonkey works by logging and replaying
>>>> operations for a workload, so it should not be sensitive to
>>>> differences in timing that could be caused by things like KVM+virtio.
>>>> I also did a few experiments with Amir's new xfstests test 456 (both
>>>> with and without KVM and virtio) and I was unable to reproduce the
>>>> output noted in the xfstest. I have not spent a lot of time looking
>>>> into the cause of the bug that Amir found and it is rather unfortunate
>>>> that I was unable to reproduce it with either xfstests or CrashMonkey.
>>>
>>> Hi Ashlie,
>>>
>>> Thanks for your detailed comments.
>>>
>>> 1) Do you think the output that Amir noted in xfstests is a false positive?
>>
>> It almost seems that way, though to be honest, I don't think I know
>> enough about 1. the setup used by Amir and 2. all the internal workings
>> of KVM+virtio to say for sure.
>
> I believe you misread my email.
> The problem was NOT reproduced on KVM+virtio.
> The problem *is* reproduced on a 10G LVM volume over SSD on
> Ubuntu 16.04 with latest kernel and latest e2fsprogs.

I have also tried running generic/456 on non-KVM, non-virtio machines
and I was unable to reproduce it. Without information on test setups
individuals are using, it is very hard to tell where I, or other
people, are going wrong in testing other than maybe using a virtual
machine.

As an aside, it does not appear to be simply a timing issue due to
KVM+virtio. I would hope that CrashMonkey would still be able to find
the extent error you saw, even on a KVM VM since it is not dependent
on timing.

>
> There is no use in speculating why the problem doesn't happen on your
> systems. If the issue reproduces on A system (two systems, actually, that I tested)
> then it is a problem.
>
> Attached is an e2image dump of the inconsistent file system following test
> generic/456 on my system.  I would have attached the image sooner,
> but since on my system the problem reproduces 100% of the time, I did not
> think there was a need.
> really see the point in further discussion.
>
> It would be interesting to get more test samples from people running the test
> on other systems. If only some people see the error
> "end of extent exceeds allowed value"
> it would be interesting to find the commonality between those setups.
>

I agree it would be interesting to see why this error appears only for
some people.


>> One thing I have identified as being a
>> potential source of false positives is code in the kernel to remap the
>> block number sent to device drivers to something relative to the start
>> of a block device, not the start of a partition. I'm thinking this
>> could cause trouble if 1. a partition higher than the first partition
>> is monitored by dm-log-writes, 2. the block numbers are recorded
>> verbatim in dm-log-writes, and 3. the dm-log-writes log is replayed on
>> a different device with different partitions (or possibly the same
>> device, but a different partition).
>
> You do realize that the test generic/456 is not using dm-log-writes at all.
> I intentionally took it out of the equation and used the even dumber dm-flakey
> target to demonstrate the crash inconsistency.

Oh, yes, I recall that now. The past few weeks have been very busy for
me with all of my other school work, so getting all of my i's dotted
and t's crossed has been a challenge. At any rate, I don't think it is
a real minus to mention the potential remapping of blocks in the cited
function. If people are going to use any sort of block device module
(be it dm-log-writes or otherwise), then I think it would be good for
developers to be aware of it.

>
>> I know some other undergrad
>> students at UT on the CrashMonkey team are looking into this right
>> now, but I have not had time to test this myself. The offending code
>> in the kernel is in the completely misnamed
>> `generic_make_request_checks` function which has given the CrashMonkey
>> team trouble in the past. Links to that function and the remapping
>> function it calls are below.
>>
>> http://elixir.free-electrons.com/linux/latest/source/block/blk-core.c#L2041
>> http://elixir.free-electrons.com/linux/latest/source/block/blk-core.c#L1902
>>
>>>
>>> 2) About the output that both you and I reproduced, did you look into it
>>> and find its root cause?
>>
>> I have not looked to find the root cause of the issue that we both
>> reliably reproduce.
>>
>
> Now you have a broken file system image and the exact set of operations
> that led to it. That should be a pretty big lead for investigation.
>
> Amir.

Patch

diff --git a/tests/generic/501 b/tests/generic/501
new file mode 100755
index 0000000..ccb513d
--- /dev/null
+++ b/tests/generic/501
@@ -0,0 +1,80 @@ 
+#! /bin/bash
+# FS QA Test No. 501
+#
+# This test is motivated by a bug found in ext4 during random crash
+# consistency tests.
+#
+#-----------------------------------------------------------------------
+# Copyright (C) 2017 CTERA Networks. All Rights Reserved.
+# Author: Amir Goldstein <amir73il@gmail.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_flakey
+	cd /
+	rm -f $tmp.*
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_dm_target flakey
+_require_metadata_journaling $SCRATCH_DEV
+
+rm -f $seqres.full
+
+_scratch_mkfs >> $seqres.full 2>&1
+
+_init_flakey
+_mount_flakey
+
+fsxops=$tmp.fsxops
+cat <<EOF > $fsxops
+write 0x137dd 0xdc69 0x0
+fallocate 0xb531 0xb5ad 0x21446
+collapse_range 0x1c000 0x4000 0x21446
+write 0x3e5ec 0x1a14 0x21446
+zero_range 0x20fac 0x6d9c 0x40000 keep_size
+mapwrite 0x216ad 0x274f 0x40000
+EOF
+run_check $here/ltp/fsx -d --replay-ops $fsxops $SCRATCH_MNT/testfile
+
+_flakey_drop_and_remount
+_unmount_flakey
+_cleanup_flakey
+_check_scratch_fs
+
+echo "Silence is golden"
+
+status=0
+exit
diff --git a/tests/generic/501.out b/tests/generic/501.out
new file mode 100644
index 0000000..00133b6
--- /dev/null
+++ b/tests/generic/501.out
@@ -0,0 +1,2 @@ 
+QA output created by 501
+Silence is golden
diff --git a/tests/generic/group b/tests/generic/group
index 2396b72..bb870f2 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -454,3 +454,4 @@ 
 449 auto quick acl enospc
 450 auto quick rw
 500 auto log replay
+501 auto quick metadata