Patchwork xfstests: add disk failure simulation test

login
register
mail settings
Submitter Dmitri Monakho
Date Feb. 13, 2013, 3:41 p.m.
Message ID <1360770097-6351-1-git-send-email-dmonakhov@openvz.org>
Download mbox | patch
Permalink /patch/220172/
State Not Applicable
Headers show

Comments

Dmitri Monakho - Feb. 13, 2013, 3:41 p.m.
There are many situations where disk may fail for example
1) brutal usb dongle unplug
2) iscsi (or any other netbdev) failure due to network issues
In this situation filesystem which use this blockdevice is
expected to fail(force RO remount, abort, etc) but whole system
should still be operational. In other words:
1) Kernel should not panic
2) Memory should not leak
3) Data integrity operations (sync,fsync,fdatasync, directio) should fail
   for affected filesystem
4) It should be possible to umount broken filesystem

Later when disk becomes available again we expect(only for journaled filesystems):
5) It will be possible to mount filesystem w/o explicit fsck (in order to caught
   issues like https://patchwork.kernel.org/patch/1983981/)
6) Filesystem should be operational
7) After mount/umount has being done all errors should be fixed so fsck should
   not spot any issues.

This test use fault enjection (CONFIG_FAIL_MAKE_REQUEST=y config option )
which force all new IO requests to fail for a given device. Xfs already has
XFS_IOC_GOINGDOWN ioctl which provides similar behaviour, but it is fsspeciffic
and it does it in an easy way because it perform freeze_bdev() before actual
shotdown.

Test run fsstress in background and then force disk failure.
Once disk failed it check that (1)-(4) is true.
Then makes disk available again and check that (5)-(7) is also true

BE CAREFUL!! test known to cause memory corruption for XFS
see: https://gist.github.com/dmonakhov/4945005

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 292           |  137 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 292.out       |    8 +++
 common.config |    1 +
 common.rc     |    6 +++
 group         |    1 +
 5 files changed, 153 insertions(+), 0 deletions(-)
 create mode 100755 292
 create mode 100644 292.out
Eric Sandeen - Feb. 13, 2013, 4:28 p.m.
On 2/13/13 9:41 AM, Dmitry Monakhov wrote:
> There are many situations where disk may fail for example
> 1) brutal usb dongle unplug
> 2) iscsi (or any other netbdev) failure due to network issues
> In this situation filesystem which use this blockdevice is
> expected to fail(force RO remount, abort, etc) but whole system
> should still be operational. In other words:
> 1) Kernel should not panic
> 2) Memory should not leak
> 3) Data integrity operations (sync,fsync,fdatasync, directio) should fail
>    for affected filesystem
> 4) It should be possible to umount broken filesystem
> 
> Later when disk becomes available again we expect(only for journaled filesystems):
> 5) It will be possible to mount filesystem w/o explicit fsck (in order to caught
>    issues like https://patchwork.kernel.org/patch/1983981/)
> 6) Filesystem should be operational
> 7) After mount/umount has being done all errors should be fixed so fsck should
>    not spot any issues.

Thanks, this should be very useful.  A couple questions & nitpicky
things below.

> This test use fault enjection (CONFIG_FAIL_MAKE_REQUEST=y config option )
> which force all new IO requests to fail for a given device. Xfs already has
> XFS_IOC_GOINGDOWN ioctl which provides similar behaviour, but it is fsspeciffic
> and it does it in an easy way because it perform freeze_bdev() before actual
> shotdown.

Well, just FWIW it depends on the flags it was given:

/*
 * Flags for going down operation
 */
#define XFS_FSOP_GOING_FLAGS_DEFAULT            0x0     /* going down */
#define XFS_FSOP_GOING_FLAGS_LOGFLUSH           0x1     /* flush log but not data */
#define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH         0x2     /* don't flush log nor data */

Only the default case does freeze_bdev.

> Test run fsstress in background and then force disk failure.
> Once disk failed it check that (1)-(4) is true.
> Then makes disk available again and check that (5)-(7) is also true
> 
> BE CAREFUL!! test known to cause memory corruption for XFS
> see: https://gist.github.com/dmonakhov/4945005

For other reviewers, this patch depends on the series:
[PATCH 0/8] xfstests: Stress tests improments v3

> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  292           |  137 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  292.out       |    8 +++
>  common.config |    1 +
>  common.rc     |    6 +++
>  group         |    1 +
>  5 files changed, 153 insertions(+), 0 deletions(-)
>  create mode 100755 292
>  create mode 100644 292.out
> 
> diff --git a/292 b/292
> new file mode 100755
> index 0000000..3dd6608
> --- /dev/null
> +++ b/292
> @@ -0,0 +1,136 @@
> +#! /bin/bash
> +# FSQA Test No. 292
> +#
> +# Run fsstress and simulate disk failure
> +# check filesystem consystency at the end.

"consistency," just to be picky.

> +#
> +#-----------------------------------------------------------------------
> +# Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.

(c) 2013 Dmitry Monakhov

> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#
> +#-----------------------------------------------------------------------
> +#
> +# creator
> +owner=dmonakhov@openvz.org
> +
> +seq=`basename $0`
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +
> +# TODO move it to common.blkdev if necessery

maybe a comment as to why you do this?  (presumably to find the right thing in /sys)
I hope this always works with all udev schemes etc?

> +SCRATCH_REAL_DEV=`readlink -f $SCRATCH_DEV`
> +SCRATCH_BDEV=`basename $SCRATCH_REAL_DEV`
> +
> +allow_fail_make_request()
> +{
> +    [ -f "$DEBUGFS_MNT/fail_make_request/probability" ] \
> +	|| _notrun "$DEBUGFS_MNT/fail_make_request \
> + not found. Seems that CONFIG_FAIL_MAKE_REQUEST kernel config option not enabled"
> +
> +    echo "Allow global fail_make_request feature"
> +    echo 100 > $DEBUGFS_MNT/fail_make_request/probability
> +    echo 9999999 > $DEBUGFS_MNT/fail_make_request/times
> +    echo 0 >  /sys/kernel/debug/fail_make_request/verbose
> +
> +}
> +
> +disallow_fail_make_request()
> +{
> +    echo "Disallow global fail_make_request feature"
> +    echo 0 > $DEBUGFS_MNT/fail_make_request/probability
> +    echo 0 > $DEBUGFS_MNT/fail_make_request/times
> +}
> +
> +poweroff_scratch_dev()

I guess I'd prefer just "fail_scratch_dev" since you aren't really
powering off anything.  :)

> +{
> +    echo "Force SCRATCH_DEV device failure"
> +    echo " echo 1 > /sys/block/$SCRATCH_BDEV/make-it-fail" >> $here/$seq.full
> +    echo 1 > /sys/block/$SCRATCH_BDEV/make-it-fail
> +
> +}
> +
> +poweron_scratch_dev()

Hm "unfail_scratch_dev?" :)

> +{
> +    echo "Make SCRATCH_DEV device operatable again"

"operable"

> +    echo " echo 0 > /sys/block/$SCRATCH_BDEV/make-it-fail" >> $here/$seq.full
> +    echo 0 > /sys/block/$SCRATCH_BDEV/make-it-fail
> +
> +}
> +
> +_cleanup()
> +{
> +    poweron_scratch_dev
> +    disallow_fail_make_request
> +}
> +trap "_cleanup; exit \$status" 1 2 3 15
> +
> +# Disable all sync operations to get higher load
> +FSSTRESS_AVOID="$FSSTRESS_AVOID -ffsync=0 -fsync=0 -ffdatasync=0 -f setattr=1"
> +
> +_workout()
> +{
> +	out=$SCRATCH_MNT/fsstress.$$
> +	args=`_scale_fsstress_args -p 1 -n999999999 -f setattr=0 $FSSTRESS_AVOID -d $out`
> +	echo ""
> +	echo "Run fsstress"
> +	echo ""
> +	echo "fsstress $args" >> $here/$seq.full
> +	$FSSTRESS_PROG $args > /dev/null 2>&1 &
> +	pid=$!
> +	# Let's it work for awhile
> +	sleep $((20 + 10 * $TIME_FACTOR))
> +	poweroff_scratch_dev
> +	# After device turns in to failed state filesystem may not know about that
> +	# so buffered write(2) may succeed, but any integrity operation such as
> +	# (sync, fsync, fdatasync, direct-io) should fail.
> +	dd if=/dev/zero of=$SCRATCH_MNT/touch_failed_filesystem count=1 bs=4k conv=fsync \
> +	    >> $here/$seq.full 2>&1 && \
> +	    _fail "failed: still able to perform integrity sync on $SCRATCH_MNT"
> +
> +	kill $pid
> +	wait $pid
> +
> +	# We expect that broken fs is still can be umounted
> +	run_check umount -f $SCRATCH_DEV

why -f, out of curiosity?

> +	poweron_scratch_dev
> +
> +	# In order to check that filesystem is able to recover journal on mount(2)
> +	# perform mount/umount, after that all errors should be fixed
> +	run_check _scratch_mount
> +	run_check _scratch_unmount
> +	_check_scratch_fs
> +}
> +
> +# real QA test starts here
> +_supported_fs ext3 ext4 xfs btrfs

Could this be expanded to "generic" but only enforce a post-log-recovery clean
fsck for filesystems we know have journaling?

> +_supported_os Linux
> +_need_to_be_root
> +_require_scratch
> +_require_debugfs

should you have a:

_require_fail_make_request

here, instead of doing it in allow_fail_make_request ?
Might just be a little more obvious, and possibly useful
for other tests that build on this one.

> +
> +_scratch_mkfs >> $here/$seq.full 2>&1 || _fail "mkfs failed"
> +_scratch_mount || _fail "mount failed"
> +allow_fail_make_request
> +_workout
> +status=$?
> +disallow_fail_make_request
> +exit
> diff --git a/292.out b/292.out
> new file mode 100644
> index 0000000..08d9820
> --- /dev/null
> +++ b/292.out
> @@ -0,0 +1,8 @@
> +QA output created by 292
> +Allow global fail_make_request feature
> +
> +Run fsstress
> +
> +Force SCRATCH_DEV device failure
> +Make SCRATCH_DEV device operatable again
> +Disallow global fail_make_request feature
> diff --git a/common.config b/common.config
> index a956a46..f7a2422 100644
> --- a/common.config
> +++ b/common.config
> @@ -75,6 +75,7 @@ export BENCH_PASSES=${BENCH_PASSES:=5}
>  export XFS_MKFS_OPTIONS=${XFS_MKFS_OPTIONS:=-bsize=4096}
>  export TIME_FACTOR=${TIME_FACTOR:=1}
>  export LOAD_FACTOR=${LOAD_FACTOR:=1}
> +export DEBUGFS_MNT=${DEBUGFS_MNT:="/sys/kernel/debug"}
>  
>  export PWD=`pwd`
>  #export DEBUG=${DEBUG:=...} # arbitrary CFLAGS really.
> diff --git a/common.rc b/common.rc
> index 5c3dda1..d5d9c9f 100644
> --- a/common.rc
> +++ b/common.rc
> @@ -1027,6 +1027,12 @@ _require_sparse_files()
>      esac
>  }
>  
> +_require_debugfs()
> +{
> +    #boot_params always present in debugfs
> +    [ -d "$DEBUGFS_MNT/boot_params" ] || _notrun "Debugfs not mounted"
> +}

Would it make more sense to look for debugfs in /proc/filesystems
as a test for it being *available* (as opposed to mounted somewhere?)

I wonder if a helper (maybe in _require_debugfs) should work out if
it's mounted, if not, try to mount it, and in the end, export DEBUGFS_MNT
for any test that wants to use it.

Otherwise if it happens to be mounted elsewhere, this'll all fail.
Just a thought.  Maybe that's unusual enough that there's no point.
But getting it mounted if it's not would be helpful I think.

> +
>  # check that a FS on a device is mounted
>  # if so, return mount point
>  #
> diff --git a/group b/group
> index b962a33..c409957 100644
> --- a/group
> +++ b/group
> @@ -415,3 +415,4 @@ stress
>  289 auto aio dangerous ioctl rw stress
>  290 auto aio dangerous ioctl rw stress
>  291 auto aio dangerous ioctl rw stress
> +292 dangerous rw stress
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner - Feb. 13, 2013, 9:32 p.m.
On Wed, Feb 13, 2013 at 07:41:37PM +0400, Dmitry Monakhov wrote:
> BE CAREFUL!! test known to cause memory corruption for XFS
> see: https://gist.github.com/dmonakhov/4945005

Commit 9f87832 (xfs: fix shutdown hang on invalid inode during
create) might actually fix this. IIRC, it went into 3.8-rc7 so that
wouldn't have been in the kernel you were testing. Can you retest
with a more recent kernel and if it is still busted, send a bug
report to the list?

Cheers,

Dave.
Dmitri Monakho - Feb. 14, 2013, 1:52 p.m.
On Wed, 13 Feb 2013 10:28:35 -0600, Eric Sandeen <sandeen@redhat.com> wrote:
> On 2/13/13 9:41 AM, Dmitry Monakhov wrote:
> > There are many situations where disk may fail for example
> > 1) brutal usb dongle unplug
> > 2) iscsi (or any other netbdev) failure due to network issues
> > In this situation filesystem which use this blockdevice is
> > expected to fail(force RO remount, abort, etc) but whole system
> > should still be operational. In other words:
> > 1) Kernel should not panic
> > 2) Memory should not leak
> > 3) Data integrity operations (sync,fsync,fdatasync, directio) should fail
> >    for affected filesystem
> > 4) It should be possible to umount broken filesystem
> > 
> > Later when disk becomes available again we expect(only for journaled filesystems):
> > 5) It will be possible to mount filesystem w/o explicit fsck (in order to caught
> >    issues like https://patchwork.kernel.org/patch/1983981/)
> > 6) Filesystem should be operational
> > 7) After mount/umount has being done all errors should be fixed so fsck should
> >    not spot any issues.
> 
> Thanks, this should be very useful.  A couple questions & nitpicky
> things below.
> 
> > This test use fault enjection (CONFIG_FAIL_MAKE_REQUEST=y config option )
> > which force all new IO requests to fail for a given device. Xfs already has
> > XFS_IOC_GOINGDOWN ioctl which provides similar behaviour, but it is fsspeciffic
> > and it does it in an easy way because it perform freeze_bdev() before actual
> > shotdown.
> 
> Well, just FWIW it depends on the flags it was given:
> 
> /*
>  * Flags for going down operation
>  */
> #define XFS_FSOP_GOING_FLAGS_DEFAULT            0x0     /* going down */
> #define XFS_FSOP_GOING_FLAGS_LOGFLUSH           0x1     /* flush log but not data */
> #define XFS_FSOP_GOING_FLAGS_NOLOGFLUSH         0x2     /* don't flush log nor data */
> 
> Only the default case does freeze_bdev.
Ohh that is correct (the only unimplemented ioctl left is XFS_MAKE_ME_COFFE),
xfstest even have  086'th, 121'th and 181'th testcases which check that
feature, but recent XFS still can not survive after disk failure:
https://gist.github.com/dmonakhov/4951456

fail injection framework is very simple and clean layer which just works.

> 
> > Test run fsstress in background and then force disk failure.
> > Once disk failed it check that (1)-(4) is true.
> > Then makes disk available again and check that (5)-(7) is also true
> > 
> > BE CAREFUL!! test known to cause memory corruption for XFS
> > see: https://gist.github.com/dmonakhov/4945005
> 
> For other reviewers, this patch depends on the series:
> [PATCH 0/8] xfstests: Stress tests improments v3
> 
> > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > ---
> >  292           |  137 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  292.out       |    8 +++
> >  common.config |    1 +
> >  common.rc     |    6 +++
> >  group         |    1 +
> >  5 files changed, 153 insertions(+), 0 deletions(-)
> >  create mode 100755 292
> >  create mode 100644 292.out
> > 
> > diff --git a/292 b/292
> > new file mode 100755
> > index 0000000..3dd6608
> > --- /dev/null
> > +++ b/292
> > @@ -0,0 +1,136 @@
> > +#! /bin/bash
> > +# FSQA Test No. 292
> > +#
> > +# Run fsstress and simulate disk failure
> > +# check filesystem consystency at the end.
> 
> "consistency," just to be picky.
fixed
> 
> > +#
> > +#-----------------------------------------------------------------------
> > +# Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
> 
> (c) 2013 Dmitry Monakhov
> 
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#
> > +#-----------------------------------------------------------------------
> > +#
> > +# creator
> > +owner=dmonakhov@openvz.org
> > +
> > +seq=`basename $0`
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1	# failure is the default!
> > +
> > +# get standard environment, filters and checks
> > +. ./common.rc
> > +. ./common.filter
> > +
> > +# TODO move it to common.blkdev if necessery
> 
> maybe a comment as to why you do this?  (presumably to find the right thing in /sys)
> I hope this always works with all udev schemes etc?
I just ment to say that functions below are good candidates to became
common wrappers.
> 
> > +SCRATCH_REAL_DEV=`readlink -f $SCRATCH_DEV`
> > +SCRATCH_BDEV=`basename $SCRATCH_REAL_DEV`
> > +
> > +allow_fail_make_request()
> > +{
> > +    [ -f "$DEBUGFS_MNT/fail_make_request/probability" ] \
> > +	|| _notrun "$DEBUGFS_MNT/fail_make_request \
> > + not found. Seems that CONFIG_FAIL_MAKE_REQUEST kernel config option not enabled"
> > +
> > +    echo "Allow global fail_make_request feature"
> > +    echo 100 > $DEBUGFS_MNT/fail_make_request/probability
> > +    echo 9999999 > $DEBUGFS_MNT/fail_make_request/times
> > +    echo 0 >  /sys/kernel/debug/fail_make_request/verbose
> > +
> > +}
> > +
> > +disallow_fail_make_request()
> > +{
> > +    echo "Disallow global fail_make_request feature"
> > +    echo 0 > $DEBUGFS_MNT/fail_make_request/probability
> > +    echo 0 > $DEBUGFS_MNT/fail_make_request/times
> > +}
> > +
> > +poweroff_scratch_dev()
> 
> I guess I'd prefer just "fail_scratch_dev" since you aren't really
> powering off anything.  :)
I've switched to start_fail_scratch_dev/stop_fail_scratch_dev
> 
> > +{
> > +    echo "Force SCRATCH_DEV device failure"
> > +    echo " echo 1 > /sys/block/$SCRATCH_BDEV/make-it-fail" >> $here/$seq.full
> > +    echo 1 > /sys/block/$SCRATCH_BDEV/make-it-fail
> > +
> > +}
> > +
> > +poweron_scratch_dev()
> 
> Hm "unfail_scratch_dev?" :)
> 
> > +{
> > +    echo "Make SCRATCH_DEV device operatable again"
> 
> "operable"
fixed
> 
> > +    echo " echo 0 > /sys/block/$SCRATCH_BDEV/make-it-fail" >> $here/$seq.full
> > +    echo 0 > /sys/block/$SCRATCH_BDEV/make-it-fail
> > +
> > +}
> > +
> > +_cleanup()
> > +{
> > +    poweron_scratch_dev
> > +    disallow_fail_make_request
> > +}
> > +trap "_cleanup; exit \$status" 1 2 3 15
> > +
> > +# Disable all sync operations to get higher load
> > +FSSTRESS_AVOID="$FSSTRESS_AVOID -ffsync=0 -fsync=0 -ffdatasync=0 -f setattr=1"
> > +
> > +_workout()
> > +{
> > +	out=$SCRATCH_MNT/fsstress.$$
> > +	args=`_scale_fsstress_args -p 1 -n999999999 -f setattr=0 $FSSTRESS_AVOID -d $out`
> > +	echo ""
> > +	echo "Run fsstress"
> > +	echo ""
> > +	echo "fsstress $args" >> $here/$seq.full
> > +	$FSSTRESS_PROG $args > /dev/null 2>&1 &
> > +	pid=$!
> > +	# Let's it work for awhile
> > +	sleep $((20 + 10 * $TIME_FACTOR))
> > +	poweroff_scratch_dev
> > +	# After device turns in to failed state filesystem may not know about that
> > +	# so buffered write(2) may succeed, but any integrity operation such as
> > +	# (sync, fsync, fdatasync, direct-io) should fail.
> > +	dd if=/dev/zero of=$SCRATCH_MNT/touch_failed_filesystem count=1 bs=4k conv=fsync \
> > +	    >> $here/$seq.full 2>&1 && \
> > +	    _fail "failed: still able to perform integrity sync on $SCRATCH_MNT"
> > +
> > +	kill $pid
> > +	wait $pid
> > +
> > +	# We expect that broken fs is still can be umounted
> > +	run_check umount -f $SCRATCH_DEV
> 
> why -f, out of curiosity?
Ohh yes, this should work w/o force option
> 
> > +	poweron_scratch_dev
> > +
> > +	# In order to check that filesystem is able to recover journal on mount(2)
> > +	# perform mount/umount, after that all errors should be fixed
> > +	run_check _scratch_mount
> > +	run_check _scratch_unmount
> > +	_check_scratch_fs
> > +}
> > +
> > +# real QA test starts here
> > +_supported_fs ext3 ext4 xfs btrfs
> 
> Could this be expanded to "generic" but only enforce a post-log-recovery clean
> fsck for filesystems we know have journaling?
It it reasonable to create two separate tests
1) for journaled fs: check (1)-(4) and (5)-(7)
2) for generic fs  : check (1)-(4) and does fsck -f -y  (check that fsck
is capable to fix errors)
> 
> > +_supported_os Linux
> > +_need_to_be_root
> > +_require_scratch
> > +_require_debugfs
> 
> should you have a:
> 
> _require_fail_make_request
done
> 
> here, instead of doing it in allow_fail_make_request ?
> Might just be a little more obvious, and possibly useful
> for other tests that build on this one.
> 
> > +
> > +_scratch_mkfs >> $here/$seq.full 2>&1 || _fail "mkfs failed"
> > +_scratch_mount || _fail "mount failed"
> > +allow_fail_make_request
> > +_workout
> > +status=$?
> > +disallow_fail_make_request
> > +exit
> > diff --git a/292.out b/292.out
> > new file mode 100644
> > index 0000000..08d9820
> > --- /dev/null
> > +++ b/292.out
> > @@ -0,0 +1,8 @@
> > +QA output created by 292
> > +Allow global fail_make_request feature
> > +
> > +Run fsstress
> > +
> > +Force SCRATCH_DEV device failure
> > +Make SCRATCH_DEV device operatable again
> > +Disallow global fail_make_request feature
> > diff --git a/common.config b/common.config
> > index a956a46..f7a2422 100644
> > --- a/common.config
> > +++ b/common.config
> > @@ -75,6 +75,7 @@ export BENCH_PASSES=${BENCH_PASSES:=5}
> >  export XFS_MKFS_OPTIONS=${XFS_MKFS_OPTIONS:=-bsize=4096}
> >  export TIME_FACTOR=${TIME_FACTOR:=1}
> >  export LOAD_FACTOR=${LOAD_FACTOR:=1}
> > +export DEBUGFS_MNT=${DEBUGFS_MNT:="/sys/kernel/debug"}
> >  
> >  export PWD=`pwd`
> >  #export DEBUG=${DEBUG:=...} # arbitrary CFLAGS really.
> > diff --git a/common.rc b/common.rc
> > index 5c3dda1..d5d9c9f 100644
> > --- a/common.rc
> > +++ b/common.rc
> > @@ -1027,6 +1027,12 @@ _require_sparse_files()
> >      esac
> >  }
> >  
> > +_require_debugfs()
> > +{
> > +    #boot_params always present in debugfs
> > +    [ -d "$DEBUGFS_MNT/boot_params" ] || _notrun "Debugfs not mounted"
> > +}
> 
> Would it make more sense to look for debugfs in /proc/filesystems
> as a test for it being *available* (as opposed to mounted somewhere?)
> 
> I wonder if a helper (maybe in _require_debugfs) should work out if
> it's mounted, if not, try to mount it, and in the end, export DEBUGFS_MNT
> for any test that wants to use it.
> 
> Otherwise if it happens to be mounted elsewhere, this'll all fail.
> Just a thought.  Maybe that's unusual enough that there's no point.
> But getting it mounted if it's not would be helpful I think.

> 
> > +
> >  # check that a FS on a device is mounted
> >  # if so, return mount point
> >  #
> > diff --git a/group b/group
> > index b962a33..c409957 100644
> > --- a/group
> > +++ b/group
> > @@ -415,3 +415,4 @@ stress
> >  289 auto aio dangerous ioctl rw stress
> >  290 auto aio dangerous ioctl rw stress
> >  291 auto aio dangerous ioctl rw stress
> > +292 dangerous rw stress
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen - Feb. 14, 2013, 3:15 p.m.
On 2/14/13 7:52 AM, Dmitry Monakhov wrote:
> On Wed, 13 Feb 2013 10:28:35 -0600, Eric Sandeen <sandeen@redhat.com> wrote:
>> On 2/13/13 9:41 AM, Dmitry Monakhov wrote:

<snip>

>>> +# get standard environment, filters and checks
>>> +. ./common.rc
>>> +. ./common.filter
>>> +
>>> +# TODO move it to common.blkdev if necessery
>>
>> maybe a comment as to why you do this?  (presumably to find the right thing in /sys)
>> I hope this always works with all udev schemes etc?
> I just ment to say that functions below are good candidates to became
> common wrappers.

Sure, but what is the reason for the wrapper?

On inspection I think its' because you need the right sysfs name; it'd
just be nice to say that it's the reason for the readlink/basename
frobbing of the existing $SCRATCH_DEV.  Not a huge deal.

>>> +SCRATCH_REAL_DEV=`readlink -f $SCRATCH_DEV`
>>> +SCRATCH_BDEV=`basename $SCRATCH_REAL_DEV`
>>> +

<snip>

>>> +_require_debugfs()
>>> +{
>>> +    #boot_params always present in debugfs
>>> +    [ -d "$DEBUGFS_MNT/boot_params" ] || _notrun "Debugfs not mounted"
>>> +}
>>
>> Would it make more sense to look for debugfs in /proc/filesystems
>> as a test for it being *available* (as opposed to mounted somewhere?)
>>
>> I wonder if a helper (maybe in _require_debugfs) should work out if
>> it's mounted, if not, try to mount it, and in the end, export DEBUGFS_MNT
>> for any test that wants to use it.
>>
>> Otherwise if it happens to be mounted elsewhere, this'll all fail.
>> Just a thought.  Maybe that's unusual enough that there's no point.
>> But getting it mounted if it's not would be helpful I think.

Any thoughts on this?  As it stands it requires debugfs to be
at /sys/kernel/debug (by default) *and* mounted prior to the test run.
So it's another (maybe unexpected) piece of pre-test setup which might
result in this test not getting run.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitri Monakho - Feb. 19, 2013, 11:14 a.m.
On Thu, 14 Feb 2013 09:15:46 -0600, Eric Sandeen <sandeen@redhat.com> wrote:
> On 2/14/13 7:52 AM, Dmitry Monakhov wrote:
> > On Wed, 13 Feb 2013 10:28:35 -0600, Eric Sandeen <sandeen@redhat.com> wrote:
> >> On 2/13/13 9:41 AM, Dmitry Monakhov wrote:
> 
> <snip>
> 
> >>> +# get standard environment, filters and checks
> >>> +. ./common.rc
> >>> +. ./common.filter
> >>> +
> >>> +# TODO move it to common.blkdev if necessery
> >>
> >> maybe a comment as to why you do this?  (presumably to find the right thing in /sys)
> >> I hope this always works with all udev schemes etc?
> > I just ment to say that functions below are good candidates to became
> > common wrappers.
> 
> Sure, but what is the reason for the wrapper?
> 
> On inspection I think its' because you need the right sysfs name; it'd
> just be nice to say that it's the reason for the readlink/basename
> frobbing of the existing $SCRATCH_DEV.  Not a huge deal.
Most people use LVM's names similar to /dev/vg/log1, but real name is
/dev/md-xxx, also some fancy SCSI targets may has crazy names.
> 
> >>> +SCRATCH_REAL_DEV=`readlink -f $SCRATCH_DEV`
> >>> +SCRATCH_BDEV=`basename $SCRATCH_REAL_DEV`
> >>> +
> 
> <snip>
> 
> >>> +_require_debugfs()
> >>> +{
> >>> +    #boot_params always present in debugfs
> >>> +    [ -d "$DEBUGFS_MNT/boot_params" ] || _notrun "Debugfs not mounted"
> >>> +}
> >>
> >> Would it make more sense to look for debugfs in /proc/filesystems
> >> as a test for it being *available* (as opposed to mounted somewhere?)
> >>
> >> I wonder if a helper (maybe in _require_debugfs) should work out if
> >> it's mounted, if not, try to mount it, and in the end, export DEBUGFS_MNT
> >> for any test that wants to use it.
> >>
> >> Otherwise if it happens to be mounted elsewhere, this'll all fail.
> >> Just a thought.  Maybe that's unusual enough that there's no point.
> >> But getting it mounted if it's not would be helpful I think.
> 
> Any thoughts on this?  As it stands it requires debugfs to be
> at /sys/kernel/debug (by default) *and* mounted prior to the test run.
> So it's another (maybe unexpected) piece of pre-test setup which might
> result in this test not getting run.
I just try to preserve blktrace(8) behaviour which complain if
debugfs is absent. IMHO debugfs is MUST_HAVE feature for testing environment.
So if not mounted it was done with purpose.
> 
> -Eric
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Freemyer - Feb. 22, 2013, 3:27 a.m.
On Wed, Feb 13, 2013 at 10:41 AM, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> There are many situations where disk may fail for example
> 1) brutal usb dongle unplug
> 2) iscsi (or any other netbdev) failure due to network issues
> In this situation filesystem which use this blockdevice is
> expected to fail(force RO remount, abort, etc) but whole system
> should still be operational. In other words:
> 1) Kernel should not panic
> 2) Memory should not leak
> 3) Data integrity operations (sync,fsync,fdatasync, directio) should fail
>    for affected filesystem
> 4) It should be possible to umount broken filesystem

Out of curiosity, does xfstest also have fault injection at the sector level?

It may be a little too aggressive, but hdparm --make-bad-sector
nnnnnnn can use a ATA long_write to write out a sector and
non-matching crc.  When the sector is then read after that, the drive
returns a media error.

At the end of the test  hdparm --repair-sector nnnnnnn will fix the
bad sector and store a valid crc.

The reason I say it is aggressive is that matched pairs of
--make-bad-sector and --repair-sector should have no long term effect
on the drive, but non-matched pairs will leave the drive with a media
error.  A normal write to that "bad" sector will force it to be
remapped to a spare sector.  I don't know of a simple way to undo that
mapping.

Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/292 b/292
new file mode 100755
index 0000000..3dd6608
--- /dev/null
+++ b/292
@@ -0,0 +1,136 @@ 
+#! /bin/bash
+# FSQA Test No. 292
+#
+# Run fsstress and simulate disk failure
+# check filesystem consystency at the end.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2006 Silicon Graphics, Inc.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#-----------------------------------------------------------------------
+#
+# creator
+owner=dmonakhov@openvz.org
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# TODO move it to common.blkdev if necessery
+SCRATCH_REAL_DEV=`readlink -f $SCRATCH_DEV`
+SCRATCH_BDEV=`basename $SCRATCH_REAL_DEV`
+
+allow_fail_make_request()
+{
+    [ -f "$DEBUGFS_MNT/fail_make_request/probability" ] \
+	|| _notrun "$DEBUGFS_MNT/fail_make_request \
+ not found. Seems that CONFIG_FAIL_MAKE_REQUEST kernel config option not enabled"
+
+    echo "Allow global fail_make_request feature"
+    echo 100 > $DEBUGFS_MNT/fail_make_request/probability
+    echo 9999999 > $DEBUGFS_MNT/fail_make_request/times
+    echo 0 >  /sys/kernel/debug/fail_make_request/verbose
+
+}
+
+disallow_fail_make_request()
+{
+    echo "Disallow global fail_make_request feature"
+    echo 0 > $DEBUGFS_MNT/fail_make_request/probability
+    echo 0 > $DEBUGFS_MNT/fail_make_request/times
+}
+
+poweroff_scratch_dev()
+{
+    echo "Force SCRATCH_DEV device failure"
+    echo " echo 1 > /sys/block/$SCRATCH_BDEV/make-it-fail" >> $here/$seq.full
+    echo 1 > /sys/block/$SCRATCH_BDEV/make-it-fail
+
+}
+
+poweron_scratch_dev()
+{
+    echo "Make SCRATCH_DEV device operatable again"
+    echo " echo 0 > /sys/block/$SCRATCH_BDEV/make-it-fail" >> $here/$seq.full
+    echo 0 > /sys/block/$SCRATCH_BDEV/make-it-fail
+
+}
+
+_cleanup()
+{
+    poweron_scratch_dev
+    disallow_fail_make_request
+}
+trap "_cleanup; exit \$status" 1 2 3 15
+
+# Disable all sync operations to get higher load
+FSSTRESS_AVOID="$FSSTRESS_AVOID -ffsync=0 -fsync=0 -ffdatasync=0 -f setattr=1"
+
+_workout()
+{
+	out=$SCRATCH_MNT/fsstress.$$
+	args=`_scale_fsstress_args -p 1 -n999999999 -f setattr=0 $FSSTRESS_AVOID -d $out`
+	echo ""
+	echo "Run fsstress"
+	echo ""
+	echo "fsstress $args" >> $here/$seq.full
+	$FSSTRESS_PROG $args > /dev/null 2>&1 &
+	pid=$!
+	# Let's it work for awhile
+	sleep $((20 + 10 * $TIME_FACTOR))
+	poweroff_scratch_dev
+	# After device turns in to failed state filesystem may not know about that
+	# so buffered write(2) may succeed, but any integrity operation such as
+	# (sync, fsync, fdatasync, direct-io) should fail.
+	dd if=/dev/zero of=$SCRATCH_MNT/touch_failed_filesystem count=1 bs=4k conv=fsync \
+	    >> $here/$seq.full 2>&1 && \
+	    _fail "failed: still able to perform integrity sync on $SCRATCH_MNT"
+
+	kill $pid
+	wait $pid
+
+	# We expect that broken fs is still can be umounted
+	run_check umount -f $SCRATCH_DEV
+	poweron_scratch_dev
+
+	# In order to check that filesystem is able to recover journal on mount(2)
+	# perform mount/umount, after that all errors should be fixed
+	run_check _scratch_mount
+	run_check _scratch_unmount
+	_check_scratch_fs
+}
+
+# real QA test starts here
+_supported_fs ext3 ext4 xfs btrfs
+_supported_os Linux
+_need_to_be_root
+_require_scratch
+_require_debugfs
+
+_scratch_mkfs >> $here/$seq.full 2>&1 || _fail "mkfs failed"
+_scratch_mount || _fail "mount failed"
+allow_fail_make_request
+_workout
+status=$?
+disallow_fail_make_request
+exit
diff --git a/292.out b/292.out
new file mode 100644
index 0000000..08d9820
--- /dev/null
+++ b/292.out
@@ -0,0 +1,8 @@ 
+QA output created by 292
+Allow global fail_make_request feature
+
+Run fsstress
+
+Force SCRATCH_DEV device failure
+Make SCRATCH_DEV device operatable again
+Disallow global fail_make_request feature
diff --git a/common.config b/common.config
index a956a46..f7a2422 100644
--- a/common.config
+++ b/common.config
@@ -75,6 +75,7 @@  export BENCH_PASSES=${BENCH_PASSES:=5}
 export XFS_MKFS_OPTIONS=${XFS_MKFS_OPTIONS:=-bsize=4096}
 export TIME_FACTOR=${TIME_FACTOR:=1}
 export LOAD_FACTOR=${LOAD_FACTOR:=1}
+export DEBUGFS_MNT=${DEBUGFS_MNT:="/sys/kernel/debug"}
 
 export PWD=`pwd`
 #export DEBUG=${DEBUG:=...} # arbitrary CFLAGS really.
diff --git a/common.rc b/common.rc
index 5c3dda1..d5d9c9f 100644
--- a/common.rc
+++ b/common.rc
@@ -1027,6 +1027,12 @@  _require_sparse_files()
     esac
 }
 
+_require_debugfs()
+{
+    #boot_params always present in debugfs
+    [ -d "$DEBUGFS_MNT/boot_params" ] || _notrun "Debugfs not mounted"
+}
+
 # check that a FS on a device is mounted
 # if so, return mount point
 #
diff --git a/group b/group
index b962a33..c409957 100644
--- a/group
+++ b/group
@@ -415,3 +415,4 @@  stress
 289 auto aio dangerous ioctl rw stress
 290 auto aio dangerous ioctl rw stress
 291 auto aio dangerous ioctl rw stress
+292 dangerous rw stress