Patchwork [11/15] e2fsprogs-tests-f_random_corruption.patch

login
register
mail settings
Submitter Kalpak Shah
Date Oct. 6, 2008, 10:31 a.m.
Message ID <1223289084.4007.91.camel@localhost>
Download mbox | patch
Permalink /patch/2890/
State Deferred
Delegated to: Theodore Ts'o
Headers show

Comments

Kalpak Shah - Oct. 6, 2008, 10:31 a.m.
The f_random_corruption test enables a random subset of filesystem
features, picks one of the valid filesystem block and inode sizes, and a
random device size and creates a new filesystem with those parameters.

It is possible to disable the running of the test by setting the
environment variable F_RANDOM_CORRUPTION=skip.  By default the test
script is run only one time, but setting the LOOP_COUNT variable allows
the test to run multiple times.

If the script is running as root the filesystem is mounted and populated
with file data to allow a more useful test filesystem to be generated.
In some cases the kernel may not support the requested filesystem
features and the filesystem cannot be mounted.  This is not considered a
test failure.

The resulting filesystem is corrupted with both random data and by
shifting data from one part of the device to another and then repaired
by e2fsck. In some rare cases the random corruption is severe enough
that the filesystem is not recoverable (e.g. small filesystem with no
backup superblock has bad superblock corruption) but in most cases
"e2fsck -fy" should be able to fix all errors in some way.

After e2fsck has repaired the filesystem, it is optionally mounted (if
the environment variable MOUNT_AFTER_CORRUPTION=yes is set) and the test
files created in the filesystem are deleted.  This verifies that the
fixes in the filesystem are sufficient for the kernel to use the
filesystem without error. Since there is some possibility of the kernel
oopsing if there is a filesystem bug, this part of the test is not
enabled by default.

Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Kalpak Shah <kalpak.shah@sun.com>

Patch

The f_random_corruption test enables a random subset of filesystem features,
picks one of the valid filesystem block and inode sizes, and a random device
size and creates a new filesystem with those parameters.

It is possible to disable the running of the test by setting the environment
variable F_RANDOM_CORRUPTION=skip.  By default the test script is run only
one time, but setting the LOOP_COUNT variable allows the test to run multiple
times.

If the script is running as root the filesystem is mounted and populated with
file data to allow a more useful test filesystem to be generated.  In some
cases the kernel may not support the requested filesystem features and the
filesystem cannot be mounted.  This is not considered a test failure.

The resulting filesystem is corrupted with both random data and by shifting
data from one part of the device to another and then repaired by e2fsck.
In some rare cases the random corruption is severe enough that the filesystem
is not recoverable (e.g. small filesystem with no backup superblock has bad
superblock corruption) but in most cases "e2fsck -fy" should be able to fix
all errors in some way.

After e2fsck has repaired the filesystem, it is optionally mounted (if the
environment variable MOUNT_AFTER_CORRUPTION=yes is set) and the test files
created in the filesystem are deleted.  This verifies that the fixes in the
filesystem are sufficient for the kernel to use the filesystem without error.
Since there is some possibility of the kernel oopsing if there is a filesystem
bug, this part of the test is not enabled by default.

Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>

Index: e2fsprogs-1.41.1/tests/f_random_corruption/script
===================================================================
--- /dev/null
+++ e2fsprogs-1.41.1/tests/f_random_corruption/script
@@ -0,0 +1,271 @@ 
+# This is to make sure that if this test fails other tests can still be run
+# instead of doing an exit. We break before the end of the loop.
+export LOOP_COUNT=${LOOP_COUNT:-1}
+export COUNT=0
+
+while [ $COUNT -lt $LOOP_COUNT ]; do
+[ "$F_RANDOM_CORRUPTION" = "skip" ] && echo "skipped" && break
+
+# choose block and inode sizes randomly
+BLK_SIZES=(1024 2048 4096)
+INODE_SIZES=(128 256 512 1024)
+
+SEED=$(head -1 /dev/urandom | od -N 1 | awk '{ print $2 }')
+RANDOM=$SEED
+
+IMAGE=${IMAGE:-$TMPFILE}
+DATE=`date '+%Y%m%d%H%M%S'`
+ARCHIVE=$IMAGE.$DATE
+SIZE=${SIZE:-$(( 192000 + RANDOM + RANDOM )) }
+FS_TYPE=${FS_TYPE:-ext3}
+BLK_SIZE=${BLK_SIZES[(( $RANDOM % ${#BLK_SIZES[*]} ))]}
+INODE_SIZE=${INODE_SIZES[(( $RANDOM % ${#INODE_SIZES[*]} ))]}
+DEF_FEATURES="sparse_super,filetype,dir_index"
+FEATURES=${FEATURES:-$DEF_FEATURES}
+MOUNT_OPTS="-o loop"
+MNTPT=$test_dir/temp
+OUT=$test_name.log
+FAILED=$test_name.failed
+OKFILE=$test_name.ok
+
+# Do you want to try and mount the filesystem?
+MOUNT_AFTER_CORRUPTION=${MOUNT_AFTER_CORRUPTION:-"no"}
+# Do you want to remove the files from the mounted filesystem?
+# Ideally use it only in test environment.
+REMOVE_FILES=${REMOVE_FILES:-"no"}
+
+# In KB
+CORRUPTION_SIZE=${CORRUPTION_SIZE:-64}
+CORRUPTION_ITERATIONS=${CORRUPTION_ITERATIONS:-5}
+
+MKFS=../misc/mke2fs
+E2FSCK=../e2fsck/e2fsck
+FIRST_FSCK_OPTS="-fyv"
+SECOND_FSCK_OPTS="-fyv"
+
+# Lets check if the image can fit in the current filesystem.
+BASE_DIR=`dirname $IMAGE`
+BASE_AVAIL_BLOCKS=`df -P -k $BASE_DIR  | awk '/%/ { print $4 }'`
+
+if (( BASE_AVAIL_BLOCKS < NUM_BLKS * (BLK_SIZE / 1024) )); then
+	echo "$BASE_DIR does not have enough space to accomodate test image."
+	echo "Skipping test...."
+	break;
+fi
+
+# Lets have a journal more times than not.
+HAVE_JOURNAL=$((RANDOM % 12 ))
+if (( HAVE_JOURNAL == 0 )); then
+	FS_TYPE="ext2"
+	HAVE_JOURNAL=""
+else
+	HAVE_JOURNAL="-j"
+fi
+
+# meta_bg and resize_inode features should not be enabled simultaneously
+META_BG=$(( $RANDOM % 12 ))
+if (( META_BG == 0 )); then
+	FEATURES=$FEATURES,meta_bg,^resize_inode
+else
+	FEATURES=$FEATURES,resize_inode
+fi
+
+modprobe ext4 2> /dev/null
+modprobe ext4dev 2> /dev/null
+
+# If ext4 is present in the kernel then we can play with ext4 options
+EXT4=`grep ext4 /proc/filesystems`
+if [ -n "$EXT4" ]; then
+	USE_EXT4=$((RANDOM % 2 ))
+	if (( USE_EXT4 == 1 )); then
+		FS_TYPE="ext4dev"
+	fi
+fi
+
+if [ "$FS_TYPE" = "ext4dev" ]; then
+	UNINIT_GROUPS=$((RANDOM % 12 ))
+	if (( UNINIT_GROUPS == 0 )); then
+		FEATURES=$FEATURES,uninit_groups
+	fi
+	EXPAND_ESIZE=$((RANDOM % 12 ))
+	if (( EXPAND_EISIZE == 0 )); then
+		FIRST_FSCK_OPTS=$FIRST_FSCK_OPTS," -E expand_extra_isize"
+	fi
+fi
+
+MKFS_OPTS="$HAVE_JOURNAL -b $BLK_SIZE -I $INODE_SIZE -O $FEATURES"
+
+NUM_BLKS=$(((SIZE * 1024) / BLK_SIZE))
+
+log()
+{
+	[ "$VERBOSE" ] && echo "$*"
+	echo "$*" >> $OUT
+}
+
+error()
+{
+	log "$*"
+	echo "$*" >> $FAILED
+}
+
+unset_vars()
+{
+	unset IMAGE DATE ARCHIVE FS_TYPE SIZE BLK_SIZE MKFS_OPTS MOUNT_OPTS
+	unset E2FSCK FIRST_FSCK_OPTS SECOND_FSCK_OPTS OUT FAILED OKFILE
+}
+
+cleanup()
+{
+	[ "$1" ] && error "$*" || error "Error occured..."
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+	cp $OUT $OUT.$DATE
+	echo " failed"
+	echo "*** This appears to be a bug in e2fsprogs ***"
+	echo "Please contact linux-ext4@vger.kernel.org for further assistance."
+	echo "Include $OUT.$DATE, and save $ARCHIVE locally for reference."
+	unset_vars
+	break;
+}
+
+echo -n "Random corruption test for e2fsck:"
+# Truncate the output log file
+rm -f $FAILED $OKFILE
+> $OUT
+
+get_random_location()
+{
+	total=$1
+
+	tmp=$(((RANDOM * 32768) % total))
+
+	# Try and have more corruption in metadata at the start of the
+	# filesystem.
+	if ((tmp % 3 == 0 || tmp % 5 == 0 || tmp % 7 == 0)); then
+		tmp=$((tmp % 32768))
+	fi
+
+	echo $tmp
+}
+
+make_fs_dirty()
+{
+	from=`get_random_location $NUM_BLKS`
+
+	# Number of blocks to write garbage into should be within fs and should
+	# not be too many.
+	num_blks_to_dirty=$((RANDOM % $1))
+
+	# write garbage into the selected blocks
+	[ ! -c /dev/urandom ] && return
+	log "writing ${num_blks_to_dirty}kB random garbage at offset ${from}kB"
+	dd if=/dev/urandom of=$IMAGE bs=1kB seek=$from conv=notrunc \
+		count=$num_blks_to_dirty >> $OUT 2>&1
+}
+
+
+touch $IMAGE
+log "Format the filesystem image..."
+log
+# Write some garbage blocks into the filesystem to make sure e2fsck has to do
+# a more difficult job than checking blocks of zeroes.
+log "Copy some random data into filesystem image...."
+make_fs_dirty 32768
+log "$MKFS $MKFS_OPTS -F $IMAGE $NUM_BLKS >> $OUT"
+$MKFS $MKFS_OPTS -F $IMAGE $NUM_BLKS >> $OUT 2>&1
+if [ $? -ne 0 ]; then
+	zero_size=`grep "Device size reported to be zero" $OUT`
+	short_write=`grep "Attempt to write block from filesystem resulted in short write" $OUT`
+
+	if (( zero_size != 0 || short_write != 0 )); then
+		echo "mkfs failed due to device size of 0 or a short write. This is harmless and need not be reported."
+	else
+		cleanup "mkfs failed - internal error during operation. Aborting random regression test..."
+	fi
+fi
+
+if [ `id -u` = 0 ]; then
+	mkdir -p $MNTPT
+	if [ $? -ne 0 ]; then
+		log "Failed to create or find mountpoint...."
+	else
+		mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT
+		if [ $? -ne 0 ]; then
+			log "Unable to mount file system - skipped"
+		else
+			df -h $MNTPT >> $OUT
+			df -i $MNTPT >> $OUT
+			log "Copying data into the test filesystem..."
+
+			cp -r ../ $MNTPT >> $OUT 2>&1
+			sync
+			umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+		fi
+	fi
+else
+	log "skipping mount test for non-root user"
+fi
+
+log "Corrupt the image by moving around blocks of data..."
+log
+for (( i = 0; i < $CORRUPTION_ITERATIONS; i++ )); do
+	from=`get_random_location $NUM_BLKS`
+	to=`get_random_location $NUM_BLKS`
+
+	log "Moving ${CORRUPTION_SIZE}kB from block ${from}kB to ${to}kB"
+	dd if=$IMAGE of=$IMAGE bs=1k count=$CORRUPTION_SIZE conv=notrunc skip=$from seek=$to >> $OUT 2>&1
+
+	# more corruption by overwriting blocks from within the filesystem.
+	make_fs_dirty $CORRUPTION_SIZE
+done
+
+# Copy the image for reproducing the bug.
+cp --sparse=always $IMAGE $ARCHIVE >> $OUT 2>&1
+
+log "First pass of fsck..."
+$E2FSCK $FIRST_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+
+# Run e2fsck for the second time and check if the problem gets solved.
+# After we can report error with pass1.
+[ $((RET & 1)) == 0 ] || log "The first fsck corrected errors"
+[ $((RET & 2)) == 0 ] || error "The first fsck wants a reboot"
+[ $((RET & 4)) == 0 ] || error "The first fsck left uncorrected errors"
+[ $((RET & 8)) == 0 ] || error "The first fsck reports an operational error"
+[ $((RET & 16)) == 0 ] || error "The first fsck reports there was a usage error"
+[ $((RET & 32)) == 0 ] || error "The first fsck reports it was cancelled"
+[ $((RET & 128)) == 0 ] || error "The first fsck reports a library error"
+
+log "---------------------------------------------------------"
+
+log "Second pass of fsck..."
+$E2FSCK $SECOND_FSCK_OPTS $IMAGE >> $OUT 2>&1
+RET=$?
+[ $((RET & 1)) == 0 ] || cleanup "The second fsck corrected errors!"
+[ $((RET & 2)) == 0 ] || cleanup "The second fsck wants a reboot"
+[ $((RET & 4)) == 0 ] || cleanup "The second fsck left uncorrected errors"
+[ $((RET & 8)) == 0 ] || cleanup "The second fsck reports an operational error"
+[ $((RET & 16)) == 0 ] || cleanup "The second fsck reports a usage error"
+[ $((RET & 32)) == 0 ] || cleanup "The second fsck reports it was cancelled"
+[ $((RET & 128)) == 0 ] || cleanup "The second fsck reports a library error"
+
+[ -f $FAILED ] && cleanup
+
+if [ "$MOUNT_AFTER_CORRUPTION" = "yes" ]; then
+	mount -t $FS_TYPE $MOUNT_OPTS $IMAGE $MNTPT 2>&1 | tee -a $OUT
+	[ $? -ne 0 ] && log "Unable to mount file system - skipped"
+
+	[ "$REMOVE_FILES" = "yes" ] && rm -rf $MNTPT/* >> $OUT
+	umount -f $MNTPT > /dev/null 2>&1 | tee -a $OUT
+fi
+
+rm -f $ARCHIVE
+
+# Report success
+echo " ok"
+echo "Succeeded..." > $OKFILE
+
+unset_vars
+
+COUNT=$((COUNT + 1))
+done
Index: e2fsprogs-1.41.1/tests/Makefile.in
===================================================================
--- e2fsprogs-1.41.1.orig/tests/Makefile.in
+++ e2fsprogs-1.41.1/tests/Makefile.in
@@ -27,6 +27,8 @@  mke2fs.conf: $(srcdir)/../misc/mke2fs.co
 	sed -e 's/blocksize = -1/blocksize = 4096/' $(srcdir)/../misc/mke2fs.conf >mke2fs.conf
 
 check:: test_script
+	@echo "Removing remnants of earlier tests..."
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img2*
 	@echo "Running e2fsprogs test suite..."
 	@echo " "
 	@./test_script
@@ -66,7 +68,7 @@  testend: test_script ${TDIR}/image
 	@echo "If all is well, edit ${TDIR}/name and rename ${TDIR}."
 
 clean::
-	$(RM) -f *~ *.log *.new *.failed *.ok test.img test_script mke2fs.conf
+	$(RM) -f *~ *.log *.new *.failed *.ok test.img* test_script mke2fs.conf
 
 distclean:: clean
 	$(RM) -f Makefile