Patchwork [-v4,6/6] fault-injection: add notifier error injection testing scripts

Submitter Akinobu Mita
Date June 23, 2012, 2:58 p.m.
Message ID <1340463502-15341-7-git-send-email-akinobu.mita@gmail.com>
Permalink /patch/166767/
State Superseded

Comments

Akinobu Mita - June 23, 2012, 2:58 p.m.
This adds two testing scripts with notifier error injection

* tools/testing/fault-injection/cpu-notifier.sh is a testing script for
CPU notifier error handling that uses cpu-notifier-error-inject.ko.

1. Offline all hot-pluggable CPUs in preparation for testing
2. Test CPU hot-add error handling by injecting notifier errors
3. Online all hot-pluggable CPUs in preparation for testing
4. Test CPU hot-remove error handling by injecting notifier errors

* tools/testing/fault-injection/memory-notifier.sh does the same
thing for the memory hotplug notifier.

1. Offline 10% of hot-pluggable memory in preparation for testing
2. Test memory hot-add error handling by injecting notifier errors
3. Online all hot-pluggable memory in preparation for testing
4. Test memory hot-remove error handling by injecting notifier errors

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-pm@lists.linux-foundation.org
Cc: Greg KH <greg@kroah.com>
Cc: linux-mm@kvack.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Américo Wang <xiyou.wangcong@gmail.com>
---
* v4
- add -r option for memory-notifier.sh to specify the percentage of
  memory blocks to offline

 tools/testing/fault-injection/cpu-notifier.sh    |  169 +++++++++++++++++++++
 tools/testing/fault-injection/memory-notifier.sh |  176 ++++++++++++++++++++++
 2 files changed, 345 insertions(+)
 create mode 100755 tools/testing/fault-injection/cpu-notifier.sh
 create mode 100755 tools/testing/fault-injection/memory-notifier.sh
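
The four steps in each script come down to writing an errno into a per-action debugfs file, attempting the hotplug operation, and writing 0 to clear the injection again. A minimal sketch of that write follows. The helper name set_notifier_error is hypothetical (it is not part of the patch), and it takes the injection directory as an argument so that it can be tried against a scratch directory:

```shell
# Sketch only: set_notifier_error is an illustrative name, not from the patch.
# $1: injection directory, e.g. $DEBUGFS/notifier-error-inject/cpu
# $2: notifier action, e.g. CPU_UP_PREPARE
# $3: errno the notifier callback should return, or 0 to disable injection
set_notifier_error()
{
	local dir=$1
	local action=$2
	local errno=$3
	local file="$dir/actions/$action/error"

	# Refuse to create files: the injection point must already exist.
	if [ ! -e "$file" ]; then
		echo "$file: no such injection point" >&2
		return 1
	fi
	printf '%s\n' "$errno" > "$file"
}
```

Assuming debugfs is mounted at /sys/kernel/debug, calling set_notifier_error /sys/kernel/debug/notifier-error-inject/cpu CPU_UP_PREPARE -12 would correspond to step 2 of cpu-notifier.sh, and the same call with 0 to step 3.
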
Andrew Morton - June 26, 2012, 11:31 p.m.
On Sat, 23 Jun 2012 23:58:22 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:

> This adds two testing scripts with notifier error injection

Can we move these into tools/testing/selftests/, so that a "make
run_tests" runs these tests?

Also, I don't think it's appropriate that "fault-injection" be in the
path - that's an implementation detail.  What we're testing here is
memory hotplug, pm, cpu hotplug, etc.  So each test would go into, say,
tools/testing/selftests/cpu-hotplug.

Now, your cpu-hotplug test only tests a tiny part of the cpu-hotplug
code.  But it is a start, and creates the place where additional tests
will be placed in the future.


If the kernel configuration means that the tests cannot be run, the
attempt should succeed so that other tests are not disrupted.  I guess
that printing a warning in this case is useful.

Probably the selftests will require root permissions - we haven't
really thought about that much.  If these tests require root (I assume
they do?) then a sensible approach would be to check for that and to
emit a warning and return "success".

My overall take on the fault-injection code is that there has been a
disappointing amount of uptake: I don't see many developers using them
for whitebox testing their stuff.  I guess this patchset addresses
that, in a way.
Dave Jones - June 26, 2012, 11:58 p.m.
On Tue, Jun 26, 2012 at 04:31:47PM -0700, Andrew Morton wrote:

 > My overall take on the fault-injection code is that there has been a
 > disappointing amount of uptake: I don't see many developers using them
 > for whitebox testing their stuff.  I guess this patchset addresses
 > that, in a way.

I added support for make-it-fail to my syscall fuzzer a while ago.
(if the file exists, the child processes set it before calling the fuzzed syscall).
I've not had a chance to really play with it, because I find enough problems
already even without it.

	Dave
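
For readers unfamiliar with the make-it-fail file mentioned above: it is the per-process /proc/<pid>/make-it-fail flag of the fault-injection framework. A rough shell sketch of "set it if the file exists" is shown below. The function name enable_make_it_fail is hypothetical, and the optional path argument exists only so the logic can be tried on an ordinary file:

```shell
# Sketch only: enable_make_it_fail is an illustrative name.
# $1: path to the flag file (defaults to the calling process's own flag)
enable_make_it_fail()
{
	local file=${1:-/proc/self/make-it-fail}

	# Kernels built without fault injection have no such file: do nothing.
	[ -w "$file" ] || return 0
	echo 1 > "$file"
}
```

The flag only has an effect for fault-injection capabilities whose task-filter setting is enabled, so this is one half of the setup at most.
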
Akinobu Mita - June 27, 2012, 11:42 a.m.
2012/6/27 Andrew Morton <akpm@linux-foundation.org>:
> On Sat, 23 Jun 2012 23:58:22 +0900
> Akinobu Mita <akinobu.mita@gmail.com> wrote:
>
>> This adds two testing scripts with notifier error injection
>
> Can we move these into tools/testing/selftests/, so that a "make
> run_tests" runs these tests?
>
> Also, I don't think it's appropriate that "fault-injection" be in the
> path - that's an implementation detail.  What we're testing here is
> memory hotplug, pm, cpu hotplug, etc.  So each test would go into, say,
> tools/testing/selftests/cpu-hotplug.
>
> Now, your cpu-hotplug test only tests a tiny part of the cpu-hotplug
> code.  But it is a start, and creates the place where additional tests
> will be placed in the future.
>
>
> If the kernel configuration means that the tests cannot be run, the
> attempt should succeed so that other tests are not disrupted.  I guess
> that printing a warning in this case is useful.
>
> Probably the selftests will require root permissions - we haven't
> really thought about that much.  If these tests require root (I assume
> they do?) then a sensible approach would be to check for that and to
> emit a warning and return "success".

Thanks for your advice.

I'm going to make the following changes on these scripts

1. Change these paths to:
tools/testing/selftests/{cpu,memory}-hotplug/on-off-test.sh

2. Skip the tests and exit(0) with a warning if not running as root or
if sysfs is unavailable, so that a "make run_tests" doesn't stop.

3. Add tests that simply online and offline cpus (or memory blocks),
followed by tests that use the notifier error injection feature if the
kernel supports it.
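
The skip behaviour described in item 2 could look like the following sketch. The function name check_prerequisites and its two arguments are hypothetical; the uid and the sysfs directory are passed in only so that the logic can be exercised without root:

```shell
# Sketch only: check_prerequisites is an illustrative name.
# $1: effective uid of the caller
# $2: sysfs directory the test depends on
# Returns 0 if the test can run, 1 (after a warning) if it should be skipped.
check_prerequisites()
{
	local uid=$1
	local sysfs_dir=$2

	if [ "$uid" -ne 0 ]; then
		echo "warning: must be run as root, skipping test" >&2
		return 1
	fi
	if [ ! -d "$sysfs_dir" ]; then
		echo "warning: $sysfs_dir not found, skipping test" >&2
		return 1
	fi
	return 0
}

# Intended use at the top of a test script (left commented out here):
# check_prerequisites "$(id -u)" /sys/devices/system/cpu || exit 0
```

Exiting with 0 on the skip path is what keeps "make run_tests" going when a test cannot be run.
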

> My overall take on the fault-injection code is that there has been a
> disappointing amount of uptake: I don't see many developers using them
> for whitebox testing their stuff.  I guess this patchset addresses
> that, in a way.

I hope so. The impact of notifier error injection is restricted to
the particular kernel functionality, and these scripts are easy to run.

On the other hand, fault injection like failslab has a huge impact
on all kernel components and it often results in catastrophe for userspace
even if there is no kernel bug.  I am confident that I can find a certain
number of kernel bugs with failslab, but it requires enough spare time.

Patch

diff --git a/tools/testing/fault-injection/cpu-notifier.sh b/tools/testing/fault-injection/cpu-notifier.sh
new file mode 100755
index 0000000..af93630
--- /dev/null
+++ b/tools/testing/fault-injection/cpu-notifier.sh
@@ -0,0 +1,169 @@ 
+#!/bin/bash
+
+#
+# list all hot-pluggable CPUs
+#
+hotpluggable_cpus()
+{
+	local state=${1:-.\*}
+
+	for cpu in /sys/devices/system/cpu/cpu*; do
+		if [ -f $cpu/online ] && grep -q "$state" $cpu/online; then
+			echo ${cpu##/*/cpu}
+		fi
+	done
+}
+
+hotplaggable_offline_cpus()
+{
+	hotpluggable_cpus 0
+}
+
+hotpluggable_online_cpus()
+{
+	hotpluggable_cpus 1
+}
+
+cpu_is_online()
+{
+	grep -q 1 /sys/devices/system/cpu/cpu$1/online
+}
+
+cpu_is_offline()
+{
+	grep -q 0 /sys/devices/system/cpu/cpu$1/online
+}
+
+add_cpu()
+{
+	echo 1 > /sys/devices/system/cpu/cpu$1/online
+}
+
+remove_cpu()
+{
+	echo 0 > /sys/devices/system/cpu/cpu$1/online
+}
+
+add_cpu_expect_success()
+{
+	local cpu=$1
+
+	if ! add_cpu $cpu; then
+		echo $FUNCNAME $cpu: unexpected fail >&2
+	elif ! cpu_is_online $cpu; then
+		echo $FUNCNAME $cpu: unexpected offline >&2
+	fi
+}
+
+add_cpu_expect_fail()
+{
+	local cpu=$1
+
+	if add_cpu $cpu 2> /dev/null; then
+		echo $FUNCNAME $cpu: unexpected success >&2
+	elif ! cpu_is_offline $cpu; then
+		echo $FUNCNAME $cpu: unexpected online >&2
+	fi
+}
+
+remove_cpu_expect_success()
+{
+	local cpu=$1
+
+	if ! remove_cpu $cpu; then
+		echo $FUNCNAME $cpu: unexpected fail >&2
+	elif ! cpu_is_offline $cpu; then
+		echo $FUNCNAME $cpu: unexpected online >&2
+	fi
+}
+
+remove_cpu_expect_fail()
+{
+	local cpu=$1
+
+	if remove_cpu $cpu 2> /dev/null; then
+		echo $FUNCNAME $cpu: unexpected success >&2
+	elif ! cpu_is_online $cpu; then
+		echo $FUNCNAME $cpu: unexpected offline >&2
+	fi
+}
+
+if [ $UID != 0 ]; then
+	echo must be run as root >&2
+	exit 1
+fi
+
+error=-12
+priority=0
+
+while getopts e:hp: opt; do
+	case $opt in
+	e)
+		error=$OPTARG
+		;;
+	h)
+		echo "Usage $0 [ -e errno ] [ -p notifier-priority ]"
+		exit
+		;;
+	p)
+		priority=$OPTARG
+		;;
+	esac
+done
+
+if ! [ "$error" -ge -4095 -a "$error" -lt 0 ]; then
+	echo "error code must be -4095 <= errno < 0" >&2
+	exit 1
+fi
+
+DEBUGFS=`mount -t debugfs | head -1 | awk '{ print $3 }'`
+
+if [ ! -d "$DEBUGFS" ]; then
+	echo debugfs is not mounted >&2
+	exit 1
+fi
+
+/sbin/modprobe -r cpu-notifier-error-inject
+/sbin/modprobe -q cpu-notifier-error-inject priority=$priority
+
+NOTIFIER_ERR_INJECT_DIR=$DEBUGFS/notifier-error-inject/cpu
+
+if [ ! -d $NOTIFIER_ERR_INJECT_DIR ]; then
+	echo cpu-notifier-error-inject module is not available >&2
+	exit 1
+fi
+
+#
+# Offline all hot-pluggable CPUs
+#
+echo 0 > $NOTIFIER_ERR_INJECT_DIR/actions/CPU_DOWN_PREPARE/error
+for cpu in `hotpluggable_online_cpus`; do
+	remove_cpu_expect_success $cpu
+done
+
+#
+# Test CPU hot-add error handling (offline => online)
+#
+echo $error > $NOTIFIER_ERR_INJECT_DIR/actions/CPU_UP_PREPARE/error
+for cpu in `hotplaggable_offline_cpus`; do
+	add_cpu_expect_fail $cpu
+done
+
+#
+# Online all hot-pluggable CPUs
+#
+echo 0 > $NOTIFIER_ERR_INJECT_DIR/actions/CPU_UP_PREPARE/error
+for cpu in `hotplaggable_offline_cpus`; do
+	add_cpu_expect_success $cpu
+done
+
+#
+# Test CPU hot-remove error handling (online => offline)
+#
+echo $error > $NOTIFIER_ERR_INJECT_DIR/actions/CPU_DOWN_PREPARE/error
+for cpu in `hotpluggable_online_cpus`; do
+	remove_cpu_expect_fail $cpu
+done
+
+echo 0 > $NOTIFIER_ERR_INJECT_DIR/actions/CPU_DOWN_PREPARE/error
+/sbin/modprobe -r cpu-notifier-error-inject
diff --git a/tools/testing/fault-injection/memory-notifier.sh b/tools/testing/fault-injection/memory-notifier.sh
new file mode 100755
index 0000000..843cba7
--- /dev/null
+++ b/tools/testing/fault-injection/memory-notifier.sh
@@ -0,0 +1,176 @@ 
+#!/bin/bash
+
+#
+# list all hot-pluggable memory
+#
+hotpluggable_memory()
+{
+	local state=${1:-.\*}
+
+	for memory in /sys/devices/system/memory/memory*; do
+		if grep -q 1 $memory/removable &&
+		   grep -q "$state" $memory/state; then
+			echo ${memory##/*/memory}
+		fi
+	done
+}
+
+hotplaggable_offline_memory()
+{
+	hotpluggable_memory offline
+}
+
+hotpluggable_online_memory()
+{
+	hotpluggable_memory online
+}
+
+memory_is_online()
+{
+	grep -q online /sys/devices/system/memory/memory$1/state
+}
+
+memory_is_offline()
+{
+	grep -q offline /sys/devices/system/memory/memory$1/state
+}
+
+add_memory()
+{
+	echo online > /sys/devices/system/memory/memory$1/state
+}
+
+remove_memory()
+{
+	echo offline > /sys/devices/system/memory/memory$1/state
+}
+
+add_memory_expect_success()
+{
+	local memory=$1
+
+	if ! add_memory $memory; then
+		echo $FUNCNAME $memory: unexpected fail >&2
+	elif ! memory_is_online $memory; then
+		echo $FUNCNAME $memory: unexpected offline >&2
+	fi
+}
+
+add_memory_expect_fail()
+{
+	local memory=$1
+
+	if add_memory $memory 2> /dev/null; then
+		echo $FUNCNAME $memory: unexpected success >&2
+	elif ! memory_is_offline $memory; then
+		echo $FUNCNAME $memory: unexpected online >&2
+	fi
+}
+
+remove_memory_expect_success()
+{
+	local memory=$1
+
+	if ! remove_memory $memory; then
+		echo $FUNCNAME $memory: unexpected fail >&2
+	elif ! memory_is_offline $memory; then
+		echo $FUNCNAME $memory: unexpected online >&2
+	fi
+}
+
+remove_memory_expect_fail()
+{
+	local memory=$1
+
+	if remove_memory $memory 2> /dev/null; then
+		echo $FUNCNAME $memory: unexpected success >&2
+	elif ! memory_is_online $memory; then
+		echo $FUNCNAME $memory: unexpected offline >&2
+	fi
+}
+
+if [ $UID != 0 ]; then
+	echo must be run as root >&2
+	exit 1
+fi
+
+error=-12
+priority=0
+ratio=10
+
+while getopts e:hp:r: opt; do
+	case $opt in
+	e)
+		error=$OPTARG
+		;;
+	h)
+		echo "Usage $0 [ -e errno ] [ -p notifier-priority ] [ -r percent-of-memory-to-offline ]"
+		exit
+		;;
+	p)
+		priority=$OPTARG
+		;;
+	r)
+		ratio=$OPTARG
+		;;
+	esac
+done
+
+if ! [ "$error" -ge -4095 -a "$error" -lt 0 ]; then
+	echo "error code must be -4095 <= errno < 0" >&2
+	exit 1
+fi
+
+DEBUGFS=`mount -t debugfs | head -1 | awk '{ print $3 }'`
+
+if [ ! -d "$DEBUGFS" ]; then
+	echo debugfs is not mounted >&2
+	exit 1
+fi
+
+/sbin/modprobe -r memory-notifier-error-inject
+/sbin/modprobe -q memory-notifier-error-inject priority=$priority
+
+NOTIFIER_ERR_INJECT_DIR=$DEBUGFS/notifier-error-inject/memory
+
+if [ ! -d $NOTIFIER_ERR_INJECT_DIR ]; then
+	echo memory-notifier-error-inject module is not available >&2
+	exit 1
+fi
+
+#
+# Offline $ratio percent of hot-pluggable memory
+#
+echo 0 > $NOTIFIER_ERR_INJECT_DIR/actions/MEM_GOING_OFFLINE/error
+for memory in `hotpluggable_online_memory`; do
+	if [ $((RANDOM % 100)) -lt $ratio ]; then
+		remove_memory_expect_success $memory
+	fi
+done
+
+#
+# Test memory hot-add error handling (offline => online)
+#
+echo $error > $NOTIFIER_ERR_INJECT_DIR/actions/MEM_GOING_ONLINE/error
+for memory in `hotplaggable_offline_memory`; do
+	add_memory_expect_fail $memory
+done
+
+#
+# Online all hot-pluggable memory
+#
+echo 0 > $NOTIFIER_ERR_INJECT_DIR/actions/MEM_GOING_ONLINE/error
+for memory in `hotplaggable_offline_memory`; do
+	add_memory_expect_success $memory
+done
+
+#
+# Test memory hot-remove error handling (online => offline)
+#
+echo $error > $NOTIFIER_ERR_INJECT_DIR/actions/MEM_GOING_OFFLINE/error
+for memory in `hotpluggable_online_memory`; do
+	remove_memory_expect_fail $memory
+done
+
+echo 0 > $NOTIFIER_ERR_INJECT_DIR/actions/MEM_GOING_OFFLINE/error
+/sbin/modprobe -r memory-notifier-error-inject