diff mbox

[kvm-unit-tests,v2,2/2] run_tests: allow run tests in parallel

Message ID 1483438205-31110-3-git-send-email-peterx@redhat.com
State New
Headers show

Commit Message

Peter Xu Jan. 3, 2017, 10:10 a.m. UTC
run_task.sh is getting slow. This patch is trying to make it faster by
running the tests concurrently.

We provide a new parameter "-j" for the run_tests.sh, which can be used
to specify how many run queues we want for the tests. Default queue
length is 1, which is the old behavior.

Quick test on my laptop (4 cores, 2 threads each) shows 3x speed boost:

   |-----------------+-----------|
   | command         | time used |
   |-----------------+-----------|
   | run_test.sh     | 75s       |
   | run_test.sh -j8 | 27s       |
   |-----------------+-----------|

Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 run_tests.sh           | 12 ++++++++++--
 scripts/functions.bash | 15 ++++++++++++++-
 scripts/global.bash    | 11 +++++++++++
 3 files changed, 35 insertions(+), 3 deletions(-)

Comments

Peter Xu Jan. 3, 2017, 10:17 a.m. UTC | #1
On Tue, Jan 03, 2017 at 06:10:05PM +0800, Peter Xu wrote:

[...]

> diff --git a/scripts/global.bash b/scripts/global.bash
> index 77b0b29..52095bd 100644
> --- a/scripts/global.bash
> +++ b/scripts/global.bash
> @@ -1,2 +1,13 @@
>  : ${ut_log_dir:=logs}
>  : ${ut_log_summary:=${ut_log_dir}/SUMMARY}
> +: ${ut_run_queues:=1}
> +
> +function ut_in_parallel()
> +{
> +    [[ $ut_run_queues != 1 ]]
> +}
> +

This function is useless now. Should remove above five lines. Sorry.

-- peterx
Radim Krčmář Jan. 4, 2017, 3:09 p.m. UTC | #2
2017-01-03 18:10+0800, Peter Xu:
> run_task.sh is getting slow. This patch is trying to make it faster by
> running the tests concurrently.
> 
> We provide a new parameter "-j" for the run_tests.sh, which can be used
> to specify how many run queues we want for the tests. Default queue
> length is 1, which is the old behavior.
> 
> Quick test on my laptop (4 cores, 2 threads each) shows 3x speed boost:
> 
>    |-----------------+-----------|
>    | command         | time used |
>    |-----------------+-----------|
>    | run_test.sh     | 75s       |
>    | run_test.sh -j8 | 27s       |
>    |-----------------+-----------|
> 
> Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  run_tests.sh           | 12 ++++++++++--
>  scripts/functions.bash | 15 ++++++++++++++-
>  scripts/global.bash    | 11 +++++++++++
>  3 files changed, 35 insertions(+), 3 deletions(-)

I like this diffstat a lot more, thanks :)

The script doesn't handle ^C well now (at least), which can be worked
around with

  trap exit SIGINT

but it would be nice to know if receiving signals in `wait` can't be
fixed.
Peter Xu Jan. 5, 2017, 3:07 a.m. UTC | #3
On Wed, Jan 04, 2017 at 04:09:39PM +0100, Radim Krčmář wrote:
> 2017-01-03 18:10+0800, Peter Xu:
> > run_task.sh is getting slow. This patch is trying to make it faster by
> > running the tests concurrently.
> > 
> > We provide a new parameter "-j" for the run_tests.sh, which can be used
> > to specify how many run queues we want for the tests. Default queue
> > length is 1, which is the old behavior.
> > 
> > Quick test on my laptop (4 cores, 2 threads each) shows 3x speed boost:
> > 
> >    |-----------------+-----------|
> >    | command         | time used |
> >    |-----------------+-----------|
> >    | run_test.sh     | 75s       |
> >    | run_test.sh -j8 | 27s       |
> >    |-----------------+-----------|
> > 
> > Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  run_tests.sh           | 12 ++++++++++--
> >  scripts/functions.bash | 15 ++++++++++++++-
> >  scripts/global.bash    | 11 +++++++++++
> >  3 files changed, 35 insertions(+), 3 deletions(-)
> 
> I like this diffstat a lot more, thanks :)
> 
> The script doesn't handle ^C well now (at least), which can be worked
> around with
> 
>   trap exit SIGINT
> 
> but it would be nice to know if receiving signals in `wait` can't be
> fixed.

When I send SIGINT to "run_tests.sh -j8", I see process hang dead. Not
sure whether you see the same thing:

#0  0x00007f7af2e1559a in waitpid () from /lib64/libc.so.6
#1  0x00005613edf8953e in waitchld.isra ()
#2  0x00005613edf8aae5 in wait_for ()
#3  0x00005613edf8b682 in wait_for_any_job ()
#4  0x00005613edfc7e64 in wait_builtin ()
#5  0x00005613edf616ea in execute_builtin.isra ()
#6  0x00005613edf623ee in execute_simple_command ()
#7  0x00005613edf79e77 in execute_command_internal ()
#8  0x00005613edf7b972 in execute_command ()
#9  0x00005613edf62aca in execute_while_or_until ()
#10 0x00005613edf7a156 in execute_command_internal ()
#11 0x00005613edf79d88 in execute_command_internal ()
...

If I change the "wait -n" into "wait" (this will work, but of course
slower since we'll wait for all subprocesses end before we start
another one), problem disappears. Not sure whether that means a "wait
-n" bug.

Anyway, IMHO squashing you suggestion of "trap exit SIGINT" at the
entry of for_each_unittest() is an acceptable solution - it works in
all cases.

Thanks,

-- peterx
Radim Krčmář Jan. 5, 2017, 7:44 p.m. UTC | #4
2017-01-05 11:07+0800, Peter Xu:
> On Wed, Jan 04, 2017 at 04:09:39PM +0100, Radim Krčmář wrote:
>> 2017-01-03 18:10+0800, Peter Xu:
>> > run_task.sh is getting slow. This patch is trying to make it faster by
>> > running the tests concurrently.
>> > 
>> > We provide a new parameter "-j" for the run_tests.sh, which can be used
>> > to specify how many run queues we want for the tests. Default queue
>> > length is 1, which is the old behavior.
>> > 
>> > Quick test on my laptop (4 cores, 2 threads each) shows 3x speed boost:
>> > 
>> >    |-----------------+-----------|
>> >    | command         | time used |
>> >    |-----------------+-----------|
>> >    | run_test.sh     | 75s       |
>> >    | run_test.sh -j8 | 27s       |
>> >    |-----------------+-----------|
>> > 
>> > Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  run_tests.sh           | 12 ++++++++++--
>> >  scripts/functions.bash | 15 ++++++++++++++-
>> >  scripts/global.bash    | 11 +++++++++++
>> >  3 files changed, 35 insertions(+), 3 deletions(-)
>> 
>> I like this diffstat a lot more, thanks :)
>> 
>> The script doesn't handle ^C well now (at least), which can be worked
>> around with
>> 
>>   trap exit SIGINT
>> 
>> but it would be nice to know if receiving signals in `wait` can't be
>> fixed.
> 
> When I send SIGINT to "run_tests.sh -j8", I see process hang dead. Not
> sure whether you see the same thing:
> 
> #0  0x00007f7af2e1559a in waitpid () from /lib64/libc.so.6
> #1  0x00005613edf8953e in waitchld.isra ()
> #2  0x00005613edf8aae5 in wait_for ()
> #3  0x00005613edf8b682 in wait_for_any_job ()
> #4  0x00005613edfc7e64 in wait_builtin ()
> #5  0x00005613edf616ea in execute_builtin.isra ()
> #6  0x00005613edf623ee in execute_simple_command ()
> #7  0x00005613edf79e77 in execute_command_internal ()
> #8  0x00005613edf7b972 in execute_command ()
> #9  0x00005613edf62aca in execute_while_or_until ()
> #10 0x00005613edf7a156 in execute_command_internal ()
> #11 0x00005613edf79d88 in execute_command_internal ()
> ...

I do.  And sometimes, I also caught it in a signal handler:

  #0  0x00007f7461bcd637 in kill () at ../sysdeps/unix/syscall-template.S:84
  #1  0x00000000004476b9 in wait_sigint_handler (sig=<optimized out>) at jobs.c:2504
  #2  <signal handler called>
  #3  0x00007f7461c674ca in __GI___waitpid (pid=pid@entry=-1, stat_loc=stat_loc@entry=0x7ffdb49dfbd8, 
      options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
  #4  0x0000000000449cfb in waitchld (block=block@entry=1, wpid=-1) at jobs.c:3474
  #5  0x000000000044b2eb in wait_for (pid=pid@entry=-1) at jobs.c:2718
  #6  0x000000000044becd in wait_for_any_job () at jobs.c:3015
  #7  0x000000000048c38b in wait_builtin (list=0x0) at ./wait.def:154
  ...

> 
> If I change the "wait -n" into "wait" (this will work, but of course
> slower since we'll wait for all subprocesses end before we start
> another one), problem disappears.

Yeah, and just `wait` doesn't work with -j1 ... using `jobs -r` gets rid
of it.

>                                   Not sure whether that means a "wait
> -n" bug.

Most likely a bash bug:
 - doesn't happen if you SIGINT before the first job has finished
 - doesn't happen with normal wait

> Anyway, IMHO squashing you suggestion of "trap exit SIGINT" at the
> entry of for_each_unittest() is an acceptable solution - it works in
> all cases.

Seems like the best solution at this time ...
We actually want to "exit 130", as it would when using normal wait does,
and maybe it could be improved a bit by adding a wait:

  trap 'wait; exit 130' SIGINT
Peter Xu Jan. 6, 2017, 3:31 a.m. UTC | #5
On Thu, Jan 05, 2017 at 08:44:02PM +0100, Radim Krčmář wrote:

[...]

> > Anyway, IMHO squashing you suggestion of "trap exit SIGINT" at the
> > entry of for_each_unittest() is an acceptable solution - it works in
> > all cases.
> 
> Seems like the best solution at this time ...
> We actually want to "exit 130", as it would when using normal wait does,
> and maybe it could be improved a bit by adding a wait:
> 
>   trap 'wait; exit 130' SIGINT

Sounds better, thanks!

Will repost with this one.

-- peterx
diff mbox

Patch

diff --git a/run_tests.sh b/run_tests.sh
index e1bb3a6..a4fc895 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -14,10 +14,11 @@  function usage()
 {
 cat <<EOF
 
-Usage: $0 [-g group] [-h] [-v]
+Usage: $0 [-g group] [-h] [-v] [-j N]
 
     -g: Only execute tests in the given group
     -h: Output this help text
+    -j: Execute tests in parallel
     -v: Enables verbose mode
 
 Set the environment variable QEMU=/path/to/qemu-system-ARCH to
@@ -29,7 +30,7 @@  EOF
 RUNTIME_arch_run="./$TEST_DIR/run"
 source scripts/runtime.bash
 
-while getopts "g:hv" opt; do
+while getopts "g:hj:v" opt; do
     case $opt in
         g)
             only_group=$OPTARG
@@ -38,6 +39,13 @@  while getopts "g:hv" opt; do
             usage
             exit
             ;;
+        j)
+            ut_run_queues=$OPTARG
+            if ! is_number "$ut_run_queues"; then
+                echo "Invalid -j option: $ut_run_queues"
+                exit 1
+            fi
+            ;;
         v)
             verbose="yes"
             ;;
diff --git a/scripts/functions.bash b/scripts/functions.bash
index d1d2e1c..acb306b 100644
--- a/scripts/functions.bash
+++ b/scripts/functions.bash
@@ -1,9 +1,18 @@ 
+source scripts/global.bash
+
 function run_task()
 {
 	local testname="$2"
 
+	while [[ "$(jobs | wc -l)" == $ut_run_queues ]]; do
+		# wait for any background test to finish
+		wait -n
+	done
+
 	RUNTIME_log_file="${ut_log_dir}/${testname}.log"
-	"$@"
+
+	# start the testcase in the background
+	"$@" &
 }
 
 function for_each_unittest()
@@ -53,5 +62,9 @@  function for_each_unittest()
 		fi
 	done
 	run_task "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel" "$timeout"
+
+	# wait all task finish
+	wait
+
 	exec {fd}<&-
 }
diff --git a/scripts/global.bash b/scripts/global.bash
index 77b0b29..52095bd 100644
--- a/scripts/global.bash
+++ b/scripts/global.bash
@@ -1,2 +1,13 @@ 
 : ${ut_log_dir:=logs}
 : ${ut_log_summary:=${ut_log_dir}/SUMMARY}
+: ${ut_run_queues:=1}
+
+function ut_in_parallel()
+{
+    [[ $ut_run_queues != 1 ]]
+}
+
+function is_number()
+{
+    [[ "$1" =~ ^[0-9]+$ ]]
+}