[libgomp,testsuite] Support parallel testing in libgomp (PR libgomp/66005)

Message ID yddlhh0hblq.fsf@lokon.CeBiTec.Uni-Bielefeld.DE
State New

Commit Message

Rainer Orth May 7, 2015, 11:26 a.m. UTC
As reported in the PR, with the addition of all those OpenACC tests,
libgomp make check times have skyrocketed since the testsuite is still
run sequentially.

Even on a reasonably fast x86 machine (4 x 2.0 GHz Xeon E7450) the run
takes 4286 seconds.

On slower sparc boxes (1.2 GHz UltraSPARC-T2) we're at 29406 seconds,
compared to 7825 seconds on the 4.9 branch.

Thus, the libgomp tests massively slow down the whole testsuite run,
being the last part to finish.

Fixing this proved trivial: I managed to almost literally copy the
solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
to libgomp.exp so the generated libgomp-test-support.exp file is found
in both the sequential and parallel cases.  This isn't an issue in
libstdc++ since all necessary variables are stored in a single
site.exp.
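
The lookup order described here can be sketched in shell.  This is only
an illustration of the search order, not how DejaGnu implements it; the
function name find_support_file is hypothetical.  In a parallel run,
runtest is started from a per-slot subdirectory (normalN), so the
generated file sits one level up.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate path that exists.
# Search order: ".." first (parallel run from a normalN/ subdirectory),
# then "." (sequential run from the testsuite build directory).
find_support_file() {
    for candidate in ../libgomp-test-support.exp libgomp-test-support.exp; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

Called from a normalN directory this would print the ../ path; called
from the testsuite build directory it would print the plain file name.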

Tested with make (1:13:53.80) and make -j128 (4:58.78) on
i386-pc-solaris2.11 (4 x 2.0 GHz Xeon E7-4850), no differences in
results according to contrib/dg-cmp-results.sh.

Ok for mainline and gcc-5 branch after some soak time?

	Rainer


2015-05-06  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	PR libgomp/66005
	* testsuite/Makefile.am (PWD_COMMAND): New variable.
	(%/site.exp): New target.
	(check_p_numbers0, check_p_numbers1, check_p_numbers2)
	(check_p_numbers3, check_p_numbers4, check_p_numbers5)
	(check_p_numbers6, check_p_numbers, check_p_subdirs)
	(check_DEJAGNU_normal_targets): New variables.
	($(check_DEJAGNU_normal_targets)): New target.
	($(check_DEJAGNU_normal_targets)): New dependency.
	(check-DEJAGNU $(check_DEJAGNU_normal_targets)): New targets.
	* testsuite/Makefile.in: Regenerate.
	* testsuite/lib/libgomp.exp: Also search in .. for
	libgomp-test-support.exp.
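
For readers unfamiliar with the check_p_* variables: as far as can be
read from the patch, check_p_numbers expands to 1, 2, 3, ... in
numeric order, and check_p_subdirs takes the first N of them, where N
is GCC_TEST_PARALLEL_SLOTS if set and non-empty, else 128.  A rough
shell restatement of that selection (the function name
parallel_targets is hypothetical, and this is a sketch, not the
Makefile logic itself):

```shell
#!/bin/sh
# Sketch: list the per-slot make targets that the parallel run fans out to.
# The slot count defaults to 128 when GCC_TEST_PARALLEL_SLOTS is unset or empty.
# (The Makefile list stops at 9999 entries; this sketch does not model that cap.)
parallel_targets() {
    slots=${GCC_TEST_PARALLEL_SLOTS:-128}
    i=1
    while [ "$i" -le "$slots" ]; do
        printf 'check-DEJAGNUnormal%s\n' "$i"
        i=$((i + 1))
    done
}
```

With GCC_TEST_PARALLEL_SLOTS=3 this lists check-DEJAGNUnormal1 through
check-DEJAGNUnormal3.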

Comments

Jakub Jelinek May 7, 2015, 11:39 a.m. UTC | #1
On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
> As reported in the PR, with the addition of all those OpenACC tests,
> libgomp make check times have skyrocketed since the testsuite is still
> run sequentially.
> 
> Even on a reasonably fast x86 machine (4 x 2.0 GHz Xeon E7450) the run
> takes 4286 seconds.
> 
> On slower sparc boxes (1.2 GHz UltraSPARC-T2) we're at 29406 seconds,
> compared to 7825 seconds on the 4.9 branch.
> 
> Thus, the libgomp tests massively slow down the whole testsuite run,
> being the last part to finish.
> 
> Fixing this proved trivial: I managed to almost literally copy the
> solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
> to libgomp.exp so the generated libgomp-test-support.exp file is found
> in both the sequential and parallel cases.  This isn't an issue in
> libstdc++ since all necessary variables are stored in a single
> site.exp.

It is far from trivial though.
The point is that most of the OpenMP tests are parallelized with the
default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
machine a lot, and the more hw threads it has, the worse it gets.

If we go forward with some parallelization of the tests, we at least should
try to export something like OMP_WAIT_POLICY=passive so that the
oversubscribed machine would at least not spend too much time in spinning.
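
A minimal sketch of what that could look like when running the
testsuite by hand on the local machine.  OMP_WAIT_POLICY is a standard
OpenMP environment variable; the wrapper name with_passive_wait is
hypothetical and not part of the patch.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with OMP_WAIT_POLICY=passive in its
# environment, asking waiting OpenMP threads not to spin actively.
# The setting is scoped to the command; the calling shell is left untouched.
with_passive_wait() {
    env OMP_WAIT_POLICY=passive "$@"
}

# Example invocation (not run here):
#   with_passive_wait make -j8 check
```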

And perhaps reconsider running all OpenACC tests 3 times, just allow
user to select which offloading target they want to test (host fallback,
the host nonshm hack, PTX, XeonPHI in the future?), and test just that
(that is pretty much how OpenMP offloading testing works).  For tests that
always want to test host fallback, I hope OpenACC offers clauses to force
the host fallback.

	Jakub
Mike Stump May 7, 2015, 6:06 p.m. UTC | #2
On May 7, 2015, at 4:39 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
>> As reported in the PR, with the addition of all those OpenACC tests,
>> libgomp make check times have skyrocketed since the testsuite is still
>> run sequentially.
>> 
>> Even on a reasonably fast x86 machine (4 x 2.0 GHz Xeon E7450) the run
>> takes 4286 seconds.
>> 
>> On slower sparc boxes (1.2 GHz UltraSPARC-T2) we're at 29406 seconds,
>> compared to 7825 seconds on the 4.9 branch.
>> 
>> Thus, the libgomp tests massively slow down the whole testsuite run,
>> being the last part to finish.
>> 
>> Fixing this proved trivial: I managed to almost literally copy the
>> solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
>> to libgomp.exp so the generated libgomp-test-support.exp file is found
>> in both the sequential and parallel cases.  This isn't an issue in
>> libstdc++ since all necessary variables are stored in a single
>> site.exp.
> 
> It is far from trivial though.
> The point is that most of the OpenMP tests are parallelized with the
> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
> machine a lot

If OpenMP cannot keep the machine busy, then the test suite should.  A 15x speedup means that OpenMP cannot keep the machine busy.  I’d not expect OpenMP to fill the gap here, so that leaves just the test suite.  So, unless someone wants to try their hand at getting some serious time from OpenMP, I think the patch lies on the path of goodness.
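
The 15x figure can be checked against the two timings quoted in the
original posting (1:13:53.80 for make, 4:58.78 for make -j128).  A
small sketch of the arithmetic; the helper names to_seconds and
speedup are made up for this illustration.

```shell
#!/bin/sh
# Convert an [H:]M:S.ss timing to seconds; awk handles the fractional part.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Ratio of sequential to parallel wall-clock time.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 1:13:53.80 4:58.78
```

That is 4433.80 s against 298.78 s, a ratio of roughly 14.8.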
Thomas Schwinge May 8, 2015, 8:40 a.m. UTC | #3
Hi!

On Thu, 7 May 2015 13:39:40 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
> > As reported in the PR, with the addition of all those OpenACC tests,
> > libgomp make check times have skyrocketed since the testsuite is still
> > run sequentially.

ACK.  And, thanks for looking into that!

> > Fixing this proved trivial: I managed to almost literally copy the
> > solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
> > to libgomp.exp so the generated libgomp-test-support.exp file is found
> > in both the sequential and parallel cases.  This isn't an issue in
> > libstdc++ since all necessary variables are stored in a single
> > site.exp.
> 
> It is far from trivial though.
> The point is that most of the OpenMP tests are parallelized with the
> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
> machine a lot, the higher number of hw threads the more.

Do you agree that we have two classes of test cases in libgomp: 1) test
cases that don't place a considerably higher load on the machine compared
to "normal" (single-threaded) execution tests, because they're just
testing some functionality that is not expected to actively depend
on/interfere with parallelism.  If needed, and/or if not already done,
such test cases can be parameterized (OMP_NUM_THREADS, OpenACC num_gangs,
num_workers, vector_length clauses, and so on) for low parallelism
levels.  And, 2) test cases that place a considerably higher load on the
machine compared to "normal" (single-threaded) execution tests, because
they're testing some functionality that actively depends on/interferes
with some kind of parallelism.  What about marking such tests specially,
such that DejaGnu will only ever schedule one of them for execution at
the same time?  For example, a new dg-* directive to run them wrapped
through »flock [libgomp/testsuite/serial.lock] [a.out]« or some such?
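
A rough sketch of the flock idea, outside of DejaGnu.  It assumes the
flock(1) utility from util-linux is installed; the wrapper name
run_serialized and the default lock path are illustrative only.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command while holding an exclusive lock on
# a shared lock file, so at most one "heavy" test executes at any time.
# Other callers block until the lock is released.
SERIAL_LOCK=${SERIAL_LOCK:-/tmp/libgomp-serial.lock}

run_serialized() {
    # flock creates the lock file if needed and exits with the
    # exit status of the wrapped command.
    flock "$SERIAL_LOCK" "$@"
}

# Example invocation (not run here):
#   run_serialized ./a.out
```

Tests of the first class would simply be run without the wrapper and
so would still overlap freely.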

> If we go forward with some parallelization of the tests, we at least should
> try to export something like OMP_WAIT_POLICY=passive so that the
> oversubscribed machine would at least not spend too much time in spinning.

(Will again have the problem that DejaGnu doesn't provide infrastructure
to communicate environment variables to boards in remote testing.)

> And perhaps reconsider running all OpenACC threads 3 times, just allow
> user to select which offloading target they want to test (host fallback,
> the host nonshm hack, PTX, XeonPHI in the future?), and test just that
> (that is pretty much how OpenMP offloading testing works).

My rationale is: if you configure GCC to support a set of offloading
devices (more than one), you'll also want to get the test coverage that
indeed all these work as expected.  (It currently doesn't matter, but...)
that's something I'd like to see improved in the libgomp OpenMP
offloading testing (once it supports more than one architecture for
offloading).

> For tests that
> always want to test host fallback, I hope OpenACC offers clauses to force
> the host fallback.

Yes.


Grüße,
 Thomas
Martin Liška Aug. 14, 2018, 8:37 a.m. UTC | #4
On 05/08/2015 10:40 AM, Thomas Schwinge wrote:
> Hi!
> 
> On Thu, 7 May 2015 13:39:40 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, May 07, 2015 at 01:26:57PM +0200, Rainer Orth wrote:
>>> As reported in the PR, with the addition of all those OpenACC tests,
>>> libgomp make check times have skyrocketed since the testsuite is still
>>> run sequentially.
> 
> ACK.  And, thanks for looking into that!
> 
>>> Fixing this proved trivial: I managed to almost literally copy the
>>> solution from libstdc++-v3/testsuite/Makefile.am, with a minimal change
>>> to libgomp.exp so the generated libgomp-test-support.exp file is found
>>> in both the sequential and parallel cases.  This isn't an issue in
>>> libstdc++ since all necessary variables are stored in a single
>>> site.exp.
>>
>> It is far from trivial though.
>> The point is that most of the OpenMP tests are parallelized with the
>> default OMP_NUM_THREADS, so running the tests in parallel oversubscribes the
>> machine a lot, the higher number of hw threads the more.
> 
> Do you agree that we have two classes of test cases in libgomp: 1) test
> cases that don't place a considerably higher load on the machine compared
> to "normal" (single-threaded) execution tests, because they're just
> testing some functionality that is not expected to actively depend
> on/interfere with parallelism.  If needed, and/or if not already done,
> such test cases can be parameterized (OMP_NUM_THREADS, OpenACC num_gangs,
> num_workers, vector_length clauses, and so on) for low parallelism
> levels.  And, 2) test cases that place a considerably higher load on the
> machine compared to "normal" (single-threaded) execution tests, because
> they're testing some functionality that actively depends on/interferes
> with some kind of parallelism.  What about marking such tests specially,
> such that DejaGnu will only ever schedule one of them for execution at
> the same time?  For example, a new dg-* directive to run them wrapped
> through »flock [libgomp/testsuite/serial.lock] [a.out]« or some such?

Looks like the thread got stuck. Anyway, I've just noticed how slow the libgomp.exp
tests are on a recent Intel machine with 160 HT cores. I'm attaching a graph of CPU
utilization and a 'ps ax | grep expect' log file that shows which tests are running.
Roughly, after 10 minutes I see a drop in utilization, and then libgomp.exp is running
mainly serially.

So I believe splitting the tests in libgomp.exp into serial and parallel would make sense.
Another idea is to override OMP_NUM_THREADS with a reasonable number, which
would also enable parallel execution of the parallel tests?
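
One way to pick such a number, sketched under the assumption that the
hardware threads are simply divided among the make jobs.  The helper
name threads_per_test is made up.  Note that OMP_NUM_THREADS only sets
the default; tests that request a thread count explicitly (for example
through a num_threads clause) are not affected.

```shell
#!/bin/sh
# Hypothetical helper: split the hardware threads evenly among the
# parallel test jobs, never going below one thread per test.
threads_per_test() {
    ncpus=$1
    jobs=$2
    n=$((ncpus / jobs))
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}

# Example invocation (not run here), assuming getconf reports the
# number of online CPUs on this system:
#   OMP_NUM_THREADS=$(threads_per_test "$(getconf _NPROCESSORS_ONLN)" 32) \
#     make -j32 check
```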

Thanks,
Martin

> 
>> If we go forward with some parallelization of the tests, we at least should
>> try to export something like OMP_WAIT_POLICY=passive so that the
>> oversubscribed machine would at least not spend too much time in spinning.
> 
> (Will again have the problem that DejaGnu doesn't provide infrastructure
> to communicate environment variables to boards in remote testing.)
> 
>> And perhaps reconsider running all OpenACC threads 3 times, just allow
>> user to select which offloading target they want to test (host fallback,
>> the host nonshm hack, PTX, XeonPHI in the future?), and test just that
>> (that is pretty much how OpenMP offloading testing works).
> 
> My rationale is: if you configure GCC to support a set of offloading
> devices (more than one), you'll also want to get the test coverage that
> indeed all these work as expected.  (It currently doesn't matter, but...)
> that's something I'd like to see improved in the libgomp OpenMP
> offloading testing (once it supports more than one architecture for
> offloading).
> 
>> For tests that
>> always want to test host fallback, I hope OpenACC offers clauses to force
>> the host fallback.
> 
> Yes.
> 
> 
> Grüße,
>  Thomas
>

Patch

# HG changeset patch
# Parent 56a827256364c7b567b751287defdb0c9eabc666
Support parallel testing in libgomp (PR libgomp/66005)

diff --git a/libgomp/testsuite/Makefile.am b/libgomp/testsuite/Makefile.am
--- a/libgomp/testsuite/Makefile.am
+++ b/libgomp/testsuite/Makefile.am
@@ -12,6 +12,71 @@  EXPECT = $(shell if test -f $(top_buildd
 	     echo $(top_srcdir)/../dejagnu/runtest; else echo runtest; fi)
 RUNTEST = "$(_RUNTEST) $(AM_RUNTESTFLAGS)"
 
+PWD_COMMAND = $${PWDCMD-pwd}
+
+%/site.exp: site.exp
+	-@test -d $* || mkdir $*
+	@srcdir=`cd $(srcdir); ${PWD_COMMAND}`; \
+	objdir=`${PWD_COMMAND}`/$*; \
+	sed -e "s|^set srcdir .*$$|set srcdir $$srcdir|" \
+	    -e "s|^set objdir .*$$|set objdir $$objdir|" \
+	    site.exp > $*/site.exp.tmp
+	@-rm -f $*/site.bak
+	@test ! -f $*/site.exp || mv $*/site.exp $*/site.bak
+	@mv $*/site.exp.tmp $*/site.exp
+
+check_p_numbers0:=1 2 3 4 5 6 7 8 9
+check_p_numbers1:=0 $(check_p_numbers0)
+check_p_numbers2:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers1)))
+check_p_numbers3:=$(addprefix 0,$(check_p_numbers1)) $(check_p_numbers2)
+check_p_numbers4:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers3)))
+check_p_numbers5:=$(addprefix 0,$(check_p_numbers3)) $(check_p_numbers4)
+check_p_numbers6:=$(foreach i,$(check_p_numbers0),$(addprefix $(i),$(check_p_numbers5)))
+check_p_numbers:=$(check_p_numbers0) $(check_p_numbers2) $(check_p_numbers4) $(check_p_numbers6)
+check_p_subdirs=$(wordlist 1,$(if $(GCC_TEST_PARALLEL_SLOTS),$(GCC_TEST_PARALLEL_SLOTS),128),$(check_p_numbers))
+check_DEJAGNU_normal_targets = $(addprefix check-DEJAGNUnormal,$(check_p_subdirs))
+$(check_DEJAGNU_normal_targets): check-DEJAGNUnormal%: normal%/site.exp
+
+check-DEJAGNU $(check_DEJAGNU_normal_targets): check-DEJAGNU%: site.exp
+	$(if $*,@)AR="$(AR)"; export AR; \
+	RANLIB="$(RANLIB)"; export RANLIB; \
+	if [ -z "$*" ] && [ "$(filter -j, $(MFLAGS))" = "-j" ]; then \
+	  rm -rf normal-parallel || true; \
+	  mkdir normal-parallel; \
+	  $(MAKE) $(AM_MAKEFLAGS) $(check_DEJAGNU_normal_targets); \
+	  rm -rf normal-parallel || true; \
+	  for idx in $(check_p_subdirs); do \
+	    if [ -d normal$$idx ]; then \
+	      mv -f normal$$idx/libgomp.sum normal$$idx/libgomp.sum.sep; \
+	      mv -f normal$$idx/libgomp.log normal$$idx/libgomp.log.sep; \
+	    fi; \
+	  done; \
+	  $(SHELL) $(srcdir)/../../contrib/dg-extract-results.sh \
+	    normal[0-9]*/libgomp.sum.sep > libgomp.sum; \
+	  $(SHELL) $(srcdir)/../../contrib/dg-extract-results.sh -L \
+	    normal[0-9]*/libgomp.log.sep > libgomp.log; \
+	  exit 0; \
+	fi; \
+	srcdir=`$(am__cd) $(srcdir) && pwd`; export srcdir; \
+	EXPECT=$(EXPECT); export EXPECT; \
+	runtest=$(RUNTEST); \
+	if [ -z "$$runtest" ]; then runtest=runtest; fi; \
+	tool=libgomp; \
+	if [ -n "$*" ]; then \
+	  if [ -f normal-parallel/finished ]; then rm -rf "$*"; exit 0; fi; \
+	  GCC_RUNTEST_PARALLELIZE_DIR=`${PWD_COMMAND}`/normal-parallel; \
+	  export GCC_RUNTEST_PARALLELIZE_DIR; \
+	  cd "$*"; \
+	fi; \
+	if $(SHELL) -c "$$runtest --version" > /dev/null 2>&1; then \
+	  $$runtest $(AM_RUNTESTFLAGS) $(RUNTESTDEFAULTFLAGS) \
+		    $(RUNTESTFLAGS); \
+	  if [ -n "$*" ]; then \
+	    touch $$GCC_RUNTEST_PARALLELIZE_DIR/finished; \
+	  fi; \
+	else \
+	  echo "WARNING: could not find \`runtest'" 1>&2; :;\
+	fi
 
 # Instead of directly in ../testsuite/libgomp-test-support.exp.in, the
 # following variables have to be "routed through" this Makefile, for expansion
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -33,7 +33,8 @@  load_gcc_lib torture-options.exp
 load_gcc_lib fortran-modules.exp
 
 # Try to load a test support file, built during libgomp configuration.
-load_file libgomp-test-support.exp
+# Search in both .. and . to support parallel and sequential testing.
+load_file -1 ../libgomp-test-support.exp libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
 # offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells