
[ada,testsuite] Parallelize check-gnat

Message ID yddlgxh6bsj.fsf@CeBiTec.Uni-Bielefeld.DE
State New

Commit Message

Rainer Orth Oct. 21, 2016, 2:01 p.m. UTC
I happened to notice that the gnat.dg testsuite run is slow even on a
reasonably fast SPARC machine (3.6 GHz SPARC T5) and together with the
libgomp testsuite (PR libgomp/66005) dominates bootstrap time: within a
make -j96 -k check, it takes 1h 18m 37s.  For unknown reasons,
check-gnat isn't parallelized though it is trivial to do and buys quite
a bit:

* On the same machine, though otherwise idle, it reduces make -j96
  check-gnat time to 2m 23s and even within a full bootstrap, the time
  goes down to 44m 6s.

* On x86 systems, there are also considerable speedups:

  2.6 GHz AMD Opteron 8435,   -j24   43m 24s  =>  33m  4s
  2.93 GHz Intel Xeon X7350,  -j16   30m  7s  =>   9m  8s
  2.67 GHz Intel Xeon X7542,  -j48   14m 56s  =>   5m 50s

Seems like a worthwhile speedup to me.  Bootstrapped without regressions
on i386-pc-solaris2.12, sparc-sun-solaris2.12, and x86_64-pc-linux-gnu.
dg-cmp-results.sh reports the sequential and parallel gnat.sum as
identical.

Ok for mainline (and eventually for 5 and 6 branches given the small
size and low risk of the patch)?

Thanks.
	Rainer

Comments

Arnaud Charlet Oct. 21, 2016, 2:17 p.m. UTC | #1
> Ok for mainline (and eventually for 5 and 6 branches given the small
> size and low risk of the patch)?

I'm not familiar with lang_checks_parallelized, but that's OK with me on
principle.

Arno
Jakub Jelinek Oct. 21, 2016, 2:55 p.m. UTC | #2
On Fri, Oct 21, 2016 at 04:01:48PM +0200, Rainer Orth wrote:
> I happened to notice that the gnat.dg testsuite run is slow even on a
> reasonably fast SPARC machine (3.6 GHz SPARC T5) and together with the
> libgomp testsuite (PR libgomp/66005) dominates bootstrap time: within a
> make -j96 -k check, it takes 1h 18m 37s.  For unknown reasons,
> check-gnat isn't parallelized though it is trivial to do and buys quite
> a bit:

check-gnat dominates anything?  That's just really weird,
it has only
# of expected passes            2544
# of unexpected failures        2
# of expected failures          24
# of unsupported tests          3

compared to the 100000+ tests in gcc/g++ or 40000+ in gfortran testsuites
it is just nothing.

libgomp is a known problem, sure, the problem with parallelizing it is that
many tests just use all available cores/threads.  Perhaps we should do some
small (at most 2 or 3 concurrent libgomp tests) parallelization of the
libgomp testsuite unless disallowed through some env var option, but in that
case bound OMP_NUM_THREADS if `getconf _NPROCESSORS_ONLN` > 32 to
`getconf _NPROCESSORS_ONLN` / 2 or something similar.
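The thread-count bound suggested above can be sketched in POSIX shell. This is only an illustration: the threshold of 32 and the halving come from the suggestion itself, while the helper name `omp_thread_bound` is hypothetical and does not exist in the libgomp testsuite. When 32 or fewer processors are online, OMP_NUM_THREADS is left alone, so libgomp keeps its default.

```sh
#!/bin/sh
# Hypothetical helper: print a bound for OMP_NUM_THREADS (half of the
# online processors) when more than 32 are online; print nothing otherwise.
omp_thread_bound() {
  if [ "$1" -gt 32 ]; then
    echo $(( $1 / 2 ))
  fi
}

ncpu=$(getconf _NPROCESSORS_ONLN)
bound=$(omp_thread_bound "$ncpu")
if [ -n "$bound" ]; then
  # Only limit libgomp's thread count on large machines.
  OMP_NUM_THREADS=$bound
  export OMP_NUM_THREADS
fi
echo "online CPUs: $ncpu, OMP_NUM_THREADS: ${OMP_NUM_THREADS:-unset}"
```

On a 96-strand machine this yields 48 threads per test, which may still be a lot if make -j96 check is running at the same time.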

I'm not strongly against your patch, I'm just very surprised it is really
needed (acats is much larger, check-gnat is small).

> 2016-10-21  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
> 
> 	* gcc-interface/Make-lang.in (lang_checks_parallelized): New target.
> 	(check_gnat_parallelize): Likewise.
> 

	Jakub
Eric Botcazou Oct. 21, 2016, 4:54 p.m. UTC | #3
> I'm not strongly against your patch, I'm just very surprised it is really
> needed (acats is much larger, check-gnat is small).

In what unit do you count?  ACATS has fewer tests than gnat.dg nowadays.
Mike Stump Oct. 21, 2016, 5:55 p.m. UTC | #4
On Oct 21, 2016, at 9:54 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> 
>> I'm not strongly against your patch, I'm just very surprised it is really
>> needed (acats is much larger, check-gnat is small).
> 
> In what unit do you count?  ACATS has fewer tests than gnat.dg nowadays.

The only unit that matters, wall seconds.
Mike Stump Oct. 21, 2016, 6:03 p.m. UTC | #5
On Oct 21, 2016, at 7:01 AM, Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> wrote:
> 
> I happened to notice that the gnat.dg testsuite run is slow

>   2.6 GHz AMD Opteron 8435,   -j24   43m 24s  =>  33m  4s
>   2.93 GHz Intel Xeon X7350,  -j16   30m  7s  =>   9m  8s
>   2.67 GHz Intel Xeon X7542,  -j48   14m 56s  =>   5m 50s
> 
> Seems like a worthwhile speedup to me.

> Ok for mainline

I like the change as well (if it shortens bootstrap and/or check).
Rainer Orth Oct. 24, 2016, 9:12 a.m. UTC | #6
Hi Jakub,

> On Fri, Oct 21, 2016 at 04:01:48PM +0200, Rainer Orth wrote:
>> I happened to notice that the gnat.dg testsuite run is slow even on a
>> reasonably fast SPARC machine (3.6 GHz SPARC T5) and together with the
>> libgomp testsuite (PR libgomp/66005) dominates bootstrap time: within a
>> make -j96 -k check, it takes 1h 18m 37s.  For unknown reasons,
>> check-gnat isn't parallelized though it is trivial to do and buys quite
>> a bit:
>
> check-gnat dominates anything?  That's just really weird,
> it has only
> # of expected passes            2544
> # of unexpected failures        2
> # of expected failures          24
> # of unsupported tests          3
>
> compared to the 100000+ tests in gcc/g++ or 40000+ in gfortran testsuites
> it is just nothing.

That's comparing apples and oranges: the gnat.dg (and acats) tests are
all compile or even run tests, while within the gcc or g++ testsuites
you're also counting dg-error, dg-warning and some such, which are much
cheaper.

What ultimately matters is wall clock time, though (from a make -j96 run
before my patch):

testsuite  start     end       #tests  #partitions
acats      12:28:51  13:24:25    2320           19
g++        12:28:57  14:00:13  210885           48
gcc        12:28:57  14:10:41  197266           90
gfortran   12:28:57  13:47:40   86959           32
gnat       12:28:52  14:17:16    5100            1
go         12:28:57  13:17:02   14636           11
obj-c++    12:28:53  12:48:47    3074            1
objc       12:28:57  13:14:38    5742            6

Here you can see what I mean by dominate: even on this relatively fast
system (3.6 GHz SPARC T5), the gnat testsuite runs for several minutes
beyond everything else in gcc/testsuite, thus determining the end of
the bootstrap.  The effect becomes much more pronounced on slower boxes
(UltraSPARC T2 for example) where the machine is almost idle, running
just a single instance of runtest for half an hour or more.
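The overrun can be quantified from the start and end columns of the table. The shell sketch below (the helper names `hms_to_s` and `elapsed` are illustrative only) assumes every run starts and ends on the same day, which holds for all rows listed. It gives 6504 s for gnat, within a second of the 6505s figure quoted further down, 6104 s for gcc and 3334 s for acats, and shows that gnat finishes 395 s (about 6.5 minutes) after gcc, the last of the other testsuites to end.

```sh
#!/bin/sh
# Convert an HH:MM:SS time of day to seconds since midnight.
# ${x#0} strips one leading zero so that fields like 08 or 09 are not
# parsed as (invalid) octal numbers inside $(( )).
hms_to_s() {
  old_ifs=$IFS
  IFS=:
  set -- $1
  IFS=$old_ifs
  h=${1#0}; m=${2#0}; s=${3#0}
  echo $(( h * 3600 + m * 60 + s ))
}

# Elapsed seconds between a start and an end time on the same day.
elapsed() {
  echo $(( $(hms_to_s "$2") - $(hms_to_s "$1") ))
}

echo "gnat:  $(elapsed 12:28:52 14:17:16) s"
echo "gcc:   $(elapsed 12:28:57 14:10:41) s"
echo "acats: $(elapsed 12:28:51 13:24:25) s"
echo "gnat ends $(elapsed 14:10:41 14:17:16) s after gcc"
```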

> libgomp is a known problem, sure, the problem with parallelizing it is that
> many tests just use all available cores/threads.  Perhaps we should do some

Right, the same holds for the Cilk+ tests as well: I'm including my
libcilkrts-on-sparc patch in my bootstraps and often see one or two
tests failing because they time out, grabbing all 96 strands within a
make -j96 check...

> small (at most 2 or 3 concurrent libgomp tests) parallelization of the
> libgomp testsuite unless disallowed through some env var option, but in that
> case bound OMP_NUM_THREADS if `getconf _NPROCESSORS_ONLN` > 32 to
> `getconf _NPROCESSORS_ONLN` / 2 or something similar.

That would certainly be a start, even though _NPROCESSORS_ONLN/2 can
still be a bit much on larger systems, especially if they are already
running make -j_NPROCESSORS_ONLN check (or with even more parallelism).
Besides, there's no reason to limit the parallel number of compile tests
in this way.

But certainly, every single bit helps: the libgomp testsuite right now
is what really dominates make check time, check-gnat was just a
low-hanging fruit.

> I'm not strongly against your patch, I'm just very surprised it is really
> needed (acats is much larger, check-gnat is small).

Not really: on that SPARC T5 system, I have (sequential gnat.dg
vs. acats with 19 partitions), all within a -j96 bootstrap:

         wall clock          #tests

gnat.dg  6505s = 108m 25s      5100
acats    3334s =  55m 34s      2320

compared to (one week later) parallel gnat.dg (5 partitions):

gnat.dg  2458s =  40m 58s      5104

Right now, gnat.dg is larger since it's run for all multilibs (two in
this case) while acats is for the default multilib only (until I finish
my `convert acats to dg' patch).

	Rainer
Jakub Jelinek Oct. 24, 2016, 9:30 a.m. UTC | #7
On Mon, Oct 24, 2016 at 11:12:20AM +0200, Rainer Orth wrote:
> Not really: on that SPARC T5 system, I have (sequential gnat.dg
> vs. acats with 19 partitions), all within a -j96 bootstrap:
> 
>          wall clock          #tests
> 
> gnat.dg  6505s = 108m 25s      5100

gnat.dg takes for me 8m (x86_64 Haswell-E, 4GHz, 2 parallel -j16
bootstraps/regtests on 8c/16ht), which is why I've been
so surprised it runs so much slower on SPARC.
Anyway, the patch is ok if it is so much slower on other systems.

	Jakub
Rainer Orth Oct. 24, 2016, 11:31 a.m. UTC | #8
Hi Jakub,

> On Mon, Oct 24, 2016 at 11:12:20AM +0200, Rainer Orth wrote:
>> Not really: on that SPARC T5 system, I have (sequential gnat.dg
>> vs. acats with 19 partitions), all within a -j96 bootstrap:
>> 
>>          wall clock          #tests
>> 
>> gnat.dg  6505s = 108m 25s      5100
>
> gnat.dg takes for me 8m (x86_64 Haswell-E, 4GHz, 2 parallel -j16
> bootstraps/regtests on 8c/16ht), which is why I've been
> so surprised it runs so much slower on SPARC.

It's 15m (-m32 and -m64) for me on a 4-socket Intel Xeon X7542 (2.67 GHz,
-j48 bootstrap on 4 x 6c/12ht), but even other x86 systems (like
AMD Opteron 8435) take about thrice as long.

	Rainer

Patch

# HG changeset patch
# Parent  13db0c5f22f787b7a09b81e1173677a02afa240d
Parallelize check-gnat

diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -863,6 +863,9 @@  ada.stagefeedback: stagefeedback-start
 	-$(MV) ada/stamp-* stagefeedback/ada
 
 lang_checks += check-gnat
+lang_checks_parallelized += check-gnat
+# For description see the check_$lang_parallelize comment in gcc/Makefile.in.
+check_gnat_parallelize = 1000
 
 check-ada: check-acats check-gnat
 check-ada-subtargets: check-acats-subtargets check-gnat-subtargets
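The patch itself only adds check-gnat to `lang_checks_parallelized` and sets `check_gnat_parallelize`; the partitioning is done by GCC's existing parallel-check machinery, described next to check_$lang_parallelize in gcc/Makefile.in. As a rough, hypothetical illustration of how a test list can be split among concurrent runners without a fixed assignment (an assumed claim-based scheme, not GCC's DejaGnu code), the shell sketch below lets several workers walk the same list and claim each test by creating a marker directory with `mkdir`, which fails if the directory already exists. Each test therefore runs exactly once however the workers interleave. The worker count, test names and temporary directories are made up for the example.

```sh
#!/bin/sh
# Illustration only: workers share one test list and claim each test by
# creating a directory; mkdir is atomic and fails if the directory exists,
# so exactly one worker wins each test.
claimdir=$(mktemp -d)
logdir=$(mktemp -d)
tests="t01 t02 t03 t04 t05 t06 t07 t08 t09 t10"

worker() {
  id=$1
  for t in $tests; do
    if mkdir "$claimdir/$t" 2>/dev/null; then
      # This worker won the claim; "run" the test by logging its name.
      echo "$t" >> "$logdir/worker$id"
    fi
  done
}

# Start three workers concurrently and wait for all of them.
for id in 1 2 3; do
  worker "$id" &
done
wait

# Every test should have been run exactly once across all workers.
runs=$(cat "$logdir"/worker* | wc -l | tr -d ' ')
unique=$(cat "$logdir"/worker* | sort -u | wc -l | tr -d ' ')
echo "tests: 10, runs: $runs, unique: $unique"
rm -rf "$claimdir" "$logdir"
```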