Patchwork [testsuite] Increase gcc.dg/pr43058.c timeout

Submitter Rainer Orth
Date July 14, 2010, 3:47 p.m.
Message ID <yddr5j6ysjb.fsf@manam.CeBiTec.Uni-Bielefeld.DE>
Permalink /patch/58912/
State New

Comments

Rainer Orth - July 14, 2010, 3:47 p.m.
The gcc.dg/pr43058.c test times out on most of my systems:

WARNING: program timed out.
FAIL: gcc.dg/pr43058.c (test for excess errors)

Even on an idle Sun Fire T5220 (1.2 GHz UltraSPARC-T2), it takes

real     4:56.38
user     4:54.71
sys         0.35

or on a Sun Fire X4450 (2.93 GHz Xeon X7350)

real     1:18.01
user     1:17.20
sys         0.26

As soon as the machine is loaded (e.g. make -j<2 * ncpu> check), the
test is practically guaranteed not to complete within the regular 5
minute (300 s) timeout.  I'd therefore like to increase the timeout by a
factor of 4.
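
As a rough check of the numbers above, the sketch below converts the two idle wall-clock readings to seconds and compares them with the default and the proposed timeout.  The helper names are made up for illustration; only the 300 s default, the factor of 4 and the two timings come from this message.

```python
# Figures taken from the message; function names are illustrative only.
DEFAULT_TIMEOUT_S = 300   # the regular 5 minute timeout
TIMEOUT_FACTOR = 4        # the proposed dg-timeout-factor


def to_seconds(stamp: str) -> float:
    """Convert a 'M:SS.ss' wall-clock reading to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def headroom(stamp: str, factor: int = 1) -> float:
    """Seconds left before the (scaled) timeout fires."""
    return DEFAULT_TIMEOUT_S * factor - to_seconds(stamp)


if __name__ == "__main__":
    for machine, real in (("T5220", "4:56.38"), ("X4450", "1:18.01")):
        print(f"{machine}: {to_seconds(real):.2f} s idle, "
              f"headroom {headroom(real):.2f} s at factor 1, "
              f"{headroom(real, TIMEOUT_FACTOR):.2f} s at factor {TIMEOUT_FACTOR}")
```

On the idle T5220 that leaves under 4 seconds of slack with the default timeout, so any additional load is enough to push the test over the limit.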

Ok for mainline and the 4.5 branch?

	Rainer


2010-07-09  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

	* gcc.dg/pr43058.c: Use dg-timeout-factor 4.
Mike Stump - July 23, 2010, 5:30 p.m.
On Jul 14, 2010, at 8:47 AM, Rainer Orth wrote:
> The gcc.dg/pr43058.c test times out on most of my systems:
> 
> WARNING: program timed out.
> FAIL: gcc.dg/pr43058.c (test for excess errors)
> 
> Even on an idle Sun Fire T5220 (1.2 GHz UltraSPARC-T2), it takes
> 
> real     4:56.38
> user     4:54.71
> sys         0.35
> 
> or on a Sun Fire X4450 (2.93 GHz Xeon X7350)
> 
> real     1:18.01
> user     1:17.20
> sys         0.26
> 
> As soon as the machine is loaded (e.g. make -j<2 * ncpu> check), the
> test is practically guaranteed not to complete within the regular 5
> minute (300 s) timeout.  I'd therefore like to increase the timeout by a
> factor of 4.
> 
> Ok for mainline and the 4.5 branch?

No.  I think the patch is wrong, because the testcase is wrong.  The testcase is wrong because it intentionally consumes tons of resources to try to blow the machine out of the water.

I think 43058 should be re-opened.

Philosophy: any testcase that takes more than 30 seconds on a slow machine is in danger of being a bad testcase.  Really, such testcases should be trimmed, reduced or split up, if possible.  The style of this testcase sucks.  At the least, the limit should be lowered to 10 seconds and the testcase reduced to take around 5 seconds on a slow machine (say 1 GHz), if people are happy with how slow the compiler is.

A better fix would be to design an extension to the testing infrastructure, say:

  PERF: INT test_case_name

and then we could put things like memory usage of the compiler as:

  PERF: 121883 RAM gcc.dg/pr43058.c

and compile time performance as:

  PERF: 312334 comptime gcc.dg/pr43058.c

where the number is, say, the number of cycles the compiler spent compiling the testcase.  We can then add a tool to compare two runs (a la contrib/compare_tests) and report regressions in any performance numbers.  Works for RAM usage, paging, compile time, test case run time, number of cache misses, number of spills...
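
A minimal sketch of what such a comparison tool might look like, assuming the PERF lines appear in the test log exactly as written above (value, metric, test name) and that a larger number is always worse.  Nothing like this exists in the testsuite today; the function names and the 5% tolerance are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (metric, test case name)


def parse_perf(log: str) -> Dict[Key, int]:
    """Collect 'PERF: <int> <metric> <test>' lines from a test log."""
    results: Dict[Key, int] = {}
    for line in log.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] != "PERF:":
            continue
        try:
            value = int(fields[1])
        except ValueError:
            continue  # not a well-formed PERF line, ignore it
        results[(fields[2], fields[3])] = value
    return results


def regressions(before: Dict[Key, int], after: Dict[Key, int],
                tolerance: float = 0.05) -> List[Tuple[str, str, int, int]]:
    """Report metrics that grew by more than `tolerance` between two runs."""
    found = []
    for (metric, test), old in sorted(before.items()):
        new = after.get((metric, test))
        if new is not None and new > old * (1 + tolerance):
            found.append((metric, test, old, new))
    return found


if __name__ == "__main__":
    old_run = ("PERF: 121883 RAM gcc.dg/pr43058.c\n"
               "PERF: 312334 comptime gcc.dg/pr43058.c\n")
    new_run = ("PERF: 121883 RAM gcc.dg/pr43058.c\n"
               "PERF: 400000 comptime gcc.dg/pr43058.c\n")
    for metric, test, old, new in regressions(parse_perf(old_run),
                                              parse_perf(new_run)):
        print(f"REGRESSION: {metric} {test}: {old} -> {new}")
```

The tolerance is there because cycle counts and memory figures are noisy from run to run; a real tool would probably want a per-metric threshold.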

If people are happy with the slow compiler and don't want to add the infrastructure to do better, then the testcase should just be made smaller (and the limit lowered).
Mark Mitchell - July 23, 2010, 6:05 p.m.
Mike Stump wrote:

> Philosophy: any testcase that takes more than 30 seconds on a slow
> machine is in danger of being a bad testcase.

I tend to agree.  I'm not willing to make it a hard limit, because there
really are bugs that take a lot of processing to reproduce, and they may
be appropriate.

But, it's bothered me for about a decade that we spend X% of our test
cycles running Y% of our tests, where X >> Y.  The marginal cost of
slow tests is usually not commensurate with their marginal benefit.
Rainer Orth - July 23, 2010, 6:10 p.m.
Mark Mitchell <mark@codesourcery.com> writes:

> Mike Stump wrote:
>
>> Philosophy: any testcase that takes more than 30 seconds on a slow
>> machine is in danger of being a bad testcase.
>
> I tend to agree.  I'm not willing to make it a hard limit, because there
> really are bugs that take a lot of processing to reproduce, and they may
> be appropriate.
>
> But, it's bothered me for about a decade that we spend X% of our test
> cycles running Y% of our tests, where X >> Y.  The marginal cost of
> slow tests is usually not commensurate with their marginal benefit.

Fully agreed.  The worst offender so far is a single gcc.c-torture
testcase that takes 4+ hours (2 multilibs, 8 different optimization
options) on an UltraSPARC-T2:

	http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01633.html

If this isn't changed, this single testcase runs longer than the rest of
the testsuite combined ;-(

	Rainer
Richard Guenther - July 23, 2010, 6:29 p.m.
On Fri, Jul 23, 2010 at 7:30 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Jul 14, 2010, at 8:47 AM, Rainer Orth wrote:
>> The gcc.dg/pr43058.c test times out on most of my systems:
>>
>> WARNING: program timed out.
>> FAIL: gcc.dg/pr43058.c (test for excess errors)
>>
>> Even on an idle Sun Fire T5220 (1.2 GHz UltraSPARC-T2), it takes
>>
>> real     4:56.38
>> user     4:54.71
>> sys         0.35
>>
>> or on a Sun Fire X4450 (2.93 GHz Xeon X7350)
>>
>> real     1:18.01
>> user     1:17.20
>> sys         0.26
>>
>> As soon as the machine is loaded (e.g. make -j<2 * ncpu> check), the
>> test is practically guaranteed not to complete within the regular 5
>> minute (300 s) timeout.  I'd therefore like to increase the timeout by a
>> factor of 4.
>>
>> Ok for mainline and the 4.5 branch?
>
> No.  I think the patch is wrong, because the testcase is wrong.  The testcase is wrong because it intentionally consumes tons of resources to try to blow the machine out of the water.
>
> I think 43058 should be re-opened.

Well - it was fixed (it was about memory consumption).  And to not regress
we need to simulate the original failure, which the testcase does.

Now, as for slow machines I'd rather have a { dg-effective-target fast-and-big }
that will just skip these kinds of tests on small/slow machines.

Richard.
Mark Mitchell - July 23, 2010, 7:06 p.m.
Rainer Orth wrote:

> Fully agreed.  The worst offender so far is a single gcc.c-torture
> testcase that takes 4+ hours (2 multilibs, 8 different optimization
> options) on an UltraSPARC-T2:
> 
> 	http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01633.html
> 
> If this isn't changed, this single testcase runs longer than the rest of
> the testsuite combined ;-(

That's just silly.

I will approve a patch which moves this into a separate testsuite, which
is run only upon explicit request, and we will remove this from the set
of required tests to run before checking things in.

Thanks,
Mark Mitchell - July 23, 2010, 7:14 p.m.
Jakub Jelinek wrote:

> That can't be true, because pr43058.c isn't run for all optimization levels.
> It is only run at -O2 -g.  So it definitely can't run for 4+ hours.
> With 2 multilibs, if you run with RUNTESTFLAGS='--target_board=unix\{-m32,-m64\}'
> (not the default), it can take at most 2 timeouts if it times out.

I thought Rainer was referring to limits-fnargs.c.
Jakub Jelinek - July 23, 2010, 7:14 p.m.
On Fri, Jul 23, 2010 at 12:06:02PM -0700, Mark Mitchell wrote:
> Rainer Orth wrote:
> 
> > Fully agreed.  The worst offender so far is a single gcc.c-torture
> > testcase that takes 4+ hours (2 multilibs, 8 different optimization
> > options) on an UltraSPARC-T2:
> > 
> > 	http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01633.html
> > 
> > If this isn't changed, this single testcase runs longer than the rest of
> > the testsuite combined ;-(

That can't be true, because pr43058.c isn't run for all optimization levels.
It is only run at -O2 -g.  So it definitely can't run for 4+ hours.
With 2 multilibs, if you run with RUNTESTFLAGS='--target_board=unix\{-m32,-m64\}'
(not the default), it can take at most 2 timeouts if it times out.

> That's just silly.
> 
> I will approve a patch which moves this into a separate testsuite, which
> is run only upon explicit request, and we will remove this from the set
> of required tests to run before checking things in.

	Jakub
Rainer Orth - July 23, 2010, 7:19 p.m.
Mark Mitchell <mark@codesourcery.com> writes:

> Jakub Jelinek wrote:
>
>> That can't be true, because pr43058.c isn't run for all optimization levels.
>> It is only run at -O2 -g.  So it definitely can't run for 4+ hours.
>> With 2 multilibs, if you run with RUNTESTFLAGS='--target_board=unix\{-m32,-m64\}'
>> (not the default), it can take at most 2 timeouts if it times out.
>
> I thought Rainer was referring to limits-fnargs.c.

Indeed: gcc.dg/pr43058.c is merely an annoyance and the timeout warning
is now avoided by increasing the timeout factor, but limits-fnargs.c is
completely out of control.  limits-fndefn.c is in a similar category: on
an otherwise idle T5220, a single compilation takes almost exactly 5
minutes, compared to the 16:40 minutes of limits-fnargs.c.

	Rainer
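
The "4+ hours" figure can be sanity-checked from the numbers in this thread.  The sketch below assumes every one of the 2 x 8 compilations takes about as long as the single idle compilation quoted above, which is only an approximation since the optimization options differ; applying the same 2 x 8 matrix to limits-fndefn.c is likewise an assumption.

```python
def total_runtime_s(single_compile_s: int, multilibs: int, option_sets: int) -> int:
    """Total time if every multilib/option combination costs the same."""
    return single_compile_s * multilibs * option_sets


def as_hms(seconds: int) -> str:
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    fnargs_s = 16 * 60 + 40   # 16:40 per compilation on an idle T5220
    fndefn_s = 5 * 60         # "almost exactly 5 minutes"
    print("limits-fnargs.c:", as_hms(total_runtime_s(fnargs_s, 2, 8)))
    print("limits-fndefn.c:", as_hms(total_runtime_s(fndefn_s, 2, 8)))
```

Under those assumptions limits-fnargs.c alone accounts for roughly four and a half hours, and limits-fndefn.c would add well over another hour.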
Rainer Orth - July 23, 2010, 7:21 p.m.
Jakub Jelinek <jakub@redhat.com> writes:

>> I thought Rainer was referring to limits-fnargs.c.
>
> Oops, sorry.  For limits-fnargs.c, perhaps copy it over into gcc.dg, run
> it there at only one opt level (say -O2) at this size, and shrink the test
> considerably for what stays in gcc.c-torture/compile?

The shrinking is part of my proposal in

	http://gcc.gnu.org/ml/gcc-patches/2010-07/msg01633.html

The exact amount remains to be determined.

	Rainer
Jakub Jelinek - July 23, 2010, 7:21 p.m.
On Fri, Jul 23, 2010 at 12:14:25PM -0700, Mark Mitchell wrote:
> Jakub Jelinek wrote:
> 
> > That can't be true, because pr43058.c isn't run for all optimization levels.
> > It is only run at -O2 -g.  So it definitely can't run for 4+ hours.
> > With 2 multilibs, if you run with RUNTESTFLAGS='--target_board=unix\{-m32,-m64\}'
> > (not the default), it can take at most 2 timeouts if it times out.
> 
> I thought Rainer was referring to limits-fnargs.c.

Oops, sorry.  For limits-fnargs.c, perhaps copy it over into gcc.dg, run
it there at only one opt level (say -O2) at this size, and shrink the test
considerably for what stays in gcc.c-torture/compile?

	Jakub
Mark Mitchell - July 23, 2010, 7:23 p.m.
Jakub Jelinek wrote:

> Oops, sorry.  For limits-fnargs.c, perhaps copy it over into gcc.dg, run
> it there at only one opt level (say -O2) at this size, and shrink the test
> considerably for what stays in gcc.c-torture/compile?

If we want to have a "mini" version of it that stays in
gcc.c-torture/compile, that's fine, but it shouldn't be more expensive
than most tests in the testsuite.  A few seconds at most.

I don't think we should run the full version by default, even once.
It's just not doing anything very useful, relative to the time it takes.
How often has it broken since it was added?  If the answer is "never",
then it's consumed thousands of hours, but provided almost no benefit.

Let's make it optional, and if someone (including automated testers)
wants to run it, so be it -- but let's not slow down every GCC developer
with this.

Patch

diff -r 53c3be5f051b gcc/testsuite/gcc.dg/pr43058.c
--- a/gcc/testsuite/gcc.dg/pr43058.c	Fri Jul 09 13:39:23 2010 +0200
+++ b/gcc/testsuite/gcc.dg/pr43058.c	Fri Jul 09 13:46:33 2010 +0200
@@ -1,6 +1,7 @@ 
 /* PR debug/43058 */
 /* { dg-do compile } */
 /* { dg-options "-g -O2" } */
+/* { dg-timeout-factor 4 } */
 
 extern void *f1 (void *, void *, void *);
 extern void *f2 (const char *, int, int, int, void *(*) ());