
parallel check output changes?

Message ID 20140918184455.GB28595@gate.crashing.org
State New

Commit Message

Segher Boessenkool Sept. 18, 2014, 6:44 p.m. UTC
On Thu, Sep 18, 2014 at 07:36:09PM +0200, Jakub Jelinek wrote:
> Segher on IRC mentioned that changing result_re in dg-extract-results.py
> should help here (or disabling the python version, *.sh version should
> sort everything).

I am testing a patch that is just

diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
@@ -117,7 +117,7 @@ class Prog:
         self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
         self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                                      r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
-                                     r'|KFAIL):\s*(\S+)')
+                                     r'|KFAIL):\s*(.+)')
         self.completed_re = re.compile (r'.* completed at (.*)')
         # Pieces of text to write at the head of the output.
         # start_line is a pair in which the first element is a datetime
Relatedly, is it just me or are most lines of the test summaries (the "#"
lines after "===") missing since the parallelisation patches?


Segher

Comments

Segher Boessenkool Sept. 19, 2014, 9:37 a.m. UTC | #1
On Thu, Sep 18, 2014 at 01:44:55PM -0500, Segher Boessenkool wrote:
> I am testing a patch that is just
> 
> 
> diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
> index cccbfd3..3781423 100644
> --- a/contrib/dg-extract-results.py
> +++ b/contrib/dg-extract-results.py
> @@ -117,7 +117,7 @@ class Prog:
>          self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
>          self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
>                                       r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
> -                                     r'|KFAIL):\s*(\S+)')
> +                                     r'|KFAIL):\s*(.+)')
>          self.completed_re = re.compile (r'.* completed at (.*)')
>          # Pieces of text to write at the head of the output.
>          # start_line is a pair in which the first element is a datetime

Tested that with four runs on powerpc64-linux, four configs each time;
test-summary shows the same in all cases.  Many lines have moved compared
to without the patch, but that cannot be helped.  Okay for mainline?


2014-09-19  Segher Boessenkool  <segher@kernel.crashing.org>

contrib/
	* dg-extract-results.py (Prog.result_re): Include options in test name.
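The effect of the one-character change can be seen in isolation.  This is a standalone sketch reusing the two patterns from the patch (not the surrounding Prog class); the sample line is hypothetical:

```python
import re

# The two alternatives from the patch: \S+ stops at the first space,
# while (.+) keeps the options/trailing text as part of the captured name.
old_re = re.compile(r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                    r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
                    r'|KFAIL):\s*(\S+)')
new_re = re.compile(r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                    r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
                    r'|KFAIL):\s*(.+)')

line = 'PASS: gcc.dg/tls/diag-2.c  (test for errors, line 4)'
print(old_re.match(line).group(2))  # -> gcc.dg/tls/diag-2.c
print(new_re.match(line).group(2))  # -> gcc.dg/tls/diag-2.c  (test for errors, line 4)
```

With the wider capture, two result lines for the same file but different options (or different message texts) get distinct sort keys.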


> Relatedly, is it just me or are most lines of the test summaries (the "#"
> lines after "===") missing since the parallelisation patches?

This is still open.


I also did some timings for make -j60 -k check, same -m64,-m32,-m32/-mpowerpc64,
-m64/-mlra configs.  A run takes 65m, is effectively 42x parallel, and has 15%
system time.


Segher
Mike Stump Sept. 19, 2014, 4:30 p.m. UTC | #2
On Sep 19, 2014, at 2:37 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> On Thu, Sep 18, 2014 at 01:44:55PM -0500, Segher Boessenkool wrote:
>> I am testing a patch that is just
>> 
>> 
>> diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
>> index cccbfd3..3781423 100644
>> --- a/contrib/dg-extract-results.py
>> +++ b/contrib/dg-extract-results.py
>> @@ -117,7 +117,7 @@ class Prog:
>>         self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
>>         self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
>>                                      r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
>> -                                     r'|KFAIL):\s*(\S+)')
>> +                                     r'|KFAIL):\s*(.+)')
>>         self.completed_re = re.compile (r'.* completed at (.*)')
>>         # Pieces of text to write at the head of the output.
>>         # start_line is a pair in which the first element is a datetime
> 
> Tested that with four runs on powerpc64-linux, four configs each time;
> test-summary
> shows the same in all cases.  Many lines have moved compared to without
> the patch, but that cannot be helped.  Okay for mainline?

Ok.

> I also did some timings for make -j60 -k check, same -m64,-m32,-m32/-mpowerpc64,
> -m64/-mlra configs.  A run takes 65m, is effectively 42x parallel, and has 15%
> system time.

Thanks for the work and for the timings.
Richard Sandiford Sept. 23, 2014, 3:33 p.m. UTC | #3
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Thu, Sep 18, 2014 at 01:44:55PM -0500, Segher Boessenkool wrote:
>> I am testing a patch that is just
>> 
>> 
>> diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
>> index cccbfd3..3781423 100644
>> --- a/contrib/dg-extract-results.py
>> +++ b/contrib/dg-extract-results.py
>> @@ -117,7 +117,7 @@ class Prog:
>>          self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
>>          self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
>>                                       r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
>> -                                     r'|KFAIL):\s*(\S+)')
>> +                                     r'|KFAIL):\s*(.+)')
>>          self.completed_re = re.compile (r'.* completed at (.*)')
>>          # Pieces of text to write at the head of the output.
>>          # start_line is a pair in which the first element is a datetime
>
> Tested that with four runs on powerpc64-linux, four configs each time;
> test-summary
> shows the same in all cases.  Many lines have moved compared to without
> the patch, but that cannot be helped.  Okay for mainline?
>
>
> 2014-09-19  Segher Boessenkool  <segher@kernel.crashing.org>
>
> contrib/
> 	* dg-extract-results.py (Prog.result_re): Include options in test name.

FWIW, the \S+ thing was deliberate.  When one test is run multiple times
with different options, those options aren't necessarily tried in
alphabetical order.  The old sh/awk script therefore used just the test
name as the key and kept tests with the same name in the order that
they were encountered:

/^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
  testname=\$2
  # Ugly hack for gfortran.dg/dg.exp
  if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
    testname="h"testname
}

(note the "$2").  This means that the output of the script is in the same
order as it would be for non-parallel runs.  I was following (or trying
to follow) that behaviour in the python script.
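A minimal model of that keying, with hypothetical data: keying only on the bare test name and using a stable sort keeps same-named tests in encounter order, mirroring what the awk snippet above does with $2.

```python
# Key only on the bare test name (second field).  Python's sort is stable,
# so the two foo.c lines keep the order in which they were encountered
# (-O2 before -O0), even though those options are not alphabetical.
lines = [
    'PASS: foo.c -O2 execution test',
    'PASS: bar.c (test for excess errors)',
    'PASS: foo.c -O0 execution test',
]
merged = sorted(lines, key=lambda l: l.split()[1])
print(merged)
# -> ['PASS: bar.c (test for excess errors)',
#     'PASS: foo.c -O2 execution test',
#     'PASS: foo.c -O0 execution test']
```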

Your patch instead sorts based on the full test name, including options,
which means that the output no longer matches what you'd get from a
non-parallel run.  AFAICT, it also no longer matches what you'd get from
the .sh version.  That might be OK, just thought I'd mention it.

Thanks,
Richard
Jakub Jelinek Sept. 23, 2014, 3:42 p.m. UTC | #4
On Tue, Sep 23, 2014 at 04:33:19PM +0100, Richard Sandiford wrote:
> FWIW, the \S+ thing was deliberate.  When one test is run multiple times
> with different options, those options aren't necessarily tried in
> alphabetical order.  The old sh/awk script therefore used just the test
> name as the key and kept tests with the same name in the order that
> they were encountered:
> 
> /^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
>   testname=\$2
>   # Ugly hack for gfortran.dg/dg.exp
>   if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
>     testname="h"testname
> }
> 
> (note the "$2").  This means that the output of the script is in the same
> order as it would be for non-parallel runs.  I was following (or trying
> to follow) that behaviour in the python script.

My understanding was that the sh version sorts on the testcase name followed
by the full line, and then removes whatever was there before the
PASS/XPASS etc.  So while e.g. whether some test PASSed or FAILed is then
more important than the options, if two tests PASS, the options are still
used for the sorting.  Note that before the parallelization changes,
usually the same test filename would all be run by a single runtest
instance, so it really didn't matter that much.
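A rough model of that description (an assumption about the .sh behaviour, not the actual script): key on the test name first, then on the whole line, so the status and options still break ties between lines for the same test.

```python
# Hypothetical lines; the key is (test name, full line), so bar.c sorts
# first, and the two foo.c lines are ordered by the full text
# ('FAIL: ...' < 'PASS: ...' lexicographically).
lines = [
    'PASS: foo.c -O2 execution test',
    'FAIL: foo.c -O0 execution test',
    'PASS: bar.c execution test',
]
merged = sorted(lines, key=lambda l: (l.split()[1], l))
print(merged)
# -> ['PASS: bar.c execution test',
#     'FAIL: foo.c -O0 execution test',
#     'PASS: foo.c -O2 execution test']
```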

> Your patch instead sorts based on the full test name, including options,
> which means that the output no longer matches what you'd get from a
> non-parallel run.  AFAICT, it also no longer matches what you'd get from
> the .sh version.  That might be OK, just thought I'd mention it.

I'm afraid there is not enough info to reconstruct the order the serial
version has.

	Jakub
Andrew MacLeod Sept. 24, 2014, 2:54 p.m. UTC | #5
On 09/23/2014 11:33 AM, Richard Sandiford wrote:
> Segher Boessenkool<segher@kernel.crashing.org>  writes:
>> On Thu, Sep 18, 2014 at 01:44:55PM -0500, Segher Boessenkool wrote:
>>> I am testing a patch that is just
>>>
>>>
>>> diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
>>> index cccbfd3..3781423 100644
>>> --- a/contrib/dg-extract-results.py
>>> +++ b/contrib/dg-extract-results.py
>>> @@ -117,7 +117,7 @@ class Prog:
>>>           self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
>>>           self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
>>>                                        r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
>>> -                                     r'|KFAIL):\s*(\S+)')
>>> +                                     r'|KFAIL):\s*(.+)')
>>>           self.completed_re = re.compile (r'.* completed at (.*)')
>>>           # Pieces of text to write at the head of the output.
>>>           # start_line is a pair in which the first element is a datetime
>> Tested that with four runs on powerpc64-linux, four configs each time;
>> test-summary
>> shows the same in all cases.  Many lines have moved compared to without
>> the patch, but that cannot be helped.  Okay for mainline?
>>
>>
>> 2014-09-19  Segher Boessenkool<segher@kernel.crashing.org>
>>
>> contrib/
>> 	* dg-extract-results.py (Prog.result_re): Include options in test name.
> FWIW, the \S+ thing was deliberate.  When one test is run multiple times
> with different options, those options aren't necessarily tried in
> alphabetical order.  The old sh/awk script therefore used just the test
> name as the key and kept tests with the same name in the order that
> they were encountered:
>
> /^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
>    testname=\$2
>    # Ugly hack for gfortran.dg/dg.exp
>    if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
>      testname="h"testname
> }
>
> (note the "$2").  This means that the output of the script is in the same
> order as it would be for non-parallel runs.  I was following (or trying
> to follow) that behaviour in the python script.
>
> Your patch instead sorts based on the full test name, including options,
> which means that the output no longer matches what you'd get from a
> non-parallel run.  AFAICT, it also no longer matches what you'd get from
> the .sh version.  That might be OK, just thought I'd mention it.
>
> Thanks,
> Richard
>
Is this supposed to be resolved now?  I'm still seeing some issues with a
branch cut from mainline from yesterday.  This is from the following
sequence:

check out revision 215511, build, make -j16 check, make -j16 check,
then compare all the .sum files:

PASS: gcc.dg/tls/asm-1.c  (test for errors, line 7)
PASS: gcc.dg/tls/asm-1.c (test for excess errors)
PASS: gcc.dg/tls/debug-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 4)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 5)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 6)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 7)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 11)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 12)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 13)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 14)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 17)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 18)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 19)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 20)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 22)

and then
PASS: gcc.dg/tls/asm-1.c  (test for errors, line 7)
PASS: gcc.dg/tls/asm-1.c (test for excess errors)
PASS: gcc.dg/tls/debug-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 11)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 12)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 13)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 14)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 17)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 18)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 19)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 20)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 22)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 4)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 5)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 6)
PASS: gcc.dg/tls/diag-2.c  (test for errors, line 7)

It looks like the first time it sorted by line number numerically (or just
happened to leave the run order), and the second time it did the sort
alphabetically...
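That reading is consistent with a plain string sort, which compares character by character and so puts "line 11" before "line 4" (a hypothetical snippet, using the test names from the listing above):

```python
# Lexicographic sort: '1' < '4', so the two-digit line numbers sort first,
# exactly the reordering seen between the two runs.
names = ['gcc.dg/tls/diag-2.c  (test for errors, line %d)' % n
         for n in (4, 5, 6, 7, 11, 12)]
print(sorted(names)[0])  # -> gcc.dg/tls/diag-2.c  (test for errors, line 11)
```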

Andrew
Segher Boessenkool Sept. 24, 2014, 4:10 p.m. UTC | #6
On Wed, Sep 24, 2014 at 10:54:57AM -0400, Andrew MacLeod wrote:
> On 09/23/2014 11:33 AM, Richard Sandiford wrote:
> >Your patch instead sorts based on the full test name, including options,
> >which means that the output no longer matches what you'd get from a
> >non-parallel run.  AFAICT, it also no longer matches what you'd get from
> >the .sh version.  That might be OK, just thought I'd mention it.

With the parallelisation changes the output was in pretty random order.  My
patch made that a fixed order again, albeit a different one from before.

> Is this suppose to be resolved now?  I'm still seeing some issues with a 
> branch cut from mainline from yesterday.   This is from the following 
> sequence:
> 
> check out revision 215511 , build, make -j16 check, make -j16 check, 
> then compare all the .sum files:

I don't understand what exactly you did; you have left out some steps
I think?


Segher
Andrew MacLeod Sept. 24, 2014, 4:29 p.m. UTC | #7
On 09/24/2014 12:10 PM, Segher Boessenkool wrote:
> On Wed, Sep 24, 2014 at 10:54:57AM -0400, Andrew MacLeod wrote:
>> On 09/23/2014 11:33 AM, Richard Sandiford wrote:
>>> Your patch instead sorts based on the full test name, including options,
>>> which means that the output no longer matches what you'd get from a
>>> non-parallel run.  AFAICT, it also no longer matches what you'd get from
>>> the .sh version.  That might be OK, just thought I'd mention it.
> With the parallelisation changes the output was in pretty random order.  My
> patch made that a fixed order again, albeit a different one from before.
>
>> Is this suppose to be resolved now?  I'm still seeing some issues with a
>> branch cut from mainline from yesterday.   This is from the following
>> sequence:
>>
>> check out revision 215511 , build, make -j16 check, make -j16 check,
>> then compare all the .sum files:
> I don't understand what exactly you did; you have left out some steps
> I think?
>
What?  No, like what?  Check out a tree, basic configure and build from
scratch (./configure --verbose, make -j16 all), and then run make check
twice in a row, literally "make -j16 -i check", nothing in between, so
the compiler and toolchain are exactly the same, and different results.
Same way I've done it forever, except I am still getting some different
results from run to run.  Target is a normal build-x86_64-unknown-linux-gnu.

What I'm saying is something still isn't getting sorted all the time
(maybe if a section wasn't split up, it doesn't sort?), or all the
patches to fix it aren't in, or there is something else still amok.
Notice it isn't the options that are the problem this time; it's the
trailing line number of the test case warning.  One is in numerical
order, the other is in alphabetical order.

I'm running it a third time now; we'll see if it's different from both
the others or not.

Andrew
Andrew MacLeod Sept. 24, 2014, 5:58 p.m. UTC | #8
On 09/24/2014 12:29 PM, Andrew MacLeod wrote:
> On 09/24/2014 12:10 PM, Segher Boessenkool wrote:
>> On Wed, Sep 24, 2014 at 10:54:57AM -0400, Andrew MacLeod wrote:
>>> On 09/23/2014 11:33 AM, Richard Sandiford wrote:
>>>> Your patch instead sorts based on the full test name, including 
>>>> options,
>>>> which means that the output no longer matches what you'd get from a
>>>> non-parallel run.  AFAICT, it also no longer matches what you'd get 
>>>> from
>>>> the .sh version.  That might be OK, just thought I'd mention it.
>> With the parallelisation changes the output was in pretty random order.
>> My patch made that a fixed order again, albeit a different one from before.
>>
>>> Is this suppose to be resolved now?  I'm still seeing some issues 
>>> with a
>>> branch cut from mainline from yesterday.   This is from the following
>>> sequence:
>>>
>>> check out revision 215511 , build, make -j16 check, make -j16 check,
>>> then compare all the .sum files:
>> I don't understand what exactly you did; you have left out some steps
>> I think?
>>
> What?  no.. like what?  check out a tree, basic configure and build 
> from scratch (./configure --verbose, make -j16 all)  and then run make 
> check twice in a row.. literally "make -j16 -i check".  nothing in 
> between. so the compiler and toolchain are exactly the same. and 
> different results.  same way Ive done it forever.  except I am still 
> getting some  different results from run to run.  target is a normal 
> build-x86_64-unknown-linux-gnu
>
> what I'm saying is something still isn't all getting sorted all the 
> time (maybe if a section wasn't split up, it doesn't sort?), or all 
> the patches to fix it aren't in, or there is something else still 
> amok.  Notice it isn't options that is the problem this time.. its the 
> trailing line number of the test case warning. One is in numerical 
> order, the other is in alphabetical order.
>
> Im running it a third time now.. we'll see if its different than both 
> the others or not.
>
> Andrew

Ah, interesting.

The third run has a gcc.sum that is exactly the same as the first run,
so only the second run differs, and it seems to be from an alphabetical
sort.  So runs 1 and 3 match.
The gfortran.sum from the third run is identical to the *second* run,
but it is different from the *first* run.  So runs 2 and 3 match.

the two runs that match (2nd and 3rd run) look like:
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single  -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single  -O2 execution test
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib  -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib  -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single  -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single  -O2 execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib  -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib  -O2 -lcaf_single execution test

and the odd one out (first run):
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib  -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib  -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single  -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single  -O2 execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib  -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib  -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single  -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single  -O2 execution test

Looks like the first run was sorted, and the other two weren't.

There must be some condition under which we don't sort the results, or
another place which needs to be tweaked to do the sort as well...?

Andrew
Andrew MacLeod Sept. 25, 2014, 12:22 p.m. UTC | #9
On 09/24/2014 01:58 PM, Andrew MacLeod wrote:
> On 09/24/2014 12:29 PM, Andrew MacLeod wrote:
>>
>
> AH. interesting.
>
> The third run has a gcc.sum that is exactly the same as the first run. 
> so only the second run differs, and it seems to be from an 
> alphabetical sort.  So run 3 and 1 match.
> the gfortran.sum from the third run is identical to the *second* run, 
> but it is different from the *first* run.  so run 2 and 3 match.
>
> the two runs that match (2nd and 3rd run) look like:
> PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single  -O2 (test 
> for excess errors)
> PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single  -O2 
> execution test
> PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib  -O2 
> -lcaf_single (test for excess errors)
> PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib  -O2 
> -lcaf_single execution test
> PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single  -O2 (test 
> for excess errors)
> PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single  -O2 
> execution test
> PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib  -O2 
> -lcaf_single (test for excess errors)
> PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib  -O2 
> -lcaf_single execution test
>
> and the odd one out (firstrun:)
> PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib  -O2 
> -lcaf_single (test for excess errors)
> PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib  -O2 
> -lcaf_single execution test
> PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single  -O2 (test 
> for excess errors)
> PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single  -O2 
> execution test
> PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib  -O2 
> -lcaf_single (test for excess errors)
> PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib  -O2 
> -lcaf_single execution test
> PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single  -O2 (test 
> for excess errors)
> PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single  -O2 
> execution test
>
> looks like the first run was sorted, and the other 2 weren't.
>
> There must be some condition under which we don't sort the results? or 
> another place which needs to be tweaked to do the sort as well...?
>
> Andrew
>
So to be fair, I could use test_summary, but I think the concern is
warranted: if this inconsistent ordering can happen to PASSes, I would
expect the same non-deterministic behaviour if those tests happen to
FAIL.  We just have far fewer FAILs, so we aren't seeing it with
test_summary at the moment...

Aggregating all my .sum files, I see a sampling of about 257,000 PASSes,
whereas I see a total of 141 FAILs.  FAILs account for less than 0.06% of
the output.  (I'm getting an average of about 510 mis-ordered PASSes, so
it only affects a small portion of them as well.)

I would think the output of .sum needs to be consistent from one run to 
the next in order for test_summary to consistently report its results as 
well.

Andrew
Segher Boessenkool Sept. 25, 2014, 5:02 p.m. UTC | #10
On Thu, Sep 25, 2014 at 08:22:29AM -0400, Andrew MacLeod wrote:
> So to be fair, I could use test_summary, but I think the concern is 
> warranted because if this inconsistent ordering can happen to PASS, I 
> would expect the same non-deterministic behaviour if those tests happen 
> to FAIL.  we just have far less FAILS so we aren't seeing it with 
> test_summary at the moment...
> 
> Aggregating all my .sum files,  I see a sampling of about 257,000 PASSs, 
> whereas I see a total of 141 FAILs.  FAILs only account for < 0.06% of 
> the output. ( I'm getting an average of about 510 mis-ordered PASSs, so 
> it only affects a small portion of them as well.)

0.24% here (2241 FAILs, 917715 PASSes).

You're seeing about 1 in 500 misordered, so if it were independent (which
of course it is not) I should see it in the FAILs already.
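A back-of-envelope version of that argument, using the counts quoted in the two messages (illustrative arithmetic only):

```python
# Andrew: ~510 misordered out of ~257,000 PASSes -> roughly 1 in 500.
misorder_rate = 510 / 257000
# Segher: 2241 FAILs.  If misordering were independent of the result,
# the expected number of misordered FAILs in his runs would be:
expected = 2241 * misorder_rate
print(round(expected, 1))  # -> 4.4, i.e. a handful should already be visible
```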

> I would think the output of .sum needs to be consistent from one run to 
> the next in order for test_summary to consistently report its results as 
> well.

Yes.  There also is the problem of the summaries being messed up (which
they were already before the parallelisation changes, but now the result
is much worse).

I'll have another look.


Segher

Patch

diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
@@ -117,7 +117,7 @@  class Prog:
         self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
         self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                                      r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
-                                     r'|KFAIL):\s*(\S+)')
+                                     r'|KFAIL):\s*(.+)')
         self.completed_re = re.compile (r'.* completed at (.*)')
         # Pieces of text to write at the head of the output.
         # start_line is a pair in which the first element is a datetime