Message ID | 20140918184455.GB28595@gate.crashing.org
---|---
State | New
On Thu, Sep 18, 2014 at 01:44:55PM -0500, Segher Boessenkool wrote:
> I am testing a patch that is just
>
>
> diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
> index cccbfd3..3781423 100644
> --- a/contrib/dg-extract-results.py
> +++ b/contrib/dg-extract-results.py
> @@ -117,7 +117,7 @@ class Prog:
>      self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
>      self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
>                                   r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
> -                                 r'|KFAIL):\s*(\S+)')
> +                                 r'|KFAIL):\s*(.+)')
>      self.completed_re = re.compile (r'.* completed at (.*)')
>      # Pieces of text to write at the head of the output.
>      # start_line is a pair in which the first element is a datetime

Tested that with four runs on powerpc64-linux, four configs each time;
test-summary shows the same in all cases.  Many lines have moved compared
to without the patch, but that cannot be helped.  Okay for mainline?


2014-09-19  Segher Boessenkool  <segher@kernel.crashing.org>

contrib/
	* dg-extract-results.py (Prog.result_re): Include options in test name.


> Relatedly, is it just me or are most lines of the test summaries (the "#"
> lines after "===") missing since the parallelisation patches?

This is still open.

I also did some timings for make -j60 -k check, same -m64, -m32,
-m32/-mpowerpc64, -m64/-mlra configs.  A run takes 65m, is effectively
42x parallel, and has 15% system time.


Segher
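As an aside, the effect of the one-character regex change can be sketched outside the script (illustrative only; the sample result line is typical DejaGnu output, not taken from the thread):

```python
import re

# Old pattern: \S+ stops at the first whitespace, so options and the
# "(test for ...)" text after the file name are not part of the key.
old_re = re.compile(r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                    r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
                    r'|KFAIL):\s*(\S+)')
# New pattern: .+ keeps the whole remainder of the line as the test name.
new_re = re.compile(r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                    r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
                    r'|KFAIL):\s*(.+)')

line = 'PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)'
print(old_re.match(line).group(2))  # gcc.dg/tls/diag-2.c
print(new_re.match(line).group(2))  # gcc.dg/tls/diag-2.c (test for errors, line 4)
```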
On Sep 19, 2014, at 2:37 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> On Thu, Sep 18, 2014 at 01:44:55PM -0500, Segher Boessenkool wrote:
>> I am testing a patch that is just
>>
>> [diff against contrib/dg-extract-results.py snipped]
>
> Tested that with four runs on powerpc64-linux, four configs each time;
> test-summary shows the same in all cases.  Many lines have moved compared
> to without the patch, but that cannot be helped.  Okay for mainline?

Ok.

> I also did some timings for make -j60 -k check, same -m64, -m32,
> -m32/-mpowerpc64, -m64/-mlra configs.  A run takes 65m, is effectively
> 42x parallel, and has 15% system time.

Thanks for the work and for the timings.
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Thu, Sep 18, 2014 at 01:44:55PM -0500, Segher Boessenkool wrote:
>> I am testing a patch that is just
>>
>> [diff against contrib/dg-extract-results.py snipped]
>
> Tested that with four runs on powerpc64-linux, four configs each time;
> test-summary shows the same in all cases.  Many lines have moved compared
> to without the patch, but that cannot be helped.  Okay for mainline?
>
>
> 2014-09-19  Segher Boessenkool  <segher@kernel.crashing.org>
>
> contrib/
> 	* dg-extract-results.py (Prog.result_re): Include options in test name.

FWIW, the \S+ thing was deliberate.  When one test is run multiple times
with different options, those options aren't necessarily tried in
alphabetical order.  The old sh/awk script therefore used just the test
name as the key and kept tests with the same name in the order that
they were encountered:

    /^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
      testname=\$2
      # Ugly hack for gfortran.dg/dg.exp
      if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
        testname="h"testname
    }

(note the "$2").  This means that the output of the script is in the same
order as it would be for non-parallel runs.  I was following (or trying
to follow) that behaviour in the python script.

Your patch instead sorts based on the full test name, including options,
which means that the output no longer matches what you'd get from a
non-parallel run.  AFAICT, it also no longer matches what you'd get from
the .sh version.  That might be OK, just thought I'd mention it.

Thanks,
Richard
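Richard's point about keying on the name alone can be illustrated with a stable sort (the result lines below are hypothetical; Python's sort preserves the encounter order of equal keys, just as the awk script's name-only key did):

```python
# Hypothetical result lines, in the order a serial run produced them.
results = [
    'PASS: b.c -O2 execution test',
    'PASS: a.c -fcoarray=single execution test',
    'PASS: a.c -fcoarray=lib execution test',
]

# Keying on the test name alone (field 2, like the awk script's $2):
# the two a.c results keep their original relative order.
by_name = sorted(results, key=lambda r: r.split()[1])

# Keying on the whole line, options included: the a.c results are
# reordered alphabetically by their options ("lib" before "single").
by_full = sorted(results)

print(by_name[0])  # PASS: a.c -fcoarray=single execution test
print(by_full[0])  # PASS: a.c -fcoarray=lib execution test
```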
On Tue, Sep 23, 2014 at 04:33:19PM +0100, Richard Sandiford wrote:
> FWIW, the \S+ thing was deliberate.  When one test is run multiple times
> with different options, those options aren't necessarily tried in
> alphabetical order.  The old sh/awk script therefore used just the test
> name as the key and kept tests with the same name in the order that
> they were encountered:
>
> [awk fragment snipped]
>
> (note the "$2").  This means that the output of the script is in the same
> order as it would be for non-parallel runs.  I was following (or trying
> to follow) that behaviour in the python script.

My understanding was that the sh version sorts on the testcase name
followed by the full line, and then removes whatever came before the
PASS/XPASS etc.  So while e.g. whether some test PASSed or FAILed is
more important than the option, if two tests PASS, the options are still
used for the sorting.

Note that before the parallelization changes, all runs of the same test
filename would usually be handled by a single runtest instance, so it
really didn't matter that much.

> Your patch instead sorts based on the full test name, including options,
> which means that the output no longer matches what you'd get from a
> non-parallel run.  AFAICT, it also no longer matches what you'd get from
> the .sh version.  That might be OK, just thought I'd mention it.

I'm afraid there is not enough info to reconstruct the order the serial
version has.

	Jakub
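A loose model of the ordering Jakub describes (sort on the test name first, then on the full line, so status and options still break ties; the lines are made up for illustration):

```python
# Two results for the same test name: the name ties, so the full line
# decides, which puts FAIL before PASS and orders equal statuses by option.
lines = [
    'PASS: t.c -O2 execution test',
    'FAIL: t.c -O0 execution test',
]
lines.sort(key=lambda l: (l.split()[1], l))
print(lines[0])  # FAIL: t.c -O0 execution test
```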
On 09/23/2014 11:33 AM, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> [patch and test report snipped]
>
> [...]
>
> Your patch instead sorts based on the full test name, including options,
> which means that the output no longer matches what you'd get from a
> non-parallel run.  AFAICT, it also no longer matches what you'd get from
> the .sh version.  That might be OK, just thought I'd mention it.
>
> Thanks,
> Richard

Is this supposed to be resolved now?  I'm still seeing some issues with a
branch cut from mainline from yesterday.  This is from the following
sequence: check out revision 215511, build, make -j16 check, make -j16
check, then compare all the .sum files:

PASS: gcc.dg/tls/asm-1.c (test for errors, line 7)
PASS: gcc.dg/tls/asm-1.c (test for excess errors)
PASS: gcc.dg/tls/debug-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 5)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 6)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 7)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 11)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 12)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 13)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 14)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 17)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 18)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 19)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 20)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 22)

and then

PASS: gcc.dg/tls/asm-1.c (test for errors, line 7)
PASS: gcc.dg/tls/asm-1.c (test for excess errors)
PASS: gcc.dg/tls/debug-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 11)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 12)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 13)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 14)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 17)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 18)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 19)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 20)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 22)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 5)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 6)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 7)

It looks like the first time sorted by line numerically (or just happened
to leave the run order) and the second time did the sort alphabetically...

Andrew
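The reordering Andrew shows is exactly what a plain string sort does to embedded line numbers (illustrative):

```python
# "line 11" sorts before "line 4" alphabetically because the characters
# are compared one by one and '1' < '4'; a numeric key restores the
# intuitive order.
tails = ['line 4', 'line 5', 'line 11', 'line 22']
print(sorted(tails))                                   # alphabetical
print(sorted(tails, key=lambda t: int(t.split()[1])))  # numeric
```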
On Wed, Sep 24, 2014 at 10:54:57AM -0400, Andrew MacLeod wrote:
> On 09/23/2014 11:33 AM, Richard Sandiford wrote:
>> Your patch instead sorts based on the full test name, including options,
>> which means that the output no longer matches what you'd get from a
>> non-parallel run.  AFAICT, it also no longer matches what you'd get from
>> the .sh version.  That might be OK, just thought I'd mention it.

With the parallelisation changes the output was in a pretty random order.
My patch made that a fixed order again, albeit a different one from before.

> Is this supposed to be resolved now?  I'm still seeing some issues with a
> branch cut from mainline from yesterday.  This is from the following
> sequence:
>
> check out revision 215511, build, make -j16 check, make -j16 check,
> then compare all the .sum files:

I don't understand what exactly you did; you have left out some steps,
I think?


Segher
On 09/24/2014 12:10 PM, Segher Boessenkool wrote:
> [...]
>
> I don't understand what exactly you did; you have left out some steps,
> I think?

What?  No... like what?  Check out a tree, do a basic configure and build
from scratch (./configure --verbose, make -j16 all), and then run make
check twice in a row, literally "make -j16 -i check", with nothing in
between.  So the compiler and toolchain are exactly the same, and I get
different results.  It's the same way I've done it forever, except I am
still getting some different results from run to run.  The target is a
normal build-x86_64-unknown-linux-gnu.

What I'm saying is that something still isn't getting sorted all the time
(maybe if a section wasn't split up, it doesn't sort?), or all the patches
to fix it aren't in, or there is something else still amok.  Notice it
isn't the options that are the problem this time; it's the trailing line
number of the test case warning.  One run is in numerical order, the
other is in alphabetical order.

I'm running it a third time now; we'll see if it's different from both
the others or not.

Andrew
On 09/24/2014 12:29 PM, Andrew MacLeod wrote:
> [...]
>
> I'm running it a third time now; we'll see if it's different from both
> the others or not.

Ah, interesting.

The third run has a gcc.sum that is exactly the same as the first run's,
so only the second run differs, and it seems to be from an alphabetical
sort.  So runs 3 and 1 match.

The gfortran.sum from the third run is identical to the *second* run's,
but different from the *first* run's.  So runs 2 and 3 match.

The two runs that match (2nd and 3rd) look like:

PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single execution test

and the odd one out (first run):

PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_1.f90 -fcoarray=single -O2 execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=lib -O2 -lcaf_single execution test
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 (test for excess errors)
PASS: gfortran.dg/coarray/this_image_2.f90 -fcoarray=single -O2 execution test

It looks like the first run was sorted, and the other two weren't.
There must be some condition under which we don't sort the results, or
another place which needs to be tweaked to do the sort as well...?

Andrew
On 09/24/2014 01:58 PM, Andrew MacLeod wrote:
> [...]
>
> It looks like the first run was sorted, and the other two weren't.
> There must be some condition under which we don't sort the results, or
> another place which needs to be tweaked to do the sort as well...?

So, to be fair, I could use test_summary, but I think the concern is
warranted, because if this inconsistent ordering can happen to PASSes, I
would expect the same non-deterministic behaviour if those tests happened
to FAIL.  We just have far fewer FAILs, so we aren't seeing it with
test_summary at the moment...

Aggregating all my .sum files, I see a sampling of about 257,000 PASSes,
whereas I see a total of 141 FAILs.  FAILs account for less than 0.06% of
the output.  (I'm getting an average of about 510 mis-ordered PASSes, so
it only affects a small portion of them as well.)

I would think the output of the .sum files needs to be consistent from
one run to the next in order for test_summary to consistently report its
results as well.

Andrew
On Thu, Sep 25, 2014 at 08:22:29AM -0400, Andrew MacLeod wrote:
> So, to be fair, I could use test_summary, but I think the concern is
> warranted, because if this inconsistent ordering can happen to PASSes, I
> would expect the same non-deterministic behaviour if those tests happened
> to FAIL.  We just have far fewer FAILs, so we aren't seeing it with
> test_summary at the moment...
>
> Aggregating all my .sum files, I see a sampling of about 257,000 PASSes,
> whereas I see a total of 141 FAILs.  FAILs account for less than 0.06% of
> the output.  (I'm getting an average of about 510 mis-ordered PASSes, so
> it only affects a small portion of them as well.)

0.24% here (2241 FAILs, 917715 PASSes).  You're seeing about 1 in 500
misordered, so if it were independent (which of course it is not) I
should see it in the FAILs already.

> I would think the output of the .sum files needs to be consistent from
> one run to the next in order for test_summary to consistently report its
> results as well.

Yes.  There also is the problem of the summaries being messed up (which
they were already before the parallelisation changes, but now the result
is much worse).

I'll have another look.


Segher
diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
@@ -117,7 +117,7 @@ class Prog:
     self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
     self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                                  r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
-                                 r'|KFAIL):\s*(\S+)')
+                                 r'|KFAIL):\s*(.+)')
     self.completed_re = re.compile (r'.* completed at (.*)')
     # Pieces of text to write at the head of the output.
     # start_line is a pair in which the first element is a datetime