Message ID | 1545395783-39040-1-git-send-email-pbonzini@redhat.com |
---|---|
State | New |
Headers | show |
On Fri, 21 Dec 2018 at 12:40, Paolo Bonzini <pbonzini@redhat.com> wrote: > > The following changes since commit e85c577158a2e8e252414959da9ef15c12eec63d: > > Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2018-12-17' into staging (2018-12-18 14:31:06 +0000) > > are available in the git repository at: > > > git://github.com/bonzini/qemu.git tags/for-upstream > > for you to fetch changes up to 2cbc4ea2a9f9b81d5f4bfee8c412849755b842b0: > > avoid TABs in files that only contain a few (2018-12-21 12:58:28 +0100) > > v3->v4: avoid extra long command lines and ease cut-and-paste from make check logs > ---------------------------------------------------------------- > * HAX support for Linux hosts (Alejandro) > * esp bugfixes (Guenter) > * Windows build cleanup (Marc-André) > * checkpatch logic improvements (Paolo) > * coalesced range bugfix (Paolo) > * switch testsuite to TAP (Paolo) > * QTAILQ rewrite (Paolo) > * block/iscsi.c cancellation fixes (Stefan) > * improve selection of the default accelerator (Thomas) > > ---------------------------------------------------------------- Thanks for improving the length of the command lines. Unfortunately this does not seem to have fixed the 'write error' from make: MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img test s/boot-order-test -m=quick -k --tap | ./scripts/tap-driver.pl --test-name="boot-order-test" PASS 1 boot-order-test /x86_64/boot-order/pc MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img test s/bios-tables-test -m=quick -k --tap | ./scripts/tap-driver.pl --test-name="bios-tables-test" Looking for expected file 'tests/data/acpi/pc/DSDT' Using expected file 'tests/data/acpi/pc/DSDT' Looking for expected file 'tests/data/acpi/pc/FACP' Using expected file 'tests/data/acpi/pc/FACP' [... omit a bunch of other uninteresting logging from the acpi test] Looking for expected file 'tests/data/acpi/pc/NFIT.dimmpxm' Using expected file 'tests/data/acpi/pc/NFIT.dimmpxm' Looking for expected file 'tests/data/acpi/q35/DSDT.bridge' Using expected file 'tests/data/acpi/q35/DSDT.bridge' Looking for expected file 'tests/data/acpi/q35/FACP.bridge' Looking for expected file 'tests/data/acpi/q35/FACP' Using expected file 'tests/data/acpi/q35/FACP' Looking for expected file 'tests/data/acpi/q35/APIC.bridge' Looking for expected file 'tests/data/acpi/q35/APIC' Using expected file 'tests/data/acpi/q35/APIC' Looking for expectfwrite(): Resource temporarily unavailable make: Leaving directory '/home/petmay01/linaro/qemu-for-merges/build/alldbg' make: write error: stdout I don't really understand what's going on here, or why it only happens with this one system (my main x86-64 Linux Ubuntu 16.04.5 box) and not the various others I'm running test builds on. But it does seem to be 100% reliable with any of these pullreqs with the new test driver in them :-( thanks -- PMM
On 21/12/18 22:09, Peter Maydell wrote: > I don't really understand what's going on here, or why > it only happens with this one system (my main x86-64 > Linux Ubuntu 16.04.5 box) and not the various others I'm > running test builds on. But it does seem to be 100% > reliable with any of these pullreqs with the new test > driver in them :-( I'm afraid something in your setup is causing make's stdout to have O_NONBLOCK set. Make doesn't use O_NONBLOCK at all, so it must be something above it. I also checked Perl with strace and, at least here, it doesn't set O_NONBLOCK. So here are some ideas... First, can you try applying something like this to reproduce? --- a/Makefile +++ b/Makefile @@ -17,9 +17,13 @@ print-%: # All following code might depend on configuration variables ifneq ($(wildcard config-host.mak),) # Put the all: rule here so that config-host.mak can contain dependencies. -all: +all: lotsofoutput include config-host.mak +.PHONY: lotsofoutput +lotsofoutput: + yes 1234567890 | head -n 10000 + git-submodule-update: .PHONY: git-submodule-update And please try applying this, which is a bit of a shot in the dark but 1) it is a good idea anyway; 2) it may help, if not alone, together with the workarounds below: diff --git a/scripts/tap-driver.pl b/scripts/tap-driver.pl index 5e59b5db49..6621a5cd67 100755 --- a/scripts/tap-driver.pl +++ b/scripts/tap-driver.pl @@ -313,6 +313,7 @@ sub main () my $iterator = TAP::Parser::Iterator::Stream->new(\*STDIN); my $parser = TAP::Parser->new ({iterator => $iterator }); + STDOUT->autoflush(1); while (defined (my $cur = $parser->next)) { # Parsing of TAP input should stop after a "Bail out!" directive. diff --git a/scripts/tap-merge.pl b/scripts/tap-merge.pl index 59e3fa5007..10ccf57bb2 100755 --- a/scripts/tap-merge.pl +++ b/scripts/tap-merge.pl @@ -53,6 +53,7 @@ sub main () my $testno = 0; # Number of test results seen so far. my $bailed_out = 0; # Whether a "Bail out!" directive has been seen. + STDOUT->autoflush(1); while (defined (my $cur = $parser->next)) { if ($cur->is_bailout) Possible workarounds include: - using "make -Oline" or "make -Onone" (for -Oline, it may require the above autoflush patch). - running this Python script before invoking make import os from fcntl import * fcntl(1, F_SETFL, fcntl(1, F_GETFL) & ~os.O_NONBLOCK) Paolo
On Sat, 22 Dec 2018 at 08:41, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 21/12/18 22:09, Peter Maydell wrote: > > I don't really understand what's going on here, or why > > it only happens with this one system (my main x86-64 > > Linux Ubuntu 16.04.5 box) and not the various others I'm > > running test builds on. But it does seem to be 100% > > reliable with any of these pullreqs with the new test > > driver in them :-( > > I'm afraid something in your setup is causing make's stdout to have > O_NONBLOCK set. Make doesn't use O_NONBLOCK at all, so it must be > something above it. I also checked Perl with strace and, at least here, > it doesn't set O_NONBLOCK. > > So here are some ideas... First, can you try applying something like > this to reproduce? OK, I'll give these a go, but it'll have to be in January now. thanks -- PMM
On Sat, 22 Dec 2018 at 08:41, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 21/12/18 22:09, Peter Maydell wrote: > > I don't really understand what's going on here, or why > > it only happens with this one system (my main x86-64 > > Linux Ubuntu 16.04.5 box) and not the various others I'm > > running test builds on. But it does seem to be 100% > > reliable with any of these pullreqs with the new test > > driver in them :-( > > I'm afraid something in your setup is causing make's stdout to have > O_NONBLOCK set. Make doesn't use O_NONBLOCK at all, so it must be > something above it. I also checked Perl with strace and, at least here, > it doesn't set O_NONBLOCK. Interestingly, I have today run into "make: write error: stdout" with the existing make check infrastructure. It seems to only happen if I run 'make check' in a loop like this: while make -C build/x86 -j8 V=1 check; do true; done Adding strace to this shows that although the first invocation of 'make' starts with stdout not O_NONBLOCK, something in that run results in stdout being set O_NONBLOCK, and then that persists into the second run of make, which then may fail as a result. (This is make 4.1, which will write output with fputs() unless a --sync-output option is used, and thus doesn't retry on EINTR.) When the 'while' loop finishes, something in the shell results in stdout being set back to blocking again. I presume that something in one of the tests we're running, likely QEMU itself, ends up setting stdout to non-blocking. This while rune *used* to be entirely reliable, so maybe something recent has changed ? thanks -- PMM
On 03/01/19 19:37, Peter Maydell wrote: > On Sat, 22 Dec 2018 at 08:41, Paolo Bonzini <pbonzini@redhat.com> wrote: >> >> On 21/12/18 22:09, Peter Maydell wrote: >>> I don't really understand what's going on here, or why >>> it only happens with this one system (my main x86-64 >>> Linux Ubuntu 16.04.5 box) and not the various others I'm >>> running test builds on. But it does seem to be 100% >>> reliable with any of these pullreqs with the new test >>> driver in them :-( >> >> I'm afraid something in your setup is causing make's stdout to have >> O_NONBLOCK set. Make doesn't use O_NONBLOCK at all, so it must be >> something above it. I also checked Perl with strace and, at least here, >> it doesn't set O_NONBLOCK. > > Interestingly, I have today run into "make: write error: stdout" > with the existing make check infrastructure. [...] > I presume that something in one of the tests we're running, > likely QEMU itself, ends up setting stdout to non-blocking. > This while rune *used* to be entirely reliable, so maybe > something recent has changed ? I don't know... I tried running make check under "strace -e fcntl" and I didn't find any occurrences of fcntl(1, O_SETFL, ...|O_NONBLOCK). Perhaps you can add a check after every invocation of a test executable. Are you going to apply the pull request since the bug is preexisting? Thanks, Polo
On Fri, 4 Jan 2019 at 07:59, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 03/01/19 19:37, Peter Maydell wrote: > > On Sat, 22 Dec 2018 at 08:41, Paolo Bonzini <pbonzini@redhat.com> wrote: > > Interestingly, I have today run into "make: write error: stdout" > > with the existing make check infrastructure. [...] > > I presume that something in one of the tests we're running, > > likely QEMU itself, ends up setting stdout to non-blocking. > > This while rune *used* to be entirely reliable, so maybe > > something recent has changed ? > > I don't know... I tried running make check under "strace -e fcntl" and I > didn't find any occurrences of fcntl(1, O_SETFL, ...|O_NONBLOCK). > Perhaps you can add a check after every invocation of a test executable. > Are you going to apply the pull request since the bug is preexisting? The pull request takes the problem from "occasionally shows up in by-hand running of make check in a while loop" to "reliably causes my automated tests to fail every time". I can't apply a pullreq that does that, so we need to find at least a workaround. I'm not sure why your test harness makes it much more likely that the problem manifests. (The shell will typically clear the O_NONBLOCK flag on its terminal before waiting for interactive input, which is why this is a problem when running make check inside a while loop or in my pull test script which runs make check for several trees in succession, but not an issue for individual runs on the command line.) thanks -- PMM
On 04/01/19 11:06, Peter Maydell wrote: > On Fri, 4 Jan 2019 at 07:59, Paolo Bonzini <pbonzini@redhat.com> wrote: >> >> On 03/01/19 19:37, Peter Maydell wrote: >>> On Sat, 22 Dec 2018 at 08:41, Paolo Bonzini <pbonzini@redhat.com> wrote: >>> Interestingly, I have today run into "make: write error: stdout" >>> with the existing make check infrastructure. [...] >>> I presume that something in one of the tests we're running, >>> likely QEMU itself, ends up setting stdout to non-blocking. >>> This while rune *used* to be entirely reliable, so maybe >>> something recent has changed ? >> >> I don't know... I tried running make check under "strace -e fcntl" and I >> didn't find any occurrences of fcntl(1, O_SETFL, ...|O_NONBLOCK). >> Perhaps you can add a check after every invocation of a test executable. > >> Are you going to apply the pull request since the bug is preexisting? > > The pull request takes the problem from "occasionally shows up > in by-hand running of make check in a while loop" to "reliably > causes my automated tests to fail every time". I can't apply > a pullreq that does that, so we need to find at least a workaround. The certain workaround is to clear O_NONBLOCK before invoking make, possibly by placing a wrapper in ~/bin: import os from fcntl import * fcntl(1, F_SETFL, fcntl(1, F_GETFL) & ~os.O_NONBLOCK) If you're using --output-sync, *not* using it might work, but not if the write error occurs when writing the command (as opposed to the output). For what it's worth, I think the latest version of Make is still susceptible when using --output-sync. The loop that flushes the temporary buffer to stdout or stderr is as follows: while (1) { int len; EINTRLOOP (len, read (from, buffer, sizeof (buffer))); if (len < 0) perror ("read()"); if (len <= 0) break; if (fwrite (buffer, len, 1, to) < 1) { perror ("fwrite()"); break; } fflush (to); } Maybe it's one of the dependencies of QEMU that is causing it. Paolo > I'm not sure why your test harness makes it much more likely > that the problem manifests.
On Fri, 4 Jan 2019 at 11:01, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 04/01/19 11:06, Peter Maydell wrote: > > On Fri, 4 Jan 2019 at 07:59, Paolo Bonzini <pbonzini@redhat.com> wrote: > >> > >> On 03/01/19 19:37, Peter Maydell wrote: > >>> On Sat, 22 Dec 2018 at 08:41, Paolo Bonzini <pbonzini@redhat.com> wrote: > >>> Interestingly, I have today run into "make: write error: stdout" > >>> with the existing make check infrastructure. [...] > >>> I presume that something in one of the tests we're running, > >>> likely QEMU itself, ends up setting stdout to non-blocking. > >>> This while rune *used* to be entirely reliable, so maybe > >>> something recent has changed ? > >> > >> I don't know... I tried running make check under "strace -e fcntl" and I > >> didn't find any occurrences of fcntl(1, O_SETFL, ...|O_NONBLOCK). > >> Perhaps you can add a check after every invocation of a test executable. > > > >> Are you going to apply the pull request since the bug is preexisting? > > > > The pull request takes the problem from "occasionally shows up > > in by-hand running of make check in a while loop" to "reliably > > causes my automated tests to fail every time". I can't apply > > a pullreq that does that, so we need to find at least a workaround. > > The certain workaround is to clear O_NONBLOCK before invoking make, > possibly by placing a wrapper in ~/bin: > > import os > from fcntl import * > fcntl(1, F_SETFL, fcntl(1, F_GETFL) & ~os.O_NONBLOCK) I'll give this a go, but I think this will not necessarily be sufficient if some program invoked by make sets the O_NONBLOCK flag, and then make later in the same run tries to output and gets EINTR. Perhaps it would be better to put it in the test harness that invokes test binaries, so as to clear the flag as soon as possible? It could also then in theory print a warning if a test case had put stdin/stdout into non-blocking mode. (The semantics of O_NONBLOCK here seem to me to be completely broken. But they are what they are...) thanks -- PMM
On 04/01/19 12:31, Peter Maydell wrote: > I'll give this a go, but I think this will not necessarily be > sufficient if some program invoked by make sets the O_NONBLOCK flag, > and then make later in the same run tries to output and gets EINTR. > Perhaps it would be better to put it in the test harness that > invokes test binaries, so as to clear the flag as soon > as possible? It could also then in theory print a warning > if a test case had put stdin/stdout into non-blocking mode. Aha, you gave me the clue I needed. :) gtester redirected stdin/out/err to /dev/null. In the new harness, stdout is piped into tap-driver.pl, but everything else is left in place. Leaving stderr should be safe, while stdin is the most likely culprit. Adding </dev/null to the test invocation should fix it; I'll send a new pull request either today or next Monday. > (The semantics of O_NONBLOCK here seem to me to be completely > broken. But they are what they are...) Yes, and F_SETFL is not the only "interesting" fcntl in this respect; fortunately QEMU is not using F_SETOWN anymore! Paolo
On Fri, 4 Jan 2019 at 07:59, Paolo Bonzini <pbonzini@redhat.com> wrote: > I don't know... I tried running make check under "strace -e fcntl" and I > didn't find any occurrences of fcntl(1, O_SETFL, ...|O_NONBLOCK). I found this strace command worked to track down some possible culprits: strace -o /tmp/strace.log -e fcntl,execve,clone -e signal=none -f make -C build/x86 V=1 check Extracts from the logs: 29251 execve("tests/test-char", ["tests/test-char", "-q", "-p", "/char/stdio/subprocess", "--GTestSubprocess", "--GTestLogFD", "4"], [/* 66 vars */] <unfinished ...> 29251 <... execve resumed> ) = 0 29251 fcntl(3, F_GETFD) = 0 29251 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 29251 clone(child_stack=0x7f43946addf0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f43946ae9d0, tls=0x7f43946ae700, child_tidptr=0x7f43946ae9d0) = 29254 29251 fcntl(4, F_GETFD) = 0 29251 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 29251 fcntl(4, F_GETFL) = 0x2 (flags O_RDWR) 29251 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 29251 fcntl(0, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) 29251 fcntl(0, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) 29251 fcntl(0, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0 29251 fcntl(1, F_GETFL) = 0x1 (flags O_WRONLY) 29251 fcntl(1, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 29251 fcntl(0, F_SETFL, O_RDONLY|O_LARGEFILE) = 0 29251 fcntl(0, F_SETFL, O_RDONLY|O_LARGEFILE) = -1 EBADF (Bad file descriptor) 29254 +++ exited with 0 +++ 29251 +++ exited with 0 +++ (the attempt to F_SETFL a closed stdin is also interesting). and 30300 execve("tests/hexloader-test", ["tests/hexloader-test", "--quiet", "--keep-going", "-m =quick", "--GTestLogFD=6"], [/* 68 vars */]) = 0 30300 fcntl(3, F_GETFD) = 0 30300 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 30300 fcntl(4, F_GETFD) = 0 30300 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 30300 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f26bdc09a10) = 30301 30301 execve("/bin/sh", ["sh", "-c", "exec arm-softmmu/qemu-system-arm"...], [/* 69 vars */]) = 0 30301 execve("arm-softmmu/qemu-system-arm", ["arm-softmmu/qemu-system-arm", "-qtest", "unix:/tmp/qtest-30300.sock,nowai"..., "-qtest-log", "/dev/null", "-chardev", "socket,path=/tmp/qtest-30300.qmp"..., "-mon", "chardev=char0,mode=control", "-machine", "accel=qtest", "-display", "none", "-M", "vexpress-a9", "-nographic", ...], [/* 69 vars */]) = 0 30301 fcntl(3, F_GETFD) = 0 30301 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 30301 clone(child_stack=0x7f84f7614730, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f84f76179d0, tls=0x7f84f7617700, child_tidptr=0x7f84f76179d0) = 30302 30301 fcntl(7, F_GETFD) = 0 30301 fcntl(7, F_SETFD, FD_CLOEXEC) = 0 30301 fcntl(7, F_GETFL) = 0x2 (flags O_RDWR) 30301 fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0 30301 fcntl(11, F_GETFL) = 0x2 (flags O_RDWR) 30301 fcntl(11, F_SETFL, O_RDWR|O_NONBLOCK) = 0 30301 fcntl(12, F_GETFL) = 0x2 (flags O_RDWR) 30301 fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0 30301 clone(child_stack=0x7f84f6e13730, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f84f6e169d0, tls=0x7f84f6e16700, child_tidptr=0x7f84f6e169d0) = 30303 30301 fcntl(0, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) 30301 fcntl(0, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) 30301 fcntl(0, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0 30301 fcntl(1, F_GETFL) = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE) 30301 fcntl(1, F_SETFL, O_RDWR|O_APPEND|O_NONBLOCK|O_LARGEFILE) = 0 30301 clone(child_stack=0x7f84f6612730, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f84f66159d0, tls=0x7f84f6615700, child_tidptr=0x7f84f66159d0) = 30304 30303 +++ exited with 0 +++ 30301 fcntl(0, F_SETFL, O_RDONLY|O_LARGEFILE) = 0 30301 fcntl(0, F_SETFL, O_RDONLY|O_LARGEFILE) = -1 EBADF (Bad file descriptor) 30302 +++ exited with 0 +++ 30301 +++ exited with 0 +++ where tests/hexloader-test starts a QEMU process which makes stdin/stdout non-blocking. I didn't try to also track dup/dup2/dup3 use to see whether fd 0/1/2 were duped to other fds and then set O_NONBLOCK via those, or open to see if they were reopened with other contents. (Test run is still in-progress, I'll update if there are any more interesting bits of logging.) thanks -- PMM
On 04/01/19 14:03, Peter Maydell wrote: > On Fri, 4 Jan 2019 at 07:59, Paolo Bonzini <pbonzini@redhat.com> wrote: >> I don't know... I tried running make check under "strace -e fcntl" and I >> didn't find any occurrences of fcntl(1, O_SETFL, ...|O_NONBLOCK). > > I found this strace command worked to track down some possible > culprits: > strace -o /tmp/strace.log -e fcntl,execve,clone -e signal=none -f make > -C build/x86 V=1 check > > Extracts from the logs: > > 29251 execve("tests/test-char", ["tests/test-char", "-q", "-p", > "/char/stdio/subprocess", "--GTestSubprocess", "--GTestLogFD", "4"], > [/* 66 vars */] <unfinished ...> > 29251 <... execve resumed> ) = 0 > 29251 fcntl(3, F_GETFD) = 0 > 29251 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 > 29251 clone(child_stack=0x7f43946addf0, > flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, > parent_tidptr=0x7f43946ae9d0, tls=0x7f43946ae700, > child_tidptr=0x7f43946ae9d0) = 29254 > 29251 fcntl(4, F_GETFD) = 0 > 29251 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 > 29251 fcntl(4, F_GETFL) = 0x2 (flags O_RDWR) > 29251 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0 > 29251 fcntl(0, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) > 29251 fcntl(0, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) > 29251 fcntl(0, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0 > 29251 fcntl(1, F_GETFL) = 0x1 (flags O_WRONLY) > 29251 fcntl(1, F_SETFL, O_WRONLY|O_NONBLOCK) = 0 > 29251 fcntl(0, F_SETFL, O_RDONLY|O_LARGEFILE) = 0 > 29251 fcntl(0, F_SETFL, O_RDONLY|O_LARGEFILE) = -1 EBADF (Bad file descriptor) > 29254 +++ exited with 0 +++ > 29251 +++ exited with 0 +++ This one is okay, since the subprocess is using a pipe for stdin and stdout. QEMU restores O_NONBLOCK at exit (see term_exit in chardev/char-stdio.c); however, for parallel make and no --output-sync, a small window does remain where make sees a nonblocking stdout. Even that wouldn't explain why the second make inside the while loop always sees a nonblocking stdout, while the first never does. Paolo > (the attempt to F_SETFL a closed stdin is also interesting). > > and > > 30300 execve("tests/hexloader-test", ["tests/hexloader-test", > "--quiet", "--keep-going", "-m > =quick", "--GTestLogFD=6"], [/* 68 vars */]) = 0 > 30300 fcntl(3, F_GETFD) = 0 > 30300 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 > 30300 fcntl(4, F_GETFD) = 0 > 30300 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 > 30300 clone(child_stack=0, > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, > child_tidptr=0x7f26bdc09a10) = 30301 > 30301 execve("/bin/sh", ["sh", "-c", "exec > arm-softmmu/qemu-system-arm"...], [/* 69 vars */]) = 0 > 30301 execve("arm-softmmu/qemu-system-arm", > ["arm-softmmu/qemu-system-arm", "-qtest", > "unix:/tmp/qtest-30300.sock,nowai"..., "-qtest-log", "/dev/null", > "-chardev", "socket,path=/tmp/qtest-30300.qmp"..., "-mon", > "chardev=char0,mode=control", "-machine", "accel=qtest", "-display", > "none", "-M", "vexpress-a9", "-nographic", ...], [/* 69 vars */]) = 0 > 30301 fcntl(3, F_GETFD) = 0 > 30301 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 > 30301 clone(child_stack=0x7f84f7614730, > flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, > parent_tidptr=0x7f84f76179d0, tls=0x7f84f7617700, > child_tidptr=0x7f84f76179d0) = 30302 > 30301 fcntl(7, F_GETFD) = 0 > 30301 fcntl(7, F_SETFD, FD_CLOEXEC) = 0 > 30301 fcntl(7, F_GETFL) = 0x2 (flags O_RDWR) > 30301 fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0 > 30301 fcntl(11, F_GETFL) = 0x2 (flags O_RDWR) > 30301 fcntl(11, F_SETFL, O_RDWR|O_NONBLOCK) = 0 > 30301 fcntl(12, F_GETFL) = 0x2 (flags O_RDWR) > 30301 fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0 > 30301 clone(child_stack=0x7f84f6e13730, > flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, > parent_tidptr=0x7f84f6e169d0, tls=0x7f84f6e16700, > child_tidptr=0x7f84f6e169d0) = 30303 > 30301 fcntl(0, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) > 30301 fcntl(0, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) > 30301 fcntl(0, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0 > 30301 fcntl(1, F_GETFL) = 0x8402 (flags > O_RDWR|O_APPEND|O_LARGEFILE) > 30301 fcntl(1, F_SETFL, O_RDWR|O_APPEND|O_NONBLOCK|O_LARGEFILE) = 0 > 30301 clone(child_stack=0x7f84f6612730, > flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, > parent_tidptr=0x7f84f66159d0, tls=0x7f84f6615700, > child_tidptr=0x7f84f66159d0) = 30304 > 30303 +++ exited with 0 +++ > 30301 fcntl(0, F_SETFL, O_RDONLY|O_LARGEFILE) = 0 > 30301 fcntl(0, F_SETFL, O_RDONLY|O_LARGEFILE) = -1 EBADF (Bad file descriptor) > 30302 +++ exited with 0 +++ > 30301 +++ exited with 0 +++ > > where tests/hexloader-test starts a QEMU process which > makes stdin/stdout non-blocking. > > I didn't try to also track dup/dup2/dup3 use to see whether fd 0/1/2 > were duped to other fds and then set O_NONBLOCK via those, or > open to see if they were reopened with other contents. > > (Test run is still in-progress, I'll update if there are any more > interesting bits of logging.) > > thanks > -- PMM >
On Fri, 4 Jan 2019 at 13:03, Peter Maydell <peter.maydell@linaro.org> wrote: > > On Fri, 4 Jan 2019 at 07:59, Paolo Bonzini <pbonzini@redhat.com> wrote: > > I don't know... I tried running make check under "strace -e fcntl" and I > > didn't find any occurrences of fcntl(1, O_SETFL, ...|O_NONBLOCK). > > I found this strace command worked to track down some possible > culprits: > strace -o /tmp/strace.log -e fcntl,execve,clone -e signal=none -f make > -C build/x86 V=1 check > > Extracts from the logs: > > 29251 execve("tests/test-char", ["tests/test-char", "-q", "-p", > "/char/stdio/subprocess", "--GTestSubprocess", "--GTestLogFD", "4"], > [/* 66 vars */] <unfinished ...> This one seems to be ok, because the call to g_test_trap_subprocess() does not pass any flags arguments, which means the child gets its stdin/stdout/stderr all set to /dev/null, and the change to O_NONBLOCK is thus harmless. > 30300 execve("tests/hexloader-test", ["tests/hexloader-test", > "--quiet", "--keep-going", "-m > =quick", "--GTestLogFD=6"], [/* 68 vars */]) = 0 This one on the other hand does seem like it might be an issue. Removing the "-nographic" from the qtest_initf() command causes it no longer to mess with the O_NONBLOCK state of its stdout. I'm just doing a 'while loop' test run to see if it fixes the problems I was seeing with that. It seems a touch brittle if a harmless-looking command line option to QEMU can break test runs, though -- is there a way we can make this more robust, eg by having the QEMU process i/o always redirected ? thanks -- PMM