Message ID | 20180723193530.20891-1-eblake@redhat.com |
---|---|
State | New |
Headers | show |
Series | [v2,for-3.0] tests/libqtest: Improve kill_qemu() | expand |
On 07/23/2018 12:35 PM, Eric Blake wrote: > In kill_qemu() we have an assert that checks that the QEMU process > didn't dump core: > assert(!WCOREDUMP(wstatus)); > > Unfortunately the WCOREDUMP macro here means the resulting message > is not very easy to comprehend on at least some systems: > > ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed. > > and it doesn't identify what signal the process took. > > Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be > happening if we didn't install signal handlers, it's still better > to always be robust), and also want to log unexpected non-zero status > that was not accompanied by a core dump. > > Instead of using a raw assert, print the information in an > easier to understand way: > > /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death with core dump from signal 11 (Segmentation fault) > Aborted (core dumped) > > (Of course, the really useful information would be why the QEMU > process dumped core in the first place, but we don't have that > by the time the test program has picked up the exit status.) > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > I've taken the ideas from Peter's patch: > https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg04430.html > as well as fixing a related issue brought up last time this was touched: > https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05710.html > > tests/libqtest.c | 39 +++++++++++++++++++++++++++++++++++++-- > 1 file changed, 37 insertions(+), 2 deletions(-) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~
Eric Blake <eblake@redhat.com> writes: > In kill_qemu() we have an assert that checks that the QEMU process > didn't dump core: > assert(!WCOREDUMP(wstatus)); > > Unfortunately the WCOREDUMP macro here means the resulting message > is not very easy to comprehend on at least some systems: > > ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed. > > and it doesn't identify what signal the process took. > > Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be > happening if we didn't install signal handlers, it's still better > to always be robust), and also want to log unexpected non-zero status > that was not accompanied by a core dump. > > Instead of using a raw assert, print the information in an > easier to understand way: > > /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death with core dump from signal 11 (Segmentation fault) > Aborted (core dumped) > > (Of course, the really useful information would be why the QEMU > process dumped core in the first place, but we don't have that > by the time the test program has picked up the exit status.) > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > I've taken the ideas from Peter's patch: > https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg04430.html > as well as fixing a related issue brought up last time this was touched: > https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05710.html > > tests/libqtest.c | 39 +++++++++++++++++++++++++++++++++++++-- > 1 file changed, 37 insertions(+), 2 deletions(-) > > diff --git a/tests/libqtest.c b/tests/libqtest.c > index 098af6aec44..f3dabfadd78 100644 > --- a/tests/libqtest.c > +++ b/tests/libqtest.c > @@ -105,12 +105,47 @@ static void kill_qemu(QTestState *s) > if (s->qemu_pid != -1) { > int wstatus = 0; > pid_t pid; > + bool die = false; > > kill(s->qemu_pid, SIGTERM); > + retry: > pid = waitpid(s->qemu_pid, &wstatus, 0); > + if (pid == -1 && errno == EINTR) { > + goto retry; > + } > > - if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { > - assert(!WCOREDUMP(wstatus)); > + assert(pid == s->qemu_pid); > + /* > + * We expect qemu to exit with status 0; anything else is > + * fishy and should be logged. Abort except when death by > + * signal is not accompanied by a coredump (as that's the only > + * time it was likely that the user is trying to kill the > + * testsuite early). > + */ > + if (wstatus) { > + die = true; > + if (WIFEXITED(wstatus)) { > + fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU " > + "process but encountered exit status %d\n", > + __FILE__, __LINE__, WEXITSTATUS(wstatus)); > + } else if (WIFSIGNALED(wstatus)) { > + int sig = WTERMSIG(wstatus); > + const char *signame = strsignal(sig) ?: "unknown ???"; > + > + if (!WCOREDUMP(wstatus)) { > + die = false; Does WCOREDUMP(wstatus) depend on the user's ulimit -c? What higher level kind of QEMU termination are you trying to detect here? > + fprintf(stderr, "%s:%d: kill_qemu() ignoring QEMU death " > + "by signal %d (%s)\n", > + __FILE__, __LINE__, sig, signame); > + } else { > + fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death " > + "with core dump from signal %d (%s)\n", > + __FILE__, __LINE__, sig, signame); > + } > + } > + } > + if (die) { > + abort(); > } > } > }
On 23.07.2018 21:35, Eric Blake wrote: > In kill_qemu() we have an assert that checks that the QEMU process > didn't dump core: > assert(!WCOREDUMP(wstatus)); > > Unfortunately the WCOREDUMP macro here means the resulting message > is not very easy to comprehend on at least some systems: > > ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed. > > and it doesn't identify what signal the process took. > > Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be > happening if we didn't install signal handlers, it's still better > to always be robust), and also want to log unexpected non-zero status > that was not accompanied by a core dump. > > Instead of using a raw assert, print the information in an > easier to understand way: > > /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death with core dump from signal 11 (Segmentation fault) > Aborted (core dumped) > > (Of course, the really useful information would be why the QEMU > process dumped core in the first place, but we don't have that > by the time the test program has picked up the exit status.) > > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Eric Blake <eblake@redhat.com> > --- > > I've taken the ideas from Peter's patch: > https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg04430.html > as well as fixing a related issue brought up last time this was touched: > https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05710.html > > tests/libqtest.c | 39 +++++++++++++++++++++++++++++++++++++-- > 1 file changed, 37 insertions(+), 2 deletions(-) > > diff --git a/tests/libqtest.c b/tests/libqtest.c > index 098af6aec44..f3dabfadd78 100644 > --- a/tests/libqtest.c > +++ b/tests/libqtest.c > @@ -105,12 +105,47 @@ static void kill_qemu(QTestState *s) > if (s->qemu_pid != -1) { > int wstatus = 0; > pid_t pid; > + bool die = false; > > kill(s->qemu_pid, SIGTERM); > + retry: > pid = waitpid(s->qemu_pid, &wstatus, 0); > + if (pid == -1 && errno == EINTR) { > + goto retry; > + } do { pid = waitpid(s->qemu_pid, &wstatus, 0); } while (pid == -1 && errno == EINTR); ? Or use the TFR macro from include/qemu-common.h ? Thomas
On 07/24/2018 01:36 AM, Markus Armbruster wrote: > Eric Blake <eblake@redhat.com> writes: > >> In kill_qemu() we have an assert that checks that the QEMU process >> didn't dump core: >> assert(!WCOREDUMP(wstatus)); >> >> Unfortunately the WCOREDUMP macro here means the resulting message >> is not very easy to comprehend on at least some systems: >> >> - if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { >> - assert(!WCOREDUMP(wstatus)); >> + } else if (WIFSIGNALED(wstatus)) { >> + int sig = WTERMSIG(wstatus); >> + const char *signame = strsignal(sig) ?: "unknown ???"; >> + >> + if (!WCOREDUMP(wstatus)) { >> + die = false; > > Does WCOREDUMP(wstatus) depend on the user's ulimit -c? 'man waitpid' on Linux mentions that WCOREDUMP is nonportable, but does not mention any interaction with setrlimit. But a quick test shows: $ ulimit -S -c 0 $ cat foo.c int main(int argc, char **argv) { return *argv[1]; } $ gcc -o foo -Wall foo.c $ ./foo 1 $ ./foo Segmentation fault (core dumped) $ the output was produced by bash, which uses waitpid() - and therefore the fact that bash reports the core dump even when no core file is created is promising.
Eric Blake <eblake@redhat.com> writes: > On 07/24/2018 01:36 AM, Markus Armbruster wrote: >> Eric Blake <eblake@redhat.com> writes: >> >>> In kill_qemu() we have an assert that checks that the QEMU process >>> didn't dump core: >>> assert(!WCOREDUMP(wstatus)); >>> >>> Unfortunately the WCOREDUMP macro here means the resulting message >>> is not very easy to comprehend on at least some systems: >>> > >>> - if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { >>> - assert(!WCOREDUMP(wstatus)); > >>> + } else if (WIFSIGNALED(wstatus)) { >>> + int sig = WTERMSIG(wstatus); >>> + const char *signame = strsignal(sig) ?: "unknown ???"; >>> + >>> + if (!WCOREDUMP(wstatus)) { >>> + die = false; >> >> Does WCOREDUMP(wstatus) depend on the user's ulimit -c? > > 'man waitpid' on Linux mentions that WCOREDUMP is nonportable, but > does not mention any interaction with setrlimit. But a quick test > shows: > > $ ulimit -S -c 0 > $ cat foo.c > int main(int argc, char **argv) { > return *argv[1]; > } > $ gcc -o foo -Wall foo.c > $ ./foo 1 > $ ./foo > Segmentation fault (core dumped) > $ > > the output was produced by bash, which uses waitpid() - and therefore > the fact that bash reports the core dump even when no core file is > created is promising. Proof beats plausibility argument: $ cat wcordump.c #include <stdio.h> #include <stdlib.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> int main(void) { pid_t child = fork(); int wstatus; if (child < 0) { perror("fork"); exit(1); } if (!child) { abort(); } if (waitpid(child, &wstatus, 0) < 0) { perror("waitpid"); exit(1); } if (WIFSIGNALED(wstatus)) { printf("sig %d %d\n", WTERMSIG(wstatus), WCOREDUMP(wstatus)); } else { printf("no sig\n"); } exit(0); } $ gcc -Wall -g -O wcordump.c $ (ulimit -c unlimited; ./a.out) sig 6 128 $ (ulimit -c 0; ./a.out) sig 6 0 Looks like WCOREDUMP() does depend on my ulimit -c.
On 07/25/2018 11:17 AM, Markus Armbruster wrote: >> >> the output was produced by bash, which uses waitpid() - and therefore >> the fact that bash reports the core dump even when no core file is >> created is promising. > > Proof beats plausibility argument: > > $ cat wcordump.c > $ gcc -Wall -g -O wcordump.c > $ (ulimit -c unlimited; ./a.out) > sig 6 128 > $ (ulimit -c 0; ./a.out) > sig 6 0 Doesn't match my results: $ (ulimit -c 0; ./a.out) sig 6 128 So, what's different between our two environments? kernel 4.17.7-100.fc27.x86_64 $ echo /proc/sys/kernel/core_* /proc/sys/kernel/core_pattern /proc/sys/kernel/core_pipe_limit /proc/sys/kernel/core_uses_pid $ cat /proc/sys/kernel/core_* |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e 0 1 > > Looks like WCOREDUMP() does depend on my ulimit -c. Or worse, that its behavior is kernel/environment-sensitive.
On 07/24/2018 01:44 AM, Thomas Huth wrote: >> >> Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be >> happening if we didn't install signal handlers, it's still better >> to always be robust), and also want to log unexpected non-zero status >> that was not accompanied by a core dump. >> >> kill(s->qemu_pid, SIGTERM); >> + retry: >> pid = waitpid(s->qemu_pid, &wstatus, 0); >> + if (pid == -1 && errno == EINTR) { >> + goto retry; >> + } > > do { > pid = waitpid(s->qemu_pid, &wstatus, 0); > } while (pid == -1 && errno == EINTR); > > ? > > Or use the TFR macro from include/qemu-common.h ? Cool, I didn't know that macro existed! Will send v3.
Eric Blake <eblake@redhat.com> writes: > On 07/25/2018 11:17 AM, Markus Armbruster wrote: > >>> >>> the output was produced by bash, which uses waitpid() - and therefore >>> the fact that bash reports the core dump even when no core file is >>> created is promising. >> >> Proof beats plausibility argument: >> >> $ cat wcordump.c > >> $ gcc -Wall -g -O wcordump.c >> $ (ulimit -c unlimited; ./a.out) >> sig 6 128 >> $ (ulimit -c 0; ./a.out) >> sig 6 0 > > Doesn't match my results: > > $ (ulimit -c 0; ./a.out) > sig 6 128 > > So, what's different between our two environments? > > kernel 4.17.7-100.fc27.x86_64 4.17.7-200.fc28.x86_64 > $ echo /proc/sys/kernel/core_* > /proc/sys/kernel/core_pattern /proc/sys/kernel/core_pipe_limit > /proc/sys/kernel/core_uses_pid > $ cat /proc/sys/kernel/core_* > |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e This is "core" on my development boxes (I'm a happy caveman). # cat /etc/sysctl.d/50-coredump.conf kernel.core_pattern=core > 0 > 1 > >> >> Looks like WCOREDUMP() does depend on my ulimit -c. > > Or worse, that its behavior is kernel/environment-sensitive. Looks like it.
diff --git a/tests/libqtest.c b/tests/libqtest.c index 098af6aec44..f3dabfadd78 100644 --- a/tests/libqtest.c +++ b/tests/libqtest.c @@ -105,12 +105,47 @@ static void kill_qemu(QTestState *s) if (s->qemu_pid != -1) { int wstatus = 0; pid_t pid; + bool die = false; kill(s->qemu_pid, SIGTERM); + retry: pid = waitpid(s->qemu_pid, &wstatus, 0); + if (pid == -1 && errno == EINTR) { + goto retry; + } - if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) { - assert(!WCOREDUMP(wstatus)); + assert(pid == s->qemu_pid); + /* + * We expect qemu to exit with status 0; anything else is + * fishy and should be logged. Abort except when death by + * signal is not accompanied by a coredump (as that's the only + * time it was likely that the user is trying to kill the + * testsuite early). + */ + if (wstatus) { + die = true; + if (WIFEXITED(wstatus)) { + fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU " + "process but encountered exit status %d\n", + __FILE__, __LINE__, WEXITSTATUS(wstatus)); + } else if (WIFSIGNALED(wstatus)) { + int sig = WTERMSIG(wstatus); + const char *signame = strsignal(sig) ?: "unknown ???"; + + if (!WCOREDUMP(wstatus)) { + die = false; + fprintf(stderr, "%s:%d: kill_qemu() ignoring QEMU death " + "by signal %d (%s)\n", + __FILE__, __LINE__, sig, signame); + } else { + fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death " + "with core dump from signal %d (%s)\n", + __FILE__, __LINE__, sig, signame); + } + } + } + if (die) { + abort(); } } }
In kill_qemu() we have an assert that checks that the QEMU process didn't dump core: assert(!WCOREDUMP(wstatus)); Unfortunately the WCOREDUMP macro here means the resulting message is not very easy to comprehend on at least some systems: ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed. and it doesn't identify what signal the process took. Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be happening if we didn't install signal handlers, it's still better to always be robust), and also want to log unexpected non-zero status that was not accompanied by a core dump. Instead of using a raw assert, print the information in an easier to understand way: /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death with core dump from signal 11 (Segmentation fault) Aborted (core dumped) (Of course, the really useful information would be why the QEMU process dumped core in the first place, but we don't have that by the time the test program has picked up the exit status.) Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eric Blake <eblake@redhat.com> --- I've taken the ideas from Peter's patch: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg04430.html as well as fixing a related issue brought up last time this was touched: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05710.html tests/libqtest.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-)