[v4] tests/libqtest: Improve kill_qemu()

Message ID 20180810132800.38549-1-eblake@redhat.com
State New
Headers show
Series
  • [v4] tests/libqtest: Improve kill_qemu()
Related show

Commit Message

Eric Blake Aug. 10, 2018, 1:28 p.m.
In kill_qemu() we have an assert that checks that the QEMU process
didn't dump core:
            assert(!WCOREDUMP(wstatus));

Unfortunately the WCOREDUMP macro here means the resulting message
is not very easy to comprehend on at least some systems:

ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed.

and it doesn't identify what signal the process took. What's more,
WCOREDUMP is not reliable - in some cases, setrlimit() coupled with
kernel dump settings can result in the flag not being set.  It's
better to log ALL death by signal, instead of caring whether a core
dump was attempted (although once we know a signal happened, also
mentioning if a core dump is present can be helpful).

Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be
happening if we didn't install signal handlers, it's still better
to always be robust).

Finally, even non-signal death with a non-zero status is suspicious,
since qemu's SIGINT handler is supposed to result in exit(0).

Instead of using a raw assert, print the information in an
easier to understand way:

/i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
Aborted (core dumped)

(Of course, the really useful information would be why the QEMU
process dumped core in the first place, but we don't have that
by the time the test program has picked up the exit status.)

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v4: don't special-case death without WCOREDUMP [Markus]
v3: use TFR() instead of open-coding the retry loop [Thomas]
---
 tests/libqtest.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

Comments

Eric Blake Aug. 10, 2018, 3:12 p.m. | #1
On 08/10/2018 08:28 AM, Eric Blake wrote:
> Instead of using a raw assert, print the information in an
> easier to understand way:
> 
> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)

Well, it would help if my commit message actually matched...


> +        if (wstatus) {
> +            if (WIFEXITED(wstatus)) {
> +                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
> +                        "process but encountered exit status %d\n",
> +                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
> +            } else if (WIFSIGNALED(wstatus)) {
> +                int sig = WTERMSIG(wstatus);
> +                const char *signame = strsignal(sig) ?: "unknown ???";
> +                const char *dump = WCOREDUMP(wstatus) ? " (dumped core)" : "";
> +
> +                fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
> +                        "from signal %d (%s)%s\n",

...the code.
Markus Armbruster Aug. 13, 2018, 7:17 a.m. | #2
Eric Blake <eblake@redhat.com> writes:

> On 08/10/2018 08:28 AM, Eric Blake wrote:
>> Instead of using a raw assert, print the information in an
>> easier to understand way:
>>
>> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
>
> Well, it would help if my commit message actually matched...
>
>
>> +        if (wstatus) {
>> +            if (WIFEXITED(wstatus)) {
>> +                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
>> +                        "process but encountered exit status %d\n",
>> +                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
>> +            } else if (WIFSIGNALED(wstatus)) {
>> +                int sig = WTERMSIG(wstatus);
>> +                const char *signame = strsignal(sig) ?: "unknown ???";
>> +                const char *dump = WCOREDUMP(wstatus) ? " (dumped core)" : "";
>> +
>> +                fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
>> +                        "from signal %d (%s)%s\n",
>
> ...the code.

I got libqtest patches in my queue, and I could stick this patch in.
Would you like me to touch up the commit message when I apply?  Or
should I expect v5?
Eric Blake Aug. 13, 2018, 3 p.m. | #3
On 08/13/2018 02:17 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 08/10/2018 08:28 AM, Eric Blake wrote:
>>> Instead of using a raw assert, print the information in an
>>> easier to understand way:
>>>
>>> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
>>
>> Well, it would help if my commit message actually matched...
>>
>>
>>> +        if (wstatus) {
>>> +            if (WIFEXITED(wstatus)) {
>>> +                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
>>> +                        "process but encountered exit status %d\n",
>>> +                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
>>> +            } else if (WIFSIGNALED(wstatus)) {
>>> +                int sig = WTERMSIG(wstatus);
>>> +                const char *signame = strsignal(sig) ?: "unknown ???";
>>> +                const char *dump = WCOREDUMP(wstatus) ? " (dumped core)" : "";
>>> +
>>> +                fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
>>> +                        "from signal %d (%s)%s\n",
>>
>> ...the code.
> 
> I got libqtest patches in my queue, and I could stick this patch in.
> Would you like me to touch up the commit message when I apply?  Or
> should I expect v5?
> 

If you don't mind doing the touchup (s/core dumped/dumped core/), then I 
don't need to submit v5.
Markus Armbruster Aug. 15, 2018, 1:35 p.m. | #4
Eric Blake <eblake@redhat.com> writes:

> On 08/13/2018 02:17 AM, Markus Armbruster wrote:
>> Eric Blake <eblake@redhat.com> writes:
>>
>>> On 08/10/2018 08:28 AM, Eric Blake wrote:
>>>> Instead of using a raw assert, print the information in an
>>>> easier to understand way:
>>>>
>>>> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
>>>
>>> Well, it would help if my commit message actually matched...
>>>
>>>
>>>> +        if (wstatus) {
>>>> +            if (WIFEXITED(wstatus)) {
>>>> +                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
>>>> +                        "process but encountered exit status %d\n",
>>>> +                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
>>>> +            } else if (WIFSIGNALED(wstatus)) {
>>>> +                int sig = WTERMSIG(wstatus);
>>>> +                const char *signame = strsignal(sig) ?: "unknown ???";
>>>> +                const char *dump = WCOREDUMP(wstatus) ? " (dumped core)" : "";
>>>> +
>>>> +                fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
>>>> +                        "from signal %d (%s)%s\n",
>>>
>>> ...the code.
>>
>> I got libqtest patches in my queue, and I could stick this patch in.
>> Would you like me to touch up the commit message when I apply?  Or
>> should I expect v5?
>>
>
> If you don't mind doing the touchup (s/core dumped/dumped core/), then
> I don't need to submit v5.

Since "core dumped" is how the shell traditionally reports this, I'd
prefer to adjust the code to match the commit message.  Okay?
Philippe Mathieu-Daudé Aug. 15, 2018, 2:50 p.m. | #5
On 08/15/2018 10:35 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 08/13/2018 02:17 AM, Markus Armbruster wrote:
>>> Eric Blake <eblake@redhat.com> writes:
>>>
>>>> On 08/10/2018 08:28 AM, Eric Blake wrote:
>>>>> Instead of using a raw assert, print the information in an
>>>>> easier to understand way:
>>>>>
>>>>> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
>>>>
>>>> Well, it would help if my commit message actually matched...
>>>>
>>>>
>>>>> +        if (wstatus) {
>>>>> +            if (WIFEXITED(wstatus)) {
>>>>> +                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
>>>>> +                        "process but encountered exit status %d\n",
>>>>> +                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
>>>>> +            } else if (WIFSIGNALED(wstatus)) {
>>>>> +                int sig = WTERMSIG(wstatus);
>>>>> +                const char *signame = strsignal(sig) ?: "unknown ???";
>>>>> +                const char *dump = WCOREDUMP(wstatus) ? " (dumped core)" : "";
>>>>> +
>>>>> +                fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
>>>>> +                        "from signal %d (%s)%s\n",
>>>>
>>>> ...the code.
>>>
>>> I got libqtest patches in my queue, and I could stick this patch in.
>>> Would you like me to touch up the commit message when I apply?  Or
>>> should I expect v5?
>>>
>>
>> If you don't mind doing the touchup (s/core dumped/dumped core/), then
>> I don't need to submit v5.
> 
> Since "core dumped" is how the shell traditionally reports this, I'd
> prefer to adjust the code to match the commit message.  Okay?
> 

Apparently your commit message betrayed you =) You are also custom to
see "core dumped" in your terminal.

I also prefer "core dumped", this is the default string I'm custom to
fgrep for, and the quoted version I expect other people to google for.

With code adjusted:
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Eric Blake Aug. 15, 2018, 7:31 p.m. | #6
On 08/15/2018 08:35 AM, Markus Armbruster wrote:

>>>>> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
>>>>
>>>> Well, it would help if my commit message actually matched...

>>>>> +                const char *dump = WCOREDUMP(wstatus) ? " (dumped core)" : "";

>>>> ...the code.
>>>
>>> I got libqtest patches in my queue, and I could stick this patch in.
>>> Would you like me to touch up the commit message when I apply?  Or
>>> should I expect v5?
>>>
>>
>> If you don't mind doing the touchup (s/core dumped/dumped core/), then
>> I don't need to submit v5.
> 
> Since "core dumped" is how the shell traditionally reports this, I'd
> prefer to adjust the code to match the commit message.  Okay?

Yes.
Marc-André Lureau Aug. 16, 2018, 2:09 p.m. | #7
On Fri, Aug 10, 2018 at 3:28 PM Eric Blake <eblake@redhat.com> wrote:
>
> In kill_qemu() we have an assert that checks that the QEMU process
> didn't dump core:
>             assert(!WCOREDUMP(wstatus));
>
> Unfortunately the WCOREDUMP macro here means the resulting message
> is not very easy to comprehend on at least some systems:
>
> ahci-test: tests/libqtest.c:113: kill_qemu: Assertion `!(((__extension__ (((union { __typeof(wstatus) __in; int __i; }) { .__in = (wstatus) }).__i))) & 0x80)' failed.
>
> and it doesn't identify what signal the process took. What's more,
> WCOREDUMP is not reliable - in some cases, setrlimit() coupled with
> kernel dump settings can result in the flag not being set.  It's
> better to log ALL death by signal, instead of caring whether a core
> dump was attempted (although once we know a signal happened, also
> mentioning if a core dump is present can be helpful).
>
> Furthermore, we are NOT detecting EINTR (while EINTR shouldn't be
> happening if we didn't install signal handlers, it's still better
> to always be robust).

A nice thing about this patch is that "make check" will now return an
error when compiled with --enable-sanitizers and there are leaks
(instead of just reporting warnings).

With the few leak fixes pending merge, x86-64 ASAN make check is
clean: I wish we can enable it in our CI, to avoid regressions.

That I know of, there is at least patchew (make docker), travis, and
Peter has his own set of integration tools/build iirc.

Travis looks overloaded to me.

May I suggest to replace docker-test-clang@ubuntu by
docker-test-debug@ubuntu in patchew? (test-debug uses clang, with some
default sanitizers)

The 2 patches missing upstream for a clean x86-64 run are:
- megasas: fix sglist leak
- tests: fix bdrv-drain leak

thanks

>
> Finally, even non-signal death with a non-zero status is suspicious,
> since qemu's SIGINT handler is supposed to result in exit(0).
>
> Instead of using a raw assert, print the information in an
> easier to understand way:
>
> /i386/ahci/sanity: tests/libqtest.c:119: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
> Aborted (core dumped)
>
> (Of course, the really useful information would be why the QEMU
> process dumped core in the first place, but we don't have that
> by the time the test program has picked up the exit status.)
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Eric Blake <eblake@redhat.com>
>
> ---
> v4: don't special-case death without WCOREDUMP [Markus]
> v3: use TFR() instead of open-coding the retry loop [Thomas]
> ---
>  tests/libqtest.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 098af6aec44..57cf89015e0 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -107,10 +107,28 @@ static void kill_qemu(QTestState *s)
>          pid_t pid;
>
>          kill(s->qemu_pid, SIGTERM);
> -        pid = waitpid(s->qemu_pid, &wstatus, 0);
> +        TFR(pid = waitpid(s->qemu_pid, &wstatus, 0));
>
> -        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> -            assert(!WCOREDUMP(wstatus));
> +        assert(pid == s->qemu_pid);
> +        /*
> +         * We expect qemu to exit with status 0; anything else is
> +         * fishy and should be logged with as much detail as possible.
> +         */
> +        if (wstatus) {
> +            if (WIFEXITED(wstatus)) {
> +                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
> +                        "process but encountered exit status %d\n",
> +                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
> +            } else if (WIFSIGNALED(wstatus)) {
> +                int sig = WTERMSIG(wstatus);
> +                const char *signame = strsignal(sig) ?: "unknown ???";
> +                const char *dump = WCOREDUMP(wstatus) ? " (dumped core)" : "";
> +
> +                fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
> +                        "from signal %d (%s)%s\n",
> +                        __FILE__, __LINE__, sig, signame, dump);
> +            }
> +            abort();
>          }
>      }
>  }
> --
> 2.14.4
>
>

Patch

diff --git a/tests/libqtest.c b/tests/libqtest.c
index 098af6aec44..57cf89015e0 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -107,10 +107,28 @@  static void kill_qemu(QTestState *s)
         pid_t pid;

         kill(s->qemu_pid, SIGTERM);
-        pid = waitpid(s->qemu_pid, &wstatus, 0);
+        TFR(pid = waitpid(s->qemu_pid, &wstatus, 0));

-        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
-            assert(!WCOREDUMP(wstatus));
+        assert(pid == s->qemu_pid);
+        /*
+         * We expect qemu to exit with status 0; anything else is
+         * fishy and should be logged with as much detail as possible.
+         */
+        if (wstatus) {
+            if (WIFEXITED(wstatus)) {
+                fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
+                        "process but encountered exit status %d\n",
+                        __FILE__, __LINE__, WEXITSTATUS(wstatus));
+            } else if (WIFSIGNALED(wstatus)) {
+                int sig = WTERMSIG(wstatus);
+                const char *signame = strsignal(sig) ?: "unknown ???";
+                const char *dump = WCOREDUMP(wstatus) ? " (dumped core)" : "";
+
+                fprintf(stderr, "%s:%d: kill_qemu() detected QEMU death "
+                        "from signal %d (%s)%s\n",
+                        __FILE__, __LINE__, sig, signame, dump);
+            }
+            abort();
         }
     }
 }