diff mbox series

[v4,2/3] libqtest: fail if child coredumps

Message ID 1527186303-192100-3-git-send-email-mst@redhat.com
State New
Headers show
Series None | expand

Commit Message

Michael S. Tsirkin May 24, 2018, 6:25 p.m. UTC
Right now tests report OK status if QEMU crashes during cleanup.
Let's catch that case and fail the test.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/libqtest.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Thomas Huth May 25, 2018, 6:10 a.m. UTC | #1
On 24.05.2018 20:25, Michael S. Tsirkin wrote:
> Right now tests report OK status if QEMU crashes during cleanup.
> Let's catch that case and fail the test.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  tests/libqtest.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 43fb97e..f869854 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>  static void kill_qemu(QTestState *s)
>  {
>      if (s->qemu_pid != -1) {
> +        int wstatus = 0;
> +        pid_t pid;
> +
>          kill(s->qemu_pid, SIGTERM);
> -        waitpid(s->qemu_pid, NULL, 0);
> +        pid = waitpid(s->qemu_pid, &wstatus, 0);
> +
> +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> +            assert(!WCOREDUMP(wstatus));

Another ugliness that I just discovered: kill_qemu is also called from
the SIGABRT handler. So if a qtest assert() triggers an abort(), the
abort handler runs kill_qemu which now could trigger another assert()
and thus abort(). It's likely not a real problem since the abort handler
has been installed with SA_RESETHAND, but it's still quite confusing code.

Please let's clean up this ugliness properly: I think kill_qemu should
*only* be used by the abort handler, and then kill QEMU with SIGKILL for
good, to make sure that there are no stuck QEMU processes hanging around
anymore.

qtest_quit() should simply try to quit QEMU via QMP instead, and then
check for WIFEXITED(wstatus) && !WEXITSTATUS(wstatus) instead of using
the kill_qemu() function.

 Thomas
Thomas Huth May 25, 2018, 11:22 a.m. UTC | #2
On 25.05.2018 08:10, Thomas Huth wrote:
> On 24.05.2018 20:25, Michael S. Tsirkin wrote:
>> Right now tests report OK status if QEMU crashes during cleanup.
>> Let's catch that case and fail the test.
>>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  tests/libqtest.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/tests/libqtest.c b/tests/libqtest.c
>> index 43fb97e..f869854 100644
>> --- a/tests/libqtest.c
>> +++ b/tests/libqtest.c
>> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>>  static void kill_qemu(QTestState *s)
>>  {
>>      if (s->qemu_pid != -1) {
>> +        int wstatus = 0;
>> +        pid_t pid;
>> +
>>          kill(s->qemu_pid, SIGTERM);
>> -        waitpid(s->qemu_pid, NULL, 0);
>> +        pid = waitpid(s->qemu_pid, &wstatus, 0);
>> +
>> +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>> +            assert(!WCOREDUMP(wstatus));
> 
> Another ugliness that I just discovered: kill_qemu is also called from
> the SIGABRT handler. So if a qtest assert() triggers an abort(), the
> abort handler runs kill_qemu which now could trigger another assert()
> and thus abort(). It's likely not a real problem since the abort handler
> has been installed with SA_RESETHAND, but it's still quite confusing code.
> 
> Please let's clean up this ugliness properly: I think kill_qemu should
> *only* be used by the abort handler, and then kill QEMU with SIGKILL for
> good, to make sure that there are no stuck QEMU processes hanging around
> anymore.
> 
> qtest_quit() should simply try to quit QEMU via QMP instead, and then
> check for WIFEXITED(wstatus) && !WEXITSTATUS(wstatus) instead of using
> the kill_qemu() function.

I just did some experiments with that, and using QMP 'quit' to exit QEMU
is also not working very reliable - some tests apparently mess up QMP
quite badly, so the 'quit' does not work during qtest_quit anymore.
Looks like we have to continue to send SIGTERM during qtest_quit(). But
I still think we should separate the logic from the abort handler (which
should use SIGKILL in case SIGTERM does not work as expected).

 Thomas
Michael S. Tsirkin May 25, 2018, 12:15 p.m. UTC | #3
On Fri, May 25, 2018 at 08:10:48AM +0200, Thomas Huth wrote:
> On 24.05.2018 20:25, Michael S. Tsirkin wrote:
> > Right now tests report OK status if QEMU crashes during cleanup.
> > Let's catch that case and fail the test.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  tests/libqtest.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> > index 43fb97e..f869854 100644
> > --- a/tests/libqtest.c
> > +++ b/tests/libqtest.c
> > @@ -103,8 +103,15 @@ static int socket_accept(int sock)
> >  static void kill_qemu(QTestState *s)
> >  {
> >      if (s->qemu_pid != -1) {
> > +        int wstatus = 0;
> > +        pid_t pid;
> > +
> >          kill(s->qemu_pid, SIGTERM);
> > -        waitpid(s->qemu_pid, NULL, 0);
> > +        pid = waitpid(s->qemu_pid, &wstatus, 0);
> > +
> > +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> > +            assert(!WCOREDUMP(wstatus));
> 
> Another ugliness that I just discovered: kill_qemu is also called from
> the SIGABRT handler. So if a qtest assert() triggers an abort(), the
> abort handler runs kill_qemu which now could trigger another assert()
> and thus abort().

But only the first one will cause a coredump.

> It's likely not a real problem since the abort handler
> has been installed with SA_RESETHAND, but it's still quite confusing code.
> 
> Please let's clean up this ugliness properly: I think kill_qemu should
> *only* be used by the abort handler, and then kill QEMU with SIGKILL for
> good, to make sure that there are no stuck QEMU processes hanging around
> anymore.
> 
> qtest_quit() should simply try to quit QEMU via QMP instead, and then
> check for WIFEXITED(wstatus) && !WEXITSTATUS(wstatus) instead of using
> the kill_qemu() function.
> 
>  Thomas

I think I'll drop the second patch for now. failing test on coredump
is clearly correct. The rest can wait until someone has the energy
to look into all the intricacies of signal handling.
Thomas Huth May 25, 2018, 2:05 p.m. UTC | #4
On 25.05.2018 14:15, Michael S. Tsirkin wrote:
> On Fri, May 25, 2018 at 08:10:48AM +0200, Thomas Huth wrote:
>> On 24.05.2018 20:25, Michael S. Tsirkin wrote:
>>> Right now tests report OK status if QEMU crashes during cleanup.
>>> Let's catch that case and fail the test.
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>>  tests/libqtest.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/libqtest.c b/tests/libqtest.c
>>> index 43fb97e..f869854 100644
>>> --- a/tests/libqtest.c
>>> +++ b/tests/libqtest.c
>>> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>>>  static void kill_qemu(QTestState *s)
>>>  {
>>>      if (s->qemu_pid != -1) {
>>> +        int wstatus = 0;
>>> +        pid_t pid;
>>> +
>>>          kill(s->qemu_pid, SIGTERM);
>>> -        waitpid(s->qemu_pid, NULL, 0);
>>> +        pid = waitpid(s->qemu_pid, &wstatus, 0);
>>> +
>>> +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>>> +            assert(!WCOREDUMP(wstatus));
>>
>> Another ugliness that I just discovered: kill_qemu is also called from
>> the SIGABRT handler. So if a qtest assert() triggers an abort(), the
>> abort handler runs kill_qemu which now could trigger another assert()
>> and thus abort().
> 
> But only the first one will cause a coredump.
> 
>> It's likely not a real problem since the abort handler
>> has been installed with SA_RESETHAND, but it's still quite confusing code.
>>
>> Please let's clean up this ugliness properly: I think kill_qemu should
>> *only* be used by the abort handler, and then kill QEMU with SIGKILL for
>> good, to make sure that there are no stuck QEMU processes hanging around
>> anymore.
>>
>> qtest_quit() should simply try to quit QEMU via QMP instead, and then
>> check for WIFEXITED(wstatus) && !WEXITSTATUS(wstatus) instead of using
>> the kill_qemu() function.
>>
>>  Thomas
> 
> I think I'll drop the second patch for now. failing test on coredump
> is clearly correct. The rest can wait until someone has the energy
> to look into all the intricacies of signal handling.

Ok, sounds like a plan.

Acked-by: Thomas Huth <thuth@redhat.com>
diff mbox series

Patch

diff --git a/tests/libqtest.c b/tests/libqtest.c
index 43fb97e..f869854 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -103,8 +103,15 @@  static int socket_accept(int sock)
 static void kill_qemu(QTestState *s)
 {
     if (s->qemu_pid != -1) {
+        int wstatus = 0;
+        pid_t pid;
+
         kill(s->qemu_pid, SIGTERM);
-        waitpid(s->qemu_pid, NULL, 0);
+        pid = waitpid(s->qemu_pid, &wstatus, 0);
+
+        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
+            assert(!WCOREDUMP(wstatus));
+        }
     }
 }