
[V2,1/2] tests/libqtest: Fix possible deadlock in qtest initialization

Message ID 1394532550-21857-2-git-send-email-marcel.a@redhat.com
State New

Commit Message

Marcel Apfelbaum March 11, 2014, 10:09 a.m. UTC
socket_accept() waits for QEMU to connect to the qtest UNIX socket.
If QEMU encounters an error during command-line parsing,
it can exit before initializing the communication channel,
leaving accept() blocked forever.

Setting a receive timeout (SO_RCVTIMEO) on the listening socket
bounds the wait and fixes the issue.

Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
---
 tests/libqtest.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
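To illustrate the mechanism the commit message relies on, here is a minimal standalone sketch (not the patch itself) of a bounded accept() on a UNIX listening socket. `listen_unix()` and `accept_with_timeout()` are hypothetical helpers. The sketch assumes Linux behavior, where an expired SO_RCVTIMEO makes accept() fail with EAGAIN/EWOULDBLOCK; other platforms may behave differently.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a listening AF_UNIX stream socket at path. */
static int listen_unix(const char *path)
{
    struct sockaddr_un addr;
    int sock;

    assert(strlen(path) < sizeof(addr.sun_path)); /* path must fit sun_path */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock < 0) {
        return -1;
    }
    unlink(path); /* remove a stale socket file left by an earlier run */
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sock, 1) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/*
 * Wait at most timeout_sec seconds for a peer to connect to sock.
 * Returns the connected fd, or -1 with errno set; on Linux an expired
 * timeout is reported as EAGAIN/EWOULDBLOCK.
 */
static int accept_with_timeout(int sock, int timeout_sec)
{
    struct sockaddr_un addr;
    socklen_t addrlen;
    struct timeval timeout = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int ret;

    if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &timeout,
                   sizeof(timeout)) < 0) {
        return -1;
    }
    do {
        addrlen = sizeof(addr); /* reset each try: accept() may update it */
        ret = accept(sock, (struct sockaddr *)&addr, &addrlen);
    } while (ret == -1 && errno == EINTR);
    return ret;
}
```

A caller in the position of qtest_init() can then treat a -1 return as "QEMU never connected" instead of hanging.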

Comments

Stefan Hajnoczi March 11, 2014, 12:40 p.m. UTC | #1
On Tue, Mar 11, 2014 at 12:09:09PM +0200, Marcel Apfelbaum wrote:
> @@ -78,12 +79,16 @@ static int socket_accept(int sock)
>      struct sockaddr_un addr;
>      socklen_t addrlen;
>      int ret;
> +    struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT,
> +                               .tv_usec = 0 };
> +
> +    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (void *)&timeout,
> +               sizeof(timeout));
>  
>      addrlen = sizeof(addr);
>      do {
>          ret = accept(sock, (struct sockaddr *)&addr, &addrlen);
>      } while (ret == -1 && errno == EINTR);
> -    g_assert_no_errno(ret);
>      close(sock);

Did you mean to leave SO_RCVTIMEO set after this function completes?
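One way to address this question without relying on platform details is to keep the timeout only for the accept() call and then explicitly reset SO_RCVTIMEO to zero (meaning "no timeout") on the accepted connection. The sketch below does that; `accept_then_clear_timeout()` is a hypothetical name, and it does not assume either way whether an accepted socket inherits the listener's SO_RCVTIMEO.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical variant of socket_accept(): bound only the wait for the
 * peer, then reset SO_RCVTIMEO on the accepted connection so later reads
 * block indefinitely, whatever the platform copied from the listener.
 */
static int accept_then_clear_timeout(int sock, int timeout_sec)
{
    struct sockaddr_un addr;
    socklen_t addrlen;
    struct timeval timeout = { .tv_sec = timeout_sec, .tv_usec = 0 };
    struct timeval no_timeout = { .tv_sec = 0, .tv_usec = 0 }; /* 0 = no timeout */
    int ret, saved_errno;

    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
    do {
        addrlen = sizeof(addr);
        ret = accept(sock, (struct sockaddr *)&addr, &addrlen);
    } while (ret == -1 && errno == EINTR);

    saved_errno = errno;
    close(sock);              /* the listener is not needed any more */
    errno = saved_errno;      /* keep accept()'s errno for the caller */

    if (ret >= 0) {
        int rc = setsockopt(ret, SOL_SOCKET, SO_RCVTIMEO, &no_timeout,
                            sizeof(no_timeout));
        assert(rc == 0);
        (void)rc; /* avoid an unused-variable warning under NDEBUG */
    }
    return ret;
}
```

The thread below ends up keeping the timeout instead, on the grounds that no qtest command should take that long.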

> @@ -91,7 +96,7 @@ static int socket_accept(int sock)
>  
>  static void kill_qemu(QTestState *s)
>  {
> -    if (s->qemu_pid != -1) {
> +    if (s && s->qemu_pid != -1) {
>          kill(s->qemu_pid, SIGTERM);
>          waitpid(s->qemu_pid, NULL, 0);
>      }

This is a bug in libqtest.c, please don't silence the crash.

kill_qemu() gets called from the SIGABRT signal handler but I forgot
that global_qtest isn't initialized yet while qtest_init() executes.

In other words, the cleanup is broken if we fail inside qtest_init().
Can you drop this hunk and I'll send a patch to fix the underlying
issue?
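The underlying problem is that the SIGABRT handler only knows about global_qtest, which is not set while qtest_init() runs. A minimal sketch of one possible approach, not necessarily the fix that was eventually sent, is to record the child's PID in a file-scope variable as soon as it is forked, so the handler can kill and reap it without dereferencing a NULL QTestState. `pending_qemu_pid`, `kill_pending_qemu()` and `install_sigabrt_handler()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L /* expose sigaction() and friends under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical: PID of a QEMU child that init has spawned but not yet
 * handed over to the global test state, or -1 if there is none.
 */
static pid_t pending_qemu_pid = -1;

/* Kill and reap the pending child, if any; a no-op otherwise. */
static void kill_pending_qemu(void)
{
    if (pending_qemu_pid != -1) {
        kill(pending_qemu_pid, SIGTERM);    /* async-signal-safe */
        waitpid(pending_qemu_pid, NULL, 0); /* async-signal-safe */
        pending_qemu_pid = -1;
    }
}

/* SIGABRT handler: a failing g_assert() reaches it through abort(). */
static void sigabrt_handler(int signo)
{
    (void)signo;
    kill_pending_qemu();
}

static void install_sigabrt_handler(void)
{
    struct sigaction sa;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigabrt_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGABRT, &sa, NULL);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning under NDEBUG */
}
```

Because abort() is raised synchronously by the failing assertion in the same thread, the handler only reads pending_qemu_pid after the assignment has completed.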

> @@ -153,6 +158,8 @@ QTestState *qtest_init(const char *extra_args)
>      g_free(socket_path);
>      g_free(qmp_socket_path);
>  
> +    g_assert(s->fd >= 0 && s->qmp_fd >= 0);
> +

We probably shouldn't socket_accept() s->qmp_fd if s->fd already failed.
Otherwise we'll wait another 5 seconds for the timeout to expire:

    s->fd = socket_accept(sock);
    if (s->fd >= 0) {
        s->qmp_fd = socket_accept(qmpsock);
    }
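The ordering suggested above can be sketched as a standalone function. `listen_at()`, `accept_timeout()` and `accept_qtest_and_qmp()` are hypothetical names; `accept_timeout()` mirrors socket_accept() by closing the listener afterwards, and a caller would still g_assert() on the result as the patch does.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind and listen on an AF_UNIX address. */
static int listen_at(const struct sockaddr_un *addr)
{
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);

    if (sock < 0) {
        return -1;
    }
    unlink(addr->sun_path); /* drop a stale socket file, if any */
    if (bind(sock, (const struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(sock, 1) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Bounded accept(), retried on EINTR; closes the listener like socket_accept(). */
static int accept_timeout(int sock, int timeout_sec)
{
    struct sockaddr_un addr;
    socklen_t addrlen;
    struct timeval timeout = { .tv_sec = timeout_sec, .tv_usec = 0 };
    int ret;

    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
    do {
        addrlen = sizeof(addr);
        ret = accept(sock, (struct sockaddr *)&addr, &addrlen);
    } while (ret == -1 && errno == EINTR);
    close(sock);
    return ret;
}

/*
 * Only wait for the QMP connection if the qtest connection succeeded,
 * so a QEMU that exited early costs one timeout instead of two.
 */
static bool accept_qtest_and_qmp(int sock, int qmpsock, int timeout_sec,
                                 int *fd, int *qmp_fd)
{
    assert(fd != NULL && qmp_fd != NULL);
    *fd = accept_timeout(sock, timeout_sec);
    *qmp_fd = -1;
    if (*fd >= 0) {
        *qmp_fd = accept_timeout(qmpsock, timeout_sec);
    } else {
        close(qmpsock); /* never waited on, but still release it */
    }
    return *fd >= 0 && *qmp_fd >= 0;
}
```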

Marcel Apfelbaum March 11, 2014, 12:51 p.m. UTC | #2
On Tue, 2014-03-11 at 13:40 +0100, Stefan Hajnoczi wrote:
> On Tue, Mar 11, 2014 at 12:09:09PM +0200, Marcel Apfelbaum wrote:
> > @@ -78,12 +79,16 @@ static int socket_accept(int sock)
> >      struct sockaddr_un addr;
> >      socklen_t addrlen;
> >      int ret;
> > +    struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT,
> > +                               .tv_usec = 0 };
> > +
> > +    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (void *)&timeout,
> > +               sizeof(timeout));
> >  
> >      addrlen = sizeof(addr);
> >      do {
> >          ret = accept(sock, (struct sockaddr *)&addr, &addrlen);
> >      } while (ret == -1 && errno == EINTR);
> > -    g_assert_no_errno(ret);
> >      close(sock);
> 
> Did you mean to leave SO_RCVTIMEO set after this function completes?
Yes, I don't think it hurts. A 5 sec timeout should be effectively infinite,
with QEMU running on the same machine. If you think

> 
> > @@ -91,7 +96,7 @@ static int socket_accept(int sock)
> >  
> >  static void kill_qemu(QTestState *s)
> >  {
> > -    if (s->qemu_pid != -1) {
> > +    if (s && s->qemu_pid != -1) {
> >          kill(s->qemu_pid, SIGTERM);
> >          waitpid(s->qemu_pid, NULL, 0);
> >      }
> 
> This is a bug in libqtest.c, please don't silence the crash.
I didn't see it as hiding a crash. I thought that if there
is any problem during init, it is because QEMU failed to start,
meaning that you don't have a process to kill (QEMU exited already).
If any of the above happens, you don't have a global state.

Anyway, if you have a better way to deal with it, I have nothing against it :)

Thanks,
Marcel

> 
> kill_qemu() gets called from the SIGABRT signal handler but I forgot
> that global_qtest isn't initialized yet while qtest_init() executes.
> 
> In other words, the cleanup is broken if we fail inside qtest_init().
> Can you drop this hunk and I'll send a patch to fix the underlying
> issue?
> 
> > @@ -153,6 +158,8 @@ QTestState *qtest_init(const char *extra_args)
> >      g_free(socket_path);
> >      g_free(qmp_socket_path);
> >  
> > +    g_assert(s->fd >= 0 && s->qmp_fd >= 0);
> > +
> 
> We probably shouldn't socket_accept() s->qmp_fd if s->fd already failed.
> Otherwise we'll wait another 5 seconds for the timeout to expire:
Yes, I already had this chunk; I have no idea why I dropped it. I'll
restore it, thanks.

Thanks,
Marcel
> 
>     s->fd = socket_accept(sock);
>     if (s->fd >= 0) {
>         s->qmp_fd = socket_accept(qmpsock);
>     }

Marcel Apfelbaum March 11, 2014, 1:04 p.m. UTC | #3
On Tue, 2014-03-11 at 14:51 +0200, Marcel Apfelbaum wrote:
> On Tue, 2014-03-11 at 13:40 +0100, Stefan Hajnoczi wrote:
> > On Tue, Mar 11, 2014 at 12:09:09PM +0200, Marcel Apfelbaum wrote:
> > > @@ -78,12 +79,16 @@ static int socket_accept(int sock)
> > >      struct sockaddr_un addr;
> > >      socklen_t addrlen;
> > >      int ret;
> > > +    struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT,
> > > +                               .tv_usec = 0 };
> > > +
> > > +    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (void *)&timeout,
> > > +               sizeof(timeout));
> > >  
> > >      addrlen = sizeof(addr);
> > >      do {
> > >          ret = accept(sock, (struct sockaddr *)&addr, &addrlen);
> > >      } while (ret == -1 && errno == EINTR);
> > > -    g_assert_no_errno(ret);
> > >      close(sock);
> > 
> > Did you mean to leave SO_RCVTIMEO set after this function completes?
> Yes, I don't think it hurts. A 5 sec timeout should be effectively infinite,
> with QEMU running on the same machine. If you think
... otherwise, I can remove the timeout, but I think it is OK.

> 
> > 
> > > @@ -91,7 +96,7 @@ static int socket_accept(int sock)
> > >  
> > >  static void kill_qemu(QTestState *s)
> > >  {
> > > -    if (s->qemu_pid != -1) {
> > > +    if (s && s->qemu_pid != -1) {
> > >          kill(s->qemu_pid, SIGTERM);
> > >          waitpid(s->qemu_pid, NULL, 0);
> > >      }
> > 
> > This is a bug in libqtest.c, please don't silence the crash.
> I didn't see it as hiding a crash. I thought that if there
> is any problem during init, it is because QEMU failed to start,
> meaning that you don't have a process to kill (QEMU exited already).
> If any of the above happens, you don't have a global state.
> 
> Anyway, if you have a better way to deal with it, I have nothing against it :)
> 
> Thanks,
> Marcel
> 
> > 
> > kill_qemu() gets called from the SIGABRT signal handler but I forgot
> > that global_qtest isn't initialized yet while qtest_init() executes.
> > 
> > In other words, the cleanup is broken if we fail inside qtest_init().
> > Can you drop this hunk and I'll send a patch to fix the underlying
> > issue?
I dropped it. Please take care of it, as it causes a segmentation fault
if we abort in qtest_init().

Thanks,
Marcel
> > 
> > > @@ -153,6 +158,8 @@ QTestState *qtest_init(const char *extra_args)
> > >      g_free(socket_path);
> > >      g_free(qmp_socket_path);
> > >  
> > > +    g_assert(s->fd >= 0 && s->qmp_fd >= 0);
> > > +
> > 
> > We probably shouldn't socket_accept() s->qmp_fd if s->fd already failed.
> > Otherwise we'll wait another 5 seconds for the timeout to expire:
> Yes, I already had this chunk; I have no idea why I dropped it. I'll
> restore it, thanks.
> 
> Thanks,
> Marcel
> > 
> >     s->fd = socket_accept(sock);
> >     if (s->fd >= 0) {
> >         s->qmp_fd = socket_accept(qmpsock);
> >     }
Stefan Hajnoczi March 11, 2014, 6:50 p.m. UTC | #4
On Tue, Mar 11, 2014 at 03:04:22PM +0200, Marcel Apfelbaum wrote:
> On Tue, 2014-03-11 at 14:51 +0200, Marcel Apfelbaum wrote:
> > On Tue, 2014-03-11 at 13:40 +0100, Stefan Hajnoczi wrote:
> > > On Tue, Mar 11, 2014 at 12:09:09PM +0200, Marcel Apfelbaum wrote:
> > > > @@ -78,12 +79,16 @@ static int socket_accept(int sock)
> > > >      struct sockaddr_un addr;
> > > >      socklen_t addrlen;
> > > >      int ret;
> > > > +    struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT,
> > > > +                               .tv_usec = 0 };
> > > > +
> > > > +    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (void *)&timeout,
> > > > +               sizeof(timeout));
> > > >  
> > > >      addrlen = sizeof(addr);
> > > >      do {
> > > >          ret = accept(sock, (struct sockaddr *)&addr, &addrlen);
> > > >      } while (ret == -1 && errno == EINTR);
> > > > -    g_assert_no_errno(ret);
> > > >      close(sock);
> > > 
> > > Did you mean to leave SO_RCVTIMEO set after this function completes?
> > Yes, I don't think it hurts. A 5 sec timeout should be effectively infinite,
> > with QEMU running on the same machine. If you think
> ... otherwise, I can remove the timeout, but I think it is OK.

I think you are right.  I checked that the qtest protocol has no
long-running operations.  It doesn't seem realistic that any qtest
command would take 5 seconds or longer.

So let's leave in the timeout.
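If a qtest connection does carry a receive timeout, a command that stalls surfaces as an error rather than a hang. A minimal sketch of how a reader could tell the cases apart, assuming Linux behavior where an expired SO_RCVTIMEO makes recv() fail with EAGAIN/EWOULDBLOCK; `read_status`, `set_recv_timeout()` and `recv_bounded()` are hypothetical, not libqtest functions.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical outcome of one bounded read. */
typedef enum { READ_OK, READ_EOF, READ_TIMEOUT, READ_ERROR } read_status;

/* Set a receive timeout of timeout_sec seconds on a connected socket. */
static int set_recv_timeout(int fd, int timeout_sec)
{
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Read once, retrying on EINTR, and classify the result. */
static read_status recv_bounded(int fd, char *buf, size_t len, ssize_t *nread)
{
    ssize_t n;

    assert(buf != NULL && len > 0);
    do {
        n = recv(fd, buf, len, 0);
    } while (n == -1 && errno == EINTR);

    *nread = n;
    if (n > 0) {
        return READ_OK;
    }
    if (n == 0) {
        return READ_EOF;       /* peer closed or shut down its write side */
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return READ_TIMEOUT;   /* SO_RCVTIMEO expired with no data */
    }
    return READ_ERROR;
}
```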

Stefan

Patch

diff --git a/tests/libqtest.c b/tests/libqtest.c
index f587d36..f1ba254 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -34,6 +34,7 @@ 
 #include "qapi/qmp/json-parser.h"
 
 #define MAX_IRQ 256
+#define SOCKET_TIMEOUT 5
 
 QTestState *global_qtest;
 
@@ -78,12 +79,16 @@ static int socket_accept(int sock)
     struct sockaddr_un addr;
     socklen_t addrlen;
     int ret;
+    struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT,
+                               .tv_usec = 0 };
+
+    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (void *)&timeout,
+               sizeof(timeout));
 
     addrlen = sizeof(addr);
     do {
         ret = accept(sock, (struct sockaddr *)&addr, &addrlen);
     } while (ret == -1 && errno == EINTR);
-    g_assert_no_errno(ret);
     close(sock);
 
     return ret;
@@ -91,7 +96,7 @@ static int socket_accept(int sock)
 
 static void kill_qemu(QTestState *s)
 {
-    if (s->qemu_pid != -1) {
+    if (s && s->qemu_pid != -1) {
         kill(s->qemu_pid, SIGTERM);
         waitpid(s->qemu_pid, NULL, 0);
     }
@@ -153,6 +158,8 @@ QTestState *qtest_init(const char *extra_args)
     g_free(socket_path);
     g_free(qmp_socket_path);
 
+    g_assert(s->fd >= 0 && s->qmp_fd >= 0);
+
     s->rx = g_string_new("");
     for (i = 0; i < MAX_IRQ; i++) {
         s->irq_level[i] = false;