Patchwork [v2] disable sigcld handling before calling pclose()

Submitter Wen Congyang
Date Dec. 21, 2010, 4:05 a.m.
Message ID <4D102795.9040107@cn.fujitsu.com>
Permalink /patch/76259/
State New

Comments

Wen Congyang - Dec. 21, 2010, 4:05 a.m.
When I use the command 'virsh save' to save the domain state,
I receive the following error message:
operation failed: Migration unexpectedly failed.

I debugged qemu by adding some printf() calls and found that
pclose() returns -1.

I traced qemu with strace; the log is as follows:
======
close(17)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, NULL, WNOHANG, NULL)          = 22016
rt_sigreturn(0)                         = 0
wait4(22016, 0x7fff7f1034fc, 0, NULL)   = -1 ECHILD (No child processes)
Wen Congyang - Jan. 5, 2011, 2:15 a.m.
At 12/21/2010 12:05 PM, Wen Congyang wrote:
> When I use the command 'virsh save' to save the domain state,
> I receive the following error message:
> operation failed: Migration unexpectedly failed.
> 
> I debug the qemu by adding some printf(), and find the function
> pclose() returns -1.
> 
> I use strace to trace qemu, the log is as the following:
> ======
> close(17)                               = 0
> --- SIGCHLD (Child exited) @ 0 (0) ---
> wait4(-1, NULL, WNOHANG, NULL)          = 22016
> rt_sigreturn(0)                         = 0
> wait4(22016, 0x7fff7f1034fc, 0, NULL)   = -1 ECHILD (No child processes)
> ======
> 
> We wait the child twice: one is in signal SIGCHLD handling and the other
> one is in pclose().
> 
> We should disable sigcld handling before calling pclose().
> 
> v2:
> - Add stub functions for Win32
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> 

Ping Again... :)
This is a bug fix.
Paolo Bonzini - Jan. 5, 2011, 9:21 a.m.
On 12/21/2010 05:05 AM, Wen Congyang wrote:
> When I use the command 'virsh save' to save the domain state,
> I receive the following error message:
> operation failed: Migration unexpectedly failed.
>
> I debug the qemu by adding some printf(), and find the function
> pclose() returns -1.
>
> I use strace to trace qemu, the log is as the following:
> ======
> close(17)                               = 0
> --- SIGCHLD (Child exited) @ 0 (0) ---
> wait4(-1, NULL, WNOHANG, NULL)          = 22016
> rt_sigreturn(0)                         = 0
> wait4(22016, 0x7fff7f1034fc, 0, NULL)   = -1 ECHILD (No child processes)
> ======
>
> We wait the child twice: one is in signal SIGCHLD handling and the other
> one is in pclose().
>
> We should disable sigcld handling before calling pclose().

I wondered whether we need SIGCHLD handling at all.  fork is called only 
from:

- xen_domain_watcher in hw/xen_domainbuild.c

- launch_script in net/tap.c

- SLIRP's fork_exec ("mini inetd")

For the first, the child will always "outlive" the parent.  For the 
second, we do waitpid in the function.  So SLIRP is the only real user 
of the SIGCHLD handler and in fact this:

http://www.google.com/codesearch/p?hl=en#tGk9u3ZS0cw/pub/Linux/system/network/serial/slirp-1.0.9.tar.gz|s3yKHVXI6eg/slirp-1.0.9/src/main.c

suggests that the handler came from there (search for do_wait).  Would 
anybody object to removing the support for Samba under SLIRP and all the 
resulting cruft?

Paolo
Wen Congyang - March 3, 2011, 3:10 a.m.
At 12/21/2010 12:05 PM, Wen Congyang wrote:
> When I use the command 'virsh save' to save the domain state,
> I receive the following error message:
> operation failed: Migration unexpectedly failed.
> 
> I debug the qemu by adding some printf(), and find the function
> pclose() returns -1.
> 
> I use strace to trace qemu, the log is as the following:
> ======
> close(17)                               = 0
> --- SIGCHLD (Child exited) @ 0 (0) ---
> wait4(-1, NULL, WNOHANG, NULL)          = 22016
> rt_sigreturn(0)                         = 0
> wait4(22016, 0x7fff7f1034fc, 0, NULL)   = -1 ECHILD (No child processes)
> ======
> 
> We wait the child twice: one is in signal SIGCHLD handling and the other
> one is in pclose().
> 
> We should disable sigcld handling before calling pclose().
> 
> v2:
> - Add stub functions for Win32
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> 
> ---
>  os-posix.c      |   19 +++++++++++++++++++
>  qemu-os-posix.h |    2 ++
>  qemu-os-win32.h |    2 ++
>  savevm.c        |    2 ++
>  4 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/os-posix.c b/os-posix.c
> index 38c29d1..b163995 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -86,6 +86,25 @@ void os_setup_signal_handling(void)
>      sigaction(SIGCHLD, &act, NULL);
>  }
>  
> +void os_stop_sigchld_handling(void)
> +{
> +    struct sigaction act;
> +
> +    memset(&act, 0, sizeof(act));
> +    act.sa_handler = SIG_DFL;
> +    sigaction(SIGCHLD, &act, NULL);
> +}
> +
> +void os_resume_sigchld_handling(void)
> +{
> +    struct sigaction act;
> +
> +    memset(&act, 0, sizeof(act));
> +    act.sa_handler = sigchld_handler;
> +    act.sa_flags = SA_NOCLDSTOP;
> +    sigaction(SIGCHLD, &act, NULL);
> +}
> +
>  /* Find a likely location for support files using the location of the binary.
>     For installed binaries this will be "$bindir/../share/qemu".  When
>     running from the build tree this will be "$bindir/../pc-bios".  */
> diff --git a/qemu-os-posix.h b/qemu-os-posix.h
> index 81fd9ab..1c317f1 100644
> --- a/qemu-os-posix.h
> +++ b/qemu-os-posix.h
> @@ -33,6 +33,8 @@ static inline void os_host_main_loop_wait(int *timeout)
>  void os_set_line_buffering(void);
>  void os_set_proc_name(const char *s);
>  void os_setup_signal_handling(void);
> +void os_stop_sigchld_handling(void);
> +void os_resume_sigchld_handling(void);
>  void os_daemonize(void);
>  void os_setup_post(void);
>  
> diff --git a/qemu-os-win32.h b/qemu-os-win32.h
> index 1a07e5e..f31c5ef 100644
> --- a/qemu-os-win32.h
> +++ b/qemu-os-win32.h
> @@ -43,6 +43,8 @@ void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque);
>  void os_host_main_loop_wait(int *timeout);
>  
>  static inline void os_setup_signal_handling(void) {}
> +static inline void os_stop_sigchld_handling(void) {}
> +static inline void os_resume_sigchld_handling(void) {}
>  static inline void os_daemonize(void) {}
>  static inline void os_setup_post(void) {}
>  void os_set_line_buffering(void);
> diff --git a/savevm.c b/savevm.c
> index 90aa237..387b70b 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -234,7 +234,9 @@ static int stdio_pclose(void *opaque)
>  {
>      QEMUFileStdio *s = opaque;
>      int ret;
> +    os_stop_sigchld_handling();
>      ret = pclose(s->stdio_file);
> +    os_resume_sigchld_handling();
>      qemu_free(s);
>      return ret;
>  }

Ping again... :)
This is a bug fix.
Two months have passed, and I have not received any comments.
Should we remove the SIGCHLD handling as Paolo suggested?

Patch

======

We wait for the child twice: once in the SIGCHLD signal handler and once
in pclose(). The handler reaps the child first, so the wait in pclose()
fails with ECHILD.

We should disable SIGCHLD handling before calling pclose().

v2:
- Add stub functions for Win32

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>

---
 os-posix.c      |   19 +++++++++++++++++++
 qemu-os-posix.h |    2 ++
 qemu-os-win32.h |    2 ++
 savevm.c        |    2 ++
 4 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/os-posix.c b/os-posix.c
index 38c29d1..b163995 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -86,6 +86,25 @@  void os_setup_signal_handling(void)
     sigaction(SIGCHLD, &act, NULL);
 }
 
+void os_stop_sigchld_handling(void)
+{
+    struct sigaction act;
+
+    memset(&act, 0, sizeof(act));
+    act.sa_handler = SIG_DFL;
+    sigaction(SIGCHLD, &act, NULL);
+}
+
+void os_resume_sigchld_handling(void)
+{
+    struct sigaction act;
+
+    memset(&act, 0, sizeof(act));
+    act.sa_handler = sigchld_handler;
+    act.sa_flags = SA_NOCLDSTOP;
+    sigaction(SIGCHLD, &act, NULL);
+}
+
 /* Find a likely location for support files using the location of the binary.
    For installed binaries this will be "$bindir/../share/qemu".  When
    running from the build tree this will be "$bindir/../pc-bios".  */
diff --git a/qemu-os-posix.h b/qemu-os-posix.h
index 81fd9ab..1c317f1 100644
--- a/qemu-os-posix.h
+++ b/qemu-os-posix.h
@@ -33,6 +33,8 @@  static inline void os_host_main_loop_wait(int *timeout)
 void os_set_line_buffering(void);
 void os_set_proc_name(const char *s);
 void os_setup_signal_handling(void);
+void os_stop_sigchld_handling(void);
+void os_resume_sigchld_handling(void);
 void os_daemonize(void);
 void os_setup_post(void);
 
diff --git a/qemu-os-win32.h b/qemu-os-win32.h
index 1a07e5e..f31c5ef 100644
--- a/qemu-os-win32.h
+++ b/qemu-os-win32.h
@@ -43,6 +43,8 @@  void qemu_del_wait_object(HANDLE handle, WaitObjectFunc *func, void *opaque);
 void os_host_main_loop_wait(int *timeout);
 
 static inline void os_setup_signal_handling(void) {}
+static inline void os_stop_sigchld_handling(void) {}
+static inline void os_resume_sigchld_handling(void) {}
 static inline void os_daemonize(void) {}
 static inline void os_setup_post(void) {}
 void os_set_line_buffering(void);
diff --git a/savevm.c b/savevm.c
index 90aa237..387b70b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -234,7 +234,9 @@  static int stdio_pclose(void *opaque)
 {
     QEMUFileStdio *s = opaque;
     int ret;
+    os_stop_sigchld_handling();
     ret = pclose(s->stdio_file);
+    os_resume_sigchld_handling();
     qemu_free(s);
     return ret;
 }