diff mbox series

[v6,2/2] vl: fix use of --daemonize with --preconfig

Message ID 1528372809-175770-1-git-send-email-imammedo@redhat.com
State New
Headers show
Series None | expand

Commit Message

Igor Mammedov June 7, 2018, noon UTC
When using --daemonize, the initial lead process will fork a child and
then wait to be notified that setup is complete via a pipe, before it
exits.  When using --preconfig there is an extra call to main_loop()
before the notification is done from os_setup_post(). Thus the parent
process won't exit until the mgmt application connects to the monitor
and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
won't connect to the monitor until daemonizing has completed though.

This is a chicken and egg problem, leading to deadlock at startup.

The only viable way to fix this is to call os_setup_post() before
the early main_loop() call when --preconfig is used. This has the
downside that any errors from this point onwards won't be handled
well by the mgmt application, because it will think QEMU has started
successfully, so not be expecting an abrupt exit. Moving as much user
input validation as possible to before the main_loop() call might help,
but mgmt application should stop assuming that QEMU has started
successfuly and use other means to collect errors from QEMU (logfile).

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v5:
  * use original Daniel's patch [1], but addapt it to apply on top of
    "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig was specified"
    with extra comment and massage commit message a little bit.
v6:
  * hide os_setup_post_done flag inside of os_setup_post() as it was in v4

CC: berrange@redhat.com
CC: mreitz@redhat.com
CC: pbonzini@redhat.com
CC: ehabkost@redhat.com
CC: ldoktor@redhat.com
CC: eblake@redhat.com
---
 os-posix.c | 6 ++++++
 vl.c       | 6 ++++++
 2 files changed, 12 insertions(+)

Comments

Eduardo Habkost June 8, 2018, 1:21 p.m. UTC | #1
On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:
> When using --daemonize, the initial lead process will fork a child and
> then wait to be notified that setup is complete via a pipe, before it
> exits.  When using --preconfig there is an extra call to main_loop()
> before the notification is done from os_setup_post(). Thus the parent
> process won't exit until the mgmt application connects to the monitor
> and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
> won't connect to the monitor until daemonizing has completed though.
> 
> This is a chicken and egg problem, leading to deadlock at startup.
> 
> The only viable way to fix this is to call os_setup_post() before
> the early main_loop() call when --preconfig is used. This has the
> downside that any errors from this point onwards won't be handled
> well by the mgmt application, because it will think QEMU has started
> successfully, so not be expecting an abrupt exit. Moving as much user
> input validation as possible to before the main_loop() call might help,
> but mgmt application should stop assuming that QEMU has started
> successfuly and use other means to collect errors from QEMU (logfile).
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v5:
>   * use original Daniel's patch [1], but addapt it to apply on top of
>     "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig was specified"
>     with extra comment and massage commit message a little bit.
> v6:
>   * hide os_setup_post_done flag inside of os_setup_post() as it was in v4
> 
> CC: berrange@redhat.com
> CC: mreitz@redhat.com
> CC: pbonzini@redhat.com
> CC: ehabkost@redhat.com
> CC: ldoktor@redhat.com
> CC: eblake@redhat.com
> ---
>  os-posix.c | 6 ++++++
>  vl.c       | 6 ++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/os-posix.c b/os-posix.c
> index 9ce6f74..0246195 100644
> --- a/os-posix.c
> +++ b/os-posix.c
> @@ -309,8 +309,14 @@ void os_daemonize(void)
>  
>  void os_setup_post(void)
>  {
> +    static bool os_setup_post_done;
>      int fd = 0;
>  
> +    if (os_setup_post_done) {
> +        return;
> +    }
> +    os_setup_post_done = true;
> +
>      if (daemonize) {
>          if (chdir("/")) {
>              error_report("not able to chdir to /: %s", strerror(errno));
> diff --git a/vl.c b/vl.c
> index fa44138..457ff2a 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
>      parse_numa_opts(current_machine);
>  
>      /* do monitor/qmp handling at preconfig state if requested */
> +    if (!preconfig_exit_requested && is_daemonized()) {
> +        /* signal parent QEMU to exit, libvirt treats it as a sign
> +         * that monitor socket is ready to accept connections
> +         */
> +        os_setup_post();
> +    }

I was looking at the daemonize logic, and noticed it we have a
huge amount of code between this line and the next
os_setup_post() call that could either:

* call exit() and/or error_report(); or
* be unable to finish machine initialization because of
  chdir("/"), change_root(), or change_process_uid().

Doesn't this make -preconfig and -daemonize fundamentally
incompatible?


>      main_loop();
>  
>      /* from here on runstate is RUN_STATE_PRELAUNCH */
> -- 
> 2.7.4
> 
>
Igor Mammedov June 11, 2018, 1:16 p.m. UTC | #2
On Fri, 8 Jun 2018 10:21:05 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:
> > When using --daemonize, the initial lead process will fork a child and
> > then wait to be notified that setup is complete via a pipe, before it
> > exits.  When using --preconfig there is an extra call to main_loop()
> > before the notification is done from os_setup_post(). Thus the parent
> > process won't exit until the mgmt application connects to the monitor
> > and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
> > won't connect to the monitor until daemonizing has completed though.
> > 
> > This is a chicken and egg problem, leading to deadlock at startup.
> > 
> > The only viable way to fix this is to call os_setup_post() before
> > the early main_loop() call when --preconfig is used. This has the
> > downside that any errors from this point onwards won't be handled
> > well by the mgmt application, because it will think QEMU has started
> > successfully, so not be expecting an abrupt exit. Moving as much user
> > input validation as possible to before the main_loop() call might help,
> > but mgmt application should stop assuming that QEMU has started
> > successfuly and use other means to collect errors from QEMU (logfile).
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > v5:
> >   * use original Daniel's patch [1], but addapt it to apply on top of
> >     "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig was specified"
> >     with extra comment and massage commit message a little bit.
> > v6:
> >   * hide os_setup_post_done flag inside of os_setup_post() as it was in v4
> > 
> > CC: berrange@redhat.com
> > CC: mreitz@redhat.com
> > CC: pbonzini@redhat.com
> > CC: ehabkost@redhat.com
> > CC: ldoktor@redhat.com
> > CC: eblake@redhat.com
> > ---
> >  os-posix.c | 6 ++++++
> >  vl.c       | 6 ++++++
> >  2 files changed, 12 insertions(+)
> > 
> > diff --git a/os-posix.c b/os-posix.c
> > index 9ce6f74..0246195 100644
> > --- a/os-posix.c
> > +++ b/os-posix.c
> > @@ -309,8 +309,14 @@ void os_daemonize(void)
> >  
> >  void os_setup_post(void)
> >  {
> > +    static bool os_setup_post_done;
> >      int fd = 0;
> >  
> > +    if (os_setup_post_done) {
> > +        return;
> > +    }
> > +    os_setup_post_done = true;
> > +
> >      if (daemonize) {
> >          if (chdir("/")) {
> >              error_report("not able to chdir to /: %s", strerror(errno));
> > diff --git a/vl.c b/vl.c
> > index fa44138..457ff2a 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
> >      parse_numa_opts(current_machine);
> >  
> >      /* do monitor/qmp handling at preconfig state if requested */
> > +    if (!preconfig_exit_requested && is_daemonized()) {
> > +        /* signal parent QEMU to exit, libvirt treats it as a sign
> > +         * that monitor socket is ready to accept connections
> > +         */
> > +        os_setup_post();
> > +    }  
> 
> I was looking at the daemonize logic, and noticed it we have a
> huge amount of code between this line and the next
> os_setup_post() call that could either:
> 
> * call exit() and/or error_report(); or
logging would work to the extent mentioned in commit message,
i.e. it' would work fine when log file is used otherwise it
errors will go to /dev/null

so it should be more or less fine on this point

> * be unable to finish machine initialization because of
>   chdir("/"), change_root(), or change_process_uid().
this one really no go.
I see 2 options here,

 * move init code that opens files to early stage (before preconfig monitor)
   or split it to open files early.
   (I've spotted several obvious places fwcfg/vnc/replay_start/migration)
   but there might be code somewhere in callbacks that would do it too,
   so it rather risky to go this route.
   (I'd do this anyways one place at the time using sanitizing
    initialization sequence pretext.)

 * split out signaling part that tells parent process to exit into
   separate helper that's called once before/from main_loop().
   This option seems low risk and additionally error output to
   stderr will work as it does currently (until os_setup_post())

> Doesn't this make -preconfig and -daemonize fundamentally
> incompatible?
Don't see anything that prevents both to work together fundamentally.
essentially -preconfig is extra configuration after CLI,
we potentially would be able execute commands that open files there,
so we should leave chroot & co where it is now.

> 
> 
> >      main_loop();
> >  
> >      /* from here on runstate is RUN_STATE_PRELAUNCH */
> > -- 
> > 2.7.4
> > 
> >   
>
Eduardo Habkost June 11, 2018, 7:06 p.m. UTC | #3
On Mon, Jun 11, 2018 at 03:16:25PM +0200, Igor Mammedov wrote:
> On Fri, 8 Jun 2018 10:21:05 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:
> > > When using --daemonize, the initial lead process will fork a child and
> > > then wait to be notified that setup is complete via a pipe, before it
> > > exits.  When using --preconfig there is an extra call to main_loop()
> > > before the notification is done from os_setup_post(). Thus the parent
> > > process won't exit until the mgmt application connects to the monitor
> > > and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
> > > won't connect to the monitor until daemonizing has completed though.
> > > 
> > > This is a chicken and egg problem, leading to deadlock at startup.
> > > 
> > > The only viable way to fix this is to call os_setup_post() before
> > > the early main_loop() call when --preconfig is used. This has the
> > > downside that any errors from this point onwards won't be handled
> > > well by the mgmt application, because it will think QEMU has started
> > > successfully, so not be expecting an abrupt exit. Moving as much user
> > > input validation as possible to before the main_loop() call might help,
> > > but mgmt application should stop assuming that QEMU has started
> > > successfuly and use other means to collect errors from QEMU (logfile).
> > > 
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > > v5:
> > >   * use original Daniel's patch [1], but addapt it to apply on top of
> > >     "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig was specified"
> > >     with extra comment and massage commit message a little bit.
> > > v6:
> > >   * hide os_setup_post_done flag inside of os_setup_post() as it was in v4
> > > 
> > > CC: berrange@redhat.com
> > > CC: mreitz@redhat.com
> > > CC: pbonzini@redhat.com
> > > CC: ehabkost@redhat.com
> > > CC: ldoktor@redhat.com
> > > CC: eblake@redhat.com
> > > ---
> > >  os-posix.c | 6 ++++++
> > >  vl.c       | 6 ++++++
> > >  2 files changed, 12 insertions(+)
> > > 
> > > diff --git a/os-posix.c b/os-posix.c
> > > index 9ce6f74..0246195 100644
> > > --- a/os-posix.c
> > > +++ b/os-posix.c
> > > @@ -309,8 +309,14 @@ void os_daemonize(void)
> > >  
> > >  void os_setup_post(void)
> > >  {
> > > +    static bool os_setup_post_done;
> > >      int fd = 0;
> > >  
> > > +    if (os_setup_post_done) {
> > > +        return;
> > > +    }
> > > +    os_setup_post_done = true;
> > > +
> > >      if (daemonize) {
> > >          if (chdir("/")) {
> > >              error_report("not able to chdir to /: %s", strerror(errno));
> > > diff --git a/vl.c b/vl.c
> > > index fa44138..457ff2a 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
> > >      parse_numa_opts(current_machine);
> > >  
> > >      /* do monitor/qmp handling at preconfig state if requested */
> > > +    if (!preconfig_exit_requested && is_daemonized()) {
> > > +        /* signal parent QEMU to exit, libvirt treats it as a sign
> > > +         * that monitor socket is ready to accept connections
> > > +         */
> > > +        os_setup_post();
> > > +    }  
> > 
> > I was looking at the daemonize logic, and noticed it we have a
> > huge amount of code between this line and the next
> > os_setup_post() call that could either:
> > 
> > * call exit() and/or error_report(); or
> logging would work to the extent mentioned in commit message,
> i.e. it' would work fine when log file is used otherwise it
> errors will go to /dev/null
> 
> so it should be more or less fine on this point

My worry is that most users of error_report() involve an exit()
call too.

Once we have an active monitor, we must never call exit()
directly.  Even qmp_quit() doesn't call exit() directly.

> 
> > * be unable to finish machine initialization because of
> >   chdir("/"), change_root(), or change_process_uid().
> this one really no go.
> I see 2 options here,
> 
>  * move init code that opens files to early stage (before preconfig monitor)
>    or split it to open files early.
>    (I've spotted several obvious places fwcfg/vnc/replay_start/migration)
>    but there might be code somewhere in callbacks that would do it too,
>    so it rather risky to go this route.
>    (I'd do this anyways one place at the time using sanitizing
>     initialization sequence pretext.)

We might have QMP commands that take file paths as input, so is
this really an option?


> 
>  * split out signaling part that tells parent process to exit into
>    separate helper that's called once before/from main_loop().
>    This option seems low risk and additionally error output to
>    stderr will work as it does currently (until os_setup_post())

My assumption is that separating the chdir()/stdout/stderr logic
from the fork/daemonize/exit steps wouldn't be possible without
breaking expectations about -daemonize.
Igor Mammedov June 11, 2018, 9:29 p.m. UTC | #4
On Mon, 11 Jun 2018 16:06:07 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Mon, Jun 11, 2018 at 03:16:25PM +0200, Igor Mammedov wrote:
> > On Fri, 8 Jun 2018 10:21:05 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > > On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:
> > > > When using --daemonize, the initial lead process will fork a child and
> > > > then wait to be notified that setup is complete via a pipe, before it
> > > > exits.  When using --preconfig there is an extra call to main_loop()
> > > > before the notification is done from os_setup_post(). Thus the parent
> > > > process won't exit until the mgmt application connects to the monitor
> > > > and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
> > > > won't connect to the monitor until daemonizing has completed though.
> > > > 
> > > > This is a chicken and egg problem, leading to deadlock at startup.
> > > > 
> > > > The only viable way to fix this is to call os_setup_post() before
> > > > the early main_loop() call when --preconfig is used. This has the
> > > > downside that any errors from this point onwards won't be handled
> > > > well by the mgmt application, because it will think QEMU has started
> > > > successfully, so not be expecting an abrupt exit. Moving as much user
> > > > input validation as possible to before the main_loop() call might help,
> > > > but mgmt application should stop assuming that QEMU has started
> > > > successfuly and use other means to collect errors from QEMU (logfile).
> > > > 
> > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > ---
> > > > v5:
> > > >   * use original Daniel's patch [1], but addapt it to apply on top of
> > > >     "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig was specified"
> > > >     with extra comment and massage commit message a little bit.
> > > > v6:
> > > >   * hide os_setup_post_done flag inside of os_setup_post() as it was in v4
> > > > 
> > > > CC: berrange@redhat.com
> > > > CC: mreitz@redhat.com
> > > > CC: pbonzini@redhat.com
> > > > CC: ehabkost@redhat.com
> > > > CC: ldoktor@redhat.com
> > > > CC: eblake@redhat.com
> > > > ---
> > > >  os-posix.c | 6 ++++++
> > > >  vl.c       | 6 ++++++
> > > >  2 files changed, 12 insertions(+)
> > > > 
> > > > diff --git a/os-posix.c b/os-posix.c
> > > > index 9ce6f74..0246195 100644
> > > > --- a/os-posix.c
> > > > +++ b/os-posix.c
> > > > @@ -309,8 +309,14 @@ void os_daemonize(void)
> > > >  
> > > >  void os_setup_post(void)
> > > >  {
> > > > +    static bool os_setup_post_done;
> > > >      int fd = 0;
> > > >  
> > > > +    if (os_setup_post_done) {
> > > > +        return;
> > > > +    }
> > > > +    os_setup_post_done = true;
> > > > +
> > > >      if (daemonize) {
> > > >          if (chdir("/")) {
> > > >              error_report("not able to chdir to /: %s", strerror(errno));
> > > > diff --git a/vl.c b/vl.c
> > > > index fa44138..457ff2a 100644
> > > > --- a/vl.c
> > > > +++ b/vl.c
> > > > @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
> > > >      parse_numa_opts(current_machine);
> > > >  
> > > >      /* do monitor/qmp handling at preconfig state if requested */
> > > > +    if (!preconfig_exit_requested && is_daemonized()) {
> > > > +        /* signal parent QEMU to exit, libvirt treats it as a sign
> > > > +         * that monitor socket is ready to accept connections
> > > > +         */
> > > > +        os_setup_post();
> > > > +    }  
> > > 
> > > I was looking at the daemonize logic, and noticed it we have a
> > > huge amount of code between this line and the next
> > > os_setup_post() call that could either:
> > > 
> > > * call exit() and/or error_report(); or
> > logging would work to the extent mentioned in commit message,
> > i.e. it' would work fine when log file is used otherwise it
> > errors will go to /dev/null
> > 
> > so it should be more or less fine on this point
> 
> My worry is that most users of error_report() involve an exit()
> call too.
> 
> Once we have an active monitor, we must never call exit()
> directly.  Even qmp_quit() doesn't call exit() directly.
Is there any reason why exit() can't be called?

> > > * be unable to finish machine initialization because of
> > >   chdir("/"), change_root(), or change_process_uid().
> > this one really no go.
> > I see 2 options here,
> > 
> >  * move init code that opens files to early stage (before preconfig monitor)
> >    or split it to open files early.
> >    (I've spotted several obvious places fwcfg/vnc/replay_start/migration)
> >    but there might be code somewhere in callbacks that would do it too,
> >    so it rather risky to go this route.
> >    (I'd do this anyways one place at the time using sanitizing
> >     initialization sequence pretext.)
> 
> We might have QMP commands that take file paths as input, so is
> this really an option?
I'd think that in future we would want to enable object_add in preconfig
to create backends at runtime, so yes we can't do chroot at this point
 

> >  * split out signaling part that tells parent process to exit into
> >    separate helper that's called once before/from main_loop().
> >    This option seems low risk and additionally error output to
> >    stderr will work as it does currently (until os_setup_post())
> 
> My assumption is that separating the chdir()/stdout/stderr logic
> from the fork/daemonize/exit steps wouldn't be possible without
> breaking expectations about -daemonize.
it's already separated and that's what creates one side of problem.
What I suggest is to leave it as is and move out only
  len = write(daemon_pipe, &status, 1);
part of os_setup_post() to sync with parent process. That shouldn't
affect daemonizing flow on QEMU side and would let libvirt reuse parent's
exit as sync point to detect moment when monitor is available.
(patch is in testing, I'll post it tomorrow if it doesn't break tests)

In worst case if we can't do the later in QEMU, mgmt would have to cope with
monitor in preconfig mode without relying on parent exit(0) sync point.
(a typical daemon would fork/chroot and co in one place and clients would use
other means to detect socket availability other than watching parent process
exiting)
Eduardo Habkost June 11, 2018, 10:36 p.m. UTC | #5
CCing libvir-list.

On Mon, Jun 11, 2018 at 11:29:24PM +0200, Igor Mammedov wrote:
> On Mon, 11 Jun 2018 16:06:07 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Mon, Jun 11, 2018 at 03:16:25PM +0200, Igor Mammedov wrote:
> > > On Fri, 8 Jun 2018 10:21:05 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > 
> > > > On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:
> > > > > When using --daemonize, the initial lead process will fork a child and
> > > > > then wait to be notified that setup is complete via a pipe, before it
> > > > > exits.  When using --preconfig there is an extra call to main_loop()
> > > > > before the notification is done from os_setup_post(). Thus the parent
> > > > > process won't exit until the mgmt application connects to the monitor
> > > > > and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
> > > > > won't connect to the monitor until daemonizing has completed though.
> > > > > 
> > > > > This is a chicken and egg problem, leading to deadlock at startup.
> > > > > 
> > > > > The only viable way to fix this is to call os_setup_post() before
> > > > > the early main_loop() call when --preconfig is used. This has the
> > > > > downside that any errors from this point onwards won't be handled
> > > > > well by the mgmt application, because it will think QEMU has started
> > > > > successfully, so not be expecting an abrupt exit. Moving as much user
> > > > > input validation as possible to before the main_loop() call might help,
> > > > > but mgmt application should stop assuming that QEMU has started
> > > > > successfuly and use other means to collect errors from QEMU (logfile).
> > > > > 
> > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > ---
> > > > > v5:
> > > > >   * use original Daniel's patch [1], but addapt it to apply on top of
> > > > >     "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig was specified"
> > > > >     with extra comment and massage commit message a little bit.
> > > > > v6:
> > > > >   * hide os_setup_post_done flag inside of os_setup_post() as it was in v4
> > > > > 
> > > > > CC: berrange@redhat.com
> > > > > CC: mreitz@redhat.com
> > > > > CC: pbonzini@redhat.com
> > > > > CC: ehabkost@redhat.com
> > > > > CC: ldoktor@redhat.com
> > > > > CC: eblake@redhat.com
> > > > > ---
> > > > >  os-posix.c | 6 ++++++
> > > > >  vl.c       | 6 ++++++
> > > > >  2 files changed, 12 insertions(+)
> > > > > 
> > > > > diff --git a/os-posix.c b/os-posix.c
> > > > > index 9ce6f74..0246195 100644
> > > > > --- a/os-posix.c
> > > > > +++ b/os-posix.c
> > > > > @@ -309,8 +309,14 @@ void os_daemonize(void)
> > > > >  
> > > > >  void os_setup_post(void)
> > > > >  {
> > > > > +    static bool os_setup_post_done;
> > > > >      int fd = 0;
> > > > >  
> > > > > +    if (os_setup_post_done) {
> > > > > +        return;
> > > > > +    }
> > > > > +    os_setup_post_done = true;
> > > > > +
> > > > >      if (daemonize) {
> > > > >          if (chdir("/")) {
> > > > >              error_report("not able to chdir to /: %s", strerror(errno));
> > > > > diff --git a/vl.c b/vl.c
> > > > > index fa44138..457ff2a 100644
> > > > > --- a/vl.c
> > > > > +++ b/vl.c
> > > > > @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
> > > > >      parse_numa_opts(current_machine);
> > > > >  
> > > > >      /* do monitor/qmp handling at preconfig state if requested */
> > > > > +    if (!preconfig_exit_requested && is_daemonized()) {
> > > > > +        /* signal parent QEMU to exit, libvirt treats it as a sign
> > > > > +         * that monitor socket is ready to accept connections
> > > > > +         */
> > > > > +        os_setup_post();
> > > > > +    }  
> > > > 
> > > > I was looking at the daemonize logic, and noticed it we have a
> > > > huge amount of code between this line and the next
> > > > os_setup_post() call that could either:
> > > > 
> > > > * call exit() and/or error_report(); or
> > > logging would work to the extent mentioned in commit message,
> > > i.e. it' would work fine when log file is used otherwise it
> > > errors will go to /dev/null
> > > 
> > > so it should be more or less fine on this point
> > 
> > My worry is that most users of error_report() involve an exit()
> > call too.
> > 
> > Once we have an active monitor, we must never call exit()
> > directly.  Even qmp_quit() doesn't call exit() directly.
> Is there any reason why exit() can't be called?

QMP clients don't expect the QMP socket to be closed except when
using the 'quit' command.

> 
> > > > * be unable to finish machine initialization because of
> > > >   chdir("/"), change_root(), or change_process_uid().
> > > this one really no go.
> > > I see 2 options here,
> > > 
> > >  * move init code that opens files to early stage (before preconfig monitor)
> > >    or split it to open files early.
> > >    (I've spotted several obvious places fwcfg/vnc/replay_start/migration)
> > >    but there might be code somewhere in callbacks that would do it too,
> > >    so it rather risky to go this route.
> > >    (I'd do this anyways one place at the time using sanitizing
> > >     initialization sequence pretext.)
> > 
> > We might have QMP commands that take file paths as input, so is
> > this really an option?
> I'd think that in future we would want to enable object_add in preconfig
> to create backends at runtime, so yes we can't do chroot at this point
>  
> 
> > >  * split out signaling part that tells parent process to exit into
> > >    separate helper that's called once before/from main_loop().
> > >    This option seems low risk and additionally error output to
> > >    stderr will work as it does currently (until os_setup_post())
> > 
> > My assumption is that separating the chdir()/stdout/stderr logic
> > from the fork/daemonize/exit steps wouldn't be possible without
> > breaking expectations about -daemonize.
> it's already separated and that's what creates one side of problem.

Is it?  Right now '$QEMU -daemonize' will never call exit(0)
before the child process it spawned did the
chdir()/stdout/stderr/etc trick.


> What I suggest is to leave it as is and move out only
>   len = write(daemon_pipe, &status, 1);
> part of os_setup_post() to sync with parent process. That shouldn't
> affect daemonizing flow on QEMU side and would let libvirt reuse parent's
> exit as sync point to detect moment when monitor is available.
> (patch is in testing, I'll post it tomorrow if it doesn't break tests)

This will affect the daemonizing flow, won't it?  It will make
QEMU exit(0) before the child does the chdir()/stderr/stdout/etc
cleanup.  A well-behaved daemon shouldn't do this.

This is probably not a problem for libvirt (which only uses
-daemonize as a sync point for QMP), but possibly a problem for
other users of -daemonize.

> 
> In worst case if we can't do the later in QEMU, mgmt would have to cope with
> monitor in preconfig mode without relying on parent exit(0) sync point.
> (a typical daemon would fork/chroot and co in one place and clients would use
> other means to detect socket availability other than watching parent process
> exiting)

Do we really need to make -daemonize and -preconfig work
together?  libvirt uses -daemonize only for its initial
capability probing, which shouldn't require -preconfig at all.

Even on that case, I wonder why libvirt doesn't simply create a
server socket and waits for QEMU to connect instead of using
-daemonize as a sync point.
Michal Prívozník June 12, 2018, 9:17 a.m. UTC | #6
On 06/12/2018 12:36 AM, Eduardo Habkost wrote:
> CCing libvir-list.
> 
> On Mon, Jun 11, 2018 at 11:29:24PM +0200, Igor Mammedov wrote:
>> On Mon, 11 Jun 2018 16:06:07 -0300
>> Eduardo Habkost <ehabkost@redhat.com> wrote:
>>
>>> On Mon, Jun 11, 2018 at 03:16:25PM +0200, Igor Mammedov wrote:
>>>> On Fri, 8 Jun 2018 10:21:05 -0300
>>>> Eduardo Habkost <ehabkost@redhat.com> wrote:
>>>>
>>>>> On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:
>>>>>> When using --daemonize, the initial lead process will fork a child and
>>>>>> then wait to be notified that setup is complete via a pipe, before it
>>>>>> exits.  When using --preconfig there is an extra call to main_loop()
>>>>>> before the notification is done from os_setup_post(). Thus the parent
>>>>>> process won't exit until the mgmt application connects to the monitor
>>>>>> and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
>>>>>> won't connect to the monitor until daemonizing has completed though.
>>>>>>
>>>>>> This is a chicken and egg problem, leading to deadlock at startup.
>>>>>>
>>>>>> The only viable way to fix this is to call os_setup_post() before
>>>>>> the early main_loop() call when --preconfig is used. This has the
>>>>>> downside that any errors from this point onwards won't be handled
>>>>>> well by the mgmt application, because it will think QEMU has started
>>>>>> successfully, so not be expecting an abrupt exit. Moving as much user
>>>>>> input validation as possible to before the main_loop() call might help,
>>>>>> but mgmt application should stop assuming that QEMU has started
>>>>>> successfuly and use other means to collect errors from QEMU (logfile).
>>>>>>
>>>>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>>>>>> ---
>>>>>> v5:
>>>>>>   * use original Daniel's patch [1], but addapt it to apply on top of
>>>>>>     "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig was specified"
>>>>>>     with extra comment and massage commit message a little bit.
>>>>>> v6:
>>>>>>   * hide os_setup_post_done flag inside of os_setup_post() as it was in v4
>>>>>>
>>>>>> CC: berrange@redhat.com
>>>>>> CC: mreitz@redhat.com
>>>>>> CC: pbonzini@redhat.com
>>>>>> CC: ehabkost@redhat.com
>>>>>> CC: ldoktor@redhat.com
>>>>>> CC: eblake@redhat.com
>>>>>> ---
>>>>>>  os-posix.c | 6 ++++++
>>>>>>  vl.c       | 6 ++++++
>>>>>>  2 files changed, 12 insertions(+)
>>>>>>
>>>>>> diff --git a/os-posix.c b/os-posix.c
>>>>>> index 9ce6f74..0246195 100644
>>>>>> --- a/os-posix.c
>>>>>> +++ b/os-posix.c
>>>>>> @@ -309,8 +309,14 @@ void os_daemonize(void)
>>>>>>  
>>>>>>  void os_setup_post(void)
>>>>>>  {
>>>>>> +    static bool os_setup_post_done;
>>>>>>      int fd = 0;
>>>>>>  
>>>>>> +    if (os_setup_post_done) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +    os_setup_post_done = true;
>>>>>> +
>>>>>>      if (daemonize) {
>>>>>>          if (chdir("/")) {
>>>>>>              error_report("not able to chdir to /: %s", strerror(errno));
>>>>>> diff --git a/vl.c b/vl.c
>>>>>> index fa44138..457ff2a 100644
>>>>>> --- a/vl.c
>>>>>> +++ b/vl.c
>>>>>> @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
>>>>>>      parse_numa_opts(current_machine);
>>>>>>  
>>>>>>      /* do monitor/qmp handling at preconfig state if requested */
>>>>>> +    if (!preconfig_exit_requested && is_daemonized()) {
>>>>>> +        /* signal parent QEMU to exit, libvirt treats it as a sign
>>>>>> +         * that monitor socket is ready to accept connections
>>>>>> +         */
>>>>>> +        os_setup_post();
>>>>>> +    }  
>>>>>
>>>>> I was looking at the daemonize logic, and noticed it we have a
>>>>> huge amount of code between this line and the next
>>>>> os_setup_post() call that could either:
>>>>>
>>>>> * call exit() and/or error_report(); or
>>>> logging would work to the extent mentioned in commit message,
>>>> i.e. it' would work fine when log file is used otherwise it
>>>> errors will go to /dev/null
>>>>
>>>> so it should be more or less fine on this point
>>>
>>> My worry is that most users of error_report() involve an exit()
>>> call too.
>>>
>>> Once we have an active monitor, we must never call exit()
>>> directly.  Even qmp_quit() doesn't call exit() directly.
>> Is there any reason why exit() can't be called?
> 
> QMP clients don't expect the QMP socket to be closed except when
> using the 'quit' command.

Libvirt views HANGUP on monitor socket as qemu process dying
unexpectedly so it starts cleanup (which involves 'kill -9' of qemu).

> 
>>
>>>>> * be unable to finish machine initialization because of
>>>>>   chdir("/"), change_root(), or change_process_uid().
>>>> this one really no go.
>>>> I see 2 options here,
>>>>
>>>>  * move init code that opens files to early stage (before preconfig monitor)
>>>>    or split it to open files early.
>>>>    (I've spotted several obvious places fwcfg/vnc/replay_start/migration)
>>>>    but there might be code somewhere in callbacks that would do it too,
>>>>    so it rather risky to go this route.
>>>>    (I'd do this anyways one place at the time using sanitizing
>>>>     initialization sequence pretext.)
>>>
>>> We might have QMP commands that take file paths as input, so is
>>> this really an option?
>> I'd think that in future we would want to enable object_add in preconfig
>> to create backends at runtime, so yes we can't do chroot at this point
>>  
>>
>>>>  * split out signaling part that tells parent process to exit into
>>>>    separate helper that's called once before/from main_loop().
>>>>    This option seems low risk and additionally error output to
>>>>    stderr will work as it does currently (until os_setup_post())
>>>
>>> My assumption is that separating the chdir()/stdout/stderr logic
>>> from the fork/daemonize/exit steps wouldn't be possible without
>>> breaking expectations about -daemonize.
>> it's already separated and that's what creates one side of problem.
> 
> Is it?  Right now '$QEMU -daemonize' will never call exit(0)
> before the child process it spawned did the
> chdir()/stdout/stderr/etc trick.
> 
> 
>> What I suggest is to leave it as is and move out only
>>   len = write(daemon_pipe, &status, 1);
>> part of os_setup_post() to sync with parent process. That shouldn't
>> affect daemonizing flow on QEMU side and would let libvirt reuse parent's
>> exit as sync point to detect moment when monitor is available.
>> (patch is in testing, I'll post it tomorrow if it doesn't break tests)
> 
> This will affect the daemonizing flow, won't it?  It will make
> QEMU exit(0) before the child does the chdir()/stderr/stdout/etc
> cleanup.  A well-behaved daemon shouldn't do this.
> 
> This is probably not a problem for libvirt (which only uses
> -daemonize as a sync point for QMP), but possibly a problem for
> other users of -daemonize.

I think the proper behaviour is to exit(0) after chdir()/stderr/... AND
monitor is set up. Then, if any caller started qemu with -preconfig they
know they can start talking on monitor, set up whatever it is they want
and issue 'cont' finally (or what is the right command to exit preconfig
state). This way nothing changes for callers not using -preconfig.

> 
>>
>> In worst case if we can't do the later in QEMU, mgmt would have to cope with
>> monitor in preconfig mode without relying on parent exit(0) sync point.
>> (a typical daemon would fork/chroot and co in one place and clients would use
>> other means to detect socket availability other than watching parent process
>> exiting)
> 
> Do we really need to make -daemonize and -preconfig work
> together?  libvirt uses -daemonize only for its initial
> capability probing, which shouldn't require -preconfig at all.
> 
> Even on that case, I wonder why libvirt doesn't simply create a
> server socket and waits for QEMU to connect instead of using
> -daemonize as a sync point.
> 

because libvirt views qemu as well behaved daemon. Should anything go
wrong libvirt reads qemu's stderr and reports error to upper layers.

Michal
Igor Mammedov June 12, 2018, 12:42 p.m. UTC | #7
On Tue, 12 Jun 2018 11:17:15 +0200
Michal Privoznik <mprivozn@redhat.com> wrote:

> On 06/12/2018 12:36 AM, Eduardo Habkost wrote:
> > CCing libvir-list.
> > 
> > On Mon, Jun 11, 2018 at 11:29:24PM +0200, Igor Mammedov wrote:  
> >> On Mon, 11 Jun 2018 16:06:07 -0300
> >> Eduardo Habkost <ehabkost@redhat.com> wrote:
> >>  
> >>> On Mon, Jun 11, 2018 at 03:16:25PM +0200, Igor Mammedov wrote:  
> >>>> On Fri, 8 Jun 2018 10:21:05 -0300
> >>>> Eduardo Habkost <ehabkost@redhat.com> wrote:
> >>>>  
> >>>>> On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote:  
> >>>>>> When using --daemonize, the initial lead process will fork a child and
> >>>>>> then wait to be notified that setup is complete via a pipe, before it
> >>>>>> exits.  When using --preconfig there is an extra call to main_loop()
> >>>>>> before the notification is done from os_setup_post(). Thus the parent
> >>>>>> process won't exit until the mgmt application connects to the monitor
> >>>>>> and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application
> >>>>>> won't connect to the monitor until daemonizing has completed though.
> >>>>>>
> >>>>>> This is a chicken and egg problem, leading to deadlock at startup.
> >>>>>>
> >>>>>> The only viable way to fix this is to call os_setup_post() before
> >>>>>> the early main_loop() call when --preconfig is used. This has the
> >>>>>> downside that any errors from this point onwards won't be handled
> >>>>>> well by the mgmt application, because it will think QEMU has started
> >>>>>> successfully, so not be expecting an abrupt exit. Moving as much user
> >>>>>> input validation as possible to before the main_loop() call might help,
> >>>>>> but mgmt application should stop assuming that QEMU has started
> >>>>>> successfuly and use other means to collect errors from QEMU (logfile).
> >>>>>>
> >>>>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> >>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >>>>>> ---
> >>>>>> v5:
> >>>>>>   * use original Daniel's patch [1], but addapt it to apply on top of
> >>>>>>     "[PATCH v3 1/2] cli: Don't run early event loop if no  --preconfig was specified"
> >>>>>>     with extra comment and massage commit message a little bit.
> >>>>>> v6:
> >>>>>>   * hide os_setup_post_done flag inside of os_setup_post() as it was in v4
> >>>>>>
> >>>>>> CC: berrange@redhat.com
> >>>>>> CC: mreitz@redhat.com
> >>>>>> CC: pbonzini@redhat.com
> >>>>>> CC: ehabkost@redhat.com
> >>>>>> CC: ldoktor@redhat.com
> >>>>>> CC: eblake@redhat.com
> >>>>>> ---
> >>>>>>  os-posix.c | 6 ++++++
> >>>>>>  vl.c       | 6 ++++++
> >>>>>>  2 files changed, 12 insertions(+)
> >>>>>>
> >>>>>> diff --git a/os-posix.c b/os-posix.c
> >>>>>> index 9ce6f74..0246195 100644
> >>>>>> --- a/os-posix.c
> >>>>>> +++ b/os-posix.c
> >>>>>> @@ -309,8 +309,14 @@ void os_daemonize(void)
> >>>>>>  
> >>>>>>  void os_setup_post(void)
> >>>>>>  {
> >>>>>> +    static bool os_setup_post_done;
> >>>>>>      int fd = 0;
> >>>>>>  
> >>>>>> +    if (os_setup_post_done) {
> >>>>>> +        return;
> >>>>>> +    }
> >>>>>> +    os_setup_post_done = true;
> >>>>>> +
> >>>>>>      if (daemonize) {
> >>>>>>          if (chdir("/")) {
> >>>>>>              error_report("not able to chdir to /: %s", strerror(errno));
> >>>>>> diff --git a/vl.c b/vl.c
> >>>>>> index fa44138..457ff2a 100644
> >>>>>> --- a/vl.c
> >>>>>> +++ b/vl.c
> >>>>>> @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp)
> >>>>>>      parse_numa_opts(current_machine);
> >>>>>>  
> >>>>>>      /* do monitor/qmp handling at preconfig state if requested */
> >>>>>> +    if (!preconfig_exit_requested && is_daemonized()) {
> >>>>>> +        /* signal parent QEMU to exit, libvirt treats it as a sign
> >>>>>> +         * that monitor socket is ready to accept connections
> >>>>>> +         */
> >>>>>> +        os_setup_post();
> >>>>>> +    }    
> >>>>>
> >>>>> I was looking at the daemonize logic, and noticed it we have a
> >>>>> huge amount of code between this line and the next
> >>>>> os_setup_post() call that could either:
> >>>>>
> >>>>> * call exit() and/or error_report(); or  
> >>>> logging would work to the extent mentioned in commit message,
> >>>> i.e. it' would work fine when log file is used otherwise it
> >>>> errors will go to /dev/null
> >>>>
> >>>> so it should be more or less fine on this point  
> >>>
> >>> My worry is that most users of error_report() involve an exit()
> >>> call too.
> >>>
> >>> Once we have an active monitor, we must never call exit()
> >>> directly.  Even qmp_quit() doesn't call exit() directly.  
> >> Is there any reason why exit() can't be called?  
> > 
> > QMP clients don't expect the QMP socket to be closed except when
> > using the 'quit' command.  
> 
> Libvirt views HANGUP on monitor socket as qemu process dying
> unexpectedly so it starts cleanup (which involves 'kill -9' of qemu).
So if we exit(1) there is a chance to get SIGKILL before exit(1) completes.
Do we care about it at this point?
(there are places when QEMU calls exit(1) at runtime on unrecoverable error)

> >>>>> * be unable to finish machine initialization because of
> >>>>>   chdir("/"), change_root(), or change_process_uid().  
> >>>> this one really no go.
> >>>> I see 2 options here,
> >>>>
> >>>>  * move init code that opens files to early stage (before preconfig monitor)
> >>>>    or split it to open files early.
> >>>>    (I've spotted several obvious places fwcfg/vnc/replay_start/migration)
> >>>>    but there might be code somewhere in callbacks that would do it too,
> >>>>    so it rather risky to go this route.
> >>>>    (I'd do this anyways one place at the time using sanitizing
> >>>>     initialization sequence pretext.)  
> >>>
> >>> We might have QMP commands that take file paths as input, so is
> >>> this really an option?  
> >> I'd think that in future we would want to enable object_add in preconfig
> >> to create backends at runtime, so yes we can't do chroot at this point
> >>  
> >>  
> >>>>  * split out signaling part that tells parent process to exit into
> >>>>    separate helper that's called once before/from main_loop().
> >>>>    This option seems low risk and additionally error output to
> >>>>    stderr will work as it does currently (until os_setup_post())  
> >>>
> >>> My assumption is that separating the chdir()/stdout/stderr logic
> >>> from the fork/daemonize/exit steps wouldn't be possible without
> >>> breaking expectations about -daemonize.  
> >> it's already separated and that's what creates one side of problem.  
> > 
> > Is it?  Right now '$QEMU -daemonize' will never call exit(0)
> > before the child process it spawned did the
> > chdir()/stdout/stderr/etc trick.
> > 
> >   
> >> What I suggest is to leave it as is and move out only
> >>   len = write(daemon_pipe, &status, 1);
> >> part of os_setup_post() to sync with parent process. That shouldn't
> >> affect daemonizing flow on QEMU side and would let libvirt reuse parent's
> >> exit as sync point to detect moment when monitor is available.
> >> (patch is in testing, I'll post it tomorrow if it doesn't break tests)  
> > 
> > This will affect the daemonizing flow, won't it?  It will make
> > QEMU exit(0) before the child does the chdir()/stderr/stdout/etc
> > cleanup.  A well-behaved daemon shouldn't do this.
> > 
> > This is probably not a problem for libvirt (which only uses
> > -daemonize as a sync point for QMP), but possibly a problem for
> > other users of -daemonize.  
> 
> I think the proper behaviour is to exit(0) after chdir()/stderr/... AND
> monitor is set up. Then, if any caller started qemu with -preconfig they
> know they can start talking on monitor, set up whatever it is they want
> and issue 'cont' finally (or what is the right command to exit preconfig
> state). This way nothing changes for callers not using -preconfig.
As pointed out earlier we need -preconfig stay before chdir/chroot/chuid/stderr/stdout
are called, so it would have the same access rights/permissions for configuration
as CLI options.

> >> In worst case if we can't do the later in QEMU, mgmt would have to cope with
> >> monitor in preconfig mode without relying on parent exit(0) sync point.
> >> (a typical daemon would fork/chroot and co in one place and clients would use
> >> other means to detect socket availability other than watching parent process
> >> exiting)  
> > 
> > Do we really need to make -daemonize and -preconfig work
> > together?  libvirt uses -daemonize only for its initial
> > capability probing, which shouldn't require -preconfig at all.
> > 
> > Even on that case, I wonder why libvirt doesn't simply create a
> > server socket and waits for QEMU to connect instead of using
> > -daemonize as a sync point.
> >   
> 
> because libvirt views qemu as well behaved daemon. Should anything go
> wrong libvirt reads qemu's stderr and reports error to upper layers.
We can keep daemonizing flow in QEMU as it's now.
But Eduardo's idea about libvirt created socked + letting QEMU connect to it
has a merit. It should fix current deadlock issue with as monitor
won't be depending on lead exit event.
Can we do this way on libvirt side when --preconfig is in use
(it might even be fine for normal flow without -preconfig)?

> Michal
Daniel P. Berrangé June 12, 2018, 12:50 p.m. UTC | #8
On Tue, Jun 12, 2018 at 02:42:05PM +0200, Igor Mammedov wrote:
> We can keep daemonizing flow in QEMU as it's now.
> But Eduardo's idea about libvirt created socked + letting QEMU connect to it
> has a merit. It should fix current deadlock issue with as monitor
> won't be depending on lead exit event.

NB, libvirt only ever uses --daemonize when probing capabilities, never
when launching QEMU for a real VM. In the latter case, we now use FD
passing, so libvirt opens the UNIX domain socket listener, and passes
this into QEMU. So libvirt knows it can connect to the listener
immediately and will only ever get a failure if QEMU has exited.

We can't use FD passing for the capabilities probing because of the
chicken & egg problem - we need to probe capabilities to find out
if FD passing it available or not. Fortunately with capabilities
probing, we don't care about using --preconfig, as were not running
a real VM

Regards,
Daniel
Michal Prívozník June 12, 2018, 1:04 p.m. UTC | #9
On 06/12/2018 02:42 PM, Igor Mammedov wrote:

>>>
>>> Do we really need to make -daemonize and -preconfig work
>>> together?  libvirt uses -daemonize only for its initial
>>> capability probing, which shouldn't require -preconfig at all.
>>>
>>> Even on that case, I wonder why libvirt doesn't simply create a
>>> server socket and waits for QEMU to connect instead of using
>>> -daemonize as a sync point.
>>>   
>>
>> because libvirt views qemu as well behaved daemon. Should anything go
>> wrong libvirt reads qemu's stderr and reports error to upper layers.
> We can keep daemonizing flow in QEMU as it's now.
> But Eduardo's idea about libvirt created socked + letting QEMU connect to it
> has a merit. It should fix current deadlock issue with as monitor
> won't be depending on lead exit event.

Not sure about the benefits. Currently, libvirt spawns qemu, waits for
monitor to show up (currently, the timeout dynamic depending on some
black magic involving guest RAM size) and if it does not show up in time
it kills qemu. The same algorithm must be kept in place even for case
when libvirt would pass pre-opened socket to qemu in case qemu deadlocks
before being able to communicate over qmp. The only advantage I see is
libvirt would not need to label the socket (set uid:gid, selinux, ...).
On the other hand, since it would be libvirt creating the socket what
would happen on libvirtd restart?

> Can we do this way on libvirt side when --preconfig is in use
> (it might even be fine for normal flow without -preconfig)?

I think passing pre-opened socket and --preconfig are orthogonal. What
if somebody wants to use --preconfig does not pass any FD?

Michal
Peter Krempa June 12, 2018, 1:10 p.m. UTC | #10
On Tue, Jun 12, 2018 at 15:04:42 +0200, Michal Privoznik wrote:
> On 06/12/2018 02:42 PM, Igor Mammedov wrote:
> 
> >>>
> >>> Do we really need to make -daemonize and -preconfig work
> >>> together?  libvirt uses -daemonize only for its initial
> >>> capability probing, which shouldn't require -preconfig at all.
> >>>
> >>> Even on that case, I wonder why libvirt doesn't simply create a
> >>> server socket and waits for QEMU to connect instead of using
> >>> -daemonize as a sync point.
> >>>   
> >>
> >> because libvirt views qemu as well behaved daemon. Should anything go
> >> wrong libvirt reads qemu's stderr and reports error to upper layers.
> > We can keep daemonizing flow in QEMU as it's now.
> > But Eduardo's idea about libvirt created socked + letting QEMU connect to it
> > has a merit. It should fix current deadlock issue with as monitor
> > won't be depending on lead exit event.
> 
> Not sure about the benefits. Currently, libvirt spawns qemu, waits for
> monitor to show up (currently, the timeout dynamic depending on some
> black magic involving guest RAM size) and if it does not show up in time
> it kills qemu. The same algorithm must be kept in place even for case
> when libvirt would pass pre-opened socket to qemu in case qemu deadlocks
> before being able to communicate over qmp. The only advantage I see is
> libvirt would not need to label the socket (set uid:gid, selinux, ...).
> On the other hand, since it would be libvirt creating the socket what
> would happen on libvirtd restart?

Well, if qemu deadlocks just after spewing out the monitor greeting you
end up in the same situation as the timeout code is not applied later
for regular monitor communication.

Depending on how early the preconfig state happens, keeping in the
timeout may be pointless.
Daniel P. Berrangé June 12, 2018, 1:17 p.m. UTC | #11
On Tue, Jun 12, 2018 at 03:04:42PM +0200, Michal Privoznik wrote:
> On 06/12/2018 02:42 PM, Igor Mammedov wrote:
> 
> >>>
> >>> Do we really need to make -daemonize and -preconfig work
> >>> together?  libvirt uses -daemonize only for its initial
> >>> capability probing, which shouldn't require -preconfig at all.
> >>>
> >>> Even on that case, I wonder why libvirt doesn't simply create a
> >>> server socket and waits for QEMU to connect instead of using
> >>> -daemonize as a sync point.
> >>>   
> >>
> >> because libvirt views qemu as well behaved daemon. Should anything go
> >> wrong libvirt reads qemu's stderr and reports error to upper layers.
> > We can keep daemonizing flow in QEMU as it's now.
> > But Eduardo's idea about libvirt created socked + letting QEMU connect to it
> > has a merit. It should fix current deadlock issue with as monitor
> > won't be depending on lead exit event.
> 
> Not sure about the benefits. Currently, libvirt spawns qemu, waits for
> monitor to show up (currently, the timeout dynamic depending on some
> black magic involving guest RAM size) and if it does not show up in time
> it kills qemu. The same algorithm must be kept in place even for case
> when libvirt would pass pre-opened socket to qemu in case qemu deadlocks
> before being able to communicate over qmp. The only advantage I see is
> libvirt would not need to label the socket (set uid:gid, selinux, ...).

As mentioned in my other reply, we already do FD passing, and that code
has intentionally got rid of the timeout, because timeouts cause false
failures to launch QEMU. This is a particular problem when using many
disks that are encrypted, since LUKS encryption has a minimum 1 second
delay on opening each disk, so with many disks we're at risk of hitting
the timeout even when QEMU is still starting normally.

I don't see a reason to special case startup with timeouts to deal
with hangs, while ignoring the possibility of hangs after initial
startup.

> On the other hand, since it would be libvirt creating the socket what
> would happen on libvirtd restart?

We're creating a *listener* socket, not a client connection, so on
restart we simply connect again as normal.


Regards,
Daniel
Eduardo Habkost June 13, 2018, 2:17 p.m. UTC | #12
On Tue, Jun 12, 2018 at 01:50:33PM +0100, Daniel P. Berrangé wrote:
> On Tue, Jun 12, 2018 at 02:42:05PM +0200, Igor Mammedov wrote:
> > We can keep daemonizing flow in QEMU as it's now.
> > But Eduardo's idea about libvirt created socked + letting QEMU connect to it
> > has a merit. It should fix current deadlock issue with as monitor
> > won't be depending on lead exit event.
> 
> NB, libvirt only ever uses --daemonize when probing capabilities, never
> when launching QEMU for a real VM. In the latter case, we now use FD
> passing, so libvirt opens the UNIX domain socket listener, and passes
> this into QEMU. So libvirt knows it can connect to the listener
> immediately and will only ever get a failure if QEMU has exited.

So, what I'm really missing here is: do we have a good reason to
support --daemonize + --preconfig today?

The options I see are:

1) complete daemonization before preconfig main loop
----------------------------------------------------

By "complete daemonization" I mean doing chdir("/"),
stderr/stdout cleanup, chroot, and UID magic before calling
exit(0) on the main QEMU process.

Pros:
* More intuitive

Cons:
* Can break existing initialization code that don't expect
  this to happen.
  (can this be fixed?)
* Can break any preconfig-time QMP commands that rely on opening
  files
  (is it a real problem?)
* Any initialization error conditions that currently rely on
  error_report()+exit() will be very inconvenient to handle
  properly
  (this can be fixed eventually, but it's not trivial)


2) incomplete daemonization before preconfig main loop
------------------------------------------------------

This means calling exit(0) on the main process before doing the
chdir/stderr/etc magic.

Pros:
* Less likely to break initialization code and other QMP commands

Cons:
* Not what's expected from a well-behaved daemon.
  (If we're not daemonizing properly, what's the point of using
  -daemonize at all?)
* More difficult to change behavior later.


3) daemonize only after leaving preconfig state
-----------------------------------------------

AFAICS, this is the current behavior.

Pros:
* Less likely to break init code
* Keeps existing code as is

Cons:
* Less intuitive
* -daemonize becomes useless as synchronization point for monitor
  availability
* Would this be useful for anybody, at all?
* We won't be able to change this behavior later


4) Not supporting -preconfig + -daemonize
-----------------------------------------

Pros:
* Simple to implement.
* Avoids unexpected bugs.
* Saves our time.
* We can change this behavior later.

Cons:
* People might want us to support it eventually.



I believe the only reasonable options are (1) and (4).
Daniel P. Berrangé June 13, 2018, 2:23 p.m. UTC | #13
On Wed, Jun 13, 2018 at 11:17:30AM -0300, Eduardo Habkost wrote:
> On Tue, Jun 12, 2018 at 01:50:33PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Jun 12, 2018 at 02:42:05PM +0200, Igor Mammedov wrote:
> > > We can keep daemonizing flow in QEMU as it's now.
> > > But Eduardo's idea about libvirt created socked + letting QEMU connect to it
> > > has a merit. It should fix current deadlock issue with as monitor
> > > won't be depending on lead exit event.
> > 
> > NB, libvirt only ever uses --daemonize when probing capabilities, never
> > when launching QEMU for a real VM. In the latter case, we now use FD
> > passing, so libvirt opens the UNIX domain socket listener, and passes
> > this into QEMU. So libvirt knows it can connect to the listener
> > immediately and will only ever get a failure if QEMU has exited.
> 
> So, what I'm really missing here is: do we have a good reason to
> support --daemonize + --preconfig today?

On the libvirt zero, I don't see a compelling need for it.

> The options I see are:
> 
> 1) complete daemonization before preconfig main loop
> ----------------------------------------------------
> 
> By "complete daemonization" I mean doing chdir("/"),
> stderr/stdout cleanup, chroot, and UID magic before calling
> exit(0) on the main QEMU process.
> 
> Pros:
> * More intuitive
> 
> Cons:
> * Can break existing initialization code that don't expect
>   this to happen.
>   (can this be fixed?)
> * Can break any preconfig-time QMP commands that rely on opening
>   files
>   (is it a real problem?)

NB Use of -chroot is separate from -daemonize, so it is not
an issue with -preconfig + -daemonize alone.

There's soo many caveats around -chroot, I struggle to
care about adding another caveats.

> * Any initialization error conditions that currently rely on
>   error_report()+exit() will be very inconvenient to handle
>   properly
>   (this can be fixed eventually, but it's not trivial)


> 3) daemonize only after leaving preconfig state
> -----------------------------------------------
> 
> AFAICS, this is the current behavior.
> 
> Pros:
> * Less likely to break init code
> * Keeps existing code as is
> 
> Cons:
> * Less intuitive
> * -daemonize becomes useless as synchronization point for monitor
>   availability

Yeah that honestly kills the key benefit of having -daemonize
imho.

> * Would this be useful for anybody, at all?
> * We won't be able to change this behavior later
> 
> 

> I believe the only reasonable options are (1) and (4).

Agreed.

Regards,
Daniel
Eduardo Habkost June 13, 2018, 5:09 p.m. UTC | #14
On Wed, Jun 13, 2018 at 03:23:09PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 13, 2018 at 11:17:30AM -0300, Eduardo Habkost wrote:
> > On Tue, Jun 12, 2018 at 01:50:33PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Jun 12, 2018 at 02:42:05PM +0200, Igor Mammedov wrote:
> > > > We can keep daemonizing flow in QEMU as it's now.
> > > > But Eduardo's idea about libvirt created socked + letting QEMU connect to it
> > > > has a merit. It should fix current deadlock issue with as monitor
> > > > won't be depending on lead exit event.
> > > 
> > > NB, libvirt only ever uses --daemonize when probing capabilities, never
> > > when launching QEMU for a real VM. In the latter case, we now use FD
> > > passing, so libvirt opens the UNIX domain socket listener, and passes
> > > this into QEMU. So libvirt knows it can connect to the listener
> > > immediately and will only ever get a failure if QEMU has exited.
> > 
> > So, what I'm really missing here is: do we have a good reason to
> > support --daemonize + --preconfig today?
> 
> On the libvirt zero, I don't see a compelling need for it.

Good. :)

> > The options I see are:
> > 1) complete daemonization before preconfig main loop
[...]
> > 4) Not supporting -preconfig + -daemonize
[...]
> > I believe the only reasonable options are (1) and (4).
> 
> Agreed.

If it was up to me, I would just go with (4) because it's
simpler.

But if somebody wants to implement (1), the caveats should be
clearly documented.  I would prefer to simply document
"--daemonize --preconfig" as experimental, with something like:

  "Note: usage of --daemonize with the --preconfig option is
  experimental, because it can prevent QEMU from reporting
  machine initialization errors and prevent some features from
  working after QEMU is daemonized."
Igor Mammedov June 14, 2018, 12:32 p.m. UTC | #15
On Wed, 13 Jun 2018 14:09:32 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, Jun 13, 2018 at 03:23:09PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 13, 2018 at 11:17:30AM -0300, Eduardo Habkost wrote:  
> > > On Tue, Jun 12, 2018 at 01:50:33PM +0100, Daniel P. Berrangé wrote:  
> > > > On Tue, Jun 12, 2018 at 02:42:05PM +0200, Igor Mammedov wrote:  
> > > > > We can keep daemonizing flow in QEMU as it's now.
> > > > > But Eduardo's idea about libvirt created socked + letting QEMU connect to it
> > > > > has a merit. It should fix current deadlock issue with as monitor
> > > > > won't be depending on lead exit event.  
> > > > 
> > > > NB, libvirt only ever uses --daemonize when probing capabilities, never
> > > > when launching QEMU for a real VM. In the latter case, we now use FD
> > > > passing, so libvirt opens the UNIX domain socket listener, and passes
> > > > this into QEMU. So libvirt knows it can connect to the listener
> > > > immediately and will only ever get a failure if QEMU has exited.  
> > > 
> > > So, what I'm really missing here is: do we have a good reason to
> > > support --daemonize + --preconfig today?  
> > 
> > On the libvirt zero, I don't see a compelling need for it.  
> 
> Good. :)
> 
> > > The options I see are:
> > > 1) complete daemonization before preconfig main loop  
> [...]
> > > 4) Not supporting -preconfig + -daemonize  
> [...]
> > > I believe the only reasonable options are (1) and (4).  
> > 
> > Agreed.  
> 
> If it was up to me, I would just go with (4) because it's
> simpler.
Let's just disable it for now. it will be easier to allow it
than take it back later.

> 
> But if somebody wants to implement (1), the caveats should be
> clearly documented.  I would prefer to simply document
> "--daemonize --preconfig" as experimental, with something like:
> 
>   "Note: usage of --daemonize with the --preconfig option is
>   experimental, because it can prevent QEMU from reporting
>   machine initialization errors and prevent some features from
>   working after QEMU is daemonized."
diff mbox series

Patch

diff --git a/os-posix.c b/os-posix.c
index 9ce6f74..0246195 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -309,8 +309,14 @@  void os_daemonize(void)
 
 void os_setup_post(void)
 {
+    static bool os_setup_post_done;
     int fd = 0;
 
+    if (os_setup_post_done) {
+        return;
+    }
+    os_setup_post_done = true;
+
     if (daemonize) {
         if (chdir("/")) {
             error_report("not able to chdir to /: %s", strerror(errno));
diff --git a/vl.c b/vl.c
index fa44138..457ff2a 100644
--- a/vl.c
+++ b/vl.c
@@ -4578,6 +4578,12 @@  int main(int argc, char **argv, char **envp)
     parse_numa_opts(current_machine);
 
     /* do monitor/qmp handling at preconfig state if requested */
+    if (!preconfig_exit_requested && is_daemonized()) {
+        /* signal parent QEMU to exit, libvirt treats it as a sign
+         * that monitor socket is ready to accept connections
+         */
+        os_setup_post();
+    }
     main_loop();
 
     /* from here on runstate is RUN_STATE_PRELAUNCH */