diff mbox series

[v4,3/9] cli: add -preconfig option

Message ID 1520860275-101576-4-git-send-email-imammedo@redhat.com
State New
Headers show
Series enable numa configuration before machine_init() from QMP | expand

Commit Message

Igor Mammedov March 12, 2018, 1:11 p.m. UTC
This option allows pausing QEMU in the new RUN_STATE_PRECONFIG state,
allowing the configuration of QEMU from QMP before the machine jumps
into board initialization code of machine_run_board_init()

Intent is to allow management to query machine state and additionally
configure it using previous query results within one QEMU instance
(i.e. eliminate need to start QEMU twice, 1st to query board specific
parameters and 2nd for actual VM start using query results for
additional parameters).

New option complements -S option and could be used with or without
it. Difference is that -S pauses QEMU when machine is completely
build with all devices wired up and ready run (QEMU need only to
unpause CPUs to let guest execute its code).
And "preconfig" option pauses QEMU early before board specific init
callback (machine_run_board_init) is executed and will allow to
configure machine parameters which will be used by board init code.

When early introspection/configuration is done, command 'cont' should
be used to exit RUN_STATE_PRECONFIG and transition to the next
requested state (i.e. if -S is used then QEMU will pause the second
time when board/device initialization is completed or start guest
execution if -S isn't provided on CLI)

PS:
Initially 'preconfig' is planned to be used for configuring numa
topology depending on board specified possible cpus layout.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v4:
  * Explain more on behaviour in commit message and use suggested
    wording in message and patch (Eric Blake <eblake@redhat.com>)
---
 include/sysemu/sysemu.h |  1 +
 qapi/run-state.json     |  5 ++++-
 qemu-options.hx         | 13 +++++++++++++
 qmp.c                   |  5 +++++
 vl.c                    | 35 ++++++++++++++++++++++++++++++++++-
 5 files changed, 57 insertions(+), 2 deletions(-)

Comments

Eric Blake March 23, 2018, 9:02 p.m. UTC | #1
On 03/12/2018 08:11 AM, Igor Mammedov wrote:

I know you wrote this before softfreeze, but I'm only just now getting a 
chance to review. ...[1]

> This option allows pausing QEMU in the new RUN_STATE_PRECONFIG state,
> allowing the configuration of QEMU from QMP before the machine jumps
> into board initialization code of machine_run_board_init()
> 
> Intent is to allow management to query machine state and additionally

s/Intent/The intent/

> configure it using previous query results within one QEMU instance
> (i.e. eliminate need to start QEMU twice, 1st to query board specific

s/need/the need/

> parameters and 2nd for actual VM start using query results for
> additional parameters).
> 
> New option complements -S option and could be used with or without

s/New/The new/


> it. Difference is that -S pauses QEMU when machine is completely

s/Difference/The difference/
s/when/when the/

> build with all devices wired up and ready run (QEMU need only to

s/build/built/
s/ready/ready to/

> unpause CPUs to let guest execute its code).
> And "preconfig" option pauses QEMU early before board specific init

s/. And/; while the/

> callback (machine_run_board_init) is executed and will allow to
> configure machine parameters which will be used by board init code.

s/allow to configure/allow the configuration of/

> 
> When early introspection/configuration is done, command 'cont' should
> be used to exit RUN_STATE_PRECONFIG and transition to the next
> requested state (i.e. if -S is used then QEMU will pause the second
> time when board/device initialization is completed or start guest
> execution if -S isn't provided on CLI)
> 
> PS:
> Initially 'preconfig' is planned to be used for configuring numa
> topology depending on board specified possible cpus layout.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v4:
>    * Explain more on behaviour in commit message and use suggested
>      wording in message and patch (Eric Blake <eblake@redhat.com>)

Well, I'm still coming up with wording tweaks, but it is getting better ;)

> ---
>   include/sysemu/sysemu.h |  1 +
>   qapi/run-state.json     |  5 ++++-
>   qemu-options.hx         | 13 +++++++++++++
>   qmp.c                   |  5 +++++
>   vl.c                    | 35 ++++++++++++++++++++++++++++++++++-
>   5 files changed, 57 insertions(+), 2 deletions(-)
> 

> +++ b/qapi/run-state.json
> @@ -49,12 +49,15 @@
>   # @colo: guest is paused to save/restore VM state under colo checkpoint,
>   #        VM can not get into this state unless colo capability is enabled
>   #        for migration. (since 2.8)
> +# @preconfig: QEMU is paused before board specific init callback is executed.
> +#             The state is reachable only if -preconfig CLI option is used.
> +#             (Since 2.12)

[1]... So are you still trying to cram this in 2.12 as a bugfix? It 
feels enough like a feature that at this point, you'll want to change 
that to 2.13 on your v5 spin.  (Probably a similar comment throughout 
the series, so I'll only mention it this once).

s/if -preconfig/if the --preconfig/

spelling --preconfig with two dashes may make sense; we have a 
bite-sized task that mentions that common options like -object/--object 
should prefer the two-dash form, at which point consistency where all 
our other options use the two-dash form may be worth doing.  But even if 
you stick with the one-dash form, inserting 'the' sounds better to a 
native speaker.

>   ##
>   { 'enum': 'RunState',
>     'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>               'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>               'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
> -            'guest-panicked', 'colo' ] }
> +            'guest-panicked', 'colo', 'preconfig' ] }
>   
>   ##
>   # @StatusInfo:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 6585058..7c8aaa5 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3302,6 +3302,19 @@ STEXI
>   Run the emulation in single step mode.
>   ETEXI
>   
> +DEF("preconfig", 0, QEMU_OPTION_preconfig, \
> +    "-preconfig      pause QEMU before machine is initialized\n",

More places for two-dash spelling consideration.

> +    QEMU_ARCH_ALL)
> +STEXI
> +@item -preconfig
> +@findex -preconfig
> +Pause QEMU for interactive configuration before the machine is created,
> +which allows querying and configuring properties that will affect
> +machine initialization. Use the QMP command 'cont' to exit the preconfig
> +state and move to the next state (ie. run guest if -S isn't used or
> +pause the second time is -S is used).

s/is -S/if -S/
Eduardo Habkost March 23, 2018, 9:05 p.m. UTC | #2
On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:
> This option allows pausing QEMU in the new RUN_STATE_PRECONFIG state,
> allowing the configuration of QEMU from QMP before the machine jumps
> into board initialization code of machine_run_board_init()
> 
> Intent is to allow management to query machine state and additionally
> configure it using previous query results within one QEMU instance
> (i.e. eliminate need to start QEMU twice, 1st to query board specific
> parameters and 2nd for actual VM start using query results for
> additional parameters).
> 
> New option complements -S option and could be used with or without
> it. Difference is that -S pauses QEMU when machine is completely
> build with all devices wired up and ready run (QEMU need only to
> unpause CPUs to let guest execute its code).
> And "preconfig" option pauses QEMU early before board specific init
> callback (machine_run_board_init) is executed and will allow to
> configure machine parameters which will be used by board init code.
> 
> When early introspection/configuration is done, command 'cont' should
> be used to exit RUN_STATE_PRECONFIG and transition to the next
> requested state (i.e. if -S is used then QEMU will pause the second
> time when board/device initialization is completed or start guest
> execution if -S isn't provided on CLI)
> 
> PS:
> Initially 'preconfig' is planned to be used for configuring numa
> topology depending on board specified possible cpus layout.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

TL;DR: I was against this approach of adding a new "preconfig"
state and thought "-S" ought to be enough, but I'm now convinced
this is the best option we have.


Long version:

So, I was skeptical of this approach initially, because I thought
"machine->init() was run" and "machine->init() was not run yet"
is supposed to be internal QEMU state that no external component
should care about at all, because the vCPUs are not running yet.

In other words, if vCPUS were not started yet, we should be able
to reconfigure anything, and "-S" ought to be enough to what we
want.

...in theory.  In practice this is messy:

Currently initialization works this way:

  void vm_start()  /* this is delayed if -S is used */
  {
      resume_all_vcpus();
  }

  void qmp_cont()  /* "cont" command */
  {
      /* ... */
      vm_start();
  }
  
  void main()
  {
      /* ... */
      machine_run_board_init()
      if (autostart) {  /* -S option sets autotstart = 0 */
          vm_start();
      }
      main_loop();  /* QMP becomes available here */
  }

Then we would have to either do this:

  void vm_start()
  {
      machine_run_board_init()  /* <---- HERE */
      resume_all_vcpus();
  }

  void main()
  {
      /* ... */
      /* machine_run_board_init() moved from here */
      if (autostart) {
          vm_start();
      }
      main_loop();
  }

...and fix every single QMP command to not break if
machine_run_board_init() wasn't called yet.

I don't think that's feasible.


Or we could do this:

  void vm_start()
  {
      configure_numa()  /* <---- HERE */
      resume_all_vcpus();
  }

  void main()
  {
      /* ... */
      machine_run_board_init();
      if (autostart) {
          vm_start();
      }
      main_loop();
  }

...and slowly move code from machine_run_board_init() to
vm_start() (like configure_numa() above).

That's how I expected us to implement the NUMA QMP configuration
stuff.

But, really, the data and ordering dependencies we have in
machine initialization is insane, and simply moving
configure_numa() after machine_run_board_init() would require
moving almost all of machine_run_board_init() inside vm_start().

In practice this would be more complex than moving
machine_run_board_init() completely inside vm_start().  I don't
think that's feasible.

So I'm OK with your approach.

Now I will review the actual code in a separate e-mail.  :)
Eduardo Habkost March 23, 2018, 9:25 p.m. UTC | #3
On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:
> This option allows pausing QEMU in the new RUN_STATE_PRECONFIG state,
> allowing the configuration of QEMU from QMP before the machine jumps
> into board initialization code of machine_run_board_init()
> 
> Intent is to allow management to query machine state and additionally
> configure it using previous query results within one QEMU instance
> (i.e. eliminate need to start QEMU twice, 1st to query board specific
> parameters and 2nd for actual VM start using query results for
> additional parameters).
> 
> New option complements -S option and could be used with or without
> it. Difference is that -S pauses QEMU when machine is completely
> build with all devices wired up and ready run (QEMU need only to
> unpause CPUs to let guest execute its code).
> And "preconfig" option pauses QEMU early before board specific init
> callback (machine_run_board_init) is executed and will allow to
> configure machine parameters which will be used by board init code.
> 
> When early introspection/configuration is done, command 'cont' should
> be used to exit RUN_STATE_PRECONFIG and transition to the next
> requested state (i.e. if -S is used then QEMU will pause the second
> time when board/device initialization is completed or start guest
> execution if -S isn't provided on CLI)
> 
> PS:
> Initially 'preconfig' is planned to be used for configuring numa
> topology depending on board specified possible cpus layout.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v4:
>   * Explain more on behaviour in commit message and use suggested
>     wording in message and patch (Eric Blake <eblake@redhat.com>)
> ---
>  include/sysemu/sysemu.h |  1 +
>  qapi/run-state.json     |  5 ++++-
>  qemu-options.hx         | 13 +++++++++++++
>  qmp.c                   |  5 +++++
>  vl.c                    | 35 ++++++++++++++++++++++++++++++++++-
>  5 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 356bfdc..996bc38 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -66,6 +66,7 @@ typedef enum WakeupReason {
>      QEMU_WAKEUP_REASON_OTHER,
>  } WakeupReason;
>  
> +void qemu_exit_preconfig_request(void);
>  void qemu_system_reset_request(ShutdownCause reason);
>  void qemu_system_suspend_request(void);
>  void qemu_register_suspend_notifier(Notifier *notifier);
> diff --git a/qapi/run-state.json b/qapi/run-state.json
> index 1c9fff3..ce846a5 100644
> --- a/qapi/run-state.json
> +++ b/qapi/run-state.json
> @@ -49,12 +49,15 @@
>  # @colo: guest is paused to save/restore VM state under colo checkpoint,
>  #        VM can not get into this state unless colo capability is enabled
>  #        for migration. (since 2.8)
> +# @preconfig: QEMU is paused before board specific init callback is executed.
> +#             The state is reachable only if -preconfig CLI option is used.
> +#             (Since 2.12)
>  ##
>  { 'enum': 'RunState',
>    'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
>              'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
>              'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
> -            'guest-panicked', 'colo' ] }
> +            'guest-panicked', 'colo', 'preconfig' ] }
>  
>  ##
>  # @StatusInfo:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 6585058..7c8aaa5 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3302,6 +3302,19 @@ STEXI
>  Run the emulation in single step mode.
>  ETEXI
>  
> +DEF("preconfig", 0, QEMU_OPTION_preconfig, \
> +    "-preconfig      pause QEMU before machine is initialized\n",
> +    QEMU_ARCH_ALL)
> +STEXI
> +@item -preconfig
> +@findex -preconfig
> +Pause QEMU for interactive configuration before the machine is created,
> +which allows querying and configuring properties that will affect
> +machine initialization. Use the QMP command 'cont' to exit the preconfig
> +state and move to the next state (ie. run guest if -S isn't used or
> +pause the second time is -S is used).
> +ETEXI
> +
>  DEF("S", 0, QEMU_OPTION_S, \
>      "-S              freeze CPU at startup (use 'c' to start execution)\n",
>      QEMU_ARCH_ALL)
> diff --git a/qmp.c b/qmp.c
> index 8c7d1cc..b38090d 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -166,6 +166,11 @@ void qmp_cont(Error **errp)
>      BlockBackend *blk;
>      Error *local_err = NULL;
>  
> +    if (runstate_check(RUN_STATE_PRECONFIG)) {
> +        qemu_exit_preconfig_request();
> +        return;
> +    }
> +
>      /* if there is a dump in background, we should wait until the dump
>       * finished */
>      if (dump_in_progress()) {
> diff --git a/vl.c b/vl.c
> index 3ef04ce..69b1997 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -593,7 +593,7 @@ static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
>  /***********************************************************/
>  /* QEMU state */
>  
> -static RunState current_run_state = RUN_STATE_PRELAUNCH;
> +static RunState current_run_state = RUN_STATE_PRECONFIG;
>  
>  /* We use RUN_STATE__MAX but any invalid value will do */
>  static RunState vmstop_requested = RUN_STATE__MAX;
> @@ -606,6 +606,9 @@ typedef struct {
>  
>  static const RunStateTransition runstate_transitions_def[] = {
>      /*     from      ->     to      */
> +    { RUN_STATE_PRECONFIG, RUN_STATE_PRELAUNCH },
> +    { RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE },

Don't this mean -preconfig and -incoming could work together?

> +
>      { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
>      { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
>      { RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH },
> @@ -1629,6 +1632,7 @@ static pid_t shutdown_pid;
>  static int powerdown_requested;
>  static int debug_requested;
>  static int suspend_requested;
> +static bool preconfig_exit_requested = true;
>  static WakeupReason wakeup_reason;
>  static NotifierList powerdown_notifiers =
>      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> @@ -1713,6 +1717,11 @@ static int qemu_debug_requested(void)
>      return r;
>  }
>  
> +void qemu_exit_preconfig_request(void)
> +{
> +    preconfig_exit_requested = true;
> +}
> +
>  /*
>   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
>   */
> @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
>      RunState r;
>      ShutdownCause request;
>  
> +    if (preconfig_exit_requested) {
> +        if (runstate_check(RUN_STATE_PRECONFIG)) {

Is it possible to have preconfig_exit_request set outside of
RUN_STATE_PRECONFIG?  When and why?

> +            runstate_set(RUN_STATE_PRELAUNCH);
> +        }
> +        preconfig_exit_requested = false;
> +        return true;
> +    }
>      if (qemu_debug_requested()) {
>          vm_stop(RUN_STATE_DEBUG);
>      }
> @@ -3697,6 +3713,14 @@ int main(int argc, char **argv, char **envp)
>                      exit(1);
>                  }
>                  break;
> +            case QEMU_OPTION_preconfig:
> +                if (runstate_check(RUN_STATE_INMIGRATE)) {
> +                    error_report("option can not be used with "
> +                                 "-incoming option");
> +                    exit(EXIT_FAILURE);
> +                }

So -incoming changes runstate as soon as the option is parsed?

Ouch.

I would rather not rely on that behavior and just do
"if (incoming)".

Why exactly it's not possible to use -incoming with -preconfig?


> +                preconfig_exit_requested = false;
> +                break;
>              case QEMU_OPTION_enable_kvm:
>                  olist = qemu_find_opts("machine");
>                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> @@ -3902,6 +3926,11 @@ int main(int argc, char **argv, char **envp)
>                  }
>                  break;
>              case QEMU_OPTION_incoming:
> +                if (!preconfig_exit_requested) {
> +                    error_report("option can not be used with "
> +                                 "-preconfig option");
> +                    exit(EXIT_FAILURE);
> +                }

Instead of reimplementing the same check in two separate places,
why not validate options and check for (incoming && preconfig)
after the option parsing loop?

>                  if (!incoming) {
>                      runstate_set(RUN_STATE_INMIGRATE);
>                  }
> @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp)
>      }
>      parse_numa_opts(current_machine);
>  
> +    /* do monitor/qmp handling at preconfig state if requested */
> +    main_loop();

Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }"
instead of entering main_loop() just to exit immediately?

> +
> +    /* from here on runstate is RUN_STATE_PRELAUNCH */
>      machine_run_board_init(current_machine);
>  
>      realtime_init();
> -- 
> 2.7.4
> 
>
Igor Mammedov March 27, 2018, 3:05 p.m. UTC | #4
On Fri, 23 Mar 2018 18:25:08 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:
[...]
> > diff --git a/vl.c b/vl.c
> > index 3ef04ce..69b1997 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -593,7 +593,7 @@ static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
> >  /***********************************************************/
> >  /* QEMU state */
> >  
> > -static RunState current_run_state = RUN_STATE_PRELAUNCH;
> > +static RunState current_run_state = RUN_STATE_PRECONFIG;
> >  
> >  /* We use RUN_STATE__MAX but any invalid value will do */
> >  static RunState vmstop_requested = RUN_STATE__MAX;
> > @@ -606,6 +606,9 @@ typedef struct {
> >  
> >  static const RunStateTransition runstate_transitions_def[] = {
> >      /*     from      ->     to      */
> > +    { RUN_STATE_PRECONFIG, RUN_STATE_PRELAUNCH },
> > +    { RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE },  
> 
> Don't this mean -preconfig and -incoming could work together?
theoretically yes, but its not the reason why this transition is here.
It's mimicking existing approach where initial state
   { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
were allowed to move to the next possible (including RUN_STATE_INMIGRATE)

> > +
> >      { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
> >      { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
> >      { RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH },
> > @@ -1629,6 +1632,7 @@ static pid_t shutdown_pid;
> >  static int powerdown_requested;
> >  static int debug_requested;
> >  static int suspend_requested;
> > +static bool preconfig_exit_requested = true;
> >  static WakeupReason wakeup_reason;
> >  static NotifierList powerdown_notifiers =
> >      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> > @@ -1713,6 +1717,11 @@ static int qemu_debug_requested(void)
> >      return r;
> >  }
> >  
> > +void qemu_exit_preconfig_request(void)
> > +{
> > +    preconfig_exit_requested = true;
> > +}
> > +
> >  /*
> >   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
> >   */
> > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
> >      RunState r;
> >      ShutdownCause request;
> >  
> > +    if (preconfig_exit_requested) {
> > +        if (runstate_check(RUN_STATE_PRECONFIG)) {  
> 
> Is it possible to have preconfig_exit_request set outside of
> RUN_STATE_PRECONFIG?  When and why?
preconfig_exit_requested is initialized with TRUE and
in combo with '-inmigrate' we need this runstate check.
it's the same as it was with
 { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
which I probably should remove (I need to check it though)

> > +            runstate_set(RUN_STATE_PRELAUNCH);
> > +        }
> > +        preconfig_exit_requested = false;
> > +        return true;
> > +    }
> >      if (qemu_debug_requested()) {
> >          vm_stop(RUN_STATE_DEBUG);
> >      }
> > @@ -3697,6 +3713,14 @@ int main(int argc, char **argv, char **envp)
> >                      exit(1);
> >                  }
> >                  break;
> > +            case QEMU_OPTION_preconfig:
> > +                if (runstate_check(RUN_STATE_INMIGRATE)) {
> > +                    error_report("option can not be used with "
> > +                                 "-incoming option");
> > +                    exit(EXIT_FAILURE);
> > +                }  
> 
> So -incoming changes runstate as soon as the option is parsed?
> 
> Ouch.
yep and it's rather fragile (it's well out of scope of
this series to re-factor this, so I'm not changing it here)

> I would rather not rely on that behavior and just do
> "if (incoming)".
> 
> Why exactly it's not possible to use -incoming with -preconfig?
there are 2 reasons why I made options mutually exclusive
1. (excuse ) '-incoming' is an option with non explicit side effects
   on other parts of code. It's hard to predict behavior
   of preconfig commands in combination with inmigrate.
   I wouldn't try to touch/change anything related to it
   in this series.
   If we need to change how option is handled, it should
   be separate series that focuses on it.
2. (main reason) is to expose as minimal interface
   as possible. It's easier to extend/modify it future if
   necessary than cut it down after it was introduced.

   Not counting [1], I don't see a reason to permit
   'preconfig' while migration is in progress.
   Configuration commands that where used during 'preconfig'
   stage on source side, should use corresponding CLI options
   on target side. (it's the same behavior as with hotplugged
   devices, keeping migration work-flow the same)

In short I'd prefer to keep restriction until there will be
a real usecase for combo to work together.

> > +                preconfig_exit_requested = false;
> > +                break;
> >              case QEMU_OPTION_enable_kvm:
> >                  olist = qemu_find_opts("machine");
> >                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> > @@ -3902,6 +3926,11 @@ int main(int argc, char **argv, char **envp)
> >                  }
> >                  break;
> >              case QEMU_OPTION_incoming:
> > +                if (!preconfig_exit_requested) {
> > +                    error_report("option can not be used with "
> > +                                 "-preconfig option");
> > +                    exit(EXIT_FAILURE);
> > +                }  
> 
> Instead of reimplementing the same check in two separate places,
> why not validate options and check for (incoming && preconfig)
> after the option parsing loop?
it could be done this way, but then we would lose specialized
error message.
Even though the way I did it, it is more code but that code
is close to related options and allows for specialized error
message in the order options are parsed.
Also it's easier to read as one doesn't have to jump around,
all error handling is in place where where an option is parsed.
But it's more style question, so if you prefer
(incoming && preconfig) approach I can easily switch to it
on respin.

> >                  if (!incoming) {
> >                      runstate_set(RUN_STATE_INMIGRATE);
> >                  }
> > @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp)
> >      }
> >      parse_numa_opts(current_machine);
> >  
> > +    /* do monitor/qmp handling at preconfig state if requested */
> > +    main_loop();  
> 
> Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }"
> instead of entering main_loop() just to exit immediately?
The thought didn't cross my mind, it might work and more readable
as one doesn't have to jump into main_loop() to find out that
it would exit immediately.
I'll try to it on respin.

> > +
> > +    /* from here on runstate is RUN_STATE_PRELAUNCH */
> >      machine_run_board_init(current_machine);
> >  
> >      realtime_init();
> > -- 
> > 2.7.4
> > 
> >   
>
Igor Mammedov March 28, 2018, 11:48 a.m. UTC | #5
On Tue, 27 Mar 2018 17:05:41 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Fri, 23 Mar 2018 18:25:08 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:  
> [...]
[...]
> > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
> > >      RunState r;
> > >      ShutdownCause request;
> > >  
> > > +    if (preconfig_exit_requested) {
> > > +        if (runstate_check(RUN_STATE_PRECONFIG)) {    
> > 
> > Is it possible to have preconfig_exit_request set outside of
> > RUN_STATE_PRECONFIG?  When and why?  
> preconfig_exit_requested is initialized with TRUE and
> in combo with '-inmigrate' we need this runstate check.
> it's the same as it was with
>  { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> which I probably should remove (I need to check it though)
[...]

> > > @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp)
> > >      }
> > >      parse_numa_opts(current_machine);
> > >  
> > > +    /* do monitor/qmp handling at preconfig state if requested */
> > > +    main_loop();    
> > 
> > Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }"
> > instead of entering main_loop() just to exit immediately?  
> The thought didn't cross my mind, it might work and more readable
> as one doesn't have to jump into main_loop() to find out that
> it would exit immediately.
> I'll try to it on respin.
Well doing as suggested end ups more messy:

    @@static bool main_loop_should_exit(void)
    ...
    if (preconfig_exit_requested) {
        runstate_set(RUN_STATE_PRELAUNCH);                                        
        return true;
    }
   
    @@main
    /* do monitor/qmp handling at preconfig state if requested */
    if (!preconfig_exit_requested) {
        main_loop();
    } else if (runstate_check(RUN_STATE_PRECONFIG)) {
        runstate_set(RUN_STATE_PRELAUNCH);
    }
    preconfig_exit_requested = false;
    ...

I'd prefer original v4 approach, where only main_loop_should_exit()
has to deal with state transitions and book-keeping.

[...]
Eduardo Habkost March 28, 2018, 7:17 p.m. UTC | #6
On Tue, Mar 27, 2018 at 05:05:41PM +0200, Igor Mammedov wrote:
> On Fri, 23 Mar 2018 18:25:08 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:
> [...]
> > > diff --git a/vl.c b/vl.c
> > > index 3ef04ce..69b1997 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -593,7 +593,7 @@ static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
> > >  /***********************************************************/
> > >  /* QEMU state */
> > >  
> > > -static RunState current_run_state = RUN_STATE_PRELAUNCH;
> > > +static RunState current_run_state = RUN_STATE_PRECONFIG;
> > >  
> > >  /* We use RUN_STATE__MAX but any invalid value will do */
> > >  static RunState vmstop_requested = RUN_STATE__MAX;
> > > @@ -606,6 +606,9 @@ typedef struct {
> > >  
> > >  static const RunStateTransition runstate_transitions_def[] = {
> > >      /*     from      ->     to      */
> > > +    { RUN_STATE_PRECONFIG, RUN_STATE_PRELAUNCH },
> > > +    { RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE },  
> > 
> > Don't this mean -preconfig and -incoming could work together?
> theoretically yes, but its not the reason why this transition is here.
> It's mimicking existing approach where initial state
>    { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> were allowed to move to the next possible (including RUN_STATE_INMIGRATE)

I still don't get it.  Where this definition of "next possible"
comes from?  If -incoming and -preconfig don't work together, why
is PRECONFIG -> INMIGRATE migration considered possible?


> 
> > > +
> > >      { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
> > >      { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
> > >      { RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH },
> > > @@ -1629,6 +1632,7 @@ static pid_t shutdown_pid;
> > >  static int powerdown_requested;
> > >  static int debug_requested;
> > >  static int suspend_requested;
> > > +static bool preconfig_exit_requested = true;
> > >  static WakeupReason wakeup_reason;
> > >  static NotifierList powerdown_notifiers =
> > >      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> > > @@ -1713,6 +1717,11 @@ static int qemu_debug_requested(void)
> > >      return r;
> > >  }
> > >  
> > > +void qemu_exit_preconfig_request(void)
> > > +{
> > > +    preconfig_exit_requested = true;
> > > +}
> > > +
> > >  /*
> > >   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
> > >   */
> > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
> > >      RunState r;
> > >      ShutdownCause request;
> > >  
> > > +    if (preconfig_exit_requested) {
> > > +        if (runstate_check(RUN_STATE_PRECONFIG)) {  
> > 
> > Is it possible to have preconfig_exit_request set outside of
> > RUN_STATE_PRECONFIG?  When and why?
> preconfig_exit_requested is initialized with TRUE and
> in combo with '-inmigrate' we need this runstate check.

I think this now makes sense to me.  It still looks confusing,
but I don't have a better suggestion right now.

Except...

Why exactly do you need to use main_loop() and
main_loop_should_exit() for the preconfig loop?  What about a
separate preconfig_loop() and preconfig_loop_should_exit()
function?


> it's the same as it was with
>  { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> which I probably should remove (I need to check it though)
> 
> > > +            runstate_set(RUN_STATE_PRELAUNCH);
> > > +        }
> > > +        preconfig_exit_requested = false;

What happens if we don't set preconfig_exit_requested=false here?


> > > +        return true;
> > > +    }
> > >      if (qemu_debug_requested()) {
> > >          vm_stop(RUN_STATE_DEBUG);
> > >      }
> > > @@ -3697,6 +3713,14 @@ int main(int argc, char **argv, char **envp)
> > >                      exit(1);
> > >                  }
> > >                  break;
> > > +            case QEMU_OPTION_preconfig:
> > > +                if (runstate_check(RUN_STATE_INMIGRATE)) {
> > > +                    error_report("option can not be used with "
> > > +                                 "-incoming option");
> > > +                    exit(EXIT_FAILURE);
> > > +                }  
> > 
> > So -incoming changes runstate as soon as the option is parsed?
> > 
> > Ouch.
> yep and it's rather fragile (it's well out of scope of
> this series to re-factor this, so I'm not changing it here)
> 
> > I would rather not rely on that behavior and just do
> > "if (incoming)".
> > 
> > Why exactly it's not possible to use -incoming with -preconfig?
> there are 2 reasons why I made options mutually exclusive
> 1. (excuse ) '-incoming' is an option with non explicit side effects
>    on other parts of code. It's hard to predict behavior
>    of preconfig commands in combination with inmigrate.
>    I wouldn't try to touch/change anything related to it
>    in this series.
>    If we need to change how option is handled, it should
>    be separate series that focuses on it.
> 2. (main reason) is to expose as minimal interface
>    as possible. It's easier to extend/modify it future if
>    necessary than cut it down after it was introduced.
> 
>    Not counting [1], I don't see a reason to permit
>    'preconfig' while migration is in progress.
>    Configuration commands that where used during 'preconfig'
>    stage on source side, should use corresponding CLI options
>    on target side. (it's the same behavior as with hotplugged
>    devices, keeping migration work-flow the same)
> 
> In short I'd prefer to keep restriction until there will be
> a real usecase for combo to work together.

I understand the reasons, but I think we already have an
important use case: live-migrating a VM with non-trivial NUMA
config (that needs -preconfig).  Don't we?


> 
> > > +                preconfig_exit_requested = false;
> > > +                break;
> > >              case QEMU_OPTION_enable_kvm:
> > >                  olist = qemu_find_opts("machine");
> > >                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> > > @@ -3902,6 +3926,11 @@ int main(int argc, char **argv, char **envp)
> > >                  }
> > >                  break;
> > >              case QEMU_OPTION_incoming:
> > > +                if (!preconfig_exit_requested) {
> > > +                    error_report("option can not be used with "
> > > +                                 "-preconfig option");
> > > +                    exit(EXIT_FAILURE);
> > > +                }  
> > 
> > Instead of reimplementing the same check in two separate places,
> > why not validate options and check for (incoming && preconfig)
> > after the option parsing loop?
> it could be done this way, but then we would lose specialized
> error message.
> Even though the way I did it, it is more code but that code
> is close to related options and allows for specialized error
> message in the order options are parsed.

What do you mean by specialized user message?  Both have exactly
the same information: "-incoming and -preconfig can't be used
together", just written in a different way.


> Also it's easier to read as one doesn't have to jump around,
> all error handling is in place where where an option is parsed.
> But it's more style question, so if you prefer
> (incoming && preconfig) approach I can easily switch to it
> on respin.

I would prefer that.  We already have lots of configuration
validation after the option parsing loop, including but not
limited to:

    error_report("Invalid SMP CPUs %d. The min CPUs "
                 "supported by machine '%s' is %d", smp_cpus,
                 machine_class->name, machine_class->min_cpus);
    error_report("Invalid SMP CPUs %d. The max CPUs "
                 "supported by machine '%s' is %d", max_cpus,
                 machine_class->name, machine_class->max_cpus);
    error_report("-nographic cannot be used with -daemonize");
    error_report("curses display cannot be used with -daemonize");
    error_report("-no-frame, -alt-grab and -ctrl-grab are only valid "
                 "for SDL, ignoring option");
    error_report("-no-quit is only valid for GTK and SDL, "
                 "ignoring option");
    error_report("OpenGL is not supported by the display");
    error_report("OpenGL support is disabled");
    error_report("-append only allowed with -kernel option");
    error_report("-initrd only allowed with -kernel option");
    error_report("-icount is not allowed with hardware virtualization");
    error_report("at most 2047 MB RAM can be simulated");


I agree with the argument that validation of config options
should be done all in the same place.  But I disagree that the
body of the option parsing loop is the right place for that.

> 
> > >                  if (!incoming) {
> > >                      runstate_set(RUN_STATE_INMIGRATE);
> > >                  }
> > > @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp)
> > >      }
> > >      parse_numa_opts(current_machine);
> > >  
> > > +    /* do monitor/qmp handling at preconfig state if requested */
> > > +    main_loop();  
> > 
> > Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }"
> > instead of entering main_loop() just to exit immediately?
> The thought didn't cross my mind, it might work and more readable
> as one doesn't have to jump into main_loop() to find out that
> it would exit immediately.
> I'll try to it on respin.

Thanks!

> 
> > > +
> > > +    /* from here on runstate is RUN_STATE_PRELAUNCH */
> > >      machine_run_board_init(current_machine);
> > >  
> > >      realtime_init();
> > > -- 
> > > 2.7.4
> > > 
> > >   
> > 
>
Eduardo Habkost March 28, 2018, 7:21 p.m. UTC | #7
On Wed, Mar 28, 2018 at 01:48:35PM +0200, Igor Mammedov wrote:
> On Tue, 27 Mar 2018 17:05:41 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Fri, 23 Mar 2018 18:25:08 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:  
> > [...]
> [...]
> > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
> > > >      RunState r;
> > > >      ShutdownCause request;
> > > >  
> > > > +    if (preconfig_exit_requested) {
> > > > +        if (runstate_check(RUN_STATE_PRECONFIG)) {    
> > > 
> > > Is it possible to have preconfig_exit_request set outside of
> > > RUN_STATE_PRECONFIG?  When and why?  
> > preconfig_exit_requested is initialized with TRUE and
> > in combo with '-inmigrate' we need this runstate check.
> > it's the same as it was with
> >  { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> > which I probably should remove (I need to check it though)
> [...]
> 
> > > > @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp)
> > > >      }
> > > >      parse_numa_opts(current_machine);
> > > >  
> > > > +    /* do monitor/qmp handling at preconfig state if requested */
> > > > +    main_loop();    
> > > 
> > > Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }"
> > > instead of entering main_loop() just to exit immediately?  
> > The thought didn't cross my mind, it might work and more readable
> > as one doesn't have to jump into main_loop() to find out that
> > it would exit immediately.
> > I'll try to it on respin.
> Well doing as suggested end ups more messy:
> 
>     @@static bool main_loop_should_exit(void)
>     ...
>     if (preconfig_exit_requested) {
>         runstate_set(RUN_STATE_PRELAUNCH);                                        
>         return true;
>     }
>    
>     @@main
>     /* do monitor/qmp handling at preconfig state if requested */
>     if (!preconfig_exit_requested) {
>         main_loop();
>     } else if (runstate_check(RUN_STATE_PRECONFIG)) {
>         runstate_set(RUN_STATE_PRELAUNCH);
>     }

This doesn't make sense to me.  Why would we enter
RUN_STATE_PRECONFIG state if -preconfig is not used at all?


>     preconfig_exit_requested = false;
>     ...
> 
> I'd prefer original v4 approach, where only main_loop_should_exit()
> has to deal with state transitions and book-keeping.

If the above is unavoidable, I agree.  But I still don't
understand we have to enter PRECONFIG state if the user didn't
specify -preconfig.
Igor Mammedov March 29, 2018, 11:43 a.m. UTC | #8
On Wed, 28 Mar 2018 16:21:48 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Wed, Mar 28, 2018 at 01:48:35PM +0200, Igor Mammedov wrote:
> > On Tue, 27 Mar 2018 17:05:41 +0200
> > Igor Mammedov <imammedo@redhat.com> wrote:
> >   
> > > On Fri, 23 Mar 2018 18:25:08 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:    
> > > [...]  
> > [...]  
> > > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
> > > > >      RunState r;
> > > > >      ShutdownCause request;
> > > > >  
> > > > > +    if (preconfig_exit_requested) {
> > > > > +        if (runstate_check(RUN_STATE_PRECONFIG)) {      
> > > > 
> > > > Is it possible to have preconfig_exit_request set outside of
> > > > RUN_STATE_PRECONFIG?  When and why?    
> > > preconfig_exit_requested is initialized with TRUE and
> > > in combo with '-inmigrate' we need this runstate check.
> > > it's the same as it was with
> > >  { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> > > which I probably should remove (I need to check it though)  
> > [...]
> >   
> > > > > @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp)
> > > > >      }
> > > > >      parse_numa_opts(current_machine);
> > > > >  
> > > > > +    /* do monitor/qmp handling at preconfig state if requested */
> > > > > +    main_loop();      
> > > > 
> > > > Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }"
> > > > instead of entering main_loop() just to exit immediately?    
> > > The thought didn't cross my mind, it might work and more readable
> > > as one doesn't have to jump into main_loop() to find out that
> > > it would exit immediately.
> > > I'll try to it on respin.  
> > Well doing as suggested end ups more messy:
> > 
> >     @@static bool main_loop_should_exit(void)
> >     ...
> >     if (preconfig_exit_requested) {
> >         runstate_set(RUN_STATE_PRELAUNCH);                                        
> >         return true;
> >     }
> >    
> >     @@main
> >     /* do monitor/qmp handling at preconfig state if requested */
> >     if (!preconfig_exit_requested) {
> >         main_loop();
> >     } else if (runstate_check(RUN_STATE_PRECONFIG)) {
> >         runstate_set(RUN_STATE_PRELAUNCH);
> >     }  
> 
> This doesn't make sense to me.  Why would we enter
> RUN_STATE_PRECONFIG state if -preconfig is not used at all?
because of RUN_STATE_PRECONFIG becomes new initial state of
our state machine where we start of (used to be RUN_STATE_PRELAUNCH)
Lets call it variant 1:

with this we have 2 possible transitions:
 RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH (machine_init)

and

 RUN_STATE_PRECONFIG -> RUN_STATE_INMIGRATE
   ugly but it was the same with RUN_STATE_PRELAUNCH initial transition

Another variant 2, in case we switch to RUN_STATE_PRECONFIG only on -preconfig
transitions would be
  RUN_STATE_PRELAUNCH -> RUN_STATE_PRECONFIG
    (allow switch from initial to -preconfig)

  RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH

while the last is valid transition, the 1st one isn't really
valid because of (beside of switching from initial state) it
allows bouncing back to RUN_STATE_PRECONFIG later.

If we consider only state machine transitions, I think it's
cleaner to start with variant 1 with the same
-inmigrate hack we already have (which potentially could
be fixed later), than allowing arbitrary bouncing to
RUN_STATE_PRECONFIG at later stage.

With this approach all processing before machine_init()
would run at RUN_STATE_PRECONFIG and then we would switch
to RUN_STATE_PRELAUNCH. Even though it is far reaching
goal but at least that's where we should be moving to
have sane initialization flow in vl.c

> >     preconfig_exit_requested = false;
> >     ...
> > 
> > I'd prefer original v4 approach, where only main_loop_should_exit()
> > has to deal with state transitions and book-keeping.  
> 
> If the above is unavoidable, I agree.  But I still don't
> understand we have to enter PRECONFIG state if the user didn't
> specify -preconfig.
>
Igor Mammedov March 29, 2018, 1:01 p.m. UTC | #9
On Wed, 28 Mar 2018 16:17:32 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, Mar 27, 2018 at 05:05:41PM +0200, Igor Mammedov wrote:
> > On Fri, 23 Mar 2018 18:25:08 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:  
> > [...]  
> > > > diff --git a/vl.c b/vl.c
> > > > index 3ef04ce..69b1997 100644
> > > > --- a/vl.c
> > > > +++ b/vl.c
> > > > @@ -593,7 +593,7 @@ static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
> > > >  /***********************************************************/
> > > >  /* QEMU state */
> > > >  
> > > > -static RunState current_run_state = RUN_STATE_PRELAUNCH;
> > > > +static RunState current_run_state = RUN_STATE_PRECONFIG;
> > > >  
> > > >  /* We use RUN_STATE__MAX but any invalid value will do */
> > > >  static RunState vmstop_requested = RUN_STATE__MAX;
> > > > @@ -606,6 +606,9 @@ typedef struct {
> > > >  
> > > >  static const RunStateTransition runstate_transitions_def[] = {
> > > >      /*     from      ->     to      */
> > > > +    { RUN_STATE_PRECONFIG, RUN_STATE_PRELAUNCH },
> > > > +    { RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE },    
> > > 
> > > Don't this mean -preconfig and -incoming could work together?  
> > theoretically yes, but its not the reason why this transition is here.
> > It's mimicking existing approach where initial state
> >    { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> > were allowed to move to the next possible (including RUN_STATE_INMIGRATE)  
> 
> I still don't get it.  Where this definition of "next possible"
> comes from?  If -incoming and -preconfig don't work together, why
> is PRECONFIG -> INMIGRATE migration considered possible?
I'd think it's the same (replacement) hack which we use now
   RUN_STATE_PRELAUNCH -> RUN_STATE_INMIGRATE
to allow following code to succeed:

      case QEMU_OPTION_incoming:
      if (!incoming) {                                                 
             runstate_set(RUN_STATE_INMIGRATE);                           
      }                                                                
      incoming = optarg;
 
I'd get rid of it and move state switching to the actual place
where migration starts if it were just that simple, but from
a quick look around it did look rather risky.
That's why I abandoned an idea of changing it within this series.

> > > >      { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
> > > >      { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
> > > >      { RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH },
> > > > @@ -1629,6 +1632,7 @@ static pid_t shutdown_pid;
> > > >  static int powerdown_requested;
> > > >  static int debug_requested;
> > > >  static int suspend_requested;
> > > > +static bool preconfig_exit_requested = true;
> > > >  static WakeupReason wakeup_reason;
> > > >  static NotifierList powerdown_notifiers =
> > > >      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> > > > @@ -1713,6 +1717,11 @@ static int qemu_debug_requested(void)
> > > >      return r;
> > > >  }
> > > >  
> > > > +void qemu_exit_preconfig_request(void)
> > > > +{
> > > > +    preconfig_exit_requested = true;
> > > > +}
> > > > +
> > > >  /*
> > > >   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
> > > >   */
> > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
> > > >      RunState r;
> > > >      ShutdownCause request;
> > > >  
> > > > +    if (preconfig_exit_requested) {
> > > > +        if (runstate_check(RUN_STATE_PRECONFIG)) {    
> > > 
> > > Is it possible to have preconfig_exit_request set outside of
> > > RUN_STATE_PRECONFIG?  When and why?  
> > preconfig_exit_requested is initialized with TRUE and
> > in combo with '-inmigrate' we need this runstate check.  
> 
> I think this now makes sense to me.  It still looks confusing,
> but I don't have a better suggestion right now.
> 
> Except...
> 
> Why exactly do you need to use main_loop() and
> main_loop_should_exit() for the preconfig loop?  What about a
> separate preconfig_loop() and preconfig_loop_should_exit()
> function?
that would duplicate main_loop() for practically no benefit at all,
hence I'm reusing existing main_loop()/main_loop_should_exit()
just by adding relevant exit condition. It also easier to read
when state transitions are kept close to each other.

 
> > it's the same as it was with
> >  { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> > which I probably should remove (I need to check it though)
> >   
> > > > +            runstate_set(RUN_STATE_PRELAUNCH);
> > > > +        }
> > > > +        preconfig_exit_requested = false;  
> 
> What happens if we don't set preconfig_exit_requested=false here?
nothing should go wrong due to 'if (runstate_check(RUN_STATE_PRECONFIG))'
condition. It's the same what qemu_reset_requested()/qemu_shutdown_requested()
do with their respective request variables but not wrapped
into a separate function as it's the only place it's used.

 
> > > > +        return true;
> > > > +    }
> > > >      if (qemu_debug_requested()) {
> > > >          vm_stop(RUN_STATE_DEBUG);
> > > >      }
> > > > @@ -3697,6 +3713,14 @@ int main(int argc, char **argv, char **envp)
> > > >                      exit(1);
> > > >                  }
> > > >                  break;
> > > > +            case QEMU_OPTION_preconfig:
> > > > +                if (runstate_check(RUN_STATE_INMIGRATE)) {
> > > > +                    error_report("option can not be used with "
> > > > +                                 "-incoming option");
> > > > +                    exit(EXIT_FAILURE);
> > > > +                }    
> > > 
> > > So -incoming changes runstate as soon as the option is parsed?
> > > 
> > > Ouch.  
> > yep and it's rather fragile (it's well out of scope of
> > this series to re-factor this, so I'm not changing it here)
> >   
> > > I would rather not rely on that behavior and just do
> > > "if (incoming)".
> > > 
> > > Why exactly it's not possible to use -incoming with -preconfig?  
> > there are 2 reasons why I made options mutually exclusive
> > 1. (excuse ) '-incoming' is an option with non explicit side effects
> >    on other parts of code. It's hard to predict behavior
> >    of preconfig commands in combination with inmigrate.
> >    I wouldn't try to touch/change anything related to it
> >    in this series.
> >    If we need to change how option is handled, it should
> >    be separate series that focuses on it.
> > 2. (main reason) is to expose as minimal interface
> >    as possible. It's easier to extend/modify it future if
> >    necessary than cut it down after it was introduced.
> > 
> >    Not counting [1], I don't see a reason to permit
> >    'preconfig' while migration is in progress.
> >    Configuration commands that where used during 'preconfig'
> >    stage on source side, should use corresponding CLI options
> >    on target side. (it's the same behavior as with hotplugged
> >    devices, keeping migration work-flow the same)
> > 
> > In short I'd prefer to keep restriction until there will be
> > a real usecase for combo to work together.  
> 
> I understand the reasons, but I think we already have an
> important use case: live-migrating a VM with non-trivial NUMA
> config (that needs -preconfig).  Don't we?
Not really,
whatever we have configured on source side using -preconfig
(discovering valid topology in process), we should be able
to replicate using only CLI options on target since we
already have all necessary values for it from source (it's
certainly the case with this series set-numa-node command).

As for the future, I agree it would be much more flexible
to allow both -preconfig and -incoming at the same time,
so we could start target with empty CLI, and then feed it
options from source. It would require audit/refactoring of
INMIGRATE state and making 'all' current CLI options
available via QMP interface.

But for now I'd prefer to keep using old way to start target.

> > > > +                preconfig_exit_requested = false;
> > > > +                break;
> > > >              case QEMU_OPTION_enable_kvm:
> > > >                  olist = qemu_find_opts("machine");
> > > >                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> > > > @@ -3902,6 +3926,11 @@ int main(int argc, char **argv, char **envp)
> > > >                  }
> > > >                  break;
> > > >              case QEMU_OPTION_incoming:
> > > > +                if (!preconfig_exit_requested) {
> > > > +                    error_report("option can not be used with "
> > > > +                                 "-preconfig option");
> > > > +                    exit(EXIT_FAILURE);
> > > > +                }    
> > > 
> > > Instead of reimplementing the same check in two separate places,
> > > why not validate options and check for (incoming && preconfig)
> > > after the option parsing loop?  
> > it could be done this way, but then we would lose specialized
> > error message.
> > Even though the way I did it, it is more code but that code
> > is close to related options and allows for specialized error
> > message in the order options are parsed.  
> 
> What do you mean by specialized user message?  Both have exactly
> the same information: "-incoming and -preconfig can't be used
> together", just written in a different way.
[...]
> 
> I agree with the argument that validation of config options
> should be done all in the same place.  But I disagree that the
> body of the option parsing loop is the right place for that.
Ok, I'll move it out of loop as you suggested.

[...]
Eduardo Habkost March 29, 2018, 4:24 p.m. UTC | #10
On Thu, Mar 29, 2018 at 01:43:03PM +0200, Igor Mammedov wrote:
> On Wed, 28 Mar 2018 16:21:48 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Wed, Mar 28, 2018 at 01:48:35PM +0200, Igor Mammedov wrote:
> > > On Tue, 27 Mar 2018 17:05:41 +0200
> > > Igor Mammedov <imammedo@redhat.com> wrote:
> > >   
> > > > On Fri, 23 Mar 2018 18:25:08 -0300
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >   
> > > > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:    
> > > > [...]  
> > > [...]  
> > > > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
> > > > > >      RunState r;
> > > > > >      ShutdownCause request;
> > > > > >  
> > > > > > +    if (preconfig_exit_requested) {
> > > > > > +        if (runstate_check(RUN_STATE_PRECONFIG)) {      
> > > > > 
> > > > > Is it possible to have preconfig_exit_request set outside of
> > > > > RUN_STATE_PRECONFIG?  When and why?    
> > > > preconfig_exit_requested is initialized with TRUE and
> > > > in combo with '-inmigrate' we need this runstate check.
> > > > it's the same as it was with
> > > >  { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> > > > which I probably should remove (I need to check it though)  
> > > [...]
> > >   
> > > > > > @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp)
> > > > > >      }
> > > > > >      parse_numa_opts(current_machine);
> > > > > >  
> > > > > > +    /* do monitor/qmp handling at preconfig state if requested */
> > > > > > +    main_loop();      
> > > > > 
> > > > > Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }"
> > > > > instead of entering main_loop() just to exit immediately?    
> > > > The thought didn't cross my mind, it might work and more readable
> > > > as one doesn't have to jump into main_loop() to find out that
> > > > it would exit immediately.
> > > > I'll try to it on respin.  
> > > Well doing as suggested end ups more messy:
> > > 
> > >     @@static bool main_loop_should_exit(void)
> > >     ...
> > >     if (preconfig_exit_requested) {
> > >         runstate_set(RUN_STATE_PRELAUNCH);                                        
> > >         return true;
> > >     }
> > >    
> > >     @@main
> > >     /* do monitor/qmp handling at preconfig state if requested */
> > >     if (!preconfig_exit_requested) {
> > >         main_loop();
> > >     } else if (runstate_check(RUN_STATE_PRECONFIG)) {
> > >         runstate_set(RUN_STATE_PRELAUNCH);
> > >     }  
> > 
> > This doesn't make sense to me.  Why would we enter
> > RUN_STATE_PRECONFIG state if -preconfig is not used at all?
> because of RUN_STATE_PRECONFIG becomes new initial state of
> our state machine where we start of (used to be RUN_STATE_PRELAUNCH)

Oh, I missed that part.

> Lets call it variant 1:
> 
> with this we have 2 possible transitions:
>  RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH (machine_init)
> 
> and
> 
>  RUN_STATE_PRECONFIG -> RUN_STATE_INMIGRATE
>    ugly but it was the same with RUN_STATE_PRELAUNCH initial transition

That explains a lot, thanks.

> 
> Another variant 2, in case we switch to RUN_STATE_PRECONFIG only on -preconfig
> transitions would be
>   RUN_STATE_PRELAUNCH -> RUN_STATE_PRECONFIG
>     (allow switch from initial to -preconfig)
> 
>   RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH
> 
> while the last is valid transition, the 1st one isn't really
> valid because of (beside of switching from initial state) it
> allows bouncing back to RUN_STATE_PRECONFIG later.
> 
> If we consider only state machine transitions, I think it's
> cleaner to start with variant 1 with the same
> -inmigrate hack we already have (which potentially could
> be fixed later), than allowing arbitrary bouncing to
> RUN_STATE_PRECONFIG at later stage.
> 
> With this approach all processing before machine_init()
> would run at RUN_STATE_PRECONFIG and then we would switch
> to RUN_STATE_PRELAUNCH. Even though it is far reaching
> goal but at least that's where we should be moving to
> have sane initialization flow in vl.c

Thanks, now variant 1 makes more sense to me.  But I really miss
here are very clear and explicit descriptions of what each state
really mean, and what are the differences between them.

It looks like the existing description for `prelaunch` isn't
accurate:

# @prelaunch: QEMU was started with -S and guest has not started

This is false, as QEMU can be in `prelaunch` state even if -S is
not used.


Also, this is the description you proposed for `preconfig`:

# @preconfig: QEMU is paused before board specific init callback is executed.
#             The state is reachable only if -preconfig CLI option is used.
#             (Since 2.12)

This seems wrong as well: the `prelaunch` state is reachable even
if `-preconfig` isn't used in the command-line (because it is the
initial state).


> 
> > >     preconfig_exit_requested = false;
> > >     ...
> > > 
> > > I'd prefer original v4 approach, where only main_loop_should_exit()
> > > has to deal with state transitions and book-keeping.  
> > 
> > If the above is unavoidable, I agree.  But I still don't
> > understand we have to enter PRECONFIG state if the user didn't
> > specify -preconfig.
> > 
>
Eduardo Habkost March 29, 2018, 4:57 p.m. UTC | #11
On Thu, Mar 29, 2018 at 03:01:12PM +0200, Igor Mammedov wrote:
> On Wed, 28 Mar 2018 16:17:32 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Tue, Mar 27, 2018 at 05:05:41PM +0200, Igor Mammedov wrote:
> > > On Fri, 23 Mar 2018 18:25:08 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:  
> > > [...]  
> > > > > diff --git a/vl.c b/vl.c
> > > > > index 3ef04ce..69b1997 100644
> > > > > --- a/vl.c
> > > > > +++ b/vl.c
> > > > > @@ -593,7 +593,7 @@ static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
> > > > >  /***********************************************************/
> > > > >  /* QEMU state */
> > > > >  
> > > > > -static RunState current_run_state = RUN_STATE_PRELAUNCH;
> > > > > +static RunState current_run_state = RUN_STATE_PRECONFIG;
> > > > >  
> > > > >  /* We use RUN_STATE__MAX but any invalid value will do */
> > > > >  static RunState vmstop_requested = RUN_STATE__MAX;
> > > > > @@ -606,6 +606,9 @@ typedef struct {
> > > > >  
> > > > >  static const RunStateTransition runstate_transitions_def[] = {
> > > > >      /*     from      ->     to      */
> > > > > +    { RUN_STATE_PRECONFIG, RUN_STATE_PRELAUNCH },
> > > > > +    { RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE },    
> > > > 
> > > > Don't this mean -preconfig and -incoming could work together?  
> > > theoretically yes, but its not the reason why this transition is here.
> > > It's mimicking existing approach where initial state
> > >    { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> > > were allowed to move to the next possible (including RUN_STATE_INMIGRATE)  
> > 
> > I still don't get it.  Where this definition of "next possible"
> > comes from?  If -incoming and -preconfig don't work together, why
> > is PRECONFIG -> INMIGRATE migration considered possible?
> I'd think it's the same (replacement) hack which we use now
>    RUN_STATE_PRELAUNCH -> RUN_STATE_INMIGRATE
> to allow following code to succeed:
> 
>       case QEMU_OPTION_incoming:
>       if (!incoming) {                                                 
>              runstate_set(RUN_STATE_INMIGRATE);                           
>       }                                                                
>       incoming = optarg;
>  
> I'd get rid of it and move state switching to the actual place
> where migration starts if it were just that simple, but from
> a quick look around it did look rather risky.
> That's why I abandoned an idea of changing it within this series.

Yeah, I now see that the initial state is PRECONFIG.

> 
> > > > >      { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
> > > > >      { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
> > > > >      { RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH },
> > > > > @@ -1629,6 +1632,7 @@ static pid_t shutdown_pid;
> > > > >  static int powerdown_requested;
> > > > >  static int debug_requested;
> > > > >  static int suspend_requested;
> > > > > +static bool preconfig_exit_requested = true;
> > > > >  static WakeupReason wakeup_reason;
> > > > >  static NotifierList powerdown_notifiers =
> > > > >      NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
> > > > > @@ -1713,6 +1717,11 @@ static int qemu_debug_requested(void)
> > > > >      return r;
> > > > >  }
> > > > >  
> > > > > +void qemu_exit_preconfig_request(void)
> > > > > +{
> > > > > +    preconfig_exit_requested = true;
> > > > > +}
> > > > > +
> > > > >  /*
> > > > >   * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
> > > > >   */
> > > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void)
> > > > >      RunState r;
> > > > >      ShutdownCause request;
> > > > >  
> > > > > +    if (preconfig_exit_requested) {
> > > > > +        if (runstate_check(RUN_STATE_PRECONFIG)) {    
> > > > 
> > > > Is it possible to have preconfig_exit_request set outside of
> > > > RUN_STATE_PRECONFIG?  When and why?  
> > > preconfig_exit_requested is initialized with TRUE and
> > > in combo with '-inmigrate' we need this runstate check.  
> > 
> > I think this now makes sense to me.  It still looks confusing,
> > but I don't have a better suggestion right now.
> > 
> > Except...
> > 
> > Why exactly do you need to use main_loop() and
> > main_loop_should_exit() for the preconfig loop?  What about a
> > separate preconfig_loop() and preconfig_loop_should_exit()
> > function?
> that would duplicate main_loop() for practically no benefit at all,
> hence I'm reusing existing main_loop()/main_loop_should_exit()
> just by adding relevant exit condition. It also easier to read
> when state transitions are kept close to each other.

I wouldn't say that main_loop_should_exit() is easy to read, but
I understand that this is the existing style, so no objection.


> 
>  
> > > it's the same as it was with
> > >  { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
> > > which I probably should remove (I need to check it though)
> > >   
> > > > > +            runstate_set(RUN_STATE_PRELAUNCH);
> > > > > +        }
> > > > > +        preconfig_exit_requested = false;  
> > 
> > What happens if we don't set preconfig_exit_requested=false here?
> nothing should go wrong due to 'if (runstate_check(RUN_STATE_PRECONFIG))'
> condition. It's the same what qemu_reset_requested()/qemu_shutdown_requested()
> do with their respective request variables but not wrapped
> into a separate function as it's the only place it's used.
> 
>  
> > > > > +        return true;
> > > > > +    }
> > > > >      if (qemu_debug_requested()) {
> > > > >          vm_stop(RUN_STATE_DEBUG);
> > > > >      }
> > > > > @@ -3697,6 +3713,14 @@ int main(int argc, char **argv, char **envp)
> > > > >                      exit(1);
> > > > >                  }
> > > > >                  break;
> > > > > +            case QEMU_OPTION_preconfig:
> > > > > +                if (runstate_check(RUN_STATE_INMIGRATE)) {
> > > > > +                    error_report("option can not be used with "
> > > > > +                                 "-incoming option");
> > > > > +                    exit(EXIT_FAILURE);
> > > > > +                }    
> > > > 
> > > > So -incoming changes runstate as soon as the option is parsed?
> > > > 
> > > > Ouch.  
> > > yep and it's rather fragile (it's well out of scope of
> > > this series to re-factor this, so I'm not changing it here)
> > >   
> > > > I would rather not rely on that behavior and just do
> > > > "if (incoming)".
> > > > 
> > > > Why exactly it's not possible to use -incoming with -preconfig?  
> > > there are 2 reasons why I made options mutually exclusive
> > > 1. (excuse ) '-incoming' is an option with non explicit side effects
> > >    on other parts of code. It's hard to predict behavior
> > >    of preconfig commands in combination with inmigrate.
> > >    I wouldn't try to touch/change anything related to it
> > >    in this series.
> > >    If we need to change how option is handled, it should
> > >    be separate series that focuses on it.
> > > 2. (main reason) is to expose as minimal interface
> > >    as possible. It's easier to extend/modify it future if
> > >    necessary than cut it down after it was introduced.
> > > 
> > >    Not counting [1], I don't see a reason to permit
> > >    'preconfig' while migration is in progress.
> > >    Configuration commands that where used during 'preconfig'
> > >    stage on source side, should use corresponding CLI options
> > >    on target side. (it's the same behavior as with hotplugged
> > >    devices, keeping migration work-flow the same)
> > > 
> > > In short I'd prefer to keep restriction until there will be
> > > a real usecase for combo to work together.  
> > 
> > I understand the reasons, but I think we already have an
> > important use case: live-migrating a VM with non-trivial NUMA
> > config (that needs -preconfig).  Don't we?
> Not really,
> whatever we have configured on source side using -preconfig
> (discovering valid topology in process), we should be able
> to replicate using only CLI options on target since we
> already have all necessary values for it from source (it's
> certainly the case with this series set-numa-node command).
> 
> As for the future, I agree it would be much more flexible
> to allow both -preconfig and -incoming at the same time,
> so we could start target with empty CLI, and then feed it
> options from source. It would require audit/refactoring of
> INMIGRATE state and making 'all' current CLI options
> available via QMP interface.
> 
> But for now I'd prefer to keep using old way to start target.

Well, if management software developers tell us that -preconfig
will be already useful without -incoming support, I won't object.

But it would be very nice for management software if they can
simply assume that -preconfig and -incoming will work together
since the first version.  Can we have this as a goal for 2.13?


> 
> > > > > +                preconfig_exit_requested = false;
> > > > > +                break;
> > > > >              case QEMU_OPTION_enable_kvm:
> > > > >                  olist = qemu_find_opts("machine");
> > > > >                  qemu_opts_parse_noisily(olist, "accel=kvm", false);
> > > > > @@ -3902,6 +3926,11 @@ int main(int argc, char **argv, char **envp)
> > > > >                  }
> > > > >                  break;
> > > > >              case QEMU_OPTION_incoming:
> > > > > +                if (!preconfig_exit_requested) {
> > > > > +                    error_report("option can not be used with "
> > > > > +                                 "-preconfig option");
> > > > > +                    exit(EXIT_FAILURE);
> > > > > +                }    
> > > > 
> > > > Instead of reimplementing the same check in two separate places,
> > > > why not validate options and check for (incoming && preconfig)
> > > > after the option parsing loop?  
> > > it could be done this way, but then we would lose specialized
> > > error message.
> > > Even though the way I did it, it is more code but that code
> > > is close to related options and allows for specialized error
> > > message in the order options are parsed.  
> > 
> > What do you mean by specialized user message?  Both have exactly
> > the same information: "-incoming and -preconfig can't be used
> > together", just written in a different way.
> [...]
> > 
> > I agree with the argument that validation of config options
> > should be done all in the same place.  But I disagree that the
> > body of the option parsing loop is the right place for that.
> Ok, I'll move it out of loop as you suggested.
> 
> [...]
>
Peter Krempa April 3, 2018, 10:41 a.m. UTC | #12
On Thu, Mar 29, 2018 at 13:57:54 -0300, Eduardo Habkost wrote:
> On Thu, Mar 29, 2018 at 03:01:12PM +0200, Igor Mammedov wrote:
> > On Wed, 28 Mar 2018 16:17:32 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > On Tue, Mar 27, 2018 at 05:05:41PM +0200, Igor Mammedov wrote:
> > > > On Fri, 23 Mar 2018 18:25:08 -0300
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote:  

[...]

> > > > > Why exactly it's not possible to use -incoming with -preconfig?  
> > > > there are 2 reasons why I made options mutually exclusive
> > > > 1. (excuse ) '-incoming' is an option with non explicit side effects
> > > >    on other parts of code. It's hard to predict behavior
> > > >    of preconfig commands in combination with inmigrate.
> > > >    I wouldn't try to touch/change anything related to it
> > > >    in this series.
> > > >    If we need to change how option is handled, it should
> > > >    be separate series that focuses on it.
> > > > 2. (main reason) is to expose as minimal interface
> > > >    as possible. It's easier to extend/modify it future if
> > > >    necessary than cut it down after it was introduced.
> > > > 
> > > >    Not counting [1], I don't see a reason to permit
> > > >    'preconfig' while migration is in progress.
> > > >    Configuration commands that where used during 'preconfig'
> > > >    stage on source side, should use corresponding CLI options
> > > >    on target side. (it's the same behavior as with hotplugged
> > > >    devices, keeping migration work-flow the same)
> > > > 
> > > > In short I'd prefer to keep restriction until there will be
> > > > a real usecase for combo to work together.  
> > > 
> > > I understand the reasons, but I think we already have an
> > > important use case: live-migrating a VM with non-trivial NUMA
> > > config (that needs -preconfig).  Don't we?
> > Not really,
> > whatever we have configured on source side using -preconfig
> > (discovering valid topology in process), we should be able
> > to replicate using only CLI options on target since we
> > already have all necessary values for it from source (it's
> > certainly the case with this series set-numa-node command).
> > 
> > As for the future, I agree it would be much more flexible
> > to allow both -preconfig and -incoming at the same time,
> > so we could start target with empty CLI, and then feed it
> > options from source. It would require audit/refactoring of
> > INMIGRATE state and making 'all' current CLI options
> > available via QMP interface.
> > 
> > But for now I'd prefer to keep using old way to start target.
> 
> Well, if management software developers tell us that -preconfig
> will be already useful without -incoming support, I won't object.


Hmm, that depends on what we will be configured using the new interface.
We usually prefer to use the same approach to set up things when
starting a new VM and when starting a VM for migration.

Since we do have options to setup the vCPU topology stuff on the
command line once we are going to migrate, we certainly can use
-preconfig even when it will collide with -incoming

Ideally -preconfig should replace -incoming, so that we can swithc to
incomming migration operation after we configure everything.

> But it would be very nice for management software if they can
> simply assume that -preconfig and -incoming will work together
> since the first version.  Can we have this as a goal for 2.13?

It usually helps in reducing code clutter.
Igor Mammedov April 3, 2018, 1:49 p.m. UTC | #13
On Thu, 29 Mar 2018 13:57:54 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

[...]
> > As for the future, I agree it would be much more flexible
> > to allow both -preconfig and -incoming at the same time,
> > so we could start target with empty CLI, and then feed it
> > options from source. It would require audit/refactoring of
> > INMIGRATE state and making 'all' current CLI options
> > available via QMP interface.
> > 
> > But for now I'd prefer to keep using old way to start target.  
> 
> Well, if management software developers tell us that -preconfig
> will be already useful without -incoming support, I won't object.
As Peter said, mgmt can/will use -preconfig even without -incoming,
since it lets deal with initial (source) start-up problem (i.e.
avoid restarting QEMU) and lets us make numa configuration work
in qemu/libvirt stack without guess work. (that's the end goal of
the series)

> But it would be very nice for management software if they can
> simply assume that -preconfig and -incoming will work together
> since the first version.  Can we have this as a goal for 2.13?
I can't promise to refactor -incoming in near future, as I'm working
on ARM cpu-hotplug currently and next in queue is ARM memory hotplug.

Peter suggested an alternative idea, to abandon -incoming and
enable incoming migration from QMP after all configuration is done.
Amount of refactoring need probably would be the same but at least
interface and runstate transitions wise it looks cleaner.

[...]
Eduardo Habkost April 3, 2018, 1:52 p.m. UTC | #14
On Tue, Apr 03, 2018 at 03:49:07PM +0200, Igor Mammedov wrote:
> On Thu, 29 Mar 2018 13:57:54 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> [...]
> > > As for the future, I agree it would be much more flexible
> > > to allow both -preconfig and -incoming at the same time,
> > > so we could start target with empty CLI, and then feed it
> > > options from source. It would require audit/refactoring of
> > > INMIGRATE state and making 'all' current CLI options
> > > available via QMP interface.
> > > 
> > > But for now I'd prefer to keep using old way to start target.  
> > 
> > Well, if management software developers tell us that -preconfig
> > will be already useful without -incoming support, I won't object.
> As Peter said, mgmt can/will use -preconfig even without -incoming,
> since it lets deal with initial (source) start-up problem (i.e.
> avoid restarting QEMU) and lets us make numa configuration work
> in qemu/libvirt stack without guess work. (that's the end goal of
> the series)
> 
> > But it would be very nice for management software if they can
> > simply assume that -preconfig and -incoming will work together
> > since the first version.  Can we have this as a goal for 2.13?
> I can't promise to refactor -incoming in near future, as I'm working
> on ARM cpu-hotplug currently and next in queue is ARM memory hotplug.
> 
> Peter suggested an alternative idea, to abandon -incoming and
> enable incoming migration from QMP after all configuration is done.
> Amount of refactoring need probably would be the same but at least
> interface and runstate transitions wise it looks cleaner.

Also, support for the "start incoming migration" QMP command
would be very easy to probe using existing mechanisms.  Sounds
good to me.
Igor Mammedov April 3, 2018, 2:32 p.m. UTC | #15
On Thu, 29 Mar 2018 13:24:09 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Thu, Mar 29, 2018 at 01:43:03PM +0200, Igor Mammedov wrote:
> > On Wed, 28 Mar 2018 16:21:48 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Wed, Mar 28, 2018 at 01:48:35PM +0200, Igor Mammedov wrote:  
> > > > On Tue, 27 Mar 2018 17:05:41 +0200
> > > > Igor Mammedov <imammedo@redhat.com> wrote:
> > > >     
> > > > > On Fri, 23 Mar 2018 18:25:08 -0300
> > > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
[...]
> > > This doesn't make sense to me.  Why would we enter
> > > RUN_STATE_PRECONFIG state if -preconfig is not used at all?  
> > because of RUN_STATE_PRECONFIG becomes new initial state of
> > our state machine where we start of (used to be RUN_STATE_PRELAUNCH)  
> 
> Oh, I missed that part.
> 
> > Lets call it variant 1:
> > 
> > with this we have 2 possible transitions:
> >  RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH (machine_init)
> > 
> > and
> > 
> >  RUN_STATE_PRECONFIG -> RUN_STATE_INMIGRATE
> >    ugly but it was the same with RUN_STATE_PRELAUNCH initial transition  
[...]
> 
> Thanks, now variant 1 makes more sense to me.  But I really miss
> here are very clear and explicit descriptions of what each state
> really mean, and what are the differences between them.
> 
> It looks like the existing description for `prelaunch` isn't
> accurate:
> 
> # @prelaunch: QEMU was started with -S and guest has not started
> 
> This is false, as QEMU can be in `prelaunch` state even if -S is
> not used.
> 
> 
> Also, this is the description you proposed for `preconfig`:
> 
> # @preconfig: QEMU is paused before board specific init callback is executed.
> #             The state is reachable only if -preconfig CLI option is used.
> #             (Since 2.12)
> 
> This seems wrong as well: the `prelaunch` state is reachable even
> if `-preconfig` isn't used in the command-line (because it is the
> initial state).
I'm not sure we should describe transitions/initial state here
(it's not the case now).

I think descriptions 'almost' match 'end' result of what QMP client
cloud see and the initial state is not something that QMP user could
observe.

[...]
Eduardo Habkost April 3, 2018, 3:31 p.m. UTC | #16
On Tue, Apr 03, 2018 at 04:32:53PM +0200, Igor Mammedov wrote:
> On Thu, 29 Mar 2018 13:24:09 -0300
> Eduardo Habkost <ehabkost@redhat.com> wrote:
> 
> > On Thu, Mar 29, 2018 at 01:43:03PM +0200, Igor Mammedov wrote:
> > > On Wed, 28 Mar 2018 16:21:48 -0300
> > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >   
> > > > On Wed, Mar 28, 2018 at 01:48:35PM +0200, Igor Mammedov wrote:  
> > > > > On Tue, 27 Mar 2018 17:05:41 +0200
> > > > > Igor Mammedov <imammedo@redhat.com> wrote:
> > > > >     
> > > > > > On Fri, 23 Mar 2018 18:25:08 -0300
> > > > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> [...]
> > > > This doesn't make sense to me.  Why would we enter
> > > > RUN_STATE_PRECONFIG state if -preconfig is not used at all?  
> > > because of RUN_STATE_PRECONFIG becomes new initial state of
> > > our state machine where we start of (used to be RUN_STATE_PRELAUNCH)  
> > 
> > Oh, I missed that part.
> > 
> > > Lets call it variant 1:
> > > 
> > > with this we have 2 possible transitions:
> > >  RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH (machine_init)
> > > 
> > > and
> > > 
> > >  RUN_STATE_PRECONFIG -> RUN_STATE_INMIGRATE
> > >    ugly but it was the same with RUN_STATE_PRELAUNCH initial transition  
> [...]
> > 
> > Thanks, now variant 1 makes more sense to me.  But I really miss
> > here are very clear and explicit descriptions of what each state
> > really mean, and what are the differences between them.
> > 
> > It looks like the existing description for `prelaunch` isn't
> > accurate:
> > 
> > # @prelaunch: QEMU was started with -S and guest has not started
> > 
> > This is false, as QEMU can be in `prelaunch` state even if -S is
> > not used.
> > 
> > 
> > Also, this is the description you proposed for `preconfig`:
> > 
> > # @preconfig: QEMU is paused before board specific init callback is executed.
> > #             The state is reachable only if -preconfig CLI option is used.
> > #             (Since 2.12)
> > 
> > This seems wrong as well: the `prelaunch` state is reachable even
> > if `-preconfig` isn't used in the command-line (because it is the
> > initial state).
> I'm not sure we should describe transitions/initial state here
> (it's not the case now).
> 
> I think descriptions 'almost' match 'end' result of what QMP client
> cloud see and the initial state is not something that QMP user could
> observe.

That's my impression as well: `prelaunch` should be visible
externally only if `-S` was used, and `preconfig` should be
visible externally only if `-preconfig` was used.

However, I miss documentation on what are the
expectations/requirements internally.  e.g. there's no
explanation why a reset request moves QEMU to
RUN_STATE_PRELAUNCH.  Ideally, each line in
runstate_transition_def would have a clear explanation for
when/why each transition happens.

But this isn't a requirement for the new feature you are
implementing, anyway.
Igor Mammedov April 4, 2018, 8:51 a.m. UTC | #17
On Tue, 3 Apr 2018 12:31:29 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Tue, Apr 03, 2018 at 04:32:53PM +0200, Igor Mammedov wrote:
> > On Thu, 29 Mar 2018 13:24:09 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> >   
> > > On Thu, Mar 29, 2018 at 01:43:03PM +0200, Igor Mammedov wrote:  
> > > > On Wed, 28 Mar 2018 16:21:48 -0300
> > > > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >     
> > > > > On Wed, Mar 28, 2018 at 01:48:35PM +0200, Igor Mammedov wrote:    
> > > > > > On Tue, 27 Mar 2018 17:05:41 +0200
> > > > > > Igor Mammedov <imammedo@redhat.com> wrote:
> > > > > >       
> > > > > > > On Fri, 23 Mar 2018 18:25:08 -0300
> > > > > > > Eduardo Habkost <ehabkost@redhat.com> wrote:  
> > [...]  
> > > > > This doesn't make sense to me.  Why would we enter
> > > > > RUN_STATE_PRECONFIG state if -preconfig is not used at all?    
> > > > because of RUN_STATE_PRECONFIG becomes new initial state of
> > > > our state machine where we start of (used to be RUN_STATE_PRELAUNCH)    
> > > 
> > > Oh, I missed that part.
> > >   
> > > > Lets call it variant 1:
> > > > 
> > > > with this we have 2 possible transitions:
> > > >  RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH (machine_init)
> > > > 
> > > > and
> > > > 
> > > >  RUN_STATE_PRECONFIG -> RUN_STATE_INMIGRATE
> > > >    ugly but it was the same with RUN_STATE_PRELAUNCH initial transition    
> > [...]  
> > > 
> > > Thanks, now variant 1 makes more sense to me.  But I really miss
> > > here are very clear and explicit descriptions of what each state
> > > really mean, and what are the differences between them.
> > > 
> > > It looks like the existing description for `prelaunch` isn't
> > > accurate:
> > > 
> > > # @prelaunch: QEMU was started with -S and guest has not started
> > > 
> > > This is false, as QEMU can be in `prelaunch` state even if -S is
> > > not used.
> > > 
> > > 
> > > Also, this is the description you proposed for `preconfig`:
> > > 
> > > # @preconfig: QEMU is paused before board specific init callback is executed.
> > > #             The state is reachable only if -preconfig CLI option is used.
> > > #             (Since 2.12)
> > > 
> > > This seems wrong as well: the `prelaunch` state is reachable even
> > > if `-preconfig` isn't used in the command-line (because it is the
> > > initial state).  
> > I'm not sure we should describe transitions/initial state here
> > (it's not the case now).
> > 
> > I think descriptions 'almost' match 'end' result of what QMP client
> > cloud see and the initial state is not something that QMP user could
> > observe.  
> 
> That's my impression as well: `prelaunch` should be visible
> externally only if `-S` was used, and `preconfig` should be
> visible externally only if `-preconfig` was used.
> 
> However, I miss documentation on what are the
> expectations/requirements internally.  e.g. there's no
> explanation why a reset request moves QEMU to
> RUN_STATE_PRELAUNCH.  Ideally, each line in
> runstate_transition_def would have a clear explanation for
> when/why each transition happens.
> 
> But this isn't a requirement for the new feature you are
> implementing, anyway.
I'll add TODO comment to inmigrate to transition I'm adding,
to tag it for future work.

+    /* Early switch to inmigrate state to allow
+     * -incoming CLI option work as it used to.
+     * TODO: delay actual switching to inmigrate
+     * state to the point after machine is built
+     * and remove this hack.
+     */
+    { RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE },
Dr. David Alan Gilbert April 30, 2018, 7:12 p.m. UTC | #18
* Eduardo Habkost (ehabkost@redhat.com) wrote:
> On Tue, Apr 03, 2018 at 03:49:07PM +0200, Igor Mammedov wrote:
> > On Thu, 29 Mar 2018 13:57:54 -0300
> > Eduardo Habkost <ehabkost@redhat.com> wrote:
> > 
> > [...]
> > > > As for the future, I agree it would be much more flexible
> > > > to allow both -preconfig and -incoming at the same time,
> > > > so we could start target with empty CLI, and then feed it
> > > > options from source. It would require audit/refactoring of
> > > > INMIGRATE state and making 'all' current CLI options
> > > > available via QMP interface.
> > > > 
> > > > But for now I'd prefer to keep using old way to start target.  
> > > 
> > > Well, if management software developers tell us that -preconfig
> > > will be already useful without -incoming support, I won't object.
> > As Peter said, mgmt can/will use -preconfig even without -incoming,
> > since it lets deal with initial (source) start-up problem (i.e.
> > avoid restarting QEMU) and lets us make numa configuration work
> > in qemu/libvirt stack without guess work. (that's the end goal of
> > the series)
> > 
> > > But it would be very nice for management software if they can
> > > simply assume that -preconfig and -incoming will work together
> > > since the first version.  Can we have this as a goal for 2.13?
> > I can't promise to refactor -incoming in near future, as I'm working
> > on ARM cpu-hotplug currently and next in queue is ARM memory hotplug.
> > 
> > Peter suggested an alternative idea, to abandon -incoming and
> > enable incoming migration from QMP after all configuration is done.
> > Amount of refactoring need probably would be the same but at least
> > interface and runstate transitions wise it looks cleaner.
> 
> Also, support for the "start incoming migration" QMP command
> would be very easy to probe using existing mechanisms.  Sounds
> good to me.

You could stick to -incoming defer,   that already says that you want
to do an inward migration but doesn't say where from until later.

Note there are some other bits of code scattered around for
checking whether we're in -incoming mode, for example to stop you
do an outwards migration while waiting for an incoming one, or
to stop you doing a migrate-incoming while you're already waiting
for one.

Dave

> -- 
> Eduardo
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff mbox series

Patch

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 356bfdc..996bc38 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -66,6 +66,7 @@  typedef enum WakeupReason {
     QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+void qemu_exit_preconfig_request(void);
 void qemu_system_reset_request(ShutdownCause reason);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
diff --git a/qapi/run-state.json b/qapi/run-state.json
index 1c9fff3..ce846a5 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -49,12 +49,15 @@ 
 # @colo: guest is paused to save/restore VM state under colo checkpoint,
 #        VM can not get into this state unless colo capability is enabled
 #        for migration. (since 2.8)
+# @preconfig: QEMU is paused before board specific init callback is executed.
+#             The state is reachable only if -preconfig CLI option is used.
+#             (Since 2.12)
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
             'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-            'guest-panicked', 'colo' ] }
+            'guest-panicked', 'colo', 'preconfig' ] }
 
 ##
 # @StatusInfo:
diff --git a/qemu-options.hx b/qemu-options.hx
index 6585058..7c8aaa5 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3302,6 +3302,19 @@  STEXI
 Run the emulation in single step mode.
 ETEXI
 
+DEF("preconfig", 0, QEMU_OPTION_preconfig, \
+    "-preconfig      pause QEMU before machine is initialized\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -preconfig
+@findex -preconfig
+Pause QEMU for interactive configuration before the machine is created,
+which allows querying and configuring properties that will affect
+machine initialization. Use the QMP command 'cont' to exit the preconfig
+state and move to the next state (ie. run guest if -S isn't used or
+pause the second time is -S is used).
+ETEXI
+
 DEF("S", 0, QEMU_OPTION_S, \
     "-S              freeze CPU at startup (use 'c' to start execution)\n",
     QEMU_ARCH_ALL)
diff --git a/qmp.c b/qmp.c
index 8c7d1cc..b38090d 100644
--- a/qmp.c
+++ b/qmp.c
@@ -166,6 +166,11 @@  void qmp_cont(Error **errp)
     BlockBackend *blk;
     Error *local_err = NULL;
 
+    if (runstate_check(RUN_STATE_PRECONFIG)) {
+        qemu_exit_preconfig_request();
+        return;
+    }
+
     /* if there is a dump in background, we should wait until the dump
      * finished */
     if (dump_in_progress()) {
diff --git a/vl.c b/vl.c
index 3ef04ce..69b1997 100644
--- a/vl.c
+++ b/vl.c
@@ -593,7 +593,7 @@  static int default_driver_check(void *opaque, QemuOpts *opts, Error **errp)
 /***********************************************************/
 /* QEMU state */
 
-static RunState current_run_state = RUN_STATE_PRELAUNCH;
+static RunState current_run_state = RUN_STATE_PRECONFIG;
 
 /* We use RUN_STATE__MAX but any invalid value will do */
 static RunState vmstop_requested = RUN_STATE__MAX;
@@ -606,6 +606,9 @@  typedef struct {
 
 static const RunStateTransition runstate_transitions_def[] = {
     /*     from      ->     to      */
+    { RUN_STATE_PRECONFIG, RUN_STATE_PRELAUNCH },
+    { RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE },
+
     { RUN_STATE_DEBUG, RUN_STATE_RUNNING },
     { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH },
@@ -1629,6 +1632,7 @@  static pid_t shutdown_pid;
 static int powerdown_requested;
 static int debug_requested;
 static int suspend_requested;
+static bool preconfig_exit_requested = true;
 static WakeupReason wakeup_reason;
 static NotifierList powerdown_notifiers =
     NOTIFIER_LIST_INITIALIZER(powerdown_notifiers);
@@ -1713,6 +1717,11 @@  static int qemu_debug_requested(void)
     return r;
 }
 
+void qemu_exit_preconfig_request(void)
+{
+    preconfig_exit_requested = true;
+}
+
 /*
  * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE.
  */
@@ -1886,6 +1895,13 @@  static bool main_loop_should_exit(void)
     RunState r;
     ShutdownCause request;
 
+    if (preconfig_exit_requested) {
+        if (runstate_check(RUN_STATE_PRECONFIG)) {
+            runstate_set(RUN_STATE_PRELAUNCH);
+        }
+        preconfig_exit_requested = false;
+        return true;
+    }
     if (qemu_debug_requested()) {
         vm_stop(RUN_STATE_DEBUG);
     }
@@ -3697,6 +3713,14 @@  int main(int argc, char **argv, char **envp)
                     exit(1);
                 }
                 break;
+            case QEMU_OPTION_preconfig:
+                if (runstate_check(RUN_STATE_INMIGRATE)) {
+                    error_report("option can not be used with "
+                                 "-incoming option");
+                    exit(EXIT_FAILURE);
+                }
+                preconfig_exit_requested = false;
+                break;
             case QEMU_OPTION_enable_kvm:
                 olist = qemu_find_opts("machine");
                 qemu_opts_parse_noisily(olist, "accel=kvm", false);
@@ -3902,6 +3926,11 @@  int main(int argc, char **argv, char **envp)
                 }
                 break;
             case QEMU_OPTION_incoming:
+                if (!preconfig_exit_requested) {
+                    error_report("option can not be used with "
+                                 "-preconfig option");
+                    exit(EXIT_FAILURE);
+                }
                 if (!incoming) {
                     runstate_set(RUN_STATE_INMIGRATE);
                 }
@@ -4594,6 +4623,10 @@  int main(int argc, char **argv, char **envp)
     }
     parse_numa_opts(current_machine);
 
+    /* do monitor/qmp handling at preconfig state if requested */
+    main_loop();
+
+    /* from here on runstate is RUN_STATE_PRELAUNCH */
     machine_run_board_init(current_machine);
 
     realtime_init();