diff mbox

[2/4] vl: move global property, migrate init earlier

Message ID 1499049848-18012-3-git-send-email-peterx@redhat.com
State New
Headers show

Commit Message

Peter Xu July 3, 2017, 2:44 a.m. UTC
Currently drive_init_func() may call migrate_get_current() while the
migrate object is still not ready yet at that time. Move the migration
object init earlier, along with the global properties, right after
acceleration init.

This fixes a breakage for iotest 055, which caused an assertion failure.

Reported-by: Max Reitz <mreitz@redhat.com>
Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Fixes: 3df663 ("migration: move only_migratable to MigrationState")
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 vl.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Comments

Eduardo Habkost July 3, 2017, 2:59 p.m. UTC | #1
On Mon, Jul 03, 2017 at 10:44:06AM +0800, Peter Xu wrote:
> Currently drive_init_func() may call migrate_get_current() while the
> migrate object is still not ready yet at that time. Move the migration
> object init earlier, along with the global properties, right after
> acceleration init.
> 
> This fixes a breakage for iotest 055, which caused an assertion failure.
> 
> Reported-by: Max Reitz <mreitz@redhat.com>
> Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Fixes: 3df663 ("migration: move only_migratable to MigrationState")
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  vl.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index 0c497a3..2ae4313 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4414,6 +4414,18 @@ int main(int argc, char **argv, char **envp)
>  
>      configure_accelerator(current_machine);
>  
> +    /*
> +     * Register all the global properties, including accel properties,
> +     * machine properties, and user-specified ones.
> +     */
> +    register_global_properties(current_machine);
> +
> +    /*
> +     * Migration object can only be created after global properties
> +     * are applied correctly.
> +     */
> +    migration_object_init();
> +

So, things that might introduce bugs here are:
1) Unexpected qdev_prop_register_global() calls between this place and
   the original register_global_properties() call (that would now happen
   in a different order).
2) register_global_properties() seeing a different global property list
   because it is being called earlier.
   2.1) AccelClass::global_props is statically defined and will be the
        same here. Not a problem.
   2.2) MachineClass::compat_props: same as above.
   2.3) user-provided global properties: we need to ensure all
        properties in the "global" QemuOpts section are already
        available at this point.


To ensure (1) is not a problem, we need to check all calls for
qdev_prop_register_global().  The callers are:

* configure_rtc()
  - Called very early, when parsing command-line options.  Not a problem.[1]
* global_init_func()
  * Called by user_register_global_props()
    * Called by register_global_properties().
      - That's the code we're moving.  Not a problem.
* QEMU_OPTION_rtc_td_hack case in main()
  - Called very early, when parsing command-line options.  Not a problem.[1]
* QEMU_OPTION_no_kvm_pit_reinjection case in main()
  - Called very early, when parsing command-line options.  Not a problem.[1]
* register_compat_prop()
  * Called by machine_register_compat_props()
    * Called by register_global_properties().
      - That's the code we're moving.  Not a problem.
  * Called by machine_register_compat_for_subclass()
    * Called by machine_register_compat_props() (see above)
  * Called by register_compat_props_array()
    * Called by accel_register_compat_props()
      * Called by register_global_properties().
        - That's the code we're moving.  Not a problem.
* qdev_prop_register_global_list()
  - Used only by unit test code.
* cpu_common_parse_features()
  - Used when initializing CPUs, which is done much later, when
    machine_run_board_init() is called (or when -device is handled by
    device_init_func()).  Not a problem.[2]
* x86_cpu_parse_featurestr()
  - Same as above.


To ensure (2.3) is not a problem, we need to look for references to
qemu_find_opts("global") or qemu_global_opts.  They are:

* user_register_global_props()
  - This is the code we're moving.  Not a problem.
* default_driver_check() call at main()
  - This happens earlier, before the code we're moving.  Not a problem.
* qemu_global_option()
  - Called very early, when parsing command-line options.  Not a problem.


So the code reordering looks OK.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>


Notes about things we need to fix in the future:

[1] I think they should be replaced by qemu_global_option() calls, though.
[2] This is a fragile portion of the global property code and should be
    eventually moved to the command-line parsing section of main().


>      if (qtest_chrdev) {
>          qtest_init(qtest_chrdev, qtest_log, &error_fatal);
>      }
> @@ -4595,18 +4607,6 @@ int main(int argc, char **argv, char **envp)
>              exit (i == 1 ? 1 : 0);
>      }
>  
> -    /*
> -     * Register all the global properties, including accel properties,
> -     * machine properties, and user-specified ones.
> -     */
> -    register_global_properties(current_machine);
> -
> -    /*
> -     * Migration object can only be created after global properties
> -     * are applied correctly.
> -     */
> -    migration_object_init();
> -
>      /* This checkpoint is required by replay to separate prior clock
>         reading from the other reads, because timer polling functions query
>         clock values from the log. */
> -- 
> 2.7.4
>
Peter Xu July 4, 2017, 1:43 a.m. UTC | #2
On Mon, Jul 03, 2017 at 11:59:03AM -0300, Eduardo Habkost wrote:
> On Mon, Jul 03, 2017 at 10:44:06AM +0800, Peter Xu wrote:
> > Currently drive_init_func() may call migrate_get_current() while the
> > migrate object is still not ready yet at that time. Move the migration
> > object init earlier, along with the global properties, right after
> > acceleration init.
> > 
> > This fixes a breakage for iotest 055, which caused an assertion failure.
> > 
> > Reported-by: Max Reitz <mreitz@redhat.com>
> > Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> > Fixes: 3df663 ("migration: move only_migratable to MigrationState")
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  vl.c | 24 ++++++++++++------------
> >  1 file changed, 12 insertions(+), 12 deletions(-)
> > 
> > diff --git a/vl.c b/vl.c
> > index 0c497a3..2ae4313 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -4414,6 +4414,18 @@ int main(int argc, char **argv, char **envp)
> >  
> >      configure_accelerator(current_machine);
> >  
> > +    /*
> > +     * Register all the global properties, including accel properties,
> > +     * machine properties, and user-specified ones.
> > +     */
> > +    register_global_properties(current_machine);
> > +
> > +    /*
> > +     * Migration object can only be created after global properties
> > +     * are applied correctly.
> > +     */
> > +    migration_object_init();
> > +
> 
> So, things that might introduce bugs here are:
> 1) Unexpected qdev_prop_register_global() calls between this place and
>    the original register_global_properties() call (that would now happen
>    in a different order).
> 2) register_global_properties() seeing a different global property list
>    because it is being called earlier.
>    2.1) AccelClass::global_props is statically defined and will be the
>         same here. Not a problem.
>    2.2) MachineClass::compat_props: same as above.
>    2.3) user-provided global properties: we need to ensure all
>         properties in the "global" QemuOpts section are already
>         available at this point.
> 
> 
> To ensure (1) is not a problem, we need to check all calls for
> qdev_prop_register_global().  The callers are:
> 
> * configure_rtc()
>   - Called very early, when parsing command-line options.  Not a problem.[1]
> * global_init_func()
>   * Called by user_register_global_props()
>     * Called by register_global_properties().
>       - That's the code we're moving.  Not a problem.
> * QEMU_OPTION_rtc_td_hack case in main()
>   - Called very early, when parsing command-line options.  Not a problem.[1]
> * QEMU_OPTION_no_kvm_pit_reinjection case in main()
>   - Called very early, when parsing command-line options.  Not a problem.[1]
> * register_compat_prop()
>   * Called by machine_register_compat_props()
>     * Called by register_global_properties().
>       - That's the code we're moving.  Not a problem.
>   * Called by machine_register_compat_for_subclass()
>     * Called by machine_register_compat_props() (see above)
>   * Called by register_compat_props_array()
>     * Called by accel_register_compat_props()
>       * Called by register_global_properties().
>         - That's the code we're moving.  Not a problem.
> * qdev_prop_register_global_list()
>   - Used only by unit test code.
> * cpu_common_parse_features()
>   - Used when initializing CPUs, which is done much later, when
>     machine_run_board_init() is called (or when -device is handled by
>     device_init_func()).  Not a problem.[2]
> * x86_cpu_parse_featurestr()
>   - Same as above.
> 
> 
> To ensure (2.3) is not a problem, we need to look for references to
> qemu_find_opts("global") or qemu_global_opts.  They are:
> 
> * user_register_global_props()
>   - This is the code we're moving.  Not a problem.
> * default_driver_check() call at main()
>   - This happens earlier, before the code we're moving.  Not a problem.
> * qemu_global_option()
>   - Called very early, when parsing command-line options.  Not a problem.
> 
> 
> So the code reordering looks OK.
> 
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

I really appreciate for such an in-depth review on this patch.

> 
> 
> Notes about things we need to fix in the future:
> 
> [1] I think they should be replaced by qemu_global_option() calls, though.
> [2] This is a fragile portion of the global property code and should be
>     eventually moved to the command-line parsing section of main().

Agree on both.

Thanks again!
Markus Armbruster July 4, 2017, 8:12 a.m. UTC | #3
Eduardo Habkost <ehabkost@redhat.com> writes:

> On Mon, Jul 03, 2017 at 10:44:06AM +0800, Peter Xu wrote:
>> Currently drive_init_func() may call migrate_get_current() while the
>> migrate object is still not ready yet at that time. Move the migration
>> object init earlier, along with the global properties, right after
>> acceleration init.
>> 
>> This fixes a breakage for iotest 055, which caused an assertion failure.
>> 
>> Reported-by: Max Reitz <mreitz@redhat.com>
>> Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> Fixes: 3df663 ("migration: move only_migratable to MigrationState")
>> Signed-off-by: Peter Xu <peterx@redhat.com>
>> ---
>>  vl.c | 24 ++++++++++++------------
>>  1 file changed, 12 insertions(+), 12 deletions(-)
>> 
>> diff --git a/vl.c b/vl.c
>> index 0c497a3..2ae4313 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4414,6 +4414,18 @@ int main(int argc, char **argv, char **envp)
>>  
>>      configure_accelerator(current_machine);
>>  
>> +    /*
>> +     * Register all the global properties, including accel properties,
>> +     * machine properties, and user-specified ones.
>> +     */
>> +    register_global_properties(current_machine);
>> +
>> +    /*
>> +     * Migration object can only be created after global properties
>> +     * are applied correctly.
>> +     */
>> +    migration_object_init();
>> +
>
> So, things that might introduce bugs here are:
> 1) Unexpected qdev_prop_register_global() calls between this place and
>    the original register_global_properties() call (that would now happen
>    in a different order).
> 2) register_global_properties() seeing a different global property list
>    because it is being called earlier.
>    2.1) AccelClass::global_props is statically defined and will be the
>         same here. Not a problem.
>    2.2) MachineClass::compat_props: same as above.
>    2.3) user-provided global properties: we need to ensure all
>         properties in the "global" QemuOpts section are already
>         available at this point.

This stuff is fragile.  There are lots of dependencies, all implicit,
and once again we reorder things ever so slightly to keep them
satisfied.  We could certainly use a more robust solution.

[...]
Eduardo Habkost July 5, 2017, 2:10 p.m. UTC | #4
On Tue, Jul 04, 2017 at 10:12:45AM +0200, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > On Mon, Jul 03, 2017 at 10:44:06AM +0800, Peter Xu wrote:
> >> Currently drive_init_func() may call migrate_get_current() while the
> >> migrate object is still not ready yet at that time. Move the migration
> >> object init earlier, along with the global properties, right after
> >> acceleration init.
> >> 
> >> This fixes a breakage for iotest 055, which caused an assertion failure.
> >> 
> >> Reported-by: Max Reitz <mreitz@redhat.com>
> >> Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> >> Fixes: 3df663 ("migration: move only_migratable to MigrationState")
> >> Signed-off-by: Peter Xu <peterx@redhat.com>
> >> ---
> >>  vl.c | 24 ++++++++++++------------
> >>  1 file changed, 12 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/vl.c b/vl.c
> >> index 0c497a3..2ae4313 100644
> >> --- a/vl.c
> >> +++ b/vl.c
> >> @@ -4414,6 +4414,18 @@ int main(int argc, char **argv, char **envp)
> >>  
> >>      configure_accelerator(current_machine);
> >>  
> >> +    /*
> >> +     * Register all the global properties, including accel properties,
> >> +     * machine properties, and user-specified ones.
> >> +     */
> >> +    register_global_properties(current_machine);
> >> +
> >> +    /*
> >> +     * Migration object can only be created after global properties
> >> +     * are applied correctly.
> >> +     */
> >> +    migration_object_init();
> >> +
> >
> > So, things that might introduce bugs here are:
> > 1) Unexpected qdev_prop_register_global() calls between this place and
> >    the original register_global_properties() call (that would now happen
> >    in a different order).
> > 2) register_global_properties() seeing a different global property list
> >    because it is being called earlier.
> >    2.1) AccelClass::global_props is statically defined and will be the
> >         same here. Not a problem.
> >    2.2) MachineClass::compat_props: same as above.
> >    2.3) user-provided global properties: we need to ensure all
> >         properties in the "global" QemuOpts section are already
> >         available at this point.
> 
> This stuff is fragile.  There are lots of dependencies, all implicit,
> and once again we reorder things ever so slightly to keep them
> satisfied.  We could certainly use a more robust solution.

Yes, and we are working on it.  Items 2.1-2.3 are already grouped in
register_global_properties() to make ordering rules explicit.  The rest
of the qdev_prop_register_global() calls need to be either replaced with
something else or moved to register_global_properties() too.
diff mbox

Patch

diff --git a/vl.c b/vl.c
index 0c497a3..2ae4313 100644
--- a/vl.c
+++ b/vl.c
@@ -4414,6 +4414,18 @@  int main(int argc, char **argv, char **envp)
 
     configure_accelerator(current_machine);
 
+    /*
+     * Register all the global properties, including accel properties,
+     * machine properties, and user-specified ones.
+     */
+    register_global_properties(current_machine);
+
+    /*
+     * Migration object can only be created after global properties
+     * are applied correctly.
+     */
+    migration_object_init();
+
     if (qtest_chrdev) {
         qtest_init(qtest_chrdev, qtest_log, &error_fatal);
     }
@@ -4595,18 +4607,6 @@  int main(int argc, char **argv, char **envp)
             exit (i == 1 ? 1 : 0);
     }
 
-    /*
-     * Register all the global properties, including accel properties,
-     * machine properties, and user-specified ones.
-     */
-    register_global_properties(current_machine);
-
-    /*
-     * Migration object can only be created after global properties
-     * are applied correctly.
-     */
-    migration_object_init();
-
     /* This checkpoint is required by replay to separate prior clock
        reading from the other reads, because timer polling functions query
        clock values from the log. */