[PULL,42/57] Page request: Consume pages off the post-copy queue

Message ID 20151112142034.GI2754@work-vm
State New

Commit Message

Dr. David Alan Gilbert Nov. 12, 2015, 2:20 p.m. UTC
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On 12 November 2015 at 13:18, Peter Maydell <peter.maydell@linaro.org> wrote:
> > On 12 November 2015 at 13:08, Dr. David Alan Gilbert
> > <dgilbert@redhat.com> wrote:
> >> OK, can you try a simple migration by hand outside of the test harness;
> >> just something simple like:
> >>
> >> ./bin/qemu-system-x86_64 -M pc -nographic
> >> (qemu) migrate "exec: cat > /dev/null"
> >>
> >> and the same with q35 ?
> >
> > (qemu) migrate "exec: cat > /dev/null"
> > migrate_get_current do init of current_migration 65307
> > unqueue_page 65307
> > 0   qemu-system-x86_64                  0x00000001067c01c3 qemu_mutex_lock + 83
> 
> This turns out to be because migrate_init() is corrupting the
> mutex memory when it does "memset(s, 0, sizeof(*s))". Presumably
> Linux's initialized-mutex is all-zeroes, but OSX's is not.

OK, thanks for finding that; I've just smoke tested the following
patch and will post it properly after I test it more thoroughly in
a couple of hours.

Dave


commit 689d4964442c3ee34a2dac77411a30b96c214288
Author: Dr. David Alan Gilbert <dgilbert@redhat.com>
Date:   Thu Nov 12 14:10:33 2015 +0000

    migration_init: Fix lock initialisation/make it explicit
    
    Peter reported a lock error on MacOS after my a82d593b
    patch.
    
    migrate_get_current does one-time initialisation of
    a bunch of variables.
    migrate_init does reinitialisation even on a 2nd
    migrate after a cancel.
    
    The problem here was that I'd initialised the mutex
    in migrate_get_current, and the memset in migrate_init
    corrupted it.
    
    Remove the memset and replace it by explicit initialisation
    of fields that need initialising; this also turns out to be simpler
    than the old code that had to preserve some fields.
    
    Reported-by: Peter Maydell <peter.maydell@linaro.org>
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Fixes: a82d593b

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
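
A minimal sketch of the pattern the commit message describes, assuming a pthread mutex embedded in a state struct. DemoState, demo_get_current() and demo_init() are hypothetical stand-ins invented for illustration; they are not QEMU code and do not reflect MigrationState's real layout. The point is only that the lock is initialised once, and the per-run reset assigns individual fields instead of zeroing the whole struct, so the mutex storage and the user's settings are never overwritten.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a migration state struct (not QEMU's). */
typedef struct DemoState {
    int64_t bytes_xfer;       /* per-run counter, reset on every run */
    int64_t bandwidth_limit;  /* user setting, must survive a reset */
    bool thread_running;      /* per-run flag, reset on every run */
    pthread_mutex_t lock;     /* initialised once, never overwritten */
} DemoState;

/*
 * One-time setup, analogous in spirit to migrate_get_current().
 * The "once" flag is not thread-safe; this sketch assumes the first
 * call happens before any other thread exists.
 */
static DemoState *demo_get_current(void)
{
    static DemoState state;
    static bool once;

    if (!once) {
        int rc;

        state.bandwidth_limit = 32 << 20;   /* arbitrary demo default */
        rc = pthread_mutex_init(&state.lock, NULL);
        assert(rc == 0);
        (void)rc;                           /* silence NDEBUG builds */
        once = true;
    }
    return &state;
}

/*
 * Per-run reset, analogous in spirit to migrate_init().
 * Deliberately no memset(s, 0, sizeof(*s)): an initialised
 * pthread_mutex_t is not guaranteed to be all-zero bytes, so zeroing
 * the struct could corrupt the lock on some platforms.
 */
static DemoState *demo_init(void)
{
    DemoState *s = demo_get_current();

    s->bytes_xfer = 0;
    s->thread_running = false;
    /* bandwidth_limit and lock are intentionally left alone. */
    return s;
}
```

Calling demo_init() a second time, for example after a cancelled run, resets bytes_xfer and thread_running while leaving bandwidth_limit and the mutex usable. The cost of this approach is that every new per-run field has to be added to the reset function by hand.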

Comments

Juan Quintela Nov. 12, 2015, 3:25 p.m. UTC | #1
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Peter Maydell (peter.maydell@linaro.org) wrote:
>> On 12 November 2015 at 13:18, Peter Maydell <peter.maydell@linaro.org> wrote:
>> > On 12 November 2015 at 13:08, Dr. David Alan Gilbert
>> > <dgilbert@redhat.com> wrote:
>> >> OK, can you try a simple migration by hand outside of the test harness;
>> >> just something simple like:
>> >>
>> >> ./bin/qemu-system-x86_64 -M pc -nographic
>> >> (qemu) migrate "exec: cat > /dev/null"
>> >>
>> >> and the same with q35 ?
>> >
>> > (qemu) migrate "exec: cat > /dev/null"
>> > migrate_get_current do init of current_migration 65307
>> > unqueue_page 65307
>> > 0   qemu-system-x86_64                  0x00000001067c01c3 qemu_mutex_lock + 83
>> 
>> This turns out to be because migrate_init() is corrupting the
>> mutex memory when it does "memset(s, 0, sizeof(*s))". Presumably
>> Linux's initialized-mutex is all-zeroes, but OSX's is not.
>
> OK, thanks for finding that; I've just smoke tested the following
> patch and will post it properly after I test it more thoroughly in
> a couple of hours.

I wrote a patch that was almost identical. It passes virt-test for me.

Later, Juan.

Dr. David Alan Gilbert Nov. 12, 2015, 3:57 p.m. UTC | #2
* Juan Quintela (quintela@redhat.com) wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > * Peter Maydell (peter.maydell@linaro.org) wrote:
> >> On 12 November 2015 at 13:18, Peter Maydell <peter.maydell@linaro.org> wrote:
> >> > On 12 November 2015 at 13:08, Dr. David Alan Gilbert
> >> > <dgilbert@redhat.com> wrote:
> >> >> OK, can you try a simple migration by hand outside of the test harness;
> >> >> just something simple like:
> >> >>
> >> >> ./bin/qemu-system-x86_64 -M pc -nographic
> >> >> (qemu) migrate "exec: cat > /dev/null"
> >> >>
> >> >> and the same with q35 ?
> >> >
> >> > (qemu) migrate "exec: cat > /dev/null"
> >> > migrate_get_current do init of current_migration 65307
> >> > unqueue_page 65307
> >> > 0   qemu-system-x86_64                  0x00000001067c01c3 qemu_mutex_lock + 83
> >> 
> >> This turns out to be because migrate_init() is corrupting the
> >> mutex memory when it does "memset(s, 0, sizeof(*s))". Presumably
> >> Linux's initialized-mutex is all-zeroes, but OSX's is not.
> >
> > OK, thanks for finding that; I've just smoke tested the following
> > patch and will post it properly after I test it more thoroughly in
> > a couple of hours.
> 
> I wrote a patch that was almost identical. It passes virt-test for me.

And the one I posted seems to survive postcopy as well, so it looks good.

Dave

> 
> Later, Juan.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Patch

diff --git a/migration/migration.c b/migration/migration.c
index 9bd2ce7..7e4e27b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -902,38 +902,31 @@  bool migration_in_postcopy(MigrationState *s)
 MigrationState *migrate_init(const MigrationParams *params)
 {
     MigrationState *s = migrate_get_current();
-    int64_t bandwidth_limit = s->bandwidth_limit;
-    bool enabled_capabilities[MIGRATION_CAPABILITY_MAX];
-    int64_t xbzrle_cache_size = s->xbzrle_cache_size;
-    int compress_level = s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL];
-    int compress_thread_count =
-            s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
-    int decompress_thread_count =
-            s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
-    int x_cpu_throttle_initial =
-            s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL];
-    int x_cpu_throttle_increment =
-            s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT];
-
-    memcpy(enabled_capabilities, s->enabled_capabilities,
-           sizeof(enabled_capabilities));
 
-    memset(s, 0, sizeof(*s));
+    /*
+     * Reinitialise all migration state, except
+     * parameters/capabilities that the user set, and
+     * locks.
+     */
+    s->bytes_xfer = 0;
+    s->xfer_limit = 0;
+    s->cleanup_bh = 0;
+    s->file = NULL;
+    s->state = MIGRATION_STATUS_NONE;
     s->params = *params;
-    memcpy(s->enabled_capabilities, enabled_capabilities,
-           sizeof(enabled_capabilities));
-    s->xbzrle_cache_size = xbzrle_cache_size;
-
-    s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
-    s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS] =
-               compress_thread_count;
-    s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
-               decompress_thread_count;
-    s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INITIAL] =
-                x_cpu_throttle_initial;
-    s->parameters[MIGRATION_PARAMETER_X_CPU_THROTTLE_INCREMENT] =
-                x_cpu_throttle_increment;
-    s->bandwidth_limit = bandwidth_limit;
+    s->rp_state.from_dst_file = NULL;
+    s->rp_state.error = false;
+    s->mbps = 0.0;
+    s->downtime = 0;
+    s->expected_downtime = 0;
+    s->dirty_pages_rate = 0;
+    s->dirty_bytes_rate = 0;
+    s->setup_time = 0;
+    s->dirty_sync_count = 0;
+    s->start_postcopy = false;
+    s->migration_thread_running = false;
+    s->last_req_rb = NULL;
+
     migrate_set_state(s, MIGRATION_STATUS_NONE, MIGRATION_STATUS_SETUP);
 
     QSIMPLEQ_INIT(&s->src_page_requests);