Message ID | 1462433579-13691-3-git-send-email-liang.z.li@intel.com
---|---
State | New
On (Thu) 05 May 2016 [15:32:52], Liang Li wrote:
> At the end of live migration and before vm_start() on the destination
> side, we should make sure all the decompression tasks are finished, if
> this can not be guaranteed, the VM may get the incorrect memory data,
> or the updated memory may be overwritten by the decompression thread.
> Add the code to fix this potential issue.
>
> Suggested-by: David Alan Gilbert <dgilbert@redhat.com>
> Suggested-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> ---
>  migration/ram.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 7ab6ab5..8a59a08 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2220,6 +2220,24 @@ static void *do_data_decompress(void *opaque)
>      return NULL;
>  }
>
> +static void wait_for_decompress_done(void)
> +{
> +    int idx, thread_count;
> +
> +    if (!migrate_use_compression()) {
> +        return;
> +    }
> +
> +    thread_count = migrate_decompress_threads();
> +    qemu_mutex_lock(&decomp_done_lock);
> +    for (idx = 0; idx < thread_count; idx++) {
> +        while (!decomp_param[idx].done) {
> +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> +        }
> +    }
> +    qemu_mutex_unlock(&decomp_done_lock);

Not sure how this works: in the previous patch, done is set to false under
the decomp_done_lock.  Here, we take the lock, and wait for done to turn
true.  That can't happen because this thread holds the lock.

My reading is this is going to lead to a deadlock.  What am I missing?

		Amit
> Subject: Re: [PATCH v2 2/9] migration: Fix a potential issue
>
> On (Thu) 05 May 2016 [15:32:52], Liang Li wrote:
> > At the end of live migration and before vm_start() on the destination
> > side, we should make sure all the decompression tasks are finished, if
> > this can not be guaranteed, the VM may get the incorrect memory data,
> > or the updated memory may be overwritten by the decompression thread.
> > Add the code to fix this potential issue.
> >
> > Suggested-by: David Alan Gilbert <dgilbert@redhat.com>
> > Suggested-by: Juan Quintela <quintela@redhat.com>
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > ---
> >  migration/ram.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 7ab6ab5..8a59a08 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2220,6 +2220,24 @@ static void *do_data_decompress(void *opaque)
> >      return NULL;
> >  }
> >
> > +static void wait_for_decompress_done(void)
> > +{
> > +    int idx, thread_count;
> > +
> > +    if (!migrate_use_compression()) {
> > +        return;
> > +    }
> > +
> > +    thread_count = migrate_decompress_threads();
> > +    qemu_mutex_lock(&decomp_done_lock);
> > +    for (idx = 0; idx < thread_count; idx++) {
> > +        while (!decomp_param[idx].done) {
> > +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&decomp_done_lock);
>
> Not sure how this works: in the previous patch, done is set to false under
> the decomp_done_lock.  Here, we take the lock, and wait for done to turn
> true.  That can't happen because this thread holds the lock.
>
> My reading is this is going to lead to a deadlock.  What am I missing?
>
> Amit

This is the typical usage of a QemuCond: inside qemu_cond_wait(),
decomp_done_lock is unlocked first and then locked again before
qemu_cond_wait() returns, so the deadlock won't happen.

Liang
On (Fri) 10 Jun 2016 [15:03:15], Li, Liang Z wrote:
> > Subject: Re: [PATCH v2 2/9] migration: Fix a potential issue
> >
> > On (Thu) 05 May 2016 [15:32:52], Liang Li wrote:
> > > At the end of live migration and before vm_start() on the destination
> > > side, we should make sure all the decompression tasks are finished, if
> > > this can not be guaranteed, the VM may get the incorrect memory data,
> > > or the updated memory may be overwritten by the decompression thread.
> > > Add the code to fix this potential issue.
> > >
> > > Suggested-by: David Alan Gilbert <dgilbert@redhat.com>
> > > Suggested-by: Juan Quintela <quintela@redhat.com>
> > > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > > ---
> > >  migration/ram.c | 19 +++++++++++++++++++
> > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 7ab6ab5..8a59a08 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2220,6 +2220,24 @@ static void *do_data_decompress(void *opaque)
> > >      return NULL;
> > >  }
> > >
> > > +static void wait_for_decompress_done(void)
> > > +{
> > > +    int idx, thread_count;
> > > +
> > > +    if (!migrate_use_compression()) {
> > > +        return;
> > > +    }
> > > +
> > > +    thread_count = migrate_decompress_threads();
> > > +    qemu_mutex_lock(&decomp_done_lock);
> > > +    for (idx = 0; idx < thread_count; idx++) {
> > > +        while (!decomp_param[idx].done) {
> > > +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > > +        }
> > > +    }
> > > +    qemu_mutex_unlock(&decomp_done_lock);
> >
> > Not sure how this works: in the previous patch, done is set to false
> > under the decomp_done_lock.  Here, we take the lock, and wait for done
> > to turn true.  That can't happen because this thread holds the lock.
> >
> > My reading is this is going to lead to a deadlock.  What am I missing?
>
> This is the typical usage of a QemuCond: inside qemu_cond_wait(),
> decomp_done_lock is unlocked first and then locked again before
> qemu_cond_wait() returns, so the deadlock won't happen.

In qemu-thread-posix.c, I don't see such unlock/lock.

		Amit
> > > > +static void wait_for_decompress_done(void)
> > > > +{
> > > > +    int idx, thread_count;
> > > > +
> > > > +    if (!migrate_use_compression()) {
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    thread_count = migrate_decompress_threads();
> > > > +    qemu_mutex_lock(&decomp_done_lock);
> > > > +    for (idx = 0; idx < thread_count; idx++) {
> > > > +        while (!decomp_param[idx].done) {
> > > > +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > > > +        }
> > > > +    }
> > > > +    qemu_mutex_unlock(&decomp_done_lock);
> > >
> > > Not sure how this works: in the previous patch, done is set to false
> > > under the decomp_done_lock.  Here, we take the lock, and wait for done
> > > to turn true.  That can't happen because this thread holds the lock.
> > >
> > > My reading is this is going to lead to a deadlock.  What am I missing?
> >
> > This is the typical usage of a QemuCond: inside qemu_cond_wait(),
> > decomp_done_lock is unlocked first and then locked again before
> > qemu_cond_wait() returns, so the deadlock won't happen.
>
> In qemu-thread-posix.c, I don't see such unlock/lock.
>
> Amit

I mean in the pthread_cond_wait() which is called by qemu_cond_wait().

Liang
On (Mon) 13 Jun 2016 [05:07:39], Li, Liang Z wrote:
> > > > > +static void wait_for_decompress_done(void)
> > > > > +{
> > > > > +    int idx, thread_count;
> > > > > +
> > > > > +    if (!migrate_use_compression()) {
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    thread_count = migrate_decompress_threads();
> > > > > +    qemu_mutex_lock(&decomp_done_lock);
> > > > > +    for (idx = 0; idx < thread_count; idx++) {
> > > > > +        while (!decomp_param[idx].done) {
> > > > > +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > > > > +        }
> > > > > +    }
> > > > > +    qemu_mutex_unlock(&decomp_done_lock);
> > > >
> > > > Not sure how this works: in the previous patch, done is set to false
> > > > under the decomp_done_lock.  Here, we take the lock, and wait for
> > > > done to turn true.  That can't happen because this thread holds the
> > > > lock.
> > > >
> > > > My reading is this is going to lead to a deadlock.  What am I missing?
> > >
> > > This is the typical usage of a QemuCond: inside qemu_cond_wait(),
> > > decomp_done_lock is unlocked first and then locked again before
> > > qemu_cond_wait() returns, so the deadlock won't happen.
> >
> > In qemu-thread-posix.c, I don't see such unlock/lock.
> >
> > Amit
>
> I mean in the pthread_cond_wait() which is called by qemu_cond_wait().

Yes, OK - makes sense now.  Thanks, I'll continue the review.

		Amit
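The point that resolves the thread is that pthread_cond_wait() atomically releases the mutex while the caller sleeps and re-acquires it before returning, so a worker thread can take the "held" lock, flip the predicate, and signal. A minimal self-contained sketch of that pattern in plain pthreads (hypothetical names standing in for decomp_done_lock, decomp_done_cond and the done flag; not QEMU code):

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for decomp_done_lock / decomp_done_cond / done. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool done;

/* Worker: locks the same mutex the waiter "holds".  This only succeeds
 * because pthread_cond_wait() released it while the waiter blocks. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    done = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Waiter: the same shape as wait_for_decompress_done().  Returns 1 once
 * the predicate is observed true; this would deadlock if cond_wait did
 * not drop the lock while sleeping. */
int wait_for_worker(void)
{
    pthread_t tid;

    pthread_mutex_lock(&lock);
    pthread_create(&tid, NULL, worker, NULL);
    while (!done) {
        pthread_cond_wait(&cond, &lock);  /* unlocks, sleeps, relocks */
    }
    pthread_mutex_unlock(&lock);
    pthread_join(tid, NULL);
    return done ? 1 : 0;
}
```

qemu_cond_wait() is a thin wrapper over this call, which is why the loop in the patch is the canonical condition-variable idiom rather than a deadlock.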
diff --git a/migration/ram.c b/migration/ram.c
index 7ab6ab5..8a59a08 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2220,6 +2220,24 @@ static void *do_data_decompress(void *opaque)
     return NULL;
 }
 
+static void wait_for_decompress_done(void)
+{
+    int idx, thread_count;
+
+    if (!migrate_use_compression()) {
+        return;
+    }
+
+    thread_count = migrate_decompress_threads();
+    qemu_mutex_lock(&decomp_done_lock);
+    for (idx = 0; idx < thread_count; idx++) {
+        while (!decomp_param[idx].done) {
+            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
+        }
+    }
+    qemu_mutex_unlock(&decomp_done_lock);
+}
+
 void migrate_decompress_threads_create(void)
 {
     int i, thread_count;
@@ -2554,6 +2572,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
         }
     }
 
+    wait_for_decompress_done();
     rcu_read_unlock();
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);
At the end of live migration and before vm_start() on the destination
side, we should make sure all the decompression tasks are finished; if
this cannot be guaranteed, the VM may get incorrect memory data, or the
updated memory may be overwritten by a decompression thread. Add code
to fix this potential issue.

Suggested-by: David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 migration/ram.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
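The handshake the patch relies on can be sketched outside QEMU: each worker marks its per-thread done flag under the shared lock and signals, while the waiter holds the lock and re-checks every flag, sleeping on the condition variable between checks. A hedged, self-contained model in plain pthreads (all names hypothetical; NTHREADS stands in for migrate_decompress_threads(), done_flags for decomp_param[idx].done):

```c
#include <pthread.h>
#include <stdbool.h>

#define NTHREADS 4  /* stand-in for migrate_decompress_threads() */

/* Hypothetical mirrors of decomp_done_lock/cond and decomp_param[].done. */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool done_flags[NTHREADS];

static void *decompress_worker(void *arg)
{
    int idx = *(int *)arg;
    /* ... a real worker would decompress a page here ... */
    pthread_mutex_lock(&done_lock);
    done_flags[idx] = true;           /* mark this task finished */
    pthread_cond_signal(&done_cond);  /* wake the waiter to recheck */
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Same shape as wait_for_decompress_done(): returns the number of
 * workers observed finished (all of them, once it returns). */
int wait_all_done(void)
{
    pthread_t tids[NTHREADS];
    int ids[NTHREADS];
    int idx, finished = 0;

    for (idx = 0; idx < NTHREADS; idx++) {
        ids[idx] = idx;
        pthread_create(&tids[idx], NULL, decompress_worker, &ids[idx]);
    }

    pthread_mutex_lock(&done_lock);
    for (idx = 0; idx < NTHREADS; idx++) {
        while (!done_flags[idx]) {
            pthread_cond_wait(&done_cond, &done_lock);
        }
        finished++;
    }
    pthread_mutex_unlock(&done_lock);

    for (idx = 0; idx < NTHREADS; idx++) {
        pthread_join(tids[idx], NULL);
    }
    return finished;
}
```

Because every flag update happens under done_lock and signals the condition, the waiter cannot miss a wakeup: either it sees the flag already true, or it is asleep when the signal arrives.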