[v2,2/9] migration: Fix a potential issue

Message ID 1462433579-13691-3-git-send-email-liang.z.li@intel.com
State New

Commit Message

Li, Liang Z May 5, 2016, 7:32 a.m. UTC
At the end of live migration, before vm_start() on the destination
side, we should make sure all the decompression tasks have finished.
If this is not guaranteed, the VM may see incorrect memory data, or
updated memory may be overwritten by a decompression thread. Add
code to fix this potential issue.

Suggested-by: David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
---
 migration/ram.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Amit Shah June 10, 2016, 1:39 p.m. UTC | #1
On (Thu) 05 May 2016 [15:32:52], Liang Li wrote:
> At the end of live migration, before vm_start() on the destination
> side, we should make sure all the decompression tasks have finished.
> If this is not guaranteed, the VM may see incorrect memory data, or
> updated memory may be overwritten by a decompression thread. Add
> code to fix this potential issue.
> 
> Suggested-by: David Alan Gilbert <dgilbert@redhat.com>
> Suggested-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> ---
>  migration/ram.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 7ab6ab5..8a59a08 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2220,6 +2220,24 @@ static void *do_data_decompress(void *opaque)
>      return NULL;
>  }
>  
> +static void wait_for_decompress_done(void)
> +{
> +    int idx, thread_count;
> +
> +    if (!migrate_use_compression()) {
> +        return;
> +    }
> +
> +    thread_count = migrate_decompress_threads();
> +    qemu_mutex_lock(&decomp_done_lock);
> +    for (idx = 0; idx < thread_count; idx++) {
> +        while (!decomp_param[idx].done) {
> +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> +        }
> +    }
> +    qemu_mutex_unlock(&decomp_done_lock);

Not sure how this works: in the previous patch, done is set to false
under the decomp_done_lock.  Here, we take the lock, and wait for done
to become true.  That can't happen because this thread holds the lock.
My reading is this is going to lead to a deadlock.  What am I missing?


		Amit
Li, Liang Z June 10, 2016, 3:03 p.m. UTC | #2
> Subject: Re: [PATCH v2 2/9] migration: Fix a potential issue
> 
> On (Thu) 05 May 2016 [15:32:52], Liang Li wrote:
> > At the end of live migration, before vm_start() on the destination
> > side, we should make sure all the decompression tasks have finished.
> > If this is not guaranteed, the VM may see incorrect memory data, or
> > updated memory may be overwritten by a decompression thread. Add
> > code to fix this potential issue.
> >
> > Suggested-by: David Alan Gilbert <dgilbert@redhat.com>
> > Suggested-by: Juan Quintela <quintela@redhat.com>
> > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > ---
> >  migration/ram.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 7ab6ab5..8a59a08 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2220,6 +2220,24 @@ static void *do_data_decompress(void *opaque)
> >      return NULL;
> >  }
> >
> > +static void wait_for_decompress_done(void)
> > +{
> > +    int idx, thread_count;
> > +
> > +    if (!migrate_use_compression()) {
> > +        return;
> > +    }
> > +
> > +    thread_count = migrate_decompress_threads();
> > +    qemu_mutex_lock(&decomp_done_lock);
> > +    for (idx = 0; idx < thread_count; idx++) {
> > +        while (!decomp_param[idx].done) {
> > +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > +        }
> > +    }
> > +    qemu_mutex_unlock(&decomp_done_lock);
> 
> Not sure how this works: in the previous patch, done is set to false under the
> decomp_done_lock.  Here, we take the lock, and wait for done to become true.
> That can't happen because this thread holds the lock.
> My reading is this is going to lead to a deadlock.  What am I missing?
> 
> 
> 		Amit

This is the typical usage of a QemuCond: inside qemu_cond_wait(),
decomp_done_lock is unlocked first and then locked again before
qemu_cond_wait() returns.  So a deadlock won't happen.
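
For illustration, the signalling side does roughly this under the same
lock (a sketch; the exact code is in the previous patch of this series):

    /* In do_data_decompress(), after finishing one page:
     * mark this thread idle and wake up any waiter. */
    qemu_mutex_lock(&decomp_done_lock);
    param->done = true;
    qemu_cond_signal(&decomp_done_cond);
    qemu_mutex_unlock(&decomp_done_lock);

The waiter in wait_for_decompress_done() can observe done changing
because qemu_cond_wait() has dropped decomp_done_lock while sleeping.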

Liang
Amit Shah June 13, 2016, 4:36 a.m. UTC | #3
On (Fri) 10 Jun 2016 [15:03:15], Li, Liang Z wrote:
> > Subject: Re: [PATCH v2 2/9] migration: Fix a potential issue
> > 
> > On (Thu) 05 May 2016 [15:32:52], Liang Li wrote:
> > > At the end of live migration, before vm_start() on the destination
> > > side, we should make sure all the decompression tasks have finished.
> > > If this is not guaranteed, the VM may see incorrect memory data, or
> > > updated memory may be overwritten by a decompression thread. Add
> > > code to fix this potential issue.
> > >
> > > Suggested-by: David Alan Gilbert <dgilbert@redhat.com>
> > > Suggested-by: Juan Quintela <quintela@redhat.com>
> > > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > > ---
> > >  migration/ram.c | 19 +++++++++++++++++++
> > >  1 file changed, 19 insertions(+)
> > >
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 7ab6ab5..8a59a08 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2220,6 +2220,24 @@ static void *do_data_decompress(void *opaque)
> > >      return NULL;
> > >  }
> > >
> > > +static void wait_for_decompress_done(void)
> > > +{
> > > +    int idx, thread_count;
> > > +
> > > +    if (!migrate_use_compression()) {
> > > +        return;
> > > +    }
> > > +
> > > +    thread_count = migrate_decompress_threads();
> > > +    qemu_mutex_lock(&decomp_done_lock);
> > > +    for (idx = 0; idx < thread_count; idx++) {
> > > +        while (!decomp_param[idx].done) {
> > > +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > > +        }
> > > +    }
> > > +    qemu_mutex_unlock(&decomp_done_lock);
> > 
> > Not sure how this works: in the previous patch, done is set to false under the
> > decomp_done_lock.  Here, we take the lock, and wait for done to become true.
> > That can't happen because this thread holds the lock.
> > My reading is this is going to lead to a deadlock.  What am I missing?
> > 
> 
> This is the typical usage of a QemuCond: inside qemu_cond_wait(),
> decomp_done_lock is unlocked first and then locked again before
> qemu_cond_wait() returns.  So a deadlock won't happen.

In qemu-thread-posix.c, I don't see such unlock/lock.


		Amit
Li, Liang Z June 13, 2016, 5:07 a.m. UTC | #4
> > > > +static void wait_for_decompress_done(void)
> > > > +{
> > > > +    int idx, thread_count;
> > > > +
> > > > +    if (!migrate_use_compression()) {
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    thread_count = migrate_decompress_threads();
> > > > +    qemu_mutex_lock(&decomp_done_lock);
> > > > +    for (idx = 0; idx < thread_count; idx++) {
> > > > +        while (!decomp_param[idx].done) {
> > > > +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > > > +        }
> > > > +    }
> > > > +    qemu_mutex_unlock(&decomp_done_lock);
> > >
> > > Not sure how this works: in the previous patch, done is set to false
> > > under the decomp_done_lock.  Here, we take the lock, and wait for done
> > > to become true.
> > > That can't happen because this thread holds the lock.
> > > My reading is this is going to lead to a deadlock.  What am I missing?
> > >
> >
> > This is the typical usage of a QemuCond: inside qemu_cond_wait(),
> > decomp_done_lock is unlocked at first and then locked again before
> > qemu_cond_wait() returns.  So a deadlock won't happen.
> 
> In qemu-thread-posix.c, I don't see such unlock/lock.
> 
> 
> 		Amit

I mean in pthread_cond_wait(), which is called by qemu_cond_wait().
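
As a generic illustration of that semantic (a standalone sketch, not
QEMU code):

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static bool flag;   /* set to true by another thread, under 'lock' */

    static void wait_for_flag(void)
    {
        pthread_mutex_lock(&lock);
        while (!flag) {
            /* Atomically releases 'lock' while blocking and re-acquires
             * it before returning, so the signalling thread can take the
             * lock, set 'flag' and signal 'cond'. */
            pthread_cond_wait(&cond, &lock);
        }
        pthread_mutex_unlock(&lock);
    }

wait_for_decompress_done() follows the same pattern through the
QemuCond wrappers.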

Liang
Amit Shah June 13, 2016, 10:33 a.m. UTC | #5
On (Mon) 13 Jun 2016 [05:07:39], Li, Liang Z wrote:
> > > > > +static void wait_for_decompress_done(void)
> > > > > +{
> > > > > +    int idx, thread_count;
> > > > > +
> > > > > +    if (!migrate_use_compression()) {
> > > > > +        return;
> > > > > +    }
> > > > > +
> > > > > +    thread_count = migrate_decompress_threads();
> > > > > +    qemu_mutex_lock(&decomp_done_lock);
> > > > > +    for (idx = 0; idx < thread_count; idx++) {
> > > > > +        while (!decomp_param[idx].done) {
> > > > > +            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
> > > > > +        }
> > > > > +    }
> > > > > +    qemu_mutex_unlock(&decomp_done_lock);
> > > >
> > > > Not sure how this works: in the previous patch, done is set to false
> > > > under the decomp_done_lock.  Here, we take the lock, and wait for done
> > > > to become true.
> > > > That can't happen because this thread holds the lock.
> > > > My reading is this is going to lead to a deadlock.  What am I missing?
> > > >
> > >
> > > This is the typical usage of a QemuCond: inside qemu_cond_wait(),
> > > decomp_done_lock is unlocked at first and then locked again before
> > > qemu_cond_wait() returns.  So a deadlock won't happen.
> > 
> > In qemu-thread-posix.c, I don't see such unlock/lock.
> > 
> > 
> > 		Amit
> 
> I mean in pthread_cond_wait(), which is called by qemu_cond_wait().

Yes, OK - makes sense now.  Thanks, I'll continue the review.

		Amit

Patch

diff --git a/migration/ram.c b/migration/ram.c
index 7ab6ab5..8a59a08 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2220,6 +2220,24 @@ static void *do_data_decompress(void *opaque)
     return NULL;
 }
 
+static void wait_for_decompress_done(void)
+{
+    int idx, thread_count;
+
+    if (!migrate_use_compression()) {
+        return;
+    }
+
+    thread_count = migrate_decompress_threads();
+    qemu_mutex_lock(&decomp_done_lock);
+    for (idx = 0; idx < thread_count; idx++) {
+        while (!decomp_param[idx].done) {
+            qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
+        }
+    }
+    qemu_mutex_unlock(&decomp_done_lock);
+}
+
 void migrate_decompress_threads_create(void)
 {
     int i, thread_count;
@@ -2554,6 +2572,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
         }
     }
 
+    wait_for_decompress_done();
     rcu_read_unlock();
     DPRINTF("Completed load of VM with exit code %d seq iteration "
             "%" PRIu64 "\n", ret, seq_iter);