diff mbox

[1/1] migration: fix ram decompression race deadlock

Message ID 1463124421-16587-1-git-send-email-den@openvz.org
State New
Headers show

Commit Message

Denis V. Lunev May 13, 2016, 7:27 a.m. UTC
From: Maxim Nestratov <mnestratov@virtuozzo.com>

There is a race in between do_data_decompress and start_decompression.

do_data_decompress()
    while (!quit_decomp_thread) {
        qemu_mutex_lock(&param->mutex);
        while (!param->start && !quit_decomp_thread) {
            qemu_cond_wait(&param->cond, &param->mutex);
            ...
            param->start = false;
        }
        qemu_mutex_unlock(&param->mutex);
        [ preempted here, start_decompression() is executed ]
    }

start_decompression()
{
    qemu_mutex_lock(&param->mutex);
    param->start = true;
    qemu_cond_signal(&param->cond);
    qemu_mutex_unlock(&param->mutex);
}

In this case do_data_decompress will never enter inner loop again and
will eat 100% CPU. The patch fixes this problem by correcting while loop
where we wait for condition only and other actions are moved out of it.

Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
---
 migration/ram.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

Comments

Denis V. Lunev May 19, 2016, 12:03 p.m. UTC | #1
On 05/13/2016 10:27 AM, Denis V. Lunev wrote:
> From: Maxim Nestratov <mnestratov@virtuozzo.com>
>
> There is a race in between do_data_decompress and start_decompression.
>
> do_data_decompress()
>      while (!quit_decomp_thread) {
>          qemu_mutex_lock(&param->mutex);
>          while (!param->start && !quit_decomp_thread) {
>              qemu_cond_wait(&param->cond, &param->mutex);
>              ...
>              param->start = false;
>          }
>          qemu_mutex_unlock(&param->mutex);
>          [ preempted here, start_decompression() is executed ]
>      }
>
> start_decompression()
> {
>      qemu_mutex_lock(&param->mutex);
>      param->start = true;
>      qemu_cond_signal(&param->cond);
>      qemu_mutex_unlock(&param->mutex);
> }
>
> In this case do_data_decompress will never enter inner loop again and
> will eat 100% CPU. The patch fixes this problem by correcting while loop
> where we wait for condition only and other actions are moved out of it.
>
> Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
> ---
>   migration/ram.c | 22 +++++++++++-----------
>   1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 3f05738..579bfc0 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2193,18 +2193,18 @@ static void *do_data_decompress(void *opaque)
>           qemu_mutex_lock(&param->mutex);
>           while (!param->start && !quit_decomp_thread) {
>               qemu_cond_wait(&param->cond, &param->mutex);
> -            pagesize = TARGET_PAGE_SIZE;
> -            if (!quit_decomp_thread) {
> -                /* uncompress() will return failed in some case, especially
> -                 * when the page is dirted when doing the compression, it's
> -                 * not a problem because the dirty page will be retransferred
> -                 * and uncompress() won't break the data in other pages.
> -                 */
> -                uncompress((Bytef *)param->des, &pagesize,
> -                           (const Bytef *)param->compbuf, param->len);
> -            }
> -            param->start = false;
>           }
> +        pagesize = TARGET_PAGE_SIZE;
> +        if (!quit_decomp_thread) {
> +           /* uncompress() will return failed in some case, especially
> +            * when the page is dirted when doing the compression, it's
> +            * not a problem because the dirty page will be retransferred
> +            * and uncompress() won't break the data in other pages.
> +            */
> +            uncompress((Bytef *)param->des, &pagesize,
> +                       (const Bytef *)param->compbuf, param->len);
> +        }
> +        param->start = false;
>           qemu_mutex_unlock(&param->mutex);
>       }
>   
ping
Amit Shah May 23, 2016, 7:19 a.m. UTC | #2
Adding Liang Li to CC for his comments as the author of the feature.

On (Fri) 13 May 2016 [10:27:01], Denis V. Lunev wrote:
> From: Maxim Nestratov <mnestratov@virtuozzo.com>
> 
> There is a race in between do_data_decompress and start_decompression.
> 
> do_data_decompress()
>     while (!quit_decomp_thread) {
>         qemu_mutex_lock(&param->mutex);
>         while (!param->start && !quit_decomp_thread) {
>             qemu_cond_wait(&param->cond, &param->mutex);
>             ...
>             param->start = false;
>         }
>         qemu_mutex_unlock(&param->mutex);
>         [ preempted here, start_decompression() is executed ]
>     }
> 
> start_decompression()
> {
>     qemu_mutex_lock(&param->mutex);
>     param->start = true;
>     qemu_cond_signal(&param->cond);
>     qemu_mutex_unlock(&param->mutex);
> }
> 
> In this case do_data_decompress will never enter inner loop again and
> will eat 100% CPU. The patch fixes this problem by correcting while loop
> where we wait for condition only and other actions are moved out of it.
> 
> Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
> ---
>  migration/ram.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 3f05738..579bfc0 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2193,18 +2193,18 @@ static void *do_data_decompress(void *opaque)
>          qemu_mutex_lock(&param->mutex);
>          while (!param->start && !quit_decomp_thread) {
>              qemu_cond_wait(&param->cond, &param->mutex);
> -            pagesize = TARGET_PAGE_SIZE;
> -            if (!quit_decomp_thread) {
> -                /* uncompress() will return failed in some case, especially
> -                 * when the page is dirted when doing the compression, it's
> -                 * not a problem because the dirty page will be retransferred
> -                 * and uncompress() won't break the data in other pages.
> -                 */
> -                uncompress((Bytef *)param->des, &pagesize,
> -                           (const Bytef *)param->compbuf, param->len);
> -            }
> -            param->start = false;
>          }
> +        pagesize = TARGET_PAGE_SIZE;
> +        if (!quit_decomp_thread) {
> +           /* uncompress() will return failed in some case, especially
> +            * when the page is dirted when doing the compression, it's
> +            * not a problem because the dirty page will be retransferred
> +            * and uncompress() won't break the data in other pages.
> +            */
> +            uncompress((Bytef *)param->des, &pagesize,
> +                       (const Bytef *)param->compbuf, param->len);
> +        }
> +        param->start = false;
>          qemu_mutex_unlock(&param->mutex);
>      }
>  
> -- 
> 2.1.4
> 

		Amit
Li, Liang Z May 24, 2016, 2:07 a.m. UTC | #3
> Adding Liang Li to CC for his comments as the author of the feature.
> 
> On (Fri) 13 May 2016 [10:27:01], Denis V. Lunev wrote:
> > From: Maxim Nestratov <mnestratov@virtuozzo.com>
> >
> > There is a race in between do_data_decompress and start_decompression.
> >
> > do_data_decompress()
> >     while (!quit_decomp_thread) {
> >         qemu_mutex_lock(&param->mutex);
> >         while (!param->start && !quit_decomp_thread) {
> >             qemu_cond_wait(&param->cond, &param->mutex);
> >             ...
> >             param->start = false;
> >         }
> >         qemu_mutex_unlock(&param->mutex);
> >         [ preempted here, start_decompression() is executed ]
> >     }
> >
> > start_decompression()
> > {
> >     qemu_mutex_lock(&param->mutex);
> >     param->start = true;
> >     qemu_cond_signal(&param->cond);
> >     qemu_mutex_unlock(&param->mutex);
> > }
> >
> > In this case do_data_decompress will never enter inner loop again and
> > will eat 100% CPU. The patch fixes this problem by correcting while
> > loop where we wait for condition only and other actions are moved out of
> it.
> >
> > Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > CC: Juan Quintela <quintela@redhat.com>
> > CC: Amit Shah <amit.shah@redhat.com>
> > ---
> >  migration/ram.c | 22 +++++++++++-----------
> >  1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c index 3f05738..579bfc0
> > 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -2193,18 +2193,18 @@ static void *do_data_decompress(void
> *opaque)
> >          qemu_mutex_lock(&param->mutex);
> >          while (!param->start && !quit_decomp_thread) {
> >              qemu_cond_wait(&param->cond, &param->mutex);
> > -            pagesize = TARGET_PAGE_SIZE;
> > -            if (!quit_decomp_thread) {
> > -                /* uncompress() will return failed in some case, especially
> > -                 * when the page is dirted when doing the compression, it's
> > -                 * not a problem because the dirty page will be retransferred
> > -                 * and uncompress() won't break the data in other pages.
> > -                 */
> > -                uncompress((Bytef *)param->des, &pagesize,
> > -                           (const Bytef *)param->compbuf, param->len);
> > -            }
> > -            param->start = false;
> >          }
> > +        pagesize = TARGET_PAGE_SIZE;
> > +        if (!quit_decomp_thread) {
> > +           /* uncompress() will return failed in some case, especially
> > +            * when the page is dirted when doing the compression, it's
> > +            * not a problem because the dirty page will be retransferred
> > +            * and uncompress() won't break the data in other pages.
> > +            */
> > +            uncompress((Bytef *)param->des, &pagesize,
> > +                       (const Bytef *)param->compbuf, param->len);
> > +        }
> > +        param->start = false;
> >          qemu_mutex_unlock(&param->mutex);
> >      }

Thanks for your patch!  And it can help to fix this issue.
Actually, the is not the only issue in the current multi-thread (de)compression code.
So I submitted a path set to fix all of them before your patch. And sorry for my mistake.

Refer to the link:
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg00631.html
for more information.

Thanks
Liang
Denis V. Lunev May 24, 2016, 5:35 a.m. UTC | #4
On 05/24/2016 05:07 AM, Li, Liang Z wrote:
>> Adding Liang Li to CC for his comments as the author of the feature.
>>
>> On (Fri) 13 May 2016 [10:27:01], Denis V. Lunev wrote:
>>> From: Maxim Nestratov <mnestratov@virtuozzo.com>
>>>
>>> There is a race in between do_data_decompress and start_decompression.
>>>
>>> do_data_decompress()
>>>      while (!quit_decomp_thread) {
>>>          qemu_mutex_lock(&param->mutex);
>>>          while (!param->start && !quit_decomp_thread) {
>>>              qemu_cond_wait(&param->cond, &param->mutex);
>>>              ...
>>>              param->start = false;
>>>          }
>>>          qemu_mutex_unlock(&param->mutex);
>>>          [ preempted here, start_decompression() is executed ]
>>>      }
>>>
>>> start_decompression()
>>> {
>>>      qemu_mutex_lock(&param->mutex);
>>>      param->start = true;
>>>      qemu_cond_signal(&param->cond);
>>>      qemu_mutex_unlock(&param->mutex);
>>> }
>>>
>>> In this case do_data_decompress will never enter inner loop again and
>>> will eat 100% CPU. The patch fixes this problem by correcting while
>>> loop where we wait for condition only and other actions are moved out of
>> it.
>>> Signed-off-by: Maxim Nestratov <mnestratov@virtuozzo.com>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Juan Quintela <quintela@redhat.com>
>>> CC: Amit Shah <amit.shah@redhat.com>
>>> ---
>>>   migration/ram.c | 22 +++++++++++-----------
>>>   1 file changed, 11 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/migration/ram.c b/migration/ram.c index 3f05738..579bfc0
>>> 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -2193,18 +2193,18 @@ static void *do_data_decompress(void
>> *opaque)
>>>           qemu_mutex_lock(&param->mutex);
>>>           while (!param->start && !quit_decomp_thread) {
>>>               qemu_cond_wait(&param->cond, &param->mutex);
>>> -            pagesize = TARGET_PAGE_SIZE;
>>> -            if (!quit_decomp_thread) {
>>> -                /* uncompress() will return failed in some case, especially
>>> -                 * when the page is dirted when doing the compression, it's
>>> -                 * not a problem because the dirty page will be retransferred
>>> -                 * and uncompress() won't break the data in other pages.
>>> -                 */
>>> -                uncompress((Bytef *)param->des, &pagesize,
>>> -                           (const Bytef *)param->compbuf, param->len);
>>> -            }
>>> -            param->start = false;
>>>           }
>>> +        pagesize = TARGET_PAGE_SIZE;
>>> +        if (!quit_decomp_thread) {
>>> +           /* uncompress() will return failed in some case, especially
>>> +            * when the page is dirted when doing the compression, it's
>>> +            * not a problem because the dirty page will be retransferred
>>> +            * and uncompress() won't break the data in other pages.
>>> +            */
>>> +            uncompress((Bytef *)param->des, &pagesize,
>>> +                       (const Bytef *)param->compbuf, param->len);
>>> +        }
>>> +        param->start = false;
>>>           qemu_mutex_unlock(&param->mutex);
>>>       }
> Thanks for your patch!  And it can help to fix this issue.
> Actually, the is not the only issue in the current multi-thread (de)compression code.
> So I submitted a path set to fix all of them before your patch. And sorry for my mistake.
>
> Refer to the link:
> https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg00631.html
> for more information.
>
> Thanks
> Liang
yep. thanks a lot. Your patch would be better ;)
Amit Shah June 10, 2016, 6:43 a.m. UTC | #5
On (Tue) 24 May 2016 [08:35:32], Denis V. Lunev wrote:
> On 05/24/2016 05:07 AM, Li, Liang Z wrote:

> > Thanks for your patch!  And it can help to fix this issue.
> > Actually, the is not the only issue in the current multi-thread (de)compression code.
> > So I submitted a path set to fix all of them before your patch. And sorry for my mistake.
> > 
> > Refer to the link:
> > https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg00631.html
> > for more information.
> > 
> > Thanks
> > Liang
> yep. thanks a lot. Your patch would be better ;)

OK, I'll drop this patch.


		Amit
diff mbox

Patch

diff --git a/migration/ram.c b/migration/ram.c
index 3f05738..579bfc0 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2193,18 +2193,18 @@  static void *do_data_decompress(void *opaque)
         qemu_mutex_lock(&param->mutex);
         while (!param->start && !quit_decomp_thread) {
             qemu_cond_wait(&param->cond, &param->mutex);
-            pagesize = TARGET_PAGE_SIZE;
-            if (!quit_decomp_thread) {
-                /* uncompress() will return failed in some case, especially
-                 * when the page is dirted when doing the compression, it's
-                 * not a problem because the dirty page will be retransferred
-                 * and uncompress() won't break the data in other pages.
-                 */
-                uncompress((Bytef *)param->des, &pagesize,
-                           (const Bytef *)param->compbuf, param->len);
-            }
-            param->start = false;
         }
+        pagesize = TARGET_PAGE_SIZE;
+        if (!quit_decomp_thread) {
+           /* uncompress() will return failed in some case, especially
+            * when the page is dirted when doing the compression, it's
+            * not a problem because the dirty page will be retransferred
+            * and uncompress() won't break the data in other pages.
+            */
+            uncompress((Bytef *)param->des, &pagesize,
+                       (const Bytef *)param->compbuf, param->len);
+        }
+        param->start = false;
         qemu_mutex_unlock(&param->mutex);
     }