diff mbox

[1/1] migration: fix deadlock

Message ID 1443099218-14857-1-git-send-email-den@openvz.org
State New
Headers show

Commit Message

Denis V. Lunev Sept. 24, 2015, 12:53 p.m. UTC
From: Igor Redko <redkoi@virtuozzo.com>

Release the QEMU global mutex (QGM) before calling synchronize_rcu().
synchronize_rcu() waits for all readers to finish their critical
sections. There is at least one critical section that tries to
take the QGM (it is in address_space_rw(), where
prepare_mmio_access() tries to acquire the QGM).

Both functions (migration_end() and migration_bitmap_extend())
are called from the main thread, which holds the QGM.

Thus there is a race condition that ends up with a deadlock:
main thread                 working thread
Lock QGM                          |
    |               Call KVM_EXIT_IO handler
    |                             |
    |        Open RCU reader's critical section
Migration cleanup bh              |
    |                             |
synchronize_rcu() is              |
waiting for readers               |
    |      prepare_mmio_access() is waiting for QGM
     \                           /
              deadlock

The patch simply releases the QGM before calling synchronize_rcu().

Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>
---
 migration/ram.c | 6 ++++++
 1 file changed, 6 insertions(+)
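The lock ordering the patch establishes can be boiled down to a self-contained pthreads sketch (plain C, not QEMU code; qgm, run_demo() and the other names are invented for the demo). run_demo() models the patched ordering — drop the big lock, wait for the "reader", retake the lock — and returns 0 when it completes:

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-ins for the QGM and for RCU's grace-period machinery. */
static pthread_mutex_t qgm = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rcu_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t rcu_cv = PTHREAD_COND_INITIALIZER;
static int readers;

/* Working thread: inside its "read-side critical section" it needs the
 * big lock, like prepare_mmio_access() does. */
static void *reader_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&qgm);       /* blocks while "main" holds qgm */
    pthread_mutex_unlock(&qgm);

    pthread_mutex_lock(&rcu_mu);    /* leave the critical section */
    readers = 0;
    pthread_cond_signal(&rcu_cv);
    pthread_mutex_unlock(&rcu_mu);
    return NULL;
}

/* Like synchronize_rcu(): block until no readers are left. */
static void toy_synchronize_rcu(void)
{
    pthread_mutex_lock(&rcu_mu);
    while (readers) {
        pthread_cond_wait(&rcu_cv, &rcu_mu);
    }
    pthread_mutex_unlock(&rcu_mu);
}

/* Patched ordering: unlock qgm around the grace-period wait.  Calling
 * toy_synchronize_rcu() while still holding qgm would reproduce the
 * deadlock from the thread diagram above. */
int run_demo(void)
{
    pthread_t t;

    pthread_mutex_lock(&qgm);
    readers = 1;                    /* one reader already in its section */
    if (pthread_create(&t, NULL, reader_thread, NULL) != 0) {
        pthread_mutex_unlock(&qgm);
        return -1;
    }

    pthread_mutex_unlock(&qgm);     /* the fix: release before waiting */
    toy_synchronize_rcu();
    pthread_mutex_lock(&qgm);

    pthread_mutex_unlock(&qgm);
    pthread_join(t, NULL);
    return 0;
}
```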

Comments

Wen Congyang Sept. 25, 2015, 1:21 a.m. UTC | #1
On 09/24/2015 08:53 PM, Denis V. Lunev wrote:
> From: Igor Redko <redkoi@virtuozzo.com>
> 
> Release qemu global mutex before call synchronize_rcu().
> synchronize_rcu() waiting for all readers to finish their critical
> sections. There is at least one critical section in which we try
> to get QGM (critical section is in address_space_rw() and
> prepare_mmio_access() is trying to aquire QGM).
> 
> Both functions (migration_end() and migration_bitmap_extend())
> are called from main thread which is holding QGM.
> 
> Thus there is a race condition that ends up with deadlock:
> main thread		working thread
> Lock QGA                |
> |             Call KVM_EXIT_IO handler
> |			            |
> |        Open rcu reader's critical section
> Migration cleanup bh    |
> |                       |
> synchronize_rcu() is    |
> waiting for readers     |
> |            prepare_mmio_access() is waiting for QGM
>   \                   /
>          deadlock
> 
> The patch just releases QGM before calling synchronize_rcu().
> 
> Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
> Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
> ---
>  migration/ram.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 7f007e6..d01febc 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1028,12 +1028,16 @@ static void migration_end(void)
>  {
>      /* caller have hold iothread lock or is in a bh, so there is
>       * no writing race against this migration_bitmap
> +     * but rcu used not only for migration_bitmap, so we should
> +     * release QGM or we get in deadlock.
>       */
>      unsigned long *bitmap = migration_bitmap;
>      atomic_rcu_set(&migration_bitmap, NULL);
>      if (bitmap) {
>          memory_global_dirty_log_stop();
> +        qemu_mutex_unlock_iothread();
>          synchronize_rcu();
> +        qemu_mutex_lock_iothread();

migration_end() can be called in two cases:
1. migration completed
2. migration was cancelled

In case 1, you should not unlock the iothread; otherwise, the VM's state may
change unexpectedly.

>          g_free(bitmap);
>      }
>  
> @@ -1085,7 +1089,9 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
>          atomic_rcu_set(&migration_bitmap, bitmap);
>          qemu_mutex_unlock(&migration_bitmap_mutex);
>          migration_dirty_pages += new - old;
> +        qemu_mutex_unlock_iothread();
>          synchronize_rcu();
> +        qemu_mutex_lock_iothread();

Hmm, I think it is OK to unlock the iothread here

>          g_free(old_bitmap);
>      }
>  }
>
Denis V. Lunev Sept. 25, 2015, 8:03 a.m. UTC | #2
On 09/25/2015 04:21 AM, Wen Congyang wrote:
> On 09/24/2015 08:53 PM, Denis V. Lunev wrote:
>> From: Igor Redko <redkoi@virtuozzo.com>
>>
>> Release qemu global mutex before call synchronize_rcu().
>> synchronize_rcu() waiting for all readers to finish their critical
>> sections. There is at least one critical section in which we try
>> to get QGM (critical section is in address_space_rw() and
>> prepare_mmio_access() is trying to aquire QGM).
>>
>> Both functions (migration_end() and migration_bitmap_extend())
>> are called from main thread which is holding QGM.
>>
>> Thus there is a race condition that ends up with deadlock:
>> main thread		working thread
>> Lock QGA                |
>> |             Call KVM_EXIT_IO handler
>> |			            |
>> |        Open rcu reader's critical section
>> Migration cleanup bh    |
>> |                       |
>> synchronize_rcu() is    |
>> waiting for readers     |
>> |            prepare_mmio_access() is waiting for QGM
>>    \                   /
>>           deadlock
>>
>> The patch just releases QGM before calling synchronize_rcu().
>>
>> Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
>> Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Amit Shah <amit.shah@redhat.com>
>> ---
>>   migration/ram.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 7f007e6..d01febc 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1028,12 +1028,16 @@ static void migration_end(void)
>>   {
>>       /* caller have hold iothread lock or is in a bh, so there is
>>        * no writing race against this migration_bitmap
>> +     * but rcu used not only for migration_bitmap, so we should
>> +     * release QGM or we get in deadlock.
>>        */
>>       unsigned long *bitmap = migration_bitmap;
>>       atomic_rcu_set(&migration_bitmap, NULL);
>>       if (bitmap) {
>>           memory_global_dirty_log_stop();
>> +        qemu_mutex_unlock_iothread();
>>           synchronize_rcu();
>> +        qemu_mutex_lock_iothread();
> migration_end() can called in two cases:
> 1. migration_completed
> 2. migration is cancelled
>
> In case 1, you should not unlock iothread, otherwise, the vm's state may be changed
> unexpectedly.

sorry, but there is no very good choice here. We should either
unlock, or not call synchronize_rcu(), which is also an option.

In the other case the rework would have to be much more substantial.

Den
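The second option mentioned here — not calling synchronize_rcu() at all — usually means deferring the g_free() until after the grace period instead of blocking for it (QEMU's RCU code provides call_rcu() for exactly this). A toy, self-contained sketch of the deferral idea, with all names invented and no claim to match QEMU's actual API:

```c
#include <assert.h>
#include <stdlib.h>

struct deferred {
    void *ptr;
    struct deferred *next;
};

static struct deferred *defer_list;

/* Queue ptr to be freed after the current grace period instead of
 * blocking the caller (toy stand-in for call_rcu()). */
static void toy_call_rcu_free(void *ptr)
{
    struct deferred *d = malloc(sizeof(*d));
    assert(d != NULL);
    d->ptr = ptr;
    d->next = defer_list;
    defer_list = d;
}

/* Invoked once no reader can still see the old pointers (toy stand-in
 * for the end of a grace period); returns how many blocks it freed. */
static int toy_grace_period_over(void)
{
    int freed = 0;
    while (defer_list) {
        struct deferred *d = defer_list;
        defer_list = d->next;
        free(d->ptr);
        free(d);
        freed++;
    }
    return freed;
}
```

With this shape the main thread never waits, so the unlock/relock dance around synchronize_rcu() is unnecessary; the cost is that the old bitmap lives a little longer.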

>>           g_free(bitmap);
>>       }
>>   
>> @@ -1085,7 +1089,9 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
>>           atomic_rcu_set(&migration_bitmap, bitmap);
>>           qemu_mutex_unlock(&migration_bitmap_mutex);
>>           migration_dirty_pages += new - old;
>> +        qemu_mutex_unlock_iothread();
>>           synchronize_rcu();
>> +        qemu_mutex_lock_iothread();
> Hmm, I think it is OK to unlock iothread here
>
>>           g_free(old_bitmap);
>>       }
>>   }
>>
Wen Congyang Sept. 25, 2015, 8:23 a.m. UTC | #3
On 09/25/2015 04:03 PM, Denis V. Lunev wrote:
> On 09/25/2015 04:21 AM, Wen Congyang wrote:
>> On 09/24/2015 08:53 PM, Denis V. Lunev wrote:
>>> From: Igor Redko <redkoi@virtuozzo.com>
>>>
>>> Release qemu global mutex before call synchronize_rcu().
>>> synchronize_rcu() waiting for all readers to finish their critical
>>> sections. There is at least one critical section in which we try
>>> to get QGM (critical section is in address_space_rw() and
>>> prepare_mmio_access() is trying to aquire QGM).
>>>
>>> Both functions (migration_end() and migration_bitmap_extend())
>>> are called from main thread which is holding QGM.
>>>
>>> Thus there is a race condition that ends up with deadlock:
>>> main thread        working thread
>>> Lock QGA                |
>>> |             Call KVM_EXIT_IO handler
>>> |                        |
>>> |        Open rcu reader's critical section
>>> Migration cleanup bh    |
>>> |                       |
>>> synchronize_rcu() is    |
>>> waiting for readers     |
>>> |            prepare_mmio_access() is waiting for QGM
>>>    \                   /
>>>           deadlock
>>>
>>> The patch just releases QGM before calling synchronize_rcu().
>>>
>>> Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
>>> Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Juan Quintela <quintela@redhat.com>
>>> CC: Amit Shah <amit.shah@redhat.com>
>>> ---
>>>   migration/ram.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/migration/ram.c b/migration/ram.c
>>> index 7f007e6..d01febc 100644
>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1028,12 +1028,16 @@ static void migration_end(void)
>>>   {
>>>       /* caller have hold iothread lock or is in a bh, so there is
>>>        * no writing race against this migration_bitmap
>>> +     * but rcu used not only for migration_bitmap, so we should
>>> +     * release QGM or we get in deadlock.
>>>        */
>>>       unsigned long *bitmap = migration_bitmap;
>>>       atomic_rcu_set(&migration_bitmap, NULL);
>>>       if (bitmap) {
>>>           memory_global_dirty_log_stop();
>>> +        qemu_mutex_unlock_iothread();
>>>           synchronize_rcu();
>>> +        qemu_mutex_lock_iothread();
>> migration_end() can called in two cases:
>> 1. migration_completed
>> 2. migration is cancelled
>>
>> In case 1, you should not unlock iothread, otherwise, the vm's state may be changed
>> unexpectedly.
> 
> sorry, but there is now very good choice here. We should either
> unlock or not call synchronize_rcu which is also an option.
> 
> In the other case the rework should be much more sufficient.

I can't reproduce this bug. But according to your description, the bug only
exists in case 2. Is that right?

> 
> Den
> 
>>>           g_free(bitmap);
>>>       }
>>>   @@ -1085,7 +1089,9 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
>>>           atomic_rcu_set(&migration_bitmap, bitmap);
>>>           qemu_mutex_unlock(&migration_bitmap_mutex);
>>>           migration_dirty_pages += new - old;
>>> +        qemu_mutex_unlock_iothread();
>>>           synchronize_rcu();
>>> +        qemu_mutex_lock_iothread();
>> Hmm, I think it is OK to unlock iothread here
>>
>>>           g_free(old_bitmap);
>>>       }
>>>   }
>>>
> 
> .
>
Denis V. Lunev Sept. 25, 2015, 9:09 a.m. UTC | #4
Release the QEMU global mutex (QGM) before calling synchronize_rcu().
synchronize_rcu() waits for all readers to finish their critical
sections. There is at least one critical section that tries to
take the QGM (it is in address_space_rw(), where
prepare_mmio_access() tries to acquire the QGM).

Both functions (migration_end() and migration_bitmap_extend())
are called from the main thread, which holds the QGM.

Thus there is a race condition that ends up with a deadlock:
main thread     working thread
Lock QGM                |
|             Call KVM_EXIT_IO handler
|                       |
|        Open rcu reader's critical section
Migration cleanup bh    |
|                       |
synchronize_rcu() is    |
waiting for readers     |
|            prepare_mmio_access() is waiting for QGM
  \                   /
         deadlock

The patches here are quick and dirty, compile-tested only, to validate
the architectural approach.

Igor, Anna, can you please start your tests with these patches instead of
your original one. Thank you.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Igor Redko <redkoi@virtuozzo.com>
CC: Anna Melekhova <annam@virtuozzo.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Amit Shah <amit.shah@redhat.com>

Denis V. Lunev (2):
  migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
  migration: fix deadlock

 migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)
Wen Congyang Sept. 25, 2015, 9:46 a.m. UTC | #5
On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
> Release qemu global mutex before call synchronize_rcu().
> synchronize_rcu() waiting for all readers to finish their critical
> sections. There is at least one critical section in which we try
> to get QGM (critical section is in address_space_rw() and
> prepare_mmio_access() is trying to aquire QGM).
> 
> Both functions (migration_end() and migration_bitmap_extend())
> are called from main thread which is holding QGM.
> 
> Thus there is a race condition that ends up with deadlock:
> main thread     working thread
> Lock QGA                |
> |             Call KVM_EXIT_IO handler
> |                       |
> |        Open rcu reader's critical section
> Migration cleanup bh    |
> |                       |
> synchronize_rcu() is    |
> waiting for readers     |
> |            prepare_mmio_access() is waiting for QGM
>   \                   /
>          deadlock
> 
> Patches here are quick and dirty, compile-tested only to validate the
> architectual approach.
> 
> Igor, Anna, can you pls start your tests with these patches instead of your
> original one. Thank you.

Can you give me the backtrace of the working thread?

I think it is very bad to wait on a lock inside an RCU reader's critical section.

To Paolo:
Do we allow this in an RCU critical section?

Thanks
Wen Congyang

> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Igor Redko <redkoi@virtuozzo.com>
> CC: Anna Melekhova <annam@virtuozzo.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
> 
> Denis V. Lunev (2):
>   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
>   migration: fix deadlock
> 
>  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
>  1 file changed, 28 insertions(+), 17 deletions(-)
>
Igor Redko Sept. 28, 2015, 10:55 a.m. UTC | #6
On Fri., 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
> On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
> > Release qemu global mutex before call synchronize_rcu().
> > synchronize_rcu() waiting for all readers to finish their critical
> > sections. There is at least one critical section in which we try
> > to get QGM (critical section is in address_space_rw() and
> > prepare_mmio_access() is trying to aquire QGM).
> > 
> > Both functions (migration_end() and migration_bitmap_extend())
> > are called from main thread which is holding QGM.
> > 
> > Thus there is a race condition that ends up with deadlock:
> > main thread     working thread
> > Lock QGA                |
> > |             Call KVM_EXIT_IO handler
> > |                       |
> > |        Open rcu reader's critical section
> > Migration cleanup bh    |
> > |                       |
> > synchronize_rcu() is    |
> > waiting for readers     |
> > |            prepare_mmio_access() is waiting for QGM
> >   \                   /
> >          deadlock
> > 
> > Patches here are quick and dirty, compile-tested only to validate the
> > architectual approach.
> > 
> > Igor, Anna, can you pls start your tests with these patches instead of your
> > original one. Thank you.
> 
> Can you give me the backtrace of the working thread?
> 
> I think it is very bad to wait some lock in rcu reader's cirtical section.

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
#3  0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
#4  0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
#5  0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 <address_space_io>, addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true)
    at /home/user/my_qemu/qemu/exec.c:2425
#6  0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
#7  0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
#8  0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
#9  0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
#10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

> 
> > 
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > CC: Igor Redko <redkoi@virtuozzo.com>
> > CC: Anna Melekhova <annam@virtuozzo.com>
> > CC: Juan Quintela <quintela@redhat.com>
> > CC: Amit Shah <amit.shah@redhat.com>
> > 
> > Denis V. Lunev (2):
> >   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
> >   migration: fix deadlock
> > 
> >  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
> >  1 file changed, 28 insertions(+), 17 deletions(-)
> > 
>
Igor Redko Sept. 28, 2015, 3:12 p.m. UTC | #7
On Mon., 2015-09-28 at 13:55 +0300, Igor Redko wrote:
> On Пт., 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
> > On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
> > > Release qemu global mutex before call synchronize_rcu().
> > > synchronize_rcu() waiting for all readers to finish their critical
> > > sections. There is at least one critical section in which we try
> > > to get QGM (critical section is in address_space_rw() and
> > > prepare_mmio_access() is trying to aquire QGM).
> > > 
> > > Both functions (migration_end() and migration_bitmap_extend())
> > > are called from main thread which is holding QGM.
> > > 
> > > Thus there is a race condition that ends up with deadlock:
> > > main thread     working thread
> > > Lock QGA                |
> > > |             Call KVM_EXIT_IO handler
> > > |                       |
> > > |        Open rcu reader's critical section
> > > Migration cleanup bh    |
> > > |                       |
> > > synchronize_rcu() is    |
> > > waiting for readers     |
> > > |            prepare_mmio_access() is waiting for QGM
> > >   \                   /
> > >          deadlock
> > > 
> > > Patches here are quick and dirty, compile-tested only to validate the
> > > architectual approach.
> > > 
> > > Igor, Anna, can you pls start your tests with these patches instead of your
> > > original one. Thank you.
> > 
> > Can you give me the backtrace of the working thread?
> > 
> > I think it is very bad to wait some lock in rcu reader's cirtical section.
> 
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
> #3  0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
> #4  0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
> #5  0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 <address_space_io>, addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true)
>     at /home/user/my_qemu/qemu/exec.c:2425
> #6  0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
> #7  0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
> #8  0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
> #9  0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
> #10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Backtrace of the main thread:

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f6ade8e9a17 in futex_wait (ev=0x7f6adf26e120 <rcu_gp_event>, val=4294967295) at util/qemu-thread-posix.c:301
#2  0x00007f6ade8e9b2d in qemu_event_wait (ev=0x7f6adf26e120 <rcu_gp_event>) at util/qemu-thread-posix.c:399
#3  0x00007f6ade8fd7ec in wait_for_readers () at util/rcu.c:120
#4  0x00007f6ade8fd875 in synchronize_rcu () at util/rcu.c:149
#5  0x00007f6ade5640fd in migration_end () at /home/user/my_qemu/qemu/migration/ram.c:1036
#6  0x00007f6ade564194 in ram_migration_cancel (opaque=0x0) at /home/user/my_qemu/qemu/migration/ram.c:1054
#7  0x00007f6ade567d5a in qemu_savevm_state_cancel () at /home/user/my_qemu/qemu/migration/savevm.c:915
#8  0x00007f6ade7b4bdf in migrate_fd_cleanup (opaque=0x7f6aded8fd40 <current_migration>) at migration/migration.c:582
#9  0x00007f6ade804d15 in aio_bh_poll (ctx=0x7f6adf895e50) at async.c:87
#10 0x00007f6ade814dcb in aio_dispatch (ctx=0x7f6adf895e50) at aio-posix.c:135
#11 0x00007f6ade8050b5 in aio_ctx_dispatch (source=0x7f6adf895e50, callback=0x0, user_data=0x0) at async.c:226
#12 0x00007f6adc9a3c3d in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x00007f6ade813274 in glib_pollfds_poll () at main-loop.c:208
#14 0x00007f6ade813351 in os_host_main_loop_wait (timeout=422420000) at main-loop.c:253
#15 0x00007f6ade813410 in main_loop_wait (nonblocking=0) at main-loop.c:502
#16 0x00007f6ade64ae6a in main_loop () at vl.c:1902
#17 0x00007f6ade652c32 in main (argc=70, argv=0x7ffcc3b674e8, envp=0x7ffcc3b67720) at vl.c:4653

> 
> > 
> > > 
> > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > CC: Igor Redko <redkoi@virtuozzo.com>
> > > CC: Anna Melekhova <annam@virtuozzo.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > > CC: Amit Shah <amit.shah@redhat.com>
> > > 
> > > Denis V. Lunev (2):
> > >   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
> > >   migration: fix deadlock
> > > 
> > >  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
> > >  1 file changed, 28 insertions(+), 17 deletions(-)
> > > 
> > 
>
Dr. David Alan Gilbert Sept. 29, 2015, 8:47 a.m. UTC | #8
* Igor Redko (redkoi@virtuozzo.com) wrote:
> On Пт., 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
> > On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
> > > Release qemu global mutex before call synchronize_rcu().
> > > synchronize_rcu() waiting for all readers to finish their critical
> > > sections. There is at least one critical section in which we try
> > > to get QGM (critical section is in address_space_rw() and
> > > prepare_mmio_access() is trying to aquire QGM).
> > > 
> > > Both functions (migration_end() and migration_bitmap_extend())
> > > are called from main thread which is holding QGM.
> > > 
> > > Thus there is a race condition that ends up with deadlock:
> > > main thread     working thread
> > > Lock QGA                |
> > > |             Call KVM_EXIT_IO handler
> > > |                       |
> > > |        Open rcu reader's critical section
> > > Migration cleanup bh    |
> > > |                       |
> > > synchronize_rcu() is    |
> > > waiting for readers     |
> > > |            prepare_mmio_access() is waiting for QGM
> > >   \                   /
> > >          deadlock
> > > 
> > > Patches here are quick and dirty, compile-tested only to validate the
> > > architectual approach.
> > > 
> > > Igor, Anna, can you pls start your tests with these patches instead of your
> > > original one. Thank you.
> > 
> > Can you give me the backtrace of the working thread?
> > 
> > I think it is very bad to wait some lock in rcu reader's cirtical section.
> 
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
> #3  0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
> #4  0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
> #5  0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 <address_space_io>, addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true)
>     at /home/user/my_qemu/qemu/exec.c:2425
> #6  0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
> #7  0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
> #8  0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
> #9  0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
> #10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Do you have a test to run in the guest that easily triggers this?

Dave
> > 
> > > 
> > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > CC: Igor Redko <redkoi@virtuozzo.com>
> > > CC: Anna Melekhova <annam@virtuozzo.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > > CC: Amit Shah <amit.shah@redhat.com>
> > > 
> > > Denis V. Lunev (2):
> > >   migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
> > >   migration: fix deadlock
> > > 
> > >  migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
> > >  1 file changed, 28 insertions(+), 17 deletions(-)
> > > 
> > 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Igor Redko Sept. 29, 2015, 3:32 p.m. UTC | #9
On 25.09.2015 11:23, Wen Congyang wrote:
> On 09/25/2015 04:03 PM, Denis V. Lunev wrote:
>> On 09/25/2015 04:21 AM, Wen Congyang wrote:
>>> On 09/24/2015 08:53 PM, Denis V. Lunev wrote:
>>>> From: Igor Redko <redkoi@virtuozzo.com>
>>>>
>>>> Release qemu global mutex before call synchronize_rcu().
>>>> synchronize_rcu() waiting for all readers to finish their critical
>>>> sections. There is at least one critical section in which we try
>>>> to get QGM (critical section is in address_space_rw() and
>>>> prepare_mmio_access() is trying to aquire QGM).
>>>>
>>>> Both functions (migration_end() and migration_bitmap_extend())
>>>> are called from main thread which is holding QGM.
>>>>
>>>> Thus there is a race condition that ends up with deadlock:
>>>> main thread        working thread
>>>> Lock QGA                |
>>>> |             Call KVM_EXIT_IO handler
>>>> |                        |
>>>> |        Open rcu reader's critical section
>>>> Migration cleanup bh    |
>>>> |                       |
>>>> synchronize_rcu() is    |
>>>> waiting for readers     |
>>>> |            prepare_mmio_access() is waiting for QGM
>>>>     \                   /
>>>>            deadlock
>>>>
>>>> The patch just releases QGM before calling synchronize_rcu().
>>>>
>>>> Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
>>>> Reviewed-by: Anna Melekhova <annam@virtuozzo.com>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>>> CC: Amit Shah <amit.shah@redhat.com>
>>>> ---
>>>>    migration/ram.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 7f007e6..d01febc 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -1028,12 +1028,16 @@ static void migration_end(void)
>>>>    {
>>>>        /* caller have hold iothread lock or is in a bh, so there is
>>>>         * no writing race against this migration_bitmap
>>>> +     * but rcu used not only for migration_bitmap, so we should
>>>> +     * release QGM or we get in deadlock.
>>>>         */
>>>>        unsigned long *bitmap = migration_bitmap;
>>>>        atomic_rcu_set(&migration_bitmap, NULL);
>>>>        if (bitmap) {
>>>>            memory_global_dirty_log_stop();
>>>> +        qemu_mutex_unlock_iothread();
>>>>            synchronize_rcu();
>>>> +        qemu_mutex_lock_iothread();
>>> migration_end() can called in two cases:
>>> 1. migration_completed
>>> 2. migration is cancelled
>>>
>>> In case 1, you should not unlock iothread, otherwise, the vm's state may be changed
>>> unexpectedly.
>>
>> sorry, but there is now very good choice here. We should either
>> unlock or not call synchronize_rcu which is also an option.
>>
>> In the other case the rework should be much more sufficient.
>
> I don't reproduce this bug. But according to your description, the bug only exists
> in case 2. Is it right?
>
When migration is successfully completed, the VM has already been stopped
before migration_end() is called. The VM must be running to reproduce
this bug. So yes, the bug exists only in case 2.

FYI
To reproduce this bug you need two hosts with qemu+libvirt (host0 and
host1) configured for migration.
0. Create a VM on host0 and install CentOS 7.
1. Shut down the VM.
2. Start the VM (virsh start <VM_name>) and right after that start
migration to host1 (something like 'virsh migrate --live --verbose
<VM_name> "qemu+ssh://host1/system"').
3. Stop the migration after ~1 sec (after the migration process has
started, but before it has completed; for example, when you see
"Migration: [  5 %]").
This works for me 9 times out of 10.
Deadlock: no response from the VM and no response from the qemu monitor
(for example, 'virsh qemu-monitor-command --hmp <VM_NAME> "info migrate"'
will hang indefinitely).

Another way:
0. Create a VM with an e1000 network card on host0 and install CentOS 7.
1. Run iperf in the VM (or put any other load on the network).
2. Start migration.
3. Stop the migration before it has completed.
For this approach the e1000 network card is essential because it
generates KVM_EXIT_MMIO.

>>
>> Den
>>
>>>>            g_free(bitmap);
>>>>        }
>>>>    @@ -1085,7 +1089,9 @@ void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
>>>>            atomic_rcu_set(&migration_bitmap, bitmap);
>>>>            qemu_mutex_unlock(&migration_bitmap_mutex);
>>>>            migration_dirty_pages += new - old;
>>>> +        qemu_mutex_unlock_iothread();
>>>>            synchronize_rcu();
>>>> +        qemu_mutex_lock_iothread();
>>> Hmm, I think it is OK to unlock iothread here
>>>
>>>>            g_free(old_bitmap);
>>>>        }
>>>>    }
>>>>
>>
>> .
>>
>
>
Igor Redko Sept. 30, 2015, 2:28 p.m. UTC | #10
On 29.09.2015 11:47, Dr. David Alan Gilbert wrote:
> * Igor Redko (redkoi@virtuozzo.com) wrote:
>> On Пт., 2015-09-25 at 17:46 +0800, Wen Congyang wrote:
>>> On 09/25/2015 05:09 PM, Denis V. Lunev wrote:
>>>> Release qemu global mutex before call synchronize_rcu().
>>>> synchronize_rcu() waiting for all readers to finish their critical
>>>> sections. There is at least one critical section in which we try
>>>> to get QGM (critical section is in address_space_rw() and
>>>> prepare_mmio_access() is trying to aquire QGM).
>>>>
>>>> Both functions (migration_end() and migration_bitmap_extend())
>>>> are called from main thread which is holding QGM.
>>>>
>>>> Thus there is a race condition that ends up with deadlock:
>>>> main thread     working thread
>>>> Lock QGA                |
>>>> |             Call KVM_EXIT_IO handler
>>>> |                       |
>>>> |        Open rcu reader's critical section
>>>> Migration cleanup bh    |
>>>> |                       |
>>>> synchronize_rcu() is    |
>>>> waiting for readers     |
>>>> |            prepare_mmio_access() is waiting for QGM
>>>>    \                   /
>>>>           deadlock
>>>>
>>>> Patches here are quick and dirty, compile-tested only to validate the
>>>> architectual approach.
>>>>
>>>> Igor, Anna, can you pls start your tests with these patches instead of your
>>>> original one. Thank you.
>>>
>>> Can you give me the backtrace of the working thread?
>>>
>>> I think it is very bad to wait some lock in rcu reader's cirtical section.
>>
>> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x00007f1ef113ccfd in __GI___pthread_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
>> #2  0x00007f1ef3c36546 in qemu_mutex_lock (mutex=0x7f1ef4145ce0 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
>> #3  0x00007f1ef387ff46 in qemu_mutex_lock_iothread () at /home/user/my_qemu/qemu/cpus.c:1170
>> #4  0x00007f1ef38514a2 in prepare_mmio_access (mr=0x7f1ef612f200) at /home/user/my_qemu/qemu/exec.c:2390
>> #5  0x00007f1ef385157e in address_space_rw (as=0x7f1ef40ec940 <address_space_io>, addr=49402, attrs=..., buf=0x7f1ef3f97000 "\001", len=1, is_write=true)
>>      at /home/user/my_qemu/qemu/exec.c:2425
>> #6  0x00007f1ef3897c53 in kvm_handle_io (port=49402, attrs=..., data=0x7f1ef3f97000, direction=1, size=1, count=1) at /home/user/my_qemu/qemu/kvm-all.c:1680
>> #7  0x00007f1ef3898144 in kvm_cpu_exec (cpu=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/kvm-all.c:1849
>> #8  0x00007f1ef387fa91 in qemu_kvm_cpu_thread_fn (arg=0x7f1ef5010fc0) at /home/user/my_qemu/qemu/cpus.c:979
>> #9  0x00007f1ef113a6aa in start_thread (arg=0x7f1eef0b9700) at pthread_create.c:333
>> #10 0x00007f1ef0e6feed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Do you have a test to run in the guest that easily triggers this?
>
> Dave

There are two ways to trigger this. Both need two hosts with 
qemu+libvirt (host0 and host1) configured for migration.

First way:
0. Create a VM on host0 and install CentOS 7.
1. Shut down the VM.
2. Start the VM (virsh start <VM_name>) and right after that start 
migration to host1 (something like 'virsh migrate --live --verbose 
<VM_name> "qemu+ssh://host1/system"').
3. Cancel the migration after ~1 sec (after the migration has started 
but before it completes, for example when you see "Migration: [  5 %]").

Deadlock: no response from the VM and no response from the qemu monitor 
(for example 'virsh qemu-monitor-command --hmp <VM_NAME> "info migrate"' 
hangs indefinitely). Reproduces 9 times out of 10.

Second way:
0. Create a VM with an e1000 network card on host0 and install CentOS 7.
1. Run iperf in the VM (or put any other load on the network).
2. Start migration.
3. Cancel the migration before it completes.

For this approach the e1000 network card is essential because it 
generates KVM_EXIT_MMIO exits.

Igor
>>>
>>>>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Igor Redko <redkoi@virtuozzo.com>
>>>> CC: Anna Melekhova <annam@virtuozzo.com>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>>> CC: Amit Shah <amit.shah@redhat.com>
>>>>
>>>> Denis V. Lunev (2):
>>>>    migration: bitmap_set is unnecessary as bitmap_new uses g_try_malloc0
>>>>    migration: fix deadlock
>>>>
>>>>   migration/ram.c | 45 ++++++++++++++++++++++++++++-----------------
>>>>   1 file changed, 28 insertions(+), 17 deletions(-)
>>>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
Patch

diff --git a/migration/ram.c b/migration/ram.c
index 7f007e6..d01febc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1028,12 +1028,16 @@  static void migration_end(void)
 {
     /* caller have hold iothread lock or is in a bh, so there is
      * no writing race against this migration_bitmap
+     * but RCU is used for more than migration_bitmap, so we must
+     * release the QGM or we may deadlock.
      */
     unsigned long *bitmap = migration_bitmap;
     atomic_rcu_set(&migration_bitmap, NULL);
     if (bitmap) {
         memory_global_dirty_log_stop();
+        qemu_mutex_unlock_iothread();
         synchronize_rcu();
+        qemu_mutex_lock_iothread();
         g_free(bitmap);
     }
 
@@ -1085,7 +1089,9 @@  void migration_bitmap_extend(ram_addr_t old, ram_addr_t new)
         atomic_rcu_set(&migration_bitmap, bitmap);
         qemu_mutex_unlock(&migration_bitmap_mutex);
         migration_dirty_pages += new - old;
+        qemu_mutex_unlock_iothread();
         synchronize_rcu();
+        qemu_mutex_lock_iothread();
         g_free(old_bitmap);
     }
 }