
Hang with migration multi-thread compression under high load

Message ID 20160428101527.GG1797@redhat.com
State New

Commit Message

Daniel P. Berrangé April 28, 2016, 10:15 a.m. UTC
On Wed, Apr 27, 2016 at 03:20:23PM +0100, Daniel P. Berrange wrote:
> I've been testing various features of migration and have hit a problem
> with the multi-thread compression. It works fine when I have 2 or more
> threads, but if I tell it to only use a single thread, then it almost
> always hangs

[snip]

> Now the target QEMU shows this stack trace:

[snip]

> Thread 2 (Thread 0x7f8677bed700 (LWP 4657)):
> #0  0x00007f86eb035b10 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
> #1  0x0000560ecd8a2709 in qemu_cond_wait (cond=cond@entry=0x560ed03918a0, mutex=mutex@entry=0x560ed0391878) at util/qemu-thread-posix.c:123
> #2  0x0000560ecd5b422d in do_data_decompress (opaque=0x560ed0391870) at /home/berrange/src/virt/qemu/migration/ram.c:2195
> #3  0x00007f86eb03060a in start_thread () at /lib64/libpthread.so.0
> #4  0x00007f86ead6aa4d in clone () at /lib64/libc.so.6
> 
> Thread 1 (Thread 0x7f86f767dc80 (LWP 4651)):
> #0  0x0000560ecd5b744f in ram_load (len=711, host=0x7f8677e06000, f=0x560ed03a7950) at /home/berrange/src/virt/qemu/migration/ram.c:2263
> #1  0x0000560ecd5b744f in ram_load (f=0x560ed03a7950, opaque=<optimized out>, version_id=<optimized out>)
>     at /home/berrange/src/virt/qemu/migration/ram.c:2513
> #2  0x0000560ecd5b8b87 in vmstate_load (f=0x560ed03a7950, se=0x560ece731f90, version_id=4)
>     at /home/berrange/src/virt/qemu/migration/savevm.c:643
> #3  0x0000560ecd5b8fc3 in qemu_loadvm_state_main (mis=0x560ece75c330, f=0x560ed03a7950) at /home/berrange/src/virt/qemu/migration/savevm.c:1819
> #4  0x0000560ecd5b8fc3 in qemu_loadvm_state_main (f=f@entry=0x560ed03a7950, mis=mis@entry=0x560ece75c330)
>     at /home/berrange/src/virt/qemu/migration/savevm.c:1850
> #5  0x0000560ecd5bbd36 in qemu_loadvm_state (f=f@entry=0x560ed03a7950) at /home/berrange/src/virt/qemu/migration/savevm.c:1911
> #6  0x0000560ecd7b1b2f in process_incoming_migration_co (opaque=0x560ed03a7950) at migration/migration.c:384
> #7  0x0000560ecd8b1eba in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> #8  0x00007f86eacb0f30 in __start_context () at /lib64/libc.so.6
> #9  0x00007ffc877e49c0 in  ()
> 
> 
> for some reason it isn't shown in the stack trace for thread
> 1 above, but when initially connecting GDB it says the main thread
> is at:
> 
> decompress_data_with_multi_threads (len=702, host=0x7fd78fe06000, f=0x55901af09950) at /home/berrange/src/virt/qemu/migration/ram.c:2254
> 2254	        for (idx = 0; idx < thread_count; idx++) {
> 
> 
> Looking at the target QEMU, we see the do_data_decompress method
> is waiting on a condition variable:
> 
>         while (!param->start && !quit_decomp_thread) {
>             qemu_cond_wait(&param->cond, &param->mutex);
>             ... do stuff ...
>             param->start = false;
>         }
> 
> 
> Now decompress_data_with_multi_threads() is checking param->start
> without holding the param->mutex lock.

Ok, so I've investigated this further and decompress_data_with_multi_threads
is where I believe the flaw is. Its code is:

static void decompress_data_with_multi_threads(QEMUFile *f,
                                               void *host, int len)
{
    int idx, thread_count;

    thread_count = migrate_decompress_threads();
    while (true) {
        for (idx = 0; idx < thread_count; idx++) {
            if (!decomp_param[idx].start) {
                qemu_get_buffer(f, decomp_param[idx].compbuf, len);
                decomp_param[idx].des = host;
                decomp_param[idx].len = len;
                start_decompression(&decomp_param[idx]);
                break;
            }
        }
        if (idx < thread_count) {
            break;
        }
    }
}
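
For reference, here is a rough sketch of the consumer side of that handshake
(a paraphrase of the do_data_decompress pseudocode quoted above, not the
verbatim QEMU source); the worker clears 'start' under its mutex, while the
loop above polls it with no locking at all:

static void *do_data_decompress_sketch(void *opaque)
{
    DecompressParam *param = opaque;

    while (!quit_decomp_thread) {
        qemu_mutex_lock(&param->mutex);
        while (!param->start && !quit_decomp_thread) {
            qemu_cond_wait(&param->cond, &param->mutex);
            if (!quit_decomp_thread) {
                /* ... decompress param->compbuf into param->des ... */
            }
            /* cleared under param->mutex, but polled lock-free by
             * decompress_data_with_multi_threads() above */
            param->start = false;
        }
        qemu_mutex_unlock(&param->mutex);
    }
    return NULL;
}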


Looking at the disassembly for the start of that method, we have:

        for (idx = 0; idx < thread_count; idx++) {
    358a:       45 85 f6                test   %r14d,%r14d
    358d:       7e fb                   jle    358a <ram_load+0x49a>
            if (!decomp_param[idx].start) {
    358f:       45 31 e4                xor    %r12d,%r12d
    3592:       40 84 ff                test   %dil,%dil
    3595:       48 8d 42 78             lea    0x78(%rdx),%rax
    3599:       0f 84 b1 04 00 00       je     3a50 <ram_load+0x960>
    359f:       90                      nop


Now asm is not my strong suit, but IIUC, that shows that the access
to 'decomp_param[idx].start' is *not* re-reading the actual struct
field from memory, but instead testing a copy cached in the %dil
register. This is valid because we've not told the compiler that
this struct field can be modified by multiple threads at once.
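
As a stand-alone illustration of that effect (ordinary C11, nothing
QEMU-specific), a flag polled without any synchronisation can legally be
hoisted into a register by the optimiser, whereas an atomic load forces a
fresh read on every iteration:

#include <stdatomic.h>
#include <stdbool.h>

bool start;                /* plain flag shared between two threads   */
atomic_bool start_atomic;  /* the same flag declared as a C11 atomic  */

void wait_until_idle_plain(void)
{
    /* The compiler may read 'start' once, keep the value in a register
     * and spin on the stale copy forever, just like the %dil case above. */
    while (start) {
    }
}

void wait_until_idle_atomic(void)
{
    /* atomic_load() forces a re-read from memory on every iteration and
     * also provides the ordering guarantees a barrier would give. */
    while (atomic_load(&start_atomic)) {
    }
}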

So when the corresponding do_data_decompress() method sets
'decomp_param[idx].start = false', this is never seen by the
decompress_data_with_multi_threads() method.

If decompress_data_with_multi_threads() used a mutex lock
around its access to 'decomp_param[idx].start', then there
would be a memory barrier and the code would not be checking
the value cached in the %dil register.
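
For contrast, the mutex-based variant hinted at above might look roughly like
this (illustrative only, not a proposal; as noted next, it is not what we want
here, since every poll would then contend with the decompression thread):

        for (idx = 0; idx < thread_count; idx++) {
            bool busy;

            qemu_mutex_lock(&decomp_param[idx].mutex);
            busy = decomp_param[idx].start;
            qemu_mutex_unlock(&decomp_param[idx].mutex);
            if (!busy) {
                qemu_get_buffer(f, decomp_param[idx].compbuf, len);
                decomp_param[idx].des = host;
                decomp_param[idx].len = len;
                start_decompression(&decomp_param[idx]);
                break;
            }
        }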

Now we don't want a mutex lock in this code, but we do still
need to have a memory barrier here, so I think we need this
patch:

diff --git a/migration/ram.c b/migration/ram.c
index be0233f..fddc136 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2260,6 +2260,7 @@ static void decompress_data_with_multi_threads(QEMUFile *f,
     thread_count = migrate_decompress_threads();
     while (true) {
         for (idx = 0; idx < thread_count; idx++) {
+            smp_mb();
             if (!decomp_param[idx].start) {
                 qemu_get_buffer(f, decomp_param[idx].compbuf, len);
                 decomp_param[idx].des = host;

Running my test case with that applied certainly makes migration run as
normal, without the hangs I see currently.


Regards,
Daniel

Comments

Li, Liang Z April 29, 2016, 4:51 a.m. UTC | #1
> On Wed, Apr 27, 2016 at 03:20:23PM +0100, Daniel P. Berrange wrote:
> > I've been testing various features of migration and have hit a problem
> > with the multi-thread compression. It works fine when I have 2 or more
> > threads, but if I tell it to only use a single thread, then it almost
> > always hangs
> 
> [snip]
> 
> Now we don't want a mutex lock in this code, but we do still need to have a
> memory barrier here, so I think we need this patch:
> 
> [snip]
> 
> Running my test case with that applied certainly makes migration run as
> normal without the hangs I see currently

Yes, your patch can fix the issue, but the busy loop is not good; we have to fix this.

Here is a patch which follows the pattern of the compression code. I think it can fix this issue too.

Could you help to test whether it works in your environment?

Liang
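
The patch Liang refers to is not included on this page, but following "the
pattern of the compression code" presumably means replacing the busy loop
with a condition-variable handshake. Purely as an illustrative sketch (the
decomp_done_lock / decomp_done_cond names and the 'done' flag are assumptions,
not the actual proposal), it could look something like:

static void decompress_data_with_multi_threads_sketch(QEMUFile *f,
                                                      void *host, int len)
{
    int idx, thread_count;

    thread_count = migrate_decompress_threads();
    qemu_mutex_lock(&decomp_done_lock);
    while (true) {
        for (idx = 0; idx < thread_count; idx++) {
            if (decomp_param[idx].done) {          /* worker is idle */
                decomp_param[idx].done = false;
                qemu_get_buffer(f, decomp_param[idx].compbuf, len);
                decomp_param[idx].des = host;
                decomp_param[idx].len = len;
                /* wake the worker; it sets .done again and signals
                 * decomp_done_cond once the page is decompressed */
                start_decompression(&decomp_param[idx]);
                break;
            }
        }
        if (idx < thread_count) {
            break;
        }
        /* all workers busy: sleep instead of spinning */
        qemu_cond_wait(&decomp_done_cond, &decomp_done_lock);
    }
    qemu_mutex_unlock(&decomp_done_lock);
}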



Patch

diff --git a/migration/ram.c b/migration/ram.c
index be0233f..fddc136 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2260,6 +2260,7 @@  static void decompress_data_with_multi_threads(QEMUFile *f,
     thread_count = migrate_decompress_threads();
     while (true) {
         for (idx = 0; idx < thread_count; idx++) {
+            smp_mb();
             if (!decomp_param[idx].start) {
                 qemu_get_buffer(f, decomp_param[idx].compbuf, len);
                 decomp_param[idx].des = host;