Patchwork qemu-kvm-1.0.1 - unable to exit if vcpu is in infinite loop

Submitter Jan Kiszka
Date Aug. 17, 2012, 2:36 p.m.
Message ID <502E56D3.6060607@siemens.com>
Permalink /patch/178225/
State New

Comments

Jan Kiszka - Aug. 17, 2012, 2:36 p.m.
On 2012-08-17 15:11, Jan Kiszka wrote:
> On 2012-08-06 17:11, Stefan Hajnoczi wrote:
>> On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven <pl@dlhnet.de> wrote:
>>> I debugged my initial problem further and found that the main thread is
>>> stuck in pause_all_vcpus() on reset or quit commands in the monitor if
>>> one cpu is stuck in the do-while loop in kvm_cpu_exec. If I change the
>>> loop condition from while (ret == 0) to while ((ret == 0) && !env->stop);
>>> it works, but is this the right fix? The "quit" command seems to work,
>>> but on "reset" the VM enters the pause state.
>>
>> I think I'm hitting something similar.  I installed a F17 amd64 guest
>> (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
>> The guest seemed unresponsive so I switched to the monitor, which also
>> froze shortly afterwards.  The VNC screen ended up being all black.
>>
>> qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
>> Linux 3.2.0-3-amd64 from Debian testing
>>
>> $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
>> if=virtio,cache=none,file=f17.img,aio=native -serial stdio
>>
>> (gdb) thread apply all bt
>>
>> Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
>> #0  0x00007f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
>> #1  0x00007f80137b92c9 in kvm_vcpu_ioctl
>> (env=env@entry=0x7f8015b49640, type=type@entry=44672)
>>     at /home/stefanha/qemu-kvm/kvm-all.c:1619
>> #2  0x00007f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
>>     at /home/stefanha/qemu-kvm/kvm-all.c:1506
>> #3  0x00007f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
>>     at /home/stefanha/qemu-kvm/cpus.c:756
>> #4  0x00007f800fb4db50 in start_thread (arg=<optimized out>) at
>> pthread_create.c:304
>> #5  0x00007f800f8986dd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #6  0x0000000000000000 in ?? ()
>>
>> This vcpu is still executing guest code and I've seen it successfully
>> dispatching I/O.  The problem is it's missing the exit_request...
>>
>> Thread 2 (Thread 0x7f8008622700 (LWP 368)):
>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>>     at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>> #1  0x00007f801372b229 in qemu_cond_wait (cond=<optimized out>,
>>     mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>> #2  0x00007f8013766eff in qemu_kvm_wait_io_event (env=<optimized out>)
>>     at /home/stefanha/qemu-kvm/cpus.c:724
>> #3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
>> /home/stefanha/qemu-kvm/cpus.c:761
>> #4  0x00007f800fb4db50 in start_thread (arg=<optimized out>) at
>> pthread_create.c:304
>> #5  0x00007f800f8986dd in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #6  0x0000000000000000 in ?? ()
>>
>> No problems here.
>>
>> Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>>     at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>> #1  0x00007f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
>>     mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>> #2  0x00007f8013768949 in pause_all_vcpus () at
>> /home/stefanha/qemu-kvm/cpus.c:962
>> #3  0x00007f80136028c8 in main (argc=<optimized out>, argv=<optimized out>,
>>     envp=<optimized out>) at /home/stefanha/qemu-kvm/vl.c:3695
>>
>> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
>> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
>>
>> Here are the vcpus:
>>
>> (gdb) p first_cpu
>> $6 = (struct CPUX86State *) 0x7f8015b49640
>> (gdb) p first_cpu->next_cpu
>> $7 = (struct CPUX86State *) 0x7f8015b67450
>> (gdb) p first_cpu->next_cpu->next_cpu
>> $8 = (struct CPUX86State *) 0x0
>>
>> (gdb) p first_cpu->stop
>> $9 = 1
>> (gdb) p first_cpu->stopped
>> $10 = 0
>> (gdb) p first_cpu->exit_request
>> $11 = 0
> 
> CPUState::exit_request is only set on specific synchronous events, see
> target-i386/kvm.c.
> 
> More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
> will skip the kicking via a signal. Maybe there is some race. Let me
> think about such possibilities again...
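
A minimal sketch of the kick-suppression pattern described in the quoted text, to make the suspected failure mode concrete. The struct and function names are hypothetical stand-ins, not the actual QEMU code, and the signal delivery is only simulated by a counter. If the flag is ever left set without a signal in flight, every later kick is silently skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vcpu state discussed in the thread. */
struct vcpu_sketch {
    bool thread_kicked;   /* set once a kick has been sent */
    int signals_sent;     /* counts simulated signal deliveries */
};

/* Send a kick only if none is believed to be in flight.
 * Returns true if a (simulated) signal was sent. */
static bool kick_sketch(struct vcpu_sketch *cpu)
{
    assert(cpu != NULL);
    if (cpu->thread_kicked) {
        return false;     /* suppressed: an earlier kick is assumed pending */
    }
    cpu->signals_sent++;  /* stands in for signalling the vcpu thread */
    cpu->thread_kicked = true;
    return true;
}

/* The vcpu thread clears the flag once it has noticed the kick. */
static void kick_consumed_sketch(struct vcpu_sketch *cpu)
{
    assert(cpu != NULL);
    cpu->thread_kicked = false;
}
```

With this shape, a vcpu whose flag is stuck at true (for whatever reason) can never be kicked again, which would match a main thread waiting forever in pause_all_vcpus().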

Can anyone imagine that such a barrier may actually be required? If it
is currently possible that env->stop is evaluated before we called into
sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
signal without properly processing its reason (stop).
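
To illustrate the ordering in question, here is a small single-threaded sketch, not the QEMU implementation. It assumes SIGUSR1 as the kick signal and uses hypothetical helper names; sigprocmask() is used only because the sketch has one thread (threaded code would use pthread_sigmask()). The consumer eats the pending signal with a zero-timeout sigtimedwait() and only afterwards reads the stop flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stop flag; stands in for env->stop. */
static volatile sig_atomic_t stop_requested;

/* Block SIGUSR1 so that it stays pending until explicitly consumed. */
static void block_kick_signal(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    int rc = sigprocmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Consume pending SIGUSR1 without blocking; returns how many were eaten. */
static int eat_kick_signals(void)
{
    sigset_t set;
    struct timespec zero = { 0, 0 };
    int eaten = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    while (sigtimedwait(&set, NULL, &zero) == SIGUSR1) {
        eaten++;
    }
    return eaten;
}

/* Requester side: record the reason first, then send the kick. */
static void request_stop(void)
{
    stop_requested = 1;
    raise(SIGUSR1);
}

/* Consumer side: eat the signal first, then look at the reason. */
static bool eat_then_check_stop(void)
{
    eat_kick_signals();
    return stop_requested != 0;
}
```

The worry is the opposite order: if the flag were read before the signal is consumed, the signal could be eaten while the stale flag value says there is nothing to do.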

Jan
Jan Kiszka - Aug. 17, 2012, 2:41 p.m.
On 2012-08-17 16:36, Jan Kiszka wrote:
> diff --git a/cpus.c b/cpus.c
> index e476a3c..30f3228 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>      }
>  
>      qemu_kvm_eat_signals(env);
> +    /* Ensure that checking env->stop cannot overtake signal processing so
> +     * that we lose the latter without stopping. */
> +    smp_rmb();

rmb is nonsense. Should be a plain barrier() - if at all.

>      qemu_wait_io_event_common(env);
>  }
>  
> Can anyone imagine that such a barrier may actually be required? If it
> is currently possible that env->stop is evaluated before we called into
> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
> signal without properly processing its reason (stop).
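
For readers unfamiliar with the distinction drawn above between smp_rmb() and a plain barrier(), a hedged sketch follows. The macro names are made up for illustration, the compiler barrier uses GCC/Clang inline-asm syntax, and the C11 acquire fence only approximates a read memory barrier; neither is claimed to be QEMU's own definition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Compiler-only barrier: stops the compiler from moving memory accesses
 * across this point. It emits no CPU instruction. (GCC/Clang syntax.) */
#define barrier_sketch() __asm__ __volatile__("" ::: "memory")

/* CPU-level ordering of loads; an acquire fence is used here as an
 * approximation of a read memory barrier. */
#define read_barrier_sketch() atomic_thread_fence(memory_order_acquire)

/* Hypothetical flag; stands in for env->stop. */
static int stop_flag_sketch;

/* Consume an event, then read the flag. The compiler barrier forces the
 * flag to be loaded after the event has been accounted for. */
static int consume_then_read(int *events_consumed)
{
    assert(events_consumed != NULL);
    (*events_consumed)++;   /* stands in for eating the signal */
    barrier_sketch();
    return stop_flag_sketch;
}
```

A compiler barrier is enough when the only risk is the compiler reordering or caching the load within one thread; a CPU-level barrier matters only when ordering against another processor's stores is needed and no lock already provides it.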

Jan
Jan Kiszka - Aug. 17, 2012, 3:04 p.m.
On 2012-08-17 16:41, Jan Kiszka wrote:
> On 2012-08-17 16:36, Jan Kiszka wrote:
>> diff --git a/cpus.c b/cpus.c
>> index e476a3c..30f3228 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>>      }
>>  
>>      qemu_kvm_eat_signals(env);
>> +    /* Ensure that checking env->stop cannot overtake signal processing so
>> +     * that we lose the latter without stopping. */
>> +    smp_rmb();
> 
> rmb is nonsense. Should be a plain barrier() - if at all.
> 
>>      qemu_wait_io_event_common(env);
>>  }
>>  
>> Can anyone imagine that such a barrier may actually be required? If it
>> is currently possible that env->stop is evaluated before we called into
>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>> signal without properly processing its reason (stop).

Should not be required (TM): Both signal eating / stop checking and stop
setting / signal generation happen under the BQL, thus the ordering
must not make a difference here.
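
The argument can be sketched as follows, under the assumption that a single mutex plays the role of the BQL and that the pending kick is modelled as a boolean. All names are hypothetical. Because both sides change or inspect the pair (stop, pending kick) only while holding the lock, the consumer can never observe a pending kick without also seeing stop set, regardless of the order of statements inside either critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the big QEMU lock (BQL). */
static pthread_mutex_t big_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static bool stop_sketch;          /* stands in for env->stop */
static bool kick_pending_sketch;  /* stands in for the pending signal */

/* Requester: set stop and generate the kick, both under the lock. */
static void request_stop_locked(void)
{
    pthread_mutex_lock(&big_lock_sketch);
    stop_sketch = true;
    kick_pending_sketch = true;
    pthread_mutex_unlock(&big_lock_sketch);
}

/* Consumer: eat the kick and check stop, both under the lock. */
static bool eat_and_check_locked(void)
{
    bool must_stop;

    pthread_mutex_lock(&big_lock_sketch);
    /* Invariant: a pending kick implies stop is already visible. */
    assert(!kick_pending_sketch || stop_sketch);
    kick_pending_sketch = false;
    must_stop = stop_sketch;
    pthread_mutex_unlock(&big_lock_sketch);
    return must_stop;
}
```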

Don't see where we could lose a signal. Maybe due to a subtle memory
corruption that sets thread_kicked to non-zero, preventing the kicking
this way.

Jan
Avi Kivity - Aug. 19, 2012, 9:42 a.m.
On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>  
>>> Can anyone imagine that such a barrier may actually be required? If it
>>> is currently possible that env->stop is evaluated before we called into
>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>> signal without properly processing its reason (stop).
> 
> Should not be required (TM): Both signal eating / stop checking and stop
> setting / signal generation happens under the BQL, thus the ordering
> must not make a difference here.

Agree.


> Don't see where we could lose a signal. Maybe due to a subtle memory
> corruption that sets thread_kicked to non-zero, preventing the kicking
> this way.

Cannot be ruled out, yet too much of a coincidence.

Could be a kernel bug (either in kvm or elsewhere), we've had several
before in this area.

Is this reproducible?
Jan Kiszka - Aug. 21, 2012, 7:21 a.m.
On 2012-08-19 11:42, Avi Kivity wrote:
> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>  
>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>> is currently possible that env->stop is evaluated before we called into
>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>> signal without properly processing its reason (stop).
>>
>> Should not be required (TM): Both signal eating / stop checking and stop
>> setting / signal generation happens under the BQL, thus the ordering
>> must not make a difference here.
> 
> Agree.
> 
> 
>> Don't see where we could lose a signal. Maybe due to a subtle memory
>> corruption that sets thread_kicked to non-zero, preventing the kicking
>> this way.
> 
> Cannot be ruled out, yet too much of a coincidence.
> 
> Could be a kernel bug (either in kvm or elsewhere), we've had several
> before in this area.
> 
> Is this reproducible?

Not for me. Stefan only hit it very rarely, Peter obviously more easily.

Jan
Stefan Hajnoczi - Aug. 21, 2012, 8:23 a.m.
On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-08-19 11:42, Avi Kivity wrote:
>> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>>
>>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>>> is currently possible that env->stop is evaluated before we called into
>>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>>> signal without properly processing its reason (stop).
>>>
>>> Should not be required (TM): Both signal eating / stop checking and stop
>>> setting / signal generation happens under the BQL, thus the ordering
>>> must not make a difference here.
>>
>> Agree.
>>
>>
>>> Don't see where we could lose a signal. Maybe due to a subtle memory
>>> corruption that sets thread_kicked to non-zero, preventing the kicking
>>> this way.
>>
>> Cannot be ruled out, yet too much of a coincidence.
>>
>> Could be a kernel bug (either in kvm or elsewhere), we've had several
>> before in this area.
>>
>> Is this reproducible?
>
> Not for me. Stefan only hit it very rarely, Peter obviously more easily.

I have only hit this once and was not able to reproduce it.

Stefan
Peter Lieven - Aug. 22, 2012, 12:52 p.m.
On 08/21/12 10:23, Stefan Hajnoczi wrote:
> On Tue, Aug 21, 2012 at 8:21 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-08-19 11:42, Avi Kivity wrote:
>>> On 08/17/2012 06:04 PM, Jan Kiszka wrote:
>>>>>> Can anyone imagine that such a barrier may actually be required? If it
>>>>>> is currently possible that env->stop is evaluated before we called into
>>>>>> sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>>>>>> signal without properly processing its reason (stop).
>>>> Should not be required (TM): Both signal eating / stop checking and stop
>>>> setting / signal generation happens under the BQL, thus the ordering
>>>> must not make a difference here.
>>> Agree.
>>>
>>>
>>>> Don't see where we could lose a signal. Maybe due to a subtle memory
>>>> corruption that sets thread_kicked to non-zero, preventing the kicking
>>>> this way.
>>> Cannot be ruled out, yet too much of a coincidence.
>>>
>>> Could be a kernel bug (either in kvm or elsewhere), we've had several
>>> before in this area.
>>>
>>> Is this reproducible?
>> Not for me. Stefan only hit it very rarely, Peter obviously more easily.
> I have only hit this once and was not able to reproduce it.
For me it was very reproducible, but my issue was fixed by:

http://www.mail-archive.com/kvm@vger.kernel.org/msg70908.html

Never seen this since then,
Peter

> Stefan

Patch

diff --git a/cpus.c b/cpus.c
index e476a3c..30f3228 100644
--- a/cpus.c
+++ b/cpus.c
@@ -726,6 +726,9 @@  static void qemu_kvm_wait_io_event(CPUArchState *env)
     }
 
     qemu_kvm_eat_signals(env);
+    /* Ensure that checking env->stop cannot overtake signal processing so
+     * that we lose the latter without stopping. */
+    smp_rmb();
     qemu_wait_io_event_common(env);
 }