diff mbox

Re: [PATCH] fix halt emulation with icount and CONFIG_IOTHREAD (v2)

Message ID 4D5B99A9.1010404@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Feb. 16, 2011, 9:32 a.m. UTC
On 02/15/2011 09:56 PM, Marcelo Tosatti wrote:
> Note: to be applied to uq/master.
>
> In icount mode, halt emulation should take into account the nearest
> event when sleeping.

I agree with Jan that this patch is not the best solution, if not incorrect.

However, in the iothread, the main loop can kick the VCPU thread instead 
of running cpu_exec_all like it does in non-iothread mode.  Something 
like this:


I don't like this 100% because it relies on the fact that there is only 
one TCG execution thread.  In a multithreaded world you would:

1) have each CPU register its own instruction counter;

2) have each CPU register its own QEMU_CLOCK_REALTIME timer based on 
qemu_icount_delta() and arm it just before going to sleep; the timer 
kicks the CPU.

3) remove all icount business from qemu_calculate_timeout.

Item (3) is what makes me prefer my patch above (if it works) to 
Marcelo's.  Marcelo's patch is tying even more qemu_calculate_timeout to 
the icount.  So if anything, a patch tweaking the timedwait like 
Marcelo's should use something based on qemu_icount_delta().

Paolo

Comments

Jan Kiszka Feb. 16, 2011, 9:46 a.m. UTC | #1
On 2011-02-16 10:32, Paolo Bonzini wrote:
> On 02/15/2011 09:56 PM, Marcelo Tosatti wrote:
>> Note: to be applied to uq/master.
>>
>> In icount mode, halt emulation should take into account the nearest
>> event when sleeping.
> 
> I agree with Jan that this patch is not the best solution, if not
> incorrect.
> 
> However, in the iothread, the main loop can kick the VCPU thread instead
> of running cpu_exec_all like it does in non-iothread mode.  Something
> like this:
> 
> diff --git a/vl.c b/vl.c
> index b436952..7835317 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1425,7 +1425,9 @@ static void main_loop(void)
>      qemu_main_loop_start();
> 
>      for (;;) {
> -#ifndef CONFIG_IOTHREAD
> +#ifdef CONFIG_IOTHREAD
> +        qemu_cpu_kick(first_cpu);
> +#else
>          nonblocking = cpu_exec_all();
>          if (vm_request_pending()) {
>              nonblocking = true;

What should this be good for? The iothread already kicks the vcpu if it
wants to acquire the contended global mutex. And when the vcpu thread is
in halt state, kicking it should change no other state.

> 
> I don't like this 100% because it relies on the fact that there is only
> one TCG execution thread.  In a multithreaded world you would:
> 
> 1) have each CPU register its own instruction counter;
> 
> 2) have each CPU register its own QEMU_CLOCK_REALTIME timer based on
> qemu_icount_delta() and arm it just before going to sleep; the timer
> kicks the CPU.
> 
> 3) remove all icount business from qemu_calculate_timeout.
> 
> Item (3) is what makes me prefer my patch above (if it works) to
> Marcelo's.  Marcelo's patch is tying even more qemu_calculate_timeout to
> the icount.  So if anything, a patch tweaking the timedwait like
> Marcelo's should use something based on qemu_icount_delta().

Really, that idle loop apparently does _nothing_ while
all_cpu_threads_idle is true. Or does the IPI signal handler apply some
magic? I still don't get what is supposed to be fixed in
qemu_tcg_wait_io_event.

Jan
Paolo Bonzini Feb. 16, 2011, 9:57 a.m. UTC | #2
On 02/16/2011 10:46 AM, Jan Kiszka wrote:
> What should this be good for? The iothread already kicks the vcpu if it
> wants to acquire the contended global mutex.

Assuming the VCPU is in the timedwait that Marcelo changed, the global 
mutex is free and the iothread will not kick the VCPU.

> And when the vcpu thread is
> in halt state, kicking it should change no other state.

Kicking the VCPU will start running it, if an interrupt request from the 
devices caused cpu_has_work to become true (and hence 
all_cpu_threads_idle to become false).

So, perhaps the correct fix is to kick the cpu in cpu_interrupt, and all 
I wrote about timeouts and timers is wrong.  My patch would band-aid it.

Paolo
Jan Kiszka Feb. 16, 2011, 10:04 a.m. UTC | #3
On 2011-02-16 10:57, Paolo Bonzini wrote:
> On 02/16/2011 10:46 AM, Jan Kiszka wrote:
>> What should this be good for? The iothread already kicks the vcpu if it
>> wants to acquire the contended global mutex.
> 
> Assuming the VCPU is in the timedwait that Marcelo changed, the global 
> mutex is free and the iothread will not kick the VCPU.

Then why should it kick it at all?

> 
>> And when the vcpu thread is
>> in halt state, kicking it should change no other state.
> 
> Kicking the VCPU will start running it, if an interrupt request from the 
> devices caused cpu_has_work to become true (and hence 
> all_cpu_threads_idle to become false).

If we change the halt condition, we should not kick the vcpus but only
signal the condition variable. Actually, I've a patch queued that skips
pointless qemu_thread_signal in qemu_cpu_kick for TCG.

> 
> So, perhaps the correct fix is to kick the cpu in cpu_interrupt, and all 
> I wrote about timeouts and timers is wrong.  My patch would band-aid it.

That's my strong suspect. We really need to understand what goes wrong.

Jan
Paolo Bonzini Feb. 16, 2011, 10:27 a.m. UTC | #4
On 02/16/2011 11:04 AM, Jan Kiszka wrote:
> On 2011-02-16 10:57, Paolo Bonzini wrote:
>> On 02/16/2011 10:46 AM, Jan Kiszka wrote:
>>> What should this be good for? The iothread already kicks the vcpu if it
>>> wants to acquire the contended global mutex.
>>
>> Assuming the VCPU is in the timedwait that Marcelo changed, the global
>> mutex is free and the iothread will not kick the VCPU.
>
> Then why should it kick it at all?

To make it notice something changed in all_cpu_threads_idle---but that's 
wrong, it should have been kicked in cpu_interrupt.

> If we change the halt condition, we should not kick the vcpus but only
> signal the condition variable. Actually, I've a patch queued that skips
> pointless qemu_thread_signal in qemu_cpu_kick for TCG.

Yes, I was kicking just because that's the wrapper that is used to 
signal the condition variable---just like I was kicking in my patches to 
eliminate timedwait.

>> So, perhaps the correct fix is to kick the cpu in cpu_interrupt, and all
>> I wrote about timeouts and timers is wrong.  My patch would band-aid it.
>
> That's my strong suspect. We really need to understand what goes wrong.

I agree on both counts.

Paolo
Jan Kiszka Feb. 16, 2011, 10:34 a.m. UTC | #5
On 2011-02-16 11:27, Paolo Bonzini wrote:
> On 02/16/2011 11:04 AM, Jan Kiszka wrote:
>> On 2011-02-16 10:57, Paolo Bonzini wrote:
>>> On 02/16/2011 10:46 AM, Jan Kiszka wrote:
>>>> What should this be good for? The iothread already kicks the vcpu if it
>>>> wants to acquire the contended global mutex.
>>>
>>> Assuming the VCPU is in the timedwait that Marcelo changed, the global
>>> mutex is free and the iothread will not kick the VCPU.
>>
>> Then why should it kick it at all?
> 
> To make it notice something changed in all_cpu_threads_idle---but that's 
> wrong, it should have been kicked in cpu_interrupt.
> 
>> If we change the halt condition, we should not kick the vcpus but only
>> signal the condition variable. Actually, I've a patch queued that skips
>> pointless qemu_thread_signal in qemu_cpu_kick for TCG.
> 
> Yes, I was kicking just because that's the wrapper that is used to 
> signal the condition variable---just like I was kicking in my patches to 
> eliminate timedwait.
> 
>>> So, perhaps the correct fix is to kick the cpu in cpu_interrupt, and all
>>> I wrote about timeouts and timers is wrong.  My patch would band-aid it.
>>
>> That's my strong suspect. We really need to understand what goes wrong.
> 
> I agree on both counts.
> 

FWIW, I've rebased most of your patches on top of my outstanding ones
and pushed them to

git://git.kiszka.org/qemu-kvm.git queues/kvm-upstream

Jan
Paolo Bonzini Feb. 16, 2011, 11:05 a.m. UTC | #6
On 02/16/2011 11:34 AM, Jan Kiszka wrote:
> FWIW, I've rebased most of your patches on top of my outstanding ones
> and pushed them to
>
> git://git.kiszka.org/qemu-kvm.git queues/kvm-upstream

Yep, I am waiting for Anthony to actually push it.  In the meanwhile I 
have it at git://github.com/bonzini/qemu, branch iothread-win32.

Paolo
Marcelo Tosatti Feb. 17, 2011, 3:15 a.m. UTC | #7
On Wed, Feb 16, 2011 at 10:32:25AM +0100, Paolo Bonzini wrote:
> On 02/15/2011 09:56 PM, Marcelo Tosatti wrote:
> >Note: to be applied to uq/master.
> >
> >In icount mode, halt emulation should take into account the nearest
> >event when sleeping.
> 
> I agree with Jan that this patch is not the best solution, if not incorrect.
> 
> However, in the iothread, the main loop can kick the VCPU thread
> instead of running cpu_exec_all like it does in non-iothread mode.
> Something like this:
> 
> diff --git a/vl.c b/vl.c
> index b436952..7835317 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1425,7 +1425,9 @@ static void main_loop(void)
>      qemu_main_loop_start();
> 
>      for (;;) {
> -#ifndef CONFIG_IOTHREAD
> +#ifdef CONFIG_IOTHREAD
> +        qemu_cpu_kick(first_cpu);
> +#else
>          nonblocking = cpu_exec_all();
>          if (vm_request_pending()) {
>              nonblocking = true;
> 
> I don't like this 100% because it relies on the fact that there is
> only one TCG execution thread.  In a multithreaded world you would:
> 
> 1) have each CPU register its own instruction counter;
> 
> 2) have each CPU register its own QEMU_CLOCK_REALTIME timer based on
> qemu_icount_delta() and arm it just before going to sleep; the timer
> kicks the CPU.
> 
> 3) remove all icount business from qemu_calculate_timeout.
> 
> Item (3) is what makes me prefer my patch above (if it works) to
> Marcelo's.  Marcelo's patch is tying even more
> qemu_calculate_timeout to the icount.  So if anything, a patch
> tweaking the timedwait like Marcelo's should use something based on
> qemu_icount_delta().

Yes, using qemu_icount_delta directly in tcg_wait_io_event timedwait 
is explicit (partially the reason for confusion with my patch).

So the reasoning for the patch is:

With icount vm_timer timers expire on virtual CPU time. If a CPU halts,
you cannot expect passage of realtime to trigger vm_timers expiration.

So instead vm_timer expiration is converted to realtime, and used as
halt timeout.
Paolo Bonzini Feb. 17, 2011, 8:27 a.m. UTC | #8
On 02/17/2011 04:15 AM, Marcelo Tosatti wrote:
> Yes, using qemu_icount_delta directly in tcg_wait_io_event timedwait
> is explicit (partially the reason for confusion with my patch).
>
> So the reasoning for the patch is:
>
> With icount vm_timer timers expire on virtual CPU time. If a CPU halts,
> you cannot expect passage of realtime to trigger vm_timers expiration.

But if a CPU is halted, all_cpu_threads_idle() will still be true even 
if you signal the condition variable, and you'll be looping in the while 
condition.  That's why I say that

    while (x) {
        cond_timedwait (cond, mutex);
    }

(i.e. without checking the return value of cond_timedwait, and without 
polling something else upon return) is a broken idiom that can only work 
around missing signals/broadcasts.

> So instead vm_timer expiration is converted to realtime, and used as
> halt timeout.

But vm_timer timers are only used by "-icount auto", which works in my 
tests [*].  It is "-icount N" which is broken and unfortunately your 
patch does not fix it.  The CRIS image on qemu.org triggers the watchdog 
(and if I eliminate the watchdog I see that the CPU is hung).

     [*] Actually, it works but doesn't calibrate very well.  It shows
         25 bogomips, sometimes 50, compared to 250 without iothread.

Paolo
Jan Kiszka Feb. 17, 2011, 8:29 a.m. UTC | #9
On 2011-02-17 04:15, Marcelo Tosatti wrote:
> On Wed, Feb 16, 2011 at 10:32:25AM +0100, Paolo Bonzini wrote:
>> On 02/15/2011 09:56 PM, Marcelo Tosatti wrote:
>>> Note: to be applied to uq/master.
>>>
>>> In icount mode, halt emulation should take into account the nearest
>>> event when sleeping.
>>
>> I agree with Jan that this patch is not the best solution, if not incorrect.
>>
>> However, in the iothread, the main loop can kick the VCPU thread
>> instead of running cpu_exec_all like it does in non-iothread mode.
>> Something like this:
>>
>> diff --git a/vl.c b/vl.c
>> index b436952..7835317 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -1425,7 +1425,9 @@ static void main_loop(void)
>>      qemu_main_loop_start();
>>
>>      for (;;) {
>> -#ifndef CONFIG_IOTHREAD
>> +#ifdef CONFIG_IOTHREAD
>> +        qemu_cpu_kick(first_cpu);
>> +#else
>>          nonblocking = cpu_exec_all();
>>          if (vm_request_pending()) {
>>              nonblocking = true;
>>
>> I don't like this 100% because it relies on the fact that there is
>> only one TCG execution thread.  In a multithreaded world you would:
>>
>> 1) have each CPU register its own instruction counter;
>>
>> 2) have each CPU register its own QEMU_CLOCK_REALTIME timer based on
>> qemu_icount_delta() and arm it just before going to sleep; the timer
>> kicks the CPU.
>>
>> 3) remove all icount business from qemu_calculate_timeout.
>>
>> Item (3) is what makes me prefer my patch above (if it works) to
>> Marcelo's.  Marcelo's patch is tying even more
>> qemu_calculate_timeout to the icount.  So if anything, a patch
>> tweaking the timedwait like Marcelo's should use something based on
>> qemu_icount_delta().
> 
> Yes, using qemu_icount_delta directly in tcg_wait_io_event timedwait 
> is explicit (partially the reason for confusion with my patch).
> 
> So the reasoning for the patch is:
> 
> With icount vm_timer timers expire on virtual CPU time. If a CPU halts,
> you cannot expect passage of realtime to trigger vm_timers expiration.
> 
> So instead vm_timer expiration is converted to realtime, and used as
> halt timeout.

The changing the calculation is trying to cure a symptom. A halt with
timeout is already broken, but we fortunately have a patch against that.
Let's shake potential remaining bugs out of *that*.

Jan
Paolo Bonzini Feb. 18, 2011, 5:13 p.m. UTC | #10
On 02/17/2011 09:27 AM, Paolo Bonzini wrote:
> It is "-icount N" which is broken and unfortunately your patch does not
> fix it.

The problem is that for "use_icount == 1" qemu_icount_delta always 
returns 0, and this makes no sense in the iothread case.  As soon as the 
delta becomes greater than 10 ms (the maximum adjustment in 
qemu_calculate_timeout) you just keep polling but hardly execute any code.

I'll try to post something in the weekend.

Paolo
diff mbox

Patch

diff --git a/vl.c b/vl.c
index b436952..7835317 100644
--- a/vl.c
+++ b/vl.c
@@ -1425,7 +1425,9 @@  static void main_loop(void)
      qemu_main_loop_start();

      for (;;) {
-#ifndef CONFIG_IOTHREAD
+#ifdef CONFIG_IOTHREAD
+        qemu_cpu_kick(first_cpu);
+#else
          nonblocking = cpu_exec_all();
          if (vm_request_pending()) {
              nonblocking = true;