Patchwork [14/22] kvm: Fix race between timer signals and vcpu entry under !IOTHREAD

login
register
mail settings
Submitter Jan Kiszka
Date Jan. 27, 2011, 1:09 p.m.
Message ID <c576bef42c459463e4cd18f61f845a32fd347b60.1296133797.git.jan.kiszka@siemens.com>
Download mbox | patch
Permalink /patch/80676/
State New
Headers show

Comments

Jan Kiszka - Jan. 27, 2011, 1:09 p.m.
Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
checking for exit_request on vcpu entry and timer signals arriving
before KVM starts to catch them. Plug it by blocking both timer related
signals also on !CONFIG_IOTHREAD and process those via signalfd.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 cpus.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)
Marcelo Tosatti - Feb. 1, 2011, 12:47 p.m.
On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
> checking for exit_request on vcpu entry and timer signals arriving
> before KVM starts to catch them. Plug it by blocking both timer related
> signals also on !CONFIG_IOTHREAD and process those via signalfd.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  cpus.c |   18 ++++++++++++++++++
>  1 files changed, 18 insertions(+), 0 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index fc3f222..29b1070 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>      pthread_sigmask(SIG_BLOCK, NULL, &set);
>      sigdelset(&set, SIG_IPI);
>      sigdelset(&set, SIGBUS);
> +#ifndef CONFIG_IOTHREAD
> +    sigdelset(&set, SIGIO);
> +    sigdelset(&set, SIGALRM);
> +#endif

I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
section.

>      r = kvm_set_signal_mask(env, &set);
>      if (r) {
>          fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(-r));
> @@ -351,6 +355,12 @@ static void qemu_kvm_eat_signals(CPUState *env)
>              exit(1);
>          }
>      } while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
> +
> +#ifndef CONFIG_IOTHREAD
> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
> +        qemu_notify_event();
> +    }
> +#endif

Why is this necessary?

You should break out of cpu_exec_all if there's a pending alarm (see
qemu_alarm_pending()).

>  }
>  
>  #else /* _WIN32 */
> @@ -398,6 +408,14 @@ int qemu_init_main_loop(void)
>      int ret;
>  
>      sigemptyset(&blocked_signals);
> +    if (kvm_enabled()) {
> +        /*
> +         * We need to process timer signals synchronously to avoid a race
> +         * between exit_request check and KVM vcpu entry.
> +         */
> +        sigaddset(&blocked_signals, SIGIO);
> +        sigaddset(&blocked_signals, SIGALRM);
> +    }

A block_io_signals() function for !IOTHREAD would be nicer.

>  
>      ret = qemu_signalfd_init(blocked_signals);
>      if (ret) {
> -- 
> 1.7.1
Jan Kiszka - Feb. 1, 2011, 1:32 p.m.
On 2011-02-01 13:47, Marcelo Tosatti wrote:
> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>> checking for exit_request on vcpu entry and timer signals arriving
>> before KVM starts to catch them. Plug it by blocking both timer related
>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>>  cpus.c |   18 ++++++++++++++++++
>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index fc3f222..29b1070 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>      pthread_sigmask(SIG_BLOCK, NULL, &set);
>>      sigdelset(&set, SIG_IPI);
>>      sigdelset(&set, SIGBUS);
>> +#ifndef CONFIG_IOTHREAD
>> +    sigdelset(&set, SIGIO);
>> +    sigdelset(&set, SIGALRM);
>> +#endif
> 
> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
> section.

You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?

> 
>>      r = kvm_set_signal_mask(env, &set);
>>      if (r) {
>>          fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(-r));
>> @@ -351,6 +355,12 @@ static void qemu_kvm_eat_signals(CPUState *env)
>>              exit(1);
>>          }
>>      } while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
>> +
>> +#ifndef CONFIG_IOTHREAD
>> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>> +        qemu_notify_event();
>> +    }
>> +#endif
> 
> Why is this necessary?
> 
> You should break out of cpu_exec_all if there's a pending alarm (see
> qemu_alarm_pending()).

qemu_alarm_pending() is not true until the signal is actually taken. The
alarm handler sets the required flags.

> 
>>  }
>>  
>>  #else /* _WIN32 */
>> @@ -398,6 +408,14 @@ int qemu_init_main_loop(void)
>>      int ret;
>>  
>>      sigemptyset(&blocked_signals);
>> +    if (kvm_enabled()) {
>> +        /*
>> +         * We need to process timer signals synchronously to avoid a race
>> +         * between exit_request check and KVM vcpu entry.
>> +         */
>> +        sigaddset(&blocked_signals, SIGIO);
>> +        sigaddset(&blocked_signals, SIGALRM);
>> +    }
> 
> A block_io_signals() function for !IOTHREAD would be nicer.

Well, we aren't blocking all I/O signals, so I decided against causing
confusion to people that try to compare the result against real
block_io_signals. If you mean just pushing those lines that set up
blocked_signals into a separate function, then I need to find a good
name for it.

> 
>>  
>>      ret = qemu_signalfd_init(blocked_signals);
>>      if (ret) {
>> -- 
>> 1.7.1

Thanks,
Jan
Marcelo Tosatti - Feb. 1, 2011, 1:48 p.m.
On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
> On 2011-02-01 13:47, Marcelo Tosatti wrote:
> > On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
> >> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
> >> checking for exit_request on vcpu entry and timer signals arriving
> >> before KVM starts to catch them. Plug it by blocking both timer related
> >> signals also on !CONFIG_IOTHREAD and process those via signalfd.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >> ---
> >>  cpus.c |   18 ++++++++++++++++++
> >>  1 files changed, 18 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/cpus.c b/cpus.c
> >> index fc3f222..29b1070 100644
> >> --- a/cpus.c
> >> +++ b/cpus.c
> >> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
> >>      pthread_sigmask(SIG_BLOCK, NULL, &set);
> >>      sigdelset(&set, SIG_IPI);
> >>      sigdelset(&set, SIGBUS);
> >> +#ifndef CONFIG_IOTHREAD
> >> +    sigdelset(&set, SIGIO);
> >> +    sigdelset(&set, SIGALRM);
> >> +#endif
> > 
> > I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
> > section.
> 
> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?

Yes, so to avoid #ifdefs spread.

> >> +
> >> +#ifndef CONFIG_IOTHREAD
> >> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
> >> +        qemu_notify_event();
> >> +    }
> >> +#endif
> > 
> > Why is this necessary?
> > 
> > You should break out of cpu_exec_all if there's a pending alarm (see
> > qemu_alarm_pending()).
> 
> qemu_alarm_pending() is not true until the signal is actually taken. The
> alarm handler sets the required flags.

Right. What i mean is you need to execute the signal handler inside
cpu_exec_all loop (so that alarm pending is set).

So, if there is a SIGALRM pending, qemu_run_timers has highest
priority, not vcpu execution.

> >>  
> >>  #else /* _WIN32 */
> >> @@ -398,6 +408,14 @@ int qemu_init_main_loop(void)
> >>      int ret;
> >>  
> >>      sigemptyset(&blocked_signals);
> >> +    if (kvm_enabled()) {
> >> +        /*
> >> +         * We need to process timer signals synchronously to avoid a race
> >> +         * between exit_request check and KVM vcpu entry.
> >> +         */
> >> +        sigaddset(&blocked_signals, SIGIO);
> >> +        sigaddset(&blocked_signals, SIGALRM);
> >> +    }
> > 
> > A block_io_signals() function for !IOTHREAD would be nicer.
> 
> Well, we aren't blocking all I/O signals, so I decided against causing
> confusion to people that try to compare the result against real
> block_io_signals. If you mean just pushing those lines that set up
> blocked_signals into a separate function, then I need to find a good
> name for it.

Yes, separate function, similar to CONFIG_IOTHREAD case (feel free to
rename function).
Jan Kiszka - Feb. 1, 2011, 1:58 p.m.
On 2011-02-01 14:48, Marcelo Tosatti wrote:
> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
>> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>>>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>>>> checking for exit_request on vcpu entry and timer signals arriving
>>>> before KVM starts to catch them. Plug it by blocking both timer related
>>>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>> ---
>>>>  cpus.c |   18 ++++++++++++++++++
>>>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/cpus.c b/cpus.c
>>>> index fc3f222..29b1070 100644
>>>> --- a/cpus.c
>>>> +++ b/cpus.c
>>>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>>>      pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>>      sigdelset(&set, SIG_IPI);
>>>>      sigdelset(&set, SIGBUS);
>>>> +#ifndef CONFIG_IOTHREAD
>>>> +    sigdelset(&set, SIGIO);
>>>> +    sigdelset(&set, SIGALRM);
>>>> +#endif
>>>
>>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>>> section.
>>
>> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
> 
> Yes, so to avoid #ifdefs spread.

Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
delta though.

> 
>>>> +
>>>> +#ifndef CONFIG_IOTHREAD
>>>> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>>>> +        qemu_notify_event();
>>>> +    }
>>>> +#endif
>>>
>>> Why is this necessary?
>>>
>>> You should break out of cpu_exec_all if there's a pending alarm (see
>>> qemu_alarm_pending()).
>>
>> qemu_alarm_pending() is not true until the signal is actually taken. The
>> alarm handler sets the required flags.
> 
> Right. What i mean is you need to execute the signal handler inside
> cpu_exec_all loop (so that alarm pending is set).
> 
> So, if there is a SIGALRM pending, qemu_run_timers has highest
> priority, not vcpu execution.

We leave the vcpu loop (thanks to notify_event), process the signal in
the event loop and run the timer handler. This pattern is IMO less
invasive to the existing code, specifically as it is about to die
long-term anyway.

> 
>>>>  
>>>>  #else /* _WIN32 */
>>>> @@ -398,6 +408,14 @@ int qemu_init_main_loop(void)
>>>>      int ret;
>>>>  
>>>>      sigemptyset(&blocked_signals);
>>>> +    if (kvm_enabled()) {
>>>> +        /*
>>>> +         * We need to process timer signals synchronously to avoid a race
>>>> +         * between exit_request check and KVM vcpu entry.
>>>> +         */
>>>> +        sigaddset(&blocked_signals, SIGIO);
>>>> +        sigaddset(&blocked_signals, SIGALRM);
>>>> +    }
>>>
>>> A block_io_signals() function for !IOTHREAD would be nicer.
>>
>> Well, we aren't blocking all I/O signals, so I decided against causing
>> confusion to people that try to compare the result against real
>> block_io_signals. If you mean just pushing those lines that set up
>> blocked_signals into a separate function, then I need to find a good
>> name for it.
> 
> Yes, separate function, similar to CONFIG_IOTHREAD case (feel free to
> rename function).
> 

Will check.

Jan
Marcelo Tosatti - Feb. 1, 2011, 2:10 p.m.
On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
> On 2011-02-01 14:48, Marcelo Tosatti wrote:
> > On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
> >> On 2011-02-01 13:47, Marcelo Tosatti wrote:
> >>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
> >>>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
> >>>> checking for exit_request on vcpu entry and timer signals arriving
> >>>> before KVM starts to catch them. Plug it by blocking both timer related
> >>>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
> >>>>
> >>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >>>> ---
> >>>>  cpus.c |   18 ++++++++++++++++++
> >>>>  1 files changed, 18 insertions(+), 0 deletions(-)
> >>>>
> >>>> diff --git a/cpus.c b/cpus.c
> >>>> index fc3f222..29b1070 100644
> >>>> --- a/cpus.c
> >>>> +++ b/cpus.c
> >>>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
> >>>>      pthread_sigmask(SIG_BLOCK, NULL, &set);
> >>>>      sigdelset(&set, SIG_IPI);
> >>>>      sigdelset(&set, SIGBUS);
> >>>> +#ifndef CONFIG_IOTHREAD
> >>>> +    sigdelset(&set, SIGIO);
> >>>> +    sigdelset(&set, SIGALRM);
> >>>> +#endif
> >>>
> >>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
> >>> section.
> >>
> >> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
> > 
> > Yes, so to avoid #ifdefs spread.
> 
> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
> delta though.
> 
> > 
> >>>> +
> >>>> +#ifndef CONFIG_IOTHREAD
> >>>> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
> >>>> +        qemu_notify_event();
> >>>> +    }
> >>>> +#endif
> >>>
> >>> Why is this necessary?
> >>>
> >>> You should break out of cpu_exec_all if there's a pending alarm (see
> >>> qemu_alarm_pending()).
> >>
> >> qemu_alarm_pending() is not true until the signal is actually taken. The
> >> alarm handler sets the required flags.
> > 
> > Right. What i mean is you need to execute the signal handler inside
> > cpu_exec_all loop (so that alarm pending is set).
> > 
> > So, if there is a SIGALRM pending, qemu_run_timers has highest
> > priority, not vcpu execution.
> 
> We leave the vcpu loop (thanks to notify_event), process the signal in
> the event loop and run the timer handler. This pattern is IMO less
> invasive to the existing code, specifically as it is about to die
> long-term anyway.

You'll probably see poor timer behaviour on smp guests without iothread
enabled.
Jan Kiszka - Feb. 1, 2011, 2:21 p.m.
On 2011-02-01 15:10, Marcelo Tosatti wrote:
> On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
>> On 2011-02-01 14:48, Marcelo Tosatti wrote:
>>> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
>>>> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>>>>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>>>>>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>>>>>> checking for exit_request on vcpu entry and timer signals arriving
>>>>>> before KVM starts to catch them. Plug it by blocking both timer related
>>>>>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  cpus.c |   18 ++++++++++++++++++
>>>>>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/cpus.c b/cpus.c
>>>>>> index fc3f222..29b1070 100644
>>>>>> --- a/cpus.c
>>>>>> +++ b/cpus.c
>>>>>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>>>>>      pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>>>>      sigdelset(&set, SIG_IPI);
>>>>>>      sigdelset(&set, SIGBUS);
>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>> +    sigdelset(&set, SIGIO);
>>>>>> +    sigdelset(&set, SIGALRM);
>>>>>> +#endif
>>>>>
>>>>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>>>>> section.
>>>>
>>>> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
>>>
>>> Yes, so to avoid #ifdefs spread.
>>
>> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
>> delta though.
>>
>>>
>>>>>> +
>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>>>>>> +        qemu_notify_event();
>>>>>> +    }
>>>>>> +#endif
>>>>>
>>>>> Why is this necessary?
>>>>>
>>>>> You should break out of cpu_exec_all if there's a pending alarm (see
>>>>> qemu_alarm_pending()).
>>>>
>>>> qemu_alarm_pending() is not true until the signal is actually taken. The
>>>> alarm handler sets the required flags.
>>>
>>> Right. What i mean is you need to execute the signal handler inside
>>> cpu_exec_all loop (so that alarm pending is set).
>>>
>>> So, if there is a SIGALRM pending, qemu_run_timers has highest
>>> priority, not vcpu execution.
>>
>> We leave the vcpu loop (thanks to notify_event), process the signal in
>> the event loop and run the timer handler. This pattern is IMO less
>> invasive to the existing code, specifically as it is about to die
>> long-term anyway.
> 
> You'll probably see poor timer behaviour on smp guests without iothread
> enabled.
> 

Still checking, but that would mean the notification mechanism is broken
anyway: If IO events do not force us to process them quickly, we already
suffer from latencies in SMP mode.

Jan
Jan Kiszka - Feb. 1, 2011, 2:37 p.m.
On 2011-02-01 15:21, Jan Kiszka wrote:
> On 2011-02-01 15:10, Marcelo Tosatti wrote:
>> On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
>>> On 2011-02-01 14:48, Marcelo Tosatti wrote:
>>>> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
>>>>> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>>>>>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>>>>>>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>>>>>>> checking for exit_request on vcpu entry and timer signals arriving
>>>>>>> before KVM starts to catch them. Plug it by blocking both timer related
>>>>>>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>>>>>>
>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>>>>> ---
>>>>>>>  cpus.c |   18 ++++++++++++++++++
>>>>>>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/cpus.c b/cpus.c
>>>>>>> index fc3f222..29b1070 100644
>>>>>>> --- a/cpus.c
>>>>>>> +++ b/cpus.c
>>>>>>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>>>>>>      pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>>>>>      sigdelset(&set, SIG_IPI);
>>>>>>>      sigdelset(&set, SIGBUS);
>>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>>> +    sigdelset(&set, SIGIO);
>>>>>>> +    sigdelset(&set, SIGALRM);
>>>>>>> +#endif
>>>>>>
>>>>>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>>>>>> section.
>>>>>
>>>>> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
>>>>
>>>> Yes, so to avoid #ifdefs spread.
>>>
>>> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
>>> delta though.
>>>
>>>>
>>>>>>> +
>>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>>> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>>>>>>> +        qemu_notify_event();
>>>>>>> +    }
>>>>>>> +#endif
>>>>>>
>>>>>> Why is this necessary?
>>>>>>
>>>>>> You should break out of cpu_exec_all if there's a pending alarm (see
>>>>>> qemu_alarm_pending()).
>>>>>
>>>>> qemu_alarm_pending() is not true until the signal is actually taken. The
>>>>> alarm handler sets the required flags.
>>>>
>>>> Right. What i mean is you need to execute the signal handler inside
>>>> cpu_exec_all loop (so that alarm pending is set).
>>>>
>>>> So, if there is a SIGALRM pending, qemu_run_timers has highest
>>>> priority, not vcpu execution.
>>>
>>> We leave the vcpu loop (thanks to notify_event), process the signal in
>>> the event loop and run the timer handler. This pattern is IMO less
>>> invasive to the existing code, specifically as it is about to die
>>> long-term anyway.
>>
>> You'll probably see poor timer behaviour on smp guests without iothread
>> enabled.
>>
> 
> Still checking, but that would mean the notification mechanism is broken
> anyway: If IO events do not force us to process them quickly, we already
> suffer from latencies in SMP mode.

Maybe a regression caused by the iothread introduction: we need to break
out of the cpu loop via global exit_request when there is an IO event
pending. Fixing this will also heal the above issue.

Sigh, we need to get rid of those two implementations and focus all
reviewing and testing on one. I bet there are still more corner cases
sleeping somewhere.

Jan
Jan Kiszka - Feb. 1, 2011, 2:45 p.m.
On 2011-02-01 15:37, Jan Kiszka wrote:
> On 2011-02-01 15:21, Jan Kiszka wrote:
>> On 2011-02-01 15:10, Marcelo Tosatti wrote:
>>> On Tue, Feb 01, 2011 at 02:58:02PM +0100, Jan Kiszka wrote:
>>>> On 2011-02-01 14:48, Marcelo Tosatti wrote:
>>>>> On Tue, Feb 01, 2011 at 02:32:38PM +0100, Jan Kiszka wrote:
>>>>>> On 2011-02-01 13:47, Marcelo Tosatti wrote:
>>>>>>> On Thu, Jan 27, 2011 at 02:09:58PM +0100, Jan Kiszka wrote:
>>>>>>>> Found by Stefan Hajnoczi: There is a race in kvm_cpu_exec between
>>>>>>>> checking for exit_request on vcpu entry and timer signals arriving
>>>>>>>> before KVM starts to catch them. Plug it by blocking both timer related
>>>>>>>> signals also on !CONFIG_IOTHREAD and process those via signalfd.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>> CC: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>>>>>>> ---
>>>>>>>>  cpus.c |   18 ++++++++++++++++++
>>>>>>>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/cpus.c b/cpus.c
>>>>>>>> index fc3f222..29b1070 100644
>>>>>>>> --- a/cpus.c
>>>>>>>> +++ b/cpus.c
>>>>>>>> @@ -254,6 +254,10 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
>>>>>>>>      pthread_sigmask(SIG_BLOCK, NULL, &set);
>>>>>>>>      sigdelset(&set, SIG_IPI);
>>>>>>>>      sigdelset(&set, SIGBUS);
>>>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>>>> +    sigdelset(&set, SIGIO);
>>>>>>>> +    sigdelset(&set, SIGALRM);
>>>>>>>> +#endif
>>>>>>>
>>>>>>> I'd prefer separate qemu_kvm_init_cpu_signals in the !IOTHREAD
>>>>>>> section.
>>>>>>
>>>>>> You mean to duplicate qemu_kvm_init_cpu_signals for both configurations?
>>>>>
>>>>> Yes, so to avoid #ifdefs spread.
>>>>
>>>> Would exchange some #ifdefs against ifndef _WIN32. Haven't measured the
>>>> delta though.
>>>>
>>>>>
>>>>>>>> +
>>>>>>>> +#ifndef CONFIG_IOTHREAD
>>>>>>>> +    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
>>>>>>>> +        qemu_notify_event();
>>>>>>>> +    }
>>>>>>>> +#endif
>>>>>>>
>>>>>>> Why is this necessary?
>>>>>>>
>>>>>>> You should break out of cpu_exec_all if there's a pending alarm (see
>>>>>>> qemu_alarm_pending()).
>>>>>>
>>>>>> qemu_alarm_pending() is not true until the signal is actually taken. The
>>>>>> alarm handler sets the required flags.
>>>>>
>>>>> Right. What i mean is you need to execute the signal handler inside
>>>>> cpu_exec_all loop (so that alarm pending is set).
>>>>>
>>>>> So, if there is a SIGALRM pending, qemu_run_timers has highest
>>>>> priority, not vcpu execution.
>>>>
>>>> We leave the vcpu loop (thanks to notify_event), process the signal in
>>>> the event loop and run the timer handler. This pattern is IMO less
>>>> invasive to the existing code, specifically as it is about to die
>>>> long-term anyway.
>>>
>>> You'll probably see poor timer behaviour on smp guests without iothread
>>> enabled.
>>>
>>
>> Still checking, but that would mean the notification mechanism is broken
>> anyway: If IO events do not force us to process them quickly, we already
>> suffer from latencies in SMP mode.
> 
> Maybe a regression caused by the iothread introduction:

I take it back, the issue is actually much older.

> we need to break
> out of the cpu loop via global exit_request when there is an IO event
> pending. Fixing this will also heal the above issue.
> 
> Sigh, we need to get rid of those two implementations and focus all
> reviewing and testing on one. I bet there are still more corner cases
> sleeping somewhere.
> 
> Jan
> 

Jan

Patch

diff --git a/cpus.c b/cpus.c
index fc3f222..29b1070 100644
--- a/cpus.c
+++ b/cpus.c
@@ -254,6 +254,10 @@  static void qemu_kvm_init_cpu_signals(CPUState *env)
     pthread_sigmask(SIG_BLOCK, NULL, &set);
     sigdelset(&set, SIG_IPI);
     sigdelset(&set, SIGBUS);
+#ifndef CONFIG_IOTHREAD
+    sigdelset(&set, SIGIO);
+    sigdelset(&set, SIGALRM);
+#endif
     r = kvm_set_signal_mask(env, &set);
     if (r) {
         fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(-r));
@@ -351,6 +355,12 @@  static void qemu_kvm_eat_signals(CPUState *env)
             exit(1);
         }
     } while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
+
+#ifndef CONFIG_IOTHREAD
+    if (sigismember(&chkset, SIGIO) || sigismember(&chkset, SIGALRM)) {
+        qemu_notify_event();
+    }
+#endif
 }
 
 #else /* _WIN32 */
@@ -398,6 +408,14 @@  int qemu_init_main_loop(void)
     int ret;
 
     sigemptyset(&blocked_signals);
+    if (kvm_enabled()) {
+        /*
+         * We need to process timer signals synchronously to avoid a race
+         * between exit_request check and KVM vcpu entry.
+         */
+        sigaddset(&blocked_signals, SIGIO);
+        sigaddset(&blocked_signals, SIGALRM);
+    }
 
     ret = qemu_signalfd_init(blocked_signals);
     if (ret) {