Patchwork [v2] qemu-timer.c: Remove 250us timeouts

Submitter Peter Portante
Date April 5, 2012, 3 p.m.
Message ID <1333638045-20806-1-git-send-email-peter.portante@redhat.com>
Permalink /patch/151020/
State New

Comments

Peter Portante - April 5, 2012, 3 p.m.
Basically, the main wait loop calls qemu_run_all_timers() unconditionally. The
first thing this routine used to do was check whether a timer had been serviced,
and then reset the loop timeout to the next deadline.

However, the new deadlines had not been calculated at that point, as
qemu_run_timers() had not been called yet for each of the clocks. So
qemu_rearm_alarm_timer() would end up with a negative or zero deadline, and
default to setting a 250us timeout for the loop.

As qemu_run_timers() is called for each clock, the real deadlines would be put
in place, but because a loop timeout was already set, the loop timeout would
not be changed.

Once that 250us timeout fired, the real deadline would be used for the
subsequent timeout.

For idle VMs, this effectively doubles the number of times through the loop,
doubling the number of select() system calls, timer calls, etc., putting added
scheduling pressure on the kernel. And under cgroups this really causes a big
problem, because the cgroup code does not scale well.

By simply running the timers before trying to rearm the timer, we always rearm
with a non-zero deadline, effectively halving the number of system calls.

Signed-off-by: Peter Portante <pportant@redhat.com>
---
 qemu-timer.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)
Anthony Liguori - April 12, 2012, 8:55 p.m.
On 04/05/2012 10:00 AM, Peter Portante wrote:
> Basically, the main wait loop calls qemu_run_all_timers() unconditionally. The
> first thing this routine used to do is to see if a timer had been serviced,
> and then reset the loop timeout to the next deadline.
>
> However, the new deadlines had not been calculated at that point, as
> qemu_run_timers() had not been called yet for each of the clocks. So
> qemu_rearm_alarm_timer() would end up with a negative or zero deadline, and
> default to setting a 250us timeout for the loop.
>
> As qemu_run_timers() is called for each clock, the real deadlines would be put
> in place, but because a loop timeout was already set, the loop timeout would
> not be changed.
>
> Once that 250us timeout fired, the real deadline would be used for the
> subsequent timeout.
>
> For idle VMs, this effectively doubles the number of times through the loop,
> doubling the number of select() system calls, timer calls, etc. putting added
> scheduling pressure on the kernel. And under cgroups, this really causes a big
> problem because the cgroup code does not scale well.
>
> By simply running the timers before trying to rearm the timer, we always rearm
> with a non-zero deadline, effectively halving the number of system calls.
>
> Signed-off-by: Peter Portante<pportant@redhat.com>

Reviewed-by: Avi Kivity <avi@redhat.com>

It's nice to carry these through from the previous patch so it ends up in the 
git history.

It makes sense to me but since this is a subtle change, Paolo, could you also 
take a look at this change?

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>   qemu-timer.c |   10 +++++-----
>   1 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/qemu-timer.c b/qemu-timer.c
> index d7f56e5..b0845f1 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -472,16 +472,16 @@ void qemu_run_all_timers(void)
>   {
>       alarm_timer->pending = 0;
>
> +    /* vm time timers */
> +    qemu_run_timers(vm_clock);
> +    qemu_run_timers(rt_clock);
> +    qemu_run_timers(host_clock);
> +
>       /* rearm timer, if not periodic */
>       if (alarm_timer->expired) {
>           alarm_timer->expired = 0;
>           qemu_rearm_alarm_timer(alarm_timer);
>       }
> -
> -    /* vm time timers */
> -    qemu_run_timers(vm_clock);
> -    qemu_run_timers(rt_clock);
> -    qemu_run_timers(host_clock);
>   }
>
>   #ifdef _WIN32
Paolo Bonzini - April 12, 2012, 8:56 p.m.
Il 12/04/2012 22:55, Anthony Liguori ha scritto:
> It makes sense to me but since this is a subtle change, Paolo, could you
> also take a look at this change?

It looks fine; Peter and I already looked at the patch prior to his
submitting it.

> Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Paolo
Anthony Liguori - April 12, 2012, 8:59 p.m.
On 04/12/2012 03:56 PM, Paolo Bonzini wrote:
> Il 12/04/2012 22:55, Anthony Liguori ha scritto:
>> It makes sense to me but since this is a subtle change, Paolo, could you
>> also take a look at this change?
>
> It looks fine, Peter and I already looked at the patch prior to his
> submitting.

Thanks.  I'll queue it for test.

Regards,

Anthony Liguori

>
>> Reviewed-by: Anthony Liguori<aliguori@us.ibm.com>
>
> Paolo
>
Anthony Liguori - April 16, 2012, 8:14 p.m.
On 04/05/2012 10:00 AM, Peter Portante wrote:
> Basically, the main wait loop calls qemu_run_all_timers() unconditionally. The
> first thing this routine used to do is to see if a timer had been serviced,
> and then reset the loop timeout to the next deadline.
>
> However, the new deadlines had not been calculated at that point, as
> qemu_run_timers() had not been called yet for each of the clocks. So
> qemu_rearm_alarm_timer() would end up with a negative or zero deadline, and
> default to setting a 250us timeout for the loop.
>
> As qemu_run_timers() is called for each clock, the real deadlines would be put
> in place, but because a loop timeout was already set, the loop timeout would
> not be changed.
>
> Once that 250us timeout fired, the real deadline would be used for the
> subsequent timeout.
>
> For idle VMs, this effectively doubles the number of times through the loop,
> doubling the number of select() system calls, timer calls, etc. putting added
> scheduling pressure on the kernel. And under cgroups, this really causes a big
> problem because the cgroup code does not scale well.
>
> By simply running the timers before trying to rearm the timer, we always rearm
> with a non-zero deadline, effectively halving the number of system calls.
>
> Signed-off-by: Peter Portante<pportant@redhat.com>

Applied.  Thanks.

Regards,

Anthony Liguori

> ---
>   qemu-timer.c |   10 +++++-----
>   1 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/qemu-timer.c b/qemu-timer.c
> index d7f56e5..b0845f1 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -472,16 +472,16 @@ void qemu_run_all_timers(void)
>   {
>       alarm_timer->pending = 0;
>
> +    /* vm time timers */
> +    qemu_run_timers(vm_clock);
> +    qemu_run_timers(rt_clock);
> +    qemu_run_timers(host_clock);
> +
>       /* rearm timer, if not periodic */
>       if (alarm_timer->expired) {
>           alarm_timer->expired = 0;
>           qemu_rearm_alarm_timer(alarm_timer);
>       }
> -
> -    /* vm time timers */
> -    qemu_run_timers(vm_clock);
> -    qemu_run_timers(rt_clock);
> -    qemu_run_timers(host_clock);
>   }
>
>   #ifdef _WIN32

Patch

diff --git a/qemu-timer.c b/qemu-timer.c
index d7f56e5..b0845f1 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -472,16 +472,16 @@ void qemu_run_all_timers(void)
 {
     alarm_timer->pending = 0;
 
+    /* vm time timers */
+    qemu_run_timers(vm_clock);
+    qemu_run_timers(rt_clock);
+    qemu_run_timers(host_clock);
+
     /* rearm timer, if not periodic */
     if (alarm_timer->expired) {
         alarm_timer->expired = 0;
         qemu_rearm_alarm_timer(alarm_timer);
     }
-
-    /* vm time timers */
-    qemu_run_timers(vm_clock);
-    qemu_run_timers(rt_clock);
-    qemu_run_timers(host_clock);
 }
 
 #ifdef _WIN32