Patchwork [5/8] timers: prepare the code for future races in calling qemu_clock_warp

Submitter Paolo Bonzini
Date Oct. 8, 2013, 8:47 a.m.
Message ID <1381222058-16701-6-git-send-email-pbonzini@redhat.com>
Permalink /patch/281366/
State New

Comments

Paolo Bonzini - Oct. 8, 2013, 8:47 a.m.
Computing the deadline of all vm_clocks is somewhat expensive and calls
out to qemu-timer.c; two reasons not to do it in the seqlock's write-side
critical section.  This however opens the door for races in setting and
reading vm_clock_warp_start.

To plug them, we need to cover the case where a new deadline slips in
between the call to qemu_clock_deadline_ns_all and the actual modification
of the icount_warp_timer.  Restrict changes to vm_clock_warp_start and
the icount_warp_timer's expiration time so that they can only move back
(which would simply cause an early wakeup).

If a vm_clock timer is cancelled while CPUs are idle, this might cause the
icount_warp_timer to fire unnecessarily.  This is not a problem: after it
fires, the timer becomes inactive and the next call to timer_mod_anticipate
will be precise.

In addition to this, we must deactivate the icount_warp_timer _before_
checking whether CPUs are idle.  This way, if the "last" CPU becomes idle
during the call to timer_del we will still set up the icount_warp_timer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpus.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)
Alex Bligh - Oct. 8, 2013, 4:54 p.m.
On 8 Oct 2013, at 09:47, Paolo Bonzini wrote:

> Computing the deadline of all vm_clocks is somewhat expensive and calls
> out to qemu-timer.c; two reasons not to do it in the seqlock's write-side
> critical section.  This however opens the door for races in setting and
> reading vm_clock_warp_start.
> 
> To plug them, we need to cover the case where a new deadline slips in
> between the call to qemu_clock_deadline_ns_all and the actual modification
> of the icount_warp_timer.  Restrict changes to vm_clock_warp_start and
> the icount_warp_timer's expiration time so that they can only move back
> (which would simply cause an early wakeup).
> 
> If a vm_clock timer is cancelled while CPUs are idle, this might cause the
> icount_warp_timer to fire unnecessarily.  This is not a problem: after it
> fires, the timer becomes inactive and the next call to timer_mod_anticipate
> will be precise.
> 
> In addition to this, we must deactivate the icount_warp_timer _before_
> checking whether CPUs are idle.  This way, if the "last" CPU becomes idle
> during the call to timer_del we will still set up the icount_warp_timer.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> cpus.c | 19 +++++++++----------
> 1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 9f450ad..08eaf23 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -319,6 +319,7 @@ void qtest_clock_warp(int64_t dest)
> 
> void qemu_clock_warp(QEMUClockType type)
> {
> +    int64_t clock;
>     int64_t deadline;
> 
>     /*
> @@ -338,7 +339,7 @@ void qemu_clock_warp(QEMUClockType type)
>      * the earliest QEMU_CLOCK_VIRTUAL timer.
>      */
>     icount_warp_rt(NULL);
> -    if (!all_cpu_threads_idle() || !qemu_clock_has_timers(QEMU_CLOCK_VIRTUAL)) {
> -        timer_del(icount_warp_timer);
> +    timer_del(icount_warp_timer);
> +    if (!all_cpu_threads_idle()) {
>         return;
>     }
> @@ -348,17 +349,11 @@ void qemu_clock_warp(QEMUClockType type)
> 	return;
>     }
> 
> -    vm_clock_warp_start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>     /* We want to use the earliest deadline from ALL vm_clocks */
> +    clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>     deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> -
> -    /* Maintain prior (possibly buggy) behaviour where if no deadline
> -     * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
> -     * INT32_MAX nanoseconds ahead, we still use INT32_MAX
> -     * nanoseconds.
> -     */
> -    if ((deadline < 0) || (deadline > INT32_MAX)) {
> -        deadline = INT32_MAX;
> +    if (deadline < 0) {
> +        return;
>     }

Arguably the patch could document why removing the check for deadline > INT32_MAX
(the bug for bug compatibility) is safe, as I couldn't entirely convince myself it
was, mostly because I couldn't see why it was doing it in the first place.

> 
>     if (deadline > 0) {
> @@ -379,7 +375,10 @@ void qemu_clock_warp(QEMUClockType type)
>          * you will not be sending network packets continuously instead of
>          * every 100ms.
>          */
> -        timer_mod(icount_warp_timer, vm_clock_warp_start + deadline);
> +        if (vm_clock_warp_start == -1 || vm_clock_warp_start > clock) {
> +            vm_clock_warp_start = clock;
> +        }
> +        timer_mod_anticipate(icount_warp_timer, clock + deadline);
>     } else if (deadline == 0) {
>         qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>     }
> -- 
> 1.8.3.1
> 
Paolo Bonzini - Oct. 8, 2013, 4:56 p.m.
On 08/10/2013 18:54, Alex Bligh wrote:
>> > -
>> > -    /* Maintain prior (possibly buggy) behaviour where if no deadline
>> > -     * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
>> > -     * INT32_MAX nanoseconds ahead, we still use INT32_MAX
>> > -     * nanoseconds.
>> > -     */
>> > -    if ((deadline < 0) || (deadline > INT32_MAX)) {
>> > -        deadline = INT32_MAX;
>> > +    if (deadline < 0) {
>> > +        return;
>> >     }
> 
> Arguably the patch could document why removing the check for deadline > INT32_MAX
> (the bug for bug compatibility) is safe, as I couldn't entirely convince myself it
> was, mostly because I couldn't see why it was doing it in the first place.

I couldn't convince myself that it is _not_ safe :) and it made the code
more complicated.  As soon as a deadline appears, qemu_clock_warp() will
be called again and update the icount_warp_timer.

Ok to move that to a separate patch?

Paolo
Alex Bligh - Oct. 8, 2013, 5:08 p.m.
On 8 Oct 2013, at 17:56, Paolo Bonzini wrote:

>> Arguably the patch could document why removing the check for deadline > INT32_MAX
>> (the bug for bug compatibility) is safe, as I couldn't entirely convince myself it
>> was, mostly because I couldn't see why it was doing it in the first place.
> 
> I couldn't convince myself that it is _not_ safe :) and it made the code
> more complicated.  As soon as a deadline appears, qemu_clock_warp() will
> be called again and update the icount_warp_timer.
> 
> Ok to move that to a separate patch?

To be honest I put it in out of an abundance of caution. I am very
tempted to take it out and see what breaks. As far as I can see all
the time arithmetic is now int64_t; perhaps this was not always the
case. I was more checking you hadn't removed it by accident. Perhaps
just add "special casing deadlines > INT32_MAX removed as all
arithmetic now 64 bit".

There is another offender in tcg_cpu_exec.

        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);

        /* Maintain prior (possibly buggy) behaviour where if no deadline
         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
         * nanoseconds.
         */
        if ((deadline < 0) || (deadline > INT32_MAX)) {
            deadline = INT32_MAX;
        }

        count = qemu_icount_round(deadline);
        qemu_icount += count;
        decr = (count > 0xffff) ? 0xffff : count;
        count -= decr;
        env->icount_decr.u16.low = decr;
        env->icount_extra = count;

This implies that qemu_icount_round() cannot take a 64 bit int.

static int64_t qemu_icount_round(int64_t count)
{
    return (count + (1 << icount_time_shift) - 1) >> icount_time_shift;
}

I would have thought it better if qemu_icount_round just
dealt sensibly with negative parameters.
Paolo Bonzini - Oct. 8, 2013, 5:10 p.m.
On 08/10/2013 19:08, Alex Bligh wrote:
> 
> On 8 Oct 2013, at 17:56, Paolo Bonzini wrote:
> 
>>> Arguably the patch could document why removing the check for deadline > INT32_MAX
>>> (the bug for bug compatibility) is safe, as I couldn't entirely convince myself it
>>> was, mostly because I couldn't see why it was doing it in the first place.
>>
>> I couldn't convince myself that it is _not_ safe :) and it made the code
>> more complicated.  As soon as a deadline appears, qemu_clock_warp() will
>> be called again and update the icount_warp_timer.
>>
>> Ok to move that to a separate patch?
> 
> To be honest I put it in out of an abundance of caution. I am very
> tempted to take it out and see what breaks. As far as I can see all
> the time arithmetic is now int64_t; perhaps this was not always the
> case. I was more checking you hadn't removed it by accident. Perhaps
> just add "special casing deadlines > INT32_MAX removed as all
> arithmetic now 64 bit".
> 
> There is another offender in tcg_cpu_exec.
> 
>         deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> 
>         /* Maintain prior (possibly buggy) behaviour where if no deadline
>          * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
>          * INT32_MAX nanoseconds ahead, we still use INT32_MAX
>          * nanoseconds.
>          */
>         if ((deadline < 0) || (deadline > INT32_MAX)) {
>             deadline = INT32_MAX;
>         }
> 
>         count = qemu_icount_round(deadline);
>         qemu_icount += count;
>         decr = (count > 0xffff) ? 0xffff : count;
>         count -= decr;
>         env->icount_decr.u16.low = decr;
>         env->icount_extra = count;
> 
> This implies that qemu_icount_round() cannot take a 64 bit int.
> 
> static int64_t qemu_icount_round(int64_t count)
> {
>     return (count + (1 << icount_time_shift) - 1) >> icount_time_shift;
> }
> 
> I would have thought it better if qemu_icount_round just
> dealt sensibly with negative parameters.
> 

I'll clean that up separately.

Thanks,

Paolo

Patch

diff --git a/cpus.c b/cpus.c
index 9f450ad..08eaf23 100644
--- a/cpus.c
+++ b/cpus.c
@@ -319,6 +319,7 @@  void qtest_clock_warp(int64_t dest)
 
 void qemu_clock_warp(QEMUClockType type)
 {
+    int64_t clock;
     int64_t deadline;
 
     /*
@@ -338,7 +339,7 @@  void qemu_clock_warp(QEMUClockType type)
      * the earliest QEMU_CLOCK_VIRTUAL timer.
      */
     icount_warp_rt(NULL);
-    if (!all_cpu_threads_idle() || !qemu_clock_has_timers(QEMU_CLOCK_VIRTUAL)) {
-        timer_del(icount_warp_timer);
+    timer_del(icount_warp_timer);
+    if (!all_cpu_threads_idle()) {
         return;
     }
@@ -348,17 +349,11 @@  void qemu_clock_warp(QEMUClockType type)
 	return;
     }
 
-    vm_clock_warp_start = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     /* We want to use the earliest deadline from ALL vm_clocks */
+    clock = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
-
-    /* Maintain prior (possibly buggy) behaviour where if no deadline
-     * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
-     * INT32_MAX nanoseconds ahead, we still use INT32_MAX
-     * nanoseconds.
-     */
-    if ((deadline < 0) || (deadline > INT32_MAX)) {
-        deadline = INT32_MAX;
+    if (deadline < 0) {
+        return;
     }
 
     if (deadline > 0) {
@@ -379,7 +375,10 @@  void qemu_clock_warp(QEMUClockType type)
          * you will not be sending network packets continuously instead of
          * every 100ms.
          */
-        timer_mod(icount_warp_timer, vm_clock_warp_start + deadline);
+        if (vm_clock_warp_start == -1 || vm_clock_warp_start > clock) {
+            vm_clock_warp_start = clock;
+        }
+        timer_mod_anticipate(icount_warp_timer, clock + deadline);
     } else if (deadline == 0) {
         qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
     }