Patchwork [RFC,v2,3/5] timer: make qemu_clock_enable sync between disable and timer's cb

Submitter pingfan liu
Date July 29, 2013, 3:16 a.m.
Message ID <1375067768-11342-4-git-send-email-pingfank@linux.vnet.ibm.com>
Permalink /patch/262653/
State New

Comments

pingfan liu - July 29, 2013, 3:16 a.m.
After disabling a QEMUClock, we should make sure that none of its
QEMUTimer callbacks are still in flight. To implement that, the
disabling caller waits until the last user has exited.

Note: callers of qemu_clock_enable() must synchronize among
themselves; that is not covered by this patch.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 qemu-timer.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)
Paolo Bonzini - July 29, 2013, 6:30 a.m.
Il 29/07/2013 05:16, Liu Ping Fan ha scritto:
> After disabling the QemuClock, we should make sure that no QemuTimers
> are still in flight. To implement that, the caller of disabling will
> wait until the last user's exit.
> 
> Note, the callers of qemu_clock_enable() should be sync by themselves,
> not protected by this patch.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

This is an interesting approach.

> -    if (!clock->enabled)
> -        return;
> +    atomic_inc(&clock->using);
> +    if (unlikely(!clock->enabled)) {
> +        goto exit;
> +    }

This can return directly, it doesn't need to increment and decrement
clock->using.

Paolo

>  
>      current_time = qemu_get_clock_ns(clock);
>      tlist = clock_to_timerlist(clock);
> @@ -461,6 +482,15 @@ void qemu_run_timers(QEMUClock *clock)
>          /* run the callback (the timer list can be modified) */
>          ts->cb(ts->opaque);
>      }
> +
> +exit:
> +    qemu_mutex_lock(&clock->lock);
> +    if (atomic_fetch_dec(&clock->using) == 1) {
> +        if (unlikely(!clock->enabled)) {
> +            qemu_cond_signal(&clock->wait_using);
> +        }
> +    }
> +    qemu_mutex_unlock(&clock->lock);
>  }
>  
>  int64_t qemu_get_clock_ns(QEMUClock *clock)
>
pingfan liu - July 29, 2013, 8:10 a.m.
On Mon, Jul 29, 2013 at 2:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 29/07/2013 05:16, Liu Ping Fan ha scritto:
>> After disabling the QemuClock, we should make sure that no QemuTimers
>> are still in flight. To implement that, the caller of disabling will
>> wait until the last user's exit.
>>
>> Note, the callers of qemu_clock_enable() should be sync by themselves,
>> not protected by this patch.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> This is an interesting approach.
>
>> -    if (!clock->enabled)
>> -        return;
>> +    atomic_inc(&clock->using);
>> +    if (unlikely(!clock->enabled)) {
>> +        goto exit;
>> +    }
>
> This can return directly, it doesn't need to increment and decrement
> clock->using.
>
Here is the race condition that made me swap the sequence:

     qemu_clock_enable(false)              qemu_run_timers()
     ------------------------              -----------------
                                           if (!clock->enabled) return;   /* still enabled, proceeds */
     clock->enabled = false;
     if (atomic_read(&clock->using) == 0)
         /* sees no user; disabling returns */
                                           atomic_inc(&clock->using);
                                           /* callbacks run after disabling returned */

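To make the interleaving concrete, here is a deterministic single-threaded replay of both orderings as a standalone C model. Plain variables stand in for the atomics, and the names (`using_count`, `broken_order_races`, ...) are illustrative only; nothing below is QEMU code:

```c
#include <assert.h>
#include <stdbool.h>

/* In the broken order the timer thread checks clock->enabled before
 * announcing itself in clock->using, so the disabler can observe
 * using == 0 and return while callbacks are still about to run. */
static bool enabled;
static int using_count;

static bool broken_order_races(void)
{
    enabled = true;
    using_count = 0;

    bool timer_proceeds = enabled;   /* timer: if (!enabled) return; -> proceeds */
    enabled = false;                 /* disabler: clock->enabled = false */
    bool disabler_returns = (using_count == 0);  /* disabler: sees no user */
    if (timer_proceeds) {
        using_count++;               /* timer: atomic_inc(&clock->using), too late */
    }
    /* the disabler returned although a callback is in flight */
    return disabler_returns && using_count > 0;
}

static bool swapped_order_is_safe(void)
{
    enabled = true;
    using_count = 0;

    using_count++;                   /* timer: atomic_inc first (the patch's order) */
    enabled = false;                 /* disabler: disable */
    bool disabler_returns = (using_count == 0);  /* false: disabler must wait */
    if (!enabled) {
        using_count--;               /* timer: sees !enabled, exits and decrements */
    }
    return !disabler_returns;
}
```

With the patch's ordering the disabler either sees the user and waits, or the user sees `!enabled` and skips the callbacks; the unprotected window disappears.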

Regards,
Pingfan
> Paolo
>
>>
>>      current_time = qemu_get_clock_ns(clock);
>>      tlist = clock_to_timerlist(clock);
>> @@ -461,6 +482,15 @@ void qemu_run_timers(QEMUClock *clock)
>>          /* run the callback (the timer list can be modified) */
>>          ts->cb(ts->opaque);
>>      }
>> +
>> +exit:
>> +    qemu_mutex_lock(&clock->lock);
>> +    if (atomic_fetch_dec(&clock->using) == 1) {
>> +        if (unlikely(!clock->enabled)) {
>> +            qemu_cond_signal(&clock->wait_using);
>> +        }
>> +    }
>> +    qemu_mutex_unlock(&clock->lock);
>>  }
>>
>>  int64_t qemu_get_clock_ns(QEMUClock *clock)
>>
>
Paolo Bonzini - July 29, 2013, 11:21 a.m.
Il 29/07/2013 10:10, liu ping fan ha scritto:
> On Mon, Jul 29, 2013 at 2:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 29/07/2013 05:16, Liu Ping Fan ha scritto:
>>> After disabling the QemuClock, we should make sure that no QemuTimers
>>> are still in flight. To implement that, the caller of disabling will
>>> wait until the last user's exit.
>>>
>>> Note, the callers of qemu_clock_enable() should be sync by themselves,
>>> not protected by this patch.
>>>
>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> This is an interesting approach.
>>
>>> -    if (!clock->enabled)
>>> -        return;
>>> +    atomic_inc(&clock->using);
>>> +    if (unlikely(!clock->enabled)) {
>>> +        goto exit;
>>> +    }
>>
>> This can return directly, it doesn't need to increment and decrement
>> clock->using.
>>
> Here is a race condition like the following

Ah, I see.

Still this seems a bit backwards.  Most of the time you will have no one
on the wait_using condvar, but you are almost always signaling the
condvar (should be broadcast BTW)...

Paolo
pingfan liu - July 30, 2013, 2:42 a.m.
On Mon, Jul 29, 2013 at 7:21 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 29/07/2013 10:10, liu ping fan ha scritto:
>> On Mon, Jul 29, 2013 at 2:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 29/07/2013 05:16, Liu Ping Fan ha scritto:
>>>> After disabling the QemuClock, we should make sure that no QemuTimers
>>>> are still in flight. To implement that, the caller of disabling will
>>>> wait until the last user's exit.
>>>>
>>>> Note, the callers of qemu_clock_enable() should be sync by themselves,
>>>> not protected by this patch.
>>>>
>>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>
>>> This is an interesting approach.
>>>
>>>> -    if (!clock->enabled)
>>>> -        return;
>>>> +    atomic_inc(&clock->using);
>>>> +    if (unlikely(!clock->enabled)) {
>>>> +        goto exit;
>>>> +    }
>>>
>>> This can return directly, it doesn't need to increment and decrement
>>> clock->using.
>>>
>> Here is a race condition like the following
>
> Ah, I see.
>
> Still this seems a bit backwards.  Most of the time you will have no one
> on the wait_using condvar, but you are almost always signaling the
> condvar (should be broadcast BTW)...
>
I have tried to filter out the normal case by
+    qemu_mutex_lock(&clock->lock);
+    if (atomic_fetch_dec(&clock->using) == 1) { ---------> 1st place
+        if (unlikely(!clock->enabled)) { -------> 2nd place
+            qemu_cond_signal(&clock->wait_using);
+        }
+    }
+    qemu_mutex_unlock(&clock->lock);

Could I do it better?

Thanks,
Pingfan
Paolo Bonzini - July 30, 2013, 9:17 a.m.
Il 30/07/2013 04:42, liu ping fan ha scritto:
> On Mon, Jul 29, 2013 at 7:21 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 29/07/2013 10:10, liu ping fan ha scritto:
>>> On Mon, Jul 29, 2013 at 2:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> Il 29/07/2013 05:16, Liu Ping Fan ha scritto:
>>>>> After disabling the QemuClock, we should make sure that no QemuTimers
>>>>> are still in flight. To implement that, the caller of disabling will
>>>>> wait until the last user's exit.
>>>>>
>>>>> Note, the callers of qemu_clock_enable() should be sync by themselves,
>>>>> not protected by this patch.
>>>>>
>>>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>
>>>> This is an interesting approach.
>>>>
>>>>> -    if (!clock->enabled)
>>>>> -        return;
>>>>> +    atomic_inc(&clock->using);
>>>>> +    if (unlikely(!clock->enabled)) {
>>>>> +        goto exit;
>>>>> +    }
>>>>
>>>> This can return directly, it doesn't need to increment and decrement
>>>> clock->using.
>>>>
>>> Here is a race condition like the following
>>
>> Ah, I see.
>>
>> Still this seems a bit backwards.  Most of the time you will have no one
>> on the wait_using condvar, but you are almost always signaling the
>> condvar (should be broadcast BTW)...
>>
> I have tried to filter out the normal case by
> +    qemu_mutex_lock(&clock->lock);
> +    if (atomic_fetch_dec(&clock->using) == 1) { ---------> 1st place
> +        if (unlikely(!clock->enabled)) { -------> 2nd place
> +            qemu_cond_signal(&clock->wait_using);
> +        }
> +    }
> +    qemu_mutex_unlock(&clock->lock);
> 
> Could I do it better?

Hmm, do we even need clock->using at this point?  For example:

   qemu_clock_enable()
   {
       clock->enabled = enabled;
       ...
       if (!enabled) {
           /* If another thread is within qemu_run_timers,
            * wait for it to finish.
            */
           qemu_event_wait(&clock->callbacks_done_event);
       }
   }

   qemu_run_timers()
   {
       qemu_event_reset(&clock->callbacks_done_event);
       if (!clock->enabled) {
           goto out;
       }
       ...
   out:
       qemu_event_set(&clock->callbacks_done_event);
   }

In the fast path this only does two atomic operations (an OR for reset,
and XCHG for set).
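Paolo's pseudocode relies on an event with reset/wait/set semantics. QEMU's QemuEvent is futex-based, as the fast-path remark implies; the sketch below is only a portable mutex-and-condvar model of those semantics, with illustrative names:

```c
#include <pthread.h>
#include <stdbool.h>

/* Portable model of reset/set/wait event semantics (the real thing
 * would use atomics and a futex for the fast path). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool set;
} Event;

void event_init(Event *ev, bool init)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->set = init;
}

void event_reset(Event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->set = false;
    pthread_mutex_unlock(&ev->lock);
}

void event_set(Event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->set = true;
    pthread_cond_broadcast(&ev->cond);  /* release every waiter */
    pthread_mutex_unlock(&ev->lock);
}

void event_wait(Event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->set) {
        pthread_cond_wait(&ev->cond, &ev->lock);
    }
    pthread_mutex_unlock(&ev->lock);
}
```

The key property for this use: when the event is already set, `event_wait()` returns immediately, so `qemu_clock_enable(false)` only blocks while `qemu_run_timers` is actually between its reset and its set.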

Paolo
Alex Bligh - July 30, 2013, 9:51 a.m.
Paolo,

--On 30 July 2013 11:17:05 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:

> Hmm, do we even need clock->using at this point?  For example:
>
>    qemu_clock_enable()
>    {
>        clock->enabled = enabled;
>        ...
>        if (!enabled) {
>            /* If another thread is within qemu_run_timers,
>             * wait for it to finish.
>             */
>            qemu_event_wait(&clock->callbacks_done_event);
>        }
>    }
>
>    qemu_run_timers()
>    {
>        qemu_event_reset(&clock->callbacks_done_event);
>        if (!clock->enabled) {
>            goto out;
>        }
>        ...
>    out:
>        qemu_event_set(&clock->callbacks_done_event);
>    }
>
> In the fast path this only does two atomic operations (an OR for reset,
> and XCHG for set).

I think I'm missing something here. If Pingfan is rebasing on top
of my patches, or is doing vaguely the same thing, then
qemu_clock_enable will do two things:

1. It will walk the list of QEMUTimerLists
2. For each QEMUTimerList associated with the clock, it will call
   qemu_notify or aio_notify

The list of QEMUTimerLists is only accessed by qemu_clock_enable
(to do task 2) and by the creation and deletion of QEMUTimerLists,
which happen only in init and AioContext create/delete (which should
be very rare). I'm presuming qemu_clock_enable(false) is also
pretty rare. It seems to me there is no need to do anything more
complex than a simple mutex protecting the list of QEMUTimerLists of
each QEMUClock.

As far as walking the QEMUTimerList itself is concerned, this is
something which is 99.999% done by the thread owning the AioContext.
qemu_clock_enable should not even be walking this list. So I don't
see why the protection here is needed.
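Alex's scheme can be sketched as a standalone model: one mutex per clock guards only the (rarely mutated) list of timer lists, and disabling just walks that list to notify each owner thread. All names here are illustrative stubs, not the real QEMU API:

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct TimerListStub {
    struct TimerListStub *next;
    bool notified;               /* stands in for qemu_notify()/aio_notify() */
} TimerListStub;

typedef struct {
    pthread_mutex_t list_lock;   /* guards the list of timer lists only */
    TimerListStub *timerlists;
    bool enabled;
} ClockStub;

/* Rare operation: init time or AioContext creation. */
void clock_add_timerlist(ClockStub *clock, TimerListStub *tl)
{
    pthread_mutex_lock(&clock->list_lock);
    tl->next = clock->timerlists;
    clock->timerlists = tl;
    pthread_mutex_unlock(&clock->list_lock);
}

/* Also rare: enable/disable walks the list and pokes each owner. */
void clock_enable(ClockStub *clock, bool enabled)
{
    clock->enabled = enabled;
    pthread_mutex_lock(&clock->list_lock);
    for (TimerListStub *tl = clock->timerlists; tl; tl = tl->next) {
        tl->notified = true;     /* wake the thread owning this timer list */
    }
    pthread_mutex_unlock(&clock->list_lock);
}
```

This protects the list traversal itself; as Paolo notes next, it does not by itself give the disabling caller the guarantee that no callback is still running.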
Paolo Bonzini - July 30, 2013, 10:12 a.m.
Il 30/07/2013 11:51, Alex Bligh ha scritto:
> As far as walking the QEMUTimerList itself is concerned, this is
> something which is 99.999% done by the thread owning the AioContext.
> qemu_clock_enable should not even be walking this list. So I don't
> see why the protection here is needed.

The protection is needed not because of qemu_clock_enable, but rather
because of code in qemu_clock_enable's caller.  Such code likely expects
not to run concurrently with timers.

qemu_clock_enable however can be called from other threads than the one
owning the AioContext.  Furthermore, it can happen while timer callbacks
are being called, because callbacks are called without holding any lock.

If you put together these conditions, qemu_clock_enable has to wait for
timer callbacks to finish running before returning.

Paolo
pingfan liu - Aug. 1, 2013, 5:54 a.m.
On Tue, Jul 30, 2013 at 5:17 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 30/07/2013 04:42, liu ping fan ha scritto:
>> On Mon, Jul 29, 2013 at 7:21 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Il 29/07/2013 10:10, liu ping fan ha scritto:
>>>> On Mon, Jul 29, 2013 at 2:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>>> Il 29/07/2013 05:16, Liu Ping Fan ha scritto:
>>>>>> After disabling the QemuClock, we should make sure that no QemuTimers
>>>>>> are still in flight. To implement that, the caller of disabling will
>>>>>> wait until the last user's exit.
>>>>>>
>>>>>> Note, the callers of qemu_clock_enable() should be sync by themselves,
>>>>>> not protected by this patch.
>>>>>>
>>>>>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>>>>
>>>>> This is an interesting approach.
>>>>>
>>>>>> -    if (!clock->enabled)
>>>>>> -        return;
>>>>>> +    atomic_inc(&clock->using);
>>>>>> +    if (unlikely(!clock->enabled)) {
>>>>>> +        goto exit;
>>>>>> +    }
>>>>>
>>>>> This can return directly, it doesn't need to increment and decrement
>>>>> clock->using.
>>>>>
>>>> Here is a race condition like the following
>>>
>>> Ah, I see.
>>>
>>> Still this seems a bit backwards.  Most of the time you will have no one
>>> on the wait_using condvar, but you are almost always signaling the
>>> condvar (should be broadcast BTW)...
>>>
>> I have tried to filter out the normal case by
>> +    qemu_mutex_lock(&clock->lock);
>> +    if (atomic_fetch_dec(&clock->using) == 1) { ---------> 1st place
>> +        if (unlikely(!clock->enabled)) { -------> 2nd place
>> +            qemu_cond_signal(&clock->wait_using);
>> +        }
>> +    }
>> +    qemu_mutex_unlock(&clock->lock);
>>
>> Could I do it better?
>
> Hmm, do we even need clock->using at this point?  For example:
>
>    qemu_clock_enable()
>    {
>        clock->enabled = enabled;
>        ...
>        if (!enabled) {
>            /* If another thread is within qemu_run_timers,
>             * wait for it to finish.
>             */
>            qemu_event_wait(&clock->callbacks_done_event);
>        }
>    }
>
>    qemu_run_timers()
>    {
>        qemu_event_reset(&clock->callbacks_done_event);
>        if (!clock->enabled) {
>            goto out;
>        }
>        ...
>    out:
>        qemu_event_set(&clock->callbacks_done_event);
>    }
>
> In the fast path this only does two atomic operations (an OR for reset,
> and XCHG for set).
>
There is a race condition; suppose the following scenario with threads A and B:

A: qemu_event_reset()
B: qemu_event_reset()
A: qemu_event_set()   ----> qemu_clock_enable() is notified while B is still in flight
B: qemu_event_set()

I had tried to build something like qemu_event around futex(2), but failed.
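The A/B interleaving can be replayed deterministically by modelling the event as a plain flag (true = set); this is a single-threaded sketch of the trace, not real event code:

```c
#include <assert.h>
#include <stdbool.h>

/* With one shared event and two resetters, A's set releases the waiter
 * in qemu_clock_enable() even though B has not finished its callbacks. */
static bool replay_two_resetters(void)
{
    bool event_is_set = true;

    event_is_set = false;        /* A: qemu_event_reset() */
    event_is_set = false;        /* B: qemu_event_reset() */
    event_is_set = true;         /* A: qemu_event_set()   */

    /* qemu_event_wait() in qemu_clock_enable() returns here ... */
    bool waiter_released = event_is_set;

    /* ... but B is still between its reset and its set. */
    event_is_set = true;         /* B: qemu_event_set(), too late */
    return waiter_released;
}
```

This is exactly why the event only works with a single resetter, which is where the discussion goes next: one event per QEMUTimerList, each reset only by the thread that runs it.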

Thanks and regards,
Pingfan

> Paolo
Paolo Bonzini - Aug. 1, 2013, 8:57 a.m.
> > Hmm, do we even need clock->using at this point?  For example:
> >
> >    qemu_clock_enable()
> >    {
> >        clock->enabled = enabled;
> >        ...
> >        if (!enabled) {
> >            /* If another thread is within qemu_run_timers,
> >             * wait for it to finish.
> >             */
> >            qemu_event_wait(&clock->callbacks_done_event);
> >        }
> >    }
> >
> >    qemu_run_timers()
> >    {
> >        qemu_event_reset(&clock->callbacks_done_event);
> >        if (!clock->enabled) {
> >            goto out;
> >        }
> >        ...
> >    out:
> >        qemu_event_set(&clock->callbacks_done_event);
> >    }
> >
> > In the fast path this only does two atomic operations (an OR for reset,
> > and XCHG for set).
> >
> There is race condition, suppose the following scenario with A/B thread
> A: qemu_event_reset()
> B: qemu_event_reset()
> A: qemu_event_set() ----> B is still in flight when
> qemu_clock_enable() is notified
> B: qemu_event_set()
> 
> I had tried to build something around futex(2) like qemu_event, but failed.

True, qemu_event basically works only when a single thread resets it.  But
there is no race condition here because qemu_run_timers cannot be executed
concurrently by multiple threads (like aio_poll in your bottom half patches).

Paolo
Alex Bligh - Aug. 1, 2013, 9:35 a.m.
--On 1 August 2013 04:57:52 -0400 Paolo Bonzini <pbonzini@redhat.com> wrote:

> True, qemu_event basically works only when a single thread resets it.  But
> there is no race condition here because qemu_run_timers cannot be executed
> concurrently by multiple threads (like aio_poll in your bottom half
> patches).

... or, if rebasing on top of my patches, qemu_run_timers *can* be
executed concurrently by multiple threads, but in respect of any given
QEMUTimerList it will only be executed by one thread.
Paolo Bonzini - Aug. 1, 2013, 12:19 p.m.
> > True, qemu_event basically works only when a single thread resets it.  But
> > there is no race condition here because qemu_run_timers cannot be executed
> > concurrently by multiple threads (like aio_poll in your bottom half
> > patches).
> 
> ... or, if rebasing on top of my patches, qemu_run_timers *can* be
> executed concurrently by multiple threads, but in respect of any given
> QEMUTimerList, it will only be executed by one thread.

... so the event would be placed in QEMUTimerList, not QEMUClock.

qemu_clock_enable then will have to visit all timer lists.  That's
not hard to do, but as locks proliferate we need to have a clear
policy (e.g. BQL -> clock -> timerlist).

So actually there is another problem with this patch (both the
condvar and the event approach are equally buggy).  If a timer
on clock X disables clock X, qemu_clock_enable will deadlock.

I suppose it's easier to ask each user of qemu_clock_enable to
provide its own synchronization, and drop this patch.  The simplest
kind of synchronization is to delay some work to a bottom half in
the clock's AioContext.  If you do that, you know that the timers
are not running.

Ping Fan, this teaches the same lesson once more: let's not invent
complex concurrency mechanisms unless really necessary.  So far they
have almost never survived (there are some exceptions, but we have
always taken them from somewhere else: QemuEvent is an abstraction of
liburcu code to make it portable, and RCU and seqlock are from Linux;
I cannot think of anything else).

Paolo
Alex Bligh - Aug. 1, 2013, 1:28 p.m.
Paolo,

--On 1 August 2013 08:19:34 -0400 Paolo Bonzini <pbonzini@redhat.com> wrote:

>> > True, qemu_event basically works only when a single thread resets it.
>> > But there is no race condition here because qemu_run_timers cannot be
>> > executed concurrently by multiple threads (like aio_poll in your
>> > bottom half patches).
>>
>> ... or, if rebasing on top of my patches, qemu_run_timers *can* be
>> executed concurrently by mulitple threads, but in respect of any given
>> QEMUTimerList, it will only be executed by one thread.
>
> ... so the event would be placed in QEMUTimerList, not QEMUClock.

Correct

> qemu_clock_enable then will have to visit all timer lists. That's
> not hard to do,

Correct, v4 of my patch does this.

> but as locks proliferate we need to have a clear
> policy (e.g. BQL -> clock -> timerlist).

But doesn't do the locking bit yet (Pingfan is going to do that I hope)

> So actually there is another problem with this patch (both the
> condvar and the event approach are equally buggy).  If a timer
> on clock X disables clock X, qemu_clock_enable will deadlock.

Yes. I believe there will be a similar problem if a timer
created or deleted an AioContext (clearly less likely!)

> I suppose it's easier to ask each user of qemu_clock_enable to
> provide its own synchronization, and drop this patch.  The simplest
> kind of synchronization is to delay some work to a bottom half in
> the clock's AioContext.  If you do that, you know that the timers
> are not running.

I'm not sure that's true. If two AioContexts run in different
threads, would their BH's and timers not also run in those two
different threads?
Paolo Bonzini - Aug. 1, 2013, 1:51 p.m.
On Aug 01 2013, Alex Bligh wrote:
> >So actually there is another problem with this patch (both the
> >condvar and the event approach are equally buggy).  If a timer
> >on clock X disables clock X, qemu_clock_enable will deadlock.
> 
> Yes. I believe there will be a similar problem if a timer
> created or deleted an AioContext (clearly less likely!)
> 
> >I suppose it's easier to ask each user of qemu_clock_enable to
> >provide its own synchronization, and drop this patch.  The simplest
> >kind of synchronization is to delay some work to a bottom half in
> >the clock's AioContext.  If you do that, you know that the timers
> >are not running.
> 
> I'm not sure that's true. If two AioContexts run in different
> threads, would their BH's and timers not also run in those two
> different threads?

Suppose a thread wants to do qemu_clock_enable(foo, false), and the
code after qemu_clock_enable assumes that no timers are running.
Then you should move that code to a bottom half in foo's AioContext.

Paolo
Alex Bligh - Aug. 1, 2013, 2:20 p.m.
Paolo,

--On 1 August 2013 15:51:11 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:

>> > So actually there is another problem with this patch (both the
>> > condvar and the event approach are equally buggy).  If a timer
>> > on clock X disables clock X, qemu_clock_enable will deadlock.
>>
>> Yes. I believe there will be a similar problem if a timer
>> created or deleted an AioContext (clearly less likely!)
>>
>> > I suppose it's easier to ask each user of qemu_clock_enable to
>> > provide its own synchronization, and drop this patch.  The simplest
>> > kind of synchronization is to delay some work to a bottom half in
>> > the clock's AioContext.  If you do that, you know that the timers
>> > are not running.
>>
>> I'm not sure that's true. If two AioContexts run in different
>> threads, would their BH's and timers not also run in those two
>> different threads?
>
> Suppose a thread wants to do qemu_clock_enable(foo, false), and the
> code after qemu_clock_enable assumes that no timers are running.
> Then you should move that code to a bottom half in foo's AioContext.

foo is a QEMUClock here.

A QEMUClock may not have just one AioContext. It could have several,
each operated by a different thread.
Paolo Bonzini - Aug. 1, 2013, 2:28 p.m.
On Aug 01 2013, Alex Bligh wrote:
> Paolo,
> 
> --On 1 August 2013 15:51:11 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> >>> So actually there is another problem with this patch (both the
> >>> condvar and the event approach are equally buggy).  If a timer
> >>> on clock X disables clock X, qemu_clock_enable will deadlock.
> >>
> >>Yes. I believe there will be a similar problem if a timer
> >>created or deleted an AioContext (clearly less likely!)
> >>
> >>> I suppose it's easier to ask each user of qemu_clock_enable to
> >>> provide its own synchronization, and drop this patch.  The simplest
> >>> kind of synchronization is to delay some work to a bottom half in
> >>> the clock's AioContext.  If you do that, you know that the timers
> >>> are not running.
> >>
> >>I'm not sure that's true. If two AioContexts run in different
> >>threads, would their BH's and timers not also run in those two
> >>different threads?
> >
> >Suppose a thread wants to do qemu_clock_enable(foo, false), and the
> >code after qemu_clock_enable assumes that no timers are running.
> >Then you should move that code to a bottom half in foo's AioContext.
> 
> foo is a QEMUClock here.
> 
> A QEMUClock may not have just one AioContext. It could have several
> each operated by a different thread.

Oops, you're right.

Checking the code for callers of qemu_clock_enable() it seems like there
shouldn't be any deadlocks.  So it should work if qemu_clock_enable
walks all of the timerlists and waits on each event.

But we should document the expectations since they are different from
the current code.

Paolo
pingfan liu - Aug. 2, 2013, 3:31 a.m.
On Thu, Aug 1, 2013 at 10:28 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>  On Aug 01 2013, Alex Bligh wrote:
>> Paolo,
>>
>> --On 1 August 2013 15:51:11 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> >>> So actually there is another problem with this patch (both the
>> >>> condvar and the event approach are equally buggy).  If a timer
>> >>> on clock X disables clock X, qemu_clock_enable will deadlock.
>> >>
>> >>Yes. I believe there will be a similar problem if a timer
>> >>created or deleted an AioContext (clearly less likely!)
>> >>
>> >>> I suppose it's easier to ask each user of qemu_clock_enable to
>> >>> provide its own synchronization, and drop this patch.  The simplest
>> >>> kind of synchronization is to delay some work to a bottom half in
>> >>> the clock's AioContext.  If you do that, you know that the timers
>> >>> are not running.
>> >>
>> >>I'm not sure that's true. If two AioContexts run in different
>> >>threads, would their BH's and timers not also run in those two
>> >>different threads?
>> >
>> >Suppose a thread wants to do qemu_clock_enable(foo, false), and the
>> >code after qemu_clock_enable assumes that no timers are running.
>> >Then you should move that code to a bottom half in foo's AioContext.
>>
>> foo is a QEMUClock here.
>>
>> A QEMUClock may not have just one AioContext. It could have several
>> each operated by a different thread.
>
> Oops, you're right.
>
> Checking the code for callers of qemu_clock_enable() it seems like there
> shouldn't be any deadlocks.  So it should work if qemu_clock_enable

Do you mean the caller of qemu_clock_enable(foo, false) can NOT be a
timer cb? As for this deadlock issue, what about making
qemu_clock_enable(foo, false) asynchronous and forcing the caller to
synchronize explicitly?

> walks all of the timerlists and waits on each event.
>
> But we should document the expectations since they are different from
> the current code.
>
> Paolo
>
pingfan liu - Aug. 2, 2013, 3:33 a.m.
On Thu, Aug 1, 2013 at 9:28 PM, Alex Bligh <alex@alex.org.uk> wrote:
> Paolo,
>
>
> --On 1 August 2013 08:19:34 -0400 Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>>> > True, qemu_event basically works only when a single thread resets it.
>>> > But there is no race condition here because qemu_run_timers cannot be
>>> > executed concurrently by multiple threads (like aio_poll in your
>>> > bottom half patches).
>>>
>>> ... or, if rebasing on top of my patches, qemu_run_timers *can* be
>>> executed concurrently by multiple threads, but in respect of any given
>>> QEMUTimerList, it will only be executed by one thread.
>>
>>
>> ... so the event would be placed in QEMUTimerList, not QEMUClock.
>
>
> Correct
>
>
>> qemu_clock_enable then will have to visit all timer lists. That's
>> not hard to do,
>
>
> Correct, v4 of my patch does this.
>
>
>> but as locks proliferate we need to have a clear
>> policy (e.g. BQL -> clock -> timerlist).
>
>
> But doesn't do the locking bit yet (Pingfan is going to do that I hope)
>
Seems that Stefan has been involved in this and has sent out his patches :)
>
>> So actually there is another problem with this patch (both the
>> condvar and the event approach are equally buggy).  If a timer
>> on clock X disables clock X, qemu_clock_enable will deadlock.
>
>
> Yes. I believe there will be a similar problem if a timer
> created or deleted an AioContext (clearly less likely!)
>
>
>> I suppose it's easier to ask each user of qemu_clock_enable to
>> provide its own synchronization, and drop this patch.  The simplest
>> kind of synchronization is to delay some work to a bottom half in
>> the clock's AioContext.  If you do that, you know that the timers
>> are not running.
>
>
> I'm not sure that's true. If two AioContexts run in different
> threads, would their BH's and timers not also run in those two
> different threads?
>
> --
> Alex Bligh
Paolo Bonzini - Aug. 2, 2013, 10:01 a.m.
On Aug 02 2013, liu ping fan wrote:
> On Thu, Aug 1, 2013 at 10:28 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > > > So actually there is another problem with this patch (both the
> > > > > > condvar and the event approach are equally buggy).  If a timer
> > > > > > on clock X disables clock X, qemu_clock_enable will deadlock.
> >
> > Checking the code for callers of qemu_clock_enable() it seems like there
> > shouldn't be any deadlocks.  So it should work if qemu_clock_enable
> 
> Do you mean the caller of qemu_clock_enable(foo,false) can NOT be
> timer cb?

Yes.

> As for this deadlock issue, making
> qemu_clock_enable(foo,false) ASYNC, and forcing the caller to sync
> explicitly ?

Right now the callers of qemu_clock_enable(foo, false) are safe.  So
the problem can be "fixed" just by adequate documentation.

> > But we should document the expectations since they are different from
> > the current code.
> >
> > Paolo
Stefan Hajnoczi - Aug. 2, 2013, 2:43 p.m.
On Fri, Aug 02, 2013 at 11:33:40AM +0800, liu ping fan wrote:
> On Thu, Aug 1, 2013 at 9:28 PM, Alex Bligh <alex@alex.org.uk> wrote:
> > Paolo,
> >
> >
> > --On 1 August 2013 08:19:34 -0400 Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> >>> > True, qemu_event basically works only when a single thread resets it.
> >>> > But there is no race condition here because qemu_run_timers cannot be
> >>> > executed concurrently by multiple threads (like aio_poll in your
> >>> > bottom half patches).
> >>>
> >>> ... or, if rebasing on top of my patches, qemu_run_timers *can* be
> >>> executed concurrently by multiple threads, but in respect of any given
> >>> QEMUTimerList, it will only be executed by one thread.
> >>
> >>
> >> ... so the event would be placed in QEMUTimerList, not QEMUClock.
> >
> >
> > Correct
> >
> >
> >> qemu_clock_enable then will have to visit all timer lists. That's
> >> not hard to do,
> >
> >
> > Correct, v4 of my patch does this.
> >
> >
> >> but as locks proliferate we need to have a clear
> >> policy (e.g. BQL -> clock -> timerlist).
> >
> >
> > But doesn't do the locking bit yet (Pingfan is going to do that I hope)
> >
> Seems that Stefan has been involved in this and has sent out his patches :)

Hi Ping Fan,
I dropped the series where I added locks to qemu-timer.c from a few
weeks ago.

With your series rebased onto Alex's series, I think my patches are no
longer needed?

Stefan
pingfan liu - Aug. 5, 2013, 2:13 a.m.
On Fri, Aug 2, 2013 at 10:43 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Fri, Aug 02, 2013 at 11:33:40AM +0800, liu ping fan wrote:
>> On Thu, Aug 1, 2013 at 9:28 PM, Alex Bligh <alex@alex.org.uk> wrote:
>> > Paolo,
>> >
>> >
>> > --On 1 August 2013 08:19:34 -0400 Paolo Bonzini <pbonzini@redhat.com> wrote:
>> >
>> >>> > True, qemu_event basically works only when a single thread resets it.
>> >>> > But there is no race condition here because qemu_run_timers cannot be
>> >>> > executed concurrently by multiple threads (like aio_poll in your
>> >>> > bottom half patches).
>> >>>
>> >>> ... or, if rebasing on top of my patches, qemu_run_timers *can* be
>> >>> executed concurrently by multiple threads, but in respect of any given
>> >>> QEMUTimerList, it will only be executed by one thread.
>> >>
>> >>
>> >> ... so the event would be placed in QEMUTimerList, not QEMUClock.
>> >
>> >
>> > Correct
>> >
>> >
>> >> qemu_clock_enable then will have to visit all timer lists. That's
>> >> not hard to do,
>> >
>> >
>> > Correct, v4 of my patch does this.
>> >
>> >
>> >> but as locks proliferate we need to have a clear
>> >> policy (e.g. BQL -> clock -> timerlist).
>> >
>> >
>> > But doesn't do the locking bit yet (Pingfan is going to do that I hope)
>> >
>> Seems that Stefan has been involved in this and has sent out his patches :)
>
> Hi Ping Fan,
> I dropped the series where I added locks to qemu-timer.c from a few
> weeks ago.
>
> With your series rebased onto Alex's series, I think my patches are no
> longer needed?
>
No. The active_timers list, which is still touched by
qemu_mod_timer_ns(), still needs protection among vcpu/iothread/...
I think you may want to pick up your patches again?

Regards,
Pingfan

Patch

diff --git a/qemu-timer.c b/qemu-timer.c
index 5a42035..d941a83 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -60,6 +60,14 @@  struct QEMUClock {
 
     int type;
     bool enabled;
+
+    /* Rule: innermost lock.  Protects disabling against timer
+     * callbacks still in flight.
+     */
+    QemuMutex lock;
+    /* reference count held by timer-list users */
+    int using;
+    QemuCond wait_using;
 };
 
 struct QEMUTimer {
@@ -274,6 +282,9 @@  static QEMUClock *qemu_new_clock(int type)
     clock->type = type;
     clock->enabled = true;
     clock->last = INT64_MIN;
+    clock->using = 0;
+    qemu_cond_init(&clock->wait_using);
+    qemu_mutex_init(&clock->lock);
     notifier_list_init(&clock->reset_notifiers);
     tlist = clock_to_timerlist(clock);
     timer_list_init(tlist);
@@ -287,6 +298,13 @@  void qemu_clock_enable(QEMUClock *clock, bool enabled)
     clock->enabled = enabled;
     if (enabled && !old) {
         qemu_rearm_alarm_timer(alarm_timer);
+    } else {
+        qemu_mutex_lock(&clock->lock);
+        /* Wait for callbacks in flight to terminate */
+        while (atomic_read(&clock->using)) {
+            qemu_cond_wait(&clock->wait_using, &clock->lock);
+        }
+        qemu_mutex_unlock(&clock->lock);
     }
 }
 
@@ -440,8 +458,11 @@  void qemu_run_timers(QEMUClock *clock)
     int64_t current_time;
     TimerList *tlist;
 
-    if (!clock->enabled)
-        return;
+    atomic_inc(&clock->using);
+    if (unlikely(!clock->enabled)) {
+        goto exit;
+    }
+
 
     current_time = qemu_get_clock_ns(clock);
     tlist = clock_to_timerlist(clock);
@@ -461,6 +482,15 @@  void qemu_run_timers(QEMUClock *clock)
         /* run the callback (the timer list can be modified) */
         ts->cb(ts->opaque);
     }
+
+exit:
+    qemu_mutex_lock(&clock->lock);
+    if (atomic_fetch_dec(&clock->using) == 1) {
+        if (unlikely(!clock->enabled)) {
+            qemu_cond_signal(&clock->wait_using);
+        }
+    }
+    qemu_mutex_unlock(&clock->lock);
 }
 
 int64_t qemu_get_clock_ns(QEMUClock *clock)
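The patch's drain protocol can be exercised as a standalone pthread sketch. The names mirror the patch (`using`, `lock`, `wait_using`) but this is a model, not the QEMU implementation:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wait_using = PTHREAD_COND_INITIALIZER;
static atomic_int using_count;
static atomic_bool clock_enabled = true;
static atomic_int callbacks_run;

static void run_timers(void)
{
    atomic_fetch_add(&using_count, 1);        /* announce before checking */
    if (atomic_load(&clock_enabled)) {
        atomic_fetch_add(&callbacks_run, 1);  /* stands in for ts->cb(ts->opaque) */
    }
    pthread_mutex_lock(&lock);
    if (atomic_fetch_sub(&using_count, 1) == 1 && !atomic_load(&clock_enabled)) {
        pthread_cond_broadcast(&wait_using);  /* last user wakes the disabler */
    }
    pthread_mutex_unlock(&lock);
}

static void clock_disable(void)
{
    atomic_store(&clock_enabled, false);
    pthread_mutex_lock(&lock);
    while (atomic_load(&using_count) > 0) {   /* drain callbacks in flight */
        pthread_cond_wait(&wait_using, &lock);
    }
    pthread_mutex_unlock(&lock);
}
```

Note that Paolo's later objection still applies to this model: if a timer callback itself called clock_disable(), the wait loop would deadlock on its own increment of `using_count`.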