Message ID | 20190411172709.205032-6-vsementsov@virtuozzo.com |
---|---|
State | New |
On 4/11/19 12:27 PM, Vladimir Sementsov-Ogievskiy wrote:
> Introduce a function to gracefully wake-up a coroutine, sleeping in
> qemu_co_sleep_ns() sleep.

Maybe:

Introduce a function to gracefully short-circuit the remainder of the
delay for a coroutine sleeping in qemu_co_sleep_ns().

>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/qemu/coroutine.h    |  6 ++++++
>  util/qemu-coroutine-sleep.c | 20 ++++++++++++++++----
>  2 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
> index 9801e7f5a4..ec765c26f0 100644
> --- a/include/qemu/coroutine.h
> +++ b/include/qemu/coroutine.h
> @@ -278,6 +278,12 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
>   */
>  void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns);
>
> +/*
> + * Wake a coroutine if it is sleeping by qemu_co_sleep_ns. Timer will be
> + * deleted.

Maybe:

Wake a coroutine if it is sleeping in qemu_co_sleep_ns, and delete the
timer.

> +++ b/util/qemu-coroutine-sleep.c
> @@ -17,13 +17,24 @@
>  #include "qemu/timer.h"
>  #include "block/aio.h"
>
> +const char *qemu_co_sleep_ns__scheduled = "qemu_co_sleep_ns";
> +
> +void qemu_co_sleep_wake(Coroutine *co)
> +{
> +    /* Write of schedule protected by barrier write in aio_co_schedule */
> +    const char *scheduled = atomic_cmpxchg(&co->scheduled,
> +                                           qemu_co_sleep_ns__scheduled, NULL);
> +
> +    if (scheduled == qemu_co_sleep_ns__scheduled) {
> +        aio_co_wake(co);
> +    }
> +}
> +
>  static void co_sleep_cb(void *opaque)
>  {
>      Coroutine *co = opaque;
>
> -    /* Write of schedule protected by barrier write in aio_co_schedule */
> -    atomic_set(&co->scheduled, NULL);
> -    aio_co_wake(co);
> +    qemu_co_sleep_wake(co);
>  }
>
>  void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns)
> @@ -32,7 +43,8 @@ void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns)
>      QEMUTimer *ts;
>      Coroutine *co = qemu_coroutine_self();
>
> -    const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL, __func__);
> +    const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL,
> +                                           qemu_co_sleep_ns__scheduled);
>      if (scheduled) {
>          fprintf(stderr,
>                  "%s: Co-routine was already scheduled in '%s'\n",
>

Here, I'd rather get an additional review from anyone more familiar with
coroutine sleeps. I'm guessing that your intent is to request a maximum
timeout for a given operation to complete in, but to leave the sleep
loop early if the operation completes earlier. I don't know if any
existing coroutine code can be used to express that same idea.
On 11.04.2019 at 19:27, Vladimir Sementsov-Ogievskiy wrote:
> Introduce a function to gracefully wake-up a coroutine, sleeping in
> qemu_co_sleep_ns() sleep.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

You can simply reenter the coroutine while it has yielded in
qemu_co_sleep_ns(). This is supported.

I think what you add here is just the condition that you wake up the
coroutine only if it's currently sleeping, but not when it has yielded
for other reasons. This suggests that you're trying to reenter a
coroutine of which you don't know where exactly in its code it currently
is. This is wrong.

Just knowing that it's sleeping doesn't tell you where the coroutine is.
It could have called a function that sleeps internally and must not be
woken up early. If you reenter a coroutine, you always must know the
exact point where it yielded (or in exceptional cases, the exact points
(plural)). Just reentering because it sleeps will wake it up in
unexpected places, generally speaking.

So I don't think this function is a good idea. It's too easy to misuse,
and if you don't misuse it, you can directly call aio_co_wake().

Kevin
07.06.2019 10:57, Kevin Wolf wrote:
> On 11.04.2019 at 19:27, Vladimir Sementsov-Ogievskiy wrote:
>> Introduce a function to gracefully wake-up a coroutine, sleeping in
>> qemu_co_sleep_ns() sleep.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
> You can simply reenter the coroutine while it has yielded in
> qemu_co_sleep_ns(). This is supported.

No, it isn't. qemu_aio_coroutine_enter checks the scheduled field and
aborts if it is set.

If I just use aio_co_enter instead of my new function, I get this abort:

#1  0x00007f5d2514f8f8 in __GI_abort () at abort.c:90
#2  0x000055e9c8145278 in qemu_aio_coroutine_enter (ctx=0x55e9c9b12300, co=0x55e9c9b23cb0) at util/qemu-coroutine.c:132
#3  0x000055e9c8124f6d in aio_co_enter (ctx=0x55e9c9b12300, co=0x55e9c9b23cb0) at util/async.c:494
#4  0x000055e9c8124eb1 in aio_co_wake (co=0x55e9c9b23cb0) at util/async.c:478
#5  0x000055e9c808d5d4 in nbd_teardown_connection (bs=0x55e9c9b1bc50) at block/nbd-client.c:88
#6  0x000055e9c8090673 in nbd_client_close (bs=0x55e9c9b1bc50) at block/nbd-client.c:1289
#7  0x000055e9c808ca3f in nbd_close (bs=0x55e9c9b1bc50) at block/nbd.c:486
#8  0x000055e9c8006cd6 in bdrv_close (bs=0x55e9c9b1bc50) at block.c:3841

(gdb) fr 2
#2  0x000055e9c8145278 in qemu_aio_coroutine_enter (ctx=0x55e9c9b12300, co=0x55e9c9b23cb0) at util/qemu-coroutine.c:132
132             abort();
(gdb) list
127              * been deleted */
128             if (scheduled) {
129                 fprintf(stderr,
130                         "%s: Co-routine was already scheduled in '%s'\n",
131                         __func__, scheduled);
132                 abort();
133             }
134
135             if (to->caller) {
136                 fprintf(stderr, "Co-routine re-entered recursively\n");
(gdb) p scheduled
$1 = 0x55e9c818e990 "qemu_co_sleep_ns"

>
> I think what you add here is just the condition that you wake up the
> coroutine only if it's currently sleeping, but not when it has yielded
> for other reasons. This suggests that you're trying to reenter a
> coroutine of which you don't know where exactly in its code it currently
> is. This is wrong.
>
> Just knowing that it's sleeping doesn't tell you where the coroutine is.
> It could have called a function that sleeps internally and must not be
> woken up early. If you reenter a coroutine, you always must know the
> exact point where it yielded (or in exceptional cases, the exact points
> (plural)). Just reentering because it sleeps will wake it up in
> unexpected places, generally speaking.
>
> So I don't think this function is a good idea. It's too easy to misuse,
> and if you don't misuse it, you can directly call aio_co_wake().
>
> Kevin
>
On 07.06.2019 at 13:18, Vladimir Sementsov-Ogievskiy wrote:
> 07.06.2019 10:57, Kevin Wolf wrote:
> > On 11.04.2019 at 19:27, Vladimir Sementsov-Ogievskiy wrote:
> >> Introduce a function to gracefully wake-up a coroutine, sleeping in
> >> qemu_co_sleep_ns() sleep.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >
> > You can simply reenter the coroutine while it has yielded in
> > qemu_co_sleep_ns(). This is supported.
>
> No, it isn't. qemu_aio_coroutine_enter checks the scheduled field and
> aborts if it is set.

Ah, yes, it has been broken since commit

I actually tried to fix it once, but it turned out more complicated and
I think we found a different solution for the problem at hand:

    Subject: [PATCH for-2.11 0/4] Fix qemu-iotests failures
    Message-Id: <20171128154350.21504-1-kwolf@redhat.com>

In this case, I guess your approach with a new function to interrupt
qemu_co_sleep_ns() is okay.

Do we need to timer_del() when taking the shortcut? We don't necessarily
reenter the coroutine immediately, but might only be scheduling it. In
this case, the timer could fire before qemu_co_sleep_ns() has run and
schedule the coroutine a second time (ignoring co->scheduled again;
maybe we should actually not do that in the timer callback path, but
instead let it run into the assertion because it would be a bug for the
timer callback to end up in this situation).

Kevin
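The guard that both sides now agree on (entering a coroutine aborts while its scheduled marker is set) can be modelled in a few lines of standalone C11. This is only a toy sketch: `ToyCoroutine`, `toy_init`, `toy_enter` and `toy_sleep_begin` are hypothetical names invented for illustration, not QEMU functions, and the toy returns `false` where the real enter path prints a message and calls abort().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a coroutine; only the marker field is modelled. */
typedef struct ToyCoroutine {
    _Atomic(const char *) scheduled; /* non-NULL while someone has scheduled it */
    int entered;                     /* number of successful entries */
} ToyCoroutine;

/* Marker string, mirroring the idea of qemu_co_sleep_ns__scheduled in the patch. */
const char *toy_sleep_marker = "qemu_co_sleep_ns";

void toy_init(ToyCoroutine *co)
{
    atomic_init(&co->scheduled, (const char *)NULL);
    co->entered = 0;
}

/* Models the guard in the enter path: refuse entry while a marker is set. */
bool toy_enter(ToyCoroutine *co)
{
    if (atomic_load(&co->scheduled) != NULL) {
        return false; /* the real code aborts at this point */
    }
    co->entered++;
    return true;
}

/* Models the start of a sleep: claim the marker only if nobody holds it. */
bool toy_sleep_begin(ToyCoroutine *co)
{
    const char *expected = NULL;
    bool claimed = atomic_compare_exchange_strong(&co->scheduled, &expected,
                                                  toy_sleep_marker);
    assert(claimed || expected != NULL);
    return claimed;
}
```

In this model, once `toy_sleep_begin()` has succeeded, every `toy_enter()` fails until something clears the marker, which is the situation the backtrace earlier in the thread shows for a direct aio_co_wake() on a sleeping coroutine.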
07.06.2019 16:02, Kevin Wolf wrote:
> Ah, yes, it has been broken since commit
>
> I actually tried to fix it once, but it turned out more complicated and
> I think we found a different solution for the problem at hand:
>
>     Subject: [PATCH for-2.11 0/4] Fix qemu-iotests failures
>     Message-Id: <20171128154350.21504-1-kwolf@redhat.com>
>
> In this case, I guess your approach with a new function to interrupt
> qemu_co_sleep_ns() is okay.
>
> Do we need to timer_del() when taking the shortcut? We don't necessarily
> reenter the coroutine immediately, but might only be scheduling it. In
> this case, the timer could fire before qemu_co_sleep_ns() has run and
> schedule the coroutine a second time (ignoring co->scheduled again -
> maybe we should actually not do that in the timer callback path, but
> instead let it run into the assertion because it would be a bug for the
> timer callback to end up in this situation).
>

OK, thanks, I will try to improve it.
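The timer_del() question quoted above can be illustrated with a single-threaded toy model. Everything here is an assumption made for illustration: `ToyTimer`, `ToySleepState` and the `toy_*` functions are hypothetical, the `armed` flag merely stands in for deleting a real timer, and the model deliberately ignores the cross-thread races raised elsewhere in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot timer: the callback runs only while it is armed. */
typedef struct ToyTimer {
    bool armed;
    void (*cb)(void *opaque);
    void *opaque;
} ToyTimer;

/* Hypothetical per-sleep state: the timer plus a wake-up counter. */
typedef struct ToySleepState {
    ToyTimer timer;
    int wakeups;
} ToySleepState;

/* Timer callback: stands in for waking the sleeping coroutine. */
void toy_wake_cb(void *opaque)
{
    ToySleepState *s = opaque;
    s->wakeups++;
}

/* Start a sleep: arm the timer with the wake callback. */
void toy_sleep_start(ToySleepState *s)
{
    s->wakeups = 0;
    s->timer.cb = toy_wake_cb;
    s->timer.opaque = s;
    s->timer.armed = true;
}

/* Timer expiry: run the callback only if the timer is still armed. */
bool toy_timer_expire(ToyTimer *t)
{
    if (!t->armed) {
        return false;
    }
    t->armed = false;
    t->cb(t->opaque);
    return true;
}

/* Early wake: disarm the timer first, then wake directly. */
bool toy_wake_early(ToySleepState *s)
{
    if (!s->timer.armed) {
        return false;         /* the timer already fired, nothing to do */
    }
    s->timer.armed = false;   /* stands in for deleting the timer */
    s->wakeups++;             /* stands in for waking the coroutine */
    return true;
}
```

With the timer disarmed before the early wake, a later expiry never runs the callback, so the sleeper is woken exactly once whichever path gets there first.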
07.06.2019 16:02, Kevin Wolf wrote:
> Do we need to timer_del() when taking the shortcut? We don't necessarily
> reenter the coroutine immediately, but might only be scheduling it. In
> this case, the timer could fire before qemu_co_sleep_ns() has run and
> schedule the coroutine a second time

No, it will not: we cmpxchg scheduled to NULL, so the second call will do
nothing.

But it seems unsafe, as even the coroutine pointer may be stale when we
call qemu_co_sleep_wake a second time. So we possibly should remove the
timer, but ...

> (ignoring co->scheduled again -
> maybe we should actually not do that in the timer callback path, but
> instead let it run into the assertion because it would be a bug for the
> timer callback to end up in this situation).
>
> Kevin
>

Interesting: could there be a race condition where we call
qemu_co_sleep_wake, but co_sleep_cb is already scheduled in some queue
and will run soon? Then removing the timer will not help.
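The claim that a second wake call does nothing follows from the compare-and-exchange in the patch, and can be checked with a standalone C11 sketch. The names `ToySleeper`, `toy_marker` and `toy_sleep_wake` are hypothetical, the counter stands in for aio_co_wake(), and the sketch says nothing about the stale-pointer concern, which depends on object lifetime rather than on the marker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sleeper: a marker field plus a count of real wake-ups. */
typedef struct ToySleeper {
    _Atomic(const char *) scheduled;
    int wakeups;
} ToySleeper;

/* Marker identifying "asleep in the sleep function", as in the patch. */
const char *toy_marker = "qemu_co_sleep_ns";

void toy_sleeper_init(ToySleeper *co, const char *marker)
{
    atomic_init(&co->scheduled, marker);
    co->wakeups = 0;
}

/*
 * Wake only if the marker is exactly the sleep marker, clearing it in the
 * same atomic step. Returns 1 if this call did the wake-up, 0 if it was a
 * no-op (already woken, or scheduled by something else).
 */
int toy_sleep_wake(ToySleeper *co)
{
    const char *expected = toy_marker;

    if (atomic_compare_exchange_strong(&co->scheduled, &expected,
                                       (const char *)NULL)) {
        co->wakeups++; /* stands in for the real wake-up call */
        return 1;
    }
    return 0;
}
```

Because the marker is cleared by the same atomic operation that tests it, at most one caller can win; a later timer callback finds NULL and returns without waking anything.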
07.06.2019 18:52, Vladimir Sementsov-Ogievskiy wrote:
> Interesting, could there be a race condition, when we call qemu_co_sleep_wake,
> but co_sleep_cb already scheduled in some queue and will run soon? Then removing
> the timer will not help.
>

Hmm, the comments say that timer_del is thread-safe.

Hmm, so if we want to return the timer pointer from qemu_co_sleep_ns
anyway, maybe it's better to just call timer_mod(ts, 0) to shorten the
wait instead of cheating with .scheduled?
On 07.06.2019 at 19:10, Vladimir Sementsov-Ogievskiy wrote:
> Hmm, it's commented that timer_del is thread-safe..
>
> Hmm, so, if anyway want to return Timer pointer from qemu_co_sleep_ns, may be it's better
> to just call timer_mod(ts, 0) to shorten waiting instead of cheating with .scheduled?

This is probably slower than timer_del() and directly entering the
coroutine. Is there any advantage in using timer_mod()? I don't think
messing with .scheduled is too bad as it's set in the function just
below, so it pairs nicely enough.

Kevin
11.06.2019 11:53, Kevin Wolf wrote:
> This is probably slower than timer_del() and directly entering the
> coroutine. Is there any advantage in using timer_mod()? I don't think
> messing with .scheduled is too bad as it's set in the function just
> below, so it pairs nicely enough.
>

OK, I will try this variant too.
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 9801e7f5a4..ec765c26f0 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -278,6 +278,12 @@ void qemu_co_rwlock_unlock(CoRwlock *lock);
  */
 void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns);
 
+/*
+ * Wake a coroutine if it is sleeping by qemu_co_sleep_ns. Timer will be
+ * deleted.
+ */
+void qemu_co_sleep_wake(Coroutine *co);
+
 /**
  * Yield until a file descriptor becomes readable
  *
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 4bfdd30cbf..bcc6afca3e 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -17,13 +17,24 @@
 #include "qemu/timer.h"
 #include "block/aio.h"
 
+const char *qemu_co_sleep_ns__scheduled = "qemu_co_sleep_ns";
+
+void qemu_co_sleep_wake(Coroutine *co)
+{
+    /* Write of schedule protected by barrier write in aio_co_schedule */
+    const char *scheduled = atomic_cmpxchg(&co->scheduled,
+                                           qemu_co_sleep_ns__scheduled, NULL);
+
+    if (scheduled == qemu_co_sleep_ns__scheduled) {
+        aio_co_wake(co);
+    }
+}
+
 static void co_sleep_cb(void *opaque)
 {
     Coroutine *co = opaque;
 
-    /* Write of schedule protected by barrier write in aio_co_schedule */
-    atomic_set(&co->scheduled, NULL);
-    aio_co_wake(co);
+    qemu_co_sleep_wake(co);
 }
 
 void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns)
@@ -32,7 +43,8 @@ void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t ns)
     QEMUTimer *ts;
     Coroutine *co = qemu_coroutine_self();
 
-    const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL, __func__);
+    const char *scheduled = atomic_cmpxchg(&co->scheduled, NULL,
+                                           qemu_co_sleep_ns__scheduled);
     if (scheduled) {
         fprintf(stderr,
                 "%s: Co-routine was already scheduled in '%s'\n",
Introduce a function to gracefully wake-up a coroutine, sleeping in
qemu_co_sleep_ns() sleep.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/qemu/coroutine.h    |  6 ++++++
 util/qemu-coroutine-sleep.c | 20 ++++++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)