Message ID | 20171110151934.16883-1-stefanha@redhat.com |
---|---|
State | New |
Series | [v4] throttle-groups: drain before detaching ThrottleState |

On Fri 10 Nov 2017 04:19:34 PM CET, Stefan Hajnoczi wrote:
> I/O requests hang after stop/cont commands at least since QEMU 2.10.0
> with -drive iops=100:
>
>   (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
>   (qemu) stop
>   (qemu) cont
>   ...I/O is stuck...
>
> This happens because blk_set_aio_context() detaches the ThrottleState
> while requests may still be in flight:
>
>   if (tgm->throttle_state) {
>       throttle_group_detach_aio_context(tgm);
>       throttle_group_attach_aio_context(tgm, new_context);
>   }
>
> This patch encloses the detach/attach calls in a drained region so no
> I/O request is left hanging.  Also add assertions so we don't make the
> same mistake again in the future.
>
> Reported-by: Yongxue Hong <yhong@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

On Fri, Nov 10, 2017 at 03:19:34PM +0000, Stefan Hajnoczi wrote:
> I/O requests hang after stop/cont commands at least since QEMU 2.10.0
> with -drive iops=100:
>
>   (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
>   (qemu) stop
>   (qemu) cont
>   ...I/O is stuck...
>
> This happens because blk_set_aio_context() detaches the ThrottleState
> while requests may still be in flight:
>
>   if (tgm->throttle_state) {
>       throttle_group_detach_aio_context(tgm);
>       throttle_group_attach_aio_context(tgm, new_context);
>   }
>
> This patch encloses the detach/attach calls in a drained region so no
> I/O request is left hanging.  Also add assertions so we don't make the
> same mistake again in the future.
>
> Reported-by: Yongxue Hong <yhong@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> v4:
>  * Simplified patch in response to Berto's review
> ---
>  block/block-backend.c   | 2 ++
>  block/throttle-groups.c | 6 ++++++
>  2 files changed, 8 insertions(+)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

On Fri 10 Nov 2017 04:19:34 PM CET, Stefan Hajnoczi wrote:
> I/O requests hang after stop/cont commands at least since QEMU 2.10.0
> with -drive iops=100:
>
>   (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
>   (qemu) stop
>   (qemu) cont
>   ...I/O is stuck...
>
> This happens because blk_set_aio_context() detaches the ThrottleState
> while requests may still be in flight:
>
>   if (tgm->throttle_state) {
>       throttle_group_detach_aio_context(tgm);
>       throttle_group_attach_aio_context(tgm, new_context);
>   }
>
> This patch encloses the detach/attach calls in a drained region so no
> I/O request is left hanging.  Also add assertions so we don't make the
> same mistake again in the future.

I'm wondering about the implications of this change... is it possible
now to bypass the I/O limits simply by stopping and quickly resuming the
VM? And is that a problem?

Berto

On Mon, Nov 13, 2017 at 2:29 PM, Alberto Garcia <berto@igalia.com> wrote:
> On Fri 10 Nov 2017 04:19:34 PM CET, Stefan Hajnoczi wrote:
>> I/O requests hang after stop/cont commands at least since QEMU 2.10.0
>> with -drive iops=100:
>>
>>   (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
>>   (qemu) stop
>>   (qemu) cont
>>   ...I/O is stuck...
>>
>> This happens because blk_set_aio_context() detaches the ThrottleState
>> while requests may still be in flight:
>>
>>   if (tgm->throttle_state) {
>>       throttle_group_detach_aio_context(tgm);
>>       throttle_group_attach_aio_context(tgm, new_context);
>>   }
>>
>> This patch encloses the detach/attach calls in a drained region so no
>> I/O request is left hanging.  Also add assertions so we don't make the
>> same mistake again in the future.
>
> I'm wondering about the implications of this change... is it possible
> now to bypass the I/O limits simply by stopping and quickly resuming the
> VM? And is that a problem?

bdrv_set_aio_context() already drains so this patch doesn't change
existing behavior with respect to bypassing throttling.

It's not ideal that certain VM lifecycle operations temporarily disable
throttling, but it's a trade-off since synchronous drain is usually
performance sensitive and should not take a long time.  Perhaps there
are ways to improve the situation, I haven't studied it in detail.

Stefan

diff --git a/block/block-backend.c b/block/block-backend.c
index 45d9101be3..da2f6c0f8a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1748,8 +1748,10 @@ void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
 
     if (bs) {
         if (tgm->throttle_state) {
+            bdrv_drained_begin(bs);
             throttle_group_detach_aio_context(tgm);
             throttle_group_attach_aio_context(tgm, new_context);
+            bdrv_drained_end(bs);
         }
         bdrv_set_aio_context(bs, new_context);
     }
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index b291a88481..2587f19ca3 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -594,6 +594,12 @@ void throttle_group_attach_aio_context(ThrottleGroupMember *tgm,
 void throttle_group_detach_aio_context(ThrottleGroupMember *tgm)
 {
     ThrottleTimers *tt = &tgm->throttle_timers;
+
+    /* Requests must have been drained */
+    assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
+    assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
+    assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));
+
     throttle_timers_detach_aio_context(tt);
     tgm->aio_context = NULL;
 }

I/O requests hang after stop/cont commands at least since QEMU 2.10.0
with -drive iops=100:

  (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
  (qemu) stop
  (qemu) cont
  ...I/O is stuck...

This happens because blk_set_aio_context() detaches the ThrottleState
while requests may still be in flight:

  if (tgm->throttle_state) {
      throttle_group_detach_aio_context(tgm);
      throttle_group_attach_aio_context(tgm, new_context);
  }

This patch encloses the detach/attach calls in a drained region so no
I/O request is left hanging.  Also add assertions so we don't make the
same mistake again in the future.

Reported-by: Yongxue Hong <yhong@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v4:
 * Simplified patch in response to Berto's review
---
 block/block-backend.c   | 2 ++
 block/throttle-groups.c | 6 ++++++
 2 files changed, 8 insertions(+)
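
For readers unfamiliar with the pattern described in the commit message, here is a rough, self-contained sketch of "drain before detach". It is a toy model and not QEMU code: ToyThrottleMember and every toy_* function are hypothetical names invented for this illustration, and the "drain" simply completes parked requests in a loop, whereas the real bdrv_drained_begin()/bdrv_drained_end() pair works on the block layer's actual request queues.

```c
#include <assert.h>

/* Hypothetical stand-in for a throttle group member; not a QEMU type. */
typedef struct {
    int context_id;        /* 0 means "no context attached" */
    unsigned pending_reqs; /* requests parked behind the throttle timer */
    unsigned completed;    /* requests that have finished */
} ToyThrottleMember;

/* Park one request behind the (imaginary) throttle timer. */
void toy_submit(ToyThrottleMember *m)
{
    m->pending_reqs++;
}

/* Drain: finish every parked request before returning. */
void toy_drain(ToyThrottleMember *m)
{
    while (m->pending_reqs > 0) {
        m->pending_reqs--;
        m->completed++;
    }
}

/*
 * Detach from the current context. The assertion plays the role of the
 * checks added by the patch: detaching with requests still parked would
 * strand them, so it is treated as a programming error.
 */
void toy_detach(ToyThrottleMember *m)
{
    assert(m->pending_reqs == 0);
    m->context_id = 0;
}

/* Attach to a new context. */
void toy_attach(ToyThrottleMember *m, int new_context)
{
    m->context_id = new_context;
}

/* Switch contexts safely: drain first, then detach and attach. */
void toy_set_context(ToyThrottleMember *m, int new_context)
{
    toy_drain(m);
    toy_detach(m);
    toy_attach(m, new_context);
}
```

In this model, calling toy_detach() directly while requests are parked trips the assertion, which loosely corresponds to the hang described above becoming an immediate, visible failure. Going through toy_set_context() completes the parked requests first, so nothing is left behind when the context changes.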