diff mbox series

virtio-blk: fix host notifier issues during dataplane start/stop

Message ID 20230704151527.193586-1-stefanha@redhat.com
State New
Headers show
Series virtio-blk: fix host notifier issues during dataplane start/stop | expand

Commit Message

Stefan Hajnoczi July 4, 2023, 3:15 p.m. UTC
The main loop thread can consume 100% CPU when using --device
virtio-blk-pci,iothread=<iothread>. ppoll() constantly returns but
reading virtqueue host notifiers fails with EAGAIN. The file descriptors
are stale and remain registered with the AioContext because of bugs in
the virtio-blk dataplane start/stop code.

The problem is that the dataplane start/stop code involves drain
operations, which call virtio_blk_drained_begin() and
virtio_blk_drained_end() at points where the host notifier is not
operational:
- In virtio_blk_data_plane_start(), blk_set_aio_context() drains after
  vblk->dataplane_started has been set to true but the host notifier has
  not been attached yet.
- In virtio_blk_data_plane_stop(), blk_drain() and blk_set_aio_context()
  drain after the host notifier has already been detached but with
  vblk->dataplane_started still set to true.

I would like to simplify ->ioeventfd_start/stop() to avoid interactions
with drain entirely, but couldn't find a way to do that. Instead, this
patch accepts the fragile nature of the code and reorders it so that
vblk->dataplane_started is false during drain operations. This way the
virtio_blk_drained_begin() and virtio_blk_drained_end() calls don't
touch the host notifier. The result is that
virtio_blk_data_plane_start() and virtio_blk_data_plane_stop() have
complete control over the host notifier and stale file descriptors are
no longer left in the AioContext.

This patch fixes the 100% CPU consumption in the main loop thread and
correctly moves host notifier processing to the IOThread.

Fixes: 1665d9326fd2 ("virtio-blk: implement BlockDevOps->drained_begin()")
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 67 +++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 29 deletions(-)

Comments

Lukáš Doktor July 10, 2023, 1:57 p.m. UTC | #1
Thank you, Stefan, I tested it with the extended set of tests and it addresses the issue.

Regards,
Lukáš

Tested-by: Lukas Doktor <ldoktor@redhat.com>


Dne 04. 07. 23 v 17:15 Stefan Hajnoczi napsal(a):
> The main loop thread can consume 100% CPU when using --device
> virtio-blk-pci,iothread=<iothread>. ppoll() constantly returns but
> reading virtqueue host notifiers fails with EAGAIN. The file descriptors
> are stale and remain registered with the AioContext because of bugs in
> the virtio-blk dataplane start/stop code.
> 
> The problem is that the dataplane start/stop code involves drain
> operations, which call virtio_blk_drained_begin() and
> virtio_blk_drained_end() at points where the host notifier is not
> operational:
> - In virtio_blk_data_plane_start(), blk_set_aio_context() drains after
>   vblk->dataplane_started has been set to true but the host notifier has
>   not been attached yet.
> - In virtio_blk_data_plane_stop(), blk_drain() and blk_set_aio_context()
>   drain after the host notifier has already been detached but with
>   vblk->dataplane_started still set to true.
> 
> I would like to simplify ->ioeventfd_start/stop() to avoid interactions
> with drain entirely, but couldn't find a way to do that. Instead, this
> patch accepts the fragile nature of the code and reorders it so that
> vblk->dataplane_started is false during drain operations. This way the
> virtio_blk_drained_begin() and virtio_blk_drained_end() calls don't
> touch the host notifier. The result is that
> virtio_blk_data_plane_start() and virtio_blk_data_plane_stop() have
> complete control over the host notifier and stale file descriptors are
> no longer left in the AioContext.
> 
> This patch fixes the 100% CPU consumption in the main loop thread and
> correctly moves host notifier processing to the IOThread.
> 
> Fixes: 1665d9326fd2 ("virtio-blk: implement BlockDevOps->drained_begin()")
> Reported-by: Lukáš Doktor <ldoktor@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  hw/block/dataplane/virtio-blk.c | 67 +++++++++++++++++++--------------
>  1 file changed, 38 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index c227b39408..da36fcfd0b 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -219,13 +219,6 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
>  
>      memory_region_transaction_commit();
>  
> -    /*
> -     * These fields are visible to the IOThread so we rely on implicit barriers
> -     * in aio_context_acquire() on the write side and aio_notify_accept() on
> -     * the read side.
> -     */
> -    s->starting = false;
> -    vblk->dataplane_started = true;
>      trace_virtio_blk_data_plane_start(s);
>  
>      old_context = blk_get_aio_context(s->conf->conf.blk);
> @@ -244,6 +237,18 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
>          event_notifier_set(virtio_queue_get_host_notifier(vq));
>      }
>  
> +    /*
> +     * These fields must be visible to the IOThread when it processes the
> +     * virtqueue, otherwise it will think dataplane has not started yet.
> +     *
> +     * Make sure ->dataplane_started is false when blk_set_aio_context() is
> +     * called above so that draining does not cause the host notifier to be
> +     * detached/attached prematurely.
> +     */
> +    s->starting = false;
> +    vblk->dataplane_started = true;
> +    smp_wmb(); /* paired with aio_notify_accept() on the read side */
> +
>      /* Get this show started by hooking up our callbacks */
>      if (!blk_in_drain(s->conf->conf.blk)) {
>          aio_context_acquire(s->ctx);
> @@ -273,7 +278,6 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
>    fail_guest_notifiers:
>      vblk->dataplane_disabled = true;
>      s->starting = false;
> -    vblk->dataplane_started = true;
>      return -ENOSYS;
>  }
>  
> @@ -327,6 +331,32 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
>          aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
>      }
>  
> +    /*
> +     * Batch all the host notifiers in a single transaction to avoid
> +     * quadratic time complexity in address_space_update_ioeventfds().
> +     */
> +    memory_region_transaction_begin();
> +
> +    for (i = 0; i < nvqs; i++) {
> +        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
> +    }
> +
> +    /*
> +     * The transaction expects the ioeventfds to be open when it
> +     * commits. Do it now, before the cleanup loop.
> +     */
> +    memory_region_transaction_commit();
> +
> +    for (i = 0; i < nvqs; i++) {
> +        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
> +    }
> +
> +    /*
> +     * Set ->dataplane_started to false before draining so that host notifiers
> +     * are not detached/attached anymore.
> +     */
> +    vblk->dataplane_started = false;
> +
>      aio_context_acquire(s->ctx);
>  
>      /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
> @@ -340,32 +370,11 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
>  
>      aio_context_release(s->ctx);
>  
> -    /*
> -     * Batch all the host notifiers in a single transaction to avoid
> -     * quadratic time complexity in address_space_update_ioeventfds().
> -     */
> -    memory_region_transaction_begin();
> -
> -    for (i = 0; i < nvqs; i++) {
> -        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
> -    }
> -
> -    /*
> -     * The transaction expects the ioeventfds to be open when it
> -     * commits. Do it now, before the cleanup loop.
> -     */
> -    memory_region_transaction_commit();
> -
> -    for (i = 0; i < nvqs; i++) {
> -        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
> -    }
> -
>      qemu_bh_cancel(s->bh);
>      notify_guest_bh(s); /* final chance to notify guest */
>  
>      /* Clean up guest notifier (irq) */
>      k->set_guest_notifiers(qbus->parent, nvqs, false);
>  
> -    vblk->dataplane_started = false;
>      s->stopping = false;
>  }
Stefan Hajnoczi July 12, 2023, 7:20 p.m. UTC | #2
On Tue, Jul 04, 2023 at 05:15:27PM +0200, Stefan Hajnoczi wrote:
> The main loop thread can consume 100% CPU when using --device
> virtio-blk-pci,iothread=<iothread>. ppoll() constantly returns but
> reading virtqueue host notifiers fails with EAGAIN. The file descriptors
> are stale and remain registered with the AioContext because of bugs in
> the virtio-blk dataplane start/stop code.
> 
> The problem is that the dataplane start/stop code involves drain
> operations, which call virtio_blk_drained_begin() and
> virtio_blk_drained_end() at points where the host notifier is not
> operational:
> - In virtio_blk_data_plane_start(), blk_set_aio_context() drains after
>   vblk->dataplane_started has been set to true but the host notifier has
>   not been attached yet.
> - In virtio_blk_data_plane_stop(), blk_drain() and blk_set_aio_context()
>   drain after the host notifier has already been detached but with
>   vblk->dataplane_started still set to true.
> 
> I would like to simplify ->ioeventfd_start/stop() to avoid interactions
> with drain entirely, but couldn't find a way to do that. Instead, this
> patch accepts the fragile nature of the code and reorders it so that
> vblk->dataplane_started is false during drain operations. This way the
> virtio_blk_drained_begin() and virtio_blk_drained_end() calls don't
> touch the host notifier. The result is that
> virtio_blk_data_plane_start() and virtio_blk_data_plane_stop() have
> complete control over the host notifier and stale file descriptors are
> no longer left in the AioContext.
> 
> This patch fixes the 100% CPU consumption in the main loop thread and
> correctly moves host notifier processing to the IOThread.
> 
> Fixes: 1665d9326fd2 ("virtio-blk: implement BlockDevOps->drained_begin()")
> Reported-by: Lukáš Doktor <ldoktor@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  hw/block/dataplane/virtio-blk.c | 67 +++++++++++++++++++--------------
>  1 file changed, 38 insertions(+), 29 deletions(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan
diff mbox series

Patch

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index c227b39408..da36fcfd0b 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -219,13 +219,6 @@  int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
     memory_region_transaction_commit();
 
-    /*
-     * These fields are visible to the IOThread so we rely on implicit barriers
-     * in aio_context_acquire() on the write side and aio_notify_accept() on
-     * the read side.
-     */
-    s->starting = false;
-    vblk->dataplane_started = true;
     trace_virtio_blk_data_plane_start(s);
 
     old_context = blk_get_aio_context(s->conf->conf.blk);
@@ -244,6 +237,18 @@  int virtio_blk_data_plane_start(VirtIODevice *vdev)
         event_notifier_set(virtio_queue_get_host_notifier(vq));
     }
 
+    /*
+     * These fields must be visible to the IOThread when it processes the
+     * virtqueue, otherwise it will think dataplane has not started yet.
+     *
+     * Make sure ->dataplane_started is false when blk_set_aio_context() is
+     * called above so that draining does not cause the host notifier to be
+     * detached/attached prematurely.
+     */
+    s->starting = false;
+    vblk->dataplane_started = true;
+    smp_wmb(); /* paired with aio_notify_accept() on the read side */
+
     /* Get this show started by hooking up our callbacks */
     if (!blk_in_drain(s->conf->conf.blk)) {
         aio_context_acquire(s->ctx);
@@ -273,7 +278,6 @@  int virtio_blk_data_plane_start(VirtIODevice *vdev)
   fail_guest_notifiers:
     vblk->dataplane_disabled = true;
     s->starting = false;
-    vblk->dataplane_started = true;
     return -ENOSYS;
 }
 
@@ -327,6 +331,32 @@  void virtio_blk_data_plane_stop(VirtIODevice *vdev)
         aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
     }
 
+    /*
+     * Batch all the host notifiers in a single transaction to avoid
+     * quadratic time complexity in address_space_update_ioeventfds().
+     */
+    memory_region_transaction_begin();
+
+    for (i = 0; i < nvqs; i++) {
+        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+    }
+
+    /*
+     * The transaction expects the ioeventfds to be open when it
+     * commits. Do it now, before the cleanup loop.
+     */
+    memory_region_transaction_commit();
+
+    for (i = 0; i < nvqs; i++) {
+        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
+    }
+
+    /*
+     * Set ->dataplane_started to false before draining so that host notifiers
+     * are not detached/attached anymore.
+     */
+    vblk->dataplane_started = false;
+
     aio_context_acquire(s->ctx);
 
     /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
@@ -340,32 +370,11 @@  void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
     aio_context_release(s->ctx);
 
-    /*
-     * Batch all the host notifiers in a single transaction to avoid
-     * quadratic time complexity in address_space_update_ioeventfds().
-     */
-    memory_region_transaction_begin();
-
-    for (i = 0; i < nvqs; i++) {
-        virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
-    }
-
-    /*
-     * The transaction expects the ioeventfds to be open when it
-     * commits. Do it now, before the cleanup loop.
-     */
-    memory_region_transaction_commit();
-
-    for (i = 0; i < nvqs; i++) {
-        virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-    }
-
     qemu_bh_cancel(s->bh);
     notify_guest_bh(s); /* final chance to notify guest */
 
     /* Clean up guest notifier (irq) */
     k->set_guest_notifiers(qbus->parent, nvqs, false);
 
-    vblk->dataplane_started = false;
     s->stopping = false;
 }