[1/8] virtio_queue_aio_attach_host_notifier: remove AioContext lock

Message ID 20220609143727.1151816-2-eesposit@redhat.com
State New
Series virtio-blk: removal of AioContext lock

Commit Message

Emanuele Giuseppe Esposito June 9, 2022, 2:37 p.m. UTC
virtio_queue_aio_attach_host_notifier() and
virtio_queue_aio_attach_host_notifier_no_poll() always run in the
main loop, so there is no need to protect them with the AioContext
lock.

virtio_queue_aio_detach_host_notifier(), on the other hand, runs
in a BH in the IOThread context, but that BH is always scheduled
(and thus serialized) by the main loop. Removing the AioContext
lock would therefore be safe, but we can't do it yet, because
bdrv_set_aio_context() and aio_wait_bh_oneshot() still require it.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 14 ++++++++++++--
 hw/block/virtio-blk.c           |  2 ++
 hw/scsi/virtio-scsi-dataplane.c | 12 ++++++++++--
 3 files changed, 24 insertions(+), 4 deletions(-)
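
As a sketch of the serialization pattern described above (not part of the
patch: MyDev, my_stop() and my_stop_bh() are hypothetical names, while
aio_wait_bh_oneshot(), virtio_queue_aio_detach_host_notifier() and
GLOBAL_STATE_CODE() are the QEMU APIs the series relies on):

  #include "qemu/osdep.h"
  #include "qemu/main-loop.h"
  #include "block/aio-wait.h"
  #include "hw/virtio/virtio.h"

  typedef struct MyDev {
      AioContext *ctx;    /* the IOThread's AioContext */
      VirtQueue *vq;
  } MyDev;

  static void my_stop_bh(void *opaque)
  {
      MyDev *d = opaque;

      /*
       * Runs in IOThread context, but serialized by the main loop:
       * the caller is blocked in aio_wait_bh_oneshot() until this
       * BH returns, so no other start/stop can interleave.
       */
      virtio_queue_aio_detach_host_notifier(d->vq, d->ctx);
  }

  static void my_stop(MyDev *d)
  {
      GLOBAL_STATE_CODE();    /* main loop only, BQL held */
      aio_wait_bh_oneshot(d->ctx, my_stop_bh, d);
  }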

Comments

Stefan Hajnoczi July 5, 2022, 2:11 p.m. UTC | #1
On Thu, Jun 09, 2022 at 10:37:20AM -0400, Emanuele Giuseppe Esposito wrote:
> @@ -146,7 +147,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
>  
>      s->dataplane_starting = false;
>      s->dataplane_started = true;
> -    aio_context_release(s->ctx);
>      return 0;

This looks risky because s->dataplane_started is accessed by IO code and
there is a race condition here. Maybe you can refactor the code along
the lines of virtio-blk to avoid the race.
Emanuele Giuseppe Esposito July 8, 2022, 9:01 a.m. UTC | #2
Am 05/07/2022 um 16:11 schrieb Stefan Hajnoczi:
> On Thu, Jun 09, 2022 at 10:37:20AM -0400, Emanuele Giuseppe Esposito wrote:
>> @@ -146,7 +147,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
>>  
>>      s->dataplane_starting = false;
>>      s->dataplane_started = true;
>> -    aio_context_release(s->ctx);
>>      return 0;
> 
> This looks risky because s->dataplane_started is accessed by IO code and
> there is a race condition here. Maybe you can refactor the code along
> the lines of virtio-blk to avoid the race.
> 

Hmm, could you explain why virtio-blk is also safe here?
And what currently protects dataplane_started (in both blk and scsi,
since I don't see any other AioContext lock being taken)?

I see that virtio_blk_req_complete(), for example, is IO_CODE, so in
theory it could read dataplane_started while it is being changed in
dataplane_stop? Though I guess in practice it doesn't, because we
disable and clean up the host notifier before modifying the flag?

But if so, I don't see how the scsi code differs, or why only that
instance needs to be protected with the AioContext lock.

Thank you,
Emanuele
Stefan Hajnoczi July 12, 2022, 12:47 p.m. UTC | #3
On Fri, Jul 08, 2022 at 11:01:37AM +0200, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 05/07/2022 um 16:11 schrieb Stefan Hajnoczi:
> > On Thu, Jun 09, 2022 at 10:37:20AM -0400, Emanuele Giuseppe Esposito wrote:
> >> @@ -146,7 +147,6 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
> >>  
> >>      s->dataplane_starting = false;
> >>      s->dataplane_started = true;
> >> -    aio_context_release(s->ctx);
> >>      return 0;
> > 
> > This looks risky because s->dataplane_started is accessed by IO code and
> > there is a race condition here. Maybe you can refactor the code along
> > the lines of virtio-blk to avoid the race.
> > 
> 
> Hmm, could you explain why virtio-blk is also safe here?
> And what currently protects dataplane_started (in both blk and scsi,
> since I don't see any other AioContext lock being taken)?

dataplane_started is assigned before the host notifier is set up, which
I'm assuming acts as an implicit write barrier.
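
In other words, the relevant ordering in virtio_blk_data_plane_start() is
roughly this (paraphrased sketch, loop over the queues omitted):

  vblk->dataplane_started = true;                     /* store the flag...  */
  virtio_queue_aio_attach_host_notifier(vq, s->ctx);  /* ...then let the
                                                         IOThread see kicks
                                                         and run IO code */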

> I see that virtio_blk_req_complete(), for example, is IO_CODE, so in
> theory it could read dataplane_started while it is being changed in
> dataplane_stop? Though I guess in practice it doesn't, because we
> disable and clean up the host notifier before modifying the flag?

virtio_blk_data_plane_stop() has:

  aio_context_acquire(s->ctx);
  aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);

  /* Drain and try to switch bs back to the QEMU main loop. If other users
   * keep the BlockBackend in the iothread, that's ok */
  blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context(), NULL);

  aio_context_release(s->ctx);

and disables host notifiers. At that point the IOThread no longer
receives virtqueue kicks and all in-flight requests have completed.
dataplane_started is only written afterwards so there is no race with
virtio_blk_req_complete().
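
So on the stop side the ordering is roughly (paraphrased sketch, not an
exact quote from the tree):

  aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
  /* notifiers detached, IOThread quiesced, in-flight requests done */
  ...
  vblk->dataplane_started = false;  /* written only once no IO code runs */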

> 
> But if so, I don't see how the scsi code differs, or why only that
> instance needs to be protected with the AioContext lock.

The race condition I pointed out is not between virtio_blk_req_complete()
and data_plane_stop(); it's data_plane_start() racing with
virtio_blk_req_complete().

The virtio-scsi dataplane code is different for historical reasons and
happens to have the race. I don't think the virtio-blk code is affected.
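
For comparison, the window in virtio_scsi_dataplane_start() looks roughly
like this (sketch using the lines visible in the diff below):

  virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
  ...
  /*
   * <-- a virtqueue kick can already run IO code here, and that IO code
   *     would still read s->dataplane_started == false
   */
  s->dataplane_started = true;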

Stefan

Patch

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 49276e46f2..f9224f23d2 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -167,6 +167,8 @@  int virtio_blk_data_plane_start(VirtIODevice *vdev)
     Error *local_err = NULL;
     int r;
 
+    GLOBAL_STATE_CODE();
+
     if (vblk->dataplane_started || s->starting) {
         return 0;
     }
@@ -243,13 +245,11 @@  int virtio_blk_data_plane_start(VirtIODevice *vdev)
     }
 
     /* Get this show started by hooking up our callbacks */
-    aio_context_acquire(s->ctx);
     for (i = 0; i < nvqs; i++) {
         VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
         virtio_queue_aio_attach_host_notifier(vq, s->ctx);
     }
-    aio_context_release(s->ctx);
     return 0;
 
   fail_aio_context:
@@ -304,6 +304,8 @@  void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     unsigned i;
     unsigned nvqs = s->conf->num_queues;
 
+    GLOBAL_STATE_CODE();
+
     if (!vblk->dataplane_started || s->stopping) {
         return;
     }
@@ -318,6 +320,14 @@  void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     trace_virtio_blk_data_plane_stop(s);
 
     aio_context_acquire(s->ctx);
+    /*
+     * TODO: virtio_blk_data_plane_stop_bh() does not need the AioContext
+     * lock: even though virtio_queue_aio_detach_host_notifier() runs in
+     * IOThread context, such calls are serialized by the BQL, which is
+     * held by this function (it runs in the main loop). Likewise,
+     * virtio_queue_aio_attach_host_notifier* always runs in the main
+     * loop, so it doesn't need the AioContext lock either.
+     */
     aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
 
     /* Drain and try to switch bs back to the QEMU main loop. If other users
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e9ba752f6b..8d0590cc76 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -121,6 +121,8 @@  static void virtio_blk_rw_complete(void *opaque, int ret)
     VirtIOBlock *s = next->dev;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
+    IO_CODE();
+
     aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
     while (next) {
         VirtIOBlockReq *req = next;
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 8bb6e6acfc..7080e9caa9 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -91,6 +91,8 @@  int virtio_scsi_dataplane_start(VirtIODevice *vdev)
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(vdev);
     VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
+    GLOBAL_STATE_CODE();
+
     if (s->dataplane_started ||
         s->dataplane_starting ||
         s->dataplane_fenced) {
@@ -136,7 +138,6 @@  int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
     memory_region_transaction_commit();
 
-    aio_context_acquire(s->ctx);
     virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx);
     virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx);
 
@@ -146,7 +147,6 @@  int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 
     s->dataplane_starting = false;
     s->dataplane_started = true;
-    aio_context_release(s->ctx);
     return 0;
 
 fail_host_notifiers:
@@ -193,6 +193,14 @@  void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
     s->dataplane_stopping = true;
 
     aio_context_acquire(s->ctx);
+    /*
+     * TODO: virtio_scsi_dataplane_stop_bh() does not need the AioContext
+     * lock: even though virtio_queue_aio_detach_host_notifier() runs in
+     * IOThread context, such calls are serialized by the BQL, which is
+     * held by this function (it runs in the main loop). Likewise,
+     * virtio_queue_aio_attach_host_notifier* always runs in the main
+     * loop, so it doesn't need the AioContext lock either.
+     */
     aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
     aio_context_release(s->ctx);
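
For reference, the GLOBAL_STATE_CODE() and IO_CODE() markers added by this
patch are the annotation macros from include/qemu/main-loop.h; roughly:

  /* Asserts that the caller is in the main loop thread (BQL context). */
  #define GLOBAL_STATE_CODE()                 \
      do {                                    \
          assert(qemu_in_main_thread());      \
      } while (0)

  /* Expands to nothing: documents that the code may run in any IOThread
   * or in the main loop. */
  #define IO_CODE()                           \
      do {                                    \
          /* nop */                           \
      } while (0)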