
[1/4] vhost-user-blk: delay vhost_user_blk_disconnect

Message ID 20200415032826.16701-2-fengli@smartx.com
State New
Series: fix crashes when injecting errors into the vhost-user-blk chardev

Commit Message

Li Feng April 15, 2020, 3:28 a.m. UTC
Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write
may trigger a disconnect event, calling vhost_user_blk_disconnect() and
clearing all the vhost_dev structures. The next socket read then
encounters an invalid pointer to the vhost_dev.

Signed-off-by: Li Feng <fengli@smartx.com>
---
 hw/block/vhost-user-blk.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
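
For context, the failure mode is roughly the following call chain (an
illustration reconstructed from the commit message, not literal code
from the tree):

    vhost_user_<request>()                /* any request/reply pair    */
      -> vhost_user_write()               /* socket write fails ...    */
           -> CHR_EVENT_CLOSED            /* ... raising a disconnect  */
                -> vhost_user_blk_disconnect()
                     -> vhost_dev_cleanup(&s->dev)
      -> vhost_user_read()                /* caller continues, reading */
                                          /* the cleared vhost_dev     */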

Comments

Raphael Norwitz April 14, 2020, 1:24 a.m. UTC | #1
On Wed, Apr 15, 2020 at 11:28:23AM +0800, Li Feng wrote:
> 
>      switch (event) {
>      case CHR_EVENT_OPENED:
> @@ -363,7 +376,16 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
>          }
>          break;
>      case CHR_EVENT_CLOSED:
> -        vhost_user_blk_disconnect(dev);
> +        /*
> +         * a close event may happen during a read/write, but vhost
> +         * code assumes the vhost_dev remains setup, so delay the
> +         * stop & clear to idle.
> +         */
> +        ctx = qemu_get_current_aio_context();
> +
> +        qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, NULL,
> +                                 NULL, NULL, NULL, false);
> +        aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);

This seems a bit racy. What’s to stop the async operation from executing before
the next read?

If the issue is just that the vhost_dev state is being destroyed too early, why
don’t we rather move the vhost_dev_cleanup() call from vhost_user_blk_disconnect()
to vhost_user_blk_connect()? We may need to add some state to tell if this is the
first connect or a reconnect so we don’t call vhost_dev_cleanup() on initial
connect, but as long as we call vhost_dev_cleanup() before vhost_dev_init()
every time the device reconnects it shouldn’t matter that we keep that state
around.

Thoughts?

>          break;
>      case CHR_EVENT_BREAK:
>      case CHR_EVENT_MUX_IN:
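
A rough sketch of the alternative Raphael describes above. The
"connected" flag, the s->dev.vqs test, and the function body itself are
illustrative assumptions, not code from this series:

    static int vhost_user_blk_connect(DeviceState *dev)
    {
        VirtIODevice *vdev = VIRTIO_DEVICE(dev);
        VHostUserBlk *s = VHOST_USER_BLK(vdev);
        int ret;

        if (s->connected) {                  /* hypothetical flag */
            return 0;
        }

        /*
         * On a reconnect, tear down the vhost_dev state kept alive by
         * the previous disconnect before initializing it again; on the
         * first connect there is nothing to clean up yet.
         */
        if (s->dev.vqs) {                    /* hypothetical "was set up" test */
            struct vhost_virtqueue *vqs = s->dev.vqs;

            vhost_dev_cleanup(&s->dev);      /* zeroes s->dev */
            g_free(vqs);
        }

        s->dev.nvqs = s->num_queues;
        s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);

        ret = vhost_dev_init(&s->dev, &s->vhost_user,
                             VHOST_BACKEND_TYPE_USER, 0);
        if (ret < 0) {
            return ret;
        }

        s->connected = true;                 /* hypothetical flag */
        return 0;
    }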
Li Feng April 20, 2020, 11:18 a.m. UTC | #2
Hi Norwitz,
Thanks for your suggestion.

I took this fix from net/vhost-user.c, which had the same issue as this case.
Your suggestion is a good option.
I'm working on it, but there is another crash issue ...
I need some time to make your idea workable.

This is the net/vhost-user patch:

commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Date: Mon Feb 27 14:49:56 2017 +0400

vhost-user: delay vhost_user_stop

Since commit b0a335e351103bf92f3f9d0bd5759311be8156ac, a socket write
may trigger a disconnect event, calling vhost_user_stop() and clearing
all the vhost_dev structures holding data that vhost.c functions expect
to remain valid. Delay the cleanup to keep the vhost_dev structure
valid during the vhost.c functions.

Feng Li

Raphael Norwitz <raphael.norwitz@nutanix.com> wrote on Fri, Apr 17, 2020 at 11:41 AM:

>
> On Wed, Apr 15, 2020 at 11:28:23AM +0800, Li Feng wrote:
> >
> >      switch (event) {
> >      case CHR_EVENT_OPENED:
> > @@ -363,7 +376,16 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >          }
> >          break;
> >      case CHR_EVENT_CLOSED:
> > -        vhost_user_blk_disconnect(dev);
> > +        /*
> > +         * a close event may happen during a read/write, but vhost
> > +         * code assumes the vhost_dev remains setup, so delay the
> > +         * stop & clear to idle.
> > +         */
> > +        ctx = qemu_get_current_aio_context();
> > +
> > +        qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, NULL,
> > +                                 NULL, NULL, NULL, false);
> > +        aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
>
> This seems a bit racy. What’s to stop the async operation from executing before
> the next read?
>
> If the issue is just that the vhost_dev state is being destroyed too early, why
> don’t we rather move the vhost_dev_cleanup() call from vhost_user_blk_disconnect()
> to vhost_user_blk_connect()? We may need to add some state to tell if this is the
> first connect or a reconnect so we don’t call vhost_dev_cleanup() on initial
> connect, but as long as we call vhost_dev_cleanup() before vhost_dev_init()
> every time the device reconnects it shouldn’t matter that we keep that state
> around.
>
> Thoughts?
>
> >          break;
> >      case CHR_EVENT_BREAK:
> >      case CHR_EVENT_MUX_IN:

Patch

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 17df5338e7..776b9af3eb 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -349,11 +349,24 @@  static void vhost_user_blk_disconnect(DeviceState *dev)
     vhost_dev_cleanup(&s->dev);
 }
 
+static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
+
+static void vhost_user_blk_chr_closed_bh(void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    vhost_user_blk_disconnect(dev);
+    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, vhost_user_blk_event,
+                             NULL, (void *)dev, NULL, true);
+}
+
 static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
 {
     DeviceState *dev = opaque;
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VHostUserBlk *s = VHOST_USER_BLK(vdev);
+    AioContext *ctx;
 
     switch (event) {
     case CHR_EVENT_OPENED:
@@ -363,7 +376,16 @@  static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
         }
         break;
     case CHR_EVENT_CLOSED:
-        vhost_user_blk_disconnect(dev);
+        /*
+         * a close event may happen during a read/write, but vhost
+         * code assumes the vhost_dev remains setup, so delay the
+         * stop & clear to idle.
+         */
+        ctx = qemu_get_current_aio_context();
+
+        qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL, NULL,
+                                 NULL, NULL, NULL, false);
+        aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
         break;
     case CHR_EVENT_BREAK:
     case CHR_EVENT_MUX_IN:
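
As a usage note on the scheme above: scheduling the teardown as a
one-shot bottom half pushes it out of the chardev event handler, so any
socket read/write already on the stack finishes against intact state
first. A generic sketch of the pattern (MyDev and the my_dev_* names
are made up for illustration):

    /* Illustrative pattern only; MyDev and my_dev_* are invented names. */
    static void my_dev_closed_bh(void *opaque)
    {
        MyDev *s = opaque;

        my_dev_teardown(s);   /* safe: no socket I/O in flight anymore */
    }

    static void my_dev_event(void *opaque, QEMUChrEvent event)
    {
        if (event == CHR_EVENT_CLOSED) {
            /*
             * Don't tear down here: this handler can run from inside a
             * socket write whose caller still uses the device state.
             */
            aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                    my_dev_closed_bh, opaque);
        }
    }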