[RFC,0/6] vhost-user: Shutdown/Flush slave channel properly

Message ID 20210125180115.22936-1-vgoyal@redhat.com

Message

Vivek Goyal Jan. 25, 2021, 6:01 p.m. UTC
Hi,

We are working on DAX support in virtiofs and have some patches out of
the tree hosted here.

https://gitlab.com/virtio-fs/qemu/-/commits/virtio-fs-dev

These patches have not been proposed for merge yet, because David
Gilbert noticed that we can run into a deadlock during an emergency
reboot of the guest kernel (echo b > /proc/sysrq-trigger).

I have provided details of the deadlock in the 4th patch of the series,
with subject "qemu, vhost-user: Extend protocol to start/stop/flush slave
channel".

The basic problem seems to be that we don't have a proper mechanism to
shut down the slave channel when the vhost-user device is stopping. This
means there might be pending messages in the slave channel while the
slave is blocked waiting for a response.

This is an RFC patch series to enhance vhost-user protocol to 
properly shutdown/flush slave channel and avoid the deadlock. Though
we faced the issue in the context of virtiofs, any vhost-user
device using slave channel can potentially run into issues and
can benefit from these patches.

Any feedback is welcome. Currently the patches are based on out of
tree code, but after I get some feedback I can take only the pieces
which are relevant to upstream and post them separately.

Thanks
Vivek

Vivek Goyal (6):
  virtiofsd: Drop ->vu_dispatch_rwlock while waiting for thread to exit
  libvhost-user: Use slave_mutex in all slave messages
  vhost-user: Return error code from slave_read()
  qemu, vhost-user: Extend protocol to start/stop/flush slave channel
  libvhost-user: Add support to start/stop/flush slave channel
  virtiofsd: Opt in for slave start/stop/shutdown functionality

 hw/virtio/vhost-user.c                    | 151 +++++++++++++++++++++-
 subprojects/libvhost-user/libvhost-user.c | 147 +++++++++++++++++----
 subprojects/libvhost-user/libvhost-user.h |   8 +-
 tools/virtiofsd/fuse_virtio.c             |  20 +++
 4 files changed, 294 insertions(+), 32 deletions(-)

Comments

Michael S. Tsirkin Feb. 10, 2021, 9:39 p.m. UTC | #1
No comments so far - do you plan to post a non-RFC patchset?


Vivek Goyal Feb. 10, 2021, 10:15 p.m. UTC | #2
On Wed, Feb 10, 2021 at 04:39:06PM -0500, Michael S. Tsirkin wrote:
> No comments so far - do you plan to post a non-RFC patchset?

Yes. Stefan wants me to poll both the unix fd and the slave fd in the
device shutdown path and service both of these in parallel, instead of
adding a new slave channel shutdown message. I am planning to give
it a try and post new patches.

Vivek
Michael S. Tsirkin Feb. 23, 2021, 2:14 p.m. UTC | #3
On Mon, Jan 25, 2021 at 01:01:09PM -0500, Vivek Goyal wrote:
> Hi,
> 
> We are working on DAX support in virtiofs and have some patches out of
> the tree hosted here.
> 
> https://gitlab.com/virtio-fs/qemu/-/commits/virtio-fs-dev

any plans to post a non RFC version?

Vivek Goyal Feb. 23, 2021, 2:23 p.m. UTC | #4
On Tue, Feb 23, 2021 at 09:14:16AM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 25, 2021 at 01:01:09PM -0500, Vivek Goyal wrote:
> > Hi,
> > 
> > We are working on DAX support in virtiofs and have some patches out of
> > the tree hosted here.
> > 
> > https://gitlab.com/virtio-fs/qemu/-/commits/virtio-fs-dev
> 
> any plans to post a non RFC version?

We want to post a non-RFC version. But review comments have not been
taken care of yet.

Stefan says don't extend the vhost-user protocol. Instead, modify
vhost_user_read() so that it polls both u->user->chr (unix domain socket)
and u->slave_fd. IOW, keep on servicing slave fd requests while we
are waiting for the vhost-user message response.

I have not been able to figure out how to do that, given that the unix
domain socket details are abstracted behind the char device interface.
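
For illustration, setting the char device abstraction aside, the idea of
servicing the slave fd while waiting for the reply can be sketched with
plain file descriptors. This is only a minimal sketch and not QEMU code:
wait_for_reply_servicing_slave(), slave_handler_fn and
consume_one_request() are hypothetical names, and it assumes both the
vhost-user socket and the slave channel are available as raw fds, which
is exactly what the char device layer does not expose directly.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical callback: read and answer one pending request on slave_fd.
 * Returns 0 on success, -1 on error.
 */
typedef int (*slave_handler_fn)(int slave_fd, void *opaque);

/*
 * Block until reply_fd becomes readable, servicing slave_fd requests in
 * the meantime. Returns 0 once a reply is ready to be read, -1 on error.
 */
static int wait_for_reply_servicing_slave(int reply_fd, int slave_fd,
                                          slave_handler_fn handle_slave,
                                          void *opaque)
{
    struct pollfd fds[2] = {
        { .fd = reply_fd, .events = POLLIN },
        { .fd = slave_fd, .events = POLLIN },
    };

    assert(handle_slave != NULL);

    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted by a signal, retry */
            }
            return -1;
        }

        /*
         * Service the slave channel first: the backend may be blocked
         * on this request and unable to send the reply until it is done.
         */
        if (fds[1].revents & POLLIN) {
            if (handle_slave(slave_fd, opaque) < 0) {
                return -1;
            }
            continue;               /* poll again for more work */
        }
        if (fds[1].revents & (POLLERR | POLLHUP | POLLNVAL)) {
            return -1;
        }

        if (fds[0].revents & POLLIN) {
            return 0;               /* caller can now read the reply */
        }
        if (fds[0].revents & (POLLERR | POLLHUP | POLLNVAL)) {
            return -1;
        }
    }
}

/*
 * Hypothetical demo handler: consumes one byte as "one request" and
 * counts it in the int that opaque points to.
 */
static int consume_one_request(int slave_fd, void *opaque)
{
    char byte;
    int *handled = opaque;

    if (read(slave_fd, &byte, 1) != 1) {
        return -1;
    }
    (*handled)++;
    return 0;
}
```

The slave fd is checked before the reply fd on purpose, so that a backend
blocked on a slave request always gets unblocked before we go back to
waiting for its reply.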

CCing Greg. He might have ideas on how to do that.

Vivek

Michael S. Tsirkin March 14, 2021, 10:21 p.m. UTC | #5
On Mon, Jan 25, 2021 at 01:01:09PM -0500, Vivek Goyal wrote:
> Hi,
> 
> We are working on DAX support in virtiofs and have some patches out of
> the tree hosted here.
> 
> https://gitlab.com/virtio-fs/qemu/-/commits/virtio-fs-dev

ping anyone wants to pick this up and post a non-rfc version?

Vivek Goyal March 14, 2021, 10:26 p.m. UTC | #6
On Sun, Mar 14, 2021 at 06:21:04PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jan 25, 2021 at 01:01:09PM -0500, Vivek Goyal wrote:
> > Hi,
> > 
> > We are working on DAX support in virtiofs and have some patches out of
> > the tree hosted here.
> > 
> > https://gitlab.com/virtio-fs/qemu/-/commits/virtio-fs-dev
> 
> ping anyone wants to pick this up and post a non-rfc version?

Hi Michael,

Greg has picked up this work and has already posted V2 of the patches here.

https://lore.kernel.org/qemu-devel/20210312092212.782255-8-groug@kaod.org/T/

Please have a look.

Thanks
Vivek
