
[v4,0/5] vhost-user block device backend implementation

Message ID 20200218050711.8133-1-coiby.xu@gmail.com

Message

Coiby Xu Feb. 18, 2020, 5:07 a.m. UTC
v4:
 * add object properties in class_init
 * relocate vhost-user-blk-test
 * other changes including using SocketAddress, coding style, etc.

v3:
 * separate generic vhost-user-server code from vhost-user-blk-server
   code
 * re-write vu_message_read and the kick handler function as coroutines to
   directly call blk_co_preadv, blk_co_pwritev, etc.
 * add aio_context notifier functions to support multi-threading model
 * other fixes regarding coding style, warning report, etc.

v2:
 * Only enable this feature for Linux because eventfd is a Linux-specific
   feature


This patch series implements a vhost-user block device backend
server, thanks to Stefan's and Kevin's guidance.

The vhost-user block device backend server is a UserCreatable object and can be
started using object_add,

 (qemu) object_add vhost-user-blk-server,id=ID,unix-socket=/tmp/vhost-user-blk_vhost.socket,node-name=DRIVE_NAME,writable=off,blk-size=512
 (qemu) object_del ID

or by appending the "-object" option when starting QEMU,

  $ -object vhost-user-blk-server,id=disk,unix-socket=/tmp/vhost-user-blk_vhost.socket,node-name=DRIVE_NAME,writable=off,blk-size=512

A vhost-user client can then connect to the server backend.
For example, QEMU can act as a client,

  $ -m 256 -object memory-backend-memfd,id=mem,size=256M,share=on -numa node,memdev=mem -chardev socket,id=char1,path=/tmp/vhost-user-blk_vhost.socket -device vhost-user-blk-pci,id=blk0,chardev=char1

The guest OS can then access this vhost-user block device after mounting it.

Coiby Xu (5):
  extend libvhost to support IOThread and coroutine
  generic vhost user server
  vhost-user block device backend server
  a stand-alone tool to directly share disk image file via vhost-user
    protocol
  new qTest case to test the vhost-user-blk-server

 Makefile                              |   4 +
 Makefile.target                       |   1 +
 backends/Makefile.objs                |   2 +
 backends/vhost-user-blk-server.c      | 718 ++++++++++++++++++++++++++
 backends/vhost-user-blk-server.h      |  21 +
 configure                             |   3 +
 contrib/libvhost-user/libvhost-user.c |  54 +-
 contrib/libvhost-user/libvhost-user.h |  38 +-
 qemu-vu.c                             | 252 +++++++++
 tests/Makefile.include                |   3 +-
 tests/qtest/Makefile.include          |   2 +
 tests/qtest/libqos/vhost-user-blk.c   | 126 +++++
 tests/qtest/libqos/vhost-user-blk.h   |  44 ++
 tests/qtest/vhost-user-blk-test.c     | 694 +++++++++++++++++++++++++
 util/Makefile.objs                    |   3 +
 util/vhost-user-server.c              | 427 +++++++++++++++
 util/vhost-user-server.h              |  56 ++
 vl.c                                  |   4 +
 18 files changed, 2444 insertions(+), 8 deletions(-)
 create mode 100644 backends/vhost-user-blk-server.c
 create mode 100644 backends/vhost-user-blk-server.h
 create mode 100644 qemu-vu.c
 create mode 100644 tests/qtest/libqos/vhost-user-blk.c
 create mode 100644 tests/qtest/libqos/vhost-user-blk.h
 create mode 100644 tests/qtest/vhost-user-blk-test.c
 create mode 100644 util/vhost-user-server.c
 create mode 100644 util/vhost-user-server.h

--
2.25.0

Comments

Stefan Hajnoczi Feb. 19, 2020, 4:38 p.m. UTC | #1
On Tue, Feb 18, 2020 at 01:07:06PM +0800, Coiby Xu wrote:
> v4:
>  * add object properties in class_init
>  * relocate vhost-user-blk-test
>  * other changes including using SocketAddress, coding style, etc.

Thanks!  I think the vhost-user server code can be simplified if
libvhost-user uses the event loop for asynchronous socket I/O.  Then
it's no longer necessary to duplicate vu_message_read() and
vu_kick_cb().  I've replied to Patch 1 and we can discuss on IRC if you
want to chat about it.

I've also CCed Marc-André to see what he thinks about removing the
blocking recv from libvhost-user and instead using the event loop (just
like for eventfds).
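
As a rough illustration of event-loop-driven socket handling on the QEMU
side (VuServer and its fields are made-up names for this sketch, not code
from this series; aio_set_fd_handler() and vu_dispatch() are the real APIs):

  /*
   * Illustrative sketch only: watch the vhost-user socket fd in an
   * AioContext and let libvhost-user's vu_dispatch() read and handle each
   * message when the fd becomes readable.
   */
  typedef struct VuServer {
      VuDev vu_dev;        /* libvhost-user device state */
      AioContext *ctx;     /* event loop that services the socket */
      int sock_fd;         /* connected vhost-user socket */
  } VuServer;

  static void vu_sock_readable(void *opaque)
  {
      VuServer *server = opaque;

      /* vu_dispatch() reads the next vhost-user message and processes it. */
      if (!vu_dispatch(&server->vu_dev)) {
          /* On error or disconnect, stop watching the socket. */
          aio_set_fd_handler(server->ctx, server->sock_fd, true,
                             NULL, NULL, NULL, NULL);
      }
  }

  static void vu_server_start_watch(VuServer *server)
  {
      /* QEMU ~5.0 signature: ctx, fd, is_external, read, write, poll, opaque */
      aio_set_fd_handler(server->ctx, server->sock_fd, true,
                         vu_sock_readable, NULL, NULL, server);
  }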

Stefan
Coiby Xu Feb. 26, 2020, 3:18 p.m. UTC | #2
Hi Stefan,

Thank you for reviewing my code!

I tried to reach you on IRC. But somehow either you missed my message
or I missed your reply. So I will reply by email instead.

If we use qio_channel_set_aio_fd_handler to monitor the G_IO_IN event,
i.e. use vu_dispatch as the read handler, then we can re-use
vu_message_read. And "removing the blocking recv from libvhost-user"
isn't necessary, because "the operation of poll() and ppoll() is not
affected by the O_NONBLOCK flag", even though we call
qio_channel_set_blocking before qio_channel_set_aio_fd_handler to make
recv non-blocking.

Previously I needed to run a customized vu_kick_cb as a coroutine to call
blk_co_readv/blk_co_writev directly. After taking Kevin's feedback on
v4 into consideration, I now use aio_set_fd_handler to set a read
handler for kick_fd, and this read handler will then call vu_kick_cb.
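
A minimal sketch of that wiring (VuFdWatch, watch_ctx and the handler names
are made up for the sketch; aio_set_fd_handler() and vu_watch_cb are the real
APIs, and the cb passed in by libvhost-user is vu_kick_cb for kick fds):

  /*
   * Illustrative sketch: implement libvhost-user's set_watch operation with
   * aio_set_fd_handler(), so that when the guest kicks the virtqueue the
   * AioContext event loop invokes libvhost-user's kick callback.
   */
  typedef struct VuFdWatch {
      VuDev *vu_dev;
      vu_watch_cb cb;      /* vu_kick_cb when this fd is a virtqueue kick fd */
      void *pvt;
  } VuFdWatch;

  static AioContext *watch_ctx;   /* AioContext servicing this device */

  static void kick_fd_readable(void *opaque)
  {
      VuFdWatch *w = opaque;

      /* Forward the readiness event to libvhost-user. */
      w->cb(w->vu_dev, 0 /* condition unused here */, w->pvt);
  }

  static void set_watch(VuDev *vu_dev, int fd, int condition,
                        vu_watch_cb cb, void *pvt)
  {
      VuFdWatch *w = g_new0(VuFdWatch, 1);

      w->vu_dev = vu_dev;
      w->cb = cb;
      w->pvt = pvt;
      aio_set_fd_handler(watch_ctx, fd, true, kick_fd_readable, NULL, NULL, w);
  }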


On Thu, Feb 20, 2020 at 1:58 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Feb 18, 2020 at 01:07:06PM +0800, Coiby Xu wrote:
> > v4:
> >  * add object properties in class_init
> >  * relocate vhost-user-blk-test
> >  * other changes including using SocketAddress, coding style, etc.
>
> Thanks!  I think the vhost-user server code can be simplified if
> libvhost-user uses the event loop for asynchronous socket I/O.  Then
> it's no longer necessary to duplicate vu_message_read() and
> vu_kick_cb().  I've replied to Patch 1 and we can discuss on IRC if you
> want to chat about it.
>
> I've also CCed Marc-André to see what he thinks about removing the
> blocking recv from libvhost-user and instead using the event loop (just
> like for eventfds).
>
> Stefan
Stefan Hajnoczi Feb. 27, 2020, 7:41 a.m. UTC | #3
On Wed, Feb 26, 2020 at 11:18:41PM +0800, Coiby Xu wrote:
> Hi Stefan,
> 
> Thank you for reviewing my code!
> 
> I tried to reach you on IRC. But somehow either you missed my message
> or I missed your reply. So I will reply by email instead.
> 
> If we use qio_channel_set_aio_fd_handler to monitor G_IO_IN event,
> i.e. use vu_dispatch as the read handler, then we can re-use
> vu_message_read. And "removing the blocking recv from libvhost-user"
> isn't necessary because "the operation of poll() and ppoll() is not
> affected by the O_NONBLOCK flag" despite that we use
> qio_channel_set_blocking before calling qio_channel_set_aio_fd_handler
> to make recv non-blocking.

I'm not sure I understand.  poll() just says whether the file descriptor
is readable.  It does not say whether enough bytes are readable :).  So
our callback will be invoked if there is 1 byte ready, but when we try
to read 20 bytes either it will block (without O_NONBLOCK) or return
only 1 byte (with O_NONBLOCK).  Neither case is okay, so I expect that
code changes will be necessary.

But please go ahead and send the next revision and I'll take a look.

Stefan
Coiby Xu Feb. 27, 2020, 9:53 a.m. UTC | #4
Thank you for reminding me of this socket short read issue! It seems
we still need a customized vu_message_read, because libvhost-user assumes
we will always get a full-size VhostUserMsg and hasn't taken care of
the short read case. I will improve libvhost-user's vu_message_read
by making it keep reading from the socket until it gets enough bytes. I
assume a short read is a rare case, so the introduced performance penalty
would be negligible.
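
Roughly, the loop would look like this (read_full is a made-up helper name,
assuming a plain blocking socket as libvhost-user uses today):

  /*
   * Illustrative helper only: keep reading from a blocking socket until the
   * requested number of bytes has arrived, so a short read of the
   * VhostUserMsg header or payload is retried instead of being treated as a
   * complete message.
   */
  #include <errno.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <unistd.h>

  static bool read_full(int sock, void *buf, size_t len)
  {
      size_t done = 0;

      while (done < len) {
          ssize_t rc = read(sock, (char *)buf + done, len - done);
          if (rc == 0) {
              return false;             /* peer closed the connection */
          }
          if (rc < 0) {
              if (errno == EINTR) {
                  continue;             /* interrupted, retry */
              }
              return false;             /* real error */
          }
          done += rc;
      }
      return true;
  }

  /*
   * Usage inside a vu_message_read-style function would roughly be:
   *   read_full(sock, &msg, VHOST_USER_HDR_SIZE);   // fixed-size header
   *   read_full(sock, &msg.payload, msg.size);      // then the payload
   */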


On Thu, Feb 27, 2020 at 3:41 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Wed, Feb 26, 2020 at 11:18:41PM +0800, Coiby Xu wrote:
> > Hi Stefan,
> >
> > Thank you for reviewing my code!
> >
> > I tried to reach you on IRC. But somehow either you missed my message
> > or I missed your reply. So I will reply by email instead.
> >
> > If we use qio_channel_set_aio_fd_handler to monitor G_IO_IN event,
> > i.e. use vu_dispatch as the read handler, then we can re-use
> > vu_message_read. And "removing the blocking recv from libvhost-user"
> > isn't necessary because "the operation of poll() and ppoll() is not
> > affected by the O_NONBLOCK flag" despite that we use
> > qio_channel_set_blocking before calling qio_channel_set_aio_fd_handler
> > to make recv non-blocking.
>
> I'm not sure I understand.  poll() just says whether the file descriptor
> is readable.  It does not say whether enough bytes are readable :).  So
> our callback will be invoked if there is 1 byte ready, but when we try
> to read 20 bytes either it will block (without O_NONBLOCK) or return
> only 1 byte (with O_NONBLOCK).  Neither case is okay, so I expect that
> code changes will be necessary.
>
> But please go ahead and send the next revision and I'll take a look.
>
> Stefan



--
Best regards,
Coiby
Kevin Wolf Feb. 27, 2020, 10:02 a.m. UTC | #5
Am 27.02.2020 um 10:53 hat Coiby Xu geschrieben:
> Thank you for reminding me of this socket short read issue! It seems
> we still need customized vu_message_read because libvhost-user assumes
> we will always get a full-size VhostUserMsg and hasn't taken care of
> this short read case. I will improve libvhost-user's vu_message_read
> by making it keep reading from socket util getting enough bytes. I
> assume short read is a rare case thus introduced performance penalty
> would be negligible.

In any case, please make sure that we use the QIOChannel functions
called from a coroutine in QEMU so that it will never block, but the
coroutine can just yield while it's waiting for more bytes.
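
A minimal sketch of that pattern (read_msg_co and its parameters are made-up
names; qio_channel_read_all_eof() is the existing helper that yields inside a
coroutine rather than blocking):

  /*
   * Illustrative sketch: read a fixed-size chunk from the vhost-user channel
   * inside a coroutine.  With the channel set non-blocking,
   * qio_channel_read_all_eof() yields the coroutine on EAGAIN and the event
   * loop re-enters it when more bytes arrive, so the thread never blocks.
   */
  #include "qemu/osdep.h"
  #include "qapi/error.h"
  #include "io/channel.h"
  #include "qemu/coroutine.h"

  static bool coroutine_fn read_msg_co(QIOChannel *ioc, char *buf, size_t len)
  {
      Error *local_err = NULL;
      int ret;

      qio_channel_set_blocking(ioc, false, NULL);

      /* Handles short reads internally; yields instead of blocking. */
      ret = qio_channel_read_all_eof(ioc, buf, len, &local_err);
      if (ret <= 0) {
          /* 0 is orderly EOF, negative is an error */
          error_free(local_err);
          return false;
      }
      return true;
  }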

Kevin

> On Thu, Feb 27, 2020 at 3:41 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Wed, Feb 26, 2020 at 11:18:41PM +0800, Coiby Xu wrote:
> > > Hi Stefan,
> > >
> > > Thank you for reviewing my code!
> > >
> > > I tried to reach you on IRC. But somehow either you missed my message
> > > or I missed your reply. So I will reply by email instead.
> > >
> > > If we use qio_channel_set_aio_fd_handler to monitor G_IO_IN event,
> > > i.e. use vu_dispatch as the read handler, then we can re-use
> > > vu_message_read. And "removing the blocking recv from libvhost-user"
> > > isn't necessary because "the operation of poll() and ppoll() is not
> > > affected by the O_NONBLOCK flag" despite that we use
> > > qio_channel_set_blocking before calling qio_channel_set_aio_fd_handler
> > > to make recv non-blocking.
> >
> > I'm not sure I understand.  poll() just says whether the file descriptor
> > is readable.  It does not say whether enough bytes are readable :).  So
> > our callback will be invoked if there is 1 byte ready, but when we try
> > to read 20 bytes either it will block (without O_NONBLOCK) or return
> > only 1 byte (with O_NONBLOCK).  Neither case is okay, so I expect that
> > code changes will be necessary.
> >
> > But please go ahead and send the next revision and I'll take a look.
> >
> > Stefan
> 
> 
> 
> --
> Best regards,
> Coiby
>
Coiby Xu Feb. 27, 2020, 10:28 a.m. UTC | #6
> > we still need customized vu_message_read because libvhost-user assumes
> > we will always get a full-size VhostUserMsg and hasn't taken care of
> > this short read case. I will improve libvhost-user's vu_message_read
> > by making it keep reading from socket util getting enough bytes. I
> > assume short read is a rare case thus introduced performance penalty
> > would be negligible.

> In any case, please make sure that we use the QIOChannel functions
> called from a coroutine in QEMU so that it will never block, but the
> coroutine can just yield while it's waiting for more bytes.

But if I am not wrong, libvhost-user is supposed to be independent of
the main QEMU code. So it can't use the QIOChannel functions if we
simply modify the existing vu_message_read to address the short read issue.
In v3 & v4, I extended libvhost-user to allow vu_message_read to be
replaced by one which depends on the main QEMU code. I'm not sure
which way is better.

On Thu, Feb 27, 2020 at 6:02 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 27.02.2020 um 10:53 hat Coiby Xu geschrieben:
> > Thank you for reminding me of this socket short read issue! It seems
> > we still need customized vu_message_read because libvhost-user assumes
> > we will always get a full-size VhostUserMsg and hasn't taken care of
> > this short read case. I will improve libvhost-user's vu_message_read
> > by making it keep reading from socket util getting enough bytes. I
> > assume short read is a rare case thus introduced performance penalty
> > would be negligible.
>
> In any case, please make sure that we use the QIOChannel functions
> called from a coroutine in QEMU so that it will never block, but the
> coroutine can just yield while it's waiting for more bytes.
>
> Kevin
>
> > On Thu, Feb 27, 2020 at 3:41 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Wed, Feb 26, 2020 at 11:18:41PM +0800, Coiby Xu wrote:
> > > > Hi Stefan,
> > > >
> > > > Thank you for reviewing my code!
> > > >
> > > > I tried to reach you on IRC. But somehow either you missed my message
> > > > or I missed your reply. So I will reply by email instead.
> > > >
> > > > If we use qio_channel_set_aio_fd_handler to monitor G_IO_IN event,
> > > > i.e. use vu_dispatch as the read handler, then we can re-use
> > > > vu_message_read. And "removing the blocking recv from libvhost-user"
> > > > isn't necessary because "the operation of poll() and ppoll() is not
> > > > affected by the O_NONBLOCK flag" despite that we use
> > > > qio_channel_set_blocking before calling qio_channel_set_aio_fd_handler
> > > > to make recv non-blocking.
> > >
> > > I'm not sure I understand.  poll() just says whether the file descriptor
> > > is readable.  It does not say whether enough bytes are readable :).  So
> > > our callback will be invoked if there is 1 byte ready, but when we try
> > > to read 20 bytes either it will block (without O_NONBLOCK) or return
> > > only 1 byte (with O_NONBLOCK).  Neither case is okay, so I expect that
> > > code changes will be necessary.
> > >
> > > But please go ahead and send the next revision and I'll take a look.
> > >
> > > Stefan
> >
> >
> >
> > --
> > Best regards,
> > Coiby
> >
>
Kevin Wolf Feb. 27, 2020, 10:55 a.m. UTC | #7
Am 27.02.2020 um 11:28 hat Coiby Xu geschrieben:
> > > we still need customized vu_message_read because libvhost-user assumes
> > > we will always get a full-size VhostUserMsg and hasn't taken care of
> > > this short read case. I will improve libvhost-user's vu_message_read
> > > by making it keep reading from socket util getting enough bytes. I
> > > assume short read is a rare case thus introduced performance penalty
> > > would be negligible.
> 
> > In any case, please make sure that we use the QIOChannel functions
> > called from a coroutine in QEMU so that it will never block, but the
> > coroutine can just yield while it's waiting for more bytes.
> 
> But if I am not wrong, libvhost-user is supposed to be indepdent from
> the main QEMU code. So it can't use the QIOChannel functions if we
> simply modify exiting vu_message_read to address the short read issue.
> In v3 & v4, I extended libvhost-user to allow vu_message_read to be
> replaced by one which will depend on the main QEMU code. I'm not sure
> which way is better.

The way your latest patches have it, with a separate read function,
works for me.

You could probably change libvhost-user to reimplement the same
functionality, and it might be an improvement for other users of the
library, but it's also code duplication and doesn't provide more value
in the context of the vhost-user export in QEMU.

The point that's really important to me is just that we never block when
we run inside QEMU because that would actually stall the guest. This
means busy waiting in a tight loop until read() returns enough bytes is
not acceptable in QEMU.

Kevin

> On Thu, Feb 27, 2020 at 6:02 PM Kevin Wolf <kwolf@redhat.com> wrote:
> >
> > Am 27.02.2020 um 10:53 hat Coiby Xu geschrieben:
> > > Thank you for reminding me of this socket short read issue! It seems
> > > we still need customized vu_message_read because libvhost-user assumes
> > > we will always get a full-size VhostUserMsg and hasn't taken care of
> > > this short read case. I will improve libvhost-user's vu_message_read
> > > by making it keep reading from socket util getting enough bytes. I
> > > assume short read is a rare case thus introduced performance penalty
> > > would be negligible.
> >
> > In any case, please make sure that we use the QIOChannel functions
> > called from a coroutine in QEMU so that it will never block, but the
> > coroutine can just yield while it's waiting for more bytes.
> >
> > Kevin
> >
> > > On Thu, Feb 27, 2020 at 3:41 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >
> > > > On Wed, Feb 26, 2020 at 11:18:41PM +0800, Coiby Xu wrote:
> > > > > Hi Stefan,
> > > > >
> > > > > Thank you for reviewing my code!
> > > > >
> > > > > I tried to reach you on IRC. But somehow either you missed my message
> > > > > or I missed your reply. So I will reply by email instead.
> > > > >
> > > > > If we use qio_channel_set_aio_fd_handler to monitor G_IO_IN event,
> > > > > i.e. use vu_dispatch as the read handler, then we can re-use
> > > > > vu_message_read. And "removing the blocking recv from libvhost-user"
> > > > > isn't necessary because "the operation of poll() and ppoll() is not
> > > > > affected by the O_NONBLOCK flag" despite that we use
> > > > > qio_channel_set_blocking before calling qio_channel_set_aio_fd_handler
> > > > > to make recv non-blocking.
> > > >
> > > > I'm not sure I understand.  poll() just says whether the file descriptor
> > > > is readable.  It does not say whether enough bytes are readable :).  So
> > > > our callback will be invoked if there is 1 byte ready, but when we try
> > > > to read 20 bytes either it will block (without O_NONBLOCK) or return
> > > > only 1 byte (with O_NONBLOCK).  Neither case is okay, so I expect that
> > > > code changes will be necessary.
> > > >
> > > > But please go ahead and send the next revision and I'll take a look.
> > > >
> > > > Stefan
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Coiby
> > >
> >
> 
> 
> -- 
> Best regards,
> Coiby
>
Marc-André Lureau Feb. 27, 2020, 11:07 a.m. UTC | #8
Hi

On Thu, Feb 27, 2020 at 11:55 AM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 27.02.2020 um 11:28 hat Coiby Xu geschrieben:
> > > > we still need customized vu_message_read because libvhost-user assumes
> > > > we will always get a full-size VhostUserMsg and hasn't taken care of
> > > > this short read case. I will improve libvhost-user's vu_message_read
> > > > by making it keep reading from socket util getting enough bytes. I
> > > > assume short read is a rare case thus introduced performance penalty
> > > > would be negligible.
> >
> > > In any case, please make sure that we use the QIOChannel functions
> > > called from a coroutine in QEMU so that it will never block, but the
> > > coroutine can just yield while it's waiting for more bytes.
> >
> > But if I am not wrong, libvhost-user is supposed to be indepdent from
> > the main QEMU code. So it can't use the QIOChannel functions if we
> > simply modify exiting vu_message_read to address the short read issue.
> > In v3 & v4, I extended libvhost-user to allow vu_message_read to be
> > replaced by one which will depend on the main QEMU code. I'm not sure
> > which way is better.
>
> The way your latest patches have it, with a separate read function,
> works for me.

Done right, I am not against it, fwiw

> You could probably change libvhost-user to reimplement the same
> functionality, and it might be an improvement for other users of the
> library, but it's also code duplication and doesn't provide more value
> in the context of the vhost-user export in QEMU.
>
> The point that's really important to me is just that we never block when
> we run inside QEMU because that would actually stall the guest. This
> means busy waiting in a tight loop until read() returns enough bytes is
> not acceptable in QEMU.

In the context of vhost-user, with local unix sockets and short messages
(do we have >1k messages?), I am not sure this is really a problem.

And isn't it possible to run libvhost-user in its own thread for this series?


>
> Kevin
>
> > On Thu, Feb 27, 2020 at 6:02 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > >
> > > Am 27.02.2020 um 10:53 hat Coiby Xu geschrieben:
> > > > Thank you for reminding me of this socket short read issue! It seems
> > > > we still need customized vu_message_read because libvhost-user assumes
> > > > we will always get a full-size VhostUserMsg and hasn't taken care of
> > > > this short read case. I will improve libvhost-user's vu_message_read
> > > > by making it keep reading from socket util getting enough bytes. I
> > > > assume short read is a rare case thus introduced performance penalty
> > > > would be negligible.
> > >
> > > In any case, please make sure that we use the QIOChannel functions
> > > called from a coroutine in QEMU so that it will never block, but the
> > > coroutine can just yield while it's waiting for more bytes.
> > >
> > > Kevin
> > >
> > > > On Thu, Feb 27, 2020 at 3:41 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > >
> > > > > On Wed, Feb 26, 2020 at 11:18:41PM +0800, Coiby Xu wrote:
> > > > > > Hi Stefan,
> > > > > >
> > > > > > Thank you for reviewing my code!
> > > > > >
> > > > > > I tried to reach you on IRC. But somehow either you missed my message
> > > > > > or I missed your reply. So I will reply by email instead.
> > > > > >
> > > > > > If we use qio_channel_set_aio_fd_handler to monitor G_IO_IN event,
> > > > > > i.e. use vu_dispatch as the read handler, then we can re-use
> > > > > > vu_message_read. And "removing the blocking recv from libvhost-user"
> > > > > > isn't necessary because "the operation of poll() and ppoll() is not
> > > > > > affected by the O_NONBLOCK flag" despite that we use
> > > > > > qio_channel_set_blocking before calling qio_channel_set_aio_fd_handler
> > > > > > to make recv non-blocking.
> > > > >
> > > > > I'm not sure I understand.  poll() just says whether the file descriptor
> > > > > is readable.  It does not say whether enough bytes are readable :).  So
> > > > > our callback will be invoked if there is 1 byte ready, but when we try
> > > > > to read 20 bytes either it will block (without O_NONBLOCK) or return
> > > > > only 1 byte (with O_NONBLOCK).  Neither case is okay, so I expect that
> > > > > code changes will be necessary.
> > > > >
> > > > > But please go ahead and send the next revision and I'll take a look.
> > > > >
> > > > > Stefan
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Coiby
> > > >
> > >
> >
> >
> > --
> > Best regards,
> > Coiby
> >
>
Kevin Wolf Feb. 27, 2020, 11:19 a.m. UTC | #9
Am 27.02.2020 um 12:07 hat Marc-André Lureau geschrieben:
> On Thu, Feb 27, 2020 at 11:55 AM Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 27.02.2020 um 11:28 hat Coiby Xu geschrieben:
> > > > > we still need customized vu_message_read because libvhost-user assumes
> > > > > we will always get a full-size VhostUserMsg and hasn't taken care of
> > > > > this short read case. I will improve libvhost-user's vu_message_read
> > > > > by making it keep reading from socket util getting enough bytes. I
> > > > > assume short read is a rare case thus introduced performance penalty
> > > > > would be negligible.
> > >
> > > > In any case, please make sure that we use the QIOChannel functions
> > > > called from a coroutine in QEMU so that it will never block, but the
> > > > coroutine can just yield while it's waiting for more bytes.
> > >
> > > But if I am not wrong, libvhost-user is supposed to be indepdent from
> > > the main QEMU code. So it can't use the QIOChannel functions if we
> > > simply modify exiting vu_message_read to address the short read issue.
> > > In v3 & v4, I extended libvhost-user to allow vu_message_read to be
> > > replaced by one which will depend on the main QEMU code. I'm not sure
> > > which way is better.
> >
> > The way your latest patches have it, with a separate read function,
> > works for me.
> 
> Done right, I am not against it, fwiw
> 
> > You could probably change libvhost-user to reimplement the same
> > functionality, and it might be an improvement for other users of the
> > library, but it's also code duplication and doesn't provide more value
> > in the context of the vhost-user export in QEMU.
> >
> > The point that's really important to me is just that we never block when
> > we run inside QEMU because that would actually stall the guest. This
> > means busy waiting in a tight loop until read() returns enough bytes is
> > not acceptable in QEMU.
> 
> In the context of vhost-user, local unix sockets with short messages
> (do we have >1k messages?), I am not sure if this is really a problem.

I'm not sure how much of a problem it is in practice, and whether we
can consider the vhost-user client trusted. But using QIOChannel from
within a coroutine just avoids the problem completely, so it feels like
a natural choice to just do that.

> And isn't it possible to run libvhost-user in its own thread for this
> series?

You need to run the actual block I/O requests in the iothread where the
block device happens to run. So if you move the message processing to a
separate thread, you would have to communicate between threads for the
actual request processing. Possible, but probably slower than necessary
and certainly more complex.
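
A sketch of the single-thread flow (VuBlkReq and the function names are made
up for illustration; blk_co_preadv(), qemu_coroutine_create() and
aio_co_schedule() are the existing QEMU APIs):

  /*
   * Illustrative sketch: create the request coroutine and schedule it in the
   * AioContext that owns the BlockBackend, so blk_co_preadv()/blk_co_pwritev()
   * run in the block device's own iothread and no per-request cross-thread
   * handoff is needed.
   */
  typedef struct VuBlkReq {
      BlockBackend *blk;
      int64_t offset;
      QEMUIOVector qiov;
      /* ... virtio descriptor bookkeeping ... */
  } VuBlkReq;

  static void coroutine_fn vu_blk_process_req(void *opaque)
  {
      VuBlkReq *req = opaque;

      /* Runs in the BlockBackend's AioContext. */
      blk_co_preadv(req->blk, req->offset, req->qiov.size, &req->qiov, 0);
      /* ... push the used descriptor and notify the guest ... */
  }

  static void vu_blk_handle_request(VuBlkReq *req)
  {
      Coroutine *co = qemu_coroutine_create(vu_blk_process_req, req);

      /* Enter the coroutine in the iothread that owns the block device. */
      aio_co_schedule(blk_get_aio_context(req->blk), co);
  }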

Kevin
Daniel P. Berrangé Feb. 27, 2020, 11:38 a.m. UTC | #10
On Thu, Feb 27, 2020 at 12:19:58PM +0100, Kevin Wolf wrote:
> Am 27.02.2020 um 12:07 hat Marc-André Lureau geschrieben:
> > On Thu, Feb 27, 2020 at 11:55 AM Kevin Wolf <kwolf@redhat.com> wrote:
> > > Am 27.02.2020 um 11:28 hat Coiby Xu geschrieben:
> > > > > > we still need customized vu_message_read because libvhost-user assumes
> > > > > > we will always get a full-size VhostUserMsg and hasn't taken care of
> > > > > > this short read case. I will improve libvhost-user's vu_message_read
> > > > > > by making it keep reading from socket util getting enough bytes. I
> > > > > > assume short read is a rare case thus introduced performance penalty
> > > > > > would be negligible.
> > > >
> > > > > In any case, please make sure that we use the QIOChannel functions
> > > > > called from a coroutine in QEMU so that it will never block, but the
> > > > > coroutine can just yield while it's waiting for more bytes.
> > > >
> > > > But if I am not wrong, libvhost-user is supposed to be indepdent from
> > > > the main QEMU code. So it can't use the QIOChannel functions if we
> > > > simply modify exiting vu_message_read to address the short read issue.
> > > > In v3 & v4, I extended libvhost-user to allow vu_message_read to be
> > > > replaced by one which will depend on the main QEMU code. I'm not sure
> > > > which way is better.
> > >
> > > The way your latest patches have it, with a separate read function,
> > > works for me.
> > 
> > Done right, I am not against it, fwiw
> > 
> > > You could probably change libvhost-user to reimplement the same
> > > functionality, and it might be an improvement for other users of the
> > > library, but it's also code duplication and doesn't provide more value
> > > in the context of the vhost-user export in QEMU.
> > >
> > > The point that's really important to me is just that we never block when
> > > we run inside QEMU because that would actually stall the guest. This
> > > means busy waiting in a tight loop until read() returns enough bytes is
> > > not acceptable in QEMU.
> > 
> > In the context of vhost-user, local unix sockets with short messages
> > (do we have >1k messages?), I am not sure if this is really a problem.
> 
> I'm not sure how much of a problem it is in practice, and whether we
> can consider the vhost-user client trusted. But using QIOChannel from
> within a coroutine just avoids the problem completely, so it feels like
> a natural choice to just do that.

It isn't clear to me that we have a consistent plan for how we intend
libvhost-user to develop & what it is permitted to use.  The information
I see in the source code and in this thread is contradictory.

For example, in the text quoted above:

  "libvhost-user is supposed to be indepdent from the main QEMU code."

which did match my overall understanding too. At the top of libvhost-user.c
there is a comment

   /* this code avoids GLib dependency */

but a few lines later it does

  #include "qemu/atomic.h"
  #include "qemu/osdep.h"
  #include "qemu/memfd.h"

and in the Makefile we link it to much of QEMU util code:

  libvhost-user.a: $(libvhost-user-obj-y) $(util-obj-y) $(stub-obj-y)

this in turn pulls in GLib code, and looking at symbols we can see
over 100 GLib functions used:

  $ nm ./libvhost-user.a | grep 'U g_' | sort | uniq | wc -l
  128

And over 200 QEMU source object files included:

  $ nm ./libvhost-user.a | grep '.o:' | sort | wc -l
  224

So unless I'm missing something, we have lost any independence from
both QEMU and GLib code that we might have had in the past.

Note this also has licensing implications, as I expect this means that
via the QEMU source objects it pulls in, libvhost-user.a has become
a GPLv2-only combined work, not a GPLv2-or-later combined work.


Regards,
Daniel
Marc-André Lureau Feb. 27, 2020, 1:07 p.m. UTC | #11
Hi

On Thu, Feb 27, 2020 at 12:39 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Feb 27, 2020 at 12:19:58PM +0100, Kevin Wolf wrote:
> > Am 27.02.2020 um 12:07 hat Marc-André Lureau geschrieben:
> > > On Thu, Feb 27, 2020 at 11:55 AM Kevin Wolf <kwolf@redhat.com> wrote:
> > > > Am 27.02.2020 um 11:28 hat Coiby Xu geschrieben:
> > > > > > > we still need customized vu_message_read because libvhost-user assumes
> > > > > > > we will always get a full-size VhostUserMsg and hasn't taken care of
> > > > > > > this short read case. I will improve libvhost-user's vu_message_read
> > > > > > > by making it keep reading from socket util getting enough bytes. I
> > > > > > > assume short read is a rare case thus introduced performance penalty
> > > > > > > would be negligible.
> > > > >
> > > > > > In any case, please make sure that we use the QIOChannel functions
> > > > > > called from a coroutine in QEMU so that it will never block, but the
> > > > > > coroutine can just yield while it's waiting for more bytes.
> > > > >
> > > > > But if I am not wrong, libvhost-user is supposed to be indepdent from
> > > > > the main QEMU code. So it can't use the QIOChannel functions if we
> > > > > simply modify exiting vu_message_read to address the short read issue.
> > > > > In v3 & v4, I extended libvhost-user to allow vu_message_read to be
> > > > > replaced by one which will depend on the main QEMU code. I'm not sure
> > > > > which way is better.
> > > >
> > > > The way your latest patches have it, with a separate read function,
> > > > works for me.
> > >
> > > Done right, I am not against it, fwiw
> > >
> > > > You could probably change libvhost-user to reimplement the same
> > > > functionality, and it might be an improvement for other users of the
> > > > library, but it's also code duplication and doesn't provide more value
> > > > in the context of the vhost-user export in QEMU.
> > > >
> > > > The point that's really important to me is just that we never block when
> > > > we run inside QEMU because that would actually stall the guest. This
> > > > means busy waiting in a tight loop until read() returns enough bytes is
> > > > not acceptable in QEMU.
> > >
> > > In the context of vhost-user, local unix sockets with short messages
> > > (do we have >1k messages?), I am not sure if this is really a problem.
> >
> > I'm not sure how much of a problem it is in practice, and whether we
> > can consider the vhost-user client trusted. But using QIOChannel from
> > within a coroutine just avoids the problem completely, so it feels like
> > a natural choice to just do that.
>
> It isn't clear to me that we have a consitent plan for how we intend
> libvhost-user to develop & what it is permitted to use.  What information
> I see in the source code and in this thread are contradictory.
>
> For example, in the text quoted above:
>
>   "libvhost-user is supposed to be indepdent from the main QEMU code."
>
> which did match my overall understanding too. At the top of libvhost-user.c
> there is a comment
>
>    /* this code avoids GLib dependency */
>
> but a few lines later it does
>
>   #include "qemu/atomic.h"
>   #include "qemu/osdep.h"
>   #include "qemu/memfd.h"
>
> and in the Makefile we link it to much of QEMU util code:
>
>   libvhost-user.a: $(libvhost-user-obj-y) $(util-obj-y) $(stub-obj-y)
>
> this in turn pulls in GLib code, and looking at symbols we can see
> over 100 GLib functions used:
>
>   $ nm ./libvhost-user.a | grep 'U g_' | sort | uniq | wc -l
>   128
>
> And over 200 QEMU source object files included:
>
>   $ nm ./libvhost-user.a | grep '.o:' | sort | wc -l
>   224
>
> So unless I'm missing something, we have lost any independance from
> both QEMU and GLib code that we might have had in the past.

Yep, I think this is mostly due to commit 5f9ff1eff38 ("libvhost-user:
Support tracking inflight I/O in shared memory").

It may not be that hard to bring back GLib independence. Is it worth it though?

>
> Note this also has licensing implications, as I expect this means that
> via the QEMU source objects it pulls in, libvhost-user.a has become
> a GPLv2-only combined work, not a GPLv2-or-later combined work.
>

libvhost-user.c is GPLv2-or-later because tests/vhost-user-bridge.c
was, and still is, as well. Do we need to change that?

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
>