[RFC,0/2] vhost-vfio: introduce mdev based HW vhost backend

Message ID 20181016132327.121839-1-xiao.w.wang@intel.com

Message

Xiao Wang Oct. 16, 2018, 1:23 p.m. UTC
What's this
===========
Following the patch (vhost: introduce mdev based hardware vhost backend)
https://lwn.net/Articles/750770/, which defines a generic mdev device for
vhost data path acceleration (aliased as vDPA mdev below), this patch set
introduces a new net client type: vhost-vfio.

Currently we have 2 types of vhost backends in QEMU: vhost kernel (tap)
and vhost-user (e.g. DPDK vhost). In order to have a kernel-space HW vhost
acceleration framework, the vDPA mdev device works as a generic
configuration channel. It exposes to user space a non-vendor-specific
configuration interface for setting up a vhost HW accelerator. Based on
this, this patch set introduces a third vhost backend called vhost-vfio.

How does it work
================
The vDPA mdev defines 2 BAR regions, BAR0 and BAR1. BAR0 is the main
device interface; vhost messages can be written to or read from this
region in the format below. All the regular vhost messages about vring
addresses, negotiated features, etc., are written to this region directly.

struct vhost_vfio_op {
	__u64 request;
	__u32 flags;
	/* Flag values: */
#define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
	__u32 size;
	union {
		__u64 u64;
		struct vhost_vring_state state;
		struct vhost_vring_addr addr;
		struct vhost_memory memory;
	} payload;
};
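
For illustration, here is a minimal sketch (not taken from the patches) of
how a user-space client could push one vhost message through BAR0. The
pwrite()-based region access and the helper name are assumptions; only the
message layout and the reuse of vhost-kernel request codes follow the
description above.

#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <linux/types.h>
#include <linux/vhost.h>

/* Same layout as above, repeated so the sketch is self-contained. */
struct vhost_vfio_op {
        __u64 request;
        __u32 flags;
#define VHOST_VFIO_NEED_REPLY 0x1
        __u32 size;
        union {
                __u64 u64;
                struct vhost_vring_state state;
                struct vhost_vring_addr addr;
                struct vhost_memory memory;
        } payload;
};

/* Sketch: set the size of virtqueue `index` to `num` with one write to
 * BAR0.  `device_fd` is the VFIO device fd; `bar0_offset` is the region
 * offset reported by VFIO_DEVICE_GET_REGION_INFO for BAR0. */
static int vhost_vfio_set_vring_num(int device_fd, off_t bar0_offset,
                                    unsigned int index, unsigned int num)
{
        struct vhost_vfio_op op;

        memset(&op, 0, sizeof(op));
        op.request = VHOST_SET_VRING_NUM;  /* request code reused from vhost-kernel */
        op.flags = 0;                      /* no reply needed for this request */
        op.size = sizeof(op.payload.state);
        op.payload.state.index = index;
        op.payload.state.num = num;

        if (pwrite(device_fd, &op, sizeof(op), bar0_offset) != sizeof(op)) {
                return -1;
        }
        return 0;
}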

BAR1 is defined to be a region of doorbells, and QEMU can use this region
as the host notifier for virtio. To optimize virtio notification,
vhost-vfio tries to mmap the corresponding page of BAR1 for each queue and
leverages EPT to let the guest virtio driver kick the vDPA device doorbell
directly. For the virtio 0.95 case, in which we cannot set a host notifier
memory region, QEMU will relay the notify to the vDPA device.

Note: EPT mapping requires that each queue's notify address be located at
the beginning of a separate page; the "page-per-vq=on" parameter ensures
this.
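
As an illustration only (the per-queue page stride and the relay write
format are assumptions, not taken from the patches), the doorbell handling
could look roughly like this:

#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>

/* Sketch: map one page of BAR1 as the doorbell of virtqueue `qid`.
 * Assumes each queue's doorbell starts at qid * page_size inside BAR1,
 * which is what "page-per-vq=on" is meant to guarantee. */
static void *map_queue_doorbell(int device_fd, off_t bar1_offset,
                                size_t page_size, int qid)
{
        void *db = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                        device_fd, bar1_offset + (off_t)qid * page_size);
        return db == MAP_FAILED ? NULL : db;
}

/* Fallback (e.g. virtio 0.95): QEMU relays the guest kick by writing the
 * queue index to the doorbell itself. */
static void relay_notify(void *db, uint16_t qid)
{
        *(volatile uint16_t *)db = qid;
}

On success the mapped page is handed to virtio as a host notifier memory
region, so guest kicks land on the device doorbell directly via EPT; when
the mmap or host notifier setup fails, QEMU falls back to the relay path.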

For interrupt setup, the vDPA mdev device leverages the existing VFIO API
to configure interrupts from user space. In this way, KVM's irqfd for
virtio can be bound to the mdev device by QEMU using ioctl().
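
For reference, a sketch (the MSI-X index/vector layout is an assumption,
and the helper name is illustrative) of wiring an eventfd, e.g. one
already registered with KVM as an irqfd, to the device with the standard
VFIO interrupt ioctl:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch: trigger MSI-X vector `vector` of the mdev through `eventfd`
 * using the existing VFIO_DEVICE_SET_IRQS ioctl. */
static int vdpa_set_queue_irqfd(int device_fd, unsigned int vector, int eventfd)
{
        char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
        struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;

        memset(buf, 0, sizeof(buf));
        irq_set->argsz = sizeof(buf);
        irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
        irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;  /* assumed MSI-X layout */
        irq_set->start = vector;
        irq_set->count = 1;
        memcpy(irq_set->data, &eventfd, sizeof(int));

        return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
}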

The vhost-vfio net client sets up a vDPA mdev device specified by a
"sysfsdev" parameter. During net client init, the device is opened and
parsed using the VFIO API, and the VFIO device fd and device BAR region
offsets are kept in a VhostVFIO structure. This initialization provides a
channel for configuring vhost information to the vDPA device driver.
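
A rough sketch of that init path (names are illustrative, group viability
checks and error handling are omitted); the group path is resolved from
the iommu_group link under the sysfsdev directory:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

struct vhost_vfio_dev {                /* illustrative stand-in for VhostVFIO */
        int container, group, device;
        struct vfio_region_info bar0, bar1;
};

/* Sketch: open the vDPA mdev with the raw VFIO API and record the BAR
 * offsets used later for vhost messages (BAR0) and doorbells (BAR1). */
static int vhost_vfio_open(struct vhost_vfio_dev *v, const char *group_path,
                           const char *mdev_uuid)
{
        memset(v, 0, sizeof(*v));
        v->container = open("/dev/vfio/vfio", O_RDWR);
        v->group = open(group_path, O_RDWR);
        ioctl(v->group, VFIO_GROUP_SET_CONTAINER, &v->container);
        ioctl(v->container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);
        v->device = ioctl(v->group, VFIO_GROUP_GET_DEVICE_FD, mdev_uuid);

        v->bar0.argsz = sizeof(v->bar0);
        v->bar0.index = 0;             /* BAR0: vhost message region */
        ioctl(v->device, VFIO_DEVICE_GET_REGION_INFO, &v->bar0);

        v->bar1.argsz = sizeof(v->bar1);
        v->bar1.index = 1;             /* BAR1: doorbell region */
        ioctl(v->device, VFIO_DEVICE_GET_REGION_INFO, &v->bar1);

        return v->device < 0 ? -1 : 0;
}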

To do later
===========
1. The net client initialization uses the raw VFIO API to open the vDPA
mdev device; it would be better to provide a set of helpers in
hw/vfio/common.c to help vhost-vfio initialize the device easily.

2. For device DMA mapping, QEMU passes memory region info to the mdev
device and lets the kernel parent device driver program the IOMMU. This
is a temporary implementation; in the future, when the IOMMU driver
supports the mdev bus, we can use the VFIO API to program the IOMMU for
the parent device directly (see the sketch below).
Refer to the patch (vfio/mdev: IOMMU aware mediated device):
https://lkml.org/lkml/2018/10/12/225
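
For reference, a sketch of that future direction using the standard VFIO
type1 DMA mapping ioctl (not part of this series; the helper name is
illustrative):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch: map one guest memory region for device DMA once the IOMMU
 * driver is mdev-bus aware.  `container` is the VFIO container fd. */
static int vdpa_dma_map(int container, void *hva, __u64 gpa, __u64 size)
{
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (__u64)(uintptr_t)hva;  /* QEMU virtual address of the region */
        map.iova  = gpa;                    /* guest physical address used as IOVA */
        map.size  = size;

        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}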

Vhost-vfio usage
================
# Query the number of available mdev instances
$ cat /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/available_instances

# Create an mdev instance
$ echo $UUID > /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/create

# Launch QEMU with a virtio-net device
    qemu-system-x86_64 -cpu host -enable-kvm \
    <snip>
    -mem-prealloc \
    -netdev type=vhost-vfio,sysfsdev=/sys/bus/mdev/devices/$UUID,id=mynet\
    -device virtio-net-pci,netdev=mynet,page-per-vq=on \

-------- END --------

Xiao Wang (2):
  vhost-vfio: introduce vhost-vfio net client
  vhost-vfio: implement vhost-vfio backend

 hw/net/vhost_net.c                |  56 ++++-
 hw/vfio/common.c                  |   3 +-
 hw/virtio/Makefile.objs           |   2 +-
 hw/virtio/vhost-backend.c         |   3 +
 hw/virtio/vhost-vfio.c            | 501 ++++++++++++++++++++++++++++++++++++++
 hw/virtio/vhost.c                 |  15 ++
 include/hw/virtio/vhost-backend.h |   7 +-
 include/hw/virtio/vhost-vfio.h    |  35 +++
 include/hw/virtio/vhost.h         |   2 +
 include/net/vhost-vfio.h          |  17 ++
 linux-headers/linux/vhost.h       |   9 +
 net/Makefile.objs                 |   1 +
 net/clients.h                     |   3 +
 net/net.c                         |   1 +
 net/vhost-vfio.c                  | 327 +++++++++++++++++++++++++
 qapi/net.json                     |  22 +-
 16 files changed, 996 insertions(+), 8 deletions(-)
 create mode 100644 hw/virtio/vhost-vfio.c
 create mode 100644 include/hw/virtio/vhost-vfio.h
 create mode 100644 include/net/vhost-vfio.h
 create mode 100644 net/vhost-vfio.c

Comments

Jason Wang Nov. 6, 2018, 4:17 a.m. UTC | #1
On 2018/10/16 9:23 PM, Xiao Wang wrote:
> What's this
> ===========
> Following the patch (vhost: introduce mdev based hardware vhost backend)
> https://lwn.net/Articles/750770/, which defines a generic mdev device for
> vhost data path acceleration (aliased as vDPA mdev below), this patch set
> introduces a new net client type: vhost-vfio.


Thanks a lot for such an interesting series. Some generic questions:


If we consider using a software backend (e.g. vhost-kernel or a relay of
virtio-vhost-user or other cases) as well in the future, maybe vhost-mdev
is the better name, which means it is not tied to VFIO anyway.


>
> Currently we have 2 types of vhost backends in QEMU: vhost kernel (tap)
> and vhost-user (e.g. DPDK vhost), in order to have a kernel space HW vhost
> acceleration framework, the vDPA mdev device works as a generic configuring
> channel.


Does "generic" configuring channel means dpdk will also go for this way? 
E.g it will have a vhost mdev pmd?


>   It exposes to user space a non-vendor-specific configuration
> interface for setting up a vhost HW accelerator,


Or even a software translation layer on top of existing hardware.


> based on this, this patch
> set introduces a third vhost backend called vhost-vfio.
>
> How does it work
> ================
> The vDPA mdev defines 2 BAR regions, BAR0 and BAR1. BAR0 is the main
> device interface, vhost messages can be written to or read from this
> region following below format. All the regular vhost messages about vring
> addr, negotiated features, etc., are written to this region directly.


If I understand this correctly, the mdev is not passed through to the
guest directly. So what's the reason for inventing a PCI-like device
here? I'm asking since:

- the vhost protocol is transport independent, so we should consider
supporting transports other than PCI. I know we can even do it with the
existing design, but it looks rather odd if we do e.g. a ccw device with
a PCI-like mediated device.

- can we try to reuse the vhost-kernel ioctls? Fewer APIs mean fewer bugs
and more code reuse. E.g. virtio-user could benefit from the vhost-kernel
ioctl API almost with no changes, I believe.


>
> struct vhost_vfio_op {
> 	__u64 request;
> 	__u32 flags;
> 	/* Flag values: */
> #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> 	__u32 size;
> 	union {
> 		__u64 u64;
> 		struct vhost_vring_state state;
> 		struct vhost_vring_addr addr;
> 		struct vhost_memory memory;
> 	} payload;
> };
>
> BAR1 is defined to be a region of doorbells, QEMU can use this region as
> host notifier for virtio. To optimize virtio notify, vhost-vfio trys to
> mmap the corresponding page on BAR1 for each queue and leverage EPT to let
> guest virtio driver kick vDPA device doorbell directly. For virtio 0.95
> case in which we cannot set host notifier memory region, QEMU will help to
> relay the notify to vDPA device.
>
> Note: EPT mapping requires each queue's notify address locates at the
> beginning of a separate page, parameter "page-per-vq=on" could help.


I think QEMU should prepare a fallback for this if page-per-vq is off.


>
> For interrupt setting, vDPA mdev device leverages existing VFIO API to
> enable interrupt config in user space. In this way, KVM's irqfd for virtio
> can be set to mdev device by QEMU using ioctl().
>
> vhost-vfio net client will set up a vDPA mdev device which is specified
> by a "sysfsdev" parameter, during the net client init, the device will be
> opened and parsed using VFIO API, the VFIO device fd and device BAR region
> offset will be kept in a VhostVFIO structure, this initialization provides
> a channel to configure vhost information to the vDPA device driver.
>
> To do later
> ===========
> 1. The net client initialization uses raw VFIO API to open vDPA mdev
> device, it's better to provide a set of helpers in hw/vfio/common.c
> to help vhost-vfio initialize device easily.
>
> 2. For device DMA mapping, QEMU passes memory region info to mdev device
> and let kernel parent device driver program IOMMU. This is a temporary
> implementation, for future when IOMMU driver supports mdev bus, we
> can use VFIO API to program IOMMU directly for parent device.
> Refer to the patch (vfio/mdev: IOMMU aware mediated device):
> https://lkml.org/lkml/2018/10/12/225


As Steve mentioned at the KVM Forum, it's better to have at least one
sample driver, e.g. virtio-net itself.

Then it would be more convenient for reviewers to evaluate the whole
stack.

Thanks


>
> Vhost-vfio usage
> ================
> # Query the number of available mdev instances
> $ cat /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/available_instances
>
> # Create a mdev instance
> $ echo $UUID > /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_virtio/create
>
> # Launch QEMU with a virtio-net device
>      qemu-system-x86_64 -cpu host -enable-kvm \
>      <snip>
>      -mem-prealloc \
>      -netdev type=vhost-vfio,sysfsdev=/sys/bus/mdev/devices/$UUID,id=mynet\
>      -device virtio-net-pci,netdv=mynet,page-per-vq=on \
>
> -------- END --------
>
> Xiao Wang (2):
>    vhost-vfio: introduce vhost-vfio net client
>    vhost-vfio: implement vhost-vfio backend
>
>   hw/net/vhost_net.c                |  56 ++++-
>   hw/vfio/common.c                  |   3 +-
>   hw/virtio/Makefile.objs           |   2 +-
>   hw/virtio/vhost-backend.c         |   3 +
>   hw/virtio/vhost-vfio.c            | 501 ++++++++++++++++++++++++++++++++++++++
>   hw/virtio/vhost.c                 |  15 ++
>   include/hw/virtio/vhost-backend.h |   7 +-
>   include/hw/virtio/vhost-vfio.h    |  35 +++
>   include/hw/virtio/vhost.h         |   2 +
>   include/net/vhost-vfio.h          |  17 ++
>   linux-headers/linux/vhost.h       |   9 +
>   net/Makefile.objs                 |   1 +
>   net/clients.h                     |   3 +
>   net/net.c                         |   1 +
>   net/vhost-vfio.c                  | 327 +++++++++++++++++++++++++
>   qapi/net.json                     |  22 +-
>   16 files changed, 996 insertions(+), 8 deletions(-)
>   create mode 100644 hw/virtio/vhost-vfio.c
>   create mode 100644 include/hw/virtio/vhost-vfio.h
>   create mode 100644 include/net/vhost-vfio.h
>   create mode 100644 net/vhost-vfio.c
>
Liang, Cunming Nov. 7, 2018, 12:26 p.m. UTC | #2
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Tuesday, November 6, 2018 4:18 AM
> To: Wang, Xiao W <xiao.w.wang@intel.com>; mst@redhat.com;
> alex.williamson@redhat.com
> Cc: qemu-devel@nongnu.org; Bie, Tiwei <tiwei.bie@intel.com>; Liang, Cunming
> <cunming.liang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Daly, Dan <dan.daly@intel.com>
> Subject: Re: [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend
> 
> 
> On 2018/10/16 下午9:23, Xiao Wang wrote:
> > What's this
> > ===========
> > Following the patch (vhost: introduce mdev based hardware vhost
> > backend) https://lwn.net/Articles/750770/, which defines a generic
> > mdev device for vhost data path acceleration (aliased as vDPA mdev
> > below), this patch set introduces a new net client type: vhost-vfio.
> 
> 
> Thanks a lot for a such interesting series. Some generic questions:
> 
> 
> If we consider to use software backend (e.g vhost-kernel or a rely of virito-vhost-
> user or other cases) as well in the future, maybe vhost-mdev is better which mean it
> does not tie to VFIO anyway.
[LC] The initial thought behind using the term '-vfio' was that the VFIO UAPI is used as the interface, it being the only available mdev bus driver. That leads to the term 'vhost-vfio' in QEMU, while the term 'vhost-mdev' represents a helper in the kernel for vhost messages via mdev.

> 
> 
> >
> > Currently we have 2 types of vhost backends in QEMU: vhost kernel
> > (tap) and vhost-user (e.g. DPDK vhost), in order to have a kernel
> > space HW vhost acceleration framework, the vDPA mdev device works as a
> > generic configuring channel.
> 
> 
> Does "generic" configuring channel means dpdk will also go for this way?
> E.g it will have a vhost mdev pmd?
[LC] We don't plan to have a vhost-mdev PMD, but are thinking of having a consistent virtio PMD running on top of vhost-mdev. The virtio PMD supports the pci bus and the vdev bus (via virtio-user) today. Vhost-mdev would most likely be introduced as another bus (mdev bus) provider. mdev bus support in DPDK is in the backlog.

> 
> 
> >   It exposes to user space a non-vendor-specific configuration
> > interface for setting up a vhost HW accelerator,
> 
> 
> Or even a software translation layer on top of exist hardware.
> 
> 
> > based on this, this patch
> > set introduces a third vhost backend called vhost-vfio.
> >
> > How does it work
> > ================
> > The vDPA mdev defines 2 BAR regions, BAR0 and BAR1. BAR0 is the main
> > device interface, vhost messages can be written to or read from this
> > region following below format. All the regular vhost messages about
> > vring addr, negotiated features, etc., are written to this region directly.
> 
> 
> If I understand this correctly, the mdev was not used for passed through to guest
> directly. So what's the reason of inventing a PCI like device here? I'm asking since:
[LC] mdev uses the mandatory 'device_api' attribute to identify the layout. We pick one of the available ones from pci, platform, amba and ccw. It would also work to define a new one for this transport.

> 
> - vhost protocol is transport indepedent, we should consider to support transport
> other than PCI. I know we can even do it with the exist design but it looks rather odd
> if we do e.g ccw device with a PCI like mediated device.
> 
> - can we try to reuse vhost-kernel ioctl? Less API means less bugs and code reusing.
> E.g virtio-user can benefit from the vhost kernel ioctl API almost with no changes I
> believe.
[LC] Agreed, so it reuses the commands defined by the vhost-kernel ioctls. But VFIO provides device-specific things (e.g. DMAR, INTR, etc.), which are the extra APIs introduced by this transport.

> 
> 
> >
> > struct vhost_vfio_op {
> > 	__u64 request;
> > 	__u32 flags;
> > 	/* Flag values: */
> > #define VHOST_VFIO_NEED_REPLY 0x1 /* Whether need reply */
> > 	__u32 size;
> > 	union {
> > 		__u64 u64;
> > 		struct vhost_vring_state state;
> > 		struct vhost_vring_addr addr;
> > 		struct vhost_memory memory;
> > 	} payload;
> > };
> >
> > BAR1 is defined to be a region of doorbells, QEMU can use this region
> > as host notifier for virtio. To optimize virtio notify, vhost-vfio
> > trys to mmap the corresponding page on BAR1 for each queue and
> > leverage EPT to let guest virtio driver kick vDPA device doorbell
> > directly. For virtio 0.95 case in which we cannot set host notifier
> > memory region, QEMU will help to relay the notify to vDPA device.
> >
> > Note: EPT mapping requires each queue's notify address locates at the
> > beginning of a separate page, parameter "page-per-vq=on" could help.
> 
> 
> I think qemu should prepare a fallback for this if page-per-vq is off.
[LC] Yeah, QEMU does that and falls back to a syscall to vhost-mdev in the kernel.

> 
> 
> >
> > For interrupt setting, vDPA mdev device leverages existing VFIO API to
> > enable interrupt config in user space. In this way, KVM's irqfd for
> > virtio can be set to mdev device by QEMU using ioctl().
> >
> > vhost-vfio net client will set up a vDPA mdev device which is
> > specified by a "sysfsdev" parameter, during the net client init, the
> > device will be opened and parsed using VFIO API, the VFIO device fd
> > and device BAR region offset will be kept in a VhostVFIO structure,
> > this initialization provides a channel to configure vhost information to the vDPA
> device driver.
> >
> > To do later
> > ===========
> > 1. The net client initialization uses raw VFIO API to open vDPA mdev
> > device, it's better to provide a set of helpers in hw/vfio/common.c to
> > help vhost-vfio initialize device easily.
> >
> > 2. For device DMA mapping, QEMU passes memory region info to mdev
> > device and let kernel parent device driver program IOMMU. This is a
> > temporary implementation, for future when IOMMU driver supports mdev
> > bus, we can use VFIO API to program IOMMU directly for parent device.
> > Refer to the patch (vfio/mdev: IOMMU aware mediated device):
> > https://lkml.org/lkml/2018/10/12/225
> 
> 
> As Steve mentioned in the KVM forum. It's better to have at least one sample driver
> e.g virtio-net itself.
> 
> Then it would be more convenient for the reviewer to evaluate the whole stack.
> 
> Thanks
> 
> 
> >
> > Vhost-vfio usage
> > ================
> > # Query the number of available mdev instances $ cat
> > /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_
> > virtio/available_instances
> >
> > # Create a mdev instance
> > $ echo $UUID >
> > /sys/class/mdev_bus/0000:84:00.3/mdev_supported_types/ifcvf_vdpa-vdpa_
> > virtio/create
> >
> > # Launch QEMU with a virtio-net device
> >      qemu-system-x86_64 -cpu host -enable-kvm \
> >      <snip>
> >      -mem-prealloc \
> >      -netdev type=vhost-vfio,sysfsdev=/sys/bus/mdev/devices/$UUID,id=mynet\
> >      -device virtio-net-pci,netdv=mynet,page-per-vq=on \
> >
> > -------- END --------
> >
> > Xiao Wang (2):
> >    vhost-vfio: introduce vhost-vfio net client
> >    vhost-vfio: implement vhost-vfio backend
> >
> >   hw/net/vhost_net.c                |  56 ++++-
> >   hw/vfio/common.c                  |   3 +-
> >   hw/virtio/Makefile.objs           |   2 +-
> >   hw/virtio/vhost-backend.c         |   3 +
> >   hw/virtio/vhost-vfio.c            | 501
> ++++++++++++++++++++++++++++++++++++++
> >   hw/virtio/vhost.c                 |  15 ++
> >   include/hw/virtio/vhost-backend.h |   7 +-
> >   include/hw/virtio/vhost-vfio.h    |  35 +++
> >   include/hw/virtio/vhost.h         |   2 +
> >   include/net/vhost-vfio.h          |  17 ++
> >   linux-headers/linux/vhost.h       |   9 +
> >   net/Makefile.objs                 |   1 +
> >   net/clients.h                     |   3 +
> >   net/net.c                         |   1 +
> >   net/vhost-vfio.c                  | 327 +++++++++++++++++++++++++
> >   qapi/net.json                     |  22 +-
> >   16 files changed, 996 insertions(+), 8 deletions(-)
> >   create mode 100644 hw/virtio/vhost-vfio.c
> >   create mode 100644 include/hw/virtio/vhost-vfio.h
> >   create mode 100644 include/net/vhost-vfio.h
> >   create mode 100644 net/vhost-vfio.c
> >
Jason Wang Nov. 7, 2018, 2:38 p.m. UTC | #3
On 2018/11/7 8:26 PM, Liang, Cunming wrote:
>
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Tuesday, November 6, 2018 4:18 AM
>> To: Wang, Xiao W <xiao.w.wang@intel.com>; mst@redhat.com;
>> alex.williamson@redhat.com
>> Cc: qemu-devel@nongnu.org; Bie, Tiwei <tiwei.bie@intel.com>; Liang, Cunming
>> <cunming.liang@intel.com>; Ye, Xiaolong <xiaolong.ye@intel.com>; Wang, Zhihong
>> <zhihong.wang@intel.com>; Daly, Dan <dan.daly@intel.com>
>> Subject: Re: [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend
>>
>>
>> On 2018/10/16 下午9:23, Xiao Wang wrote:
>>> What's this
>>> ===========
>>> Following the patch (vhost: introduce mdev based hardware vhost
>>> backend) https://lwn.net/Articles/750770/, which defines a generic
>>> mdev device for vhost data path acceleration (aliased as vDPA mdev
>>> below), this patch set introduces a new net client type: vhost-vfio.
>>
>> Thanks a lot for a such interesting series. Some generic questions:
>>
>>
>> If we consider to use software backend (e.g vhost-kernel or a rely of virito-vhost-
>> user or other cases) as well in the future, maybe vhost-mdev is better which mean it
>> does not tie to VFIO anyway.
> [LC] The initial thought of using term of '-vfio' due to the VFIO UAPI being used as interface, which is the only available mdev bus driver. It causes to use the term of 'vhost-vfio' in qemu, while using term of 'vhost-mdev' which represents a helper in kernel for vhost messages via mdev.
>
>>
>>> Currently we have 2 types of vhost backends in QEMU: vhost kernel
>>> (tap) and vhost-user (e.g. DPDK vhost), in order to have a kernel
>>> space HW vhost acceleration framework, the vDPA mdev device works as a
>>> generic configuring channel.
>>
>> Does "generic" configuring channel means dpdk will also go for this way?
>> E.g it will have a vhost mdev pmd?
> [LC] We don't plan to have a vhost-mdev pmd, but thinking to have consistent virtio PMD running on top of vhost-mdev.  Virtio PMD supports pci bus and vdev (by virtio-user) bus today. Vhost-mdev most likely would be introduced as another bus (mdev bus) provider.


This seems like it could be eliminated if you keep using the vhost-kernel
ioctl API. Then you can use virtio-user.


>   mdev bus DPDK support is in backlog.
>
>>
>>>    It exposes to user space a non-vendor-specific configuration
>>> interface for setting up a vhost HW accelerator,
>>
>> Or even a software translation layer on top of exist hardware.
>>
>>
>>> based on this, this patch
>>> set introduces a third vhost backend called vhost-vfio.
>>>
>>> How does it work
>>> ================
>>> The vDPA mdev defines 2 BAR regions, BAR0 and BAR1. BAR0 is the main
>>> device interface, vhost messages can be written to or read from this
>>> region following below format. All the regular vhost messages about
>>> vring addr, negotiated features, etc., are written to this region directly.
>>
>> If I understand this correctly, the mdev was not used for passed through to guest
>> directly. So what's the reason of inventing a PCI like device here? I'm asking since:
> [LC] mdev uses mandatory attribute of 'device_api' to identify the layout. We pick up one available from pci, platform, amba and ccw. It works if defining a new one for this transport.
>
>> - vhost protocol is transport indepedent, we should consider to support transport
>> other than PCI. I know we can even do it with the exist design but it looks rather odd
>> if we do e.g ccw device with a PCI like mediated device.
>>
>> - can we try to reuse vhost-kernel ioctl? Less API means less bugs and code reusing.
>> E.g virtio-user can benefit from the vhost kernel ioctl API almost with no changes I
>> believe.
> [LC] Agreed, so it reuses CMD defined by vhost-kernel ioctl. But VFIO provides device specific things (e.g. DMAR, INTR and etc.) which is the extra APIs being introduced by this transport.


I'm not quite sure I understand here. I think having vhost-kernel
compatible ioctls does not conflict with using VFIO ioctls like DMA or INTR?

Btw, the VFIO DMA ioctl is not even a must from my point of view; vhost-mdev
can forward the mem table information to the device driver and let it call
the DMA API to map/unmap pages.

Thanks
Liang, Cunming Nov. 7, 2018, 3:08 p.m. UTC | #4
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Wednesday, November 7, 2018 2:38 PM
> To: Liang, Cunming <cunming.liang@intel.com>; Wang, Xiao W
> <xiao.w.wang@intel.com>; mst@redhat.com; alex.williamson@redhat.com
> Cc: qemu-devel@nongnu.org; Bie, Tiwei <tiwei.bie@intel.com>; Ye, Xiaolong
> <xiaolong.ye@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Daly, Dan
> <dan.daly@intel.com>
> Subject: Re: [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend
> 
> 
> On 2018/11/7 下午8:26, Liang, Cunming wrote:
> >
> >> -----Original Message-----
> >> From: Jason Wang [mailto:jasowang@redhat.com]
> >> Sent: Tuesday, November 6, 2018 4:18 AM
> >> To: Wang, Xiao W <xiao.w.wang@intel.com>; mst@redhat.com;
> >> alex.williamson@redhat.com
> >> Cc: qemu-devel@nongnu.org; Bie, Tiwei <tiwei.bie@intel.com>; Liang,
> >> Cunming <cunming.liang@intel.com>; Ye, Xiaolong
> >> <xiaolong.ye@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>;
> >> Daly, Dan <dan.daly@intel.com>
> >> Subject: Re: [RFC 0/2] vhost-vfio: introduce mdev based HW vhost
> >> backend
> >>
> >>
> >> On 2018/10/16 下午9:23, Xiao Wang wrote:
> >>> What's this
> >>> ===========
> >>> Following the patch (vhost: introduce mdev based hardware vhost
> >>> backend) https://lwn.net/Articles/750770/, which defines a generic
> >>> mdev device for vhost data path acceleration (aliased as vDPA mdev
> >>> below), this patch set introduces a new net client type: vhost-vfio.
> >>
> >> Thanks a lot for a such interesting series. Some generic questions:
> >>
> >>
> >> If we consider to use software backend (e.g vhost-kernel or a rely of
> >> virito-vhost- user or other cases) as well in the future, maybe
> >> vhost-mdev is better which mean it does not tie to VFIO anyway.
> > [LC] The initial thought of using term of '-vfio' due to the VFIO UAPI being used as
> interface, which is the only available mdev bus driver. It causes to use the term of
> 'vhost-vfio' in qemu, while using term of 'vhost-mdev' which represents a helper in
> kernel for vhost messages via mdev.
> >
> >>
> >>> Currently we have 2 types of vhost backends in QEMU: vhost kernel
> >>> (tap) and vhost-user (e.g. DPDK vhost), in order to have a kernel
> >>> space HW vhost acceleration framework, the vDPA mdev device works as
> >>> a generic configuring channel.
> >>
> >> Does "generic" configuring channel means dpdk will also go for this way?
> >> E.g it will have a vhost mdev pmd?
> > [LC] We don't plan to have a vhost-mdev pmd, but thinking to have consistent
> virtio PMD running on top of vhost-mdev.  Virtio PMD supports pci bus and vdev (by
> virtio-user) bus today. Vhost-mdev most likely would be introduced as another bus
> (mdev bus) provider.
> 
> 
> This seems could be eliminated if you keep use the vhost-kernel ioctl API. Then you
> can use virtio-user.
[LC] That's true.

> 
> 
> >   mdev bus DPDK support is in backlog.
> >
> >>
> >>>    It exposes to user space a non-vendor-specific configuration
> >>> interface for setting up a vhost HW accelerator,
> >>
> >> Or even a software translation layer on top of exist hardware.
> >>
> >>
> >>> based on this, this patch
> >>> set introduces a third vhost backend called vhost-vfio.
> >>>
> >>> How does it work
> >>> ================
> >>> The vDPA mdev defines 2 BAR regions, BAR0 and BAR1. BAR0 is the main
> >>> device interface, vhost messages can be written to or read from this
> >>> region following below format. All the regular vhost messages about
> >>> vring addr, negotiated features, etc., are written to this region directly.
> >>
> >> If I understand this correctly, the mdev was not used for passed through to guest
> >> directly. So what's the reason of inventing a PCI like device here? I'm asking since:
> > [LC] mdev uses mandatory attribute of 'device_api' to identify the layout. We pick
> up one available from pci, platform, amba and ccw. It works if defining a new one
> for this transport.
> >
> >> - vhost protocol is transport indepedent, we should consider to support transport
> >> other than PCI. I know we can even do it with the exist design but it looks rather
> odd
> >> if we do e.g ccw device with a PCI like mediated device.
> >>
> >> - can we try to reuse vhost-kernel ioctl? Less API means less bugs and code
> reusing.
> >> E.g virtio-user can benefit from the vhost kernel ioctl API almost with no changes
> I
> >> believe.
> > [LC] Agreed, so it reuses CMD defined by vhost-kernel ioctl. But VFIO provides
> device specific things (e.g. DMAR, INTR and etc.) which is the extra APIs being
> introduced by this transport.
> 
> 
> I'm not quite sure I understand here. I think having vhost-kernel
> compatible ioctl does not conflict of using VFIO ioctl like DMA or INTR?
> 
> Btw, VFIO DMA ioctl is even not a must from my point of view, vhost-mdev
> can forward the mem table information to device driver and let it call
> DMA API to map/umap pages.
[LC] If we don't regard vhost-mdev as a device, then forwarding the mem table won't be a concern.
If we introduce a new mdev bus driver (vhost-mdev) that allows an mdev instance to be a new type of provider for vhost-kernel, that becomes a pretty good alternative which fully leverages the vhost-kernel ioctls.
I'm not sure whether that is the same view as yours when you say reusing the vhost-kernel ioctls.

> 
> Thanks
Jason Wang Nov. 8, 2018, 2:15 a.m. UTC | #5
On 2018/11/7 11:08 PM, Liang, Cunming wrote:
>>>> believe.
>>> [LC] Agreed, so it reuses CMD defined by vhost-kernel ioctl. But VFIO provides
>> device specific things (e.g. DMAR, INTR and etc.) which is the extra APIs being
>> introduced by this transport.
>>
>>
>> I'm not quite sure I understand here. I think having vhost-kernel
>> compatible ioctl does not conflict of using VFIO ioctl like DMA or INTR?
>>
>> Btw, VFIO DMA ioctl is even not a must from my point of view, vhost-mdev
>> can forward the mem table information to device driver and let it call
>> DMA API to map/umap pages.
> [LC] If not regarding vhost-mdev as a device, then forward mem table won't be a concern.
> If introducing a new mdev bus driver (vhost-mdev) which allows mdev instance to be a new type of provider for vhost-kernel. It becomes a pretty good alternative to fully leverage vhost-kernel ioctl.
> I'm not sure it's the same view as yours when you says reusing vhost-kernel ioctl.
>

Yes it is.

Thanks
Liang, Cunming Nov. 8, 2018, 4:48 p.m. UTC | #6
> -----Original Message-----
> From: Jason Wang [mailto:jasowang@redhat.com]
> Sent: Thursday, November 8, 2018 2:16 AM
> To: Liang, Cunming <cunming.liang@intel.com>; Wang, Xiao W
> <xiao.w.wang@intel.com>; mst@redhat.com; alex.williamson@redhat.com
> Cc: qemu-devel@nongnu.org; Bie, Tiwei <tiwei.bie@intel.com>; Ye, Xiaolong
> <xiaolong.ye@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Daly, Dan
> <dan.daly@intel.com>
> Subject: Re: [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend
> 
> 
> On 2018/11/7 下午11:08, Liang, Cunming wrote:
> >>>> believe.
> >>> [LC] Agreed, so it reuses CMD defined by vhost-kernel ioctl. But
> >>> VFIO provides
> >> device specific things (e.g. DMAR, INTR and etc.) which is the extra
> >> APIs being introduced by this transport.
> >>
> >>
> >> I'm not quite sure I understand here. I think having vhost-kernel
> >> compatible ioctl does not conflict of using VFIO ioctl like DMA or INTR?
> >>
> >> Btw, VFIO DMA ioctl is even not a must from my point of view,
> >> vhost-mdev can forward the mem table information to device driver and
> >> let it call DMA API to map/umap pages.
> > [LC] If not regarding vhost-mdev as a device, then forward mem table won't be a
> concern.
> > If introducing a new mdev bus driver (vhost-mdev) which allows mdev instance to
> be a new type of provider for vhost-kernel. It becomes a pretty good alternative to
> fully leverage vhost-kernel ioctl.
> > I'm not sure it's the same view as yours when you says reusing vhost-kernel ioctl.
> >
> 
> Yes it is.
[LC] It sounds like a pretty good idea to me. Let us spend some time to figure out the next level of detail, and sync up on the further plan in the community call. :)

> 
> Thanks
Jason Wang Nov. 9, 2018, 2:32 a.m. UTC | #7
On 2018/11/9 12:48 AM, Liang, Cunming wrote:
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Thursday, November 8, 2018 2:16 AM
>> To: Liang, Cunming <cunming.liang@intel.com>; Wang, Xiao W
>> <xiao.w.wang@intel.com>; mst@redhat.com; alex.williamson@redhat.com
>> Cc: qemu-devel@nongnu.org; Bie, Tiwei <tiwei.bie@intel.com>; Ye, Xiaolong
>> <xiaolong.ye@intel.com>; Wang, Zhihong <zhihong.wang@intel.com>; Daly, Dan
>> <dan.daly@intel.com>
>> Subject: Re: [RFC 0/2] vhost-vfio: introduce mdev based HW vhost backend
>>
>>
>> On 2018/11/7 下午11:08, Liang, Cunming wrote:
>>>>>> believe.
>>>>> [LC] Agreed, so it reuses CMD defined by vhost-kernel ioctl. But
>>>>> VFIO provides
>>>> device specific things (e.g. DMAR, INTR and etc.) which is the extra
>>>> APIs being introduced by this transport.
>>>>
>>>>
>>>> I'm not quite sure I understand here. I think having vhost-kernel
>>>> compatible ioctl does not conflict of using VFIO ioctl like DMA or INTR?
>>>>
>>>> Btw, VFIO DMA ioctl is even not a must from my point of view,
>>>> vhost-mdev can forward the mem table information to device driver and
>>>> let it call DMA API to map/umap pages.
>>> [LC] If not regarding vhost-mdev as a device, then forward mem table won't be a
>> concern.
>>> If introducing a new mdev bus driver (vhost-mdev) which allows mdev instance to
>> be a new type of provider for vhost-kernel. It becomes a pretty good alternative to
>> fully leverage vhost-kernel ioctl.
>>> I'm not sure it's the same view as yours when you says reusing vhost-kernel ioctl.
>>>
>> Yes it is.
> [LC] It sounds a pretty good idea to me. Let us spend some time to figure out the next level detail, and sync-up further plan in community call.:)
>

Cool, thanks.