[RFC,00/22] Enhance VHOST to enable SoC-to-SoC communication

Message ID 20200702082143.25259-1-kishon@ti.com

Message

Kishon Vijay Abraham I July 2, 2020, 8:21 a.m. UTC
This series enhances Linux vhost support to enable SoC-to-SoC
communication over MMIO. It enables rpmsg communication between
two SoCs over both PCIe RC<->EP and HOST1-NTB-HOST2 topologies. The
series makes the following changes:

1) Modify vhost to use the standard Linux driver model
2) Add support in vring to access virtqueue over MMIO
3) Add vhost client driver for rpmsg
4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
   rpmsg communication between two SoCs connected to each other
5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
   between two SoCs connected via NTB
6) Add configfs to configure the components
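
To illustrate item 2, here is a minimal userspace sketch of reading one
split-virtqueue descriptor through accessor functions instead of plain
structure loads. The descriptor layout (16 bytes, little-endian addr, len,
flags, next) follows the virtio specification. Every toy_* name is
hypothetical and is not taken from this series; in the kernel, accessors
such as readw()/readl() on an __iomem pointer would take the place of the
byte loads used here.

```c
#include <stddef.h>
#include <stdint.h>

/* Split-virtqueue descriptor fields as defined by the virtio spec. */
struct toy_vring_desc {
	uint64_t addr;   /* buffer address */
	uint32_t len;    /* buffer length in bytes */
	uint16_t flags;  /* NEXT / WRITE / INDIRECT bits */
	uint16_t next;   /* index of the chained descriptor */
};

/* Hypothetical stand-in for an MMIO accessor: one volatile byte load. */
static uint8_t toy_mmio_read8(const volatile uint8_t *base, size_t off)
{
	return base[off];
}

/* Assemble little-endian values so the result is host-endian independent. */
static uint16_t toy_mmio_read_le16(const volatile uint8_t *base, size_t off)
{
	uint16_t lo = toy_mmio_read8(base, off);
	uint16_t hi = toy_mmio_read8(base, off + 1);

	return (uint16_t)(lo | (uint16_t)(hi << 8));
}

static uint32_t toy_mmio_read_le32(const volatile uint8_t *base, size_t off)
{
	uint32_t lo = toy_mmio_read_le16(base, off);
	uint32_t hi = toy_mmio_read_le16(base, off + 2);

	return lo | (hi << 16);
}

static uint64_t toy_mmio_read_le64(const volatile uint8_t *base, size_t off)
{
	uint64_t lo = toy_mmio_read_le32(base, off);
	uint64_t hi = toy_mmio_read_le32(base, off + 4);

	return lo | (hi << 32);
}

/* Copy descriptor number idx out of the descriptor table at ring. */
void toy_vring_read_desc(const volatile uint8_t *ring, unsigned int idx,
			 struct toy_vring_desc *out)
{
	size_t off = (size_t)idx * 16; /* each descriptor is 16 bytes */

	out->addr = toy_mmio_read_le64(ring, off);
	out->len = toy_mmio_read_le32(ring, off + 8);
	out->flags = toy_mmio_read_le16(ring, off + 12);
	out->next = toy_mmio_read_le16(ring, off + 14);
}
```

The point of the sketch is only that the remote side's ring is never
dereferenced as an ordinary structure; each field is fetched through an
accessor, which is what access over a PCIe BAR or NTB window requires.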

UseCase 1:

 VHOST RPMSG                     VIRTIO RPMSG
      +                               +
      |                               |
      |                               |
      |                               |
      |                               |
+-----v------+                 +------v-------+
|   Linux    |                 |     Linux    |
|  Endpoint  |                 | Root Complex |
|            <----------------->              |
|            |                 |              |
|    SOC1    |                 |     SOC2     |
+------------+                 +--------------+

UseCase 2:

     VHOST RPMSG                                      VIRTIO RPMSG
          +                                                 +
          |                                                 |
          |                                                 |
          |                                                 |
          |                                                 |
   +------v------+                                   +------v------+
   |             |                                   |             |
   |    HOST1    |                                   |    HOST2    |
   |             |                                   |             |
   +------^------+                                   +------^------+
          |                                                 |
          |                                                 |
+---------------------------------------------------------------------+
|  +------v------+                                   +------v------+  |
|  |             |                                   |             |  |
|  |     EP      |                                   |     EP      |  |
|  | CONTROLLER1 |                                   | CONTROLLER2 |  |
|  |             <----------------------------------->             |  |
|  |             |                                   |             |  |
|  |             |                                   |             |  |
|  |             |  SoC With Multiple EP Instances   |             |  |
|  |             |  (Configured using NTB Function)  |             |  |
|  +-------------+                                   +-------------+  |
+---------------------------------------------------------------------+

Software Layering:

The high-level SW layering should look something like the diagram below.
This series adds support only for RPMSG VHOST; something similar should
be done for net and scsi. With that, any vhost device (PCI, NTB, platform
device, user) can use any of the vhost client drivers.


    +----------------+  +-----------+  +------------+  +----------+
    |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
    +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
            |                 |              |              |
            |                 |              |              |
            |                 |              |              |
+-----------v-----------------v--------------v--------------v----------+
|                            VHOST CORE                                |
+--------^---------------^--------------------^------------------^-----+
         |               |                    |                  |
         |               |                    |                  |
         |               |                    |                  |
+--------v-------+  +----v------+  +----------v----------+  +----v-----+
|  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
+----------------+  +-----------+  +---------------------+  +----------+
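
The layering above can be sketched as a toy bus in plain C: transport
back ends register devices, client drivers register themselves, and the
core binds the two by device type. This is only an illustration of the
driver-model idea. All toy_* names, the ID values, and the matching rule
are assumptions and do not reproduce the API added by this series.

```c
#include <stddef.h>

/* Hypothetical device-type IDs, invented for this sketch. */
enum toy_vhost_id { TOY_VHOST_RPMSG = 1, TOY_VHOST_NET, TOY_VHOST_SCSI };

struct toy_vhost_driver;

/* A vhost device as exposed by a transport (bottom row of the diagram). */
struct toy_vhost_dev {
	const char *transport;               /* "pci-epf", "ntb", "platform" */
	enum toy_vhost_id id;                /* what kind of client it wants */
	const struct toy_vhost_driver *drv;  /* bound client, NULL if none */
};

/* A vhost client driver (top row of the diagram). */
struct toy_vhost_driver {
	const char *name;
	enum toy_vhost_id id;
	int (*probe)(struct toy_vhost_dev *dev);
};

/* The "core": a fixed-size table of registered client drivers. */
#define TOY_MAX_DRIVERS 8
static const struct toy_vhost_driver *toy_drivers[TOY_MAX_DRIVERS];
static size_t toy_ndrivers;

int toy_vhost_register_driver(const struct toy_vhost_driver *drv)
{
	if (toy_ndrivers == TOY_MAX_DRIVERS)
		return -1;
	toy_drivers[toy_ndrivers++] = drv;
	return 0;
}

/* Bind the device to the first driver whose ID matches and whose probe
 * succeeds. The transport name plays no part in matching, which is why
 * any back end can use any client. */
int toy_vhost_register_device(struct toy_vhost_dev *dev)
{
	dev->drv = NULL;
	for (size_t i = 0; i < toy_ndrivers; i++) {
		const struct toy_vhost_driver *drv = toy_drivers[i];

		if (drv->id != dev->id)
			continue;
		if (drv->probe(dev) == 0) {
			dev->drv = drv;
			return 0;
		}
	}
	return -1;
}

/* Example client: an rpmsg driver whose probe always succeeds. */
static int toy_rpmsg_probe(struct toy_vhost_dev *dev)
{
	(void)dev;
	return 0;
}

const struct toy_vhost_driver toy_rpmsg_driver = {
	.name = "toy-vhost-rpmsg",
	.id = TOY_VHOST_RPMSG,
	.probe = toy_rpmsg_probe,
};
```

After toy_vhost_register_driver(&toy_rpmsg_driver), a device with
id TOY_VHOST_RPMSG binds to the rpmsg client whether its transport is
"pci-epf" or "ntb", while a TOY_VHOST_NET device stays unbound until a
matching client is registered.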

This was initially proposed here [1].

[1] -> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com


Kishon Vijay Abraham I (22):
  vhost: Make _feature_ bits a property of vhost device
  vhost: Introduce standard Linux driver model in VHOST
  vhost: Add ops for the VHOST driver to configure VHOST device
  vringh: Add helpers to access vring in MMIO
  vhost: Add MMIO helpers for operations on vhost virtqueue
  vhost: Introduce configfs entry for configuring VHOST
  virtio_pci: Use request_threaded_irq() instead of request_irq()
  rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
    reading messages
  rpmsg: Introduce configfs entry for configuring rpmsg
  rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
  rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
    rpmsg_internal.h
  virtio: Add ops to allocate and free buffer
  rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
    virtio_free_buffer()
  rpmsg: Add VHOST based remote processor messaging bus
  samples/rpmsg: Setup delayed work to send message
  samples/rpmsg: Wait for address to be bound to rpdev for sending
    message
  rpmsg.txt: Add Documentation to configure rpmsg using configfs
  virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
    device
  PCI: endpoint: Add EP function driver to provide VHOST interface
  NTB: Add a new NTB client driver to implement VIRTIO functionality
  NTB: Add a new NTB client driver to implement VHOST functionality
  NTB: Describe the ntb_virtio and ntb_vhost client in the documentation

 Documentation/driver-api/ntb.rst              |   11 +
 Documentation/rpmsg.txt                       |   56 +
 drivers/ntb/Kconfig                           |   18 +
 drivers/ntb/Makefile                          |    2 +
 drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
 drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
 drivers/ntb/ntb_virtio.h                      |   56 +
 drivers/pci/endpoint/functions/Kconfig        |   11 +
 drivers/pci/endpoint/functions/Makefile       |    1 +
 .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
 drivers/rpmsg/Kconfig                         |   10 +
 drivers/rpmsg/Makefile                        |    3 +-
 drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
 drivers/rpmsg/rpmsg_core.c                    |    7 +
 drivers/rpmsg/rpmsg_internal.h                |  136 ++
 drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
 drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
 drivers/vhost/Kconfig                         |    1 +
 drivers/vhost/Makefile                        |    2 +-
 drivers/vhost/net.c                           |   10 +-
 drivers/vhost/scsi.c                          |   24 +-
 drivers/vhost/test.c                          |   17 +-
 drivers/vhost/vdpa.c                          |    2 +-
 drivers/vhost/vhost.c                         |  730 ++++++++++-
 drivers/vhost/vhost_cfs.c                     |  341 +++++
 drivers/vhost/vringh.c                        |  332 +++++
 drivers/vhost/vsock.c                         |   20 +-
 drivers/virtio/Kconfig                        |    9 +
 drivers/virtio/Makefile                       |    1 +
 drivers/virtio/virtio_pci_common.c            |   25 +-
 drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
 include/linux/mod_devicetable.h               |    6 +
 include/linux/rpmsg.h                         |    6 +
 {drivers/vhost => include/linux}/vhost.h      |  132 +-
 include/linux/virtio.h                        |    3 +
 include/linux/virtio_config.h                 |   42 +
 include/linux/vringh.h                        |   46 +
 samples/rpmsg/rpmsg_client_sample.c           |   32 +-
 tools/virtio/virtio_test.c                    |    2 +-
 39 files changed, 7083 insertions(+), 183 deletions(-)
 create mode 100644 drivers/ntb/ntb_vhost.c
 create mode 100644 drivers/ntb/ntb_virtio.c
 create mode 100644 drivers/ntb/ntb_virtio.h
 create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
 create mode 100644 drivers/rpmsg/rpmsg_cfs.c
 create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
 create mode 100644 drivers/vhost/vhost_cfs.c
 create mode 100644 drivers/virtio/virtio_pci_epf.c
 rename {drivers/vhost => include/linux}/vhost.h (66%)

Comments

Michael S. Tsirkin July 2, 2020, 9:51 a.m. UTC | #1
On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> This series enhances Linux Vhost support to enable SoC-to-SoC
> communication over MMIO. This series enables rpmsg communication between
> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2

I find this very interesting. It's a huge patchset, so it will take a bit
to review, but I certainly plan to do that. Thanks!

Jason Wang July 2, 2020, 10:10 a.m. UTC | #2
On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>> This series enhances Linux Vhost support to enable SoC-to-SoC
>> communication over MMIO. This series enables rpmsg communication between
>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>
> I find this very interesting. A huge patchset so will take a bit
> to review, but I certainly plan to do that. Thanks!


Yes, it would be better if there were a git branch for us to look at.

Btw, I'm not sure I get the big picture, but I vaguely feel some of the
work duplicates vDPA (e.g. the epf transport or the vhost bus).

Have you considered implementing these through vDPA?

Thanks


Kishon Vijay Abraham I July 2, 2020, 10:25 a.m. UTC | #3
Hi Michael,

On 7/2/2020 3:21 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>> This series enhances Linux Vhost support to enable SoC-to-SoC
>> communication over MMIO. This series enables rpmsg communication between
>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> 
> 
> I find this very interesting. A huge patchset so will take a bit
> to review, but I certainly plan to do that. Thanks!

Great to hear! Thanks in advance for reviewing!

Regards
Kishon

> 
>>
>> Kishon Vijay Abraham I (22):
>>   vhost: Make _feature_ bits a property of vhost device
>>   vhost: Introduce standard Linux driver model in VHOST
>>   vhost: Add ops for the VHOST driver to configure VHOST device
>>   vringh: Add helpers to access vring in MMIO
>>   vhost: Add MMIO helpers for operations on vhost virtqueue
>>   vhost: Introduce configfs entry for configuring VHOST
>>   virtio_pci: Use request_threaded_irq() instead of request_irq()
>>   rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>     reading messages
>>   rpmsg: Introduce configfs entry for configuring rpmsg
>>   rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>   rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>     rpmsg_internal.h
>>   virtio: Add ops to allocate and free buffer
>>   rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>     virtio_free_buffer()
>>   rpmsg: Add VHOST based remote processor messaging bus
>>   samples/rpmsg: Setup delayed work to send message
>>   samples/rpmsg: Wait for address to be bound to rpdev for sending
>>     message
>>   rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>   virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
>>     device
>>   PCI: endpoint: Add EP function driver to provide VHOST interface
>>   NTB: Add a new NTB client driver to implement VIRTIO functionality
>>   NTB: Add a new NTB client driver to implement VHOST functionality
>>   NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
>>
>>  Documentation/driver-api/ntb.rst              |   11 +
>>  Documentation/rpmsg.txt                       |   56 +
>>  drivers/ntb/Kconfig                           |   18 +
>>  drivers/ntb/Makefile                          |    2 +
>>  drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>  drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>  drivers/ntb/ntb_virtio.h                      |   56 +
>>  drivers/pci/endpoint/functions/Kconfig        |   11 +
>>  drivers/pci/endpoint/functions/Makefile       |    1 +
>>  .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
>>  drivers/rpmsg/Kconfig                         |   10 +
>>  drivers/rpmsg/Makefile                        |    3 +-
>>  drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>  drivers/rpmsg/rpmsg_core.c                    |    7 +
>>  drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>  drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
>>  drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>  drivers/vhost/Kconfig                         |    1 +
>>  drivers/vhost/Makefile                        |    2 +-
>>  drivers/vhost/net.c                           |   10 +-
>>  drivers/vhost/scsi.c                          |   24 +-
>>  drivers/vhost/test.c                          |   17 +-
>>  drivers/vhost/vdpa.c                          |    2 +-
>>  drivers/vhost/vhost.c                         |  730 ++++++++++-
>>  drivers/vhost/vhost_cfs.c                     |  341 +++++
>>  drivers/vhost/vringh.c                        |  332 +++++
>>  drivers/vhost/vsock.c                         |   20 +-
>>  drivers/virtio/Kconfig                        |    9 +
>>  drivers/virtio/Makefile                       |    1 +
>>  drivers/virtio/virtio_pci_common.c            |   25 +-
>>  drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>  include/linux/mod_devicetable.h               |    6 +
>>  include/linux/rpmsg.h                         |    6 +
>>  {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>  include/linux/virtio.h                        |    3 +
>>  include/linux/virtio_config.h                 |   42 +
>>  include/linux/vringh.h                        |   46 +
>>  samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>  tools/virtio/virtio_test.c                    |    2 +-
>>  39 files changed, 7083 insertions(+), 183 deletions(-)
>>  create mode 100644 drivers/ntb/ntb_vhost.c
>>  create mode 100644 drivers/ntb/ntb_virtio.c
>>  create mode 100644 drivers/ntb/ntb_virtio.h
>>  create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>  create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>  create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>  create mode 100644 drivers/vhost/vhost_cfs.c
>>  create mode 100644 drivers/virtio/virtio_pci_epf.c
>>  rename {drivers/vhost => include/linux}/vhost.h (66%)
>>
>> -- 
>> 2.17.1
>>
>
Kishon Vijay Abraham I July 2, 2020, 1:35 p.m. UTC | #4
Hi Jason,

On 7/2/2020 3:40 PM, Jason Wang wrote:
> 
> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>> communication over MMIO. This series enables rpmsg communication between
>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>
>>> 1) Modify vhost to use standard Linux driver model
>>> 2) Add support in vring to access virtqueue over MMIO
>>> 3) Add vhost client driver for rpmsg
>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>     rpmsg communication between two SoCs connected to each other
>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>     between two SoCs connected via NTB
>>> 6) Add configfs to configure the components
>>>
>>> UseCase1 :
>>>
>>>   VHOST RPMSG                     VIRTIO RPMSG
>>>        +                               +
>>>        |                               |
>>>        |                               |
>>>        |                               |
>>>        |                               |
>>> +-----v------+                 +------v-------+
>>> |   Linux    |                 |     Linux    |
>>> |  Endpoint  |                 | Root Complex |
>>> |            <----------------->              |
>>> |            |                 |              |
>>> |    SOC1    |                 |     SOC2     |
>>> +------------+                 +--------------+
>>>
>>> UseCase 2:
>>>
>>>       VHOST RPMSG                                      VIRTIO RPMSG
>>>            +                                                 +
>>>            |                                                 |
>>>            |                                                 |
>>>            |                                                 |
>>>            |                                                 |
>>>     +------v------+                                   +------v------+
>>>     |             |                                   |             |
>>>     |    HOST1    |                                   |    HOST2    |
>>>     |             |                                   |             |
>>>     +------^------+                                   +------^------+
>>>            |                                                 |
>>>            |                                                 |
>>> +---------------------------------------------------------------------+
>>> |  +------v------+                                   +------v------+  |
>>> |  |             |                                   |             |  |
>>> |  |     EP      |                                   |     EP      |  |
>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>> |  |             <----------------------------------->             |  |
>>> |  |             |                                   |             |  |
>>> |  |             |                                   |             |  |
>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>> |  |             |  (Configured using NTB Function)  |             |  |
>>> |  +-------------+                                   +-------------+  |
>>> +---------------------------------------------------------------------+
>>>
>>> Software Layering:
>>>
>>> The high-level SW layering should look something like below. This series
>>> adds support only for RPMSG VHOST, however something similar should be
>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>> device, user) can use any of the vhost client driver.
>>>
>>>
>>>      +----------------+  +-----------+  +------------+  +----------+
>>>      |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>      +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>              |                 |              |              |
>>>              |                 |              |              |
>>>              |                 |              |              |
>>> +-----------v-----------------v--------------v--------------v----------+
>>> |                            VHOST CORE                                |
>>> +--------^---------------^--------------------^------------------^-----+
>>>           |               |                    |                  |
>>>           |               |                    |                  |
>>>           |               |                    |                  |
>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>
>>> This was initially proposed here [1]
>>>
>>> [1] -> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>
>> I find this very interesting. A huge patchset so will take a bit
>> to review, but I certainly plan to do that. Thanks!
> 
> 
> Yes, it would be better if there's a git branch for us to have a look.

I've pushed the branch vhost_rpmsg_pci_ntb_rfc to
https://github.com/kishon/linux-wip.git
> 
> Btw, I'm not sure I get the big picture, but I vaguely feel some of the work
> duplicates vDPA (e.g. the epf transport or vhost bus).

This is about connecting two different HW systems, both running Linux, and it
doesn't necessarily involve virtualization. So there is no guest or host as in
virtualization, but two entirely different systems connected via a PCIe cable,
one acting as guest and one as host. One system provides the virtio
functionality and reserves memory for the virtqueues; the other provides the
vhost functionality, i.e. a way to access the virtqueues in the virtio memory.
One is the source and the other is the sink, and there is no intermediate
entity. (vhost was probably the intermediate entity in virtualization?)
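
As a rough illustration only (this is not code from the series), the division
of work between the two systems can be sketched in user-space C. The names
shared_region, vhost_side, virtio_send() and vhost_receive() are made up for
the example, the ring is a simplified stand-in for a virtio split ring, and a
plain pointer stands in for the MMIO mapping over PCIe or NTB. Real code would
need memory barriers and MMIO accessors such as memcpy_fromio().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4u   /* hypothetical ring depth */
#define BUF_SIZE  32u  /* hypothetical per-buffer size */

/* One descriptor: where a buffer lives inside the region, and its length. */
struct desc {
	uint32_t offset;
	uint32_t len;
};

/* Memory reserved by the "virtio" system for its virtqueue and buffers. */
struct shared_region {
	struct desc desc[RING_SIZE];
	uint16_t avail_idx;               /* advanced by the virtio side */
	uint16_t avail_ring[RING_SIZE];
	uint16_t used_idx;                /* advanced by the vhost side */
	uint16_t used_ring[RING_SIZE];
	uint8_t bufs[RING_SIZE][BUF_SIZE];
};

/* The "vhost" system only holds a mapping of the remote region. */
struct vhost_side {
	struct shared_region *remote;     /* stand-in for an MMIO mapping */
	uint16_t last_avail;              /* next avail entry to consume */
};

/* virtio side: place a message in buffer 'slot' and publish it. */
static int virtio_send(struct shared_region *r, uint16_t slot,
		       const void *msg, uint32_t len)
{
	assert(r != NULL);
	if (slot >= RING_SIZE || len > BUF_SIZE)
		return -1;
	if ((uint16_t)(r->avail_idx - r->used_idx) >= RING_SIZE)
		return -1;                /* ring is full */

	memcpy(r->bufs[slot], msg, len);
	r->desc[slot].offset =
		(uint32_t)(offsetof(struct shared_region, bufs) +
			   (size_t)slot * BUF_SIZE);
	r->desc[slot].len = len;
	r->avail_ring[r->avail_idx % RING_SIZE] = slot;
	r->avail_idx++;                   /* real code: write barrier first */
	return 0;
}

/* vhost side: consume the next published buffer through the mapping. */
static int vhost_receive(struct vhost_side *vh, void *out, uint32_t out_len)
{
	struct shared_region *r = vh->remote;
	struct desc d;
	uint16_t slot;

	if (vh->last_avail == r->avail_idx)
		return -1;                /* nothing pending */

	slot = r->avail_ring[vh->last_avail % RING_SIZE];
	if (slot >= RING_SIZE)
		return -1;
	d = r->desc[slot];
	if (d.len > out_len || (size_t)d.offset + d.len > sizeof(*r))
		return -1;

	memcpy(out, (uint8_t *)r + d.offset, d.len);
	r->used_ring[r->used_idx % RING_SIZE] = slot;
	r->used_idx++;                    /* hand the buffer back */
	vh->last_avail++;
	return (int)d.len;
}
```

The point of the sketch is only that the ring and buffers live entirely in one
system's memory, and the other system drives them through a mapping with no
third party in between.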

> 
> Have you considered implementing these through vDPA?

IIUC vDPA only provides an interface to userspace; it does not provide an
in-kernel rpmsg driver or vhost net driver.

The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
(usecase2 above); all the boards run Linux. The middle board provides the NTB
functionality, and the boards on either side provide the virtio/vhost
functionality and transfer data using rpmsg.

Thanks
Kishon

> 
> Thanks
> 
> 
>>
>>> Kishon Vijay Abraham I (22):
>>>    vhost: Make _feature_ bits a property of vhost device
>>>    vhost: Introduce standard Linux driver model in VHOST
>>>    vhost: Add ops for the VHOST driver to configure VHOST device
>>>    vringh: Add helpers to access vring in MMIO
>>>    vhost: Add MMIO helpers for operations on vhost virtqueue
>>>    vhost: Introduce configfs entry for configuring VHOST
>>>    virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>    rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>      reading messages
>>>    rpmsg: Introduce configfs entry for configuring rpmsg
>>>    rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>    rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>      rpmsg_internal.h
>>>    virtio: Add ops to allocate and free buffer
>>>    rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>      virtio_free_buffer()
>>>    rpmsg: Add VHOST based remote processor messaging bus
>>>    samples/rpmsg: Setup delayed work to send message
>>>    samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>      message
>>>    rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>    virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
>>>      device
>>>    PCI: endpoint: Add EP function driver to provide VHOST interface
>>>    NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>    NTB: Add a new NTB client driver to implement VHOST functionality
>>>    NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
>>>
>>>   Documentation/driver-api/ntb.rst              |   11 +
>>>   Documentation/rpmsg.txt                       |   56 +
>>>   drivers/ntb/Kconfig                           |   18 +
>>>   drivers/ntb/Makefile                          |    2 +
>>>   drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>   drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>   drivers/ntb/ntb_virtio.h                      |   56 +
>>>   drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>   drivers/pci/endpoint/functions/Makefile       |    1 +
>>>   .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
>>>   drivers/rpmsg/Kconfig                         |   10 +
>>>   drivers/rpmsg/Makefile                        |    3 +-
>>>   drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>   drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>   drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>   drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
>>>   drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>   drivers/vhost/Kconfig                         |    1 +
>>>   drivers/vhost/Makefile                        |    2 +-
>>>   drivers/vhost/net.c                           |   10 +-
>>>   drivers/vhost/scsi.c                          |   24 +-
>>>   drivers/vhost/test.c                          |   17 +-
>>>   drivers/vhost/vdpa.c                          |    2 +-
>>>   drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>   drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>   drivers/vhost/vringh.c                        |  332 +++++
>>>   drivers/vhost/vsock.c                         |   20 +-
>>>   drivers/virtio/Kconfig                        |    9 +
>>>   drivers/virtio/Makefile                       |    1 +
>>>   drivers/virtio/virtio_pci_common.c            |   25 +-
>>>   drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>   include/linux/mod_devicetable.h               |    6 +
>>>   include/linux/rpmsg.h                         |    6 +
>>>   {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>   include/linux/virtio.h                        |    3 +
>>>   include/linux/virtio_config.h                 |   42 +
>>>   include/linux/vringh.h                        |   46 +
>>>   samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>   tools/virtio/virtio_test.c                    |    2 +-
>>>   39 files changed, 7083 insertions(+), 183 deletions(-)
>>>   create mode 100644 drivers/ntb/ntb_vhost.c
>>>   create mode 100644 drivers/ntb/ntb_virtio.c
>>>   create mode 100644 drivers/ntb/ntb_virtio.h
>>>   create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>   create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>   create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>   create mode 100644 drivers/vhost/vhost_cfs.c
>>>   create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>   rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>
>>> -- 
>>> 2.17.1
>>>
>
Mathieu Poirier July 2, 2020, 5:31 p.m. UTC | #5
On Thu, 2 Jul 2020 at 03:51, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> > This series enhances Linux Vhost support to enable SoC-to-SoC
> > communication over MMIO. This series enables rpmsg communication between
> > two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >
> > 1) Modify vhost to use standard Linux driver model
> > 2) Add support in vring to access virtqueue over MMIO
> > 3) Add vhost client driver for rpmsg
> > 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
> >    rpmsg communication between two SoCs connected to each other
> > 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> >    between two SoCs connected via NTB
> > 6) Add configfs to configure the components
> >
> > UseCase1 :
> >
> >  VHOST RPMSG                     VIRTIO RPMSG
> >       +                               +
> >       |                               |
> >       |                               |
> >       |                               |
> >       |                               |
> > +-----v------+                 +------v-------+
> > |   Linux    |                 |     Linux    |
> > |  Endpoint  |                 | Root Complex |
> > |            <----------------->              |
> > |            |                 |              |
> > |    SOC1    |                 |     SOC2     |
> > +------------+                 +--------------+
> >
> > UseCase 2:
> >
> >      VHOST RPMSG                                      VIRTIO RPMSG
> >           +                                                 +
> >           |                                                 |
> >           |                                                 |
> >           |                                                 |
> >           |                                                 |
> >    +------v------+                                   +------v------+
> >    |             |                                   |             |
> >    |    HOST1    |                                   |    HOST2    |
> >    |             |                                   |             |
> >    +------^------+                                   +------^------+
> >           |                                                 |
> >           |                                                 |
> > +---------------------------------------------------------------------+
> > |  +------v------+                                   +------v------+  |
> > |  |             |                                   |             |  |
> > |  |     EP      |                                   |     EP      |  |
> > |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
> > |  |             <----------------------------------->             |  |
> > |  |             |                                   |             |  |
> > |  |             |                                   |             |  |
> > |  |             |  SoC With Multiple EP Instances   |             |  |
> > |  |             |  (Configured using NTB Function)  |             |  |
> > |  +-------------+                                   +-------------+  |
> > +---------------------------------------------------------------------+
> >
> > Software Layering:
> >
> > The high-level SW layering should look something like below. This series
> > adds support only for RPMSG VHOST, however something similar should be
> > done for net and scsi. With that any vhost device (PCI, NTB, Platform
> > device, user) can use any of the vhost client driver.
> >
> >
> >     +----------------+  +-----------+  +------------+  +----------+
> >     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
> >     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
> >             |                 |              |              |
> >             |                 |              |              |
> >             |                 |              |              |
> > +-----------v-----------------v--------------v--------------v----------+
> > |                            VHOST CORE                                |
> > +--------^---------------^--------------------^------------------^-----+
> >          |               |                    |                  |
> >          |               |                    |                  |
> >          |               |                    |                  |
> > +--------v-------+  +----v------+  +----------v----------+  +----v-----+
> > |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
> > +----------------+  +-----------+  +---------------------+  +----------+
> >
> > This was initially proposed here [1]
> >
> > [1] -> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>
>
> I find this very interesting. A huge patchset so will take a bit
> to review, but I certainly plan to do that. Thanks!

Same here - it will take time.  This patchset is sizable and sits
behind a few others that are equally big.

>
> >
> > Kishon Vijay Abraham I (22):
> >   vhost: Make _feature_ bits a property of vhost device
> >   vhost: Introduce standard Linux driver model in VHOST
> >   vhost: Add ops for the VHOST driver to configure VHOST device
> >   vringh: Add helpers to access vring in MMIO
> >   vhost: Add MMIO helpers for operations on vhost virtqueue
> >   vhost: Introduce configfs entry for configuring VHOST
> >   virtio_pci: Use request_threaded_irq() instead of request_irq()
> >   rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
> >     reading messages
> >   rpmsg: Introduce configfs entry for configuring rpmsg
> >   rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
> >   rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
> >     rpmsg_internal.h
> >   virtio: Add ops to allocate and free buffer
> >   rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
> >     virtio_free_buffer()
> >   rpmsg: Add VHOST based remote processor messaging bus
> >   samples/rpmsg: Setup delayed work to send message
> >   samples/rpmsg: Wait for address to be bound to rpdev for sending
> >     message
> >   rpmsg.txt: Add Documentation to configure rpmsg using configfs
> >   virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe Endpoint
> >     device
> >   PCI: endpoint: Add EP function driver to provide VHOST interface
> >   NTB: Add a new NTB client driver to implement VIRTIO functionality
> >   NTB: Add a new NTB client driver to implement VHOST functionality
> >   NTB: Describe the ntb_virtio and ntb_vhost client in the documentation
> >
> >  Documentation/driver-api/ntb.rst              |   11 +
> >  Documentation/rpmsg.txt                       |   56 +
> >  drivers/ntb/Kconfig                           |   18 +
> >  drivers/ntb/Makefile                          |    2 +
> >  drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
> >  drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
> >  drivers/ntb/ntb_virtio.h                      |   56 +
> >  drivers/pci/endpoint/functions/Kconfig        |   11 +
> >  drivers/pci/endpoint/functions/Makefile       |    1 +
> >  .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
> >  drivers/rpmsg/Kconfig                         |   10 +
> >  drivers/rpmsg/Makefile                        |    3 +-
> >  drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
> >  drivers/rpmsg/rpmsg_core.c                    |    7 +
> >  drivers/rpmsg/rpmsg_internal.h                |  136 ++
> >  drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
> >  drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
> >  drivers/vhost/Kconfig                         |    1 +
> >  drivers/vhost/Makefile                        |    2 +-
> >  drivers/vhost/net.c                           |   10 +-
> >  drivers/vhost/scsi.c                          |   24 +-
> >  drivers/vhost/test.c                          |   17 +-
> >  drivers/vhost/vdpa.c                          |    2 +-
> >  drivers/vhost/vhost.c                         |  730 ++++++++++-
> >  drivers/vhost/vhost_cfs.c                     |  341 +++++
> >  drivers/vhost/vringh.c                        |  332 +++++
> >  drivers/vhost/vsock.c                         |   20 +-
> >  drivers/virtio/Kconfig                        |    9 +
> >  drivers/virtio/Makefile                       |    1 +
> >  drivers/virtio/virtio_pci_common.c            |   25 +-
> >  drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
> >  include/linux/mod_devicetable.h               |    6 +
> >  include/linux/rpmsg.h                         |    6 +
> >  {drivers/vhost => include/linux}/vhost.h      |  132 +-
> >  include/linux/virtio.h                        |    3 +
> >  include/linux/virtio_config.h                 |   42 +
> >  include/linux/vringh.h                        |   46 +
> >  samples/rpmsg/rpmsg_client_sample.c           |   32 +-
> >  tools/virtio/virtio_test.c                    |    2 +-
> >  39 files changed, 7083 insertions(+), 183 deletions(-)
> >  create mode 100644 drivers/ntb/ntb_vhost.c
> >  create mode 100644 drivers/ntb/ntb_virtio.c
> >  create mode 100644 drivers/ntb/ntb_virtio.h
> >  create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
> >  create mode 100644 drivers/rpmsg/rpmsg_cfs.c
> >  create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
> >  create mode 100644 drivers/vhost/vhost_cfs.c
> >  create mode 100644 drivers/virtio/virtio_pci_epf.c
> >  rename {drivers/vhost => include/linux}/vhost.h (66%)
> >
> > --
> > 2.17.1
> >
>
Kishon Vijay Abraham I July 3, 2020, 6:17 a.m. UTC | #6
+Alan, Haotian

On 7/2/2020 11:01 PM, Mathieu Poirier wrote:
> On Thu, 2 Jul 2020 at 03:51, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>> communication over MMIO. This series enables rpmsg communication between
>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>
>>> 1) Modify vhost to use standard Linux driver model
>>> 2) Add support in vring to access virtqueue over MMIO
>>> 3) Add vhost client driver for rpmsg
>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>    rpmsg communication between two SoCs connected to each other
>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>    between two SoCs connected via NTB
>>> 6) Add configfs to configure the components
>>>
>>> UseCase1 :
>>>
>>>  VHOST RPMSG                     VIRTIO RPMSG
>>>       +                               +
>>>       |                               |
>>>       |                               |
>>>       |                               |
>>>       |                               |
>>> +-----v------+                 +------v-------+
>>> |   Linux    |                 |     Linux    |
>>> |  Endpoint  |                 | Root Complex |
>>> |            <----------------->              |
>>> |            |                 |              |
>>> |    SOC1    |                 |     SOC2     |
>>> +------------+                 +--------------+
>>>
>>> UseCase 2:
>>>
>>>      VHOST RPMSG                                      VIRTIO RPMSG
>>>           +                                                 +
>>>           |                                                 |
>>>           |                                                 |
>>>           |                                                 |
>>>           |                                                 |
>>>    +------v------+                                   +------v------+
>>>    |             |                                   |             |
>>>    |    HOST1    |                                   |    HOST2    |
>>>    |             |                                   |             |
>>>    +------^------+                                   +------^------+
>>>           |                                                 |
>>>           |                                                 |
>>> +---------------------------------------------------------------------+
>>> |  +------v------+                                   +------v------+  |
>>> |  |             |                                   |             |  |
>>> |  |     EP      |                                   |     EP      |  |
>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>> |  |             <----------------------------------->             |  |
>>> |  |             |                                   |             |  |
>>> |  |             |                                   |             |  |
>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>> |  |             |  (Configured using NTB Function)  |             |  |
>>> |  +-------------+                                   +-------------+  |
>>> +---------------------------------------------------------------------+
>>>
>>> Software Layering:
>>>
>>> The high-level SW layering should look something like below. This series
>>> adds support only for RPMSG VHOST, however something similar should be
>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>> device, user) can use any of the vhost client driver.
>>>
>>>
>>>     +----------------+  +-----------+  +------------+  +----------+
>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>             |                 |              |              |
>>>             |                 |              |              |
>>>             |                 |              |              |
>>> +-----------v-----------------v--------------v--------------v----------+
>>> |                            VHOST CORE                                |
>>> +--------^---------------^--------------------^------------------^-----+
>>>          |               |                    |                  |
>>>          |               |                    |                  |
>>>          |               |                    |                  |
>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>
>>> This was initially proposed here [1]
>>>
>>> [1] -> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>
>>
>> I find this very interesting. A huge patchset so will take a bit
>> to review, but I certainly plan to do that. Thanks!
> 
> Same here - it will take time.  This patchset is sizable and sits
> behind a few others that are equally big.
> 
Jason Wang July 3, 2020, 7:16 a.m. UTC | #7
On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/2/2020 3:40 PM, Jason Wang wrote:
>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>> I find this very interesting. A huge patchset so will take a bit
>>> to review, but I certainly plan to do that. Thanks!
>>
>> Yes, it would be better if there's a git branch for us to have a look.
> I've pushed the branch
> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc


Thanks


>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the work is
>> duplicated with vDPA (e.g the epf transport or vhost bus).
> This is about connecting two different HW systems both running Linux and
> doesn't necessarily involve virtualization.


Right, this is something similar to VOP
(Documentation/misc-devices/mic/mic_overview.rst). The difference is the
hardware, I guess, and VOP uses a userspace application to implement the device.


>   So there is no guest or host as in
> virtualization but two entirely different systems connected via PCIe cable, one
> acting as guest and one as host. So one system will provide virtio
> functionality reserving memory for virtqueues and the other provides vhost
> functionality providing a way to access the virtqueues in virtio memory. One is
> source and the other is sink and there is no intermediate entity. (vhost was
> probably intermediate entity in virtualization?)


(Not a native English speaker) but "vhost" could introduce some
confusion for me since it was used for implementing the virtio backend for
userspace drivers. I guess "vringh" could be better.


>
>> Have you considered to implement these through vDPA?
> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg driver
> or vhost net driver is not provided.
>
> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
> (usecase2 above),


I see.


>   all the boards run Linux. The middle board provides NTB
> functionality and board on either side provides virtio/vhost functionality and
> transfer data using rpmsg.


So I wonder whether it's worthwhile to add a new bus. Can we use the
existing virtio-bus/drivers? It might work if, besides the epf
transport, we introduce an epf "vhost" transport driver.

It will have virtqueues, but these are only used for communication between
itself and the upper virtio driver. It will also have vringh queues, which
will be probed by the virtio epf transport drivers. And it needs to copy
data between the virtqueues and the vringh queues.

It works like:

virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh queue/epf>

The advantage is that there's no need to write new buses and drivers.
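
The relay idea above can be illustrated with a minimal userspace sketch. This
is not code from the series or from the kernel: struct toy_ring, ring_put(),
ring_get() and relay() are hypothetical stand-ins for a virtqueue, a vringh
queue and the proposed epf "vhost" transport driver. It only shows that each
message taken from the local queue has to be copied once more into the queue
that the remote side sees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS     4   /* number of message slots per toy ring */
#define SLOT_SIZE 64  /* maximum message size in bytes */

/* Hypothetical stand-in for a virtqueue or a vringh queue. */
struct toy_ring {
	char     buf[SLOTS][SLOT_SIZE];
	size_t   len[SLOTS];
	unsigned head;  /* next slot to fill */
	unsigned tail;  /* next slot to drain */
};

/* Producer side: copy one message into the ring; -1 if full or too big. */
static int ring_put(struct toy_ring *r, const void *data, size_t len)
{
	if (len > SLOT_SIZE || r->head - r->tail == SLOTS)
		return -1;
	memcpy(r->buf[r->head % SLOTS], data, len);
	r->len[r->head % SLOTS] = len;
	r->head++;
	return 0;
}

/* Consumer side: copy one message out of the ring; -1 if empty or too big. */
static int ring_get(struct toy_ring *r, void *data, size_t cap, size_t *len)
{
	unsigned slot = r->tail % SLOTS;

	if (r->head == r->tail || r->len[slot] > cap)
		return -1;
	memcpy(data, r->buf[slot], r->len[slot]);
	*len = r->len[slot];
	r->tail++;
	return 0;
}

/*
 * The relay: move every pending message from the local queue to the
 * remote-facing queue. The memcpy() here is the extra copy per message.
 * Returns the number of messages forwarded.
 */
static unsigned relay(struct toy_ring *src, struct toy_ring *dst)
{
	unsigned n = 0;

	while (src->head != src->tail && dst->head - dst->tail < SLOTS) {
		unsigned s = src->tail % SLOTS;
		unsigned d = dst->head % SLOTS;

		memcpy(dst->buf[d], src->buf[s], src->len[s]);
		dst->len[d] = src->len[s];
		dst->head++;
		src->tail++;
		n++;
	}
	return n;
}

/* Usage: one message travels local queue -> relay -> remote queue. */
void relay_demo(void)
{
	static struct toy_ring vq, vrh;  /* static storage is zero-initialised */
	char out[SLOT_SIZE];
	size_t len = 0;

	assert(ring_put(&vq, "ping", 5) == 0);
	assert(relay(&vq, &vrh) == 1);
	assert(ring_get(&vrh, out, sizeof(out), &len) == 0);
	assert(len == 5 && strcmp(out, "ping") == 0);
}
```

In a real driver the two queues would live in different memory domains and
notifications would also have to be forwarded; the sketch leaves both out.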

Does this make sense?

Thanks


Kishon Vijay Abraham I July 6, 2020, 9:32 a.m. UTC | #8
Hi Jason,

On 7/3/2020 12:46 PM, Jason Wang wrote:
> 
> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>> I find this very interesting. A huge patchset so will take a bit
>>>> to review, but I certainly plan to do that. Thanks!
>>>
>>> Yes, it would be better if there's a git branch for us to have a look.
>> I've pushed the branch
>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
> 
> 
> Thanks
> 
> 
>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the work is
>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>> This is about connecting two different HW systems both running Linux and
>> doesn't necessarily involve virtualization.
> 
> 
> Right, this is something similar to VOP
> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
> hardware, I guess, and VOP uses a userspace application to implement the device.

I'd also like to point out that this series tries to enable communication
between two SoCs in a vendor-agnostic way. Since this series addresses two use
cases (PCIe RC<->EP and NTB), for the NTB case it plugs directly into the NTB
framework, and any of the HW supported by NTB below should be able to use
virtio-vhost communication:

# ls drivers/ntb/hw/
amd  epf  idt  intel  mscc

And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
function driver, and hence any SoC that supports a configurable PCIe endpoint
can use virtio-vhost communication:

# ls drivers/pci/controller/dwc/*ep*
drivers/pci/controller/dwc/pcie-designware-ep.c
drivers/pci/controller/dwc/pcie-uniphier-ep.c
drivers/pci/controller/dwc/pci-layerscape-ep.c

> 
> 
>>   So there is no guest or host as in
>> virtualization but two entirely different systems connected via PCIe cable, one
>> acting as guest and one as host. So one system will provide virtio
>> functionality reserving memory for virtqueues and the other provides vhost
>> functionality providing a way to access the virtqueues in virtio memory. One is
>> source and the other is sink and there is no intermediate entity. (vhost was
>> probably intermediate entity in virtualization?)
> 
> 
> (Not a native English speaker) but "vhost" could introduce some confusion for
> me since it was used for implementing the virtio backend for userspace drivers.
> I guess "vringh" could be better.

Initially I had named this vringh but later decided to choose vhost instead.
vhost is still a virtio backend (not necessarily userspace), though it now
resides in an entirely different system. Whatever virtio is for a frontend
system, vhost can be that for a backend system. vring is for accessing the
virtqueue and can be used in either the frontend or the backend.
> 
> 
>>
>>> Have you considered to implement these through vDPA?
>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg driver
>> or vhost net driver is not provided.
>>
>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>> (usecase2 above),
> 
> 
> I see.
> 
> 
>>   all the boards run Linux. The middle board provides NTB
>> functionality and board on either side provides virtio/vhost functionality and
>> transfer data using rpmsg.
> 
> 
> So I wonder whether it's worthwhile to add a new bus. Can we use the existing
> virtio-bus/drivers? It might work if, besides the epf transport, we
> introduce an epf "vhost" transport driver.

IMHO we'll need two buses, one for the frontend and the other for the backend,
because the two components can then co-operate/interact with each other to
provide a functionality. Though both will seemingly provide similar callbacks,
they provide symmetrical or complementary functionality and need not be the
same or identical.

Having the same bus can also create sequencing issues.

If you look at virtio_dev_probe() of virtio_bus

device_features = dev->config->get_features(dev);

Now if we use the same bus for both front-end and back-end, both will try to
get_features when there has been no set_features. Ideally the vhost device
should be initialized first with the set of features it supports. Vhost and
virtio should use "status" and "features" complementarily and not identically.

A virtio device (or frontend) cannot be initialized before the vhost device (or
backend) gets initialized with data such as features. Similarly, vhost (backend)
cannot access virtqueues or buffers before virtio (frontend) sets
VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there for virtio, as
the physical memory for the virtqueues is created by virtio (frontend).
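
To make the ordering concrete, here is a minimal userspace sketch of the two
constraints described above. It is not code from this series: struct
link_state and all helper names are hypothetical, and the shared struct merely
stands in for whatever state the two systems exchange (for example over MMIO).
Only the value of VIRTIO_CONFIG_S_DRIVER_OK is taken from
include/uapi/linux/virtio_config.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as in include/uapi/linux/virtio_config.h. */
#define VIRTIO_CONFIG_S_DRIVER_OK 4

/* Hypothetical state shared between backend (vhost) and frontend (virtio). */
struct link_state {
	bool     features_valid;  /* backend has published its features */
	uint64_t features;        /* feature bits offered by the backend */
	uint8_t  status;          /* status bits written by the frontend */
};

/* Backend: publish the supported features before the frontend probes. */
static void backend_set_features(struct link_state *ls, uint64_t features)
{
	ls->features = features;
	ls->features_valid = true;
}

/* Frontend: features can only be read once the backend has set them. */
static int frontend_get_features(const struct link_state *ls, uint64_t *out)
{
	if (!ls->features_valid)
		return -1;  /* backend not initialized yet */
	*out = ls->features;
	return 0;
}

/* Frontend: signal that virtqueues are set up and may be used. */
static int frontend_set_driver_ok(struct link_state *ls)
{
	if (!ls->features_valid)
		return -1;  /* cannot finish probing without features */
	ls->status |= VIRTIO_CONFIG_S_DRIVER_OK;
	return 0;
}

/* Backend: virtqueues must not be touched before DRIVER_OK is set. */
static bool backend_may_access_vqs(const struct link_state *ls)
{
	return (ls->status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}

/* Usage: walk through the only ordering that succeeds. */
void ordering_demo(void)
{
	struct link_state ls = { 0 };
	uint64_t features = 0;

	assert(frontend_get_features(&ls, &features) == -1);
	backend_set_features(&ls, 0x1);
	assert(frontend_get_features(&ls, &features) == 0);
	assert(!backend_may_access_vqs(&ls));
	assert(frontend_set_driver_ok(&ls) == 0);
	assert(backend_may_access_vqs(&ls));
}
```

The sketch shows why the two sides use "features" and "status" in opposite
directions: one writes what the other reads.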

> 
> It will have virtqueues, but these are only used for communication between
> itself and the upper virtio driver. It will also have vringh queues, which
> will be probed by the virtio epf transport drivers. And it needs to copy data
> between the virtqueues and the vringh queues.
> 
> It works like:
> 
> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh queue/epf>
> 
> The advantage is that there's no need to write new buses and drivers.

I think this will work; however, there is an additional copy between the vringh
queue and the virtqueue, in some cases added latency because of forwarding
interrupts between the vhost and virtio drivers, and the vhost drivers have to
provide features (which means they have to be aware of which virtio driver will
be connected).
virtio drivers (front end) generally access the buffers from their local memory,
but when in the backend they would have to access them over MMIO (like PCI EPF
or NTB) or userspace.
> 
> Does this make sense?

Two copies is an issue in my opinion, but let's get others' opinions as well.

Thanks for your suggestions!

Regards
Kishon

Jason Wang July 7, 2020, 9:47 a.m. UTC | #9
On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/3/2020 12:46 PM, Jason Wang wrote:
>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>> [...]
>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>> to review, but I certainly plan to do that. Thanks!
>>>> Yes, it would be better if there's a git branch for us to have a look.
>>> I've pushed the branch
>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>
>> Thanks
>>
>>
>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the work is
>>>> duplicated with vDPA (e.g. the epf transport or vhost bus).
>>> This is about connecting two different HW systems both running Linux and
>>> doesn't necessarily involve virtualization.
>>
>> Right, this is something similar to VOP
>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
>> hardware, I guess, and that VOP uses a userspace application to implement the
>> device.
> I'd also like to point out that this series tries to enable communication
> between two SoCs in a vendor-agnostic way. Since this series covers two use
> cases (PCIe RC<->EP and NTB), for the NTB case it plugs directly into the NTB
> framework, and any of the NTB HW below should be able to use virtio-vhost
> communication:
>
> # ls drivers/ntb/hw/
> amd  epf  idt  intel  mscc
>
> Similarly, for PCIe RC<->EP communication, this adds a generic endpoint
> function driver, and hence any SoC that supports a configurable PCIe endpoint
> can use virtio-vhost communication:
>
> # ls drivers/pci/controller/dwc/*ep*
> drivers/pci/controller/dwc/pcie-designware-ep.c
> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> drivers/pci/controller/dwc/pci-layerscape-ep.c


Thanks for the background.


>
>>
>>>    So there is no guest or host as in
>>> virtualization but two entirely different systems connected via PCIe cable, one
>>> acting as guest and one as host. So one system will provide virtio
>>> functionality reserving memory for virtqueues and the other provides vhost
>>> functionality providing a way to access the virtqueues in virtio memory. One is
>>> source and the other is sink and there is no intermediate entity. (vhost was
>>> probably intermediate entity in virtualization?)
>>
>> (Not a native English speaker) but "vhost" could introduce some confusion for
>> me since it was used for implementing the virtio backend for userspace
>> drivers. I guess "vringh" could be better.
> Initially I had named this vringh but later decided to choose vhost instead of
> vringh. vhost is still a virtio backend (not necessarily userspace) though it
> now resides in an entirely different system. Whatever virtio is for a frontend
> system, vhost can be that for a backend system. vring can be for accessing
> virtqueue and can be used either in frontend or backend.


Ok.


>>
>>>> Have you considered implementing these through vDPA?
>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg driver
>>> or vhost net driver is not provided.
>>>
>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>> (usecase2 above),
>>
>> I see.
>>
>>
>>>    all the boards run Linux. The middle board provides NTB
>>> functionality and the boards on either side provide virtio/vhost
>>> functionality and transfer data using rpmsg.
>>
>> So I wonder whether a new bus is worthwhile. Can we use the existing
>> virtio-bus/drivers? It might work since, apart from the epf transport, we can
>> introduce an epf "vhost" transport driver.
> IMHO we'll need two buses, one for the frontend and the other for the backend,
> because the two components can then co-operate/interact with each other to
> provide a functionality. Though both will seemingly provide similar callbacks,
> they provide symmetrical or complementary functionality and need not be the
> same or identical.
>
> Having the same bus can also create sequencing issues.
>
> If you look at virtio_dev_probe() of virtio_bus:
>
> device_features = dev->config->get_features(dev);
>
> Now if we use the same bus for both front-end and back-end, both will try to
> get_features when there has been no set_features. Ideally the vhost device
> should be initialized first with the set of features it supports. Vhost and
> virtio should use "status" and "features" in a complementary way, not
> identically.


Yes, but there's no need for doing status/features passthrough in epf
vhost drivers.


>
> The virtio device (or frontend) cannot be initialized before the vhost device
> (or backend) gets initialized with data such as features. Similarly, vhost
> (backend) cannot access virtqueues or buffers before virtio (frontend) sets
> VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there for virtio as
> the physical memory for virtqueues is created by virtio (frontend).
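
The ordering constraint described above can be sketched in a few lines of C.
This is only an illustration, assuming the backend raises a flag once its
features are set and the frontend writes the usual virtio status byte.
struct backend_state and both helper functions are hypothetical names and are
not part of the series; the status bit values are the ones defined by the
virtio specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FEATURES_OK 8
#define VIRTIO_CONFIG_S_FAILED      0x80

/* Hypothetical backend-side view of the shared device state. */
struct backend_state {
	bool features_set; /* backend has published its feature set */
	uint8_t status;    /* status byte written by the frontend */
};

/* The frontend should only start probing once features are published. */
static bool frontend_may_probe(const struct backend_state *s)
{
	return s->features_set;
}

/* The backend should only touch virtqueues after DRIVER_OK, and never
 * after the frontend has reported FAILED. */
static bool backend_may_access_vqs(const struct backend_state *s)
{
	if (s->status & VIRTIO_CONFIG_S_FAILED)
		return false;
	return (s->status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}
```

The two checks are deliberately asymmetric: the frontend waits on features,
the backend waits on status, which is the complementary use argued for above.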


epf vhost drivers need to implement two devices: a vhost(vringh) device
and a virtio device (which is a mediated device). The vhost(vringh) device
does feature negotiation with the virtio device via RC/EP or NTB.
The virtio device does feature negotiation with local virtio
drivers. If there is a feature mismatch, the epf vhost drivers can do
mediation between them.
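
As a rough sketch of that mediation, assuming features are plain 64-bit masks
and that the set usable end to end is the intersection of what each side
negotiated. struct epf_mediator and the two helpers are hypothetical names
made up for illustration; only the feature bit numbers come from the virtio
specification.

```c
#include <stdint.h>

/* Standard virtio feature bit numbers (from the virtio specification). */
#define VIRTIO_RING_F_INDIRECT_DESC 28
#define VIRTIO_RING_F_EVENT_IDX     29
#define VIRTIO_F_VERSION_1          32

#define FEATURE(bit) (1ULL << (bit))

/* Hypothetical state kept by a mediating epf vhost driver. */
struct epf_mediator {
	uint64_t remote_features; /* negotiated with the remote virtio device */
	uint64_t local_features;  /* acked by the local virtio driver */
};

/* Features usable end to end: only the bits both sides agreed on. */
static uint64_t mediator_common_features(const struct epf_mediator *m)
{
	return m->remote_features & m->local_features;
}

/* Bits present on one side only; these are the ones the driver in the
 * middle would have to mediate (emulate, translate or hide). */
static uint64_t mediator_mismatch(const struct epf_mediator *m)
{
	return m->remote_features ^ m->local_features;
}
```

An empty mismatch mask would mean the two negotiations agree and the driver
can simply forward; a non-empty one marks where mediation is needed.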


>
>> It will have virtqueues, but they are only used for the communication between
>> itself and the upper virtio driver. And it will have vringh queues which will
>> be probed by virtio epf transport drivers. And it needs to do a data copy
>> between the virtqueue and the vringh queues.
>>
>> It works like:
>>
>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh queue/epf>
>>
>> The advantage is that there's no need for writing new buses and drivers.
> I think this will work, however there is an additional copy between vringh queue
> and virtqueue,


I think not? E.g. in use case 1), if we stick to virtio bus, we will have:

virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio
ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)

What the epf vhost driver does is read the buffer len and addr from
virtio ring(1) and then DMA to Linux (RC)?
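
A minimal sketch of that forwarding step, assuming a simplified descriptor
that carries only an address and a length, and using memcpy() as a stand-in
for the DMA transfer into remote (RC) memory. struct ring_desc,
struct remote_window and forward_buffer() are hypothetical names for
illustration only, not the vringh or PCI endpoint API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified descriptor: where a buffer lives and its length. */
struct ring_desc {
	const void *addr;
	uint32_t len;
};

/* Hypothetical stand-in for the window that maps remote (RC) memory. */
struct remote_window {
	uint8_t *base;
	size_t size;
};

/*
 * Forward one buffer described by a ring(1) descriptor into the remote
 * window at the given offset. memcpy() stands in for the DMA transfer.
 * Returns 0 on success, -1 if the buffer does not fit in the window.
 */
static int forward_buffer(const struct ring_desc *desc,
			  struct remote_window *win, size_t offset)
{
	if (offset > win->size || desc->len > win->size - offset)
		return -1;

	memcpy(win->base + offset, desc->addr, desc->len);
	return 0;
}
```

The payload is moved once, from the local buffer straight into the remote
window; the ring itself only supplies the address and length.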


> in some cases adds latency because of forwarding interrupts
> between vhost and virtio driver, vhost drivers providing features (which means
> it has to be aware of which virtio driver will be connected).
> virtio drivers (front end) generally access the buffers from its local memory
> but when in backend it can access over MMIO (like PCI EPF or NTB) or userspace.
>> Does this make sense?
> Two copies in my opinion is an issue, but let's get others' opinions as well.


Sure.


>
> Thanks for your suggestions!


You're welcome.

Thanks


>
> Regards
> Kishon
>
>> Thanks
>>
>>
>>> Thanks
>>> Kishon
>>>
>>>> Thanks
>>>>
>>>>
Kishon Vijay Abraham I July 7, 2020, 2:45 p.m. UTC | #10
Hi Jason,

On 7/7/2020 3:17 PM, Jason Wang wrote:
> 
> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>>> Hi Jason,
>>>>
>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>> [...]
>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>> I've pushed the branch
>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>
>>> Thanks
>>>
>>>
>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>> work is duplicated with vDPA (e.g. the epf transport or vhost bus).
>>>> This is about connecting two different HW systems both running Linux and
>>>> doesn't necessarily involve virtualization.
>>>
>>> Right, this is something similar to VOP
>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
>>> hardware, I guess, and that VOP uses a userspace application to implement the
>>> device.
>> I'd also like to point out that this series tries to enable communication
>> between two SoCs in a vendor-agnostic way. Since this series covers two use
>> cases (PCIe RC<->EP and NTB), for the NTB case it plugs directly into the NTB
>> framework, and any of the NTB HW below should be able to use virtio-vhost
>> communication:
>>
>> # ls drivers/ntb/hw/
>> amd  epf  idt  intel  mscc
>>
>> Similarly, for PCIe RC<->EP communication, this adds a generic endpoint
>> function driver, and hence any SoC that supports a configurable PCIe endpoint
>> can use virtio-vhost communication:
>>
>> # ls drivers/pci/controller/dwc/*ep*
>> drivers/pci/controller/dwc/pcie-designware-ep.c
>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>> drivers/pci/controller/dwc/pci-layerscape-ep.c
> 
> 
> Thanks for the background.
> 
> 
>>
>>>
>>>>    So there is no guest or host as in
>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>> one acting as guest and one as host. So one system will provide virtio
>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>> One is source and the other is sink and there is no intermediate entity.
>>>> (vhost was probably intermediate entity in virtualization?)
>>>
>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>> me since it was used for implementing the virtio backend for userspace
>>> drivers. I guess "vringh" could be better.
>> Initially I had named this vringh but later decided to choose vhost instead of
>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>> now resides in an entirely different system. Whatever virtio is for a frontend
>> system, vhost can be that for a backend system. vring can be for accessing
>> virtqueue and can be used either in frontend or backend.
> 
> 
> Ok.
> 
> 
>>>
>>>>> Have you considered implementing these through vDPA?
>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
>>>> driver or vhost net driver is not provided.
>>>>
>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>> (usecase2 above),
>>>
>>> I see.
>>>
>>>
>>>>    all the boards run Linux. The middle board provides NTB
>>>> functionality and the boards on either side provide virtio/vhost
>>>> functionality and transfer data using rpmsg.
>>>
>>> So I wonder whether a new bus is worthwhile. Can we use the existing
>>> virtio-bus/drivers? It might work since, apart from the epf transport, we can
>>> introduce an epf "vhost" transport driver.
>> IMHO we'll need two buses, one for the frontend and the other for the backend,
>> because the two components can then co-operate/interact with each other to
>> provide a functionality. Though both will seemingly provide similar callbacks,
>> they provide symmetrical or complementary functionality and need not be the
>> same or identical.
>>
>> Having the same bus can also create sequencing issues.
>>
>> If you look at virtio_dev_probe() of virtio_bus:
>>
>> device_features = dev->config->get_features(dev);
>>
>> Now if we use the same bus for both front-end and back-end, both will try to
>> get_features when there has been no set_features. Ideally the vhost device
>> should be initialized first with the set of features it supports. Vhost and
>> virtio should use "status" and "features" in a complementary way, not
>> identically.
> 
> 
> Yes, but there's no need for doing status/features passthrough in epf vhost
> drivers.
> 
> 
>>
>> The virtio device (or frontend) cannot be initialized before the vhost device
>> (or backend) gets initialized with data such as features. Similarly, vhost
>> (backend) cannot access virtqueues or buffers before virtio (frontend) sets
>> VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there for virtio as
>> the physical memory for virtqueues is created by virtio (frontend).
> 
> 
> epf vhost drivers need to implement two devices: a vhost(vringh) device and a
> virtio device (which is a mediated device). The vhost(vringh) device does
> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
> does feature negotiation with local virtio drivers. If there is a feature
> mismatch, the epf vhost drivers can do mediation between them.

Here epf vhost should be initialized with a set of features for it to negotiate
either as a vhost device or as a virtio device, no? Where should the initial
feature set for epf vhost come from?
> 
> 
>>
>>> It will have virtqueues, but they are only used for the communication between
>>> itself and the upper virtio driver. And it will have vringh queues which will
>>> be probed by virtio epf transport drivers. And it needs to do a data copy
>>> between the virtqueue and the vringh queues.
>>>
>>> It works like:
>>>
>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>> queue/epf>
>>>
>>> The advantage is that there's no need for writing new buses and drivers.
>> I think this will work, however there is an additional copy between vringh queue
>> and virtqueue,
> 
> 
> I think not? E.g. in use case 1), if we stick to virtio bus, we will have:
> 
> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
> -> virtio pci (RC) <-> virtio rpmsg (RC)

IIUC epf vhost driver (EP) will access virtio ring(2) using vringh? And virtio
ring(2) is created by virtio pci (RC).
> 
> What the epf vhost driver does is read the buffer len and addr from
> virtio ring(1) and then DMA to Linux (RC)?

Okay, I made an optimization here: vhost-rpmsg, using a helper, writes a buffer
from rpmsg's upper layer directly to the remote Linux (RC), as against here,
where it first has to be written to virtio ring(1).

Thinking about how this would look for NTB:
virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
ring(2) -> virtio-rpmsg (HOST2)

Here the NTB(HOST1) will access the virtio ring(2) using vringh?

Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?

I'd like to get clarity on two things in the approach you suggested. One is
features (since epf vhost should ideally be transparent to any virtio driver),
and the other is how certain inputs to the virtio device, such as the number of
buffers, would be determined.

Thanks again for your suggestions!

Regards
Kishon

> 
> 
>> in some cases adds latency because of forwarding interrupts
>> between vhost and virtio driver, vhost drivers providing features (which means
>> it has to be aware of which virtio driver will be connected).
>> virtio drivers (front end) generally access the buffers from its local memory
>> but when in backend it can access over MMIO (like PCI EPF or NTB) or userspace.
>>> Does this make sense?
>> Two copies in my opinion is an issue, but let's get others' opinions as well.
> 
> 
> Sure.
> 
> 
>>
>> Thanks for your suggestions!
> 
> 
> You're welcome.
> 
> Thanks
> 
> 
>>
>> Regards
>> Kishon
>>
>>> Thanks
>>>
>>>
>>>> Thanks
>>>> Kishon
>>>>
>>>>> Thanks
>>>>>
>>>>>
>
Jason Wang July 8, 2020, 11:22 a.m. UTC | #11
On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/7/2020 3:17 PM, Jason Wang wrote:
>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>
>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>        rpmsg communication between two SoCs connected to each other
>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>        between two SoCs connected via NTB
>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>
>>>>>>>> UseCase1 :
>>>>>>>>
>>>>>>>>  VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>       +                               +
>>>>>>>>       |                               |
>>>>>>>>       |                               |
>>>>>>>>       |                               |
>>>>>>>>       |                               |
>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>> |            <----------------->              |
>>>>>>>> |            |                 |              |
>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>> +------------+                 +--------------+
>>>>>>>>
>>>>>>>> UseCase 2:
>>>>>>>>
>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>           +                                                 +
>>>>>>>>           |                                                 |
>>>>>>>>           |                                                 |
>>>>>>>>           |                                                 |
>>>>>>>>           |                                                 |
>>>>>>>>    +------v------+                                   +------v------+
>>>>>>>>    |             |                                   |             |
>>>>>>>>    |    HOST1    |                                   |    HOST2    |
>>>>>>>>    |             |                                   |             |
>>>>>>>>    +------^------+                                   +------^------+
>>>>>>>>           |                                                 |
>>>>>>>>           |                                                 |
>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>> |  |             |                                   |             |  |
>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>> |  |             |                                   |             |  |
>>>>>>>> |  |             |                                   |             |  |
>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>
>>>>>>>> Software Layering:
>>>>>>>>
>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>
>>>>>>>>
>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>             |                 |              |              |
>>>>>>>>             |                 |              |              |
>>>>>>>>             |                 |              |              |
>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>> |                            VHOST CORE                                |
>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>          |               |                    |                  |
>>>>>>>>          |               |                    |                  |
>>>>>>>>          |               |                    |                  |
>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>>>>>
>>>>>>>> This was initially proposed here [1]
>>>>>>>>
>>>>>>>> [1] ->
>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>> I've pushed the branch
>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>> Thanks
>>>>
>>>>
>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>> work is
>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>> This is about connecting two different HW systems both running Linux and
>>>>> doesn't necessarily involve virtualization.
>>>> Right, this is something similar to VOP
>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
>>>> hardware, I guess, and VOP uses a userspace application to implement the device.
>>> I'd also like to point out, this series tries to have communication between two
>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and any
>>> of the HW in NTB below should be able to use a virtio-vhost communication
>>>
>>> #ls drivers/ntb/hw/
>>> amd  epf  idt  intel  mscc
>>>
>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>> use virtio-vhost communication
>>>
>>> # ls drivers/pci/controller/dwc/*ep*
>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>
>> Thanks for the background.
>>
>>
>>>>>     So there is no guest or host as in
>>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>>> one
>>>>> acting as guest and one as host. So one system will provide virtio
>>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>>> One is
>>>>> source and the other is sink and there is no intermediate entity. (vhost was
>>>>> probably intermediate entity in virtualization?)
>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>> me since it was used for implementing the virtio backend for userspace drivers. I
>>>> guess "vringh" could be better.
>>> Initially I had named this vringh but later decided to choose vhost instead of
>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>> system, vhost can be that for a backend system. vring can be for accessing
>>> virtqueue and can be used either in frontend or backend.
>>
>> Ok.
>>
>>
>>>>>> Have you considered implementing these through vDPA?
>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
>>>>> driver
>>>>> or vhost net driver is not provided.
>>>>>
>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>> (usecase2 above),
>>>> I see.
>>>>
>>>>
>>>>>     all the boards run Linux. The middle board provides NTB
>>>>> functionality and board on either side provides virtio/vhost functionality and
>>>>> transfer data using rpmsg.
>>>> So I wonder whether a new bus is worthwhile. Can we use the existing
>>>> virtio-bus/drivers? It might work: besides the epf transport, we can
>>>> introduce an epf "vhost" transport driver.
>>> IMHO we'll need two buses, one for frontend and the other for backend, because
>>> the two components can then co-operate/interact with each other to provide a
>>> functionality. Though both will seemingly provide similar callbacks, they
>>> provide symmetrical or complementary functionality and need not be the same or
>>> identical.
>>>
>>> Having the same bus can also create sequencing issues.
>>>
>>> If you look at virtio_dev_probe() of virtio_bus
>>>
>>> device_features = dev->config->get_features(dev);
>>>
>>> Now if we use same bus for both front-end and back-end, both will try to
>>> get_features when there has been no set_features. Ideally vhost device should
>>> be initialized first with the set of features it supports. Vhost and virtio
>>> should use "status" and "features" complementarily and not identically.
>>
>> Yes, but there's no need for doing status/features passthrough in epf vhost
>> drivers.
>>
>>
>>> virtio device (or frontend) cannot be initialized before vhost device (or
>>> backend) gets initialized with data such as features. Similarly vhost (backend)
>>> cannot access virtqueues or buffers before virtio (frontend) sets
>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as
>>> the physical memory for virtqueues is created by virtio (frontend).
>>
>> epf vhost drivers need to implement two devices: vhost(vringh) device and
>> virtio device (which is a mediated device). The vhost(vringh) device is doing
>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
>> is doing feature negotiation with local virtio drivers. If there's a feature
>> mismatch, epf vhost drivers can do mediation between them.
> Here epf vhost should be initialized with a set of features for it to negotiate
> either as vhost device or virtio device no? Where should the initial feature
> set for epf vhost come from?


I think it can work as:

1) Have an initial feature set X (hard coded) in epf vhost
2) Use this X for both the virtio device and the vhost(vringh) device
3) The local virtio driver will negotiate feature set Y with the virtio device
4) The remote virtio driver will negotiate feature set Z with the vringh device
5) Mediate between feature sets Y and Z, since both are subsets of X
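
As a rough illustration only (plain userspace C, not code from this
series), the five steps above could be sketched as below. The feature
bits, the names, and the choice of "intersection of Y and Z" as the
mediation policy are all assumptions made for the sketch; a real epf
vhost driver might instead emulate a feature that only one side
accepted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only (not real virtio bits). */
#define EX_F_A (1ULL << 0)
#define EX_F_B (1ULL << 1)
#define EX_F_C (1ULL << 2)

/* Step 1: initial set X, hard coded in the (hypothetical) epf vhost driver. */
static const uint64_t epf_vhost_features_x = EX_F_A | EX_F_B | EX_F_C;

/*
 * Steps 3 and 4: a driver can only accept bits that the device offered,
 * so the negotiated set is always a subset of what was offered.
 */
static uint64_t negotiate(uint64_t offered, uint64_t driver_supported)
{
	return offered & driver_supported;
}

/*
 * Step 5: one possible mediation policy (an assumption): run with only
 * the bits that both the local and the remote driver accepted.
 */
static uint64_t mediate(uint64_t y, uint64_t z)
{
	uint64_t common = y & z;

	/* Y and Z are subsets of X, so the mediated set must be as well. */
	assert((common & ~epf_vhost_features_x) == 0);
	return common;
}
```

Step 2 corresponds to passing the same epf_vhost_features_x as
"offered" for both the local and the remote negotiation.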


>>
>>>> It will have virtqueues but only used for the communication between itself and
>>>> the upper virtio driver. And it will have vringh queues which will be probed by
>>>> virtio epf transport drivers. And it needs to do datacopy between virtqueue and
>>>> vringh queues.
>>>>
>>>> It works like:
>>>>
>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>>> queue/epf>
>>>>
>>>> The advantage is that there's no need for writing new buses and drivers.
>>> I think this will work however there is an additional copy between vringh queue
>>> and virtqueue,
>>
>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>>
>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
>> -> virtio pci (RC) <-> virtio rpmsg (RC)
> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?


Yes.


> And virtio
> ring(2) is created by virtio pci (RC).


Yes.


>> What the epf vhost driver does is read the buffer len and addr from virtio
>> ring(1) and then DMA to Linux (RC)?
> okay, I made some optimization here where vhost-rpmsg using a helper writes a
> buffer from rpmsg's upper layer directly to remote Linux (RC) as against here,
> where it has to be first written to virtio ring (1).
>
> Thinking how this would look for NTB
> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
> ring(2) -> virtio-rpmsg (HOST2)
>
> Here the NTB(HOST1) will access the virtio ring(2) using vringh?


Yes, I think so; it needs to use vring to access virtio ring (1) as well.


>
> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?


Yes.


>
> I'd like to get clarity on two things in the approach you suggested, one is
> features (since epf vhost should ideally be transparent to any virtio driver)


We can have an array of pre-defined features indexed by virtio 
device id in the code.
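
A minimal sketch of what such a table might look like, written as
standalone C rather than kernel code. The EX_ names, the table size, and
the lookup helper are hypothetical. The device IDs mirror
include/uapi/linux/virtio_ids.h, and the feature bits are meant to
correspond to VIRTIO_NET_F_MAC, VIRTIO_BLK_F_SEG_MAX and
VIRTIO_RPMSG_F_NS, but they are redefined locally here and should be
checked against the headers before being relied on.

```c
#include <assert.h>
#include <stdint.h>

/* Device IDs, mirroring include/uapi/linux/virtio_ids.h. */
#define EX_VIRTIO_ID_NET	1
#define EX_VIRTIO_ID_BLOCK	2
#define EX_VIRTIO_ID_RPMSG	7
#define EX_VIRTIO_ID_MAX	8	/* table size, hypothetical */

static_assert(EX_VIRTIO_ID_RPMSG < EX_VIRTIO_ID_MAX, "feature table too small");

/* Pre-defined feature set per device id; unlisted ids default to 0. */
static const uint64_t epf_vhost_default_features[EX_VIRTIO_ID_MAX] = {
	[EX_VIRTIO_ID_NET]   = 1ULL << 5,	/* assumed: VIRTIO_NET_F_MAC */
	[EX_VIRTIO_ID_BLOCK] = 1ULL << 2,	/* assumed: VIRTIO_BLK_F_SEG_MAX */
	[EX_VIRTIO_ID_RPMSG] = 1ULL << 0,	/* assumed: VIRTIO_RPMSG_F_NS */
};

/* Look up the initial feature set for a device id; 0 if the id is unknown. */
static uint64_t epf_vhost_features_for(unsigned int id)
{
	if (id >= EX_VIRTIO_ID_MAX)
		return 0;
	return epf_vhost_default_features[id];
}
```

With something like this, epf vhost would not need to know which virtio
driver binds later; it only needs the device id it was configured with.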


> and the other is how certain inputs to virtio device such as number of buffers
> be determined.


We can start by hard coding a value like 256, or introduce some API 
for the user to change the value.
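
To make that concrete, here is a small standalone C sketch of "hard
coded default plus a setter". The struct and function names are
hypothetical, and it assumes the value ends up being used as a split
virtqueue size, which the virtio specification requires to be a power
of two; if it is only a count of rpmsg buffers, the check could be
relaxed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EX_DEFAULT_NUM_BUFS	256u	/* the hard coded starting value */

/* Hypothetical per-device configuration held by epf vhost. */
struct ex_epf_vhost_cfg {
	unsigned int num_bufs;
};

/* Start from the hard coded default. */
static void ex_cfg_init(struct ex_epf_vhost_cfg *cfg)
{
	assert(cfg != NULL);
	cfg->num_bufs = EX_DEFAULT_NUM_BUFS;
}

/*
 * Let the user override the default. Rejects zero and values that are
 * not a power of two, leaving the previous value in place on error.
 */
static int ex_cfg_set_num_bufs(struct ex_epf_vhost_cfg *cfg, unsigned int n)
{
	assert(cfg != NULL);
	if (n == 0 || (n & (n - 1)) != 0)
		return -EINVAL;
	cfg->num_bufs = n;
	return 0;
}
```

In the kernel the setter would more likely be exposed as a configfs
attribute or a module parameter; that part is left out of the sketch.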


>
> Thanks again for your suggestions!


You're welcome.

Note that I just want to check whether or not we can reuse the virtio 
bus/driver. It's something similar to what you proposed in Software 
Layering but we just replace "vhost core" with "virtio bus" and move the 
vhost core below epf/ntb/platform transport.

Thanks


>
> Regards
> Kishon
>
>>
>>> in some cases adds latency because of forwarding interrupts
>>> between vhost and virtio driver, vhost drivers providing features (which means
>>> it has to be aware of which virtio driver will be connected).
>>> virtio drivers (front end) generally access the buffers from its local memory
>>> but when in backend it can access over MMIO (like PCI EPF or NTB) or userspace.
>>>> Does this make sense?
>>> Two copies in my opinion is an issue but let's get others' opinions as well.
>>
>> Sure.
>>
>>
>>> Thanks for your suggestions!
>>
>> You're welcome.
>>
>> Thanks
>>
>>
>>> Regards
>>> Kishon
>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks
>>>>> Kishon
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
Kishon Vijay Abraham I July 8, 2020, 1:13 p.m. UTC | #12
Hi Jason,

On 7/8/2020 4:52 PM, Jason Wang wrote:
> 
> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
>>>> Hi Jason,
>>>>
>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>
>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>        rpmsg communication between two SoCs connected to each other
>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>        between two SoCs connected via NTB
>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>
>>>>>>>>> UseCase1 :
>>>>>>>>>
>>>>>>>>>  VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>       +                               +
>>>>>>>>>       |                               |
>>>>>>>>>       |                               |
>>>>>>>>>       |                               |
>>>>>>>>>       |                               |
>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>> |            <----------------->              |
>>>>>>>>> |            |                 |              |
>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>
>>>>>>>>> UseCase 2:
>>>>>>>>>
>>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>           +                                                 +
>>>>>>>>>           |                                                 |
>>>>>>>>>           |                                                 |
>>>>>>>>>           |                                                 |
>>>>>>>>>           |                                                 |
>>>>>>>>>    +------v------+                                   +------v------+
>>>>>>>>>    |             |                                   |             |
>>>>>>>>>    |    HOST1    |                                   |    HOST2    |
>>>>>>>>>    |             |                                   |             |
>>>>>>>>>    +------^------+                                   +------^------+
>>>>>>>>>           |                                                 |
>>>>>>>>>           |                                                 |
>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>
>>>>>>>>> Software Layering:
>>>>>>>>>
>>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>             |                 |              |              |
>>>>>>>>>             |                 |              |              |
>>>>>>>>>             |                 |              |              |
>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>          |               |                    |                  |
>>>>>>>>>          |               |                    |                  |
>>>>>>>>>          |               |                    |                  |
>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>>>>>>
>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>
>>>>>>>>> [1] ->
>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>> I've pushed the branch
>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>> Thanks
>>>>>
>>>>>
>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>>> work is
>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>> doesn't necessarily involve virtualization.
>>>>> Right, this is something similar to VOP
>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
>>>>> hardware, I guess, and VOP uses a userspace application to implement the device.
>>>> I'd also like to point out, this series tries to have communication between
>>>> two
>>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
>>>> any
>>>> of the HW in NTB below should be able to use a virtio-vhost communication
>>>>
>>>> #ls drivers/ntb/hw/
>>>> amd  epf  idt  intel  mscc
>>>>
>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>>> use virtio-vhost communication
>>>>
>>>> # ls drivers/pci/controller/dwc/*ep*
>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>
>>> Thanks for the background.
>>>
>>>
>>>>>>     So there is no guest or host as in
>>>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>>>> one
>>>>>> acting as guest and one as host. So one system will provide virtio
>>>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>>>> One is
>>>>>> source and the other is sink and there is no intermediate entity. (vhost was
>>>>>> probably intermediate entity in virtualization?)
>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>>> me since it was used for implementing the virtio backend for userspace drivers. I
>>>>> guess "vringh" could be better.
>>>> Initially I had named this vringh but later decided to choose vhost instead of
>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>>> system, vhost can be that for a backend system. vring can be for accessing
>>>> virtqueue and can be used either in frontend or backend.
>>>
>>> Ok.
>>>
>>>
>>>>>>> Have you considered implementing these through vDPA?
>>>>>> IIUC vDPA only provides an interface to userspace; an in-kernel rpmsg
>>>>>> driver or vhost net driver is not provided.
>>>>>>
>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>>> (usecase2 above),
>>>>> I see.
>>>>>
>>>>>
>>>>>>     all the boards run Linux. The middle board provides NTB functionality
>>>>>> and the boards on either side provide virtio/vhost functionality and
>>>>>> transfer data using rpmsg.
>>>>> So I wonder whether a new bus is worthwhile. Can we use the existing
>>>>> virtio-bus/drivers? It might work if, besides the epf transport, we
>>>>> introduce an epf "vhost" transport driver.
>>>> IMHO we'll need two buses, one for the frontend and the other for the backend,
>>>> because the two components can then cooperate/interact with each other to
>>>> provide a functionality. Though both will seemingly provide similar callbacks,
>>>> they provide symmetrical or complementary functionality and need not be the
>>>> same or identical.
>>>>
>>>> Having the same bus can also create sequencing issues.
>>>>
>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>
>>>> device_features = dev->config->get_features(dev);
>>>>
>>>> Now if we use the same bus for both front-end and back-end, both will try to
>>>> get_features when there has been no set_features. Ideally the vhost device
>>>> should be initialized first with the set of features it supports. Vhost and
>>>> virtio should use "status" and "features" in a complementary way, not
>>>> identically.
>>>
>>> Yes, but there's no need for doing status/features passthrough in epf vhost
>>> drivers.
>>>
>>>
>>>> The virtio device (or frontend) cannot be initialized before the vhost device
>>>> (or backend) gets initialized with data such as features. Similarly, vhost
>>>> (backend) cannot access virtqueues or buffers before virtio (frontend) sets
>>>> VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there for virtio as
>>>> the physical memory for virtqueues is created by virtio (frontend).
>>>
>>> epf vhost drivers need to implement two devices: a vhost(vringh) device and
>>> a virtio device (which is a mediated device). The vhost(vringh) device does
>>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
>>> does feature negotiation with local virtio drivers. If there is a feature
>>> mismatch, epf vhost drivers can do mediation between them.
>> Here epf vhost should be initialized with a set of features for it to negotiate
>> either as vhost device or virtio device, no? Where should the initial feature
>> set for epf vhost come from?
> 
> 
> I think it can work as:
> 
> 1) Have an initial feature set X (hard-coded) in epf vhost
> 2) Use this X for both the virtio device and the vhost(vringh) device
> 3) The local virtio driver negotiates with the virtio device, giving feature set Y
> 4) The remote virtio driver negotiates with the vringh device, giving feature set Z
> 5) Mediate between feature sets Y and Z, since both Y and Z are subsets of X
> 
> 

Okay. I'm also wondering whether we could use configfs for configuring this.
Anyway, we could find different approaches for configuring this.
>>>
>>>>> It will have virtqueues, but only used for the communication between itself
>>>>> and the upper virtio driver. And it will have vringh queues which will be
>>>>> probed by virtio epf transport drivers. And it needs to do a data copy
>>>>> between the virtqueue and vringh queues.
>>>>>
>>>>> It works like:
>>>>>
>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>>>> queue/epf>
>>>>>
>>>>> The advantage is that there's no need for writing new buses and drivers.
>>>> I think this will work; however, there is an additional copy between vringh queue
>>>> and virtqueue,
>>>
>>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>>>
>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
>>> -> virtio pci (RC) <-> virtio rpmsg (RC)
>> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?
> 
> 
> Yes.
> 
> 
>> And virtio
>> ring(2) is created by virtio pci (RC).
> 
> 
> Yes.
> 
> 
>>> What the epf vhost driver does is read the buffer len and addr from virtio
>>> ring(1) and then DMA to Linux (RC)?
>> Okay, I made an optimization here where vhost-rpmsg, using a helper, writes a
>> buffer from rpmsg's upper layer directly to remote Linux (RC), whereas here
>> it has to be written to virtio ring (1) first.
>>
>> Thinking how this would look for NTB
>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
>> ring(2) -> virtio-rpmsg (HOST2)
>>
>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
> 
> 
> Yes, I think so. It needs to use vring to access virtio ring (1) as well.

NTB(HOST1) and virtio ring(1) will be in the same system, so it doesn't have to
use vring. virtio ring(1) belongs to the virtio device that NTB(HOST1) creates.
> 
> 
>>
>> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?
> 
> 
> Yes.

Okay, I haven't looked at this, but the backend of virtio_blk should access an
actual storage device, no?
> 
> 
>>
>> I'd like to get clarity on two things in the approach you suggested, one is
>> features (since epf vhost should ideally be transparent to any virtio driver)
> 
> 
> We can have an array of pre-defined features indexed by virtio device id
> in the code.
> 
> 
>> and the other is how certain inputs to the virtio device, such as the number
>> of buffers, should be determined.
> 
> 
> We can start with a hard-coded value like 256, or introduce some API for the
> user to change the value.
> 
> 
>>
>> Thanks again for your suggestions!
> 
> 
> You're welcome.
> 
> Note that I just want to check whether or not we can reuse the virtio
> bus/driver. It's something similar to what you proposed in Software Layering
> but we just replace "vhost core" with "virtio bus" and move the vhost core
> below epf/ntb/platform transport.

Got it. My initial design was based on my understanding of your comments [1].

I'll try to create something based on your proposed design here.

Regards
Kishon

[1] ->
https://lore.kernel.org/linux-pci/59982499-0fc1-2e39-9ff9-993fb4dd7dcc@redhat.com/
> 
> Thanks
> 
> 
>>
>> Regards
>> Kishon
>>
>>>
>>>> in some cases adds latency because of forwarding interrupts
>>>> between vhost and virtio driver, vhost drivers providing features (which means
>>>> it has to be aware of which virtio driver will be connected).
>>>> virtio drivers (front end) generally access the buffers from their local
>>>> memory, but a backend may access them over MMIO (like PCI EPF or NTB) or
>>>> from userspace.
>>>>> Does this make sense?
>>>> Two copies is an issue in my opinion, but let's get others' opinions as well.
>>>
>>> Sure.
>>>
>>>
>>>> Thanks for your suggestions!
>>>
>>> You're welcome.
>>>
>>> Thanks
>>>
>>>
>>>> Regards
>>>> Kishon
>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> Thanks
>>>>>> Kishon
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>>> Kishon Vijay Abraham I (22):
>>>>>>>>>       vhost: Make _feature_ bits a property of vhost device
>>>>>>>>>       vhost: Introduce standard Linux driver model in VHOST
>>>>>>>>>       vhost: Add ops for the VHOST driver to configure VHOST device
>>>>>>>>>       vringh: Add helpers to access vring in MMIO
>>>>>>>>>       vhost: Add MMIO helpers for operations on vhost virtqueue
>>>>>>>>>       vhost: Introduce configfs entry for configuring VHOST
>>>>>>>>>       virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>>>>>>>         reading messages
>>>>>>>>>       rpmsg: Introduce configfs entry for configuring rpmsg
>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>>>>>>>         rpmsg_internal.h
>>>>>>>>>       virtio: Add ops to allocate and free buffer
>>>>>>>>>       rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>>>>>>>         virtio_free_buffer()
>>>>>>>>>       rpmsg: Add VHOST based remote processor messaging bus
>>>>>>>>>       samples/rpmsg: Setup delayed work to send message
>>>>>>>>>       samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>>>>>>>         message
>>>>>>>>>       rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>>>>>>>       virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe
>>>>>>>>>         Endpoint device
>>>>>>>>>       PCI: endpoint: Add EP function driver to provide VHOST interface
>>>>>>>>>       NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>>>>>>>       NTB: Add a new NTB client driver to implement VHOST functionality
>>>>>>>>>       NTB: Describe the ntb_virtio and ntb_vhost client in the
>>>>>>>>>         documentation
>>>>>>>>>
>>>>>>>>>      Documentation/driver-api/ntb.rst              |   11 +
>>>>>>>>>      Documentation/rpmsg.txt                       |   56 +
>>>>>>>>>      drivers/ntb/Kconfig                           |   18 +
>>>>>>>>>      drivers/ntb/Makefile                          |    2 +
>>>>>>>>>      drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>>>>>>>      drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>>>>>>>      drivers/ntb/ntb_virtio.h                      |   56 +
>>>>>>>>>      drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>>>>>>>      drivers/pci/endpoint/functions/Makefile       |    1 +
>>>>>>>>>      .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
>>>>>>>>>      drivers/rpmsg/Kconfig                         |   10 +
>>>>>>>>>      drivers/rpmsg/Makefile                        |    3 +-
>>>>>>>>>      drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>>>>>>>      drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>>>>>>>      drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>>>>>>>      drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
>>>>>>>>>      drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>>>>>>>      drivers/vhost/Kconfig                         |    1 +
>>>>>>>>>      drivers/vhost/Makefile                        |    2 +-
>>>>>>>>>      drivers/vhost/net.c                           |   10 +-
>>>>>>>>>      drivers/vhost/scsi.c                          |   24 +-
>>>>>>>>>      drivers/vhost/test.c                          |   17 +-
>>>>>>>>>      drivers/vhost/vdpa.c                          |    2 +-
>>>>>>>>>      drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>>>>>>>      drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>>>>>>>      drivers/vhost/vringh.c                        |  332 +++++
>>>>>>>>>      drivers/vhost/vsock.c                         |   20 +-
>>>>>>>>>      drivers/virtio/Kconfig                        |    9 +
>>>>>>>>>      drivers/virtio/Makefile                       |    1 +
>>>>>>>>>      drivers/virtio/virtio_pci_common.c            |   25 +-
>>>>>>>>>      drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>>>>>>>      include/linux/mod_devicetable.h               |    6 +
>>>>>>>>>      include/linux/rpmsg.h                         |    6 +
>>>>>>>>>      {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>>>>>>>      include/linux/virtio.h                        |    3 +
>>>>>>>>>      include/linux/virtio_config.h                 |   42 +
>>>>>>>>>      include/linux/vringh.h                        |   46 +
>>>>>>>>>      samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>>>>>>>      tools/virtio/virtio_test.c                    |    2 +-
>>>>>>>>>      39 files changed, 7083 insertions(+), 183 deletions(-)
>>>>>>>>>      create mode 100644 drivers/ntb/ntb_vhost.c
>>>>>>>>>      create mode 100644 drivers/ntb/ntb_virtio.c
>>>>>>>>>      create mode 100644 drivers/ntb/ntb_virtio.h
>>>>>>>>>      create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>>>>>>>      create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>>>>>>>      create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>>>>>>>      create mode 100644 drivers/vhost/vhost_cfs.c
>>>>>>>>>      create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>>>>>>>      rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> 2.17.1
>>>>>>>>>
>
Jason Wang July 9, 2020, 6:26 a.m. UTC | #13
On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 7/8/2020 4:52 PM, Jason Wang wrote:
>> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>
>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>>         rpmsg communication between two SoCs connected to each other
>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>>         between two SoCs connected via NTB
>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>
>>>>>>>>>> UseCase1 :
>>>>>>>>>>
>>>>>>>>>>  VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>>       +                               +
>>>>>>>>>>       |                               |
>>>>>>>>>>       |                               |
>>>>>>>>>>       |                               |
>>>>>>>>>>       |                               |
>>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>>> |            <----------------->              |
>>>>>>>>>> |            |                 |              |
>>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>>
>>>>>>>>>> UseCase 2:
>>>>>>>>>>
>>>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>>           +                                                 +
>>>>>>>>>>           |                                                 |
>>>>>>>>>>           |                                                 |
>>>>>>>>>>           |                                                 |
>>>>>>>>>>           |                                                 |
>>>>>>>>>>    +------v------+                                   +------v------+
>>>>>>>>>>    |             |                                   |             |
>>>>>>>>>>    |    HOST1    |                                   |    HOST2    |
>>>>>>>>>>    |             |                                   |             |
>>>>>>>>>>    +------^------+                                   +------^------+
>>>>>>>>>>           |                                                 |
>>>>>>>>>>           |                                                 |
>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>
>>>>>>>>>> Software Layering:
>>>>>>>>>>
>>>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>             |                 |              |              |
>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>>>>>>>
>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>
>>>>>>>>>> [1] ->
>>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>>> I've pushed the branch
>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>>>> work is
>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>>> doesn't necessarily involve virtualization.
>>>>>> Right, this is something similar to VOP
>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
>>>>>> hardware, I guess, and VOP uses a userspace application to implement the device.
>>>>> I'd also like to point out that this series tries to enable communication
>>>>> between two SoCs in a vendor-agnostic way. Since this series covers two use
>>>>> cases (PCIe RC<->EP and NTB), for the NTB case it plugs directly into the NTB
>>>>> framework, and any of the NTB HW below should be able to use virtio-vhost
>>>>> communication:
>>>>>
>>>>> # ls drivers/ntb/hw/
>>>>> amd  epf  idt  intel  mscc
>>>>>
>>>>> Similarly, for PCIe RC<->EP communication, this adds a generic endpoint
>>>>> function driver, so any SoC that supports a configurable PCIe endpoint can
>>>>> use virtio-vhost communication:
>>>>>
>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>> Thanks for the background.
>>>>
>>>>
>>>>>>> So there is no guest or host as in virtualization, but two entirely
>>>>>>> different systems connected via a PCIe cable, one acting as guest and one
>>>>>>> as host. One system provides virtio functionality, reserving memory for
>>>>>>> virtqueues, and the other provides vhost functionality, providing a way to
>>>>>>> access the virtqueues in virtio memory. One is the source and the other is
>>>>>>> the sink, and there is no intermediate entity. (vhost was probably an
>>>>>>> intermediate entity in virtualization?)
>>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>>>> me since it was used for implementing a virtio backend for userspace drivers.
>>>>>> I guess "vringh" could be better.
>>>>> Initially I had named this vringh but later decided to choose vhost instead of
>>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>>>> system, vhost can be that for a backend system. vring can be for accessing
>>>>> virtqueue and can be used either in frontend or backend.
>>>> Ok.
>>>>
>>>>
>>>>>>>> Have you considered implementing these through vDPA?
>>>>>>> IIUC vDPA only provides an interface to userspace; an in-kernel rpmsg
>>>>>>> driver or vhost net driver is not provided.
>>>>>>>
>>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>>>> (usecase2 above),
>>>>>> I see.
>>>>>>
>>>>>>
>>>>>>>      all the boards run Linux. The middle board provides NTB functionality
>>>>>>> and the boards on either side provide virtio/vhost functionality and
>>>>>>> transfer data using rpmsg.
>>>>>> So I wonder whether a new bus is worthwhile. Can we use the existing
>>>>>> virtio-bus/drivers? It might work if, besides the epf transport, we
>>>>>> introduce an epf "vhost" transport driver.
>>>>> IMHO we'll need two buses, one for the frontend and the other for the backend,
>>>>> because the two components can then cooperate/interact with each other to
>>>>> provide a functionality. Though both will seemingly provide similar callbacks,
>>>>> they provide symmetrical or complementary functionality and need not be the
>>>>> same or identical.
>>>>>
>>>>> Having the same bus can also create sequencing issues.
>>>>>
>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>
>>>>> device_features = dev->config->get_features(dev);
>>>>>
>>>>> Now if we use the same bus for both front-end and back-end, both will try to
>>>>> get_features when there has been no set_features. Ideally the vhost device
>>>>> should be initialized first with the set of features it supports. Vhost and
>>>>> virtio should use "status" and "features" in a complementary way, not
>>>>> identically.
>>>> Yes, but there's no need for doing status/features passthrough in epf vhost
>>>> drivers.
>>>>
>>>>
>>>>> The virtio device (or frontend) cannot be initialized before the vhost device
>>>>> (or backend) gets initialized with data such as features. Similarly, vhost
>>>>> (backend) cannot access virtqueues or buffers before virtio (frontend) sets
>>>>> VIRTIO_CONFIG_S_DRIVER_OK, whereas that requirement is not there for virtio as
>>>>> the physical memory for virtqueues is created by virtio (frontend).
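
As a rough illustration of the ordering constraint quoted above, a backend
could gate ring access on the status byte written by the frontend. This is
only a sketch under stated assumptions: backend_may_access_rings() is a
hypothetical helper and not part of this series, and the status bits are
redefined locally with an EX_ prefix (values as in the virtio specification)
so that the snippet builds in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the virtio device status bits (values per the virtio spec). */
#define EX_VIRTIO_CONFIG_S_ACKNOWLEDGE 0x01
#define EX_VIRTIO_CONFIG_S_DRIVER      0x02
#define EX_VIRTIO_CONFIG_S_DRIVER_OK   0x04
#define EX_VIRTIO_CONFIG_S_FEATURES_OK 0x08
#define EX_VIRTIO_CONFIG_S_FAILED      0x80

/*
 * Hypothetical helper: the backend (vhost side) must not touch virtqueues
 * or buffers until the frontend (virtio side) has set DRIVER_OK, and must
 * stop if the frontend reported a failure.
 */
static bool backend_may_access_rings(uint8_t status)
{
	if (status & EX_VIRTIO_CONFIG_S_FAILED)
		return false;
	return (status & EX_VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}
```

The frontend has no matching gate because it owns the ring memory; only the
side that reaches the rings remotely has to wait.
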
>>>> epf vhost drivers need to implement two devices: a vhost(vringh) device and
>>>> a virtio device (which is a mediated device). The vhost(vringh) device does
>>>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
>>>> does feature negotiation with local virtio drivers. If there is a feature
>>>> mismatch, epf vhost drivers can do mediation between them.
>>> Here epf vhost should be initialized with a set of features for it to negotiate
>>> either as vhost device or virtio device, no? Where should the initial feature
>>> set for epf vhost come from?
>>
>> I think it can work as:
>>
>> 1) Have an initial feature set X (hard-coded) in epf vhost
>> 2) Use this X for both the virtio device and the vhost(vringh) device
>> 3) The local virtio driver negotiates with the virtio device, giving feature set Y
>> 4) The remote virtio driver negotiates with the vringh device, giving feature set Z
>> 5) Mediate between feature sets Y and Z, since both Y and Z are subsets of X
>>
>>
> Okay. I'm also wondering whether we could use configfs for configuring this.
> Anyway, we could find different approaches for configuring this.


Yes, and I think some management API is needed even in the design of
your "Software Layering". In that figure, rpmsg vhost needs some pre-set
or hard-coded features.
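
To make steps 1) to 5) above concrete, here is a minimal userspace sketch of
the feature mediation. Everything in it is an assumption for illustration:
the EX_FEAT_* bits are made-up feature numbers, ex_negotiate() and
ex_mediate() are hypothetical helpers, and "mediation" is reduced to its
simplest possible policy (keep only the bits both sides accepted). A real
implementation may well need to do more than intersect the two sets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define EX_FEAT_A (1ULL << 0)
#define EX_FEAT_B (1ULL << 1)
#define EX_FEAT_C (1ULL << 2)

/* Set X: hard-coded in epf vhost and offered to both sides (steps 1 and 2). */
static const uint64_t ex_features_x = EX_FEAT_A | EX_FEAT_B | EX_FEAT_C;

/* One negotiation (steps 3 and 4): a driver can only accept offered bits. */
static uint64_t ex_negotiate(uint64_t offered, uint64_t driver_supported)
{
	return offered & driver_supported;
}

/* Step 5, simplest policy: use what both the local and remote side accepted. */
static uint64_t ex_mediate(uint64_t y, uint64_t z)
{
	/* Y and Z must both be subsets of X. */
	assert((y & ~ex_features_x) == 0);
	assert((z & ~ex_features_x) == 0);
	return y & z;
}
```

With a local driver supporting A and B and a remote driver supporting B and C,
Y is {A, B}, Z is {B, C}, and the mediated set is {B}.
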


>>>>>> It will have virtqueues, but only used for the communication between itself
>>>>>> and the upper virtio driver. And it will have vringh queues which will be
>>>>>> probed by virtio epf transport drivers. And it needs to do a data copy
>>>>>> between the virtqueue and vringh queues.
>>>>>>
>>>>>> It works like:
>>>>>>
>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>>>>> queue/epf>
>>>>>>
>>>>>> The advantage is that there's no need for writing new buses and drivers.
>>>>> I think this will work; however, there is an additional copy between vringh queue
>>>>> and virtqueue,
>>>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>>>>
>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
>>>> -> virtio pci (RC) <-> virtio rpmsg (RC)
>>> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?
>>
>> Yes.
>>
>>
>>> And virtio
>>> ring(2) is created by virtio pci (RC).
>>
>> Yes.
>>
>>
>>>> What the epf vhost driver does is read the buffer len and addr from virtio
>>>> ring(1) and then DMA to Linux (RC)?
>>> Okay, I made an optimization here where vhost-rpmsg, using a helper, writes a
>>> buffer from rpmsg's upper layer directly to remote Linux (RC), whereas here
>>> it has to be written to virtio ring (1) first.
>>>
>>> Thinking how this would look for NTB
>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
>>> ring(2) -> virtio-rpmsg (HOST2)
>>>
>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>
>> Yes, I think so. It needs to use vring to access virtio ring (1) as well.
> NTB(HOST1) and virtio ring(1) will be in the same system, so it doesn't have to
> use vring. virtio ring(1) belongs to the virtio device that NTB(HOST1) creates.


Right.


>>
>>> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?
>>
>> Yes.
> Okay, I haven't looked at this, but the backend of virtio_blk should access an
> actual storage device, no?


Good point. For a non-peer device like storage, there's probably no need
for it to be registered on the virtio bus, and it might be better to
behave as you proposed.

Just to make sure I understand the design: how is VHOST SCSI expected to
work in your proposal? Does it have a device for file as a backend?


>>
>>> I'd like to get clarity on two things in the approach you suggested, one is
>>> features (since epf vhost should ideally be transparent to any virtio driver)
>>
>> We can have an array of pre-defined features indexed by virtio device id
>> in the code.
>>
>>
>>> and the other is how certain inputs to the virtio device, such as the number
>>> of buffers, should be determined.
>>
>> We can start with a hard-coded value like 256, or introduce some API for the
>> user to change the value.
>>
>>
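
The two suggestions quoted above (a feature array indexed by virtio device id,
and a hard-coded buffer count of 256) could be sketched as one lookup table.
This is a hedged illustration, not code from the series: struct
ex_dev_defaults, ex_defaults[] and ex_lookup_defaults() are hypothetical
names, the chosen feature bits per device are arbitrary examples, and the
device ids and feature bit numbers are redefined locally with an EX_ prefix
(values as in the virtio specification and Linux headers) so that the snippet
builds in userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of virtio device ids and feature bit numbers. */
#define EX_VIRTIO_ID_NET       1
#define EX_VIRTIO_ID_RPMSG     7
#define EX_VIRTIO_ID_MAX       8

#define EX_VIRTIO_RPMSG_F_NS   0   /* rpmsg name service announcements */
#define EX_VIRTIO_NET_F_MAC    5   /* device has a given MAC address */
#define EX_VIRTIO_F_VERSION_1  32  /* virtio 1.0 compliant */

/* Hard-coded starting point for the number of buffers, as suggested above. */
#define EX_DEFAULT_NUM_BUFS    256

/* Hypothetical per-device defaults kept by the epf vhost driver. */
struct ex_dev_defaults {
	uint64_t features;      /* initial feature set X for this device type */
	unsigned int num_bufs;  /* 0 means "no entry for this device id" */
};

static const struct ex_dev_defaults ex_defaults[EX_VIRTIO_ID_MAX] = {
	[EX_VIRTIO_ID_NET] = {
		.features = (1ULL << EX_VIRTIO_F_VERSION_1) |
			    (1ULL << EX_VIRTIO_NET_F_MAC),
		.num_bufs = EX_DEFAULT_NUM_BUFS,
	},
	[EX_VIRTIO_ID_RPMSG] = {
		.features = (1ULL << EX_VIRTIO_RPMSG_F_NS),
		.num_bufs = EX_DEFAULT_NUM_BUFS,
	},
};

/* Returns the defaults for a device id, or NULL if none are defined. */
static const struct ex_dev_defaults *ex_lookup_defaults(unsigned int device_id)
{
	if (device_id >= EX_VIRTIO_ID_MAX || ex_defaults[device_id].num_bufs == 0)
		return NULL;
	return &ex_defaults[device_id];
}
```

A user-facing API for changing the value would then only need to override
num_bufs for one table entry before the device is registered.
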
>>> Thanks again for your suggestions!
>>
>> You're welcome.
>>
>> Note that I just want to check whether or not we can reuse the virtio
>> bus/driver. It's something similar to what you proposed in Software Layering
>> but we just replace "vhost core" with "virtio bus" and move the vhost core
>> below epf/ntb/platform transport.
> Got it. My initial design was based on my understanding of your comments [1].


Yes, but that's just for a networking device. If we want something more 
generic, it may require more thought (bus etc).


>
> I'll try to create something based on your proposed design here.


Sure, but for coding, we'd better wait for others' opinions here.

Thanks


>
> Regards
> Kishon
>
> [1] ->
> https://lore.kernel.org/linux-pci/59982499-0fc1-2e39-9ff9-993fb4dd7dcc@redhat.com/
>> Thanks
>>
>>
>>> Regards
>>> Kishon
>>>
>>>>> in some cases adds latency because of forwarding interrupts
>>>>> between vhost and virtio driver, vhost drivers providing features (which means
>>>>> it has to be aware of which virtio driver will be connected).
>>>>> virtio drivers (front end) generally access the buffers from their local
>>>>> memory, but a backend may access them over MMIO (like PCI EPF or NTB) or
>>>>> from userspace.
>>>>>> Does this make sense?
>>>>> Two copies is an issue in my opinion, but let's get others' opinions as well.
>>>> Sure.
>>>>
>>>>
>>>>> Thanks for your suggestions!
>>>> You're welcome.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> Regards
>>>>> Kishon
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>> Kishon
>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>>> Kishon Vijay Abraham I (22):
>>>>>>>>>>        vhost: Make _feature_ bits a property of vhost device
>>>>>>>>>>        vhost: Introduce standard Linux driver model in VHOST
>>>>>>>>>>        vhost: Add ops for the VHOST driver to configure VHOST device
>>>>>>>>>>        vringh: Add helpers to access vring in MMIO
>>>>>>>>>>        vhost: Add MMIO helpers for operations on vhost virtqueue
>>>>>>>>>>        vhost: Introduce configfs entry for configuring VHOST
>>>>>>>>>>        virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>>>>>>>>        rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>>>>>>>>          reading messages
>>>>>>>>>>        rpmsg: Introduce configfs entry for configuring rpmsg
>>>>>>>>>>        rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>>>>>>>>        rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>>>>>>>>          rpmsg_internal.h
>>>>>>>>>>        virtio: Add ops to allocate and free buffer
>>>>>>>>>>        rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>>>>>>>>          virtio_free_buffer()
>>>>>>>>>>        rpmsg: Add VHOST based remote processor messaging bus
>>>>>>>>>>        samples/rpmsg: Setup delayed work to send message
>>>>>>>>>>        samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>>>>>>>>          message
>>>>>>>>>>        rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>>>>>>>>        virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe
>>>>>>>>>>          Endpoint device
>>>>>>>>>>        PCI: endpoint: Add EP function driver to provide VHOST interface
>>>>>>>>>>        NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>>>>>>>>        NTB: Add a new NTB client driver to implement VHOST functionality
>>>>>>>>>>        NTB: Describe the ntb_virtio and ntb_vhost client in the
>>>>>>>>>>          documentation
>>>>>>>>>>
>>>>>>>>>>       Documentation/driver-api/ntb.rst              |   11 +
>>>>>>>>>>       Documentation/rpmsg.txt                       |   56 +
>>>>>>>>>>       drivers/ntb/Kconfig                           |   18 +
>>>>>>>>>>       drivers/ntb/Makefile                          |    2 +
>>>>>>>>>>       drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>>>>>>>>       drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>>>>>>>>       drivers/ntb/ntb_virtio.h                      |   56 +
>>>>>>>>>>       drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>>>>>>>>       drivers/pci/endpoint/functions/Makefile       |    1 +
>>>>>>>>>>       .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
>>>>>>>>>>       drivers/rpmsg/Kconfig                         |   10 +
>>>>>>>>>>       drivers/rpmsg/Makefile                        |    3 +-
>>>>>>>>>>       drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>>>>>>>>       drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>>>>>>>>       drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>>>>>>>>       drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
>>>>>>>>>>       drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>>>>>>>>       drivers/vhost/Kconfig                         |    1 +
>>>>>>>>>>       drivers/vhost/Makefile                        |    2 +-
>>>>>>>>>>       drivers/vhost/net.c                           |   10 +-
>>>>>>>>>>       drivers/vhost/scsi.c                          |   24 +-
>>>>>>>>>>       drivers/vhost/test.c                          |   17 +-
>>>>>>>>>>       drivers/vhost/vdpa.c                          |    2 +-
>>>>>>>>>>       drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>>>>>>>>       drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>>>>>>>>       drivers/vhost/vringh.c                        |  332 +++++
>>>>>>>>>>       drivers/vhost/vsock.c                         |   20 +-
>>>>>>>>>>       drivers/virtio/Kconfig                        |    9 +
>>>>>>>>>>       drivers/virtio/Makefile                       |    1 +
>>>>>>>>>>       drivers/virtio/virtio_pci_common.c            |   25 +-
>>>>>>>>>>       drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>>>>>>>>       include/linux/mod_devicetable.h               |    6 +
>>>>>>>>>>       include/linux/rpmsg.h                         |    6 +
>>>>>>>>>>       {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>>>>>>>>       include/linux/virtio.h                        |    3 +
>>>>>>>>>>       include/linux/virtio_config.h                 |   42 +
>>>>>>>>>>       include/linux/vringh.h                        |   46 +
>>>>>>>>>>       samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>>>>>>>>       tools/virtio/virtio_test.c                    |    2 +-
>>>>>>>>>>       39 files changed, 7083 insertions(+), 183 deletions(-)
>>>>>>>>>>       create mode 100644 drivers/ntb/ntb_vhost.c
>>>>>>>>>>       create mode 100644 drivers/ntb/ntb_virtio.c
>>>>>>>>>>       create mode 100644 drivers/ntb/ntb_virtio.h
>>>>>>>>>>       create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>>>>>>>>       create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>>>>>>>>       create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>>>>>>>>       create mode 100644 drivers/vhost/vhost_cfs.c
>>>>>>>>>>       create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>>>>>>>>       rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> 2.17.1
>>>>>>>>>>
Mathieu Poirier July 15, 2020, 5:15 p.m. UTC | #14
Hey Kishon,

On Wed, Jul 08, 2020 at 06:43:45PM +0530, Kishon Vijay Abraham I wrote:
> Hi Jason,
> 
> On 7/8/2020 4:52 PM, Jason Wang wrote:
> > 
> > On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
> >> Hi Jason,
> >>
> >> On 7/7/2020 3:17 PM, Jason Wang wrote:
> >>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> >>>> Hi Jason,
> >>>>
> >>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
> >>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> >>>>>> Hi Jason,
> >>>>>>
> >>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
> >>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> >>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
> >>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>> communication over MMIO. This series enables rpmsg communication between
> >>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>>>>>>>>
> >>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
> >>>>>>>>>        rpmsg communication between two SoCs connected to each other
> >>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> >>>>>>>>>        between two SoCs connected via NTB
> >>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>
> >>>>>>>>> UseCase1 :
> >>>>>>>>>
> >>>>>>>>>      VHOST RPMSG                     VIRTIO RPMSG
> >>>>>>>>>           +                               +
> >>>>>>>>>           |                               |
> >>>>>>>>>           |                               |
> >>>>>>>>>           |                               |
> >>>>>>>>>           |                               |
> >>>>>>>>> +-----v------+                 +------v-------+
> >>>>>>>>> |   Linux    |                 |     Linux    |
> >>>>>>>>> |  Endpoint  |                 | Root Complex |
> >>>>>>>>> |            <----------------->              |
> >>>>>>>>> |            |                 |              |
> >>>>>>>>> |    SOC1    |                 |     SOC2     |
> >>>>>>>>> +------------+                 +--------------+
> >>>>>>>>>
> >>>>>>>>> UseCase 2:
> >>>>>>>>>
> >>>>>>>>>          VHOST RPMSG                                      VIRTIO RPMSG
> >>>>>>>>>               +                                                 +
> >>>>>>>>>               |                                                 |
> >>>>>>>>>               |                                                 |
> >>>>>>>>>               |                                                 |
> >>>>>>>>>               |                                                 |
> >>>>>>>>>        +------v------+                                   +------v------+
> >>>>>>>>>        |             |                                   |             |
> >>>>>>>>>        |    HOST1    |                                   |    HOST2    |
> >>>>>>>>>        |             |                                   |             |
> >>>>>>>>>        +------^------+                                   +------^------+
> >>>>>>>>>               |                                                 |
> >>>>>>>>>               |                                                 |
> >>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>> |  +------v------+                                   +------v------+  |
> >>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>> |  |     EP      |                                   |     EP      |  |
> >>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
> >>>>>>>>> |  |             <----------------------------------->             |  |
> >>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
> >>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
> >>>>>>>>> |  +-------------+                                   +-------------+  |
> >>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>
> >>>>>>>>> Software Layering:
> >>>>>>>>>
> >>>>>>>>> The high-level SW layering should look something like below. This series
> >>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
> >>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
> >>>>>>>>> device, user) can use any of the vhost client driver.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>         +----------------+  +-----------+  +------------+  +----------+
> >>>>>>>>>         |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
> >>>>>>>>>         +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
> >>>>>>>>>                 |                 |              |              |
> >>>>>>>>>                 |                 |              |              |
> >>>>>>>>>                 |                 |              |              |
> >>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
> >>>>>>>>> |                            VHOST CORE                                |
> >>>>>>>>> +--------^---------------^--------------------^------------------^-----+
> >>>>>>>>>              |               |                    |                  |
> >>>>>>>>>              |               |                    |                  |
> >>>>>>>>>              |               |                    |                  |
> >>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
> >>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
> >>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
> >>>>>>>>>
> >>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>
> >>>>>>>>> [1] ->
> >>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
> >>>>>>>> I find this very interesting. A huge patchset so will take a bit
> >>>>>>>> to review, but I certainly plan to do that. Thanks!
> >>>>>>> Yes, it would be better if there's a git branch for us to have a look.
> >>>>>> I've pushed the branch
> >>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
> >>>>>>> work is
> >>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
> >>>>>> This is about connecting two different HW systems both running Linux and
> >>>>>> doesn't necessarily involve virtualization.
> >>>>> Right, this is something similar to VOP
> >>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
> >>>>> hardware I guess, and VOP uses a userspace application to implement the device.
> >>>> I'd also like to point out, this series tries to have communication between
> >>>> two
> >>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
> >>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
> >>>> any
> >>>> of the HW in NTB below should be able to use a virtio-vhost communication
> >>>>
> >>>> #ls drivers/ntb/hw/
> >>>> amd  epf  idt  intel  mscc
> >>>>
> >>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
> >>>> function driver and hence any SoC that supports configurable PCIe endpoint can
> >>>> use virtio-vhost communication
> >>>>
> >>>> # ls drivers/pci/controller/dwc/*ep*
> >>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
> >>>
> >>> Thanks for those backgrounds.
> >>>
> >>>
> >>>>>>     So there is no guest or host as in
> >>>>>> virtualization but two entirely different systems connected via PCIe cable,
> >>>>>> one
> >>>>>> acting as guest and one as host. So one system will provide virtio
> >>>>>> functionality reserving memory for virtqueues and the other provides vhost
> >>>>>> functionality providing a way to access the virtqueues in virtio memory.
> >>>>>> One is
> >>>>>> source and the other is sink and there is no intermediate entity. (vhost was
> >>>>>> probably intermediate entity in virtualization?)
> >>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
> >>>>> me since it was used for implementing virtio backend for userspace drivers. I
> >>>>> guess "vringh" could be better.
> >>>> Initially I had named this vringh but later decided to choose vhost instead of
> >>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
> >>>> now resides in an entirely different system. Whatever virtio is for a frontend
> >>>> system, vhost can be that for a backend system. vring can be for accessing
> >>>> virtqueue and can be used either in frontend or backend.
> >>>
> >>> Ok.
> >>>
> >>>
> >>>>>>> Have you considered to implement these through vDPA?
> >>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
> >>>>>> driver
> >>>>>> or vhost net driver is not provided.
> >>>>>>
> >>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
> >>>>>> (usecase2 above),
> >>>>> I see.
> >>>>>
> >>>>>
> >>>>>>     all the boards run Linux. The middle board provides NTB
> >>>>>> functionality and board on either side provides virtio/vhost
> >>>>>> functionality and
> >>>>>> transfer data using rpmsg.
> >>>>> So I wonder whether it's worthwhile for a new bus. Can we use the existing
> >>>>> virtio-bus/drivers? It might work as, except for the epf transport, we can
> >>>>> introduce an epf "vhost" transport driver.
> >>>> IMHO we'll need two buses, one for the frontend and the other for the backend,
> >>>> because the two components can then co-operate/interact with each other to
> >>>> provide a functionality. Though both will seemingly provide similar callbacks,
> >>>> they provide symmetrical or complementary functionality and need not be the
> >>>> same or identical.
> >>>>
> >>>> Having the same bus can also create sequencing issues.
> >>>>
> >>>> If you look at virtio_dev_probe() of virtio_bus
> >>>>
> >>>> device_features = dev->config->get_features(dev);
> >>>>
> >>>> Now if we use same bus for both front-end and back-end, both will try to
> >>>> get_features when there has been no set_features. Ideally vhost device should
> >>>> be initialized first with the set of features it supports. Vhost and virtio
> >>>> should use "status" and "features" complementarily and not identically.
> >>>
> >>> Yes, but there's no need for doing status/features passthrough in epf vhost
> >>> drivers.
> >>>
> >>>
> >>>> virtio device (or frontend) cannot be initialized before vhost device (or
> >>>> backend) gets initialized with data such as features. Similarly vhost
> >>>> (backend) cannot access virtqueues or buffers before virtio (frontend) sets
> >>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as
> >>>> the physical memory for virtqueues is created by virtio (frontend).
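
The ordering constraint described here can be illustrated with a small standalone sketch. It is not code from the series: `struct sketch_backend` and `backend_may_access_vq()` are hypothetical names, and the status bit values are copied from the virtio specification so that the snippet builds outside the kernel tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FEATURES_OK 8

/* Hypothetical backend-side view of the state shared with the frontend. */
struct sketch_backend {
	uint64_t features; /* set by the backend before the frontend probes */
	uint8_t status;    /* written by the frontend during initialization */
};

/* The backend may touch virtqueues only after the frontend set DRIVER_OK. */
static inline bool backend_may_access_vq(const struct sketch_backend *be)
{
	assert(be != NULL);
	return (be->status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}
```

A backend built along these lines would defer any ring access until the check returns true, while the frontend has no matching restriction because it allocates the ring memory itself.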
> >>>
> >>> epf vhost drivers need to implement two devices: vhost(vringh) device and
> >>> virtio device (which is a mediated device). The vhost(vringh) device is doing
> >>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
> >>> is doing feature negotiation with local virtio drivers. If there's a feature
> >>> mismatch, epf vhost drivers can do mediation between them.
> >> Here epf vhost should be initialized with a set of features for it to negotiate
> >> either as vhost device or virtio device no? Where should the initial feature
> >> set for epf vhost come from?
> > 
> > 
> > I think it can work as:
> > 
> > 1) Having an initial feature set X (hard coded) in epf vhost
> > 2) Using this X for both virtio device and vhost(vringh) device
> > 3) local virtio driver will negotiate with virtio device with feature set Y
> > 4) remote virtio driver will negotiate with vringh device with feature set Z
> > 5) mediate between feature Y and feature Z since both Y and Z are a subset of X
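
The five steps above can be sketched in standalone C. This is only one possible reading: the thread does not say how the mediation in step 5 would work, so the sketch assumes the simplest policy (keep the bits both sides accepted). `EPF_VHOST_FEATURES_X`, `is_subset()` and `mediate_features()` are hypothetical names, and the bit positions in X are arbitrary placeholders.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hard-coded feature set X offered by the epf vhost driver. */
#define EPF_VHOST_FEATURES_X ((1ULL << 0) | (1ULL << 28) | (1ULL << 32))

/* A negotiated set must be a subset of what was offered. */
static inline int is_subset(uint64_t negotiated, uint64_t offered)
{
	return (negotiated & ~offered) == 0;
}

/*
 * y: features accepted by the local virtio driver (set Y).
 * z: features accepted by the remote virtio driver (set Z).
 * Returns the bits both sides agreed on. Anything present in only one of
 * Y or Z would have to be handled by the mediating driver itself.
 */
static inline uint64_t mediate_features(uint64_t y, uint64_t z)
{
	assert(is_subset(y, EPF_VHOST_FEATURES_X));
	assert(is_subset(z, EPF_VHOST_FEATURES_X));
	return y & z;
}
```

Because Y and Z are both negotiated against the same X, the result is also a subset of X, which is what makes a single hard-coded starting set workable.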
> > 
> > 
> 
> okay. I'm also thinking if we could have configfs for configuring this. Anyways
> we could find different approaches of configuring this.
> >>>
> >>>>> It will have virtqueues but only used for the communication between itself
> >>>>> and the upper virtio driver. And it will have vringh queues which will be
> >>>>> probed by virtio epf transport drivers. And it needs to do datacopy between
> >>>>> virtqueue and vringh queues.
> >>>>>
> >>>>> It works like:
> >>>>>
> >>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
> >>>>> queue/epf>
> >>>>>
> >>>>> The advantage is that there's no need for writing new buses and drivers.
> >>>> I think this will work however there is an additional copy between vringh queue
> >>>> and virtqueue,
> >>>
> >>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
> >>>
> >>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
> >>> -> virtio pci (RC) <-> virtio rpmsg (RC)
> >> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?
> > 
> > 
> > Yes.
> > 
> > 
> >> And virtio
> >> ring(2) is created by virtio pci (RC).
> > 
> > 
> > Yes.
> > 
> > 
> >>> What the epf vhost driver does is read the buffer len and addr from virtio
> >>> ring(1) and then DMA to Linux(RC)?
> >> okay, I made some optimization here where vhost-rpmsg using a helper writes a
> >> buffer from rpmsg's upper layer directly to remote Linux (RC) as against here
> >> where it has to be first written to virtio ring (1).
> >>
> >> Thinking how this would look for NTB
> >> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
> >> ring(2) -> virtio-rpmsg (HOST2)
> >>
> >> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
> > 
> > 
> > Yes, I think so; it needs to use vring to access virtio ring (1) as well.
> 
> NTB(HOST1) and virtio ring(1) will be in the same system. So it doesn't have to
> use vring. virtio ring(1) is provided by the virtio device the NTB(HOST1) creates.
> > 
> > 
> >>
> >> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?
> > 
> > 
> > Yes.
> 
> okay, I haven't looked at this but the backend of virtio_blk should access an
> actual storage device no?
> > 
> > 
> >>
> >> I'd like to get clarity on two things in the approach you suggested, one is
> >> features (since epf vhost should ideally be transparent to any virtio driver)
> > 
> > 
> > We can have an array of pre-defined features indexed by virtio device id
> > in the code.
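
A table of that kind might look like the following standalone sketch. The device IDs are the ones assigned by the virtio specification; the table name, the lookup helper and the feature bits stored in each slot are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Device IDs as assigned by the virtio specification. */
#define VIRTIO_ID_NET   1
#define VIRTIO_ID_BLOCK 2
#define VIRTIO_ID_RPMSG 7

/* Hypothetical table: the feature bits chosen here are placeholders. */
static const uint64_t epf_vhost_default_features[] = {
	[VIRTIO_ID_NET]   = 1ULL << 32,
	[VIRTIO_ID_BLOCK] = 1ULL << 32,
	[VIRTIO_ID_RPMSG] = (1ULL << 0) | (1ULL << 32),
};

#define EPF_VHOST_NUM_IDS \
	(sizeof(epf_vhost_default_features) / sizeof(epf_vhost_default_features[0]))

/* The designated initializers size the array up to the largest ID used. */
static_assert(EPF_VHOST_NUM_IDS == VIRTIO_ID_RPMSG + 1,
	      "table must end at the largest listed device ID");

/* Returns 0 for device IDs that have no entry in the table. */
static inline uint64_t epf_vhost_features_for(unsigned int device_id)
{
	if (device_id >= EPF_VHOST_NUM_IDS)
		return 0;
	return epf_vhost_default_features[device_id];
}
```

Slots that are not listed are zero-initialized, so an unknown device ID simply yields an empty feature set instead of reading past the table.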
> > 
> > 
> >> and the other is how certain inputs to virtio device such as number of buffers
> >> be determined.
> > 
> > 
> > We can start from a hard-coded value like 256, or introduce some API for the
> > user to change the value.
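
Both options can coexist, as in the sketch below: a default of 256 plus a setter that a user-facing interface could call. All names are hypothetical, and the validation assumes the value ends up as a split virtqueue size, which the virtio specification requires to be a power of two no larger than 32768.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EPF_VHOST_DEFAULT_NUM_BUFS 256   /* value suggested in the thread */
#define EPF_VHOST_MAX_NUM_BUFS     32768 /* split virtqueue size limit */

/* Hypothetical per-device configuration. */
struct epf_vhost_cfg {
	unsigned int num_bufs;
};

static inline void epf_vhost_cfg_init(struct epf_vhost_cfg *cfg)
{
	assert(cfg != NULL);
	cfg->num_bufs = EPF_VHOST_DEFAULT_NUM_BUFS;
}

/* Reject zero, oversized and non-power-of-two values; keep the old one. */
static inline int epf_vhost_set_num_bufs(struct epf_vhost_cfg *cfg,
					 unsigned int n)
{
	assert(cfg != NULL);
	if (n == 0 || n > EPF_VHOST_MAX_NUM_BUFS || (n & (n - 1)) != 0)
		return -EINVAL;
	cfg->num_bufs = n;
	return 0;
}
```

Keeping the previous value on error means a rejected write from the user never leaves the device half-configured.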
> > 
> > 
> >>
> >> Thanks again for your suggestions!
> > 
> > 
> > You're welcome.
> > 
> > Note that I just want to check whether or not we can reuse the virtio
> > bus/driver. It's something similar to what you proposed in Software Layering
> > but we just replace "vhost core" with "virtio bus" and move the vhost core
> > below epf/ntb/platform transport.
> 
> Got it. My initial design was based on my understanding of your comments [1].
> 
> I'll try to create something based on your proposed design here.

Based on the above conversation it seems like I should wait for another revision
of this set before reviewing the RPMSG part.  Please confirm that my
understanding is correct.

Thanks,
Mathieu

> 
> Regards
> Kishon
> 
> [1] ->
> https://lore.kernel.org/linux-pci/59982499-0fc1-2e39-9ff9-993fb4dd7dcc@redhat.com/
> > 
> > Thanks
> > 
> > 
> >>
> >> Regards
> >> Kishon
> >>
> >>>
> >>>> in some cases adds latency because of forwarding interrupts
> >>>> between vhost and virtio driver, vhost drivers providing features (which means
> >>>> it has to be aware of which virtio driver will be connected).
> >>>> virtio drivers (front end) generally access the buffers from its local memory
> >>>> but when in backend it can access over MMIO (like PCI EPF or NTB) or
> >>>> userspace.
> >>>>> Does this make sense?
> >>>> Two copies in my opinion is an issue but let's get others' opinions as well.
> >>>
> >>> Sure.
> >>>
> >>>
> >>>> Thanks for your suggestions!
> >>>
> >>> You're welcome.
> >>>
> >>> Thanks
> >>>
> >>>
> >>>> Regards
> >>>> Kishon
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>> Thanks
> >>>>>> Kishon
> >>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>>
> >
Cornelia Huck Aug. 28, 2020, 10:34 a.m. UTC | #15
On Thu, 9 Jul 2020 14:26:53 +0800
Jason Wang <jasowang@redhat.com> wrote:

[Let me note right at the beginning that I first noted this while
listening to Kishon's talk at LPC on Wednesday. I might be very
confused about the background here, so let me apologize beforehand for
any confusion I might spread.]

> On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
> > Hi Jason,
> >
> > On 7/8/2020 4:52 PM, Jason Wang wrote:  
> >> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
> >>> Hi Jason,
> >>>
> >>> On 7/7/2020 3:17 PM, Jason Wang wrote:  
> >>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> >>>>> Hi Jason,
> >>>>>
> >>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:  
> >>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:  
> >>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> >>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:  
> >>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
> >>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>>>>>>>>>
> >>>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
> >>>>>>>>>>         rpmsg communication between two SoCs connected to each other
> >>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
> >>>>>>>>>>         between two SoCs connected via NTB
> >>>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>>
> >>>>>>>>>> UseCase1 :
> >>>>>>>>>>
> >>>>>>>>>>       VHOST RPMSG                     VIRTIO RPMSG
> >>>>>>>>>>            +                               +
> >>>>>>>>>>            |                               |
> >>>>>>>>>>            |                               |
> >>>>>>>>>>            |                               |
> >>>>>>>>>>            |                               |
> >>>>>>>>>> +-----v------+                 +------v-------+
> >>>>>>>>>> |   Linux    |                 |     Linux    |
> >>>>>>>>>> |  Endpoint  |                 | Root Complex |
> >>>>>>>>>> |            <----------------->              |
> >>>>>>>>>> |            |                 |              |
> >>>>>>>>>> |    SOC1    |                 |     SOC2     |
> >>>>>>>>>> +------------+                 +--------------+
> >>>>>>>>>>
> >>>>>>>>>> UseCase 2:
> >>>>>>>>>>
> >>>>>>>>>>           VHOST RPMSG                                      VIRTIO RPMSG
> >>>>>>>>>>                +                                                 +
> >>>>>>>>>>                |                                                 |
> >>>>>>>>>>                |                                                 |
> >>>>>>>>>>                |                                                 |
> >>>>>>>>>>                |                                                 |
> >>>>>>>>>>         +------v------+                                   +------v------+
> >>>>>>>>>>         |             |                                   |             |
> >>>>>>>>>>         |    HOST1    |                                   |    HOST2    |
> >>>>>>>>>>         |             |                                   |             |
> >>>>>>>>>>         +------^------+                                   +------^------+
> >>>>>>>>>>                |                                                 |
> >>>>>>>>>>                |                                                 |
> >>>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>> |  +------v------+                                   +------v------+  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |     EP      |                                   |     EP      |  |
> >>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
> >>>>>>>>>> |  |             <----------------------------------->             |  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
> >>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
> >>>>>>>>>> |  +-------------+                                   +-------------+  |
> >>>>>>>>>> +---------------------------------------------------------------------+

First of all, to clarify the terminology:
Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just
virtqueues + the existing vhost interfaces?

> >>>>>>>>>>
> >>>>>>>>>> Software Layering:
> >>>>>>>>>>
> >>>>>>>>>> The high-level SW layering should look something like below. This series
> >>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
> >>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
> >>>>>>>>>> device, user) can use any of the vhost client driver.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>          +----------------+  +-----------+  +------------+  +----------+
> >>>>>>>>>>          |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
> >>>>>>>>>>          +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
> >>>>>>>>>>                  |                 |              |              |
> >>>>>>>>>>                  |                 |              |              |
> >>>>>>>>>>                  |                 |              |              |
> >>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
> >>>>>>>>>> |                            VHOST CORE                                |
> >>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
> >>>>>>>>>>               |               |                    |                  |
> >>>>>>>>>>               |               |                    |                  |
> >>>>>>>>>>               |               |                    |                  |
> >>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
> >>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
> >>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+

So, the upper half is basically various functionality types, e.g. a net
device. What is the lower half, a hardware interface? Would it be
equivalent to e.g. a normal PCI device?

> >>>>>>>>>>
> >>>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>>
> >>>>>>>>>> [1] ->
> >>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com  
> >>>>>>>>> I find this very interesting. A huge patchset so will take a bit
> >>>>>>>>> to review, but I certainly plan to do that. Thanks!  
> >>>>>>>> Yes, it would be better if there's a git branch for us to have a look.  
> >>>>>>> I've pushed the branch
> >>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc  
> >>>>>> Thanks
> >>>>>>
> >>>>>>  
> >>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
> >>>>>>>> work is
> >>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).  
> >>>>>>> This is about connecting two different HW systems both running Linux and
> >>>>>>> doesn't necessarily involve virtualization.  
> >>>>>> Right, this is something similar to VOP
> >>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
> >>>>>> hardware I guess and VOP use userspace application to implement the device.  
> >>>>> I'd also like to point out, this series tries to have communication between
> >>>>> two
> >>>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
> >>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
> >>>>> any
> >>>>> of the HW in NTB below should be able to use a virtio-vhost communication
> >>>>>
> >>>>> #ls drivers/ntb/hw/
> >>>>> amd  epf  idt  intel  mscc
> >>>>>
> >>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
> >>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
> >>>>> use virtio-vhost communication
> >>>>>
> >>>>> # ls drivers/pci/controller/dwc/*ep*
> >>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c  
> >>>> Thanks for those backgrounds.
> >>>>
> >>>>  
> >>>>>>>      So there is no guest or host as in
> >>>>>>> virtualization but two entirely different systems connected via PCIe cable,
> >>>>>>> one
> >>>>>>> acting as guest and one as host. So one system will provide virtio
> >>>>>>> functionality reserving memory for virtqueues and the other provides vhost
> >>>>>>> functionality providing a way to access the virtqueues in virtio memory.
> >>>>>>> One is
> >>>>>>> source and the other is sink and there is no intermediate entity. (vhost was
> >>>>>>> probably intermediate entity in virtualization?)  
> >>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
> >>>>>> me since it was use for implementing virtio backend for userspace drivers. I
> >>>>>> guess "vringh" could be better.  
> >>>>> Initially I had named this vringh but later decided to choose vhost instead of
> >>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
> >>>>> now resides in an entirely different system. Whatever virtio is for a frontend
> >>>>> system, vhost can be that for a backend system. vring can be for accessing
> >>>>> virtqueue and can be used either in frontend or backend.  

I guess that clears up at least some of my questions from above...

> >>>> Ok.
> >>>>
> >>>>  
> >>>>>>>> Have you considered to implement these through vDPA?  
> >>>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
> >>>>>>> driver
> >>>>>>> or vhost net driver is not provided.
> >>>>>>>
> >>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
> >>>>>>> (usecase2 above),  
> >>>>>> I see.
> >>>>>>
> >>>>>>  
> >>>>>>>      all the boards run Linux. The middle board provides NTB
> >>>>>>> functionality and board on either side provides virtio/vhost
> >>>>>>> functionality and
> >>>>>>> transfer data using rpmsg. 

This setup looks really interesting (sometimes, it's really hard to
imagine this in the abstract.)
 
> >>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
> >>>>>> the existed virtio-bus/drivers? It might work as, except for
> >>>>>> the epf transport, we can introduce a epf "vhost" transport
> >>>>>> driver.  
> >>>>> IMHO we'll need two buses one for frontend and other for
> >>>>> backend because the two components can then co-operate/interact
> >>>>> with each other to provide a functionality. Though both will
> >>>>> seemingly provide similar callbacks, they both provide
> >>>>> symmetrical or complementary functionality and need not be same
> >>>>> or identical.
> >>>>>
> >>>>> Having the same bus can also create sequencing issues.
> >>>>>
> >>>>> If you look at virtio_dev_probe() of virtio_bus
> >>>>>
> >>>>> device_features = dev->config->get_features(dev);
> >>>>>
> >>>>> Now if we use same bus for both front-end and back-end, both
> >>>>> will try to get_features when there has been no set_features.
> >>>>> Ideally vhost device should be initialized first with the set
> >>>>> of features it supports. Vhost and virtio should use "status"
> >>>>> and "features" complementarily and not identically.  
> >>>> Yes, but there's no need for doing status/features passthrough
> >>>> in epf vhost drivers.
> >>>>
> >>>>  
> >>>>> virtio device (or frontend) cannot be initialized before vhost
> >>>>> device (or backend) gets initialized with data such as
> >>>>> features. Similarly vhost (backend)
> >>>>> cannot access virtqueues or buffers before virtio (frontend) sets
> >>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
> >>>>> for virtio as the physical memory for virtqueues are created by
> >>>>> virtio (frontend).  
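
The ordering constraint described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions: the status bit values
are the ones from the virtio specification, but struct link_state and the
three helper functions are hypothetical names and not part of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4
#define VIRTIO_CONFIG_S_FEATURES_OK 8

/* Hypothetical state shared between backend (vhost) and frontend (virtio). */
struct link_state {
	bool     features_set; /* backend has published its feature set */
	uint64_t features;     /* written by the backend */
	uint8_t  status;       /* written by the frontend */
};

/* Backend side: publish the supported features before anything else. */
void backend_set_features(struct link_state *ls, uint64_t features)
{
	ls->features = features;
	ls->features_set = true;
}

/* Frontend side: get_features is only meaningful after set_features. */
bool frontend_can_probe(const struct link_state *ls)
{
	return ls->features_set;
}

/* Backend side: virtqueues may be touched only after DRIVER_OK. */
bool backend_can_access_vqs(const struct link_state *ls)
{
	return (ls->status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}
```

With this model the two sides are deliberately asymmetric: the frontend waits
for features, the backend waits for DRIVER_OK.
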
> >>>> epf vhost drivers need to implement two devices: vhost(vringh)
> >>>> device and virtio device (which is a mediated device). The
> >>>> vhost(vringh) device is doing feature negotiation with the
> >>>> virtio device via RC/EP or NTB. The virtio device is doing
> >>>> feature negotiation with local virtio drivers. If there're
> >>>> feature mismatch, epf vhost drivers can do mediation between
> >>>> them.  
> >>> Here epf vhost should be initialized with a set of features for
> >>> it to negotiate either as vhost device or virtio device no? Where
> >>> should the initial feature set for epf vhost come from?  
> >>
> >> I think it can work as:
> >>
> >> 1) Having an initial features (hard coded in the code) set X in
> >> epf vhost 2) Using this X for both virtio device and vhost(vringh)
> >> device 3) local virtio driver will negotiate with virtio device
> >> with feature set Y 4) remote virtio driver will negotiate with
> >> vringh device with feature set Z 5) mediate between feature Y and
> >> feature Z since both Y and Z are a subset of X
> >>
> >>  
> > okay. I'm also thinking if we could have configfs for configuring
> > this. Anyways we could find different approaches of configuring
> > this.  
> 
> 
> Yes, and I think some management API is needed even in the design of 
> your "Software Layering". In that figure, rpmsg vhost need some
> pre-set or hard-coded features.

When I saw the plumbers talk, my first idea was "this needs to be a new
transport". You have some hard-coded or pre-configured features, and
then features are negotiated via a transport-specific means in the
usual way. There's basically an extra/extended layer for this (and
status, and whatever).

Does that make any sense?
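
To make the X/Y/Z scheme from the quoted steps concrete, here is a minimal
sketch of the bit arithmetic, assuming features are plain 64-bit masks. The
FEAT_* bits and all function names are made up for illustration; none of them
exist in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_A (1ULL << 0)
#define FEAT_B (1ULL << 1)
#define FEAT_C (1ULL << 2)

/* Steps 1 and 2: the hard-coded set X, offered on both sides. */
const uint64_t epf_vhost_features = FEAT_A | FEAT_B | FEAT_C;

/* Steps 3 and 4: a driver can only accept bits that were offered. */
uint64_t negotiate(uint64_t offered, uint64_t driver_wants)
{
	return offered & driver_wants;
}

/* Step 5: features usable end to end were accepted on both sides. */
uint64_t mediate(uint64_t y, uint64_t z)
{
	return y & z;
}

/* Bits accepted on one side only; the mediating driver has to cope with these. */
uint64_t needs_mediation(uint64_t y, uint64_t z)
{
	return y ^ z;
}
```

Since Y and Z are both subsets of X, the intersection is always a valid
(possibly empty) subset of what the epf vhost driver itself supports.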

> 
> 
> >>>>>> It will have virtqueues but only used for the communication
> >>>>>> between itself and
> >>>>>> upper virtio driver. And it will have vringh queues which
> >>>>>> will be probed by virtio epf transport drivers. And it needs to
> >>>>>> do datacopy between virtqueue and
> >>>>>> vringh queues.
> >>>>>>
> >>>>>> It works like:
> >>>>>>
> >>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
> >>>>>> vringh queue/epf>  
> >>>>>>
> >>>>>> The advantages is that there's no need for writing new buses
> >>>>>> and drivers.  
> >>>>> I think this will work however there is an additional copy
> >>>>> between vringh queue and virtqueue,  
> >>>> I think not? E.g in use case 1), if we stick to virtio bus, we
> >>>> will have:
> >>>>
> >>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
> >>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)  
> >>> IIUC epf vhost driver (EP) will access virtio ring(2) using
> >>> vringh?  
> >>
> >> Yes.
> >>
> >>  
> >>> And virtio
> >>> ring(2) is created by virtio pci (RC).  
> >>
> >> Yes.
> >>
> >>  
> >>>> What epf vhost driver did is to read from virtio ring(1) about
> >>>> the buffer len and addr and then DMA to Linux(RC)?  
> >>> okay, I made some optimization here where vhost-rpmsg using a
> >>> helper writes a buffer from rpmsg's upper layer directly to
> >>> remote Linux (RC) as against here where it has to be first written
> >>> to virtio ring (1).
> >>>
> >>> Thinking how this would look for NTB
> >>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
> >>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
> >>>
> >>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?  
> >>
> >> Yes, I think so it needs to use vring to access virtio ring (1) as
> >> well.  
> > NTB(HOST1) and virtio ring(1) will be in the same system. So it
> > doesn't have to use vring. virtio ring(1) is by the virtio device
> > the NTB(HOST1) creates.  
> 
> 
> Right.
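
The ring(1) to ring(2) forwarding discussed above can be modelled roughly as
follows. This is a heavily simplified sketch, not the vring layout: struct
desc, struct ring, ring_push() and forward_one() are hypothetical, and
memcpy() merely stands in for the DMA or MMIO write to the remote side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 4
#define BUF_SIZE  64

/* Hypothetical descriptor: buffer address and length only. */
struct desc {
	void   *addr;
	size_t  len;
};

struct ring {
	struct desc desc[RING_SIZE];
	unsigned int head; /* next slot the producer fills */
	unsigned int tail; /* next slot the consumer reads */
};

/* Queue one buffer; returns -1 if the ring is full. */
int ring_push(struct ring *r, void *addr, size_t len)
{
	if (r->head - r->tail == RING_SIZE)
		return -1;
	r->desc[r->head % RING_SIZE] = (struct desc){ addr, len };
	r->head++;
	return 0;
}

/*
 * Move one filled buffer from ring(1) into one empty buffer from ring(2).
 * Returns the number of bytes forwarded, or -1 if either ring is empty
 * or the destination buffer is too small.
 */
long forward_one(struct ring *ring1, struct ring *ring2)
{
	struct desc src, dst;

	if (ring1->head == ring1->tail || ring2->head == ring2->tail)
		return -1;

	src = ring1->desc[ring1->tail % RING_SIZE];
	dst = ring2->desc[ring2->tail % RING_SIZE];
	if (src.len > dst.len)
		return -1;

	memcpy(dst.addr, src.addr, src.len); /* stand-in for DMA/MMIO */
	ring1->tail++;
	ring2->tail++;
	return (long)src.len;
}
```

The payload is touched once in this model; writing it into ring(1) first is
the step the vhost-rpmsg helper mentioned above avoids.
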
> 
> 
> >>  
> >>> Do you also think this will work seamlessly with virtio_net.c,
> >>> virtio_blk.c?  
> >>
> >> Yes.  
> > okay, I haven't looked at this but the backend of virtio_blk should
> > access an actual storage device no?  
> 
> 
> Good point, for non-peer device like storage. There's probably no
> need for it to be registered on the virtio bus and it might be better
> to behave as you proposed.

I might be missing something; but if you expose something as a block
device, it should have something it can access with block reads/writes,
shouldn't it? Of course, that can be a variety of things.

> 
> Just to make sure I understand the design, how is VHOST SCSI expected
> to work in your proposal, does it have a device for file as a backend?
> 
> 
> >>  
> >>> I'd like to get clarity on two things in the approach you
> >>> suggested, one is features (since epf vhost should ideally be
> >>> transparent to any virtio driver)  
> >>
> >> We can have an array of pre-defined features indexed by
> >> virtio device id in the code.
> >>
> >>  
> >>> and the other is how certain inputs to virtio device such as
> >>> number of buffers be determined.  
> >>
> >> We can start from hard coded the value like 256, or introduce some
> >> API for user to change the value.
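
A table of per-device defaults along the lines suggested above might look
like the sketch below. The device IDs are the ones assigned by the virtio
specification and 256 is the value mentioned in the thread; the struct, the
table, the lookup helper and the placeholder feature bits are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Device IDs as assigned by the virtio specification. */
#define VIRTIO_ID_NET    1
#define VIRTIO_ID_BLOCK  2
#define VIRTIO_ID_RPMSG  7

#define EPF_VHOST_DEFAULT_NUM_BUFS 256 /* starting value from the thread */
#define EPF_VHOST_MAX_ID           8

/* Hypothetical per-device defaults; feature bits are placeholders. */
struct epf_vhost_defaults {
	uint64_t     features;
	unsigned int num_bufs;
};

const struct epf_vhost_defaults defaults_by_id[EPF_VHOST_MAX_ID] = {
	[VIRTIO_ID_NET]   = { .features = 1ULL << 0,
			      .num_bufs = EPF_VHOST_DEFAULT_NUM_BUFS },
	[VIRTIO_ID_RPMSG] = { .features = 1ULL << 0,
			      .num_bufs = EPF_VHOST_DEFAULT_NUM_BUFS },
};

/* Returns NULL for device IDs that have no entry in the table. */
const struct epf_vhost_defaults *epf_vhost_get_defaults(unsigned int id)
{
	if (id >= EPF_VHOST_MAX_ID || defaults_by_id[id].num_bufs == 0)
		return NULL;
	return &defaults_by_id[id];
}
```

A configfs attribute or another management API could later override the
values returned here without changing the lookup.
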
> >>
> >>  
> >>> Thanks again for your suggestions!  
> >>
> >> You're welcome.
> >>
> >> Note that I just want to check whether or not we can reuse the
> >> virtio bus/driver. It's something similar to what you proposed in
> >> Software Layering but we just replace "vhost core" with "virtio
> >> bus" and move the vhost core below epf/ntb/platform transport.  
> > Got it. My initial design was based on my understanding of your
> > comments [1].  
> 
> 
> Yes, but that's just for a networking device. If we want something
> more generic, it may require more thought (bus etc).

I believe that we indeed need something bus-like to be able to support
a variety of devices.

> 
> 
> >
> > I'll try to create something based on your proposed design here.  
> 
> 
> Sure, but for coding, we'd better wait for other's opinion here.

Please tell me if my thoughts above make any sense... I have just
started looking at that, so I might be completely off.
Kishon Vijay Abraham I Sept. 1, 2020, 4:40 a.m. UTC | #16
Hi Mathieu,

On 15/07/20 10:45 pm, Mathieu Poirier wrote:
> Hey Kishon,
> 
> On Wed, Jul 08, 2020 at 06:43:45PM +0530, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 7/8/2020 4:52 PM, Jason Wang wrote:
>>>
>>> On 2020/7/7 下午10:45, Kishon Vijay Abraham I wrote:
>>>> Hi Jason,
>>>>
>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>>> On 2020/7/6 下午5:32, Kishon Vijay Abraham I wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>>> On 2020/7/2 下午9:35, Kishon Vijay Abraham I wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>>> On 2020/7/2 下午5:51, Michael S. Tsirkin wrote:
>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>>
>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>>>         rpmsg communication between two SoCs connected to each other
>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>>>         between two SoCs connected via NTB
>>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>>
>>>>>>>>>>> UseCase1 :
>>>>>>>>>>>
>>>>>>>>>>>       VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>>>            +                               +
>>>>>>>>>>>            |                               |
>>>>>>>>>>>            |                               |
>>>>>>>>>>>            |                               |
>>>>>>>>>>>            |                               |
>>>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>>>> |            <----------------->              |
>>>>>>>>>>> |            |                 |              |
>>>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>>>
>>>>>>>>>>> UseCase 2:
>>>>>>>>>>>
>>>>>>>>>>>           VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>>>                +                                                 +
>>>>>>>>>>>                |                                                 |
>>>>>>>>>>>                |                                                 |
>>>>>>>>>>>                |                                                 |
>>>>>>>>>>>                |                                                 |
>>>>>>>>>>>         +------v------+                                   +------v------+
>>>>>>>>>>>         |             |                                   |             |
>>>>>>>>>>>         |    HOST1    |                                   |    HOST2    |
>>>>>>>>>>>         |             |                                   |             |
>>>>>>>>>>>         +------^------+                                   +------^------+
>>>>>>>>>>>                |                                                 |
>>>>>>>>>>>                |                                                 |
>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>>
>>>>>>>>>>> Software Layering:
>>>>>>>>>>>
>>>>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>          +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>>>          |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>>>          +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>>>                  |                 |              |              |
>>>>>>>>>>>                  |                 |              |              |
>>>>>>>>>>>                  |                 |              |              |
>>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>>>               |               |                    |                  |
>>>>>>>>>>>               |               |                    |                  |
>>>>>>>>>>>               |               |                    |                  |
>>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>>>>>>>>>
>>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>>
>>>>>>>>>>> [1] ->
>>>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>>>> I've pushed the branch
>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of the
>>>>>>>>> work is
>>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>>>> doesn't necessarily involve virtualization.
>>>>>>> Right, this is something similar to VOP
>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
>>>>>>> hardware I guess and VOP use userspace application to implement the device.
>>>>>> I'd also like to point out, this series tries to have communication between
>>>>>> two
>>>>>> SoCs in vendor agnostic way. Since this series solves for 2 usecases (PCIe
>>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB framework and
>>>>>> any
>>>>>> of the HW in NTB below should be able to use a virtio-vhost communication
>>>>>>
>>>>>> #ls drivers/ntb/hw/
>>>>>> amd  epf  idt  intel  mscc
>>>>>>
>>>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>>>>> use virtio-vhost communication
>>>>>>
>>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>>>
>>>>> Thanks for those backgrounds.
>>>>>
>>>>>
>>>>>>>>      So there is no guest or host as in
>>>>>>>> virtualization but two entirely different systems connected via PCIe cable,
>>>>>>>> one
>>>>>>>> acting as guest and one as host. So one system will provide virtio
>>>>>>>> functionality reserving memory for virtqueues and the other provides vhost
>>>>>>>> functionality providing a way to access the virtqueues in virtio memory.
>>>>>>>> One is
>>>>>>>> source and the other is sink and there is no intermediate entity. (vhost was
>>>>>>>> probably intermediate entity in virtualization?)
>>>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>>>>> me since it was use for implementing virtio backend for userspace drivers. I
>>>>>>> guess "vringh" could be better.
>>>>>> Initially I had named this vringh but later decided to choose vhost instead of
>>>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>>>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>>>>> system, vhost can be that for a backend system. vring can be for accessing
>>>>>> virtqueue and can be used either in frontend or backend.
>>>>>
>>>>> Ok.
>>>>>
>>>>>
>>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>>> IIUC vDPA only provides an interface to userspace and an in-kernel rpmsg
>>>>>>>> driver
>>>>>>>> or vhost net driver is not provided.
>>>>>>>>
>>>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>>>>> (usecase2 above),
>>>>>>> I see.
>>>>>>>
>>>>>>>
>>>>>>>>      all the boards run Linux. The middle board provides NTB
>>>>>>>> functionality and board on either side provides virtio/vhost
>>>>>>>> functionality and
>>>>>>>> transfer data using rpmsg.
>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use the existed
>>>>>>> virtio-bus/drivers? It might work as, except for the epf transport, we can
>>>>>>> introduce a epf "vhost" transport driver.
>>>>>> IMHO we'll need two buses one for frontend and other for backend because the
>>>>>> two components can then co-operate/interact with each other to provide a
>>>>>> functionality. Though both will seemingly provide similar callbacks, they
>>>>>> both provide symmetrical or complementary functionality and need not be
>>>>>> same or
>>>>>> identical.
>>>>>>
>>>>>> Having the same bus can also create sequencing issues.
>>>>>>
>>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>>
>>>>>> device_features = dev->config->get_features(dev);
>>>>>>
>>>>>> Now if we use same bus for both front-end and back-end, both will try to
>>>>>> get_features when there has been no set_features. Ideally vhost device should
>>>>>> be initialized first with the set of features it supports. Vhost and virtio
>>>>>> should use "status" and "features" complementarily and not identically.
>>>>>
>>>>> Yes, but there's no need for doing status/features passthrough in epf vhost
>>>>> drivers.
>>>>>
>>>>>
>>>>>> virtio device (or frontend) cannot be initialized before vhost device (or
>>>>>> backend) gets initialized with data such as features. Similarly vhost
>>>>>> (backend)
>>>>>> cannot access virtqueues or buffers before virtio (frontend) sets
>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there for virtio as
>>>>>> the physical memory for virtqueues are created by virtio (frontend).
>>>>>
>>>>> epf vhost drivers need to implement two devices: vhost(vringh) device and
>>>>> virtio device (which is a mediated device). The vhost(vringh) device is doing
>>>>> feature negotiation with the virtio device via RC/EP or NTB. The virtio device
>>>>> is doing feature negotiation with local virtio drivers. If there're feature
>>>>> mismatch, epf vhost drivers can do mediation between them.
>>>> Here epf vhost should be initialized with a set of features for it to negotiate
>>>> either as vhost device or virtio device no? Where should the initial feature
>>>> set for epf vhost come from?
>>>
>>>
>>> I think it can work as:
>>>
>>> 1) Having an initial features (hard coded in the code) set X in epf vhost
>>> 2) Using this X for both virtio device and vhost(vringh) device
>>> 3) local virtio driver will negotiate with virtio device with feature set Y
>>> 4) remote virtio driver will negotiate with vringh device with feature set Z
>>> 5) mediate between feature Y and feature Z since both Y and Z are a subset of X
>>>
>>>
>>
>> okay. I'm also thinking if we could have configfs for configuring this. Anyways
>> we could find different approaches of configuring this.
>>>>>
>>>>>>> It will have virtqueues but only used for the communication between itself
>>>>>>> and
>>>>>>> upper virtio driver. And it will have vringh queues which will be probed by
>>>>>>> virtio epf transport drivers. And it needs to do datacopy between
>>>>>>> virtqueue and
>>>>>>> vringh queues.
>>>>>>>
>>>>>>> It works like:
>>>>>>>
>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <- vringh
>>>>>>> queue/epf>
>>>>>>>
>>>>>>> The advantages is that there's no need for writing new buses and drivers.
>>>>>> I think this will work however there is an additional copy between vringh queue
>>>>>> and virtqueue,
>>>>>
>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we will have:
>>>>>
>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <- virtio ring(2)
>>>>> -> virtio pci (RC) <-> virtio rpmsg (RC)
>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using vringh?
>>>
>>>
>>> Yes.
>>>
>>>
>>>> And virtio
>>>> ring(2) is created by virtio pci (RC).
>>>
>>>
>>> Yes.
>>>
>>>
>>>>> What epf vhost driver did is to read from virtio ring(1) about the buffer len
>>>>> and addr and then DMA to Linux(RC)?
>>>> okay, I made some optimization here where vhost-rpmsg using a helper writes a
>>>> buffer from rpmsg's upper layer directly to remote Linux (RC) as against here
>>>> where it has to be first written to virtio ring (1).
>>>>
>>>> Thinking how this would look for NTB
>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <-> NTB(HOST2)  <- virtio
>>>> ring(2) -> virtio-rpmsg (HOST2)
>>>>
>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>>
>>>
>>> Yes, I think so it needs to use vring to access virtio ring (1) as well.
>>
>> NTB(HOST1) and virtio ring(1) will be in the same system. So it doesn't have to
>> use vring. virtio ring(1) is by the virtio device the NTB(HOST1) creates.
>>>
>>>
>>>>
>>>> Do you also think this will work seamlessly with virtio_net.c, virtio_blk.c?
>>>
>>>
>>> Yes.
>>
>> okay, I haven't looked at this but the backend of virtio_blk should access an
>> actual storage device no?
>>>
>>>
>>>>
>>>> I'd like to get clarity on two things in the approach you suggested, one is
>>>> features (since epf vhost should ideally be transparent to any virtio driver)
>>>
>>>
>>> We can have an array of pre-defined features indexed by virtio device id
>>> in the code.
>>>
>>>
>>>> and the other is how certain inputs to virtio device such as number of buffers
>>>> be determined.
>>>
>>>
>>> We can start from hard coded the value like 256, or introduce some API for user
>>> to change the value.
>>>
>>>
>>>>
>>>> Thanks again for your suggestions!
>>>
>>>
>>> You're welcome.
>>>
>>> Note that I just want to check whether or not we can reuse the virtio
>>> bus/driver. It's something similar to what you proposed in Software Layering
>>> but we just replace "vhost core" with "virtio bus" and move the vhost core
>>> below epf/ntb/platform transport.
>>
>> Got it. My initial design was based on my understanding of your comments [1].
>>
>> I'll try to create something based on your proposed design here.
> 
> Based on the above conversation it seems like I should wait for another revision
> of this set before reviewing the RPMSG part.  Please confirm that my
> understanding is correct.

Right, there are multiple parts in this series that have to be aligned.
I still think that, irrespective of the approach, something like Address
Service Notification support might have to be added to rpmsg.

Thanks
Kishon
> 
> Thanks,
> Mathieu
> 
>>
>> Regards
>> Kishon
>>
>> [1] ->
>> https://lore.kernel.org/linux-pci/59982499-0fc1-2e39-9ff9-993fb4dd7dcc@redhat.com/
>>>
>>> Thanks
>>>
>>>
>>>>
>>>> Regards
>>>> Kishon
>>>>
>>>>>
>>>>>> in some cases adds latency because of forwarding interrupts
>>>>>> between vhost and virtio driver, vhost drivers providing features (which means
>>>>>> it has to be aware of which virtio driver will be connected).
>>>>>> virtio drivers (front end) generally access the buffers from it's local memory
>>>>>> but when in backend it can access over MMIO (like PCI EPF or NTB) or
>>>>>> userspace.
>>>>>>> Does this make sense?
>>>>>> Two copies in my opinion is an issue but lets get others opinions as well.
>>>>>
>>>>> Sure.
>>>>>
>>>>>
>>>>>> Thanks for your suggestions!
>>>>>
>>>>> You're welcome.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> Regards
>>>>>> Kishon
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Kishon
>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Kishon Vijay Abraham I (22):
>>>>>>>>>>>        vhost: Make _feature_ bits a property of vhost device
>>>>>>>>>>>        vhost: Introduce standard Linux driver model in VHOST
>>>>>>>>>>>        vhost: Add ops for the VHOST driver to configure VHOST device
>>>>>>>>>>>        vringh: Add helpers to access vring in MMIO
>>>>>>>>>>>        vhost: Add MMIO helpers for operations on vhost virtqueue
>>>>>>>>>>>        vhost: Introduce configfs entry for configuring VHOST
>>>>>>>>>>>        virtio_pci: Use request_threaded_irq() instead of request_irq()
>>>>>>>>>>>        rpmsg: virtio_rpmsg_bus: Disable receive virtqueue callback when
>>>>>>>>>>>          reading messages
>>>>>>>>>>>        rpmsg: Introduce configfs entry for configuring rpmsg
>>>>>>>>>>>        rpmsg: virtio_rpmsg_bus: Add Address Service Notification support
>>>>>>>>>>>        rpmsg: virtio_rpmsg_bus: Move generic rpmsg structure to
>>>>>>>>>>>          rpmsg_internal.h
>>>>>>>>>>>        virtio: Add ops to allocate and free buffer
>>>>>>>>>>>        rpmsg: virtio_rpmsg_bus: Use virtio_alloc_buffer() and
>>>>>>>>>>>          virtio_free_buffer()
>>>>>>>>>>>        rpmsg: Add VHOST based remote processor messaging bus
>>>>>>>>>>>        samples/rpmsg: Setup delayed work to send message
>>>>>>>>>>>        samples/rpmsg: Wait for address to be bound to rpdev for sending
>>>>>>>>>>>          message
>>>>>>>>>>>        rpmsg.txt: Add Documentation to configure rpmsg using configfs
>>>>>>>>>>>        virtio_pci: Add VIRTIO driver for VHOST on Configurable PCIe
>>>>>>>>>>>          Endpoint device
>>>>>>>>>>>        PCI: endpoint: Add EP function driver to provide VHOST interface
>>>>>>>>>>>        NTB: Add a new NTB client driver to implement VIRTIO functionality
>>>>>>>>>>>        NTB: Add a new NTB client driver to implement VHOST functionality
>>>>>>>>>>>        NTB: Describe the ntb_virtio and ntb_vhost client in the
>>>>>>>>>>>          documentation
>>>>>>>>>>>
>>>>>>>>>>>       Documentation/driver-api/ntb.rst              |   11 +
>>>>>>>>>>>       Documentation/rpmsg.txt                       |   56 +
>>>>>>>>>>>       drivers/ntb/Kconfig                           |   18 +
>>>>>>>>>>>       drivers/ntb/Makefile                          |    2 +
>>>>>>>>>>>       drivers/ntb/ntb_vhost.c                       |  776 +++++++++++
>>>>>>>>>>>       drivers/ntb/ntb_virtio.c                      |  853 ++++++++++++
>>>>>>>>>>>       drivers/ntb/ntb_virtio.h                      |   56 +
>>>>>>>>>>>       drivers/pci/endpoint/functions/Kconfig        |   11 +
>>>>>>>>>>>       drivers/pci/endpoint/functions/Makefile       |    1 +
>>>>>>>>>>>       .../pci/endpoint/functions/pci-epf-vhost.c    | 1144 ++++++++++++++++
>>>>>>>>>>>       drivers/rpmsg/Kconfig                         |   10 +
>>>>>>>>>>>       drivers/rpmsg/Makefile                        |    3 +-
>>>>>>>>>>>       drivers/rpmsg/rpmsg_cfs.c                     |  394 ++++++
>>>>>>>>>>>       drivers/rpmsg/rpmsg_core.c                    |    7 +
>>>>>>>>>>>       drivers/rpmsg/rpmsg_internal.h                |  136 ++
>>>>>>>>>>>       drivers/rpmsg/vhost_rpmsg_bus.c               | 1151 +++++++++++++++++
>>>>>>>>>>>       drivers/rpmsg/virtio_rpmsg_bus.c              |  184 ++-
>>>>>>>>>>>       drivers/vhost/Kconfig                         |    1 +
>>>>>>>>>>>       drivers/vhost/Makefile                        |    2 +-
>>>>>>>>>>>       drivers/vhost/net.c                           |   10 +-
>>>>>>>>>>>       drivers/vhost/scsi.c                          |   24 +-
>>>>>>>>>>>       drivers/vhost/test.c                          |   17 +-
>>>>>>>>>>>       drivers/vhost/vdpa.c                          |    2 +-
>>>>>>>>>>>       drivers/vhost/vhost.c                         |  730 ++++++++++-
>>>>>>>>>>>       drivers/vhost/vhost_cfs.c                     |  341 +++++
>>>>>>>>>>>       drivers/vhost/vringh.c                        |  332 +++++
>>>>>>>>>>>       drivers/vhost/vsock.c                         |   20 +-
>>>>>>>>>>>       drivers/virtio/Kconfig                        |    9 +
>>>>>>>>>>>       drivers/virtio/Makefile                       |    1 +
>>>>>>>>>>>       drivers/virtio/virtio_pci_common.c            |   25 +-
>>>>>>>>>>>       drivers/virtio/virtio_pci_epf.c               |  670 ++++++++++
>>>>>>>>>>>       include/linux/mod_devicetable.h               |    6 +
>>>>>>>>>>>       include/linux/rpmsg.h                         |    6 +
>>>>>>>>>>>       {drivers/vhost => include/linux}/vhost.h      |  132 +-
>>>>>>>>>>>       include/linux/virtio.h                        |    3 +
>>>>>>>>>>>       include/linux/virtio_config.h                 |   42 +
>>>>>>>>>>>       include/linux/vringh.h                        |   46 +
>>>>>>>>>>>       samples/rpmsg/rpmsg_client_sample.c           |   32 +-
>>>>>>>>>>>       tools/virtio/virtio_test.c                    |    2 +-
>>>>>>>>>>>       39 files changed, 7083 insertions(+), 183 deletions(-)
>>>>>>>>>>>       create mode 100644 drivers/ntb/ntb_vhost.c
>>>>>>>>>>>       create mode 100644 drivers/ntb/ntb_virtio.c
>>>>>>>>>>>       create mode 100644 drivers/ntb/ntb_virtio.h
>>>>>>>>>>>       create mode 100644 drivers/pci/endpoint/functions/pci-epf-vhost.c
>>>>>>>>>>>       create mode 100644 drivers/rpmsg/rpmsg_cfs.c
>>>>>>>>>>>       create mode 100644 drivers/rpmsg/vhost_rpmsg_bus.c
>>>>>>>>>>>       create mode 100644 drivers/vhost/vhost_cfs.c
>>>>>>>>>>>       create mode 100644 drivers/virtio/virtio_pci_epf.c
>>>>>>>>>>>       rename {drivers/vhost => include/linux}/vhost.h (66%)
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> 2.17.1
>>>>>>>>>>>
>>>
Kishon Vijay Abraham I Sept. 1, 2020, 5:24 a.m. UTC | #17
Hi,

On 28/08/20 4:04 pm, Cornelia Huck wrote:
> On Thu, 9 Jul 2020 14:26:53 +0800
> Jason Wang <jasowang@redhat.com> wrote:
> 
> [Let me note right at the beginning that I first noted this while
> listening to Kishon's talk at LPC on Wednesday. I might be very
> confused about the background here, so let me apologize beforehand for
> any confusion I might spread.]
> 
>>> On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 7/8/2020 4:52 PM, Jason Wang wrote:
>>>>> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>>>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>>>>>>>> Hi Jason,
>>>>>>>>>
>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>>>>          rpmsg communication between two SoCs connected to each other
>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>>>>          between two SoCs connected via NTB
>>>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>>>
>>>>>>>>>>>> UseCase1 :
>>>>>>>>>>>>
>>>>>>>>>>>>  VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>>>>       +                               +
>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>       |                               |
>>>>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>>>>> |            <----------------->              |
>>>>>>>>>>>> |            |                 |              |
>>>>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>>>>
>>>>>>>>>>>> UseCase 2:
>>>>>>>>>>>>
>>>>>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>>>>           +                                                 +
>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>    +------v------+                                   +------v------+
>>>>>>>>>>>>    |             |                                   |             |
>>>>>>>>>>>>    |    HOST1    |                                   |    HOST2    |
>>>>>>>>>>>>    |             |                                   |             |
>>>>>>>>>>>>    +------^------+                                   +------^------+
>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>>>>> +---------------------------------------------------------------------+
> 
> First of all, to clarify the terminology:
> Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just

Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.

> virtqueues + the existing vhost interfaces?

It's implemented to provide the full 'device' functionality.
> 
>>>>>>>>>>>>
>>>>>>>>>>>> Software Layering:
>>>>>>>>>>>>
>>>>>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
> 
> So, the upper half is basically various functionality types, e.g. a net
> device. What is the lower half, a hardware interface? Would it be
> equivalent to e.g. a normal PCI device?

Right, the upper half should provide the functionality.
The bottom layer could be a HW interface (like a PCIe device or an NTB
device) or it could be a SW interface (for accessing the virtio ring in
userspace) that could be used by a hypervisor.

The top half should be agnostic to the type of device that is actually
using it.

> 
>>>>>>>>>>>>
>>>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>>>
>>>>>>>>>>>> [1] ->
>>>>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>>>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>>>>> I've pushed the branch
>>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>   
>>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of
>>>>>>>>>> the work is duplicated with vDPA (e.g. the epf transport or vhost bus).
>>>>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>>>>> doesn't necessarily involve virtualization.
>>>>>>>> Right, this is something similar to VOP
>>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
>>>>>>>> hardware, I guess, and VOP uses a userspace application to implement the device.
>>>>>>> I'd also like to point out, this series tries to have communication
>>>>>>> between two SoCs in a vendor-agnostic way. Since this series solves
>>>>>>> for 2 usecases (PCIe RC<->EP and NTB), for the NTB case it directly
>>>>>>> plugs into the NTB framework and any of the HW in NTB below should be
>>>>>>> able to use a virtio-vhost communication
>>>>>>>
>>>>>>> # ls drivers/ntb/hw/
>>>>>>> amd  epf  idt  intel  mscc
>>>>>>>
>>>>>>> And similarly for the PCIe RC<->EP communication, this adds a generic endpoint
>>>>>>> function driver and hence any SoC that supports configurable PCIe endpoint can
>>>>>>> use virtio-vhost communication
>>>>>>>
>>>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>>>> Thanks for those backgrounds.
>>>>>>
>>>>>>   
>>>>>>>>> So there is no guest or host as in virtualization but two entirely
>>>>>>>>> different systems connected via PCIe cable, one acting as guest and
>>>>>>>>> one as host. So one system will provide virtio functionality,
>>>>>>>>> reserving memory for virtqueues, and the other provides vhost
>>>>>>>>> functionality, providing a way to access the virtqueues in virtio
>>>>>>>>> memory. One is source and the other is sink and there is no
>>>>>>>>> intermediate entity. (vhost was probably intermediate entity in
>>>>>>>>> virtualization?)
>>>>>>>> (Not a native English speaker) but "vhost" could introduce some confusion for
>>>>>>>> me since it was used for implementing virtio backend for userspace drivers. I
>>>>>>>> guess "vringh" could be better.
>>>>>>> Initially I had named this vringh but later decided to choose vhost instead of
>>>>>>> vringh. vhost is still a virtio backend (not necessarily userspace) though it
>>>>>>> now resides in an entirely different system. Whatever virtio is for a frontend
>>>>>>> system, vhost can be that for a backend system. vring can be for accessing
>>>>>>> virtqueue and can be used either in frontend or backend.
> 
> I guess that clears up at least some of my questions from above...
> 
>>>>>> Ok.
>>>>>>
>>>>>>   
>>>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>>>> IIUC vDPA only provides an interface to userspace, and an in-kernel
>>>>>>>>> rpmsg driver or vhost net driver is not provided.
>>>>>>>>>
>>>>>>>>> The HW connection looks something like https://pasteboard.co/JfMVVHC.jpg
>>>>>>>>> (usecase2 above),
>>>>>>>> I see.
>>>>>>>>
>>>>>>>>   
>>>>>>>>> all the boards run Linux. The middle board provides NTB
>>>>>>>>> functionality and the boards on either side provide virtio/vhost
>>>>>>>>> functionality and transfer data using rpmsg.
> 
> This setup looks really interesting (sometimes, it's really hard to
> imagine this in the abstract.)
>   
>>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
>>>>>>>> the existing virtio-bus/drivers? It might work as, apart from
>>>>>>>> the epf transport, we can introduce an epf "vhost" transport
>>>>>>>> driver.
>>>>>>> IMHO we'll need two buses, one for frontend and the other for
>>>>>>> backend, because the two components can then co-operate/interact
>>>>>>> with each other to provide a functionality. Though both will
>>>>>>> seemingly provide similar callbacks, they provide symmetrical
>>>>>>> or complementary functionality and need not be the same or
>>>>>>> identical.
>>>>>>>
>>>>>>> Having the same bus can also create sequencing issues.
>>>>>>>
>>>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>>>
>>>>>>> device_features = dev->config->get_features(dev);
>>>>>>>
>>>>>>> Now if we use the same bus for both front-end and back-end, both
>>>>>>> will try to get_features when there has been no set_features.
>>>>>>> Ideally the vhost device should be initialized first with the set
>>>>>>> of features it supports. Vhost and virtio should use "status"
>>>>>>> and "features" in a complementary way, not identically.
>>>>>> Yes, but there's no need for doing status/features passthrough
>>>>>> in epf vhost drivers.
>>>>>>
>>>>>>   
>>>>>>> The virtio device (or frontend) cannot be initialized before the
>>>>>>> vhost device (or backend) gets initialized with data such as
>>>>>>> features. Similarly, vhost (backend) cannot access virtqueues or
>>>>>>> buffers before virtio (frontend) sets VIRTIO_CONFIG_S_DRIVER_OK,
>>>>>>> whereas that requirement is not there for virtio as the physical
>>>>>>> memory for virtqueues is created by virtio (frontend).
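
For illustration only, the ordering constraint described above can be
sketched as two small checks. This is a minimal sketch under stated
assumptions, not code from the series: struct demo_backend and both
helper functions are hypothetical names, and the DEMO_S_* values simply
mirror the status bits defined by the virtio specification
(VIRTIO_CONFIG_S_* in the kernel headers).

```c
#include <stdbool.h>
#include <stdint.h>

/* Status bit values as defined by the virtio specification. */
#define DEMO_S_ACKNOWLEDGE 1
#define DEMO_S_DRIVER      2
#define DEMO_S_DRIVER_OK   4
#define DEMO_S_FEATURES_OK 8

/* Hypothetical shared state; not the structure used in the series. */
struct demo_backend {
	uint64_t features;   /* filled in by the backend (vhost) side */
	bool features_set;   /* true once the backend has published features */
	uint8_t status;      /* written by the frontend (virtio) side */
};

/* The frontend should not probe before the backend has published features. */
static bool demo_frontend_can_probe(const struct demo_backend *b)
{
	return b->features_set;
}

/* The backend should not touch virtqueues before the frontend sets DRIVER_OK. */
static bool demo_backend_can_access_vq(const struct demo_backend *b)
{
	return (b->status & DEMO_S_DRIVER_OK) != 0;
}
```

The two checks are deliberately asymmetric: each side waits on a
different piece of state owned by the other side, which is the point
being made about sharing a single bus.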
>>>>>> epf vhost drivers need to implement two devices: a vhost(vringh)
>>>>>> device and a virtio device (which is a mediated device). The
>>>>>> vhost(vringh) device is doing feature negotiation with the
>>>>>> virtio device via RC/EP or NTB. The virtio device is doing
>>>>>> feature negotiation with local virtio drivers. If there's a
>>>>>> feature mismatch, epf vhost drivers can do mediation between
>>>>>> them.
>>>>> Here epf vhost should be initialized with a set of features for
>>>>> it to negotiate either as vhost device or virtio device no? Where
>>>>> should the initial feature set for epf vhost come from?
>>>>
>>>> I think it can work as:
>>>>
>>>> 1) Having an initial feature set X (hard coded in the code) in
>>>>    epf vhost
>>>> 2) Using this X for both the virtio device and the vhost(vringh)
>>>>    device
>>>> 3) The local virtio driver will negotiate with the virtio device
>>>>    with feature set Y
>>>> 4) The remote virtio driver will negotiate with the vringh device
>>>>    with feature set Z
>>>> 5) Mediate between feature sets Y and Z, since both Y and Z are a
>>>>    subset of X
>>>>
>>>>   
>>> okay. I'm also thinking if we could have configfs for configuring
>>> this. Anyways we could find different approaches of configuring
>>> this.
>>
>>
>> Yes, and I think some management API is needed even in the design of
>> your "Software Layering". In that figure, rpmsg vhost need some
>> pre-set or hard-coded features.
> 
> When I saw the plumbers talk, my first idea was "this needs to be a new
> transport". You have some hard-coded or pre-configured features, and
> then features are negotiated via a transport-specific means in the
> usual way. There's basically an extra/extended layer for this (and
> status, and whatever).

I think for PCIe root complex to PCIe endpoint communication it's still
"Virtio Over PCI Bus", though the existing layout cannot be used in this
context: finding the virtio capability will fail for the modern
interface, and loading the queue status immediately after writing the
queue number is not possible for root complex to endpoint communication
(see setup_vq() in virtio_pci_legacy.c).

"Virtio Over NTB" should anyway be a new transport.
> 
> Does that make any sense?

Yeah. In the approach I used, the initial features are hard-coded in
vhost-rpmsg (inherent to rpmsg). But when we have to use an adapter
layer (vhost only for accessing the virtio ring, with virtio drivers on
both the front end and the back end), the vhost should be configured
with features (to be presented to the virtio side) based on the
functionality (e.g., rpmsg), and that's why an additional layer or APIs
will be required.
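
To make the X/Y/Z feature discussion quoted earlier a bit more
concrete, here is a minimal sketch. It assumes that negotiation is a
plain bitwise AND of offered and supported bits and that "mediation"
means operating on the intersection of Y and Z; the thread does not
settle what mediation should do, so treat that as one possible reading.
All DEMO_* names and both functions are hypothetical and only for
illustration.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define DEMO_F_NS        (1ULL << 0)
#define DEMO_F_EXTRA     (1ULL << 1)
#define DEMO_F_VERSION_1 (1ULL << 32)

/* X: the initial feature set the adapter is configured with. */
static const uint64_t demo_offered =
	DEMO_F_NS | DEMO_F_EXTRA | DEMO_F_VERSION_1;

/* A driver can only accept bits that were offered and that it supports. */
static uint64_t demo_negotiate(uint64_t offered, uint64_t driver_supported)
{
	return offered & driver_supported;
}

/* Assumed mediation policy: rely only on what both sides accepted. */
static uint64_t demo_mediate(uint64_t y, uint64_t z)
{
	return y & z;
}
```

With this policy Y and Z are each a subset of X by construction, so the
mediated set is also a subset of X. How X gets configured (hard-coded,
configfs, or some other API) is exactly the open question above.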
> 
>>
>>
>>>>>>>> It will have virtqueues but only used for the communication
>>>>>>>> between itself and the upper virtio driver. And it will have
>>>>>>>> vringh queues which will be probed by virtio epf transport
>>>>>>>> drivers. And it needs to do datacopy between virtqueue and
>>>>>>>> vringh queues.
>>>>>>>>
>>>>>>>> It works like:
>>>>>>>>
>>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
>>>>>>>> vringh queue/epf>
>>>>>>>>
>>>>>>>> The advantage is that there's no need for writing new buses
>>>>>>>> and drivers.
>>>>>>> I think this will work, however there is an additional copy
>>>>>>> between vringh queue and virtqueue,
>>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we
>>>>>> will have:
>>>>>>
>>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
>>>>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
>>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using
>>>>> vringh?
>>>>
>>>> Yes.
>>>>
>>>>   
>>>>> And virtio
>>>>> ring(2) is created by virtio pci (RC).
>>>>
>>>> Yes.
>>>>
>>>>   
>>>>>> What the epf vhost driver does is read the buffer len and addr
>>>>>> from virtio ring(1) and then DMA to Linux (RC)?
>>>>> okay, I made some optimization here where vhost-rpmsg, using a
>>>>> helper, writes a buffer from rpmsg's upper layer directly to
>>>>> remote Linux (RC), as against here where it has to be first written
>>>>> to virtio ring (1).
>>>>>
>>>>> Thinking how this would look for NTB
>>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
>>>>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
>>>>>
>>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>>>
>>>> Yes, I think so. It needs to use vring to access virtio ring (1) as
>>>> well.
>>> NTB(HOST1) and virtio ring(1) will be in the same system. So it
>>> doesn't have to use vring. virtio ring(1) is created by the virtio
>>> device that NTB(HOST1) creates.
>>
>>
>> Right.
>>
>>
>>>>   
>>>>> Do you also think this will work seamlessly with virtio_net.c,
>>>>> virtio_blk.c?
>>>>
>>>> Yes.
>>> okay, I haven't looked at this but the backend of virtio_blk should
>>> access an actual storage device no?
>>
>>
>> Good point, for non-peer device like storage. There's probably no
>> need for it to be registered on the virtio bus and it might be better
>> to behave as you proposed.
> 
> I might be missing something; but if you expose something as a block
> device, it should have something it can access with block reads/writes,
> shouldn't it? Of course, that can be a variety of things.
> 
>>
>> Just to make sure I understand the design, how is VHOST SCSI expected
>> to work in your proposal, does it have a device for file as a backend?
>>
>>
>>>>   
>>>>> I'd like to get clarity on two things in the approach you
>>>>> suggested, one is features (since epf vhost should ideally be
>>>>> transparent to any virtio driver)
>>>>
>>>> We can have an array of pre-defined features indexed by
>>>> virtio device id in the code.
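
A rough sketch of what such a table could look like, purely for
illustration. The device IDs below match the virtio specification
(net = 1, block = 2, rpmsg = 7), but the DEMO_* names, the table, the
lookup helper and the chosen feature masks are all hypothetical
placeholders and not part of this series.

```c
#include <stdint.h>

/* Device IDs as assigned by the virtio specification. */
#define DEMO_ID_NET   1
#define DEMO_ID_BLOCK 2
#define DEMO_ID_RPMSG 7
#define DEMO_ID_MAX   8

/* Placeholder masks; real values would be chosen per device type. */
static const uint64_t demo_default_features[DEMO_ID_MAX] = {
	[DEMO_ID_NET]   = 1ULL << 5,
	[DEMO_ID_BLOCK] = 1ULL << 1,
	[DEMO_ID_RPMSG] = 1ULL << 0,
};

/* Return the pre-defined features for a device ID, or 0 if unknown. */
static uint64_t demo_features_for(unsigned int id)
{
	return id < DEMO_ID_MAX ? demo_default_features[id] : 0;
}
```

A table like this keeps the adapter itself independent of any one
virtio driver, at the cost of needing an update (or a runtime override
such as configfs) whenever a new device type is added.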
>>>>
>>>>   
>>>>> and the other is how certain inputs to virtio device such as
>>>>> number of buffers be determined.
>>>>
>>>> We can start with a hard-coded value like 256, or introduce some
>>>> API for the user to change the value.
>>>>
>>>>   
>>>>> Thanks again for your suggestions!
>>>>
>>>> You're welcome.
>>>>
>>>> Note that I just want to check whether or not we can reuse the
>>>> virtio bus/driver. It's something similar to what you proposed in
>>>> Software Layering but we just replace "vhost core" with "virtio
>>>> bus" and move the vhost core below epf/ntb/platform transport.
>>> Got it. My initial design was based on my understanding of your
>>> comments [1].
>>
>>
>> Yes, but that's just for a networking device. If we want something
>> more generic, it may require more thought (bus etc).
> 
> I believe that we indeed need something bus-like to be able to support
> a variety of devices.

I think we could still have adapter layers for different types of
devices ([1]) and use the existing virtio bus for both front end and back
end. Using something bus-like will, however, simplify adding support for
new types of devices, whereas adding adapters for devices will be
slightly more complex.

[1] -> Page 13 in 
https://linuxplumbersconf.org/event/7/contributions/849/attachments/642/1175/Virtio_for_PCIe_RC_EP_NTB.pdf
> 
>>
>>
>>>
>>> I'll try to create something based on your proposed design here.
>>
>>
>> Sure, but for coding, we'd better wait for other's opinion here.
> 
> Please tell me if my thoughts above make any sense... I have just
> started looking at that, so I might be completely off.

I think your understanding is correct! Thanks for your inputs.

Thanks
Kishon
Jason Wang Sept. 1, 2020, 8:50 a.m. UTC | #18
On 2020/9/1 1:24 PM, Kishon Vijay Abraham I wrote:
> Hi,
>
> On 28/08/20 4:04 pm, Cornelia Huck wrote:
>> On Thu, 9 Jul 2020 14:26:53 +0800
>> Jason Wang <jasowang@redhat.com> wrote:
>>
>> [Let me note right at the beginning that I first noted this while
>> listening to Kishon's talk at LPC on Wednesday. I might be very
>> confused about the background here, so let me apologize beforehand for
>> any confusion I might spread.]
>>
>>> On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
>>>> Hi Jason,
>>>>
>>>> On 7/8/2020 4:52 PM, Jason Wang wrote:
>>>>> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>>>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>>>>>>>>> Hi Jason,
>>>>>>>>>>
>>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay Abraham I wrote:
>>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>>>>> communication over MMIO. This series enables rpmsg communication between
>>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver (uses vhost) for
>>>>>>>>>>>>>    rpmsg communication between two SoCs connected to each other
>>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg communication
>>>>>>>>>>>>>    between two SoCs connected via NTB
>>>>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>>>>
>>>>>>>>>>>>> UseCase1 :
>>>>>>>>>>>>>
>>>>>>>>>>>>>  VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>>>>>       +                               +
>>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>>>>>> |            <----------------->              |
>>>>>>>>>>>>> |            |                 |              |
>>>>>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>>>>>
>>>>>>>>>>>>> UseCase 2:
>>>>>>>>>>>>>
>>>>>>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>>>>>           +                                                 +
>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>    +------v------+                                   +------v------+
>>>>>>>>>>>>>    |             |                                   |             |
>>>>>>>>>>>>>    |    HOST1    |                                   |    HOST2    |
>>>>>>>>>>>>>    |             |                                   |             |
>>>>>>>>>>>>>    +------^------+                                   +------^------+
>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>>>>>> +---------------------------------------------------------------------+
>>
>> First of all, to clarify the terminology:
>> Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
>> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just
>
> Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.
>
>> virtqueues + the existing vhost interfaces?
>
> It's implemented to provide the full 'device' functionality.
>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Software Layering:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The high-level SW layering should look something like below. This series
>>>>>>>>>>>>> adds support only for RPMSG VHOST, however something similar should be
>>>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI, NTB, Platform
>>>>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>
>> So, the upper half is basically various functionality types, e.g. a net
>> device. What is the lower half, a hardware interface? Would it be
>> equivalent to e.g. a normal PCI device?
>
> Right, the upper half should provide the functionality.
> The bottom layer could be a HW interface (like PCIe device or NTB 
> device) or it could be a SW interface (for accessing virtio ring in 
> userspace) that could be used by Hypervisor.
>
> The top half should be transparent to what type of device is actually 
> using it.
>
>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] ->
>>>>>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>>>>>>>>>>> I find this very interesting. A huge patchset so will take a bit
>>>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>>>>> Yes, it would be better if there's a git branch for us to have a look.
>>>>>>>>>> I've pushed the branch
>>>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel some of
>>>>>>>>>>> the work is duplicated with vDPA (e.g. the epf transport or vhost bus).
>>>>>>>>>> This is about connecting two different HW systems both running Linux and
>>>>>>>>>> doesn't necessarily involve virtualization.
>>>>>>>>> Right, this is something similar to VOP
>>>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The difference is the
>>>>>>>>> hardware, I guess, and VOP uses a userspace application to implement the device.
>>>>>>>> I'd also like to point out, this series tries to have communication
>>>>>>>> between two SoCs in a vendor-agnostic way. Since this series solves
>>>>>>>> for 2 usecases (PCIe RC<->EP and NTB), for the NTB case it directly
>>>>>>>> plugs into the NTB framework and any of the HW in NTB below should be
>>>>>>>> able to use a virtio-vhost communication
>>>>>>>>
>>>>>>>> # ls drivers/ntb/hw/
>>>>>>>> amd  epf  idt  intel  mscc
>>>>>>>>
>>>>>>>> And similarly for the PCIe RC<->EP communication, this adds a 
>>>>>>>> generic endpoint
>>>>>>>> function driver and hence any SoC that supports configurable 
>>>>>>>> PCIe endpoint can
>>>>>>>> use virtio-vhost communication
>>>>>>>>
>>>>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>>>>> Thanks for those backgrounds.
>>>>>>>
>>>>>>>>>>       So there is no guest or host as in
>>>>>>>>>> virtualization but two entirely different systems connected 
>>>>>>>>>> via PCIe cable,
>>>>>>>>>> one
>>>>>>>>>> acting as guest and one as host. So one system will provide 
>>>>>>>>>> virtio
>>>>>>>>>> functionality reserving memory for virtqueues and the other 
>>>>>>>>>> provides vhost
>>>>>>>>>> functionality providing a way to access the virtqueues in 
>>>>>>>>>> virtio memory.
>>>>>>>>>> One is
>>>>>>>>>> source and the other is sink and there is no intermediate 
>>>>>>>>>> entity. (vhost was
>>>>>>>>>> probably intermediate entity in virtualization?)
>>>>>>>>> (Not a native English speaker) but "vhost" could introduce 
>>>>>>>>> some confusion for
>>>>>>>>> me since it was used for implementing virtio backend for 
>>>>>>>>> userspace drivers. I
>>>>>>>>> guess "vringh" could be better.
>>>>>>>> Initially I had named this vringh but later decided to choose 
>>>>>>>> vhost instead of
>>>>>>>> vringh. vhost is still a virtio backend (not necessarily 
>>>>>>>> userspace) though it
>>>>>>>> now resides in an entirely different system. Whatever virtio is 
>>>>>>>> for a frontend
>>>>>>>> system, vhost can be that for a backend system. vring can be 
>>>>>>>> for accessing
>>>>>>>> virtqueue and can be used either in frontend or backend.
>>
>> I guess that clears up at least some of my questions from above...
>>
>>>>>>> Ok.
>>>>>>>
>>>>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>>>>> IIUC vDPA only provides an interface to userspace and an 
>>>>>>>>>> in-kernel rpmsg
>>>>>>>>>> driver
>>>>>>>>>> or vhost net driver is not provided.
>>>>>>>>>>
>>>>>>>>>> The HW connection looks something like 
>>>>>>>>>> https://pasteboard.co/JfMVVHC.jpg
>>>>>>>>>> (usecase2 above),
>>>>>>>>> I see.
>>>>>>>>>
>>>>>>>>>>       all the boards run Linux. The middle board provides NTB
>>>>>>>>>> functionality and board on either side provides virtio/vhost
>>>>>>>>>> functionality and
>>>>>>>>>> transfer data using rpmsg.
>>
>> This setup looks really interesting (sometimes, it's really hard to
>> imagine this in the abstract.)
>>>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
>>>>>>>>> the existing virtio-bus/drivers? It might work as, except for
>>>>>>>>> the epf transport, we can introduce an epf "vhost" transport
>>>>>>>>> driver.
>>>>>>>> IMHO we'll need two buses one for frontend and other for
>>>>>>>> backend because the two components can then co-operate/interact
>>>>>>>> with each other to provide a functionality. Though both will
>>>>>>>> seemingly provide similar callbacks, they both provide
>>>>>>>> symmetrical or complementary functionality and need not be the
>>>>>>>> same or identical.
>>>>>>>>
>>>>>>>> Having the same bus can also create sequencing issues.
>>>>>>>>
>>>>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>>>>
>>>>>>>> device_features = dev->config->get_features(dev);
>>>>>>>>
>>>>>>>> Now if we use same bus for both front-end and back-end, both
>>>>>>>> will try to get_features when there has been no set_features.
>>>>>>>> Ideally vhost device should be initialized first with the set
>>>>>>>> of features it supports. Vhost and virtio should use "status"
>>>>>>>> and "features" complementarily and not identically.
>>>>>>> Yes, but there's no need for doing status/features passthrough
>>>>>>> in epf vhost drivers.
>>>>>>>
>>>>>>>> virtio device (or frontend) cannot be initialized before vhost
>>>>>>>> device (or backend) gets initialized with data such as
>>>>>>>> features. Similarly vhost (backend)
>>>>>>>> cannot access virtqueues or buffers before virtio (frontend) sets
>>>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
>>>>>>>> for virtio as the physical memory for virtqueues are created by
>>>>>>>> virtio (frontend).
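
As a rough illustration of the ordering described above (a sketch only:
struct backend_state and the three helpers are hypothetical names made up
for this example and are not code from the series; the status bit values
are the ones defined by the virtio specification), the two constraints
could be expressed in C as:

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_ACKNOWLEDGE	1
#define VIRTIO_CONFIG_S_DRIVER		2
#define VIRTIO_CONFIG_S_DRIVER_OK	4
#define VIRTIO_CONFIG_S_FEATURES_OK	8

/* Hypothetical backend (vhost side) state, for illustration only. */
struct backend_state {
	bool features_set;	/* backend has published its feature bits */
	uint64_t features;	/* feature bits offered to the frontend */
	uint8_t status;		/* status byte last written by the frontend */
};

/* Backend: publish the supported features before any frontend probe. */
static void backend_set_features(struct backend_state *b, uint64_t features)
{
	b->features = features;
	b->features_set = true;
}

/* Frontend: get_features() is only meaningful after set_features(). */
static bool frontend_can_probe(const struct backend_state *b)
{
	return b->features_set;
}

/* Backend: the rings belong to the frontend until DRIVER_OK is set. */
static bool backend_can_access_vq(const struct backend_state *b)
{
	return b->features_set && (b->status & VIRTIO_CONFIG_S_DRIVER_OK);
}
```

The asymmetry is the point of the sketch: the frontend waits on the
backend's features, while the backend waits on the frontend's status.
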
>>>>>>> epf vhost drivers need to implement two devices: vhost(vringh)
>>>>>>> device and virtio device (which is a mediated device). The
>>>>>>> vhost(vringh) device is doing feature negotiation with the
>>>>>>> virtio device via RC/EP or NTB. The virtio device is doing
>>>>>>> feature negotiation with local virtio drivers. If there's a
>>>>>>> feature mismatch, epf vhost drivers can do mediation between
>>>>>>> them.
>>>>>> Here epf vhost should be initialized with a set of features for
>>>>>> it to negotiate either as vhost device or virtio device no? Where
>>>>>> should the initial feature set for epf vhost come from?
>>>>>
>>>>> I think it can work as:
>>>>>
>>>>> 1) Have an initial feature set X (hard coded in the code) in
>>>>>    epf vhost
>>>>> 2) Use this X for both virtio device and vhost(vringh) device
>>>>> 3) local virtio driver will negotiate with virtio device with
>>>>>    feature set Y
>>>>> 4) remote virtio driver will negotiate with vringh device with
>>>>>    feature set Z
>>>>> 5) mediate between feature Y and feature Z since both Y and Z are
>>>>>    a subset of X
>>>>>
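
The steps above come down to bitmask arithmetic. A minimal sketch in C,
assuming the simplest possible mediation policy (keep only the bits both
sides accepted); negotiate() and mediate() are hypothetical helpers made
up for this example, and a real epf vhost driver might instead emulate a
missing feature on one side:

```c
#include <stdint.h>

/*
 * Hypothetical helper: a driver accepts the subset of the offered
 * feature bits that it understands (steps 3 and 4).
 */
static uint64_t negotiate(uint64_t offered, uint64_t driver_supported)
{
	return offered & driver_supported;
}

/*
 * Hypothetical helper: simplest mediation between the locally
 * negotiated set Y and the remotely negotiated set Z (step 5).
 * Both are subsets of X, so the result is a subset of X as well.
 */
static uint64_t mediate(uint64_t y, uint64_t z)
{
	return y & z;
}
```

For example, with X = 0x7, a local driver supporting 0xd ends up with
Y = 0x5, a remote driver supporting 0x6 ends up with Z = 0x6, and the
intersection policy leaves only bit 2 (0x4) usable end to end.
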
>>>> okay. I'm also thinking if we could have configfs for configuring
>>>> this. Anyways we could find different approaches of configuring
>>>> this.
>>>
>>>
>>> Yes, and I think some management API is needed even in the design of
>>> your "Software Layering". In that figure, rpmsg vhost need some
>>> pre-set or hard-coded features.
>>
>> When I saw the plumbers talk, my first idea was "this needs to be a new
>> transport". You have some hard-coded or pre-configured features, and
>> then features are negotiated via a transport-specific means in the
>> usual way. There's basically an extra/extended layer for this (and
>> status, and whatever).
>
> I think for PCIe root complex to PCIe endpoint communication it's 
> still "Virtio Over PCI Bus", though existing layout cannot be used in 
> this context (find virtio capability will fail for modern interface 
> and loading queue status immediately after writing queue number is not 
> possible for root complex to endpoint communication; setup_vq() in 
> virtio_pci_legacy.c).


Then you need something that is functionally equivalent to virtio PCI, 
which is actually the concept of vDPA (e.g. vDPA provides alternatives if 
the queue_sel is hard in the EP implementation).


>
> "Virtio Over NTB" should anyways be a new transport.
>>
>> Does that make any sense?
>
> yeah, in the approach I used the initial features are hard-coded in 
> vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter 
> layer (vhost only for accessing virtio ring and use virtio drivers on 
> both front end and backend), based on the functionality (e.g, rpmsg), 
> the vhost should be configured with features (to be presented to the 
> virtio) and that's why additional layer or APIs will be required.


A question here: if we go with the vhost bus approach, does it mean the 
virtio device can only be implemented in the EP's userspace?

Thanks


>>
>>>
>>>
>>>>>>>>> It will have virtqueues but only used for the communication
>>>>>>>>> between itself and the upper virtio driver. And it will have
>>>>>>>>> vringh queues which will be probed by virtio epf transport
>>>>>>>>> drivers. And it needs to do datacopy between virtqueue and
>>>>>>>>> vringh queues.
>>>>>>>>>
>>>>>>>>> It works like:
>>>>>>>>>
>>>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
>>>>>>>>> vringh queue/epf>
>>>>>>>>>
>>>>>>>>> The advantages is that there's no need for writing new buses
>>>>>>>>> and drivers.
>>>>>>>> I think this will work however there is an additional copy
>>>>>>>> between vringh queue and virtqueue,
>>>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we
>>>>>>> will have:
>>>>>>>
>>>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
>>>>>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
>>>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using
>>>>>> vringh?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> And virtio
>>>>>> ring(2) is created by virtio pci (RC).
>>>>>
>>>>> Yes.
>>>>>
>>>>>>> What epf vhost driver did is to read from virtio ring(1) about
>>>>>>> the buffer len and addr and then DMA to Linux(RC)?
>>>>>> okay, I made some optimization here where vhost-rpmsg using a
>>>>>> helper writes a buffer from rpmsg's upper layer directly to
>>>>>> remote Linux (RC) as against here where it has to be first written
>>>>>> to virtio ring (1).
>>>>>>
>>>>>> Thinking how this would look for NTB
>>>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
>>>>>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
>>>>>>
>>>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>>>>
>>>>> Yes, I think so it needs to use vring to access virtio ring (1) as
>>>>> well.
>>>> NTB(HOST1) and virtio ring(1) will be in the same system. So it
>>>> doesn't have to use vring. virtio ring(1) is by the virtio device
>>>> the NTB(HOST1) creates.
>>>
>>>
>>> Right.
>>>
>>>
>>>>>> Do you also think this will work seamlessly with virtio_net.c,
>>>>>> virtio_blk.c?
>>>>>
>>>>> Yes.
>>>> okay, I haven't looked at this but the backend of virtio_blk should
>>>> access an actual storage device no?
>>>
>>>
>>> Good point, for non-peer device like storage. There's probably no
>>> need for it to be registered on the virtio bus and it might be better
>>> to behave as you proposed.
>>
>> I might be missing something; but if you expose something as a block
>> device, it should have something it can access with block reads/writes,
>> shouldn't it? Of course, that can be a variety of things.
>>
>>>
>>> Just to make sure I understand the design, how is VHOST SCSI expected
>>> to work in your proposal, does it have a device for file as a backend?
>>>
>>>
>>>>>> I'd like to get clarity on two things in the approach you
>>>>>> suggested, one is features (since epf vhost should ideally be
>>>>>> transparent to any virtio driver)
>>>>>
>>>>> We can have an array of pre-defined features indexed by
>>>>> virtio device id in the code.
>>>>>
>>>>>> and the other is how certain inputs to virtio device such as
>>>>>> number of buffers be determined.
>>>>>
>>>>> We can start from a hard-coded value like 256, or introduce some
>>>>> API for user to change the value.
>>>>>
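
A small sketch of what such a table could look like in C. Assumptions are
labeled: struct epf_vhost_defaults, the defaults[] table and
lookup_defaults() are hypothetical names for this example, and the
feature bits chosen per device are arbitrary placeholders. The device IDs
and feature bit numbers themselves match the values used by the virtio
specification and the Linux virtio headers:

```c
#include <stddef.h>
#include <stdint.h>

/* Device IDs from the virtio specification. */
#define VIRTIO_ID_NET		1
#define VIRTIO_ID_BLOCK		2
#define VIRTIO_ID_RPMSG		7

/* Feature bit numbers as used by the Linux virtio drivers. */
#define VIRTIO_NET_F_MAC	5
#define VIRTIO_BLK_F_SEG_MAX	2
#define VIRTIO_RPMSG_F_NS	0

/* Starting value mentioned in the thread; could later become tunable. */
#define DEFAULT_NUM_BUFS	256

/* Hypothetical per-device-type defaults, for illustration only. */
struct epf_vhost_defaults {
	uint64_t features;	/* initial feature set offered */
	unsigned int num_bufs;	/* number of buffers; 0 marks an unused slot */
};

/* Table indexed by virtio device ID; unlisted IDs stay zeroed. */
static const struct epf_vhost_defaults defaults[] = {
	[VIRTIO_ID_NET]   = { .features = 1ULL << VIRTIO_NET_F_MAC,
			      .num_bufs = DEFAULT_NUM_BUFS },
	[VIRTIO_ID_BLOCK] = { .features = 1ULL << VIRTIO_BLK_F_SEG_MAX,
			      .num_bufs = DEFAULT_NUM_BUFS },
	[VIRTIO_ID_RPMSG] = { .features = 1ULL << VIRTIO_RPMSG_F_NS,
			      .num_bufs = DEFAULT_NUM_BUFS },
};

/* Return the defaults for a device ID, or NULL if none are defined. */
static const struct epf_vhost_defaults *lookup_defaults(unsigned int id)
{
	if (id >= sizeof(defaults) / sizeof(defaults[0]))
		return NULL;
	if (defaults[id].num_bufs == 0)
		return NULL;
	return &defaults[id];
}
```

A lookup for an ID without an entry returns NULL, which leaves room for a
later API (or configfs, as discussed earlier in the thread) to supply
values instead of the hard-coded table.
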
>>>>>> Thanks again for your suggestions!
>>>>>
>>>>> You're welcome.
>>>>>
>>>>> Note that I just want to check whether or not we can reuse the
>>>>> virtio bus/driver. It's something similar to what you proposed in
>>>>> Software Layering but we just replace "vhost core" with "virtio
>>>>> bus" and move the vhost core below epf/ntb/platform transport.
>>>> Got it. My initial design was based on my understanding of your
>>>> comments [1].
>>>
>>>
>>> Yes, but that's just for a networking device. If we want something
>>> more generic, it may require more thought (bus etc).
>>
>> I believe that we indeed need something bus-like to be able to support
>> a variety of devices.
>
> I think we could still have adapter layers for different types of 
> devices ([1]) and use existing virtio bus for both front end and back 
> end. Using bus-like will however simplify adding support for new types 
> of devices and adding adapters for devices will be slightly more complex.
>
> [1] -> Page 13 in 
> https://linuxplumbersconf.org/event/7/contributions/849/attachments/642/1175/Virtio_for_PCIe_RC_EP_NTB.pdf
>>
>>>
>>>
>>>>
>>>> I'll try to create something based on your proposed design here.
>>>
>>>
>>> Sure, but for coding, we'd better wait for other's opinion here.
>>
>> Please tell me if my thoughts above make any sense... I have just
>> started looking at that, so I might be completely off.
>
> I think your understanding is correct! Thanks for your inputs.
>
> Thanks
> Kishon
Cornelia Huck Sept. 8, 2020, 4:37 p.m. UTC | #19
On Tue, 1 Sep 2020 16:50:03 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/9/1 1:24 PM, Kishon Vijay Abraham I wrote:
> > Hi,
> >
> > On 28/08/20 4:04 pm, Cornelia Huck wrote:  
> >> On Thu, 9 Jul 2020 14:26:53 +0800
> >> Jason Wang <jasowang@redhat.com> wrote:
> >>
> >> [Let me note right at the beginning that I first noted this while
> >> listening to Kishon's talk at LPC on Wednesday. I might be very
> >> confused about the background here, so let me apologize beforehand for
> >> any confusion I might spread.]
> >>  
> >>> On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
> >>>> Hi Jason,
> >>>>
> >>>> On 7/8/2020 4:52 PM, Jason Wang wrote:  
> >>>>> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
> >>>>>> Hi Jason,
> >>>>>>
> >>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:  
> >>>>>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
> >>>>>>>> Hi Jason,
> >>>>>>>>
> >>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:  
> >>>>>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
> >>>>>>>>>> Hi Jason,
> >>>>>>>>>>
> >>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:  
> >>>>>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
> >>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay 
> >>>>>>>>>>>> Abraham I wrote:  
> >>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
> >>>>>>>>>>>>> communication over MMIO. This series enables rpmsg 
> >>>>>>>>>>>>> communication between
> >>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
> >>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
> >>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
> >>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver 
> >>>>>>>>>>>>> (uses vhost) for
> >>>>>>>>>>>>>          rpmsg communication between two SoCs connected to 
> >>>>>>>>>>>>> each other
> >>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg 
> >>>>>>>>>>>>> communication
> >>>>>>>>>>>>>          between two SoCs connected via NTB
> >>>>>>>>>>>>> 6) Add configfs to configure the components
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> UseCase1 :
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>  VHOST RPMSG                     VIRTIO RPMSG
> >>>>>>>>>>>>>       +                               +
> >>>>>>>>>>>>>       |                               |
> >>>>>>>>>>>>>       |                               |
> >>>>>>>>>>>>>       |                               |
> >>>>>>>>>>>>>       |                               |
> >>>>>>>>>>>>> +-----v------+                 +------v-------+
> >>>>>>>>>>>>> |   Linux    |                 |     Linux    |
> >>>>>>>>>>>>> |  Endpoint  |                 | Root Complex |
> >>>>>>>>>>>>> |            <----------------->              |
> >>>>>>>>>>>>> |            |                 |              |
> >>>>>>>>>>>>> |    SOC1    |                 |     SOC2     |
> >>>>>>>>>>>>> +------------+                 +--------------+
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> UseCase 2:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
> >>>>>>>>>>>>>           +                                                 +
> >>>>>>>>>>>>>           |                                                 |
> >>>>>>>>>>>>>           |                                                 |
> >>>>>>>>>>>>>           |                                                 |
> >>>>>>>>>>>>>           |                                                 |
> >>>>>>>>>>>>>    +------v------+                                   +------v------+
> >>>>>>>>>>>>>    |             |                                   |             |
> >>>>>>>>>>>>>    |    HOST1    |                                   |    HOST2    |
> >>>>>>>>>>>>>    |             |                                   |             |
> >>>>>>>>>>>>>    +------^------+                                   +------^------+
> >>>>>>>>>>>>>           |                                                 |
> >>>>>>>>>>>>>           |                                                 |
> >>>>>>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>>>>> |  +------v------+                                   +------v------+  |
> >>>>>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
> >>>>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
> >>>>>>>>>>>>> |  |             <----------------------------------->             |  |
> >>>>>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>>>>> |  |             |                                   |             |  |
> >>>>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
> >>>>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
> >>>>>>>>>>>>> |  +-------------+                                   +-------------+  |
> >>>>>>>>>>>>> +---------------------------------------------------------------------+
> >>>>>>>>>>>>>
> >>
> >> First of all, to clarify the terminology:
> >> Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
> >> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just  
> >
> > Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.  
> >> virtqueues + the existing vhost interfaces?  
> >
> > It's implemented to provide the full 'device' functionality.  

Ok.

> >>  
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Software Layering:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The high-level SW layering should look something like 
> >>>>>>>>>>>>> below. This series
> >>>>>>>>>>>>> adds support only for RPMSG VHOST, however something 
> >>>>>>>>>>>>> similar should be
> >>>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI, 
> >>>>>>>>>>>>> NTB, Platform
> >>>>>>>>>>>>> device, user) can use any of the vhost client driver.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
> >>>>>>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
> >>>>>>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
> >>>>>>>>>>>>>             |                 |              |              |
> >>>>>>>>>>>>>             |                 |              |              |
> >>>>>>>>>>>>>             |                 |              |              |
> >>>>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
> >>>>>>>>>>>>> |                            VHOST CORE                                |
> >>>>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
> >>>>>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>>>>          |               |                    |                  |
> >>>>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
> >>>>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
> >>>>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
> >>
> >> So, the upper half is basically various functionality types, e.g. a net
> >> device. What is the lower half, a hardware interface? Would it be
> >> equivalent to e.g. a normal PCI device?  
> >
> > Right, the upper half should provide the functionality.
> > The bottom layer could be a HW interface (like PCIe device or NTB 
> > device) or it could be a SW interface (for accessing virtio ring in 
> > userspace) that could be used by Hypervisor.

Ok. In that respect, the SW interface is sufficiently device-like, I
guess.

> >
> > The top half should be transparent to what type of device is actually 
> > using it.
> >  
> >>  
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This was initially proposed here [1]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] ->
> >>>>>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com 
> >>>>>>>>>>>>>  
> >>>>>>>>>>>> I find this very interesting. A huge patchset so will take 
> >>>>>>>>>>>> a bit
> >>>>>>>>>>>> to review, but I certainly plan to do that. Thanks!  
> >>>>>>>>>>> Yes, it would be better if there's a git branch for us to 
> >>>>>>>>>>> have a look.  
> >>>>>>>>>> I've pushed the branch
> >>>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc  
> >>>>>>>>> Thanks
> >>>>>>>>>  
> >>>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel 
> >>>>>>>>>>> some of the
> >>>>>>>>>>> work is
> >>>>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).  
> >>>>>>>>>> This is about connecting two different HW systems both 
> >>>>>>>>>> running Linux and
> >>>>>>>>>> doesn't necessarily involve virtualization.  
> >>>>>>>>> Right, this is something similar to VOP
> >>>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The 
> >>>>>>>>> difference is the
> >>>>>>>>> hardware I guess, and VOP uses a userspace application to 
> >>>>>>>>> implement the device.  
> >>>>>>>> I'd also like to point out, this series tries to have 
> >>>>>>>> communication between
> >>>>>>>> two
> >>>>>>>> SoCs in vendor agnostic way. Since this series solves for 2 
> >>>>>>>> usecases (PCIe
> >>>>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB 
> >>>>>>>> framework and
> >>>>>>>> any
> >>>>>>>> of the HW in NTB below should be able to use a virtio-vhost 
> >>>>>>>> communication
> >>>>>>>>
> >>>>>>>> #ls drivers/ntb/hw/
> >>>>>>>> amd  epf  idt  intel  mscc
> >>>>>>>>
> >>>>>>>> And similarly for the PCIe RC<->EP communication, this adds a 
> >>>>>>>> generic endpoint
> >>>>>>>> function driver and hence any SoC that supports configurable 
> >>>>>>>> PCIe endpoint can
> >>>>>>>> use virtio-vhost communication
> >>>>>>>>
> >>>>>>>> # ls drivers/pci/controller/dwc/*ep*
> >>>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
> >>>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
> >>>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c  
> >>>>>>> Thanks for those backgrounds.
> >>>>>>>  
> >>>>>>>>>>       So there is no guest or host as in
> >>>>>>>>>> virtualization but two entirely different systems connected 
> >>>>>>>>>> via PCIe cable,
> >>>>>>>>>> one
> >>>>>>>>>> acting as guest and one as host. So one system will provide 
> >>>>>>>>>> virtio
> >>>>>>>>>> functionality reserving memory for virtqueues and the other 
> >>>>>>>>>> provides vhost
> >>>>>>>>>> functionality providing a way to access the virtqueues in 
> >>>>>>>>>> virtio memory.
> >>>>>>>>>> One is
> >>>>>>>>>> source and the other is sink and there is no intermediate 
> >>>>>>>>>> entity. (vhost was
> >>>>>>>>>> probably intermediate entity in virtualization?)  
> >>>>>>>>> (Not a native English speaker) but "vhost" could introduce 
> >>>>>>>>> some confusion for
> >>>>>>>>> me since it was used for implementing virtio backend for 
> >>>>>>>>> userspace drivers. I
> >>>>>>>>> guess "vringh" could be better.  
> >>>>>>>> Initially I had named this vringh but later decided to choose 
> >>>>>>>> vhost instead of
> >>>>>>>> vringh. vhost is still a virtio backend (not necessarily 
> >>>>>>>> userspace) though it
> >>>>>>>> now resides in an entirely different system. Whatever virtio is 
> >>>>>>>> for a frontend
> >>>>>>>> system, vhost can be that for a backend system. vring can be 
> >>>>>>>> for accessing
> >>>>>>>> virtqueue and can be used either in frontend or backend.  
> >>
> >> I guess that clears up at least some of my questions from above...
> >>  
> >>>>>>> Ok.
> >>>>>>>  
> >>>>>>>>>>> Have you considered to implement these through vDPA?  
> >>>>>>>>>> IIUC vDPA only provides an interface to userspace and an 
> >>>>>>>>>> in-kernel rpmsg
> >>>>>>>>>> driver
> >>>>>>>>>> or vhost net driver is not provided.
> >>>>>>>>>>
> >>>>>>>>>> The HW connection looks something like 
> >>>>>>>>>> https://pasteboard.co/JfMVVHC.jpg
> >>>>>>>>>> (usecase2 above),  
> >>>>>>>>> I see.
> >>>>>>>>>  
> >>>>>>>>>>       all the boards run Linux. The middle board provides NTB
> >>>>>>>>>> functionality and board on either side provides virtio/vhost
> >>>>>>>>>> functionality and
> >>>>>>>>>> transfer data using rpmsg.  
> >>
> >> This setup looks really interesting (sometimes, it's really hard to
> >> imagine this in the abstract.)  
> >>>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
> >>>>>>>>> the existing virtio-bus/drivers? It might work as, except for
> >>>>>>>>> the epf transport, we can introduce an epf "vhost" transport
> >>>>>>>>> driver.  
> >>>>>>>> IMHO we'll need two buses one for frontend and other for
> >>>>>>>> backend because the two components can then co-operate/interact
> >>>>>>>> with each other to provide a functionality. Though both will
> >>>>>>>> seemingly provide similar callbacks, they both provide
> >>>>>>>> symmetrical or complementary functionality and need not be the
> >>>>>>>> same or identical.
> >>>>>>>>
> >>>>>>>> Having the same bus can also create sequencing issues.
> >>>>>>>>
> >>>>>>>> If you look at virtio_dev_probe() of virtio_bus
> >>>>>>>>
> >>>>>>>> device_features = dev->config->get_features(dev);
> >>>>>>>>
> >>>>>>>> Now if we use same bus for both front-end and back-end, both
> >>>>>>>> will try to get_features when there has been no set_features.
> >>>>>>>> Ideally vhost device should be initialized first with the set
> >>>>>>>> of features it supports. Vhost and virtio should use "status"
> >>>>>>>> and "features" complementarily and not identically.  
> >>>>>>> Yes, but there's no need for doing status/features passthrough
> >>>>>>> in epf vhost drivers.
> >>>>>>>  
> >>>>>>>> virtio device (or frontend) cannot be initialized before vhost
> >>>>>>>> device (or backend) gets initialized with data such as
> >>>>>>>> features. Similarly vhost (backend)
> >>>>>>>> cannot access virtqueues or buffers before virtio (frontend) sets
> >>>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
> >>>>>>>> for virtio as the physical memory for virtqueues are created by
> >>>>>>>> virtio (frontend).  
> >>>>>>> epf vhost drivers need to implement two devices: vhost(vringh)
> >>>>>>> device and virtio device (which is a mediated device). The
> >>>>>>> vhost(vringh) device is doing feature negotiation with the
> >>>>>>> virtio device via RC/EP or NTB. The virtio device is doing
> >>>>>>> feature negotiation with local virtio drivers. If there's a
> >>>>>>> feature mismatch, epf vhost drivers can do mediation between
> >>>>>>> them.  
> >>>>>> Here epf vhost should be initialized with a set of features for
> >>>>>> it to negotiate either as vhost device or virtio device no? Where
> >>>>>> should the initial feature set for epf vhost come from?  
> >>>>>
> >>>>> I think it can work as:
> >>>>>
> >>>>> 1) Have an initial feature set X (hard coded in the code) in
> >>>>>    epf vhost
> >>>>> 2) Use this X for both virtio device and vhost(vringh) device
> >>>>> 3) local virtio driver will negotiate with virtio device with
> >>>>>    feature set Y
> >>>>> 4) remote virtio driver will negotiate with vringh device with
> >>>>>    feature set Z
> >>>>> 5) mediate between feature Y and feature Z since both Y and Z are
> >>>>>    a subset of X
> >>>>>  
> >>>> okay. I'm also thinking if we could have configfs for configuring
> >>>> this. Anyways we could find different approaches of configuring
> >>>> this.  
> >>>
> >>>
> >>> Yes, and I think some management API is needed even in the design of
> >>> your "Software Layering". In that figure, rpmsg vhost need some
> >>> pre-set or hard-coded features.  
> >>
> >> When I saw the plumbers talk, my first idea was "this needs to be a new
> >> transport". You have some hard-coded or pre-configured features, and
> >> then features are negotiated via a transport-specific means in the
> >> usual way. There's basically an extra/extended layer for this (and
> >> status, and whatever).  
> >
> > I think for PCIe root complex to PCIe endpoint communication it's 
> > still "Virtio Over PCI Bus", though existing layout cannot be used in 
> > this context (find virtio capability will fail for modern interface 
> > and loading queue status immediately after writing queue number is not 
> > possible for root complex to endpoint communication; setup_vq() in 
> > virtio_pci_legacy.c).  
> 
> 
> Then you need something that is functionally equivalent to virtio PCI, 
> which is actually the concept of vDPA (e.g. vDPA provides alternatives if 
> the queue_sel is hard in the EP implementation).

It seems I really need to read up on vDPA more... do you have a pointer
for diving into this alternatives aspect?

> 
> 
> >
> > "Virtio Over NTB" should anyways be a new transport.  
> >>
> >> Does that make any sense?  
> >
> > yeah, in the approach I used the initial features are hard-coded in 
> > vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter 
> > layer (vhost only for accessing virtio ring and use virtio drivers on 
> > both front end and backend), based on the functionality (e.g, rpmsg), 
> > the vhost should be configured with features (to be presented to the 
> > virtio) and that's why additional layer or APIs will be required.  
> 
> 
> A question here: if we go with the vhost bus approach, does it mean the 
> virtio device can only be implemented in the EP's userspace?

Can we maybe implement an alternative bus as well that would allow us
to support different virtio device implementations (in addition to the
vhost bus + userspace combination)?

> 
> Thanks
> 
> 
> >>  
> >>>
> >>>  
> >>>>>>>>> It will have virtqueues but only used for the communication
> >>>>>>>>> between itself and the upper virtio driver. And it will have
> >>>>>>>>> vringh queues which will be probed by virtio epf transport
> >>>>>>>>> drivers. And it needs to do datacopy between virtqueue and
> >>>>>>>>> vringh queues.
> >>>>>>>>>
> >>>>>>>>> It works like:
> >>>>>>>>>
> >>>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
> >>>>>>>>> vringh queue/epf>
> >>>>>>>>>
> >>>>>>>>> The advantages is that there's no need for writing new buses
> >>>>>>>>> and drivers.  
> >>>>>>>> I think this will work however there is an additional copy
> >>>>>>>> between vringh queue and virtqueue,  
> >>>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we
> >>>>>>> will have:
> >>>>>>>
> >>>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
> >>>>>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)  
> >>>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using
> >>>>>> vringh?  
> >>>>>
> >>>>> Yes.
> >>>>>  
> >>>>>> And virtio
> >>>>>> ring(2) is created by virtio pci (RC).  
> >>>>>
> >>>>> Yes.
> >>>>>  
> >>>>>>> What epf vhost driver did is to read from virtio ring(1) about
> >>>>>>> the buffer len and addr and then DMA to Linux(RC)?  
> >>>>>> okay, I made some optimization here where vhost-rpmsg using a
> >>>>>> helper writes a buffer from rpmsg's upper layer directly to
> >>>>>> remote Linux (RC) as against here where it has to be first written
> >>>>>> to virtio ring (1).
> >>>>>>
> >>>>>> Thinking how this would look for NTB
> >>>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
> >>>>>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
> >>>>>>
> >>>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?  
> >>>>>
> >>>>> Yes, I think so it needs to use vring to access virtio ring (1) as
> >>>>> well.  
> >>>> NTB(HOST1) and virtio ring(1) will be in the same system. So it
> >>>> doesn't have to use vring. virtio ring(1) is by the virtio device
> >>>> the NTB(HOST1) creates.  
> >>>
> >>>
> >>> Right.
> >>>
> >>>  
> >>>>>> Do you also think this will work seamlessly with virtio_net.c,
> >>>>>> virtio_blk.c?  
> >>>>>
> >>>>> Yes.  
> >>>> okay, I haven't looked at this but the backend of virtio_blk should
> >>>> access an actual storage device no?  
> >>>
> >>>
> >>> Good point, for non-peer device like storage. There's probably no
> >>> need for it to be registered on the virtio bus and it might be better
> >>> to behave as you proposed.  
> >>
> >> I might be missing something; but if you expose something as a block
> >> device, it should have something it can access with block reads/writes,
> >> shouldn't it? Of course, that can be a variety of things.
> >>  
> >>>
> >>> Just to make sure I understand the design, how is VHOST SCSI expected
> >>> to work in your proposal, does it have a device for file as a backend?
> >>>
> >>>  
> >>>>>> I'd like to get clarity on two things in the approach you
> >>>>>> suggested, one is features (since epf vhost should ideally be
> >>>>>> transparent to any virtio driver)  
> >>>>>
> >>>>> We can have an array of pre-defined features indexed by
> >>>>> virtio device id in the code.
> >>>>>  
> >>>>>> and the other is how certain inputs to virtio device such as
> >>>>>> number of buffers be determined.  
> >>>>>
> >>>>> We can start from a hard-coded value like 256, or introduce some
> >>>>> API for user to change the value.
> >>>>>  
> >>>>>> Thanks again for your suggestions!  
> >>>>>
> >>>>> You're welcome.
> >>>>>
> >>>>> Note that I just want to check whether or not we can reuse the
> >>>>> virtio bus/driver. It's something similar to what you proposed in
> >>>>> Software Layering but we just replace "vhost core" with "virtio
> >>>>> bus" and move the vhost core below epf/ntb/platform transport.  
> >>>> Got it. My initial design was based on my understanding of your
> >>>> comments [1].  
> >>>
> >>>
> >>> Yes, but that's just for a networking device. If we want something
> >>> more generic, it may require more thought (bus etc).  
> >>
> >> I believe that we indeed need something bus-like to be able to support
> >> a variety of devices.  
> >
> > I think we could still have adapter layers for different types of 
> > devices ([1]) and use existing virtio bus for both front end and back 
> > end. Using bus-like will however simplify adding support for new types 
> > of devices and adding adapters for devices will be slightly more complex.
> >
> > [1] -> Page 13 in 
> > https://linuxplumbersconf.org/event/7/contributions/849/attachments/642/1175/Virtio_for_PCIe_RC_EP_NTB.pdf  

So, I guess it's a trade-off: do we expect a variety of device types,
or more of a variety of devices needing different adapters?
Jason Wang Sept. 9, 2020, 8:41 a.m. UTC | #20
On 2020/9/9 12:37 AM, Cornelia Huck wrote:
>> Then you need something that is functional equivalent to virtio PCI
>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>> the queue_sel is hard in the EP implementation).
> It seems I really need to read up on vDPA more... do you have a pointer
> for diving into this alternatives aspect?


See vdpa_config_ops in include/linux/vdpa.h

Especially this part:

     int (*set_vq_address)(struct vdpa_device *vdev,
                   u16 idx, u64 desc_area, u64 driver_area,
                   u64 device_area);

This means that a device (e.g. an endpoint device) for which the
virtio-pci layout is hard to implement can use any other register layout,
or a vendor-specific mechanism, to configure the virtqueue.
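
As a rough illustration of that point, here is a minimal userspace sketch
of a set_vq_address() implementation that programs a vendor-specific
register block instead of the virtio-pci common configuration layout. The
register layout (struct demo_ep_regs), the reduced device structure
(struct demo_vdpa_device) and the function name are all assumptions made
for this example; only the parameter list mirrors the op quoted above, with
stdint types standing in for the kernel's u16/u64.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical vendor register block. This layout is an assumption for
 * illustration; it is neither the virtio-pci layout nor taken from the
 * series under discussion.
 */
struct demo_ep_regs {
	uint32_t vq_sel;       /* which virtqueue the next fields refer to */
	uint64_t desc_area;    /* descriptor table address */
	uint64_t driver_area;  /* available ring address */
	uint64_t device_area;  /* used ring address */
};

/* Stand-in for struct vdpa_device, reduced to what the sketch needs. */
struct demo_vdpa_device {
	volatile struct demo_ep_regs *regs;
	uint16_t nvqs;
};

/* Same parameter list as the set_vq_address() op in vdpa_config_ops. */
int demo_set_vq_address(struct demo_vdpa_device *vdev, uint16_t idx,
			uint64_t desc_area, uint64_t driver_area,
			uint64_t device_area)
{
	assert(vdev && vdev->regs);

	if (idx >= vdev->nvqs)
		return -EINVAL;

	/* The device decides how the addresses reach the hardware. */
	vdev->regs->vq_sel = idx;
	vdev->regs->desc_area = desc_area;
	vdev->regs->driver_area = driver_area;
	vdev->regs->device_area = device_area;
	return 0;
}
```

The caller only hands over the three ring addresses; how they are written
to the device stays private to the implementation.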


>
>>> "Virtio Over NTB" should anyways be a new transport.
>>>> Does that make any sense?
>>> yeah, in the approach I used the initial features are hard-coded in
>>> vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
>>> layer (vhost only for accessing virtio ring and use virtio drivers on
>>> both front end and backend), based on the functionality (e.g, rpmsg),
>>> the vhost should be configured with features (to be presented to the
>>> virtio) and that's why additional layer or APIs will be required.
>> A question here, if we go with vhost bus approach, does it mean the
>> virtio device can only be implemented in EP's userspace?
> Can we maybe implement an alternative bus as well that would allow us
> to support different virtio device implementations (in addition to the
> vhost bus + userspace combination)?


That should be fine, but I'm not quite sure that implementing the device
in the kernel (kthread) is a good approach.

Thanks


>
Kishon Vijay Abraham I Sept. 14, 2020, 7:23 a.m. UTC | #21
Hi Jason,

On 01/09/20 2:20 pm, Jason Wang wrote:
> 
> On 2020/9/1 1:24 PM, Kishon Vijay Abraham I wrote:
>> Hi,
>>
>> On 28/08/20 4:04 pm, Cornelia Huck wrote:
>>> On Thu, 9 Jul 2020 14:26:53 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> [Let me note right at the beginning that I first noted this while
>>> listening to Kishon's talk at LPC on Wednesday. I might be very
>>> confused about the background here, so let me apologize beforehand for
>>> any confusion I might spread.]
>>>
>>>> On 2020/7/8 9:13 PM, Kishon Vijay Abraham I wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 7/8/2020 4:52 PM, Jason Wang wrote:
>>>>>> On 2020/7/7 10:45 PM, Kishon Vijay Abraham I wrote:
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On 7/7/2020 3:17 PM, Jason Wang wrote:
>>>>>>>> On 2020/7/6 5:32 PM, Kishon Vijay Abraham I wrote:
>>>>>>>>> Hi Jason,
>>>>>>>>>
>>>>>>>>> On 7/3/2020 12:46 PM, Jason Wang wrote:
>>>>>>>>>> On 2020/7/2 9:35 PM, Kishon Vijay Abraham I wrote:
>>>>>>>>>>> Hi Jason,
>>>>>>>>>>>
>>>>>>>>>>> On 7/2/2020 3:40 PM, Jason Wang wrote:
>>>>>>>>>>>> On 2020/7/2 5:51 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Thu, Jul 02, 2020 at 01:51:21PM +0530, Kishon Vijay
>>>>>>>>>>>>> Abraham I wrote:
>>>>>>>>>>>>>> This series enhances Linux Vhost support to enable SoC-to-SoC
>>>>>>>>>>>>>> communication over MMIO. This series enables rpmsg
>>>>>>>>>>>>>> communication between
>>>>>>>>>>>>>> two SoCs using both PCIe RC<->EP and HOST1-NTB-HOST2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Modify vhost to use standard Linux driver model
>>>>>>>>>>>>>> 2) Add support in vring to access virtqueue over MMIO
>>>>>>>>>>>>>> 3) Add vhost client driver for rpmsg
>>>>>>>>>>>>>> 4) Add PCIe RC driver (uses virtio) and PCIe EP driver
>>>>>>>>>>>>>> (uses vhost) for
>>>>>>>>>>>>>>          rpmsg communication between two SoCs connected to
>>>>>>>>>>>>>> each other
>>>>>>>>>>>>>> 5) Add NTB Virtio driver and NTB Vhost driver for rpmsg
>>>>>>>>>>>>>> communication
>>>>>>>>>>>>>>          between two SoCs connected via NTB
>>>>>>>>>>>>>> 6) Add configfs to configure the components
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> UseCase1 :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  VHOST RPMSG                     VIRTIO RPMSG
>>>>>>>>>>>>>>       +                               +
>>>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>>>       |                               |
>>>>>>>>>>>>>> +-----v------+                 +------v-------+
>>>>>>>>>>>>>> |   Linux    |                 |     Linux    |
>>>>>>>>>>>>>> |  Endpoint  |                 | Root Complex |
>>>>>>>>>>>>>> |            <----------------->              |
>>>>>>>>>>>>>> |            |                 |              |
>>>>>>>>>>>>>> |    SOC1    |                 |     SOC2     |
>>>>>>>>>>>>>> +------------+                 +--------------+
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> UseCase 2:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      VHOST RPMSG                                      VIRTIO RPMSG
>>>>>>>>>>>>>>           +                                                 +
>>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>>    +------v------+                                   +------v------+
>>>>>>>>>>>>>>    |             |                                   |             |
>>>>>>>>>>>>>>    |    HOST1    |                                   |    HOST2    |
>>>>>>>>>>>>>>    |             |                                   |             |
>>>>>>>>>>>>>>    +------^------+                                   +------^------+
>>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>>           |                                                 |
>>>>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>>>>> |  +------v------+                                   +------v------+  |
>>>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>>>> |  |     EP      |                                   |     EP      |  |
>>>>>>>>>>>>>> |  | CONTROLLER1 |                                   | CONTROLLER2 |  |
>>>>>>>>>>>>>> |  |             <----------------------------------->             |  |
>>>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>>>> |  |             |                                   |             |  |
>>>>>>>>>>>>>> |  |             |  SoC With Multiple EP Instances   |             |  |
>>>>>>>>>>>>>> |  |             |  (Configured using NTB Function)  |             |  |
>>>>>>>>>>>>>> |  +-------------+                                   +-------------+  |
>>>>>>>>>>>>>> +---------------------------------------------------------------------+
>>>>>>>>>>>>>>
>>>
>>> First of all, to clarify the terminology:
>>> Is "vhost rpmsg" acting as what the virtio standard calls the 'device',
>>> and "virtio rpmsg" as the 'driver'? Or is the "vhost" part mostly just
>>
>> Right, vhost_rpmsg is 'device' and virtio_rpmsg is 'driver'.
>>> virtqueues + the existing vhost interfaces?
>>
>> It's implemented to provide the full 'device' functionality.
>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Software Layering:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The high-level SW layering should look something like
>>>>>>>>>>>>>> below. This series
>>>>>>>>>>>>>> adds support only for RPMSG VHOST, however something
>>>>>>>>>>>>>> similar should be
>>>>>>>>>>>>>> done for net and scsi. With that any vhost device (PCI,
>>>>>>>>>>>>>> NTB, Platform
>>>>>>>>>>>>>> device, user) can use any of the vhost client driver.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     +----------------+  +-----------+  +------------+  +----------+
>>>>>>>>>>>>>>     |  RPMSG VHOST   |  | NET VHOST |  | SCSI VHOST |  |    X     |
>>>>>>>>>>>>>>     +-------^--------+  +-----^-----+  +-----^------+  +----^-----+
>>>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>>>>             |                 |              |              |
>>>>>>>>>>>>>> +-----------v-----------------v--------------v--------------v----------+
>>>>>>>>>>>>>> |                            VHOST CORE                                |
>>>>>>>>>>>>>> +--------^---------------^--------------------^------------------^-----+
>>>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>>>>          |               |                    |                  |
>>>>>>>>>>>>>> +--------v-------+  +----v------+  +----------v----------+  +----v-----+
>>>>>>>>>>>>>> |  PCI EPF VHOST |  | NTB VHOST |  |PLATFORM DEVICE VHOST|  |    X     |
>>>>>>>>>>>>>> +----------------+  +-----------+  +---------------------+  +----------+
>>>
>>> So, the upper half is basically various functionality types, e.g. a net
>>> device. What is the lower half, a hardware interface? Would it be
>>> equivalent to e.g. a normal PCI device?
>>
>> Right, the upper half should provide the functionality.
>> The bottom layer could be a HW interface (like PCIe device or NTB
>> device) or it could be a SW interface (for accessing virtio ring in
>> userspace) that could be used by Hypervisor.
>>
>> The top half should be transparent to what type of device is actually
>> using it.
>>
>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This was initially proposed here [1]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] ->
>>>>>>>>>>>>>> https://lore.kernel.org/r/2cf00ec4-1ed6-f66e-6897-006d1a5b6390@ti.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I find this very interesting. A huge patchset so will take
>>>>>>>>>>>>> a bit
>>>>>>>>>>>>> to review, but I certainly plan to do that. Thanks!
>>>>>>>>>>>> Yes, it would be better if there's a git branch for us to
>>>>>>>>>>>> have a look.
>>>>>>>>>>> I've pushed the branch
>>>>>>>>>>> https://github.com/kishon/linux-wip.git vhost_rpmsg_pci_ntb_rfc
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>>> Btw, I'm not sure I get the big picture, but I vaguely feel
>>>>>>>>>>>> some of the
>>>>>>>>>>>> work is
>>>>>>>>>>>> duplicated with vDPA (e.g the epf transport or vhost bus).
>>>>>>>>>>> This is about connecting two different HW systems both
>>>>>>>>>>> running Linux and
>>>>>>>>>>> doesn't necessarily involve virtualization.
>>>>>>>>>> Right, this is something similar to VOP
>>>>>>>>>> (Documentation/misc-devices/mic/mic_overview.rst). The
>>>>>>>>>> difference is the
>>>>>>>>>> hardware, I guess, and VOP uses a userspace application to
>>>>>>>>>> implement the device.
>>>>>>>>> I'd also like to point out, this series tries to have
>>>>>>>>> communication between
>>>>>>>>> two
>>>>>>>>> SoCs in vendor agnostic way. Since this series solves for 2
>>>>>>>>> usecases (PCIe
>>>>>>>>> RC<->EP and NTB), for the NTB case it directly plugs into NTB
>>>>>>>>> framework and
>>>>>>>>> any
>>>>>>>>> of the HW in NTB below should be able to use a virtio-vhost
>>>>>>>>> communication
>>>>>>>>>
>>>>>>>>> #ls drivers/ntb/hw/
>>>>>>>>> amd  epf  idt  intel  mscc
>>>>>>>>>
>>>>>>>>> And similarly for the PCIe RC<->EP communication, this adds a
>>>>>>>>> generic endpoint
>>>>>>>>> function driver and hence any SoC that supports configurable
>>>>>>>>> PCIe endpoint can
>>>>>>>>> use virtio-vhost communication
>>>>>>>>>
>>>>>>>>> # ls drivers/pci/controller/dwc/*ep*
>>>>>>>>> drivers/pci/controller/dwc/pcie-designware-ep.c
>>>>>>>>> drivers/pci/controller/dwc/pcie-uniphier-ep.c
>>>>>>>>> drivers/pci/controller/dwc/pci-layerscape-ep.c
>>>>>>>> Thanks for those backgrounds.
>>>>>>>>
>>>>>>>>>>>       So there is no guest or host as in
>>>>>>>>>>> virtualization but two entirely different systems connected
>>>>>>>>>>> via PCIe cable,
>>>>>>>>>>> one
>>>>>>>>>>> acting as guest and one as host. So one system will provide
>>>>>>>>>>> virtio
>>>>>>>>>>> functionality reserving memory for virtqueues and the other
>>>>>>>>>>> provides vhost
>>>>>>>>>>> functionality providing a way to access the virtqueues in
>>>>>>>>>>> virtio memory.
>>>>>>>>>>> One is
>>>>>>>>>>> source and the other is sink and there is no intermediate
>>>>>>>>>>> entity. (vhost was
>>>>>>>>>>> probably intermediate entity in virtualization?)
>>>>>>>>>> (Not a native English speaker) but "vhost" could introduce
>>>>>>>>>> some confusion for
>>>>>>>>>> me since it was used for implementing virtio backend for
>>>>>>>>>> userspace drivers. I
>>>>>>>>>> guess "vringh" could be better.
>>>>>>>>> Initially I had named this vringh but later decided to choose
>>>>>>>>> vhost instead of
>>>>>>>>> vringh. vhost is still a virtio backend (not necessarily
>>>>>>>>> userspace) though it
>>>>>>>>> now resides in an entirely different system. Whatever virtio is
>>>>>>>>> for a frontend
>>>>>>>>> system, vhost can be that for a backend system. vring can be
>>>>>>>>> for accessing
>>>>>>>>> virtqueue and can be used either in frontend or backend.
>>>
>>> I guess that clears up at least some of my questions from above...
>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>>>>>> Have you considered to implement these through vDPA?
>>>>>>>>>>> IIUC vDPA only provides an interface to userspace and an
>>>>>>>>>>> in-kernel rpmsg
>>>>>>>>>>> driver
>>>>>>>>>>> or vhost net driver is not provided.
>>>>>>>>>>>
>>>>>>>>>>> The HW connection looks something like
>>>>>>>>>>> https://pasteboard.co/JfMVVHC.jpg
>>>>>>>>>>> (usecase2 above),
>>>>>>>>>> I see.
>>>>>>>>>>
>>>>>>>>>>>       all the boards run Linux. The middle board provides NTB
>>>>>>>>>>> functionality and board on either side provides virtio/vhost
>>>>>>>>>>> functionality and
>>>>>>>>>>> transfer data using rpmsg.
>>>
>>> This setup looks really interesting (sometimes, it's really hard to
>>> imagine this in the abstract.)
>>>>>>>>>> So I wonder whether it's worthwhile for a new bus. Can we use
>>>>>>>>>> the existing virtio-bus/drivers? It might work as, except for
>>>>>>>>>> the epf transport, we can introduce a epf "vhost" transport
>>>>>>>>>> driver.
>>>>>>>>> IMHO we'll need two buses one for frontend and other for
>>>>>>>>> backend because the two components can then co-operate/interact
>>>>>>>>> with each other to provide a functionality. Though both will
>>>>>>>>> seemingly provide similar callbacks, they provide symmetrical
>>>>>>>>> or complementary functionality and need not be the same or
>>>>>>>>> identical.
>>>>>>>>>
>>>>>>>>> Having the same bus can also create sequencing issues.
>>>>>>>>>
>>>>>>>>> If you look at virtio_dev_probe() of virtio_bus
>>>>>>>>>
>>>>>>>>> device_features = dev->config->get_features(dev);
>>>>>>>>>
>>>>>>>>> Now if we use same bus for both front-end and back-end, both
>>>>>>>>> will try to get_features when there has been no set_features.
>>>>>>>>> Ideally vhost device should be initialized first with the set
>>>>>>>>> of features it supports. Vhost and virtio should use "status"
>>>>>>>>> and "features" complementarily and not identically.
>>>>>>>> Yes, but there's no need for doing status/features passthrough
>>>>>>>> in epf vhost drivers.
>>>>>>>>
>>>>>>>>> virtio device (or frontend) cannot be initialized before vhost
>>>>>>>>> device (or backend) gets initialized with data such as
>>>>>>>>> features. Similarly vhost (backend)
>>>>>>>>> cannot access virtqueues or buffers before virtio (frontend) sets
>>>>>>>>> VIRTIO_CONFIG_S_DRIVER_OK whereas that requirement is not there
>>>>>>>>> for virtio as the physical memory for virtqueues are created by
>>>>>>>>> virtio (frontend).
>>>>>>>> epf vhost drivers need to implement two devices: vhost(vringh)
>>>>>>>> device and virtio device (which is a mediated device). The
>>>>>>>> vhost(vringh) device is doing feature negotiation with the
>>>>>>>> virtio device via RC/EP or NTB. The virtio device is doing
>>>>>>>> feature negotiation with local virtio drivers. If there's a
>>>>>>>> feature mismatch, epf vhost drivers can do mediation between
>>>>>>>> them.
>>>>>>> Here epf vhost should be initialized with a set of features for
>>>>>>> it to negotiate either as vhost device or virtio device no? Where
>>>>>>> should the initial feature set for epf vhost come from?
>>>>>>
>>>>>> I think it can work as:
>>>>>>
>>>>>> 1) Having an initial features (hard coded in the code) set X in
>>>>>>    epf vhost
>>>>>> 2) Using this X for both virtio device and vhost(vringh) device
>>>>>> 3) local virtio driver will negotiate with virtio device with
>>>>>>    feature set Y
>>>>>> 4) remote virtio driver will negotiate with vringh device with
>>>>>>    feature set Z
>>>>>> 5) mediate between feature Y and feature Z since both Y and Z are
>>>>>>    a subset of X
>>>>>>
>>>>> okay. I'm also thinking if we could have configfs for configuring
>>>>> this. Anyways we could find different approaches of configuring
>>>>> this.
>>>>
>>>>
>>>> Yes, and I think some management API is needed even in the design of
>>>> your "Software Layering". In that figure, rpmsg vhost need some
>>>> pre-set or hard-coded features.
>>>
>>> When I saw the plumbers talk, my first idea was "this needs to be a new
>>> transport". You have some hard-coded or pre-configured features, and
>>> then features are negotiated via a transport-specific means in the
>>> usual way. There's basically an extra/extended layer for this (and
>>> status, and whatever).
>>
>> I think for PCIe root complex to PCIe endpoint communication it's
>> still "Virtio Over PCI Bus", though existing layout cannot be used in
>> this context (find virtio capability will fail for modern interface
>> and loading queue status immediately after writing queue number is not
>> possible for root complex to endpoint communication; setup_vq() in
>> virtio_pci_legacy.c).
> 
> 
> Then you need something that is functional equivalent to virtio PCI
> which is actually the concept of vDPA (e.g vDPA provides alternatives if
> the queue_sel is hard in the EP implementation).

Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
the VHOST driver to configure VHOST device).

struct vdpa_config_ops {
	/* Virtqueue ops */
	int (*set_vq_address)(struct vdpa_device *vdev,
			      u16 idx, u64 desc_area, u64 driver_area,
			      u64 device_area);
	void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
	void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
	void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
			  struct vdpa_callback *cb);
	void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
	bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
			    const struct vdpa_vq_state *state);
	int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
			    struct vdpa_vq_state *state);
	struct vdpa_notification_area
	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
	/* vq irq is not expected to be changed once DRIVER_OK is set */
	int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);

	/* Device ops */
	u32 (*get_vq_align)(struct vdpa_device *vdev);
	u64 (*get_features)(struct vdpa_device *vdev);
	int (*set_features)(struct vdpa_device *vdev, u64 features);
	void (*set_config_cb)(struct vdpa_device *vdev,
			      struct vdpa_callback *cb);
	u16 (*get_vq_num_max)(struct vdpa_device *vdev);
	u32 (*get_device_id)(struct vdpa_device *vdev);
	u32 (*get_vendor_id)(struct vdpa_device *vdev);
	u8 (*get_status)(struct vdpa_device *vdev);
	void (*set_status)(struct vdpa_device *vdev, u8 status);
	void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
			   void *buf, unsigned int len);
	void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
			   const void *buf, unsigned int len);
	u32 (*get_generation)(struct vdpa_device *vdev);

	/* DMA ops */
	int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
	int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
		       u64 pa, u32 perm);
	int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);

	/* Free device resources */
	void (*free)(struct vdpa_device *vdev);
};

+struct vhost_config_ops {
+	int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
+			  unsigned int num_bufs, struct vhost_virtqueue *vqs[],
+			  vhost_vq_callback_t *callbacks[],
+			  const char * const names[]);
+	void (*del_vqs)(struct vhost_dev *vdev);
+	int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
+	int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
+	int (*set_features)(struct vhost_dev *vdev, u64 device_features);
+	int (*set_status)(struct vhost_dev *vdev, u8 status);
+	u8 (*get_status)(struct vhost_dev *vdev);
+};
+
struct virtio_config_ops
I think there's some overlap here and some of the ops try to do the
same thing.

I think it differs in (*set_vq_address)() and (*create_vqs)().
[create_vqs() introduced in struct vhost_config_ops provides
complementary functionality to (*find_vqs)() in struct
virtio_config_ops. It seemingly encapsulates the functionality of
(*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(), ...].

Back to the difference between (*set_vq_address)() and (*create_vqs)():
set_vq_address() directly provides the virtqueue address to the vdpa
device, whereas create_vqs() only provides the parameters of the
virtqueues (such as the number of virtqueues and the number of buffers)
and does not directly provide the address. IMO the backend client drivers
(like net or vhost) shouldn't, and cannot, know by themselves how to
access the vring created on the virtio front-end; the vdpa device/vhost
device should have the logic for that. That lets the client drivers work
with different types of vdpa device/vhost device and access the vring
created by virtio irrespective of whether the vring is accessed via
MMIO, kernel space or user space.
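
To make that split concrete, here is a small userspace sketch in which the
same client code runs on top of two different transports. Only the shape of
the write() op is taken from the vhost_config_ops quoted above; the demo_*
names, the "window" buffer standing in for the remote memory, and the two
transports are assumptions made for illustration. In the kernel, the MMIO
variant would use a helper such as memcpy_toio() instead of the byte loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_vhost_dev;

/* Subset of the quoted vhost_config_ops; only write() is modelled. */
struct demo_vhost_ops {
	int (*write)(struct demo_vhost_dev *vdev, uint64_t vhost_dst,
		     void *src, int len);
};

struct demo_vhost_dev {
	const struct demo_vhost_ops *ops;
	uint8_t *window;       /* stand-in for the memory holding the vring */
	uint64_t window_size;
};

/* Reject writes that would fall outside the window. */
int demo_range_ok(const struct demo_vhost_dev *vdev, uint64_t dst, int len)
{
	return len >= 0 && dst <= vdev->window_size &&
	       (uint64_t)len <= vdev->window_size - dst;
}

/* Transport A: the ring lives in ordinary local memory. */
int demo_local_write(struct demo_vhost_dev *vdev, uint64_t dst,
		     void *src, int len)
{
	if (!demo_range_ok(vdev, dst, len))
		return -EINVAL;
	memcpy(vdev->window + dst, src, (size_t)len);
	return 0;
}

/* Transport B: the ring sits behind an MMIO-like window. */
int demo_mmio_write(struct demo_vhost_dev *vdev, uint64_t dst,
		    void *src, int len)
{
	volatile uint8_t *io;
	const uint8_t *p = src;
	int i;

	if (!demo_range_ok(vdev, dst, len))
		return -EINVAL;
	io = vdev->window + dst;
	for (i = 0; i < len; i++)
		io[i] = p[i];   /* one store per byte, no plain memcpy() */
	return 0;
}

/* Client driver code: it never learns where or how the ring is mapped. */
int demo_client_send(struct demo_vhost_dev *vdev, uint64_t dst,
		     const char *msg)
{
	assert(vdev && vdev->ops && vdev->ops->write);
	return vdev->ops->write(vdev, dst, (void *)msg, (int)strlen(msg) + 1);
}
```

demo_client_send() is identical for both transports; swapping the ops
pointer is the only change needed to move from local memory to the MMIO
window.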

I think vdpa always works with client drivers in userspace and provides
a userspace address for the vring.
> 
> 
>>
>> "Virtio Over NTB" should anyways be a new transport.
>>>
>>> Does that make any sense?
>>
>> yeah, in the approach I used the initial features are hard-coded in
>> vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
>> layer (vhost only for accessing virtio ring and use virtio drivers on
>> both front end and backend), based on the functionality (e.g, rpmsg),
>> the vhost should be configured with features (to be presented to the
>> virtio) and that's why additional layer or APIs will be required.
> 
> 
> A question here, if we go with vhost bus approach, does it mean the
> virtio device can only be implemented in EP's userspace?

The vhost bus approach doesn't impose any restriction on where the
virtio backend device should be created. This series creates two types of
virtio backend device (one for PCIe endpoint and the other for NTB) and
both of these devices are created in the kernel.

Thanks
Kishon

> 
> Thanks
> 
> 
>>>
>>>>
>>>>
>>>>>>>>>> It will have virtqueues but only used for the communication
>>>>>>>>>> between itself and
>>>>>>>>>> upper virtio driver. And it will have vringh queues which
>>>>>>>>>> will be probed by virtio epf transport drivers. And it needs to
>>>>>>>>>> do datacopy between virtqueue and
>>>>>>>>>> vringh queues.
>>>>>>>>>>
>>>>>>>>>> It works like:
>>>>>>>>>>
>>>>>>>>>> virtio drivers <- virtqueue/virtio-bus -> epf vhost drivers <-
>>>>>>>>>> vringh queue/epf>
>>>>>>>>>>
>>>>>>>>>> The advantage is that there's no need for writing new buses
>>>>>>>>>> and drivers.
>>>>>>>>> I think this will work however there is an additional copy
>>>>>>>>> between vringh queue and virtqueue,
>>>>>>>> I think not? E.g in use case 1), if we stick to virtio bus, we
>>>>>>>> will have:
>>>>>>>>
>>>>>>>> virtio-rpmsg (EP) <- virtio ring(1) -> epf vhost driver (EP) <-
>>>>>>>> virtio ring(2) -> virtio pci (RC) <-> virtio rpmsg (RC)
>>>>>>> IIUC epf vhost driver (EP) will access virtio ring(2) using
>>>>>>> vringh?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> And virtio
>>>>>>> ring(2) is created by virtio pci (RC).
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>> What epf vhost driver did is to read from virtio ring(1) about
>>>>>>>> the buffer len and addr and then DMA to Linux(RC)?
>>>>>>> okay, I made some optimization here where vhost-rpmsg using a
>>>>>>> helper writes a buffer from rpmsg's upper layer directly to
>>>>>>> remote Linux (RC) as against here where it has to be first written
>>>>>>> to virtio ring (1).
>>>>>>>
>>>>>>> Thinking how this would look for NTB
>>>>>>> virtio-rpmsg (HOST1) <- virtio ring(1) -> NTB(HOST1) <->
>>>>>>> NTB(HOST2)  <- virtio ring(2) -> virtio-rpmsg (HOST2)
>>>>>>>
>>>>>>> Here the NTB(HOST1) will access the virtio ring(2) using vringh?
>>>>>>
>>>>>> Yes, I think so it needs to use vring to access virtio ring (1) as
>>>>>> well.
>>>>> NTB(HOST1) and virtio ring(1) will be in the same system. So it
>>>>> doesn't have to use vring. virtio ring(1) is by the virtio device
>>>>> the NTB(HOST1) creates.
>>>>
>>>>
>>>> Right.
>>>>
>>>>
>>>>>>> Do you also think this will work seamlessly with virtio_net.c,
>>>>>>> virtio_blk.c?
>>>>>>
>>>>>> Yes.
>>>>> okay, I haven't looked at this but the backend of virtio_blk should
>>>>> access an actual storage device no?
>>>>
>>>>
>>>> Good point, for non-peer device like storage. There's probably no
>>>> need for it to be registered on the virtio bus and it might be better
>>>> to behave as you proposed.
>>>
>>> I might be missing something; but if you expose something as a block
>>> device, it should have something it can access with block reads/writes,
>>> shouldn't it? Of course, that can be a variety of things.
>>>
>>>>
>>>> Just to make sure I understand the design, how is VHOST SCSI expected
>>>> to work in your proposal, does it have a device for file as a backend?
>>>>
>>>>
>>>>>>> I'd like to get clarity on two things in the approach you
>>>>>>> suggested, one is features (since epf vhost should ideally be
>>>>>>> transparent to any virtio driver)
>>>>>>
>>>>>> We can have an array of pre-defined features indexed by
>>>>>> virtio device id in the code.
>>>>>>
>>>>>>> and the other is how certain inputs to virtio device such as
>>>>>>> number of buffers be determined.
>>>>>>
>>>>>> We can start from a hard-coded value like 256, or introduce some
>>>>>> API for user to change the value.
>>>>>>
>>>>>>> Thanks again for your suggestions!
>>>>>>
>>>>>> You're welcome.
>>>>>>
>>>>>> Note that I just want to check whether or not we can reuse the
>>>>>> virtio bus/driver. It's something similar to what you proposed in
>>>>>> Software Layering but we just replace "vhost core" with "virtio
>>>>>> bus" and move the vhost core below epf/ntb/platform transport.
>>>>> Got it. My initial design was based on my understanding of your
>>>>> comments [1].
>>>>
>>>>
>>>> Yes, but that's just for a networking device. If we want something
>>>> more generic, it may require more thought (bus etc).
>>>
>>> I believe that we indeed need something bus-like to be able to support
>>> a variety of devices.
>>
>> I think we could still have adapter layers for different types of
>> devices ([1]) and use existing virtio bus for both front end and back
>> end. Using bus-like will however simplify adding support for new types
>> of devices and adding adapters for devices will be slightly more complex.
>>
>> [1] -> Page 13 in
>> https://linuxplumbersconf.org/event/7/contributions/849/attachments/642/1175/Virtio_for_PCIe_RC_EP_NTB.pdf
>>
>>>
>>>>
>>>>
>>>>>
>>>>> I'll try to create something based on your proposed design here.
>>>>
>>>>
>>>> Sure, but for coding, we'd better wait for other's opinion here.
>>>
>>> Please tell me if my thoughts above make any sense... I have just
>>> started looking at that, so I might be completely off.
>>
>> I think your understanding is correct! Thanks for your inputs.
>>
>> Thanks
>> Kishon
>
Jason Wang Sept. 15, 2020, 8:18 a.m. UTC | #22
Hi Kishon:

On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:
>> Then you need something that is functional equivalent to virtio PCI
>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>> the queue_sel is hard in the EP implementation).
> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
> the VHOST driver to configure VHOST device).
>
> struct vdpa_config_ops {
> 	/* Virtqueue ops */
> 	int (*set_vq_address)(struct vdpa_device *vdev,
> 			      u16 idx, u64 desc_area, u64 driver_area,
> 			      u64 device_area);
> 	void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
> 	void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
> 	void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
> 			  struct vdpa_callback *cb);
> 	void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
> 	bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
> 	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
> 			    const struct vdpa_vq_state *state);
> 	int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
> 			    struct vdpa_vq_state *state);
> 	struct vdpa_notification_area
> 	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
> 	/* vq irq is not expected to be changed once DRIVER_OK is set */
> 	int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>
> 	/* Device ops */
> 	u32 (*get_vq_align)(struct vdpa_device *vdev);
> 	u64 (*get_features)(struct vdpa_device *vdev);
> 	int (*set_features)(struct vdpa_device *vdev, u64 features);
> 	void (*set_config_cb)(struct vdpa_device *vdev,
> 			      struct vdpa_callback *cb);
> 	u16 (*get_vq_num_max)(struct vdpa_device *vdev);
> 	u32 (*get_device_id)(struct vdpa_device *vdev);
> 	u32 (*get_vendor_id)(struct vdpa_device *vdev);
> 	u8 (*get_status)(struct vdpa_device *vdev);
> 	void (*set_status)(struct vdpa_device *vdev, u8 status);
> 	void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
> 			   void *buf, unsigned int len);
> 	void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
> 			   const void *buf, unsigned int len);
> 	u32 (*get_generation)(struct vdpa_device *vdev);
>
> 	/* DMA ops */
> 	int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
> 	int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
> 		       u64 pa, u32 perm);
> 	int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>
> 	/* Free device resources */
> 	void (*free)(struct vdpa_device *vdev);
> };
>
> +struct vhost_config_ops {
> +	int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
> +			  unsigned int num_bufs, struct vhost_virtqueue *vqs[],
> +			  vhost_vq_callback_t *callbacks[],
> +			  const char * const names[]);
> +	void (*del_vqs)(struct vhost_dev *vdev);
> +	int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src, int len);
> +	int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int len);
> +	int (*set_features)(struct vhost_dev *vdev, u64 device_features);
> +	int (*set_status)(struct vhost_dev *vdev, u8 status);
> +	u8 (*get_status)(struct vhost_dev *vdev);
> +};
> +
> struct virtio_config_ops
> I think there's some overlap here and some of the ops tries to do the
> same thing.
>
> I think it differs in (*set_vq_address)() and (*create_vqs)().
> [create_vqs() introduced in struct vhost_config_ops provides
> complementary functionality to (*find_vqs)() in struct
> virtio_config_ops. It seemingly encapsulates the functionality of
> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>
> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
> set_vq_address() directly provides the virtqueue address to the vdpa
> device but create_vqs() only provides the parameters of the virtqueue
> (like the number of virtqueues, number of buffers) but does not directly
> provide the address. IMO the backend client drivers (like net or vhost)
> shouldn't/cannot by itself know how to access the vring created on
> virtio front-end. The vdpa device/vhost device should have logic for
> that. That will help the client drivers to work with different types of
> vdpa device/vhost device and can access the vring created by virtio
> irrespective of whether the vring can be accessed via mmio or kernel
> space or user space.
>
> I think vdpa always works with client drivers in userspace and providing
> userspace address for vring.


Sorry for being unclear. What I meant is not replacing vDPA with the 
vhost(bus) you proposed but the possibility of replacing virtio-pci-epf 
with vDPA in:

My question is basically for the part of virtio_pci_epf_send_command(), 
so it looks to me you have a vendor specific API to replace the 
virtio-pci layout of the BAR:


+static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
+                       u32 command)
+{
+    struct virtio_pci_epf *pci_epf;
+    void __iomem *ioaddr;
+    ktime_t timeout;
+    bool timedout;
+    int ret = 0;
+    u8 status;
+
+    pci_epf = to_virtio_pci_epf(vp_dev);
+    ioaddr = vp_dev->ioaddr;
+
+    mutex_lock(&pci_epf->lock);
+    writeb(command, ioaddr + HOST_CMD);
+    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
+    while (1) {
+        timedout = ktime_after(ktime_get(), timeout);
+        status = readb(ioaddr + HOST_CMD_STATUS);
+

Several questions:

- It's not clear to me how the synchronization is done between the RC
and EP, e.g. how and when the value of HOST_CMD_STATUS can be changed.
If you still want to introduce a new transport, a virtio spec patch
would be helpful for us to understand the device API.
- You have your own vendor-specific layout (according to
virtio_pci_epb_table()), so I guess it's better to have a
vendor-specific vDPA driver instead.
- The advantage of a vendor-specific vDPA driver is that it can 1) have
less code and 2) support userspace drivers through vhost-vDPA (instead
of inventing new APIs, since we can't use vfio-pci here).


>>> "Virtio Over NTB" should anyways be a new transport.
>>>> Does that make any sense?
>>> yeah, in the approach I used the initial features are hard-coded in
>>> vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
>>> layer (vhost only for accessing virtio ring and use virtio drivers on
>>> both front end and backend), based on the functionality (e.g, rpmsg),
>>> the vhost should be configured with features (to be presented to the
>>> virtio) and that's why additional layer or APIs will be required.
>> A question here, if we go with vhost bus approach, does it mean the
>> virtio device can only be implemented in EP's userspace?
> The vhost bus approach doesn't provide any restriction in where the
> virto backend device should be created. This series creates two types of
> virtio backend device (one for PCIe endpoint and the other for NTB) and
> both these devices are created in kernel.


Ok.

Thanks


>
> Thanks
> Kishon
>
Kishon Vijay Abraham I Sept. 15, 2020, 3:47 p.m. UTC | #23
Hi Jason,

On 15/09/20 1:48 pm, Jason Wang wrote:
> Hi Kishon:
> 
> On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:
>>> Then you need something that is functional equivalent to virtio PCI
>>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>>> the queue_sel is hard in the EP implementation).
>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>> the VHOST driver to configure VHOST device).
>>
>> struct vdpa_config_ops {
>>     /* Virtqueue ops */
>>     int (*set_vq_address)(struct vdpa_device *vdev,
>>                   u16 idx, u64 desc_area, u64 driver_area,
>>                   u64 device_area);
>>     void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>     void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>     void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>               struct vdpa_callback *cb);
>>     void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>>     bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>     int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>                 const struct vdpa_vq_state *state);
>>     int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>                 struct vdpa_vq_state *state);
>>     struct vdpa_notification_area
>>     (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>     /* vq irq is not expected to be changed once DRIVER_OK is set */
>>     int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>
>>     /* Device ops */
>>     u32 (*get_vq_align)(struct vdpa_device *vdev);
>>     u64 (*get_features)(struct vdpa_device *vdev);
>>     int (*set_features)(struct vdpa_device *vdev, u64 features);
>>     void (*set_config_cb)(struct vdpa_device *vdev,
>>                   struct vdpa_callback *cb);
>>     u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>     u32 (*get_device_id)(struct vdpa_device *vdev);
>>     u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>     u8 (*get_status)(struct vdpa_device *vdev);
>>     void (*set_status)(struct vdpa_device *vdev, u8 status);
>>     void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>                void *buf, unsigned int len);
>>     void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>                const void *buf, unsigned int len);
>>     u32 (*get_generation)(struct vdpa_device *vdev);
>>
>>     /* DMA ops */
>>     int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>>     int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>                u64 pa, u32 perm);
>>     int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>
>>     /* Free device resources */
>>     void (*free)(struct vdpa_device *vdev);
>> };
>>
>> +struct vhost_config_ops {
>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>> +              vhost_vq_callback_t *callbacks[],
>> +              const char * const names[]);
>> +    void (*del_vqs)(struct vhost_dev *vdev);
>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>> int len);
>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>> len);
>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>> +    u8 (*get_status)(struct vhost_dev *vdev);
>> +};
>> +
>> struct virtio_config_ops
>> I think there's some overlap here and some of the ops tries to do the
>> same thing.
>>
>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>> [create_vqs() introduced in struct vhost_config_ops provides
>> complementary functionality to (*find_vqs)() in struct
>> virtio_config_ops. It seemingly encapsulates the functionality of
>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>
>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>> set_vq_address() directly provides the virtqueue address to the vdpa
>> device but create_vqs() only provides the parameters of the virtqueue
>> (like the number of virtqueues, number of buffers) but does not directly
>> provide the address. IMO the backend client drivers (like net or vhost)
>> shouldn't/cannot by itself know how to access the vring created on
>> virtio front-end. The vdpa device/vhost device should have logic for
>> that. That will help the client drivers to work with different types of
>> vdpa device/vhost device and can access the vring created by virtio
>> irrespective of whether the vring can be accessed via mmio or kernel
>> space or user space.
>>
>> I think vdpa always works with client drivers in userspace and providing
>> userspace address for vring.
> 
> 
> Sorry for being unclear. What I meant is not replacing vDPA with the
> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
> with vDPA in:

Okay, so the virtio back-end still uses vhost and the front end should
use vDPA. I see. So the host-side PCI driver for EPF should populate
vdpa_config_ops and invoke vdpa_register_device().
> 
> My question is basically for the part of virtio_pci_epf_send_command(),
> so it looks to me you have a vendor specific API to replace the
> virtio-pci layout of the BAR:

Even when we use vDPA, we have to use some sort of
virtio_pci_epf_send_command() to communicate with the virtio backend,
right?

Right, the layout is slightly different from the standard layout.

This is the layout:
struct epf_vhost_reg_queue {
        u8 cmd;
        u8 cmd_status;
        u16 status;
        u16 num_buffers;
        u16 msix_vector;
        u64 queue_addr;
} __packed;

struct epf_vhost_reg {
        u64 host_features;
        u64 guest_features;
        u16 msix_config;
        u16 num_queues;
        u8 device_status;
        u8 config_generation;
        u32 isr;
        u8 cmd;
        u8 cmd_status;
        struct epf_vhost_reg_queue vq[MAX_VQS];
} __packed;
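
To make the register offsets concrete, here is a minimal userspace sketch
that mirrors the two structs above and computes where a given queue's
queue_addr lands in the BAR. It is only an illustration: MAX_VQS is set to
an assumed value of 8, the kernel u8/u16/u32/u64 types are replaced by
<stdint.h> types, and queue_addr_offset() is a hypothetical helper that is
not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VQS 8 /* assumption: the real value comes from the series */

/* Userspace mirror of the per-queue register block. */
struct epf_vhost_reg_queue {
	uint8_t  cmd;
	uint8_t  cmd_status;
	uint16_t status;
	uint16_t num_buffers;
	uint16_t msix_vector;
	uint64_t queue_addr;
} __attribute__((packed));

/* Userspace mirror of the device-wide register block. */
struct epf_vhost_reg {
	uint64_t host_features;
	uint64_t guest_features;
	uint16_t msix_config;
	uint16_t num_queues;
	uint8_t  device_status;
	uint8_t  config_generation;
	uint32_t isr;
	uint8_t  cmd;
	uint8_t  cmd_status;
	struct epf_vhost_reg_queue vq[MAX_VQS];
} __attribute__((packed));

/* Hypothetical helper: byte offset of vq[idx].queue_addr from the BAR base. */
static size_t queue_addr_offset(unsigned int idx)
{
	assert(idx < MAX_VQS);
	return offsetof(struct epf_vhost_reg, vq) +
	       idx * sizeof(struct epf_vhost_reg_queue) +
	       offsetof(struct epf_vhost_reg_queue, queue_addr);
}
```

With the packed layout as written, the device-wide header is 28 bytes and
each per-queue block is 16 bytes, so queue_addr of queue 0 sits at byte 36,
which is not 8-byte aligned. Whether that matters depends on how the RC
accesses the field.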
> 
> 
> +static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
> +                       u32 command)
> +{
> +    struct virtio_pci_epf *pci_epf;
> +    void __iomem *ioaddr;
> +    ktime_t timeout;
> +    bool timedout;
> +    int ret = 0;
> +    u8 status;
> +
> +    pci_epf = to_virtio_pci_epf(vp_dev);
> +    ioaddr = vp_dev->ioaddr;
> +
> +    mutex_lock(&pci_epf->lock);
> +    writeb(command, ioaddr + HOST_CMD);
> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
> +    while (1) {
> +        timedout = ktime_after(ktime_get(), timeout);
> +        status = readb(ioaddr + HOST_CMD_STATUS);
> +
> 
> Several questions:
> 
> - It's not clear to me how the synchronization is done between the RC
> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.

The HOST_CMD (commands sent to the EP) is serialized using a mutex.
Once the EP reads the command, it resets the value in HOST_CMD. So
HOST_CMD is less likely to be an issue.

A sufficiently long time (1 second) is given for the EP to complete its
operation, during which the EP provides the status in HOST_CMD_STATUS.
After it expires, the RC writes HOST_CMD_STATUS_NONE to HOST_CMD_STATUS.
There could be a case where the EP updates HOST_CMD_STATUS after the RC
writes HOST_CMD_STATUS_NONE, but by then the host has already detected
this as a failure and errored out.
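
The RC-side flow described above can be sketched as a small userspace
simulation. This is not the driver code: the register block is a plain
struct instead of MMIO, the one-second deadline is replaced by a poll
budget (max_ticks), the numeric values of the status codes are assumptions,
and ep_responds()/ep_silent() are hypothetical stand-ins for the endpoint.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed encodings; the series defines the real values. */
enum {
	HOST_CMD_STATUS_NONE  = 0,
	HOST_CMD_STATUS_OKAY  = 1,
	HOST_CMD_STATUS_ERROR = 2,
};

/* Stand-in for the HOST_CMD / HOST_CMD_STATUS registers in the BAR. */
struct fake_regs {
	volatile uint8_t cmd;
	volatile uint8_t cmd_status;
};

/* Called once per poll so a simulated EP can make progress. */
typedef void (*ep_step_fn)(struct fake_regs *regs, unsigned int tick);

/* Simulated EP that consumes the command and reports success on tick 3. */
static void ep_responds(struct fake_regs *regs, unsigned int tick)
{
	if (tick == 3 && regs->cmd) {
		regs->cmd = 0;
		regs->cmd_status = HOST_CMD_STATUS_OKAY;
	}
}

/* Simulated EP that never answers. */
static void ep_silent(struct fake_regs *regs, unsigned int tick)
{
	(void)regs;
	(void)tick;
}

/* Write the command, then poll the status until the budget runs out. */
static int rc_send_command(struct fake_regs *regs, uint8_t command,
			   unsigned int max_ticks, ep_step_fn ep_step)
{
	unsigned int tick;
	uint8_t status;

	assert(regs && ep_step);
	regs->cmd = command;

	for (tick = 0; tick < max_ticks; tick++) {
		ep_step(regs, tick);
		status = regs->cmd_status;
		if (status == HOST_CMD_STATUS_OKAY)
			return 0;
		if (status != HOST_CMD_STATUS_NONE)
			return -EIO;
	}

	/* Deadline expired: RC clears the status and reports a failure. */
	regs->cmd_status = HOST_CMD_STATUS_NONE;
	return -ETIMEDOUT;
}
```

In the sketch the success path leaves HOST_CMD_STATUS untouched; who clears
it after a successful command is a separate design choice.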
 
> If you still want to introduce a new transport, a virtio spec patch
> would be helpful for us to understand the device API.

Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?
> - You have you vendor specific layout (according to
> virtio_pci_epb_table()), so I guess you it's better to have a vendor
> specific vDPA driver instead

Okay, with vDPA, we are free to define our own layouts.
> - The advantage of vendor specific vDPA driver is that it can 1) have
> less codes 2) support userspace drivers through vhost-vDPA (instead of
> inventing new APIs since we can't use vfio-pci here).

I see there's an additional level of indirection from virtio to vDPA and
probably no need for a spec update, but I don't exactly see how it'll
reduce code.

For 2, isn't vhost-vdpa supposed to run on the virtio backend?

From a high level, I think I should be able to use vDPA for
virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
functionality).

Thanks
Kishon
Jason Wang Sept. 16, 2020, 3:10 a.m. UTC | #24
On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 15/09/20 1:48 pm, Jason Wang wrote:
>> Hi Kishon:
>>
>> On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:
>>>> Then you need something that is functional equivalent to virtio PCI
>>>> which is actually the concept of vDPA (e.g vDPA provides alternatives if
>>>> the queue_sel is hard in the EP implementation).
>>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>>> the VHOST driver to configure VHOST device).
>>>
>>> struct vdpa_config_ops {
>>>      /* Virtqueue ops */
>>>      int (*set_vq_address)(struct vdpa_device *vdev,
>>>                    u16 idx, u64 desc_area, u64 driver_area,
>>>                    u64 device_area);
>>>      void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>>      void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>>      void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>>                struct vdpa_callback *cb);
>>>      void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>>>      bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>>      int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>                  const struct vdpa_vq_state *state);
>>>      int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>                  struct vdpa_vq_state *state);
>>>      struct vdpa_notification_area
>>>      (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>>      /* vq irq is not expected to be changed once DRIVER_OK is set */
>>>      int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>>
>>>      /* Device ops */
>>>      u32 (*get_vq_align)(struct vdpa_device *vdev);
>>>      u64 (*get_features)(struct vdpa_device *vdev);
>>>      int (*set_features)(struct vdpa_device *vdev, u64 features);
>>>      void (*set_config_cb)(struct vdpa_device *vdev,
>>>                    struct vdpa_callback *cb);
>>>      u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>>      u32 (*get_device_id)(struct vdpa_device *vdev);
>>>      u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>>      u8 (*get_status)(struct vdpa_device *vdev);
>>>      void (*set_status)(struct vdpa_device *vdev, u8 status);
>>>      void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>>                 void *buf, unsigned int len);
>>>      void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>>                 const void *buf, unsigned int len);
>>>      u32 (*get_generation)(struct vdpa_device *vdev);
>>>
>>>      /* DMA ops */
>>>      int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>>>      int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>>                 u64 pa, u32 perm);
>>>      int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>>
>>>      /* Free device resources */
>>>      void (*free)(struct vdpa_device *vdev);
>>> };
>>>
>>> +struct vhost_config_ops {
>>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>>> +              vhost_vq_callback_t *callbacks[],
>>> +              const char * const names[]);
>>> +    void (*del_vqs)(struct vhost_dev *vdev);
>>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>>> int len);
>>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>>> len);
>>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>>> +    u8 (*get_status)(struct vhost_dev *vdev);
>>> +};
>>> +
>>> struct virtio_config_ops
>>> I think there's some overlap here and some of the ops tries to do the
>>> same thing.
>>>
>>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>>> [create_vqs() introduced in struct vhost_config_ops provides
>>> complementary functionality to (*find_vqs)() in struct
>>> virtio_config_ops. It seemingly encapsulates the functionality of
>>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>>
>>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>>> set_vq_address() directly provides the virtqueue address to the vdpa
>>> device but create_vqs() only provides the parameters of the virtqueue
>>> (like the number of virtqueues, number of buffers) but does not directly
>>> provide the address. IMO the backend client drivers (like net or vhost)
>>> shouldn't/cannot by itself know how to access the vring created on
>>> virtio front-end. The vdpa device/vhost device should have logic for
>>> that. That will help the client drivers to work with different types of
>>> vdpa device/vhost device and can access the vring created by virtio
>>> irrespective of whether the vring can be accessed via mmio or kernel
>>> space or user space.
>>>
>>> I think vdpa always works with client drivers in userspace and providing
>>> userspace address for vring.
>>
>> Sorry for being unclear. What I meant is not replacing vDPA with the
>> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
>> with vDPA in:
> Okay, so the virtio back-end still use vhost and front end should use
> vDPA. I see. So the host side PCI driver for EPF should populate
> vdpa_config_ops and invoke vdpa_register_device().


Yes.


>> My question is basically for the part of virtio_pci_epf_send_command(),
>> so it looks to me you have a vendor specific API to replace the
>> virtio-pci layout of the BAR:
> Even when we use vDPA, we have to use some sort of
> virtio_pci_epf_send_command() to communicate with virtio backend right?


Right.


>
> Right, the layout is slightly different from the standard layout.
>
> This is the layout
> struct epf_vhost_reg_queue {
>          u8 cmd;
>          u8 cmd_status;
>          u16 status;
>          u16 num_buffers;
>          u16 msix_vector;
>          u64 queue_addr;


What's the meaning of queue_addr here?

Does it mean the device expects contiguous memory for the
avail/desc/used rings?


> } __packed;
>
> struct epf_vhost_reg {
>          u64 host_features;
>          u64 guest_features;
>          u16 msix_config;
>          u16 num_queues;
>          u8 device_status;
>          u8 config_generation;
>          u32 isr;
>          u8 cmd;
>          u8 cmd_status;
>          struct epf_vhost_reg_queue vq[MAX_VQS];
> } __packed;
>>
>> +static int virtio_pci_epf_send_command(struct virtio_pci_device *vp_dev,
>> +                       u32 command)
>> +{
>> +    struct virtio_pci_epf *pci_epf;
>> +    void __iomem *ioaddr;
>> +    ktime_t timeout;
>> +    bool timedout;
>> +    int ret = 0;
>> +    u8 status;
>> +
>> +    pci_epf = to_virtio_pci_epf(vp_dev);
>> +    ioaddr = vp_dev->ioaddr;
>> +
>> +    mutex_lock(&pci_epf->lock);
>> +    writeb(command, ioaddr + HOST_CMD);
>> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
>> +    while (1) {
>> +        timedout = ktime_after(ktime_get(), timeout);
>> +        status = readb(ioaddr + HOST_CMD_STATUS);
>> +
>>
>> Several questions:
>>
>> - It's not clear to me how the synchronization is done between the RC
>> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.
> The HOST_CMD (commands sent to the EP) is serialized by using mutex.
> Once the EP reads the command, it resets the value in HOST_CMD. So
> HOST_CMD is less likely an issue.


Here's my understanding of the protocol:

1) RC writes to HOST_CMD
2) RC waits for HOST_CMD_STATUS to be HOST_CMD_STATUS_OKAY

It looks to me that what the EP should do is:

1) EP resets HOST_CMD after reading a new command

And it looks to me that the EP should also reset HOST_CMD_STATUS here?

(I thought there should be a patch to handle things like this, but I
didn't find it in this series)
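
To illustrate the EP side I have in mind, here is a minimal userspace
sketch, not taken from the series: the register block is an ordinary
struct, the status encodings are assumed values, and ep_handle_command()
plus the exec_ok()/exec_fail() callbacks are hypothetical names. A real
handler would also need the appropriate memory barriers around the BAR
accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings; the series defines the real values. */
enum {
	HOST_CMD_NONE         = 0,
	HOST_CMD_STATUS_NONE  = 0,
	HOST_CMD_STATUS_OKAY  = 1,
	HOST_CMD_STATUS_ERROR = 2,
};

/* EP-local view of the command registers exposed through the BAR. */
struct cmd_regs {
	uint8_t cmd;
	uint8_t cmd_status;
};

/* Executes one command; returns 0 on success, non-zero on failure. */
typedef int (*cmd_exec_fn)(uint8_t cmd);

static int exec_ok(uint8_t cmd)   { (void)cmd; return 0; }
static int exec_fail(uint8_t cmd) { (void)cmd; return -1; }

/*
 * Handle at most one pending command.
 * Returns 1 if a command was consumed, 0 if nothing was pending.
 */
static int ep_handle_command(struct cmd_regs *reg, cmd_exec_fn exec)
{
	uint8_t cmd;

	assert(reg && exec);

	cmd = reg->cmd;
	if (cmd == HOST_CMD_NONE)
		return 0;

	/* Acknowledge the command and drop any stale status first. */
	reg->cmd = HOST_CMD_NONE;
	reg->cmd_status = HOST_CMD_STATUS_NONE;

	/* Publish the result only after the command has been executed. */
	reg->cmd_status = exec(cmd) == 0 ? HOST_CMD_STATUS_OKAY
					 : HOST_CMD_STATUS_ERROR;
	return 1;
}
```

Clearing the status at the same point as the command means a result left
over from an earlier command cannot be mistaken for the result of the new
one.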


>
> A sufficiently large time is given for the EP to complete it's operation
> (1 Sec) where the EP provides the status in HOST_CMD_STATUS. After it
> expires, HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS. There could
> be case where EP updates HOST_CMD_STATUS after RC writes
> HOST_CMD_STATUS_NONE, but by then HOST has already detected this as
> failure and error-ed out.
>   
>> If you still want to introduce a new transport, a virtio spec patch
>> would be helpful for us to understand the device API.
> Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?


Yes.


>> - You have you vendor specific layout (according to
>> virtio_pci_epb_table()), so I guess you it's better to have a vendor
>> specific vDPA driver instead
> Okay, with vDPA, we are free to define our own layouts.


Right, but vDPA has other requirements. E.g. it requires the device to
have the ability to save/restore the state (e.g. the last_avail_idx).
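
As a rough illustration of what save/restore means here, the following
userspace mock keeps a 16-bit last_avail_idx per queue and copies it out
and back in. All names (mock_vq, mock_vq_state, mock_get_vq_state(),
mock_set_vq_state(), mock_consume()) are hypothetical and only loosely
modelled on the get_vq_state/set_vq_state ops quoted above; it is not the
vDPA API itself.

```c
#include <assert.h>
#include <stdint.h>

/* Mock of the state that is saved and restored for one virtqueue. */
struct mock_vq_state {
	uint16_t avail_index;
};

/* Mock device-side virtqueue; only the field relevant here. */
struct mock_vq {
	uint16_t last_avail_idx; /* next avail ring entry the device will read */
};

/* Device consumed n more available buffers; the index wraps at 16 bits. */
static void mock_consume(struct mock_vq *vq, uint16_t n)
{
	assert(vq);
	vq->last_avail_idx = (uint16_t)(vq->last_avail_idx + n);
}

/* Save: report where the device currently is in the avail ring. */
static void mock_get_vq_state(const struct mock_vq *vq,
			      struct mock_vq_state *state)
{
	assert(vq && state);
	state->avail_index = vq->last_avail_idx;
}

/* Restore: make the device resume from a previously saved position. */
static int mock_set_vq_state(struct mock_vq *vq,
			     const struct mock_vq_state *state)
{
	assert(vq && state);
	vq->last_avail_idx = state->avail_index;
	return 0;
}
```

A device that cannot report or accept this index cannot resume a queue
where it left off, which is why it is listed as a requirement.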

So it actually depends on what you want. If you don't care about
userspace drivers and want to have a standard transport, you can still
go with virtio.


>> - The advantage of vendor specific vDPA driver is that it can 1) have
>> less codes 2) support userspace drivers through vhost-vDPA (instead of
>> inventing new APIs since we can't use vfio-pci here).
> I see there's an additional level of indirection from virtio to vDPA and
> probably no need for spec update but don't exactly see how it'll reduce
> code.


AFAIK you don't need to implement your own setup_vq and del_vq.


>
> For 2, Isn't vhost-vdpa supposed to run on virtio backend?


Not currently. vDPA is a superset of virtio (e.g. it supports virtqueue
state save/restore). It should probably be possible in the future.


>
>  From a high level, I think I should be able to use vDPA for
> virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
> ([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
> functionality).


I think it's your call. If you

1) want a well-defined standard virtio transport
2) are willing to finalize and maintain the spec
3) don't care about userspace drivers

you can go with virtio; otherwise vDPA.

Thanks


>
> Thanks
> Kishon
>
Kishon Vijay Abraham I Sept. 16, 2020, 11:47 a.m. UTC | #25
Hi Jason,

On 16/09/20 8:40 am, Jason Wang wrote:
> 
> On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:
>> Hi Jason,
>>
>> On 15/09/20 1:48 pm, Jason Wang wrote:
>>> Hi Kishon:
>>>
>>> On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:
>>>>> Then you need something that is functional equivalent to virtio PCI
>>>>> which is actually the concept of vDPA (e.g vDPA provides
>>>>> alternatives if
>>>>> the queue_sel is hard in the EP implementation).
>>>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>>>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>>>> the VHOST driver to configure VHOST device).
>>>>
>>>> struct vdpa_config_ops {
>>>>      /* Virtqueue ops */
>>>>      int (*set_vq_address)(struct vdpa_device *vdev,
>>>>                    u16 idx, u64 desc_area, u64 driver_area,
>>>>                    u64 device_area);
>>>>      void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>>>      void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>>>      void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>>>                struct vdpa_callback *cb);
>>>>      void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool
>>>> ready);
>>>>      bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>>>      int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>                  const struct vdpa_vq_state *state);
>>>>      int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>                  struct vdpa_vq_state *state);
>>>>      struct vdpa_notification_area
>>>>      (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>>>      /* vq irq is not expected to be changed once DRIVER_OK is set */
>>>>      int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>>>
>>>>      /* Device ops */
>>>>      u32 (*get_vq_align)(struct vdpa_device *vdev);
>>>>      u64 (*get_features)(struct vdpa_device *vdev);
>>>>      int (*set_features)(struct vdpa_device *vdev, u64 features);
>>>>      void (*set_config_cb)(struct vdpa_device *vdev,
>>>>                    struct vdpa_callback *cb);
>>>>      u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>>>      u32 (*get_device_id)(struct vdpa_device *vdev);
>>>>      u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>>>      u8 (*get_status)(struct vdpa_device *vdev);
>>>>      void (*set_status)(struct vdpa_device *vdev, u8 status);
>>>>      void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>>>                 void *buf, unsigned int len);
>>>>      void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>>>                 const void *buf, unsigned int len);
>>>>      u32 (*get_generation)(struct vdpa_device *vdev);
>>>>
>>>>      /* DMA ops */
>>>>      int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb
>>>> *iotlb);
>>>>      int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>>>                 u64 pa, u32 perm);
>>>>      int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>>>
>>>>      /* Free device resources */
>>>>      void (*free)(struct vdpa_device *vdev);
>>>> };
>>>>
>>>> +struct vhost_config_ops {
>>>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>>>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>>>> +              vhost_vq_callback_t *callbacks[],
>>>> +              const char * const names[]);
>>>> +    void (*del_vqs)(struct vhost_dev *vdev);
>>>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>>>> int len);
>>>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>>>> len);
>>>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>>>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>>>> +    u8 (*get_status)(struct vhost_dev *vdev);
>>>> +};
>>>> +
>>>> struct virtio_config_ops
>>>> I think there's some overlap here and some of the ops tries to do the
>>>> same thing.
>>>>
>>>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>>>> [create_vqs() introduced in struct vhost_config_ops provides
>>>> complementary functionality to (*find_vqs)() in struct
>>>> virtio_config_ops. It seemingly encapsulates the functionality of
>>>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>>>
>>>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>>>> set_vq_address() directly provides the virtqueue address to the vdpa
>>>> device but create_vqs() only provides the parameters of the virtqueue
>>>> (like the number of virtqueues, number of buffers) but does not
>>>> directly
>>>> provide the address. IMO the backend client drivers (like net or vhost)
>>>> shouldn't/cannot by itself know how to access the vring created on
>>>> virtio front-end. The vdpa device/vhost device should have logic for
>>>> that. That will help the client drivers to work with different types of
>>>> vdpa device/vhost device and can access the vring created by virtio
>>>> irrespective of whether the vring can be accessed via mmio or kernel
>>>> space or user space.
>>>>
>>>> I think vdpa always works with client drivers in userspace and
>>>> providing
>>>> userspace address for vring.
>>>
>>> Sorry for being unclear. What I meant is not replacing vDPA with the
>>> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
>>> with vDPA in:
>> Okay, so the virtio back-end still use vhost and front end should use
>> vDPA. I see. So the host side PCI driver for EPF should populate
>> vdpa_config_ops and invoke vdpa_register_device().
> 
> 
> Yes.
> 
> 
>>> My question is basically for the part of virtio_pci_epf_send_command(),
>>> so it looks to me you have a vendor specific API to replace the
>>> virtio-pci layout of the BAR:
>> Even when we use vDPA, we have to use some sort of
>> virtio_pci_epf_send_command() to communicate with virtio backend right?
> 
> 
> Right.
> 
> 
>>
>> Right, the layout is slightly different from the standard layout.
>>
>> This is the layout
>> struct epf_vhost_reg_queue {
>>          u8 cmd;
>>          u8 cmd_status;
>>          u16 status;
>>          u16 num_buffers;
>>          u16 msix_vector;
>>          u64 queue_addr;
> 
> 
> What's the meaning of queue_addr here?

Using queue_addr, the virtio front-end communicates the address of the
allocated memory for virtqueue to the virtio back-end.
> 
> Does not mean the device expects a contiguous memory for avail/desc/used
> ring?

It's contiguous memory. Isn't this similar to other virtio transports
(both the PCI legacy and modern interfaces)?
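
For reference, this is how a single base address determines all three
areas in the legacy split-ring layout: the offsets follow from the queue
size and the alignment alone. The sketch below is plain userspace C; the
struct vring_layout type and vring_layout_from_base() are hypothetical
names, and the element sizes are the split-ring descriptor (16 bytes) and
used element (8 bytes) sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_SIZE      16u /* 64-bit addr + 32-bit len + 16-bit flags + 16-bit next */
#define USED_ELEM_SIZE  8u /* 32-bit id + 32-bit len */

/* Offsets of the three areas relative to the single queue address. */
struct vring_layout {
	size_t desc_off;
	size_t avail_off;
	size_t used_off;
	size_t total;
};

/* Compute the contiguous split-ring layout for num entries. */
static struct vring_layout vring_layout_from_base(unsigned int num,
						  size_t align)
{
	struct vring_layout l;
	size_t avail_end;

	/* align must be a non-zero power of two */
	assert(align && (align & (align - 1)) == 0);

	l.desc_off = 0;
	l.avail_off = DESC_SIZE * (size_t)num;

	/* avail: flags, idx, ring[num], used_event -> (3 + num) 16-bit words */
	avail_end = l.avail_off + sizeof(uint16_t) * (3 + (size_t)num);

	/* used ring starts at the next align boundary */
	l.used_off = (avail_end + align - 1) & ~(align - 1);

	/* used: flags, idx, avail_event, then ring[num] */
	l.total = l.used_off + sizeof(uint16_t) * 3 +
		  USED_ELEM_SIZE * (size_t)num;
	return l;
}
```

For example, with 256 entries and 4096-byte alignment the descriptor table
occupies the first 4096 bytes, the avail ring follows it, and the used ring
starts at offset 8192.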
> 
> 
>> } __packed;
>>
>> struct epf_vhost_reg {
>>          u64 host_features;
>>          u64 guest_features;
>>          u16 msix_config;
>>          u16 num_queues;
>>          u8 device_status;
>>          u8 config_generation;
>>          u32 isr;
>>          u8 cmd;
>>          u8 cmd_status;
>>          struct epf_vhost_reg_queue vq[MAX_VQS];
>> } __packed;
>>>
>>> +static int virtio_pci_epf_send_command(struct virtio_pci_device
>>> *vp_dev,
>>> +                       u32 command)
>>> +{
>>> +    struct virtio_pci_epf *pci_epf;
>>> +    void __iomem *ioaddr;
>>> +    ktime_t timeout;
>>> +    bool timedout;
>>> +    int ret = 0;
>>> +    u8 status;
>>> +
>>> +    pci_epf = to_virtio_pci_epf(vp_dev);
>>> +    ioaddr = vp_dev->ioaddr;
>>> +
>>> +    mutex_lock(&pci_epf->lock);
>>> +    writeb(command, ioaddr + HOST_CMD);
>>> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
>>> +    while (1) {
>>> +        timedout = ktime_after(ktime_get(), timeout);
>>> +        status = readb(ioaddr + HOST_CMD_STATUS);
>>> +
>>>
>>> Several questions:
>>>
>>> - It's not clear to me how the synchronization is done between the RC
>>> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.
>> The HOST_CMD (commands sent to the EP) is serialized by using mutex.
>> Once the EP reads the command, it resets the value in HOST_CMD. So
>> HOST_CMD is less likely an issue.
> 
> 
> Here's my understanding of the protocol:
> 
> 1) RC write to HOST_CMD
> 2) RC wait for HOST_CMD_STATUS to be HOST_CMD_STATUS_OKAY

That's right!
> 
> It looks to me what EP should do is
> 
> 1) EP reset HOST_CMD after reading new command

That's right! It does.
> 
> And it looks to me EP should also reset HOST_CMD_STATUS here?

Yeah, that would require the RC to send another command to reset the
status. I didn't see it being required in the normal scenario, but it's
good to add this.
> 
> (I thought there should be patch to handle stuffs like this but I didn't
> find it in this series)

This is added in [RFC PATCH 19/22] PCI: endpoint: Add EP function driver
to provide VHOST interface

pci_epf_vhost_cmd_handler() gets commands from RC using "reg->cmd;". On
the EP side, it is local memory access (mapped to BAR memory exposed to
the host) and hence accessed using structure member access.
> 
> 
>>
>> A sufficiently large time is given for the EP to complete it's operation
>> (1 Sec) where the EP provides the status in HOST_CMD_STATUS. After it
>> expires, HOST_CMD_STATUS_NONE is written to HOST_CMD_STATUS. There could
>> be case where EP updates HOST_CMD_STATUS after RC writes
>> HOST_CMD_STATUS_NONE, but by then HOST has already detected this as
>> failure and error-ed out.
>>  
>>> If you still want to introduce a new transport, a virtio spec patch
>>> would be helpful for us to understand the device API.
>> Okay, that should be on https://github.com/oasis-tcs/virtio-spec.git?
> 
> 
> Yes.
> 
> 
>>> - You have your vendor specific layout (according to
>>> virtio_pci_epb_table()), so I guess it's better to have a vendor
>>> specific vDPA driver instead
>> Okay, with vDPA, we are free to define our own layouts.
> 
> 
> Right, but vDPA has other requirements. E.g. it requires the device to have
> the ability to save/restore the state (e.g. the last_avail_idx).
> 
> So it actually depends on what you want. If you don't care about
> userspace drivers and want to have a standard transport, you can still
> go virtio.

okay.
> 
> 
>>> - The advantage of a vendor specific vDPA driver is that it can 1) have
>>> less code 2) support userspace drivers through vhost-vDPA (instead of
>>> inventing new APIs since we can't use vfio-pci here).
>> I see there's an additional level of indirection from virtio to vDPA and
>> probably no need for spec update but don't exactly see how it'll reduce
>> code.
> 
> 
> AFAIK you don't need to implement your own setup_vq and del_vq.
> 
There should still be some entity that allocates memory for virtqueues
and then communicate this address to the backend.

Maybe I have to look into this further.
> 
>>
>> For 2, Isn't vhost-vdpa supposed to run on virtio backend?
> 
> 
> Not currently, vDPA is a superset of virtio (e.g. it supports virtqueue
> state save/restore). This should be possible in the future probably.
> 
> 
>>
>>  From a high level, I think I should be able to use vDPA for
>> virtio_pci_epf.c. Would you also suggest using vDPA for ntb_virtio.c?
>> ([RFC PATCH 20/22] NTB: Add a new NTB client driver to implement VIRTIO
>> functionality).
> 
> 
> I think it's your call. If you want
> 
> 1) a well-defined standard virtio transport
> 2) willing to finalize and maintain the spec
> 3) doesn't care about userspace drivers

IIUC, we can use vDPA (virtio_vdpa.c) but still don't need userspace
drivers right?
> 
> You can go with virtio, otherwise vDPA.

Okay, let me see. Thanks for your inputs.

Best Regards,
Kishon
Jason Wang Sept. 18, 2020, 4:04 a.m. UTC | #26
On 2020/9/16 7:47 PM, Kishon Vijay Abraham I wrote:
> Hi Jason,
>
> On 16/09/20 8:40 am, Jason Wang wrote:
>> On 2020/9/15 11:47 PM, Kishon Vijay Abraham I wrote:
>>> Hi Jason,
>>>
>>> On 15/09/20 1:48 pm, Jason Wang wrote:
>>>> Hi Kishon:
>>>>
>>>> On 2020/9/14 3:23 PM, Kishon Vijay Abraham I wrote:
>>>>>> Then you need something that is functional equivalent to virtio PCI
>>>>>> which is actually the concept of vDPA (e.g vDPA provides
>>>>>> alternatives if
>>>>>> the queue_sel is hard in the EP implementation).
>>>>> Okay, I just tried to compare the 'struct vdpa_config_ops' and 'struct
>>>>> vhost_config_ops' ( introduced in [RFC PATCH 03/22] vhost: Add ops for
>>>>> the VHOST driver to configure VHOST device).
>>>>>
>>>>> struct vdpa_config_ops {
>>>>>       /* Virtqueue ops */
>>>>>       int (*set_vq_address)(struct vdpa_device *vdev,
>>>>>                     u16 idx, u64 desc_area, u64 driver_area,
>>>>>                     u64 device_area);
>>>>>       void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>>>>>       void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>>>>>       void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>>>>>                 struct vdpa_callback *cb);
>>>>>       void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool
>>>>> ready);
>>>>>       bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>>>>>       int (*set_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>>                   const struct vdpa_vq_state *state);
>>>>>       int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>>                   struct vdpa_vq_state *state);
>>>>>       struct vdpa_notification_area
>>>>>       (*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>>>>       /* vq irq is not expected to be changed once DRIVER_OK is set */
>>>>>       int (*get_vq_irq)(struct vdpa_device *vdv, u16 idx);
>>>>>
>>>>>       /* Device ops */
>>>>>       u32 (*get_vq_align)(struct vdpa_device *vdev);
>>>>>       u64 (*get_features)(struct vdpa_device *vdev);
>>>>>       int (*set_features)(struct vdpa_device *vdev, u64 features);
>>>>>       void (*set_config_cb)(struct vdpa_device *vdev,
>>>>>                     struct vdpa_callback *cb);
>>>>>       u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>>>>>       u32 (*get_device_id)(struct vdpa_device *vdev);
>>>>>       u32 (*get_vendor_id)(struct vdpa_device *vdev);
>>>>>       u8 (*get_status)(struct vdpa_device *vdev);
>>>>>       void (*set_status)(struct vdpa_device *vdev, u8 status);
>>>>>       void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>>>>>                  void *buf, unsigned int len);
>>>>>       void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>>>>>                  const void *buf, unsigned int len);
>>>>>       u32 (*get_generation)(struct vdpa_device *vdev);
>>>>>
>>>>>       /* DMA ops */
>>>>>       int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb
>>>>> *iotlb);
>>>>>       int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
>>>>>                  u64 pa, u32 perm);
>>>>>       int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
>>>>>
>>>>>       /* Free device resources */
>>>>>       void (*free)(struct vdpa_device *vdev);
>>>>> };
>>>>>
>>>>> +struct vhost_config_ops {
>>>>> +    int (*create_vqs)(struct vhost_dev *vdev, unsigned int nvqs,
>>>>> +              unsigned int num_bufs, struct vhost_virtqueue *vqs[],
>>>>> +              vhost_vq_callback_t *callbacks[],
>>>>> +              const char * const names[]);
>>>>> +    void (*del_vqs)(struct vhost_dev *vdev);
>>>>> +    int (*write)(struct vhost_dev *vdev, u64 vhost_dst, void *src,
>>>>> int len);
>>>>> +    int (*read)(struct vhost_dev *vdev, void *dst, u64 vhost_src, int
>>>>> len);
>>>>> +    int (*set_features)(struct vhost_dev *vdev, u64 device_features);
>>>>> +    int (*set_status)(struct vhost_dev *vdev, u8 status);
>>>>> +    u8 (*get_status)(struct vhost_dev *vdev);
>>>>> +};
>>>>> +
>>>>> struct virtio_config_ops
>>>>> I think there's some overlap here and some of the ops tries to do the
>>>>> same thing.
>>>>>
>>>>> I think it differs in (*set_vq_address)() and (*create_vqs)().
>>>>> [create_vqs() introduced in struct vhost_config_ops provides
>>>>> complementary functionality to (*find_vqs)() in struct
>>>>> virtio_config_ops. It seemingly encapsulates the functionality of
>>>>> (*set_vq_address)(), (*set_vq_num)(), (*set_vq_cb)(),..].
>>>>>
>>>>> Back to the difference between (*set_vq_address)() and (*create_vqs)(),
>>>>> set_vq_address() directly provides the virtqueue address to the vdpa
>>>>> device but create_vqs() only provides the parameters of the virtqueue
>>>>> (like the number of virtqueues, number of buffers) but does not
>>>>> directly
>>>>> provide the address. IMO the backend client drivers (like net or vhost)
>>>>> shouldn't/cannot by itself know how to access the vring created on
>>>>> virtio front-end. The vdpa device/vhost device should have logic for
>>>>> that. That will help the client drivers to work with different types of
>>>>> vdpa device/vhost device and can access the vring created by virtio
>>>>> irrespective of whether the vring can be accessed via mmio or kernel
>>>>> space or user space.
>>>>>
>>>>> I think vdpa always works with client drivers in userspace and
>>>>> providing
>>>>> userspace address for vring.
>>>> Sorry for being unclear. What I meant is not replacing vDPA with the
>>>> vhost(bus) you proposed but the possibility of replacing virtio-pci-epf
>>>> with vDPA in:
>>> Okay, so the virtio back-end still use vhost and front end should use
>>> vDPA. I see. So the host side PCI driver for EPF should populate
>>> vdpa_config_ops and invoke vdpa_register_device().
>>
>> Yes.
>>
>>
>>>> My question is basically for the part of virtio_pci_epf_send_command(),
>>>> so it looks to me you have a vendor specific API to replace the
>>>> virtio-pci layout of the BAR:
>>> Even when we use vDPA, we have to use some sort of
>>> virtio_pci_epf_send_command() to communicate with virtio backend right?
>>
>> Right.
>>
>>
>>> Right, the layout is slightly different from the standard layout.
>>>
>>> This is the layout
>>> struct epf_vhost_reg_queue {
>>>           u8 cmd;
>>>           u8 cmd_status;
>>>           u16 status;
>>>           u16 num_buffers;
>>>           u16 msix_vector;
>>>           u64 queue_addr;
>>
>> What's the meaning of queue_addr here?
> Using queue_addr, the virtio front-end communicates the address of the
> allocated memory for virtqueue to the virtio back-end.
>> Does that mean the device expects contiguous memory for the avail/desc/used
>> rings?
> It's contiguous memory. Isn't this similar to other virtio transport
> (both PCI legacy and modern interface)?.


That's only for legacy devices; for modern devices we don't have such a
restriction.
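
To make the distinction concrete, here is a minimal sketch of how a
legacy-style contiguous layout derives all three ring addresses from a
single base such as queue_addr. The struct and function names are
hypothetical, and the sketch assumes queue_addr is itself aligned to
align. With the modern interface (and with set_vq_address() in
vdpa_config_ops quoted below), the driver instead passes the three
areas as independent addresses.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_SIZE      16u /* bytes per descriptor table entry */
#define USED_ELEM_SIZE 8u  /* bytes per used ring element */

/* Hypothetical helper type: addresses derived from one base. */
struct legacy_vring_layout {
	uint64_t desc;  /* descriptor table */
	uint64_t avail; /* available (driver) ring */
	uint64_t used;  /* used (device) ring */
	uint64_t total; /* bytes spanned by the whole vring */
};

static struct legacy_vring_layout
legacy_vring_compute(uint64_t queue_addr, uint32_t num, uint64_t align)
{
	struct legacy_vring_layout l;

	/* align must be a non-zero power of two */
	assert(align != 0 && (align & (align - 1)) == 0);

	l.desc = queue_addr;
	l.avail = l.desc + (uint64_t)DESC_SIZE * num;
	/* avail ring: flags + idx + ring[num] + used_event, 2 bytes each */
	l.used = (l.avail + 2u * (3u + num) + align - 1) & ~(align - 1);
	/* used ring: flags + idx + avail_event, then num elements */
	l.total = (l.used - queue_addr) + 2u * 3u
		  + (uint64_t)USED_ELEM_SIZE * num;
	return l;
}
```

For example, with queue_addr = 0x10000, num = 256 and align = 4096, the
descriptor table starts at 0x10000, the avail ring at 0x11000 and the
used ring at 0x12000.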


>>
>>> } __packed;
>>>
>>> struct epf_vhost_reg {
>>>           u64 host_features;
>>>           u64 guest_features;
>>>           u16 msix_config;
>>>           u16 num_queues;
>>>           u8 device_status;
>>>           u8 config_generation;
>>>           u32 isr;
>>>           u8 cmd;
>>>           u8 cmd_status;
>>>           struct epf_vhost_reg_queue vq[MAX_VQS];
>>> } __packed;
>>>> +static int virtio_pci_epf_send_command(struct virtio_pci_device
>>>> *vp_dev,
>>>> +                       u32 command)
>>>> +{
>>>> +    struct virtio_pci_epf *pci_epf;
>>>> +    void __iomem *ioaddr;
>>>> +    ktime_t timeout;
>>>> +    bool timedout;
>>>> +    int ret = 0;
>>>> +    u8 status;
>>>> +
>>>> +    pci_epf = to_virtio_pci_epf(vp_dev);
>>>> +    ioaddr = vp_dev->ioaddr;
>>>> +
>>>> +    mutex_lock(&pci_epf->lock);
>>>> +    writeb(command, ioaddr + HOST_CMD);
>>>> +    timeout = ktime_add_ms(ktime_get(), COMMAND_TIMEOUT);
>>>> +    while (1) {
>>>> +        timedout = ktime_after(ktime_get(), timeout);
>>>> +        status = readb(ioaddr + HOST_CMD_STATUS);
>>>> +
>>>>
>>>> Several questions:
>>>>
>>>> - It's not clear to me how the synchronization is done between the RC
>>>> and EP. E.g how and when the value of HOST_CMD_STATUS can be changed.
>>> The HOST_CMD (commands sent to the EP) is serialized by using mutex.
>>> Once the EP reads the command, it resets the value in HOST_CMD. So
>>> HOST_CMD is less likely an issue.
>>
>> Here's my understanding of the protocol:
>>
>> 1) RC write to HOST_CMD
>> 2) RC wait for HOST_CMD_STATUS to be HOST_CMD_STATUS_OKAY
> That's right!
>> It looks to me what EP should do is
>>
>> 1) EP reset HOST_CMD after reading new command
> That's right! It does.
>> And it looks to me EP should also reset HOST_CMD_STATUS here?
> yeah, that would require RC to send another command to reset the status.
> Didn't see it required in the normal scenario but good to add this.
>> (I thought there should be patch to handle stuffs like this but I didn't
>> find it in this series)
> This is added in [RFC PATCH 19/22] PCI: endpoint: Add EP function driver
> to provide VHOST interface
>
> pci_epf_vhost_cmd_handler() gets commands from RC using "reg->cmd;". On
> the EP side, it is local memory access (mapped to BAR memory exposed to
> the host) and hence accessed using structure member access.


Thanks for the pointer, will have a look at it. I think this part needs
to be carefully designed, and it is the key to the success of the epf
transport.