diff mbox series

[ovs-dev,ovs,v3,2/2] netdev-dpdk: Add dpdkvdpa port

Message ID 1571311016-38066-3-git-send-email-noae@mellanox.com
State Superseded
Headers show
Series Introduce dpdkvdpa netdev | expand

Commit Message

Noa Ezra Oct. 17, 2019, 11:16 a.m. UTC
dpdkvdpa netdev works with 3 components:
vhost-user socket, vdpa device: real vdpa device or a VF and
representor of "vdpa device".

In order to add a new vDPA port, add a new port to existing bridge
with type dpdkvdpa and vDPA options:
ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
   options:vdpa-socket-path=<sock path>
   options:vdpa-accelerator-devargs=<VF pci id>
   options:dpdk-devargs=<vdpa pci id>,representor=[id]

On this command OVS will create a new netdev:
1. Register vhost-user-client device.
2. Open and configure VF dpdk port.
3. Open and configure representor dpdk port.

The new netdev will use netdev_rxq_recv() function in order to receive
packets from VF and push to vhost-user and receive packets from
vhost-user and push to VF.

Signed-off-by: Noa Ezra <noae@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
---
 Documentation/automake.mk           |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
 NEWS                                |   1 +
 lib/netdev-dpdk.c                   | 162 ++++++++++++++++++++++++++++++++++++
 vswitchd/vswitch.xml                |  25 ++++++
 6 files changed, 280 insertions(+)
 create mode 100644 Documentation/topics/dpdk/vdpa.rst

Comments

William Tu Oct. 17, 2019, 9:33 p.m. UTC | #1
On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:

Hi Noa,

Thanks for the patch. I'm new to this and have a question below.

> dpdkvdpa netdev works with 3 components:
> vhost-user socket, vdpa device: real vdpa device or a VF and
> representor of "vdpa device".
> 
> In order to add a new vDPA port, add a new port to existing bridge
> with type dpdkvdpa and vDPA options:
> ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
>    options:vdpa-socket-path=<sock path>
>    options:vdpa-accelerator-devargs=<VF pci id>
>    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> 
> On this command OVS will create a new netdev:
> 1. Register vhost-user-client device.
> 2. Open and configure VF dpdk port.
> 3. Open and configure representor dpdk port.
> 
> The new netdev will use netdev_rxq_recv() function in order to receive
> packets from VF and push to vhost-user and receive packets from
> vhost-user and push to VF.

So does OVS in this case is able to apply OpenFlow rules on packets?

When netdev_dpdk_vdpa_rxq_recv() is invoked, does the batch of packets
go into OVS's parse, lookup, action pipeline? Or all packets go directly
into VM if (VF -> VM) and vice versa?

Is
fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
forward packets from vhost-user to VF
and 
ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
forward packets from vhost-user to VM?

Thanks
William
Noa Ezra Oct. 22, 2019, 7:17 a.m. UTC | #2
Hi,
Please see the answer below.

Thanks,
Noa.

> -----Original Message-----
> From: William Tu [mailto:u9012063@gmail.com]
> Sent: Friday, October 18, 2019 12:34 AM
> To: Noa Ezra <noae@mellanox.com>
> Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> <ameerm@mellanox.com>; Eli Britstein <elibr@mellanox.com>
> Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port
> 
> On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> 
> Hi Noa,
> 
> Thanks for the patch. I'm new to this and have a question below.
> 
> > dpdkvdpa netdev works with 3 components:
> > vhost-user socket, vdpa device: real vdpa device or a VF and
> > representor of "vdpa device".
> >
> > In order to add a new vDPA port, add a new port to existing bridge
> > with type dpdkvdpa and vDPA options:
> > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> >    options:vdpa-socket-path=<sock path>
> >    options:vdpa-accelerator-devargs=<VF pci id>
> >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> >
> > On this command OVS will create a new netdev:
> > 1. Register vhost-user-client device.
> > 2. Open and configure VF dpdk port.
> > 3. Open and configure representor dpdk port.
> >
> > The new netdev will use netdev_rxq_recv() function in order to receive
> > packets from VF and push to vhost-user and receive packets from
> > vhost-user and push to VF.
> 
> So does OVS in this case is able to apply OpenFlow rules on packets?
> 
> When netdev_dpdk_vdpa_rxq_recv() is invoked, does the batch of packets
> go into OVS's parse, lookup, action pipeline? Or all packets go directly into
> VM if (VF -> VM) and vice versa?
> 
> Is
> fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
> forward packets from vhost-user to VF and ret =
> netdev_dpdk_rxq_recv(rxq, batch, qfill); forward packets from vhost-user to
> VM?

I hope that I understand your question correctly, the netdev_dpdk_vdpa_rxq_recv forwards packets from VM to VF and vice versa.
There is no change in the processing of the packet between VF and up-link and no change in the packet's header.
The new netdev only translate between SR-IOV (phy) VF to virtIO VM.

> Thanks
> William
William Tu Oct. 22, 2019, 3:53 p.m. UTC | #3
On Tue, Oct 22, 2019 at 12:17 AM Noa Ezra <noae@mellanox.com> wrote:
>
> Hi,
> Please see the answer below.
>
> Thanks,
> Noa.
>
> > -----Original Message-----
> > From: William Tu [mailto:u9012063@gmail.com]
> > Sent: Friday, October 18, 2019 12:34 AM
> > To: Noa Ezra <noae@mellanox.com>
> > Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> > Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> > <ameerm@mellanox.com>; Eli Britstein <elibr@mellanox.com>
> > Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port
> >
> > On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> >
> > Hi Noa,
> >
> > Thanks for the patch. I'm new to this and have a question below.
> >
> > > dpdkvdpa netdev works with 3 components:
> > > vhost-user socket, vdpa device: real vdpa device or a VF and
> > > representor of "vdpa device".
> > >
> > > In order to add a new vDPA port, add a new port to existing bridge
> > > with type dpdkvdpa and vDPA options:
> > > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> > >    options:vdpa-socket-path=<sock path>
> > >    options:vdpa-accelerator-devargs=<VF pci id>
> > >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> > >
> > > On this command OVS will create a new netdev:
> > > 1. Register vhost-user-client device.
> > > 2. Open and configure VF dpdk port.
> > > 3. Open and configure representor dpdk port.
> > >
> > > The new netdev will use netdev_rxq_recv() function in order to receive
> > > packets from VF and push to vhost-user and receive packets from
> > > vhost-user and push to VF.
> >
> > So does OVS in this case is able to apply OpenFlow rules on packets?
> >
> > When netdev_dpdk_vdpa_rxq_recv() is invoked, does the batch of packets
> > go into OVS's parse, lookup, action pipeline? Or all packets go directly into
> > VM if (VF -> VM) and vice versa?
> >
> > Is
> > fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
> > forward packets from vhost-user to VF and ret =
> > netdev_dpdk_rxq_recv(rxq, batch, qfill); forward packets from vhost-user to
> > VM?
>
> I hope that I understand your question correctly, the netdev_dpdk_vdpa_rxq_recv forwards packets from VM to VF and vice versa.
> There is no change in the processing of the packet between VF and up-link and no change in the packet's header.
> The new netdev only translate between SR-IOV (phy) VF to virtIO VM.
>

Hi Noa,

Thank you for your reply.
So does the netdev_dpdk_rxq_recv below also forwards packets?
 netdev_dpdk_rxq_recv(rxq, batch, qfill);

Am I able to modify packet content by accessing batch->packets[i]?

Regards,
William
William Tu Oct. 23, 2019, 11 p.m. UTC | #4
Hi Noa,

I have a couple more questions. I'm still at the learning stage of this
new feature, thanks in advance for your patience.

On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> dpdkvdpa netdev works with 3 components:
> vhost-user socket, vdpa device: real vdpa device or a VF and
> representor of "vdpa device".

What NIC card support this feature?
I don't have real vdpa device, can I use Intel X540 VF feature?

> 
> In order to add a new vDPA port, add a new port to existing bridge
> with type dpdkvdpa and vDPA options:
> ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
>    options:vdpa-socket-path=<sock path>
>    options:vdpa-accelerator-devargs=<VF pci id>
>    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> 
> On this command OVS will create a new netdev:
> 1. Register vhost-user-client device.
> 2. Open and configure VF dpdk port.
> 3. Open and configure representor dpdk port.
> 
> The new netdev will use netdev_rxq_recv() function in order to receive
> packets from VF and push to vhost-user and receive packets from
> vhost-user and push to VF.
> 
> Signed-off-by: Noa Ezra <noae@mellanox.com>
> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> ---
>  Documentation/automake.mk           |   1 +
>  Documentation/topics/dpdk/index.rst |   1 +
>  Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
>  NEWS                                |   1 +
>  lib/netdev-dpdk.c                   | 162 ++++++++++++++++++++++++++++++++++++
>  vswitchd/vswitch.xml                |  25 ++++++
>  6 files changed, 280 insertions(+)
>  create mode 100644 Documentation/topics/dpdk/vdpa.rst
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index cd68f3b..ee574bc 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -43,6 +43,7 @@ DOC_SOURCE = \
>  	Documentation/topics/dpdk/ring.rst \
>  	Documentation/topics/dpdk/vdev.rst \
>  	Documentation/topics/dpdk/vhost-user.rst \
> +	Documentation/topics/dpdk/vdpa.rst \
>  	Documentation/topics/fuzzing/index.rst \
>  	Documentation/topics/fuzzing/what-is-fuzzing.rst \
>  	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
> diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
> index cf24a7b..c1d4ea7 100644
> --- a/Documentation/topics/dpdk/index.rst
> +++ b/Documentation/topics/dpdk/index.rst
> @@ -41,3 +41,4 @@ The DPDK Datapath
>     /topics/dpdk/pdump
>     /topics/dpdk/jumbo-frames
>     /topics/dpdk/memory
> +   /topics/dpdk/vdpa
> diff --git a/Documentation/topics/dpdk/vdpa.rst b/Documentation/topics/dpdk/vdpa.rst
> new file mode 100644
> index 0000000..34c5300
> --- /dev/null
> +++ b/Documentation/topics/dpdk/vdpa.rst
> @@ -0,0 +1,90 @@
> +..
> +      Copyright (c) 2019 Mellanox Technologies, Ltd.
> +
> +      Licensed under the Apache License, Version 2.0 (the "License");
> +      you may not use this file except in compliance with the License.
> +      You may obtain a copy of the License at:
> +
> +          http://www.apache.org/licenses/LICENSE-2.0
> +
> +      Unless required by applicable law or agreed to in writing, software
> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> +      License for the specific language governing permissions and limitations
> +      under the License.
> +
> +      Convention for heading levels in Open vSwitch documentation:
> +
> +      =======  Heading 0 (reserved for the title in a document)
> +      -------  Heading 1
> +      ~~~~~~~  Heading 2
> +      +++++++  Heading 3
> +      '''''''  Heading 4
> +
> +      Avoid deeper levels because they do not render well.
> +
> +
> +===============
> +DPDK VDPA Ports
> +===============
> +
> +In user space there are two main approaches to communicate with a guest (VM),
> +using virtIO ports (e.g. netdev type=dpdkvhoshuser/dpdkvhostuserclient) or
> +SR-IOV using phy ports (e.g. netdev type = dpdk).
> +Phy ports allow working with port representor which is attached to the OVS and
> +a matching VF is given with pass-through to the guest.
> +HW rules can process packets from up-link and direct them to the VF without
> +going through SW (OVS) and therefore using phy ports gives the best
> +performance.
> +However, SR-IOV architecture requires that the guest will use a driver which is
> +specific to the underlying HW. Specific HW driver has two main drawbacks:
> +1. Breaks virtualization in some sense (guest aware of the HW), can also limit
> +the type of images supported.
> +2. Less natural support for live migration.
> +
> +Using virtIO port solves both problems, but reduces performance and causes
> +losing of some functionality, for example, for some HW offload, working
> +directly with virtIO cannot be supported.
> +
> +We created a new netdev type- dpdkvdpa. dpdkvdpa port solves this conflict.
> +The new netdev is basically very similar to regular dpdk netdev but it has some
> +additional functionally.
> +This port translates between phy port to virtIO port, it takes packets from
> +rx-queue and send them to the suitable tx-queue and allows to transfer packets
> +from virtIO guest (VM) to a VF and vice versa and benefit both SR-IOV and
> +virtIO.
> +
> +Quick Example
> +-------------
> +
> +Configure OVS bridge and ports
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +you must first create a bridge and add ports to the switch.
> +Since the dpdkvdpa port is configured as a client, the vdpa-socket-path must be
> +configured by the user.
> +VHOST_USER_SOCKET_PATH=/path/to/socket
> +
> +    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
> +    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
> +    type=dpdk options:dpdk-devargs=<pf pci id>

Is adding pf port to br0 necessary?

> +    $ ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
> +    options:vdpa-socket-path=VHOST_USER_SOCKET_PATH \
> +    options:vdpa-accelerator-devargs=<vf pci id> \
> +    options:dpdk-devargs=<pf pci id>,representor=[id]
> +
> +Once the ports have been added to the switch, they must be added to the guest.
> +
> +Adding vhost-user ports to the guest (QEMU)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Attach the vhost-user device sockets to the guest. To do this, you must pass
> +the following parameters to QEMU:
> +
> +    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
> +    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> +    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> +
> +QEMU will wait until the port is created successfully in OVS to boot the VM.
> +In this mode, in case the switch will crash, the vHost ports will reconnect
> +automatically once it is brought back.
> diff --git a/NEWS b/NEWS
> index f5a0b8f..6f315c6 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -542,6 +542,7 @@ v2.6.0 - 27 Sep 2016
>       * Remove dpdkvhostcuse port type.
>       * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
>       * 'dpdkvhostuserclient' port type.
> +     * 'dpdkvdpa' port type.
>     - Increase number of registers to 16.
>     - ovs-benchmark: This utility has been removed due to lack of use and
>       bitrot.
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index bc20d68..16ddf58 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -47,6 +47,7 @@
>  #include "dpif-netdev.h"
>  #include "fatal-signal.h"
>  #include "netdev-provider.h"
> +#include "netdev-dpdk-vdpa.h"
>  #include "netdev-vport.h"
>  #include "odp-util.h"
>  #include "openvswitch/dynamic-string.h"
> @@ -137,6 +138,9 @@ typedef uint16_t dpdk_port_t;
>  /* Legacy default value for vhost tx retries. */
>  #define VHOST_ENQ_RETRY_DEF 8
>  
> +/* Size of VDPA custom stats. */
> +#define VDPA_CUSTOM_STATS_SIZE          4
> +
>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>  
>  static const struct rte_eth_conf port_conf = {
> @@ -461,6 +465,8 @@ struct netdev_dpdk {
>          int rte_xstats_ids_size;
>          uint64_t *rte_xstats_ids;
>      );
> +
> +    struct netdev_dpdk_vdpa_relay *relay;
>  };
>  
>  struct netdev_rxq_dpdk {
> @@ -1346,6 +1352,30 @@ netdev_dpdk_construct(struct netdev *netdev)
>      return err;
>  }
>  
> +static int
> +netdev_dpdk_vdpa_construct(struct netdev *netdev)
> +{
> +    struct netdev_dpdk *dev;
> +    int err;
> +
> +    err = netdev_dpdk_construct(netdev);
> +    if (err) {
> +        VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n", netdev->name);
> +        goto out;
> +    }
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    dev = netdev_dpdk_cast(netdev);
> +    dev->relay = netdev_dpdk_vdpa_alloc_relay();
> +    if (!dev->relay) {
> +        err = ENOMEM;
> +    }
> +
> +    ovs_mutex_unlock(&dpdk_mutex);
> +out:
> +    return err;
> +}
> +
>  static void
>  common_destruct(struct netdev_dpdk *dev)
>      OVS_REQUIRES(dpdk_mutex)
> @@ -1428,6 +1458,19 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
>  }
>  
>  static void
> +netdev_dpdk_vdpa_destruct(struct netdev *netdev)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    netdev_dpdk_vdpa_destruct_impl(dev->relay);
> +    rte_free(dev->relay);
> +    ovs_mutex_unlock(&dpdk_mutex);
> +
> +    netdev_dpdk_destruct(netdev);
> +}
> +
> +static void
>  netdev_dpdk_vhost_destruct(struct netdev *netdev)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> @@ -1878,6 +1921,47 @@ out:
>  }
>  
>  static int
> +netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct smap *args,
> +                            char **errp)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    const char *vdpa_accelerator_devargs =
> +                smap_get(args, "vdpa-accelerator-devargs");
> +    const char *vdpa_socket_path =
> +                smap_get(args, "vdpa-socket-path");
> +    int err = 0;
> +
> +    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path == NULL)) {
> +        VLOG_ERR("netdev_dpdk_vdpa_set_config failed."
> +                 "Required arguments are missing for VDPA port %s",
> +                 netdev->name);
> +        goto free_relay;
> +    }
> +
> +    err = netdev_dpdk_set_config(netdev, args, errp);
> +    if (err) {
> +        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev->name);
> +        goto free_relay;
> +    }
> +
> +    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
> +                                       vdpa_socket_path,
> +                                       vdpa_accelerator_devargs);
> +    if (err) {
> +        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
> +                 netdev->name);
> +        goto free_relay;
> +    }
> +
> +    goto out;
> +
> +free_relay:
> +    rte_free(dev->relay);
> +out:
> +    return err;
> +}
> +
> +static int
>  netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
>                              char **errp OVS_UNUSED)
>  {
> @@ -2273,6 +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
>      return 0;
>  }
>  
> +static int
> +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> +                          struct dp_packet_batch *batch,
> +                          int *qfill)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> +    int fwd_rx;
> +    int ret;
> +
> +    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
I'm still not clear about the above function.
So netdev_dpdk_vdpa_recv_impl() 
    netdev_dpdk_vdpa_forward_traffic(), with a queue pair as parameter
        ...
        rte_eth_rx_burst(qpair->port_id_rx...)
        ...
        rte_eth_tx_burst(qpair->port_id_tx...)

So looks like forwarding between vf to vhostuser and vice versa is done in
this function.

> +    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);

Then why do we call netdev_dpdk_rxq_recv() above again?
Are packets received above the same packets as rte_eth_rx_burst() previously
called in netdev_dpdk_vdpa_forward_traffic()?


Thanks
William

> +    if ((ret == EAGAIN) && fwd_rx) {
> +        return 0;
> +    }
> +    return ret;
> +}
> +
>  static inline int
>  netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
>                      int cnt, bool should_steal)
> @@ -2854,6 +2955,29 @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev,
>  }
>  
>  static int
> +netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
> +                                  struct netdev_custom_stats *custom_stats)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int err = 0;
> +
> +    ovs_mutex_lock(&dev->mutex);
> +
> +    custom_stats->size = VDPA_CUSTOM_STATS_SIZE;
> +    custom_stats->counters = xcalloc(custom_stats->size,
> +                                     sizeof *custom_stats->counters);
> +    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
> +                                                 custom_stats);
> +    if (err) {
> +        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed."
> +                 "Port %s\n", netdev->name);
> +    }
> +
> +    ovs_mutex_unlock(&dev->mutex);
> +    return err;
> +}
> +
> +static int
>  netdev_dpdk_get_features(const struct netdev *netdev,
>                           enum netdev_features *current,
>                           enum netdev_features *advertised,
> @@ -4237,6 +4361,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
>  }
>  
>  static int
> +netdev_dpdk_vdpa_reconfigure(struct netdev *netdev)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int err;
> +
> +    err = netdev_dpdk_reconfigure(netdev);
> +    if (err) {
> +        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev->name);
> +        goto out;
> +    }
> +
> +    ovs_mutex_lock(&dev->mutex);
> +    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp->mp,
> +                                        dev->up.n_rxq);
> +    if (err) {
> +        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
> +                 netdev->name);
> +    }
> +
> +    ovs_mutex_unlock(&dev->mutex);
> +out:
> +    return err;
> +}
> +
> +static int
>  netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> @@ -4456,6 +4605,18 @@ static const struct netdev_class dpdk_vhost_client_class = {
>      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
>  };
>  
> +static const struct netdev_class dpdk_vdpa_class = {
> +    .type = "dpdkvdpa",
> +    NETDEV_DPDK_CLASS_COMMON,
> +    .construct = netdev_dpdk_vdpa_construct,
> +    .destruct = netdev_dpdk_vdpa_destruct,
> +    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
> +    .set_config = netdev_dpdk_vdpa_set_config,
> +    .reconfigure = netdev_dpdk_vdpa_reconfigure,
> +    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
> +    .send = netdev_dpdk_eth_send
> +};
> +
>  void
>  netdev_dpdk_register(void)
>  {
> @@ -4463,4 +4624,5 @@ netdev_dpdk_register(void)
>      netdev_register_provider(&dpdk_ring_class);
>      netdev_register_provider(&dpdk_vhost_class);
>      netdev_register_provider(&dpdk_vhost_client_class);
> +    netdev_register_provider(&dpdk_vdpa_class);
>  }
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 9a743c0..9e94950 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -2640,6 +2640,13 @@
>            <dd>
>              A pair of virtual devices that act as a patch cable.
>            </dd>
> +
> +          <dt><code>dpdkvdpa</code></dt>
> +          <dd>
> +            The dpdk vDPA port allows forwarding bi-directional traffic between
> +            SR-IOV virtual functions (VFs) and VirtIO devices in virtual
> +            machines (VMs).
> +          </dd>
>          </dl>
>        </column>
>      </group>
> @@ -3156,6 +3163,24 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>          </p>
>        </column>
>  
> +      <column name="options" key="vdpa-socket-path"
> +              type='{"type": "string"}'>
> +        <p>
> +          The value specifies the path to the socket associated with a VDPA
> +          port that will be created by QEMU.
> +          Only supported by dpdkvdpa interfaces.
> +        </p>
> +      </column>
> +
> +      <column name="options" key="vdpa-accelerator-devargs"
> +              type='{"type": "string"}'>
> +        <p>
> +          The value specifies the PCI address associated with the virtual
> +          function.
> +          Only supported by dpdkvdpa interfaces.
> +        </p>
> +      </column>
> +
>        <column name="options" key="dq-zero-copy"
>                type='{"type": "boolean"}'>
>          <p>
> -- 
> 1.8.3.1
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Noa Ezra Oct. 27, 2019, 9:24 a.m. UTC | #5
> -----Original Message-----
> From: William Tu [mailto:u9012063@gmail.com]
> Sent: Thursday, October 24, 2019 2:00 AM
> To: Noa Levy <noae@mellanox.com>
> Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> <ameerm@mellanox.com>; Eli Britstein <elibr@mellanox.com>
> Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port
> 
> Hi Noa,
> 
> I have a couple more questions. I'm still at the learning stage of this new
> feature, thanks in advance for your patience.
> 
> On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> > dpdkvdpa netdev works with 3 components:
> > vhost-user socket, vdpa device: real vdpa device or a VF and
> > representor of "vdpa device".
> 
> What NIC card support this feature?
> I don't have real vdpa device, can I use Intel X540 VF feature?
> 

This feature will have two modes, SW and HW.
The SW mode doesn't depend on a real vdpa device and allows you to use this feature even if you don't have a NIC that support it.
The HW mode will be implemented in the future and will use a real vdpa device. It will be better to use the HW mode if you have a NIC that support it. 

For now, we only support the SW mode, when vdpa will have support in dpdk, we will add the HW mode to OVS.

> >
> > In order to add a new vDPA port, add a new port to existing bridge
> > with type dpdkvdpa and vDPA options:
> > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> >    options:vdpa-socket-path=<sock path>
> >    options:vdpa-accelerator-devargs=<VF pci id>
> >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> >
> > On this command OVS will create a new netdev:
> > 1. Register vhost-user-client device.
> > 2. Open and configure VF dpdk port.
> > 3. Open and configure representor dpdk port.
> >
> > The new netdev will use netdev_rxq_recv() function in order to receive
> > packets from VF and push to vhost-user and receive packets from
> > vhost-user and push to VF.
> >
> > Signed-off-by: Noa Ezra <noae@mellanox.com>
> > Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> > ---
> >  Documentation/automake.mk           |   1 +
> >  Documentation/topics/dpdk/index.rst |   1 +
> >  Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
> >  NEWS                                |   1 +
> >  lib/netdev-dpdk.c                   | 162
> ++++++++++++++++++++++++++++++++++++
> >  vswitchd/vswitch.xml                |  25 ++++++
> >  6 files changed, 280 insertions(+)
> >  create mode 100644 Documentation/topics/dpdk/vdpa.rst
> >
> > diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> > index cd68f3b..ee574bc 100644
> > --- a/Documentation/automake.mk
> > +++ b/Documentation/automake.mk
> > @@ -43,6 +43,7 @@ DOC_SOURCE = \
> >  	Documentation/topics/dpdk/ring.rst \
> >  	Documentation/topics/dpdk/vdev.rst \
> >  	Documentation/topics/dpdk/vhost-user.rst \
> > +	Documentation/topics/dpdk/vdpa.rst \
> >  	Documentation/topics/fuzzing/index.rst \
> >  	Documentation/topics/fuzzing/what-is-fuzzing.rst \
> >  	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \ diff
> > --git a/Documentation/topics/dpdk/index.rst
> > b/Documentation/topics/dpdk/index.rst
> > index cf24a7b..c1d4ea7 100644
> > --- a/Documentation/topics/dpdk/index.rst
> > +++ b/Documentation/topics/dpdk/index.rst
> > @@ -41,3 +41,4 @@ The DPDK Datapath
> >     /topics/dpdk/pdump
> >     /topics/dpdk/jumbo-frames
> >     /topics/dpdk/memory
> > +   /topics/dpdk/vdpa
> > diff --git a/Documentation/topics/dpdk/vdpa.rst
> > b/Documentation/topics/dpdk/vdpa.rst
> > new file mode 100644
> > index 0000000..34c5300
> > --- /dev/null
> > +++ b/Documentation/topics/dpdk/vdpa.rst
> > @@ -0,0 +1,90 @@
> > +..
> > +      Copyright (c) 2019 Mellanox Technologies, Ltd.
> > +
> > +      Licensed under the Apache License, Version 2.0 (the "License");
> > +      you may not use this file except in compliance with the License.
> > +      You may obtain a copy of the License at:
> > +
> > +
> > +
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww
> > + .apache.org%2Flicenses%2FLICENSE-
> 2.0&amp;data=02%7C01%7Cnoae%40mella
> > +
> nox.com%7C6390609428bf4e2a2df808d7580cc233%7Ca652971c7d2e4d9ba6a4
> d14
> > +
> 9256f461b%7C0%7C0%7C637074684147132980&amp;sdata=94myUB4Fchqm4
> 4lxlto
> > + OIcbCXhlu%2FA%2FoVID8Z9EyvXU%3D&amp;reserved=0
> > +
> > +      Unless required by applicable law or agreed to in writing, software
> > +      distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT
> > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the
> > +      License for the specific language governing permissions and limitations
> > +      under the License.
> > +
> > +      Convention for heading levels in Open vSwitch documentation:
> > +
> > +      =======  Heading 0 (reserved for the title in a document)
> > +      -------  Heading 1
> > +      ~~~~~~~  Heading 2
> > +      +++++++  Heading 3
> > +      '''''''  Heading 4
> > +
> > +      Avoid deeper levels because they do not render well.
> > +
> > +
> > +===============
> > +DPDK VDPA Ports
> > +===============
> > +
> > +In user space there are two main approaches to communicate with a
> > +guest (VM), using virtIO ports (e.g. netdev
> > +type=dpdkvhoshuser/dpdkvhostuserclient) or SR-IOV using phy ports
> (e.g. netdev type = dpdk).
> > +Phy ports allow working with port representor which is attached to
> > +the OVS and a matching VF is given with pass-through to the guest.
> > +HW rules can process packets from up-link and direct them to the VF
> > +without going through SW (OVS) and therefore using phy ports gives
> > +the best performance.
> > +However, SR-IOV architecture requires that the guest will use a
> > +driver which is specific to the underlying HW. Specific HW driver has two
> main drawbacks:
> > +1. Breaks virtualization in some sense (guest aware of the HW), can
> > +also limit the type of images supported.
> > +2. Less natural support for live migration.
> > +
> > +Using virtIO port solves both problems, but reduces performance and
> > +causes losing of some functionality, for example, for some HW
> > +offload, working directly with virtIO cannot be supported.
> > +
> > +We created a new netdev type- dpdkvdpa. dpdkvdpa port solves this
> conflict.
> > +The new netdev is basically very similar to regular dpdk netdev but
> > +it has some additional functionally.
> > +This port translates between phy port to virtIO port, it takes
> > +packets from rx-queue and send them to the suitable tx-queue and
> > +allows to transfer packets from virtIO guest (VM) to a VF and vice
> > +versa and benefit both SR-IOV and virtIO.
> > +
> > +Quick Example
> > +-------------
> > +
> > +Configure OVS bridge and ports
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +you must first create a bridge and add ports to the switch.
> > +Since the dpdkvdpa port is configured as a client, the
> > +vdpa-socket-path must be configured by the user.
> > +VHOST_USER_SOCKET_PATH=/path/to/socket
> > +
> > +    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
> > +    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
> > +    type=dpdk options:dpdk-devargs=<pf pci id>
> 
> Is adding pf port to br0 necessary?
> 
> > +    $ ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
> > +    options:vdpa-socket-path=VHOST_USER_SOCKET_PATH \
> > +    options:vdpa-accelerator-devargs=<vf pci id> \
> > +    options:dpdk-devargs=<pf pci id>,representor=[id]
> > +
> > +Once the ports have been added to the switch, they must be added to
> the guest.
> > +
> > +Adding vhost-user ports to the guest (QEMU)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Attach the vhost-user device sockets to the guest. To do this, you
> > +must pass the following parameters to QEMU:
> > +
> > +    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
> > +    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > +    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> > +
> > +QEMU will wait until the port is created successfully in OVS to boot the
> VM.
> > +In this mode, in case the switch will crash, the vHost ports will
> > +reconnect automatically once it is brought back.
> > diff --git a/NEWS b/NEWS
> > index f5a0b8f..6f315c6 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -542,6 +542,7 @@ v2.6.0 - 27 Sep 2016
> >       * Remove dpdkvhostcuse port type.
> >       * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
> >       * 'dpdkvhostuserclient' port type.
> > +     * 'dpdkvdpa' port type.
> >     - Increase number of registers to 16.
> >     - ovs-benchmark: This utility has been removed due to lack of use and
> >       bitrot.
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > bc20d68..16ddf58 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -47,6 +47,7 @@
> >  #include "dpif-netdev.h"
> >  #include "fatal-signal.h"
> >  #include "netdev-provider.h"
> > +#include "netdev-dpdk-vdpa.h"
> >  #include "netdev-vport.h"
> >  #include "odp-util.h"
> >  #include "openvswitch/dynamic-string.h"
> > @@ -137,6 +138,9 @@ typedef uint16_t dpdk_port_t;
> >  /* Legacy default value for vhost tx retries. */  #define
> > VHOST_ENQ_RETRY_DEF 8
> >
> > +/* Size of VDPA custom stats. */
> > +#define VDPA_CUSTOM_STATS_SIZE          4
> > +
> >  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> >
> >  static const struct rte_eth_conf port_conf = { @@ -461,6 +465,8 @@
> > struct netdev_dpdk {
> >          int rte_xstats_ids_size;
> >          uint64_t *rte_xstats_ids;
> >      );
> > +
> > +    struct netdev_dpdk_vdpa_relay *relay;
> >  };
> >
> >  struct netdev_rxq_dpdk {
> > @@ -1346,6 +1352,30 @@ netdev_dpdk_construct(struct netdev *netdev)
> >      return err;
> >  }
> >
> > +static int
> > +netdev_dpdk_vdpa_construct(struct netdev *netdev) {
> > +    struct netdev_dpdk *dev;
> > +    int err;
> > +
> > +    err = netdev_dpdk_construct(netdev);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n", netdev-
> >name);
> > +        goto out;
> > +    }
> > +
> > +    ovs_mutex_lock(&dpdk_mutex);
> > +    dev = netdev_dpdk_cast(netdev);
> > +    dev->relay = netdev_dpdk_vdpa_alloc_relay();
> > +    if (!dev->relay) {
> > +        err = ENOMEM;
> > +    }
> > +
> > +    ovs_mutex_unlock(&dpdk_mutex);
> > +out:
> > +    return err;
> > +}
> > +
> >  static void
> >  common_destruct(struct netdev_dpdk *dev)
> >      OVS_REQUIRES(dpdk_mutex)
> > @@ -1428,6 +1458,19 @@ dpdk_vhost_driver_unregister(struct
> netdev_dpdk
> > *dev OVS_UNUSED,  }
> >
> >  static void
> > +netdev_dpdk_vdpa_destruct(struct netdev *netdev) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +
> > +    ovs_mutex_lock(&dpdk_mutex);
> > +    netdev_dpdk_vdpa_destruct_impl(dev->relay);
> > +    rte_free(dev->relay);
> > +    ovs_mutex_unlock(&dpdk_mutex);
> > +
> > +    netdev_dpdk_destruct(netdev);
> > +}
> > +
> > +static void
> >  netdev_dpdk_vhost_destruct(struct netdev *netdev)  {
> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -1878,6
> > +1921,47 @@ out:
> >  }
> >
> >  static int
> > +netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct smap
> *args,
> > +                            char **errp) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +    const char *vdpa_accelerator_devargs =
> > +                smap_get(args, "vdpa-accelerator-devargs");
> > +    const char *vdpa_socket_path =
> > +                smap_get(args, "vdpa-socket-path");
> > +    int err = 0;
> > +
> > +    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path ==
> NULL)) {
> > +        VLOG_ERR("netdev_dpdk_vdpa_set_config failed."
> > +                 "Required arguments are missing for VDPA port %s",
> > +                 netdev->name);
> > +        goto free_relay;
> > +    }
> > +
> > +    err = netdev_dpdk_set_config(netdev, args, errp);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev-
> >name);
> > +        goto free_relay;
> > +    }
> > +
> > +    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
> > +                                       vdpa_socket_path,
> > +                                       vdpa_accelerator_devargs);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
> > +                 netdev->name);
> > +        goto free_relay;
> > +    }
> > +
> > +    goto out;
> > +
> > +free_relay:
> > +    rte_free(dev->relay);
> > +out:
> > +    return err;
> > +}
> > +
> > +static int
> >  netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap
> *args,
> >                              char **errp OVS_UNUSED)  { @@ -2273,6
> > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
> dp_packet_batch *batch,
> >      return 0;
> >  }
> >
> > +static int
> > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> > +                          struct dp_packet_batch *batch,
> > +                          int *qfill) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > +    int fwd_rx;
> > +    int ret;
> > +
> > +    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay,
> > + rxq->queue_id);
> I'm still not clear about the above function.
> So netdev_dpdk_vdpa_recv_impl()
>     netdev_dpdk_vdpa_forward_traffic(), with a queue pair as parameter
>         ...
>         rte_eth_rx_burst(qpair->port_id_rx...)
>         ...
>         rte_eth_tx_burst(qpair->port_id_tx...)
> 
> So looks like forwarding between vf to vhostuser and vice versa is done in
> this function.
> 
> > +    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
> 
> Then why do we call netdev_dpdk_rxq_recv() above again?
> Are packets received above the same packets as rte_eth_rx_burst()
> previously called in netdev_dpdk_vdpa_forward_traffic()?
> 
> 
> Thanks
> William
> 
> > +    if ((ret == EAGAIN) && fwd_rx) {
> > +        return 0;
> > +    }
> > +    return ret;
> > +}
> > +
> >  static inline int
> >  netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
> >                      int cnt, bool should_steal) @@ -2854,6 +2955,29
> > @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev,
> }
> >
> >  static int
> > +netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
> > +                                  struct netdev_custom_stats
> > +*custom_stats) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +    int err = 0;
> > +
> > +    ovs_mutex_lock(&dev->mutex);
> > +
> > +    custom_stats->size = VDPA_CUSTOM_STATS_SIZE;
> > +    custom_stats->counters = xcalloc(custom_stats->size,
> > +                                     sizeof *custom_stats->counters);
> > +    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
> > +                                                 custom_stats);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed."
> > +                 "Port %s\n", netdev->name);
> > +    }
> > +
> > +    ovs_mutex_unlock(&dev->mutex);
> > +    return err;
> > +}
> > +
> > +static int
> >  netdev_dpdk_get_features(const struct netdev *netdev,
> >                           enum netdev_features *current,
> >                           enum netdev_features *advertised, @@ -4237,6
> > +4361,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev)  }
> >
> >  static int
> > +netdev_dpdk_vdpa_reconfigure(struct netdev *netdev) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +    int err;
> > +
> > +    err = netdev_dpdk_reconfigure(netdev);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev-
> >name);
> > +        goto out;
> > +    }
> > +
> > +    ovs_mutex_lock(&dev->mutex);
> > +    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp-
> >mp,
> > +                                        dev->up.n_rxq);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
> > +                 netdev->name);
> > +    }
> > +
> > +    ovs_mutex_unlock(&dev->mutex);
> > +out:
> > +    return err;
> > +}
> > +
> > +static int
> >  netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)  {
> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -4456,6
> > +4605,18 @@ static const struct netdev_class dpdk_vhost_client_class = {
> >      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,  };
> >
> > +static const struct netdev_class dpdk_vdpa_class = {
> > +    .type = "dpdkvdpa",
> > +    NETDEV_DPDK_CLASS_COMMON,
> > +    .construct = netdev_dpdk_vdpa_construct,
> > +    .destruct = netdev_dpdk_vdpa_destruct,
> > +    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
> > +    .set_config = netdev_dpdk_vdpa_set_config,
> > +    .reconfigure = netdev_dpdk_vdpa_reconfigure,
> > +    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
> > +    .send = netdev_dpdk_eth_send
> > +};
> > +
> >  void
> >  netdev_dpdk_register(void)
> >  {
> > @@ -4463,4 +4624,5 @@ netdev_dpdk_register(void)
> >      netdev_register_provider(&dpdk_ring_class);
> >      netdev_register_provider(&dpdk_vhost_class);
> >      netdev_register_provider(&dpdk_vhost_client_class);
> > +    netdev_register_provider(&dpdk_vdpa_class);
> >  }
> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> > 9a743c0..9e94950 100644
> > --- a/vswitchd/vswitch.xml
> > +++ b/vswitchd/vswitch.xml
> > @@ -2640,6 +2640,13 @@
> >            <dd>
> >              A pair of virtual devices that act as a patch cable.
> >            </dd>
> > +
> > +          <dt><code>dpdkvdpa</code></dt>
> > +          <dd>
> > +            The dpdk vDPA port allows forwarding bi-directional traffic between
> > +            SR-IOV virtual functions (VFs) and VirtIO devices in virtual
> > +            machines (VMs).
> > +          </dd>
> >          </dl>
> >        </column>
> >      </group>
> > @@ -3156,6 +3163,24 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> type=patch options:peer=p1 \
> >          </p>
> >        </column>
> >
> > +      <column name="options" key="vdpa-socket-path"
> > +              type='{"type": "string"}'>
> > +        <p>
> > +          The value specifies the path to the socket associated with a VDPA
> > +          port that will be created by QEMU.
> > +          Only supported by dpdkvdpa interfaces.
> > +        </p>
> > +      </column>
> > +
> > +      <column name="options" key="vdpa-accelerator-devargs"
> > +              type='{"type": "string"}'>
> > +        <p>
> > +          The value specifies the PCI address associated with the virtual
> > +          function.
> > +          Only supported by dpdkvdpa interfaces.
> > +        </p>
> > +      </column>
> > +
> >        <column name="options" key="dq-zero-copy"
> >                type='{"type": "boolean"}'>
> >          <p>
> > --
> > 1.8.3.1
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail
> > .openvswitch.org%2Fmailman%2Flistinfo%2Fovs-
> dev&amp;data=02%7C01%7Cnoa
> >
> e%40mellanox.com%7C6390609428bf4e2a2df808d7580cc233%7Ca652971c7d2
> e4d9b
> >
> a6a4d149256f461b%7C0%7C0%7C637074684147132980&amp;sdata=Eai7e%2B
> Ln5x8a
> > fpEi7HdWF8FHDYe4vD7dxRLO2Yo0usQ%3D&amp;reserved=0
Noa Ezra Oct. 27, 2019, 11:25 a.m. UTC | #6
> -----Original Message-----
> From: Noa Levy
> Sent: Sunday, October 27, 2019 11:24 AM
> To: William Tu <u9012063@gmail.com>
> Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> <ameerm@mellanox.com>; Eli Britstein <elibr@mellanox.com>
> Subject: RE: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port
> 
> 
> > -----Original Message-----
> > From: William Tu [mailto:u9012063@gmail.com]
> > Sent: Thursday, October 24, 2019 2:00 AM
> > To: Noa Levy <noae@mellanox.com>
> > Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> > Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> <ameerm@mellanox.com>; Eli
> > Britstein <elibr@mellanox.com>
> > Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa
> > port
> >
> > Hi Noa,
> >
> > I have a couple more questions. I'm still at the learning stage of
> > this new feature, thanks in advance for your patience.
> >
> > On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> > > dpdkvdpa netdev works with 3 components:
> > > vhost-user socket, vdpa device: real vdpa device or a VF and
> > > representor of "vdpa device".
> >
> > What NIC card support this feature?
> > I don't have real vdpa device, can I use Intel X540 VF feature?
> >
> 
> This feature will have two modes, SW and HW.
> The SW mode doesn't depend on a real vdpa device and allows you to use
> this feature even if you don't have a NIC that support it.
> The HW mode will be implemented in the future and will use a real vdpa
> device. It will be better to use the HW mode if you have a NIC that support it.
> 
> For now, we only support the SW mode, when vdpa will have support in
> dpdk, we will add the HW mode to OVS.
> 
> > >
> > > In order to add a new vDPA port, add a new port to existing bridge
> > > with type dpdkvdpa and vDPA options:
> > > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> > >    options:vdpa-socket-path=<sock path>
> > >    options:vdpa-accelerator-devargs=<VF pci id>
> > >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> > >
> > > On this command OVS will create a new netdev:
> > > 1. Register vhost-user-client device.
> > > 2. Open and configure VF dpdk port.
> > > 3. Open and configure representor dpdk port.
> > >
> > > The new netdev will use netdev_rxq_recv() function in order to
> > > receive packets from VF and push to vhost-user and receive packets
> > > from vhost-user and push to VF.
> > >
> > > Signed-off-by: Noa Ezra <noae@mellanox.com>
> > > Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> > > ---
> > >  Documentation/automake.mk           |   1 +
> > >  Documentation/topics/dpdk/index.rst |   1 +
> > >  Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
> > >  NEWS                                |   1 +
> > >  lib/netdev-dpdk.c                   | 162
> > ++++++++++++++++++++++++++++++++++++
> > >  vswitchd/vswitch.xml                |  25 ++++++
> > >  6 files changed, 280 insertions(+)
> > >  create mode 100644 Documentation/topics/dpdk/vdpa.rst
> > >
> > > diff --git a/Documentation/automake.mk
> b/Documentation/automake.mk
> > > index cd68f3b..ee574bc 100644
> > > --- a/Documentation/automake.mk
> > > +++ b/Documentation/automake.mk
> > > @@ -43,6 +43,7 @@ DOC_SOURCE = \
> > >  	Documentation/topics/dpdk/ring.rst \
> > >  	Documentation/topics/dpdk/vdev.rst \
> > >  	Documentation/topics/dpdk/vhost-user.rst \
> > > +	Documentation/topics/dpdk/vdpa.rst \
> > >  	Documentation/topics/fuzzing/index.rst \
> > >  	Documentation/topics/fuzzing/what-is-fuzzing.rst \
> > >  	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \ diff
> > > --git a/Documentation/topics/dpdk/index.rst
> > > b/Documentation/topics/dpdk/index.rst
> > > index cf24a7b..c1d4ea7 100644
> > > --- a/Documentation/topics/dpdk/index.rst
> > > +++ b/Documentation/topics/dpdk/index.rst
> > > @@ -41,3 +41,4 @@ The DPDK Datapath
> > >     /topics/dpdk/pdump
> > >     /topics/dpdk/jumbo-frames
> > >     /topics/dpdk/memory
> > > +   /topics/dpdk/vdpa
> > > diff --git a/Documentation/topics/dpdk/vdpa.rst
> > > b/Documentation/topics/dpdk/vdpa.rst
> > > new file mode 100644
> > > index 0000000..34c5300
> > > --- /dev/null
> > > +++ b/Documentation/topics/dpdk/vdpa.rst
> > > @@ -0,0 +1,90 @@
> > > +..
> > > +      Copyright (c) 2019 Mellanox Technologies, Ltd.
> > > +
> > > +      Licensed under the Apache License, Version 2.0 (the "License");
> > > +      you may not use this file except in compliance with the License.
> > > +      You may obtain a copy of the License at:
> > > +
> > > +
> > > +
> >
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww
> > > + .apache.org%2Flicenses%2FLICENSE-
> > 2.0&amp;data=02%7C01%7Cnoae%40mella
> > > +
> >
> nox.com%7C6390609428bf4e2a2df808d7580cc233%7Ca652971c7d2e4d9ba6a4
> > d14
> > > +
> >
> 9256f461b%7C0%7C0%7C637074684147132980&amp;sdata=94myUB4Fchqm4
> > 4lxlto
> > > + OIcbCXhlu%2FA%2FoVID8Z9EyvXU%3D&amp;reserved=0
> > > +
> > > +      Unless required by applicable law or agreed to in writing, software
> > > +      distributed under the License is distributed on an "AS IS"
> > > + BASIS,
> > WITHOUT
> > > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > See the
> > > +      License for the specific language governing permissions and
> limitations
> > > +      under the License.
> > > +
> > > +      Convention for heading levels in Open vSwitch documentation:
> > > +
> > > +      =======  Heading 0 (reserved for the title in a document)
> > > +      -------  Heading 1
> > > +      ~~~~~~~  Heading 2
> > > +      +++++++  Heading 3
> > > +      '''''''  Heading 4
> > > +
> > > +      Avoid deeper levels because they do not render well.
> > > +
> > > +
> > > +===============
> > > +DPDK VDPA Ports
> > > +===============
> > > +
> > > +In user space there are two main approaches to communicate with a
> > > +guest (VM), using virtIO ports (e.g. netdev
> > > +type=dpdkvhoshuser/dpdkvhostuserclient) or SR-IOV using phy ports
> > (e.g. netdev type = dpdk).
> > > +Phy ports allow working with port representor which is attached to
> > > +the OVS and a matching VF is given with pass-through to the guest.
> > > +HW rules can process packets from up-link and direct them to the VF
> > > +without going through SW (OVS) and therefore using phy ports gives
> > > +the best performance.
> > > +However, SR-IOV architecture requires that the guest will use a
> > > +driver which is specific to the underlying HW. Specific HW driver
> > > +has two
> > main drawbacks:
> > > +1. Breaks virtualization in some sense (guest aware of the HW), can
> > > +also limit the type of images supported.
> > > +2. Less natural support for live migration.
> > > +
> > > +Using virtIO port solves both problems, but reduces performance and
> > > +causes losing of some functionality, for example, for some HW
> > > +offload, working directly with virtIO cannot be supported.
> > > +
> > > +We created a new netdev type- dpdkvdpa. dpdkvdpa port solves this
> > conflict.
> > > +The new netdev is basically very similar to regular dpdk netdev but
> > > +it has some additional functionally.
> > > +This port translates between phy port to virtIO port, it takes
> > > +packets from rx-queue and send them to the suitable tx-queue and
> > > +allows to transfer packets from virtIO guest (VM) to a VF and vice
> > > +versa and benefit both SR-IOV and virtIO.
> > > +
> > > +Quick Example
> > > +-------------
> > > +
> > > +Configure OVS bridge and ports
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +you must first create a bridge and add ports to the switch.
> > > +Since the dpdkvdpa port is configured as a client, the
> > > +vdpa-socket-path must be configured by the user.
> > > +VHOST_USER_SOCKET_PATH=/path/to/socket
> > > +
> > > +    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
> > > +    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
> > > +    type=dpdk options:dpdk-devargs=<pf pci id>
> >
> > Is adding pf port to br0 necessary?

No, the pf is not related to the vdpa port and forwarding feature, it is just a way to connect the VF to the up-link.

> >
> > > +    $ ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
> > > +    options:vdpa-socket-path=VHOST_USER_SOCKET_PATH \
> > > +    options:vdpa-accelerator-devargs=<vf pci id> \
> > > +    options:dpdk-devargs=<pf pci id>,representor=[id]
> > > +
> > > +Once the ports have been added to the switch, they must be added to
> > the guest.
> > > +
> > > +Adding vhost-user ports to the guest (QEMU)
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Attach the vhost-user device sockets to the guest. To do this, you
> > > +must pass the following parameters to QEMU:
> > > +
> > > +    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
> > > +    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > > +    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> > > +
> > > +QEMU will wait until the port is created successfully in OVS to
> > > +boot the
> > VM.
> > > +In this mode, in case the switch will crash, the vHost ports will
> > > +reconnect automatically once it is brought back.
> > > diff --git a/NEWS b/NEWS
> > > index f5a0b8f..6f315c6 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -542,6 +542,7 @@ v2.6.0 - 27 Sep 2016
> > >       * Remove dpdkvhostcuse port type.
> > >       * OVS client mode for vHost and vHost reconnect (Requires QEMU
> 2.7)
> > >       * 'dpdkvhostuserclient' port type.
> > > +     * 'dpdkvdpa' port type.
> > >     - Increase number of registers to 16.
> > >     - ovs-benchmark: This utility has been removed due to lack of use and
> > >       bitrot.
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > > bc20d68..16ddf58 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -47,6 +47,7 @@
> > >  #include "dpif-netdev.h"
> > >  #include "fatal-signal.h"
> > >  #include "netdev-provider.h"
> > > +#include "netdev-dpdk-vdpa.h"
> > >  #include "netdev-vport.h"
> > >  #include "odp-util.h"
> > >  #include "openvswitch/dynamic-string.h"
> > > @@ -137,6 +138,9 @@ typedef uint16_t dpdk_port_t;
> > >  /* Legacy default value for vhost tx retries. */  #define
> > > VHOST_ENQ_RETRY_DEF 8
> > >
> > > +/* Size of VDPA custom stats. */
> > > +#define VDPA_CUSTOM_STATS_SIZE          4
> > > +
> > >  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> > >
> > >  static const struct rte_eth_conf port_conf = { @@ -461,6 +465,8 @@
> > > struct netdev_dpdk {
> > >          int rte_xstats_ids_size;
> > >          uint64_t *rte_xstats_ids;
> > >      );
> > > +
> > > +    struct netdev_dpdk_vdpa_relay *relay;
> > >  };
> > >
> > >  struct netdev_rxq_dpdk {
> > > @@ -1346,6 +1352,30 @@ netdev_dpdk_construct(struct netdev
> *netdev)
> > >      return err;
> > >  }
> > >
> > > +static int
> > > +netdev_dpdk_vdpa_construct(struct netdev *netdev) {
> > > +    struct netdev_dpdk *dev;
> > > +    int err;
> > > +
> > > +    err = netdev_dpdk_construct(netdev);
> > > +    if (err) {
> > > +        VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n",
> > > + netdev-
> > >name);
> > > +        goto out;
> > > +    }
> > > +
> > > +    ovs_mutex_lock(&dpdk_mutex);
> > > +    dev = netdev_dpdk_cast(netdev);
> > > +    dev->relay = netdev_dpdk_vdpa_alloc_relay();
> > > +    if (!dev->relay) {
> > > +        err = ENOMEM;
> > > +    }
> > > +
> > > +    ovs_mutex_unlock(&dpdk_mutex);
> > > +out:
> > > +    return err;
> > > +}
> > > +
> > >  static void
> > >  common_destruct(struct netdev_dpdk *dev)
> > >      OVS_REQUIRES(dpdk_mutex)
> > > @@ -1428,6 +1458,19 @@ dpdk_vhost_driver_unregister(struct
> > netdev_dpdk
> > > *dev OVS_UNUSED,  }
> > >
> > >  static void
> > > +netdev_dpdk_vdpa_destruct(struct netdev *netdev) {
> > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > > +
> > > +    ovs_mutex_lock(&dpdk_mutex);
> > > +    netdev_dpdk_vdpa_destruct_impl(dev->relay);
> > > +    rte_free(dev->relay);
> > > +    ovs_mutex_unlock(&dpdk_mutex);
> > > +
> > > +    netdev_dpdk_destruct(netdev);
> > > +}
> > > +
> > > +static void
> > >  netdev_dpdk_vhost_destruct(struct netdev *netdev)  {
> > >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -1878,6
> > > +1921,47 @@ out:
> > >  }
> > >
> > >  static int
> > > +netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct
> > > +smap
> > *args,
> > > +                            char **errp) {
> > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > > +    const char *vdpa_accelerator_devargs =
> > > +                smap_get(args, "vdpa-accelerator-devargs");
> > > +    const char *vdpa_socket_path =
> > > +                smap_get(args, "vdpa-socket-path");
> > > +    int err = 0;
> > > +
> > > +    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path ==
> > NULL)) {
> > > +        VLOG_ERR("netdev_dpdk_vdpa_set_config failed."
> > > +                 "Required arguments are missing for VDPA port %s",
> > > +                 netdev->name);
> > > +        goto free_relay;
> > > +    }
> > > +
> > > +    err = netdev_dpdk_set_config(netdev, args, errp);
> > > +    if (err) {
> > > +        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev-
> > >name);
> > > +        goto free_relay;
> > > +    }
> > > +
> > > +    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
> > > +                                       vdpa_socket_path,
> > > +                                       vdpa_accelerator_devargs);
> > > +    if (err) {
> > > +        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
> > > +                 netdev->name);
> > > +        goto free_relay;
> > > +    }
> > > +
> > > +    goto out;
> > > +
> > > +free_relay:
> > > +    rte_free(dev->relay);
> > > +out:
> > > +    return err;
> > > +}
> > > +
> > > +static int
> > >  netdev_dpdk_ring_set_config(struct netdev *netdev, const struct
> > > smap
> > *args,
> > >                              char **errp OVS_UNUSED)  { @@ -2273,6
> > > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
> > dp_packet_batch *batch,
> > >      return 0;
> > >  }
> > >
> > > +static int
> > > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> > > +                          struct dp_packet_batch *batch,
> > > +                          int *qfill) {
> > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > > +    int fwd_rx;
> > > +    int ret;
> > > +
> > > +    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay,
> > > + rxq->queue_id);
> > I'm still not clear about the above function.
> > So netdev_dpdk_vdpa_recv_impl()
> >     netdev_dpdk_vdpa_forward_traffic(), with a queue pair as parameter
> >         ...
> >         rte_eth_rx_burst(qpair->port_id_rx...)
> >         ...
> >         rte_eth_tx_burst(qpair->port_id_tx...)
> >
> > So looks like forwarding between vf to vhostuser and vice versa is
> > done in this function.
> >
> > > +    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
> >
> > Then why do we call netdev_dpdk_rxq_recv() above again?
> > Are packets received above the same packets as rte_eth_rx_burst()
> > previously called in netdev_dpdk_vdpa_forward_traffic()?
> >

netdev_dpdk_vdpa_recv_impl() first calls: rte_eth_rx_burst and rte_eth_tx_burst in order to forward between vf to vhostuser and vice versa.
After rx_burst and tx_burst is done, we call netdev_dpdk_rxq_recv() in order to receive packets for the representor. 
The queue is different in rte_eth_rx_burst, rte_eth_tx_burst and netdev_dpdk_rxq_recv.
We use this methodology in order to use the free cycles allocated for the representor's receive. When using HW offload most of the packets will be offloaded and won't go through OVS.  
Attached is the RFC and community discussion.
I hope it is clearer now.

> >
> > Thanks
> > William
> >
> > > +    if ((ret == EAGAIN) && fwd_rx) {
> > > +        return 0;
> > > +    }
> > > +    return ret;
> > > +}
> > > +
> > >  static inline int
> > >  netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf
> **pkts,
> > >                      int cnt, bool should_steal) @@ -2854,6 +2955,29
> > > @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev
> *netdev,
> > }
> > >
> > >  static int
> > > +netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
> > > +                                  struct netdev_custom_stats
> > > +*custom_stats) {
> > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > > +    int err = 0;
> > > +
> > > +    ovs_mutex_lock(&dev->mutex);
> > > +
> > > +    custom_stats->size = VDPA_CUSTOM_STATS_SIZE;
> > > +    custom_stats->counters = xcalloc(custom_stats->size,
> > > +                                     sizeof *custom_stats->counters);
> > > +    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
> > > +                                                 custom_stats);
> > > +    if (err) {
> > > +        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed."
> > > +                 "Port %s\n", netdev->name);
> > > +    }
> > > +
> > > +    ovs_mutex_unlock(&dev->mutex);
> > > +    return err;
> > > +}
> > > +
> > > +static int
> > >  netdev_dpdk_get_features(const struct netdev *netdev,
> > >                           enum netdev_features *current,
> > >                           enum netdev_features *advertised, @@
> > > -4237,6
> > > +4361,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev)  }
> > >
> > >  static int
> > > +netdev_dpdk_vdpa_reconfigure(struct netdev *netdev) {
> > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > > +    int err;
> > > +
> > > +    err = netdev_dpdk_reconfigure(netdev);
> > > +    if (err) {
> > > +        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev-
> > >name);
> > > +        goto out;
> > > +    }
> > > +
> > > +    ovs_mutex_lock(&dev->mutex);
> > > +    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp-
> > >mp,
> > > +                                        dev->up.n_rxq);
> > > +    if (err) {
> > > +        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
> > > +                 netdev->name);
> > > +    }
> > > +
> > > +    ovs_mutex_unlock(&dev->mutex);
> > > +out:
> > > +    return err;
> > > +}
> > > +
> > > +static int
> > >  netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)  {
> > >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -4456,6
> > > +4605,18 @@ static const struct netdev_class dpdk_vhost_client_class
> > > += {
> > >      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,  };
> > >
> > > +static const struct netdev_class dpdk_vdpa_class = {
> > > +    .type = "dpdkvdpa",
> > > +    NETDEV_DPDK_CLASS_COMMON,
> > > +    .construct = netdev_dpdk_vdpa_construct,
> > > +    .destruct = netdev_dpdk_vdpa_destruct,
> > > +    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
> > > +    .set_config = netdev_dpdk_vdpa_set_config,
> > > +    .reconfigure = netdev_dpdk_vdpa_reconfigure,
> > > +    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
> > > +    .send = netdev_dpdk_eth_send
> > > +};
> > > +
> > >  void
> > >  netdev_dpdk_register(void)
> > >  {
> > > @@ -4463,4 +4624,5 @@ netdev_dpdk_register(void)
> > >      netdev_register_provider(&dpdk_ring_class);
> > >      netdev_register_provider(&dpdk_vhost_class);
> > >      netdev_register_provider(&dpdk_vhost_client_class);
> > > +    netdev_register_provider(&dpdk_vdpa_class);
> > >  }
> > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> > > 9a743c0..9e94950 100644
> > > --- a/vswitchd/vswitch.xml
> > > +++ b/vswitchd/vswitch.xml
> > > @@ -2640,6 +2640,13 @@
> > >            <dd>
> > >              A pair of virtual devices that act as a patch cable.
> > >            </dd>
> > > +
> > > +          <dt><code>dpdkvdpa</code></dt>
> > > +          <dd>
> > > +            The dpdk vDPA port allows forwarding bi-directional traffic
> between
> > > +            SR-IOV virtual functions (VFs) and VirtIO devices in virtual
> > > +            machines (VMs).
> > > +          </dd>
> > >          </dl>
> > >        </column>
> > >      </group>
> > > @@ -3156,6 +3163,24 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> > type=patch options:peer=p1 \
> > >          </p>
> > >        </column>
> > >
> > > +      <column name="options" key="vdpa-socket-path"
> > > +              type='{"type": "string"}'>
> > > +        <p>
> > > +          The value specifies the path to the socket associated with a VDPA
> > > +          port that will be created by QEMU.
> > > +          Only supported by dpdkvdpa interfaces.
> > > +        </p>
> > > +      </column>
> > > +
> > > +      <column name="options" key="vdpa-accelerator-devargs"
> > > +              type='{"type": "string"}'>
> > > +        <p>
> > > +          The value specifies the PCI address associated with the virtual
> > > +          function.
> > > +          Only supported by dpdkvdpa interfaces.
> > > +        </p>
> > > +      </column>
> > > +
> > >        <column name="options" key="dq-zero-copy"
> > >                type='{"type": "boolean"}'>
> > >          <p>
> > > --
> > > 1.8.3.1
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > >
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail
> > > .openvswitch.org%2Fmailman%2Flistinfo%2Fovs-
> > dev&amp;data=02%7C01%7Cnoa
> > >
> >
> e%40mellanox.com%7C6390609428bf4e2a2df808d7580cc233%7Ca652971c7d2
> > e4d9b
> > >
> >
> a6a4d149256f461b%7C0%7C0%7C637074684147132980&amp;sdata=Eai7e%2B
> > Ln5x8a
> > > fpEi7HdWF8FHDYe4vD7dxRLO2Yo0usQ%3D&amp;reserved=0
Noa Ezra Oct. 27, 2019, 11:48 a.m. UTC | #7
> -----Original Message-----
> From: Noa Levy
> Sent: Sunday, October 27, 2019 1:26 PM
> To: 'William Tu' <u9012063@gmail.com>
> Cc: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>; Oz Shlomo
> <ozsh@mellanox.com>; Majd Dibbiny <majd@mellanox.com>; Ameer
> Mahagneh <ameerm@mellanox.com>; Eli Britstein <elibr@mellanox.com>
> Subject: RE: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port
> 
> 
> 
> > -----Original Message-----
> > From: Noa Levy
> > Sent: Sunday, October 27, 2019 11:24 AM
> > To: William Tu <u9012063@gmail.com>
> > Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> > Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> <ameerm@mellanox.com>; Eli
> > Britstein <elibr@mellanox.com>
> > Subject: RE: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa
> > port
> >
> >
> > > -----Original Message-----
> > > From: William Tu [mailto:u9012063@gmail.com]
> > > Sent: Thursday, October 24, 2019 2:00 AM
> > > To: Noa Levy <noae@mellanox.com>
> > > Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> > > Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> > <ameerm@mellanox.com>; Eli
> > > Britstein <elibr@mellanox.com>
> > > Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa
> > > port
> > >
> > > Hi Noa,
> > >
> > > I have a couple more questions. I'm still at the learning stage of
> > > this new feature, thanks in advance for your patience.
> > >
> > > On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> > > > dpdkvdpa netdev works with 3 components:
> > > > vhost-user socket, vdpa device: real vdpa device or a VF and
> > > > representor of "vdpa device".
> > >
> > > What NIC card support this feature?
> > > I don't have real vdpa device, can I use Intel X540 VF feature?
> > >
> >
> > This feature will have two modes, SW and HW.
> > The SW mode doesn't depend on a real vdpa device and allows you to use
> > this feature even if you don't have a NIC that support it.
Although you need to use representors, so you need your NIC to support it.
> > The HW mode will be implemented in the future and will use a real vdpa
> > device. It will be better to use the HW mode if you have a NIC that support
> it.
> >
> > For now, we only support the SW mode, when vdpa will have support in
> > dpdk, we will add the HW mode to OVS.
> >
> > > >
> > > > In order to add a new vDPA port, add a new port to existing bridge
> > > > with type dpdkvdpa and vDPA options:
> > > > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> > > >    options:vdpa-socket-path=<sock path>
> > > >    options:vdpa-accelerator-devargs=<VF pci id>
> > > >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> > > >
> > > > On this command OVS will create a new netdev:
> > > > 1. Register vhost-user-client device.
> > > > 2. Open and configure VF dpdk port.
> > > > 3. Open and configure representor dpdk port.
> > > >
> > > > The new netdev will use netdev_rxq_recv() function in order to
> > > > receive packets from VF and push to vhost-user and receive packets
> > > > from vhost-user and push to VF.
> > > >
> > > > Signed-off-by: Noa Ezra <noae@mellanox.com>
> > > > Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> > > > ---
> > > >  Documentation/automake.mk           |   1 +
> > > >  Documentation/topics/dpdk/index.rst |   1 +
> > > >  Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
> > > >  NEWS                                |   1 +
> > > >  lib/netdev-dpdk.c                   | 162
> > > ++++++++++++++++++++++++++++++++++++
> > > >  vswitchd/vswitch.xml                |  25 ++++++
> > > >  6 files changed, 280 insertions(+)  create mode 100644
> > > > Documentation/topics/dpdk/vdpa.rst
> > > >
> > > > diff --git a/Documentation/automake.mk
> > b/Documentation/automake.mk
> > > > index cd68f3b..ee574bc 100644
> > > > --- a/Documentation/automake.mk
> > > > +++ b/Documentation/automake.mk
> > > > @@ -43,6 +43,7 @@ DOC_SOURCE = \
> > > >  	Documentation/topics/dpdk/ring.rst \
> > > >  	Documentation/topics/dpdk/vdev.rst \
> > > >  	Documentation/topics/dpdk/vhost-user.rst \
> > > > +	Documentation/topics/dpdk/vdpa.rst \
> > > >  	Documentation/topics/fuzzing/index.rst \
> > > >  	Documentation/topics/fuzzing/what-is-fuzzing.rst \
> > > >  	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
> > > > diff --git a/Documentation/topics/dpdk/index.rst
> > > > b/Documentation/topics/dpdk/index.rst
> > > > index cf24a7b..c1d4ea7 100644
> > > > --- a/Documentation/topics/dpdk/index.rst
> > > > +++ b/Documentation/topics/dpdk/index.rst
> > > > @@ -41,3 +41,4 @@ The DPDK Datapath
> > > >     /topics/dpdk/pdump
> > > >     /topics/dpdk/jumbo-frames
> > > >     /topics/dpdk/memory
> > > > +   /topics/dpdk/vdpa
> > > > diff --git a/Documentation/topics/dpdk/vdpa.rst
> > > > b/Documentation/topics/dpdk/vdpa.rst
> > > > new file mode 100644
> > > > index 0000000..34c5300
> > > > --- /dev/null
> > > > +++ b/Documentation/topics/dpdk/vdpa.rst
> > > > @@ -0,0 +1,90 @@
> > > > +..
> > > > +      Copyright (c) 2019 Mellanox Technologies, Ltd.
> > > > +
> > > > +      Licensed under the Apache License, Version 2.0 (the "License");
> > > > +      you may not use this file except in compliance with the License.
> > > > +      You may obtain a copy of the License at:
> > > > +
> > > > +
> > > > +
> > >
> >
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww
> > > > + .apache.org%2Flicenses%2FLICENSE-
> > > 2.0&amp;data=02%7C01%7Cnoae%40mella
> > > > +
> > >
> >
> nox.com%7C6390609428bf4e2a2df808d7580cc233%7Ca652971c7d2e4d9ba6a4
> > > d14
> > > > +
> > >
> >
> 9256f461b%7C0%7C0%7C637074684147132980&amp;sdata=94myUB4Fchqm4
> > > 4lxlto
> > > > + OIcbCXhlu%2FA%2FoVID8Z9EyvXU%3D&amp;reserved=0
> > > > +
> > > > +      Unless required by applicable law or agreed to in writing, software
> > > > +      distributed under the License is distributed on an "AS IS"
> > > > + BASIS,
> > > WITHOUT
> > > > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> > implied.
> > > See the
> > > > +      License for the specific language governing permissions and
> > limitations
> > > > +      under the License.
> > > > +
> > > > +      Convention for heading levels in Open vSwitch documentation:
> > > > +
> > > > +      =======  Heading 0 (reserved for the title in a document)
> > > > +      -------  Heading 1
> > > > +      ~~~~~~~  Heading 2
> > > > +      +++++++  Heading 3
> > > > +      '''''''  Heading 4
> > > > +
> > > > +      Avoid deeper levels because they do not render well.
> > > > +
> > > > +
> > > > +===============
> > > > +DPDK VDPA Ports
> > > > +===============
> > > > +
> > > > +In user space there are two main approaches to communicate with a
> > > > +guest (VM), using virtIO ports (e.g. netdev
> > > > +type=dpdkvhoshuser/dpdkvhostuserclient) or SR-IOV using phy ports
> > > (e.g. netdev type = dpdk).
> > > > +Phy ports allow working with port representor which is attached
> > > > +to the OVS and a matching VF is given with pass-through to the guest.
> > > > +HW rules can process packets from up-link and direct them to the
> > > > +VF without going through SW (OVS) and therefore using phy ports
> > > > +gives the best performance.
> > > > +However, SR-IOV architecture requires that the guest will use a
> > > > +driver which is specific to the underlying HW. Specific HW driver
> > > > +has two
> > > main drawbacks:
> > > > +1. Breaks virtualization in some sense (guest aware of the HW),
> > > > +can also limit the type of images supported.
> > > > +2. Less natural support for live migration.
> > > > +
> > > > +Using virtIO port solves both problems, but reduces performance
> > > > +and causes losing of some functionality, for example, for some HW
> > > > +offload, working directly with virtIO cannot be supported.
> > > > +
> > > > +We created a new netdev type- dpdkvdpa. dpdkvdpa port solves this
> > > conflict.
> > > > +The new netdev is basically very similar to regular dpdk netdev
> > > > +but it has some additional functionally.
> > > > +This port translates between phy port to virtIO port, it takes
> > > > +packets from rx-queue and send them to the suitable tx-queue and
> > > > +allows to transfer packets from virtIO guest (VM) to a VF and
> > > > +vice versa and benefit both SR-IOV and virtIO.
> > > > +
> > > > +Quick Example
> > > > +-------------
> > > > +
> > > > +Configure OVS bridge and ports
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +you must first create a bridge and add ports to the switch.
> > > > +Since the dpdkvdpa port is configured as a client, the
> > > > +vdpa-socket-path must be configured by the user.
> > > > +VHOST_USER_SOCKET_PATH=/path/to/socket
> > > > +
> > > > +    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs
> datapath_type=netdev
> > > > +    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
> > > > +    type=dpdk options:dpdk-devargs=<pf pci id>
> > >
> > > Is adding pf port to br0 necessary?
> 
> No, the pf is not related to the vdpa port and forwarding feature, it is just a
> way to connect the VF to the up-link.
> 
> > >
> > > > +    $ ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> \
> > > > +    options:vdpa-socket-path=VHOST_USER_SOCKET_PATH \
> > > > +    options:vdpa-accelerator-devargs=<vf pci id> \
> > > > +    options:dpdk-devargs=<pf pci id>,representor=[id]
> > > > +
> > > > +Once the ports have been added to the switch, they must be added
> > > > +to
> > > the guest.
> > > > +
> > > > +Adding vhost-user ports to the guest (QEMU)
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +Attach the vhost-user device sockets to the guest. To do this,
> > > > +you must pass the following parameters to QEMU:
> > > > +
> > > > +    -chardev
> socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
> > > > +    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > > > +    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> > > > +
> > > > +QEMU will wait until the port is created successfully in OVS to
> > > > +boot the
> > > VM.
> > > > +In this mode, in case the switch will crash, the vHost ports will
> > > > +reconnect automatically once it is brought back.
> > > > diff --git a/NEWS b/NEWS
> > > > index f5a0b8f..6f315c6 100644
> > > > --- a/NEWS
> > > > +++ b/NEWS
> > > > @@ -542,6 +542,7 @@ v2.6.0 - 27 Sep 2016
> > > >       * Remove dpdkvhostcuse port type.
> > > >       * OVS client mode for vHost and vHost reconnect (Requires
> > > > QEMU
> > 2.7)
> > > >       * 'dpdkvhostuserclient' port type.
> > > > +     * 'dpdkvdpa' port type.
> > > >     - Increase number of registers to 16.
> > > >     - ovs-benchmark: This utility has been removed due to lack of use and
> > > >       bitrot.
> > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > > > bc20d68..16ddf58 100644
> > > > --- a/lib/netdev-dpdk.c
> > > > +++ b/lib/netdev-dpdk.c
> > > > @@ -47,6 +47,7 @@
> > > >  #include "dpif-netdev.h"
> > > >  #include "fatal-signal.h"
> > > >  #include "netdev-provider.h"
> > > > +#include "netdev-dpdk-vdpa.h"
> > > >  #include "netdev-vport.h"
> > > >  #include "odp-util.h"
> > > >  #include "openvswitch/dynamic-string.h"
> > > > @@ -137,6 +138,9 @@ typedef uint16_t dpdk_port_t;
> > > >  /* Legacy default value for vhost tx retries. */  #define
> > > > VHOST_ENQ_RETRY_DEF 8
> > > >
> > > > +/* Size of VDPA custom stats. */
> > > > +#define VDPA_CUSTOM_STATS_SIZE          4
> > > > +
> > > >  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX :
> IFNAMSIZ)
> > > >
> > > >  static const struct rte_eth_conf port_conf = { @@ -461,6 +465,8
> > > > @@ struct netdev_dpdk {
> > > >          int rte_xstats_ids_size;
> > > >          uint64_t *rte_xstats_ids;
> > > >      );
> > > > +
> > > > +    struct netdev_dpdk_vdpa_relay *relay;
> > > >  };
> > > >
> > > >  struct netdev_rxq_dpdk {
> > > > @@ -1346,6 +1352,30 @@ netdev_dpdk_construct(struct netdev
> > *netdev)
> > > >      return err;
> > > >  }
> > > >
> > > > +static int
> > > > +netdev_dpdk_vdpa_construct(struct netdev *netdev) {
> > > > +    struct netdev_dpdk *dev;
> > > > +    int err;
> > > > +
> > > > +    err = netdev_dpdk_construct(netdev);
> > > > +    if (err) {
> > > > +        VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n",
> > > > + netdev-
> > > >name);
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    ovs_mutex_lock(&dpdk_mutex);
> > > > +    dev = netdev_dpdk_cast(netdev);
> > > > +    dev->relay = netdev_dpdk_vdpa_alloc_relay();
> > > > +    if (!dev->relay) {
> > > > +        err = ENOMEM;
> > > > +    }
> > > > +
> > > > +    ovs_mutex_unlock(&dpdk_mutex);
> > > > +out:
> > > > +    return err;
> > > > +}
> > > > +
> > > >  static void
> > > >  common_destruct(struct netdev_dpdk *dev)
> > > >      OVS_REQUIRES(dpdk_mutex)
> > > > @@ -1428,6 +1458,19 @@ dpdk_vhost_driver_unregister(struct
> > > netdev_dpdk
> > > > *dev OVS_UNUSED,  }
> > > >
> > > >  static void
> > > > +netdev_dpdk_vdpa_destruct(struct netdev *netdev) {
> > > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > > > +
> > > > +    ovs_mutex_lock(&dpdk_mutex);
> > > > +    netdev_dpdk_vdpa_destruct_impl(dev->relay);
> > > > +    rte_free(dev->relay);
> > > > +    ovs_mutex_unlock(&dpdk_mutex);
> > > > +
> > > > +    netdev_dpdk_destruct(netdev); }
> > > > +
> > > > +static void
> > > >  netdev_dpdk_vhost_destruct(struct netdev *netdev)  {
> > > >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@
> > > > -1878,6
> > > > +1921,47 @@ out:
> > > >  }
> > > >
> > > >  static int
> > > > +netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct
> > > > +smap
> > > *args,
> > > > +                            char **errp) {
> > > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > > > +    const char *vdpa_accelerator_devargs =
> > > > +                smap_get(args, "vdpa-accelerator-devargs");
> > > > +    const char *vdpa_socket_path =
> > > > +                smap_get(args, "vdpa-socket-path");
> > > > +    int err = 0;
> > > > +
> > > > +    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path
> > > > + ==
> > > NULL)) {
> > > > +        VLOG_ERR("netdev_dpdk_vdpa_set_config failed."
> > > > +                 "Required arguments are missing for VDPA port %s",
> > > > +                 netdev->name);
> > > > +        goto free_relay;
> > > > +    }
> > > > +
> > > > +    err = netdev_dpdk_set_config(netdev, args, errp);
> > > > +    if (err) {
> > > > +        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s",
> > > > + netdev-
> > > >name);
> > > > +        goto free_relay;
> > > > +    }
> > > > +
> > > > +    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
> > > > +                                       vdpa_socket_path,
> > > > +                                       vdpa_accelerator_devargs);
> > > > +    if (err) {
> > > > +        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
> > > > +                 netdev->name);
> > > > +        goto free_relay;
> > > > +    }
> > > > +
> > > > +    goto out;
> > > > +
> > > > +free_relay:
> > > > +    rte_free(dev->relay);
> > > > +out:
> > > > +    return err;
> > > > +}
> > > > +
> > > > +static int
> > > >  netdev_dpdk_ring_set_config(struct netdev *netdev, const struct
> > > > smap
> > > *args,
> > > >                              char **errp OVS_UNUSED)  { @@ -2273,6
> > > > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
> > > dp_packet_batch *batch,
> > > >      return 0;
> > > >  }
> > > >
> > > > +static int
> > > > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> > > > +                          struct dp_packet_batch *batch,
> > > > +                          int *qfill) {
> > > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > > > +    int fwd_rx;
> > > > +    int ret;
> > > > +
> > > > +    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay,
> > > > + rxq->queue_id);
> > > I'm still not clear about the above function.
> > > So netdev_dpdk_vdpa_recv_impl()
> > >     netdev_dpdk_vdpa_forward_traffic(), with a queue pair as parameter
> > >         ...
> > >         rte_eth_rx_burst(qpair->port_id_rx...)
> > >         ...
> > >         rte_eth_tx_burst(qpair->port_id_tx...)
> > >
> > > So looks like forwarding between vf to vhostuser and vice versa is
> > > done in this function.
> > >
> > > > +    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
> > >
> > > Then why do we call netdev_dpdk_rxq_recv() above again?
> > > Are packets received above the same packets as rte_eth_rx_burst()
> > > previously called in netdev_dpdk_vdpa_forward_traffic()?
> > >
> 
> netdev_dpdk_vdpa_recv_impl() first calls: rte_eth_rx_burst and
> rte_eth_tx_burst in order to forward between vf to vhostuser and vice
> versa.
> After rx_burst and tx_burst is done, we call netdev_dpdk_rxq_recv() in
> order to receive packets for the representor.
> The queue is different in rte_eth_rx_burst, rte_eth_tx_burst and
> netdev_dpdk_rxq_recv.
> We use this methodology in order to use the free cycles allocated for the
> representor's receive. When using HW offload most of the packets will be
> offloaded and won't go through OVS.
> Attached is the RFC and community discussion.
> I hope it is clearer now.

Sorry for adding an attachment, you can find the RFC here: 
https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359007.html
Or the whole thread: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/thread.html#359007

> > >
> > > Thanks
> > > William
> > >
> > > > +    if ((ret == EAGAIN) && fwd_rx) {
> > > > +        return 0;
> > > > +    }
> > > > +    return ret;
> > > > +}
> > > > +
> > > >  static inline int
> > > >  netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf
> > **pkts,
> > > >                      int cnt, bool should_steal) @@ -2854,6
> > > > +2955,29 @@ netdev_dpdk_vhost_get_custom_stats(const struct
> netdev
> > *netdev,
> > > }
> > > >
> > > >  static int
> > > > +netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
> > > > +                                  struct netdev_custom_stats
> > > > +*custom_stats) {
> > > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > > > +    int err = 0;
> > > > +
> > > > +    ovs_mutex_lock(&dev->mutex);
> > > > +
> > > > +    custom_stats->size = VDPA_CUSTOM_STATS_SIZE;
> > > > +    custom_stats->counters = xcalloc(custom_stats->size,
> > > > +                                     sizeof *custom_stats->counters);
> > > > +    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
> > > > +                                                 custom_stats);
> > > > +    if (err) {
> > > > +        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed."
> > > > +                 "Port %s\n", netdev->name);
> > > > +    }
> > > > +
> > > > +    ovs_mutex_unlock(&dev->mutex);
> > > > +    return err;
> > > > +}
> > > > +
> > > > +static int
> > > >  netdev_dpdk_get_features(const struct netdev *netdev,
> > > >                           enum netdev_features *current,
> > > >                           enum netdev_features *advertised, @@
> > > > -4237,6
> > > > +4361,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
> > > > +}
> > > >
> > > >  static int
> > > > +netdev_dpdk_vdpa_reconfigure(struct netdev *netdev) {
> > > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > > > +    int err;
> > > > +
> > > > +    err = netdev_dpdk_reconfigure(netdev);
> > > > +    if (err) {
> > > > +        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s",
> > > > + netdev-
> > > >name);
> > > > +        goto out;
> > > > +    }
> > > > +
> > > > +    ovs_mutex_lock(&dev->mutex);
> > > > +    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev-
> >dpdk_mp-
> > > >mp,
> > > > +                                        dev->up.n_rxq);
> > > > +    if (err) {
> > > > +        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
> > > > +                 netdev->name);
> > > > +    }
> > > > +
> > > > +    ovs_mutex_unlock(&dev->mutex);
> > > > +out:
> > > > +    return err;
> > > > +}
> > > > +
> > > > +static int
> > > >  netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)  {
> > > >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@
> > > > -4456,6
> > > > +4605,18 @@ static const struct netdev_class
> > > > +dpdk_vhost_client_class = {
> > > >      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,  };
> > > >
> > > > +static const struct netdev_class dpdk_vdpa_class = {
> > > > +    .type = "dpdkvdpa",
> > > > +    NETDEV_DPDK_CLASS_COMMON,
> > > > +    .construct = netdev_dpdk_vdpa_construct,
> > > > +    .destruct = netdev_dpdk_vdpa_destruct,
> > > > +    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
> > > > +    .set_config = netdev_dpdk_vdpa_set_config,
> > > > +    .reconfigure = netdev_dpdk_vdpa_reconfigure,
> > > > +    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
> > > > +    .send = netdev_dpdk_eth_send
> > > > +};
> > > > +
> > > >  void
> > > >  netdev_dpdk_register(void)
> > > >  {
> > > > @@ -4463,4 +4624,5 @@ netdev_dpdk_register(void)
> > > >      netdev_register_provider(&dpdk_ring_class);
> > > >      netdev_register_provider(&dpdk_vhost_class);
> > > >      netdev_register_provider(&dpdk_vhost_client_class);
> > > > +    netdev_register_provider(&dpdk_vdpa_class);
> > > >  }
> > > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> > > > 9a743c0..9e94950 100644
> > > > --- a/vswitchd/vswitch.xml
> > > > +++ b/vswitchd/vswitch.xml
> > > > @@ -2640,6 +2640,13 @@
> > > >            <dd>
> > > >              A pair of virtual devices that act as a patch cable.
> > > >            </dd>
> > > > +
> > > > +          <dt><code>dpdkvdpa</code></dt>
> > > > +          <dd>
> > > > +            The dpdk vDPA port allows forwarding bi-directional
> > > > + traffic
> > between
> > > > +            SR-IOV virtual functions (VFs) and VirtIO devices in virtual
> > > > +            machines (VMs).
> > > > +          </dd>
> > > >          </dl>
> > > >        </column>
> > > >      </group>
> > > > @@ -3156,6 +3163,24 @@ ovs-vsctl add-port br0 p0 -- set Interface
> > > > p0
> > > type=patch options:peer=p1 \
> > > >          </p>
> > > >        </column>
> > > >
> > > > +      <column name="options" key="vdpa-socket-path"
> > > > +              type='{"type": "string"}'>
> > > > +        <p>
> > > > +          The value specifies the path to the socket associated with a VDPA
> > > > +          port that will be created by QEMU.
> > > > +          Only supported by dpdkvdpa interfaces.
> > > > +        </p>
> > > > +      </column>
> > > > +
> > > > +      <column name="options" key="vdpa-accelerator-devargs"
> > > > +              type='{"type": "string"}'>
> > > > +        <p>
> > > > +          The value specifies the PCI address associated with the virtual
> > > > +          function.
> > > > +          Only supported by dpdkvdpa interfaces.
> > > > +        </p>
> > > > +      </column>
> > > > +
> > > >        <column name="options" key="dq-zero-copy"
> > > >                type='{"type": "boolean"}'>
> > > >          <p>
> > > > --
> > > > 1.8.3.1
> > > >
> > > > _______________________________________________
> > > > dev mailing list
> > > > dev@openvswitch.org
> > > >
> > >
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail
> > > > .openvswitch.org%2Fmailman%2Flistinfo%2Fovs-
> > > dev&amp;data=02%7C01%7Cnoa
> > > >
> > >
> >
> e%40mellanox.com%7C6390609428bf4e2a2df808d7580cc233%7Ca652971c7d2
> > > e4d9b
> > > >
> > >
> >
> a6a4d149256f461b%7C0%7C0%7C637074684147132980&amp;sdata=Eai7e%2B
> > > Ln5x8a
> > > > fpEi7HdWF8FHDYe4vD7dxRLO2Yo0usQ%3D&amp;reserved=0
William Tu Oct. 28, 2019, 8:45 p.m. UTC | #8
Hi Noa,

Thanks for your reply.

> > > > Hi Noa,
> > > >
> > > > I have a couple more questions. I'm still at the learning stage of
> > > > this new feature, thanks in advance for your patience.
> > > >
> > > > On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> > > > > dpdkvdpa netdev works with 3 components:
> > > > > vhost-user socket, vdpa device: real vdpa device or a VF and
> > > > > representor of "vdpa device".
> > > >
> > > > What NIC card support this feature?
> > > > I don't have real vdpa device, can I use Intel X540 VF feature?
> > > >
> > >
> > > This feature will have two modes, SW and HW.
> > > The SW mode doesn't depend on a real vdpa device and allows you to use
> > > this feature even if you don't have a NIC that support it.
> Although you need to use representors, so you need your NIC to support it.
> > > The HW mode will be implemented in the future and will use a real vdpa
> > > device. It will be better to use the HW mode if you have a NIC that support
> > it.
> > >
> > > For now, we only support the SW mode, when vdpa will have support in
> > > dpdk, we will add the HW mode to OVS.
> > >
> > > > >
> > > > > In order to add a new vDPA port, add a new port to existing bridge
> > > > > with type dpdkvdpa and vDPA options:
> > > > > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> > > > >    options:vdpa-socket-path=<sock path>
> > > > >    options:vdpa-accelerator-devargs=<VF pci id>
> > > > >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> > > > >
> > > > > On this command OVS will create a new netdev:
> > > > > 1. Register vhost-user-client device.
> > > > > 2. Open and configure VF dpdk port.
> > > > > 3. Open and configure representor dpdk port.
> > > > >
> > > > > The new netdev will use netdev_rxq_recv() function in order to
> > > > > receive packets from VF and push to vhost-user and receive packets
> > > > > from vhost-user and push to VF.
> > > > >
> > > > > Signed-off-by: Noa Ezra <noae@mellanox.com>
> > > > > Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> > > > > ---
> > > > >  Documentation/automake.mk           |   1 +
> > > > >  Documentation/topics/dpdk/index.rst |   1 +
> > > > >  Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
> > > > >  NEWS                                |   1 +
> > > > >  lib/netdev-dpdk.c                   | 162
> > > > ++++++++++++++++++++++++++++++++++++
> > > > >  vswitchd/vswitch.xml                |  25 ++++++
> > > > >  6 files changed, 280 insertions(+)  create mode 100644
> > > > > Documentation/topics/dpdk/vdpa.rst
> > > > >
> > > > > diff --git a/Documentation/automake.mk
> > > b/Documentation/automake.mk
> > > > > index cd68f3b..ee574bc 100644
> > > > > --- a/Documentation/automake.mk
> > > > > +++ b/Documentation/automake.mk
> > > > > @@ -43,6 +43,7 @@ DOC_SOURCE = \
> > > > >         Documentation/topics/dpdk/ring.rst \
> > > > >         Documentation/topics/dpdk/vdev.rst \
> > > > >         Documentation/topics/dpdk/vhost-user.rst \
> > > > > +       Documentation/topics/dpdk/vdpa.rst \
> > > > >         Documentation/topics/fuzzing/index.rst \
> > > > >         Documentation/topics/fuzzing/what-is-fuzzing.rst \
> > > > >         Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
> > > > > diff --git a/Documentation/topics/dpdk/index.rst
> > > > > b/Documentation/topics/dpdk/index.rst
> > > > > index cf24a7b..c1d4ea7 100644
> > > > > --- a/Documentation/topics/dpdk/index.rst
> > > > > +++ b/Documentation/topics/dpdk/index.rst
> > > > > @@ -41,3 +41,4 @@ The DPDK Datapath
> > > > >     /topics/dpdk/pdump
> > > > >     /topics/dpdk/jumbo-frames
> > > > >     /topics/dpdk/memory
> > > > > +   /topics/dpdk/vdpa
> > > > > diff --git a/Documentation/topics/dpdk/vdpa.rst
> > > > > b/Documentation/topics/dpdk/vdpa.rst
> > > > > new file mode 100644

<snip>

> > > > > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
> > > > dp_packet_batch *batch,
> > > > >      return 0;
> > > > >  }
> > > > >
> > > > > +static int
> > > > > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> > > > > +                          struct dp_packet_batch *batch,
> > > > > +                          int *qfill) {
> > > > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > > > > +    int fwd_rx;
> > > > > +    int ret;
> > > > > +
> > > > > +    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay,
> > > > > + rxq->queue_id);
> > > > I'm still not clear about the above function.
> > > > So netdev_dpdk_vdpa_recv_impl()
> > > >     netdev_dpdk_vdpa_forward_traffic(), with a queue pair as parameter
> > > >         ...
> > > >         rte_eth_rx_burst(qpair->port_id_rx...)
> > > >         ...
> > > >         rte_eth_tx_burst(qpair->port_id_tx...)
> > > >
> > > > So looks like forwarding between vf to vhostuser and vice versa is
> > > > done in this function.
> > > >
> > > > > +    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
> > > >
> > > > Then why do we call netdev_dpdk_rxq_recv() above again?
> > > > Are packets received above the same packets as rte_eth_rx_burst()
> > > > previously called in netdev_dpdk_vdpa_forward_traffic()?
> > > >
> >
> > netdev_dpdk_vdpa_recv_impl() first calls: rte_eth_rx_burst and
> > rte_eth_tx_burst in order to forward between vf to vhostuser and vice
> > versa.
> > After rx_burst and tx_burst is done, we call netdev_dpdk_rxq_recv() in
> > order to receive packets for the representor.
> > The queue is different in rte_eth_rx_burst, rte_eth_tx_burst and
> > netdev_dpdk_rxq_recv.

So what traffic goes into the queues seen by (rte_eth_rx_burst,
rte_eth_tx_burst)
and what traffic goes to queues seen by netdev_dpdk_rxq_recv()?

And if the HW mode is enabled, then we can remove calling the
rte_eth_rx_burst() and
rte_eth_tx_burst() because HW directly places packet into the virtio queue.
Do I understand correctly?

> > We use this methodology in order to use the free cycles allocated for the
> > representor's receive. When using HW offload most of the packets will be
> > offloaded and won't go through OVS.
> > Attached is the RFC and community discussion.
> > I hope it is clearer now.

Yes, thanks!

>
> Sorry for adding an attachment, you can find the RFC here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359007.html
> Or the whole thread: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/thread.html#359007
>
Thank you for the link, it's very helpful.

I'd like to test this patch set but IIUC this needs a card with port
representer support.
If there is any other way, please let me know.

Regards,
William
Noa Ezra Oct. 29, 2019, 11:20 a.m. UTC | #9
> -----Original Message-----
> From: William Tu [mailto:u9012063@gmail.com]
> Sent: Monday, October 28, 2019 10:46 PM
> To: Noa Levy <noae@mellanox.com>
> Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> <ameerm@mellanox.com>; Eli Britstein <elibr@mellanox.com>
> Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port
> 
> Hi Noa,
> 
> Thanks for your reply.
> 
> > > > > Hi Noa,
> > > > >
> > > > > I have a couple more questions. I'm still at the learning stage
> > > > > of this new feature, thanks in advance for your patience.
> > > > >
> > > > > On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> > > > > > dpdkvdpa netdev works with 3 components:
> > > > > > vhost-user socket, vdpa device: real vdpa device or a VF and
> > > > > > representor of "vdpa device".
> > > > >
> > > > > What NIC card support this feature?
> > > > > I don't have real vdpa device, can I use Intel X540 VF feature?
> > > > >
> > > >
> > > > This feature will have two modes, SW and HW.
> > > > The SW mode doesn't depend on a real vdpa device and allows you to
> > > > use this feature even if you don't have a NIC that support it.
> > Although you need to use representors, so you need your NIC to support
> it.
> > > > The HW mode will be implemented in the future and will use a real
> > > > vdpa device. It will be better to use the HW mode if you have a
> > > > NIC that support
> > > it.
> > > >
> > > > For now, we only support the SW mode, when vdpa will have support
> > > > in dpdk, we will add the HW mode to OVS.
> > > >
> > > > > >
> > > > > > In order to add a new vDPA port, add a new port to existing
> > > > > > bridge with type dpdkvdpa and vDPA options:
> > > > > > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> > > > > >    options:vdpa-socket-path=<sock path>
> > > > > >    options:vdpa-accelerator-devargs=<VF pci id>
> > > > > >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> > > > > >
> > > > > > On this command OVS will create a new netdev:
> > > > > > 1. Register vhost-user-client device.
> > > > > > 2. Open and configure VF dpdk port.
> > > > > > 3. Open and configure representor dpdk port.
> > > > > >
> > > > > > The new netdev will use netdev_rxq_recv() function in order to
> > > > > > receive packets from VF and push to vhost-user and receive
> > > > > > packets from vhost-user and push to VF.
> > > > > >
> > > > > > Signed-off-by: Noa Ezra <noae@mellanox.com>
> > > > > > Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> > > > > > ---
> > > > > >  Documentation/automake.mk           |   1 +
> > > > > >  Documentation/topics/dpdk/index.rst |   1 +
> > > > > >  Documentation/topics/dpdk/vdpa.rst  |  90
> ++++++++++++++++++++
> > > > > >  NEWS                                |   1 +
> > > > > >  lib/netdev-dpdk.c                   | 162
> > > > > ++++++++++++++++++++++++++++++++++++
> > > > > >  vswitchd/vswitch.xml                |  25 ++++++
> > > > > >  6 files changed, 280 insertions(+)  create mode 100644
> > > > > > Documentation/topics/dpdk/vdpa.rst
> > > > > >
> > > > > > diff --git a/Documentation/automake.mk
> > > > b/Documentation/automake.mk
> > > > > > index cd68f3b..ee574bc 100644
> > > > > > --- a/Documentation/automake.mk
> > > > > > +++ b/Documentation/automake.mk
> > > > > > @@ -43,6 +43,7 @@ DOC_SOURCE = \
> > > > > >         Documentation/topics/dpdk/ring.rst \
> > > > > >         Documentation/topics/dpdk/vdev.rst \
> > > > > >         Documentation/topics/dpdk/vhost-user.rst \
> > > > > > +       Documentation/topics/dpdk/vdpa.rst \
> > > > > >         Documentation/topics/fuzzing/index.rst \
> > > > > >         Documentation/topics/fuzzing/what-is-fuzzing.rst \
> > > > > >
> > > > > > Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
> > > > > > diff --git a/Documentation/topics/dpdk/index.rst
> > > > > > b/Documentation/topics/dpdk/index.rst
> > > > > > index cf24a7b..c1d4ea7 100644
> > > > > > --- a/Documentation/topics/dpdk/index.rst
> > > > > > +++ b/Documentation/topics/dpdk/index.rst
> > > > > > @@ -41,3 +41,4 @@ The DPDK Datapath
> > > > > >     /topics/dpdk/pdump
> > > > > >     /topics/dpdk/jumbo-frames
> > > > > >     /topics/dpdk/memory
> > > > > > +   /topics/dpdk/vdpa
> > > > > > diff --git a/Documentation/topics/dpdk/vdpa.rst
> > > > > > b/Documentation/topics/dpdk/vdpa.rst
> > > > > > new file mode 100644
> 
> <snip>
> 
> > > > > > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq,
> > > > > > +struct
> > > > > dp_packet_batch *batch,
> > > > > >      return 0;
> > > > > >  }
> > > > > >
> > > > > > +static int
> > > > > > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> > > > > > +                          struct dp_packet_batch *batch,
> > > > > > +                          int *qfill) {
> > > > > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > > > > > +    int fwd_rx;
> > > > > > +    int ret;
> > > > > > +
> > > > > > +    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay,
> > > > > > + rxq->queue_id);
> > > > > I'm still not clear about the above function.
> > > > > So netdev_dpdk_vdpa_recv_impl()
> > > > >     netdev_dpdk_vdpa_forward_traffic(), with a queue pair as
> parameter
> > > > >         ...
> > > > >         rte_eth_rx_burst(qpair->port_id_rx...)
> > > > >         ...
> > > > >         rte_eth_tx_burst(qpair->port_id_tx...)
> > > > >
> > > > > So looks like forwarding between vf to vhostuser and vice versa
> > > > > is done in this function.
> > > > >
> > > > > > +    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
> > > > >
> > > > > Then why do we call netdev_dpdk_rxq_recv() above again?
> > > > > Are packets received above the same packets as
> > > > > rte_eth_rx_burst() previously called in
> netdev_dpdk_vdpa_forward_traffic()?
> > > > >
> > >
> > > netdev_dpdk_vdpa_recv_impl() first calls: rte_eth_rx_burst and
> > > rte_eth_tx_burst in order to forward between vf to vhostuser and
> > > vice versa.
> > > After rx_burst and tx_burst is done, we call netdev_dpdk_rxq_recv()
> > > in order to receive packets for the representor.
> > > The queue is different in rte_eth_rx_burst, rte_eth_tx_burst and
> > > netdev_dpdk_rxq_recv.
> 
> So what traffic goes into the queues seen by (rte_eth_rx_burst,
> rte_eth_tx_burst)
> and what traffic goes to queues seen by netdev_dpdk_rxq_recv()?
> 
The traffic that goes through rte_eth_rx_burst and rte_eth_tx_burst is the
traffic from vm to vf or from vf to vm (the "forwarder" traffic).
The traffic that goes through netdev_dpdk_rxq_recv() is the packets sent to 
the representor's queues. 

> And if the HW mode is enabled, then we can remove calling the
> rte_eth_rx_burst() and
> rte_eth_tx_burst() because HW directly places packet into the virtio queue.
> Do I understand correctly?

Yes, you understand correctly, when the HW mode is enabled the packets go directly to the virtio queue and the "forwarding" will 
take place in the HW and not through SW, so rte_eth_rx_burst() and rte_eth_tx_burst() won't be used.
We will support both HW and SW modes, so we won't remove this SW implementation, we will add support for HW. 

> 
> > > We use this methodology in order to use the free cycles allocated
> > > for the representor's receive. When using HW offload most of the
> > > packets will be offloaded and won't go through OVS.
> > > Attached is the RFC and community discussion.
> > > I hope it is clearer now.
> 
> Yes, thanks!
> 
> >
> > Sorry for adding an attachment, you can find the RFC here:
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail
> > .openvswitch.org%2Fpipermail%2Fovs-dev%2F2019-
> May%2F359007.html&amp;da
> >
> ta=02%7C01%7Cnoae%40mellanox.com%7C9820a68db12b4c9a39d008d75be7
> e21b%7C
> >
> a652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637078923809520198&a
> mp;sdat
> >
> a=VgVzknn2uKZ6RWTjNG8q8%2BHmzphtf4P4mUxg7WOfHgA%3D&amp;res
> erved=0
> > Or the whole thread:
> >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail
> > .openvswitch.org%2Fpipermail%2Fovs-dev%2F2019-
> May%2Fthread.html%233590
> >
> 07&amp;data=02%7C01%7Cnoae%40mellanox.com%7C9820a68db12b4c9a39
> d008d75b
> >
> e7e21b%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C63707892380
> 9520198
> >
> &amp;sdata=fOV8CewQSwCRxKR8HmaZ1JQGR9P2gsTJ0XlQU3%2FaQE4%3D
> &amp;reserv
> > ed=0
> >
> Thank you for the link, it's very helpful.
> 
> I'd like to test this patch set but IIUC this needs a card with port representer
> support.
> If there is any other way, please let me know.
> 
> Regards,
> William
William Tu Oct. 30, 2019, 9:21 p.m. UTC | #10
On Tue, Oct 29, 2019 at 4:20 AM Noa Levy <noae@mellanox.com> wrote:
>
>
>
> > -----Original Message-----
> > From: William Tu [mailto:u9012063@gmail.com]
> > Sent: Monday, October 28, 2019 10:46 PM
> > To: Noa Levy <noae@mellanox.com>
> > Cc: ovs-dev@openvswitch.org; Oz Shlomo <ozsh@mellanox.com>; Majd
> > Dibbiny <majd@mellanox.com>; Ameer Mahagneh
> > <ameerm@mellanox.com>; Eli Britstein <elibr@mellanox.com>
> > Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port
> >
> > Hi Noa,
> >
> > Thanks for your reply.
> >
> > > > > > Hi Noa,
> > > > > >
> > > > > > I have a couple more questions. I'm still at the learning stage
> > > > > > of this new feature, thanks in advance for your patience.
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> > > > > > > dpdkvdpa netdev works with 3 components:
> > > > > > > vhost-user socket, vdpa device: real vdpa device or a VF and
> > > > > > > representor of "vdpa device".
> > > > > >
> > > > > > What NIC card support this feature?
> > > > > > I don't have real vdpa device, can I use Intel X540 VF feature?
> > > > > >
> > > > >
> > > > > This feature will have two modes, SW and HW.
> > > > > The SW mode doesn't depend on a real vdpa device and allows you to
> > > > > use this feature even if you don't have a NIC that support it.
> > > Although you need to use representors, so you need your NIC to support
> > it.
> > > > > The HW mode will be implemented in the future and will use a real
> > > > > vdpa device. It will be better to use the HW mode if you have a
> > > > > NIC that support
> > > > it.
> > > > >
> > > > > For now, we only support the SW mode, when vdpa will have support
> > > > > in dpdk, we will add the HW mode to OVS.
> > > > >
> > > > > > >
> > > > > > > In order to add a new vDPA port, add a new port to existing
> > > > > > > bridge with type dpdkvdpa and vDPA options:
> > > > > > > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> > > > > > >    options:vdpa-socket-path=<sock path>
> > > > > > >    options:vdpa-accelerator-devargs=<VF pci id>
> > > > > > >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> > > > > > >
> > > > > > > On this command OVS will create a new netdev:
> > > > > > > 1. Register vhost-user-client device.
> > > > > > > 2. Open and configure VF dpdk port.
> > > > > > > 3. Open and configure representor dpdk port.
> > > > > > >
> > > > > > > The new netdev will use netdev_rxq_recv() function in order to
> > > > > > > receive packets from VF and push to vhost-user and receive
> > > > > > > packets from vhost-user and push to VF.
> > > > > > >
> > > > > > > Signed-off-by: Noa Ezra <noae@mellanox.com>
> > > > > > > Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> > > > > > > ---
> > > > > > >  Documentation/automake.mk           |   1 +
> > > > > > >  Documentation/topics/dpdk/index.rst |   1 +
> > > > > > >  Documentation/topics/dpdk/vdpa.rst  |  90
> > ++++++++++++++++++++
> > > > > > >  NEWS                                |   1 +
> > > > > > >  lib/netdev-dpdk.c                   | 162
> > > > > > ++++++++++++++++++++++++++++++++++++
> > > > > > >  vswitchd/vswitch.xml                |  25 ++++++
> > > > > > >  6 files changed, 280 insertions(+)  create mode 100644
> > > > > > > Documentation/topics/dpdk/vdpa.rst
> > > > > > >
> > > > > > > diff --git a/Documentation/automake.mk
> > > > > b/Documentation/automake.mk
> > > > > > > index cd68f3b..ee574bc 100644
> > > > > > > --- a/Documentation/automake.mk
> > > > > > > +++ b/Documentation/automake.mk
> > > > > > > @@ -43,6 +43,7 @@ DOC_SOURCE = \
> > > > > > >         Documentation/topics/dpdk/ring.rst \
> > > > > > >         Documentation/topics/dpdk/vdev.rst \
> > > > > > >         Documentation/topics/dpdk/vhost-user.rst \
> > > > > > > +       Documentation/topics/dpdk/vdpa.rst \
> > > > > > >         Documentation/topics/fuzzing/index.rst \
> > > > > > >         Documentation/topics/fuzzing/what-is-fuzzing.rst \
> > > > > > >
> > > > > > > Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
> > > > > > > diff --git a/Documentation/topics/dpdk/index.rst
> > > > > > > b/Documentation/topics/dpdk/index.rst
> > > > > > > index cf24a7b..c1d4ea7 100644
> > > > > > > --- a/Documentation/topics/dpdk/index.rst
> > > > > > > +++ b/Documentation/topics/dpdk/index.rst
> > > > > > > @@ -41,3 +41,4 @@ The DPDK Datapath
> > > > > > >     /topics/dpdk/pdump
> > > > > > >     /topics/dpdk/jumbo-frames
> > > > > > >     /topics/dpdk/memory
> > > > > > > +   /topics/dpdk/vdpa
> > > > > > > diff --git a/Documentation/topics/dpdk/vdpa.rst
> > > > > > > b/Documentation/topics/dpdk/vdpa.rst
> > > > > > > new file mode 100644
> >
> > <snip>
> >
> > > > > > > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq,
> > > > > > > +struct
> > > > > > dp_packet_batch *batch,
> > > > > > >      return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > +static int
> > > > > > > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> > > > > > > +                          struct dp_packet_batch *batch,
> > > > > > > +                          int *qfill) {
> > > > > > > +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > > > > > > +    int fwd_rx;
> > > > > > > +    int ret;
> > > > > > > +
> > > > > > > +    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay,
> > > > > > > + rxq->queue_id);
> > > > > > I'm still not clear about the above function.
> > > > > > So netdev_dpdk_vdpa_recv_impl()
> > > > > >     netdev_dpdk_vdpa_forward_traffic(), with a queue pair as
> > parameter
> > > > > >         ...
> > > > > >         rte_eth_rx_burst(qpair->port_id_rx...)
> > > > > >         ...
> > > > > >         rte_eth_tx_burst(qpair->port_id_tx...)
> > > > > >
> > > > > > So looks like forwarding between vf to vhostuser and vice versa
> > > > > > is done in this function.
> > > > > >
> > > > > > > +    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
> > > > > >
> > > > > > Then why do we call netdev_dpdk_rxq_recv() above again?
> > > > > > Are packets received above the same packets as
> > > > > > rte_eth_rx_burst() previously called in
> > netdev_dpdk_vdpa_forward_traffic()?
> > > > > >
> > > >
> > > > netdev_dpdk_vdpa_recv_impl() first calls: rte_eth_rx_burst and
> > > > rte_eth_tx_burst in order to forward between vf to vhostuser and
> > > > vice versa.
> > > > After rx_burst and tx_burst is done, we call netdev_dpdk_rxq_recv()
> > > > in order to receive packets for the representor.
> > > > The queue is different in rte_eth_rx_burst, rte_eth_tx_burst and
> > > > netdev_dpdk_rxq_recv.
> >
> > So what traffic goes into the queues seen by (rte_eth_rx_burst,
> > rte_eth_tx_burst)
> > and what traffic goes to queues seen by netdev_dpdk_rxq_recv()?
> >
> The traffic that goes through rte_eth_rx_burst and rte_eth_tx_burst is the
> traffic from vm to vf or from vf to vm (the "forwarder" traffic).
> The traffic that goes through netdev_dpdk_rxq_recv() is the packets sent to
> the representor's queues.
>
> > And if the HW mode is enabled, then we can remove calling the
> > rte_eth_rx_burst() and
> > rte_eth_tx_burst() because HW directly places packet into the virtio queue.
> > Do I understand correctly?
>
> Yes, you understand correctly, when the HW mode is enabled the packets go directly to the virtio queue and the "forwarding" will
> take place in the HW and not through SW, so rte_eth_rx_burst() and rte_eth_tx_burst() won't be used.
> We will support both HW and SW modes, so we won't remove this SW implementation, we will add support for HW.
>

Hi Noa,

Thank you!
So when using HW mode, OVS does not need to handle packet forwarding
(no rte_eth_rx_burst and rte_eth_tx_burst)
But when using HW mode, does OVS need to handle vhost user vring kick
and call event?
This is when a guest kicks the host/ovs because it has placed buffers
onto a virtqueue, or when
host/ovs trying to kick using call file descriptor to inform guest
there is incoming packets.

Or this two events are also handled in the hardware so OVS does not
need to do anything?

Thank you.
Regards,
William
diff mbox series

Patch

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index cd68f3b..ee574bc 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -43,6 +43,7 @@  DOC_SOURCE = \
 	Documentation/topics/dpdk/ring.rst \
 	Documentation/topics/dpdk/vdev.rst \
 	Documentation/topics/dpdk/vhost-user.rst \
+	Documentation/topics/dpdk/vdpa.rst \
 	Documentation/topics/fuzzing/index.rst \
 	Documentation/topics/fuzzing/what-is-fuzzing.rst \
 	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index cf24a7b..c1d4ea7 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -41,3 +41,4 @@  The DPDK Datapath
    /topics/dpdk/pdump
    /topics/dpdk/jumbo-frames
    /topics/dpdk/memory
+   /topics/dpdk/vdpa
diff --git a/Documentation/topics/dpdk/vdpa.rst b/Documentation/topics/dpdk/vdpa.rst
new file mode 100644
index 0000000..34c5300
--- /dev/null
+++ b/Documentation/topics/dpdk/vdpa.rst
@@ -0,0 +1,90 @@ 
+..
+      Copyright (c) 2019 Mellanox Technologies, Ltd.
+
+      Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License.
+      You may obtain a copy of the License at:
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+
+===============
+DPDK VDPA Ports
+===============
+
+In user space there are two main approaches to communicate with a guest (VM),
+using virtIO ports (e.g. netdev type=dpdkvhoshuser/dpdkvhostuserclient) or
+SR-IOV using phy ports (e.g. netdev type = dpdk).
+Phy ports allow working with port representor which is attached to the OVS and
+a matching VF is given with pass-through to the guest.
+HW rules can process packets from up-link and direct them to the VF without
+going through SW (OVS) and therefore using phy ports gives the best
+performance.
+However, SR-IOV architecture requires that the guest will use a driver which is
+specific to the underlying HW. Specific HW driver has two main drawbacks:
+1. Breaks virtualization in some sense (guest aware of the HW), can also limit
+the type of images supported.
+2. Less natural support for live migration.
+
+Using virtIO port solves both problems, but reduces performance and causes
+losing of some functionality, for example, for some HW offload, working
+directly with virtIO cannot be supported.
+
+We created a new netdev type- dpdkvdpa. dpdkvdpa port solves this conflict.
+The new netdev is basically very similar to regular dpdk netdev but it has some
+additional functionally.
+This port translates between phy port to virtIO port, it takes packets from
+rx-queue and send them to the suitable tx-queue and allows to transfer packets
+from virtIO guest (VM) to a VF and vice versa and benefit both SR-IOV and
+virtIO.
+
+Quick Example
+-------------
+
+Configure OVS bridge and ports
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+you must first create a bridge and add ports to the switch.
+Since the dpdkvdpa port is configured as a client, the vdpa-socket-path must be
+configured by the user.
+VHOST_USER_SOCKET_PATH=/path/to/socket
+
+    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
+    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
+    type=dpdk options:dpdk-devargs=<pf pci id>
+    $ ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
+    options:vdpa-socket-path=VHOST_USER_SOCKET_PATH \
+    options:vdpa-accelerator-devargs=<vf pci id> \
+    options:dpdk-devargs=<pf pci id>,representor=[id]
+
+Once the ports have been added to the switch, they must be added to the guest.
+
+Adding vhost-user ports to the guest (QEMU)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Attach the vhost-user device sockets to the guest. To do this, you must pass
+the following parameters to QEMU:
+
+    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
+    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+
+QEMU will wait until the port is created successfully in OVS to boot the VM.
+In this mode, in case the switch will crash, the vHost ports will reconnect
+automatically once it is brought back.
diff --git a/NEWS b/NEWS
index f5a0b8f..6f315c6 100644
--- a/NEWS
+++ b/NEWS
@@ -542,6 +542,7 @@  v2.6.0 - 27 Sep 2016
      * Remove dpdkvhostcuse port type.
      * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
      * 'dpdkvhostuserclient' port type.
+     * 'dpdkvdpa' port type.
    - Increase number of registers to 16.
    - ovs-benchmark: This utility has been removed due to lack of use and
      bitrot.
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index bc20d68..16ddf58 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -47,6 +47,7 @@ 
 #include "dpif-netdev.h"
 #include "fatal-signal.h"
 #include "netdev-provider.h"
+#include "netdev-dpdk-vdpa.h"
 #include "netdev-vport.h"
 #include "odp-util.h"
 #include "openvswitch/dynamic-string.h"
@@ -137,6 +138,9 @@  typedef uint16_t dpdk_port_t;
 /* Legacy default value for vhost tx retries. */
 #define VHOST_ENQ_RETRY_DEF 8
 
+/* Size of VDPA custom stats. */
+#define VDPA_CUSTOM_STATS_SIZE          4
+
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 
 static const struct rte_eth_conf port_conf = {
@@ -461,6 +465,8 @@  struct netdev_dpdk {
         int rte_xstats_ids_size;
         uint64_t *rte_xstats_ids;
     );
+
+    struct netdev_dpdk_vdpa_relay *relay;
 };
 
 struct netdev_rxq_dpdk {
@@ -1346,6 +1352,30 @@  netdev_dpdk_construct(struct netdev *netdev)
     return err;
 }
 
+static int
+netdev_dpdk_vdpa_construct(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev;
+    int err;
+
+    err = netdev_dpdk_construct(netdev);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n", netdev->name);
+        goto out;
+    }
+
+    ovs_mutex_lock(&dpdk_mutex);
+    dev = netdev_dpdk_cast(netdev);
+    dev->relay = netdev_dpdk_vdpa_alloc_relay();
+    if (!dev->relay) {
+        err = ENOMEM;
+    }
+
+    ovs_mutex_unlock(&dpdk_mutex);
+out:
+    return err;
+}
+
 static void
 common_destruct(struct netdev_dpdk *dev)
     OVS_REQUIRES(dpdk_mutex)
@@ -1428,6 +1458,19 @@  dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
 }
 
 static void
+netdev_dpdk_vdpa_destruct(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    ovs_mutex_lock(&dpdk_mutex);
+    netdev_dpdk_vdpa_destruct_impl(dev->relay);
+    rte_free(dev->relay);
+    ovs_mutex_unlock(&dpdk_mutex);
+
+    netdev_dpdk_destruct(netdev);
+}
+
+static void
 netdev_dpdk_vhost_destruct(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -1878,6 +1921,47 @@  out:
 }
 
 static int
+netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct smap *args,
+                            char **errp)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    const char *vdpa_accelerator_devargs =
+                smap_get(args, "vdpa-accelerator-devargs");
+    const char *vdpa_socket_path =
+                smap_get(args, "vdpa-socket-path");
+    int err = 0;
+
+    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path == NULL)) {
+        VLOG_ERR("netdev_dpdk_vdpa_set_config failed."
+                 "Required arguments are missing for VDPA port %s",
+                 netdev->name);
+        goto free_relay;
+    }
+
+    err = netdev_dpdk_set_config(netdev, args, errp);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev->name);
+        goto free_relay;
+    }
+
+    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
+                                       vdpa_socket_path,
+                                       vdpa_accelerator_devargs);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
+                 netdev->name);
+        goto free_relay;
+    }
+
+    goto out;
+
+free_relay:
+    rte_free(dev->relay);
+out:
+    return err;
+}
+
+static int
 netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap *args,
                             char **errp OVS_UNUSED)
 {
@@ -2273,6 +2357,23 @@  netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
     return 0;
 }
 
+static int
+netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
+                          struct dp_packet_batch *batch,
+                          int *qfill)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
+    int fwd_rx;
+    int ret;
+
+    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
+    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
+    if ((ret == EAGAIN) && fwd_rx) {
+        return 0;
+    }
+    return ret;
+}
+
 static inline int
 netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
                     int cnt, bool should_steal)
@@ -2854,6 +2955,29 @@  netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev,
 }
 
 static int
+netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
+                                  struct netdev_custom_stats *custom_stats)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    custom_stats->size = VDPA_CUSTOM_STATS_SIZE;
+    custom_stats->counters = xcalloc(custom_stats->size,
+                                     sizeof *custom_stats->counters);
+    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
+                                                 custom_stats);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed."
+                 "Port %s\n", netdev->name);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+    return err;
+}
+
+static int
 netdev_dpdk_get_features(const struct netdev *netdev,
                          enum netdev_features *current,
                          enum netdev_features *advertised,
@@ -4237,6 +4361,31 @@  netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
 }
 
 static int
+netdev_dpdk_vdpa_reconfigure(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err;
+
+    err = netdev_dpdk_reconfigure(netdev);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev->name);
+        goto out;
+    }
+
+    ovs_mutex_lock(&dev->mutex);
+    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp->mp,
+                                        dev->up.n_rxq);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
+                 netdev->name);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+out:
+    return err;
+}
+
+static int
 netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -4456,6 +4605,18 @@  static const struct netdev_class dpdk_vhost_client_class = {
     .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
 };
 
+static const struct netdev_class dpdk_vdpa_class = {
+    .type = "dpdkvdpa",
+    NETDEV_DPDK_CLASS_COMMON,
+    .construct = netdev_dpdk_vdpa_construct,
+    .destruct = netdev_dpdk_vdpa_destruct,
+    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
+    .set_config = netdev_dpdk_vdpa_set_config,
+    .reconfigure = netdev_dpdk_vdpa_reconfigure,
+    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
+    .send = netdev_dpdk_eth_send
+};
+
 void
 netdev_dpdk_register(void)
 {
@@ -4463,4 +4624,5 @@  netdev_dpdk_register(void)
     netdev_register_provider(&dpdk_ring_class);
     netdev_register_provider(&dpdk_vhost_class);
     netdev_register_provider(&dpdk_vhost_client_class);
+    netdev_register_provider(&dpdk_vdpa_class);
 }
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 9a743c0..9e94950 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2640,6 +2640,13 @@ 
           <dd>
             A pair of virtual devices that act as a patch cable.
           </dd>
+
+          <dt><code>dpdkvdpa</code></dt>
+          <dd>
+            The dpdk vDPA port allows forwarding bi-directional traffic between
+            SR-IOV virtual functions (VFs) and VirtIO devices in virtual
+            machines (VMs).
+          </dd>
         </dl>
       </column>
     </group>
@@ -3156,6 +3163,24 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         </p>
       </column>
 
+      <column name="options" key="vdpa-socket-path"
+              type='{"type": "string"}'>
+        <p>
+          The value specifies the path to the socket associated with a VDPA
+          port that will be created by QEMU.
+          Only supported by dpdkvdpa interfaces.
+        </p>
+      </column>
+
+      <column name="options" key="vdpa-accelerator-devargs"
+              type='{"type": "string"}'>
+        <p>
+          The value specifies the PCI address associated with the virtual
+          function.
+          Only supported by dpdkvdpa interfaces.
+        </p>
+      </column>
+
       <column name="options" key="dq-zero-copy"
               type='{"type": "boolean"}'>
         <p>