From patchwork Thu Apr 2 11:13:47 2020
X-Patchwork-Submitter: Noa Ezra
X-Patchwork-Id: 1265525
From: Noa Ezra
To: ovs-dev@openvswitch.org
Cc: elibr@mellanox.com, Ameer Mahagneh, Noa Ezra
Date: Thu, 2 Apr 2020 11:13:47 +0000
Message-Id: <1585826027-330699-3-git-send-email-noae@mellanox.com>
In-Reply-To: <1585826027-330699-1-git-send-email-noae@mellanox.com>
References: <1585826027-330699-1-git-send-email-noae@mellanox.com>
X-Mailer: git-send-email 1.8.3.1
Subject: [ovs-dev] [PATCH ovs v1 2/2] netdev-dpdk: Add dpdkvdpa port

dpdkvdpa netdev works with 3 components:
vhost-user socket, vdpa device: real vdpa device or a VF and
representor of "vdpa device".

In order to add a new vDPA port, add a new port to existing bridge
with type dpdkvdpa and vDPA options:
ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
   options:vdpa-socket-path=
   options:vdpa-accelerator-devargs=
   options:dpdk-devargs=,representor=[id]

On this command OVS will create a new netdev:
1. Register vhost-user-client device.
2. Open and configure VF dpdk port.
3. Open and configure representor dpdk port.
The new netdev will use netdev_rxq_recv() function in order to receive
packets from VF and push to vhost-user and receive packets from
vhost-user and push to VF.

Signed-off-by: Noa Ezra
Reviewed-by: Oz Shlomo
---
 Documentation/automake.mk           |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
 NEWS                                |   1 +
 lib/netdev-dpdk.c                   | 164 +++++++++++++++++++++++++++++++++++-
 vswitchd/vswitch.xml                |  25 ++++++
 6 files changed, 281 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/topics/dpdk/vdpa.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index f85c432..7caf6e7 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -41,6 +41,7 @@ DOC_SOURCE = \
 	Documentation/topics/dpdk/qos.rst \
 	Documentation/topics/dpdk/vdev.rst \
 	Documentation/topics/dpdk/vhost-user.rst \
+	Documentation/topics/dpdk/vdpa.rst \
 	Documentation/topics/fuzzing/index.rst \
 	Documentation/topics/fuzzing/what-is-fuzzing.rst \
 	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index a5be5e3..e8595c3 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -39,3 +39,4 @@ DPDK Support
    /topics/dpdk/qos
    /topics/dpdk/jumbo-frames
    /topics/dpdk/memory
+   /topics/dpdk/vdpa
diff --git a/Documentation/topics/dpdk/vdpa.rst b/Documentation/topics/dpdk/vdpa.rst
new file mode 100644
index 0000000..34c5300
--- /dev/null
+++ b/Documentation/topics/dpdk/vdpa.rst
@@ -0,0 +1,90 @@
+..
+      Copyright (c) 2019 Mellanox Technologies, Ltd.
+
+      Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License.
+      You may obtain a copy of the License at:
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+
+===============
+DPDK VDPA Ports
+===============
+
+In user space there are two main approaches to communicate with a guest (VM):
+virtIO ports (e.g. netdev type=dpdkvhostuser/dpdkvhostuserclient) or
+SR-IOV using phy ports (e.g. netdev type=dpdk).
+Phy ports allow working with a port representor, which is attached to OVS,
+while the matching VF is passed through to the guest.
+HW rules can process packets from the uplink and direct them to the VF without
+going through SW (OVS); therefore, using phy ports gives the best
+performance.
+However, the SR-IOV architecture requires the guest to use a driver that is
+specific to the underlying HW. A HW-specific driver has two main drawbacks:
+1. It breaks virtualization in some sense (the guest is aware of the HW) and
+can also limit the type of images supported.
+2. It has less natural support for live migration.
+
+Using a virtIO port solves both problems, but reduces performance and loses
+some functionality; for example, some HW offloads cannot be supported when
+working directly with virtIO.
+
+We created a new netdev type, dpdkvdpa, which resolves this conflict.
+The new netdev is very similar to the regular dpdk netdev but has some
+additional functionality.
+This port translates between a phy port and a virtIO port: it takes packets
+from an rx-queue and sends them to the suitable tx-queue. This allows packets
+to be transferred from a virtIO guest (VM) to a VF and vice versa, gaining the
+benefits of both SR-IOV and virtIO.
+
+Quick Example
+-------------
+
+Configure OVS bridge and ports
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You must first create a bridge and add ports to the switch.
+Since the dpdkvdpa port is configured as a client, the vdpa-socket-path must be
+configured by the user.
+VHOST_USER_SOCKET_PATH=/path/to/socket
+
+    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
+    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
+    type=dpdk options:dpdk-devargs=
+    $ ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
+    options:vdpa-socket-path=$VHOST_USER_SOCKET_PATH \
+    options:vdpa-accelerator-devargs= \
+    options:dpdk-devargs=,representor=[id]
+
+Once the ports have been added to the switch, they must be added to the guest.
+
+Adding vhost-user ports to the guest (QEMU)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Attach the vhost-user device sockets to the guest. To do this, you must pass
+the following parameters to QEMU:
+
+    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
+    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+
+QEMU will wait until the port is created successfully in OVS to boot the VM.
+In this mode, if the switch crashes, the vHost ports reconnect
+automatically once it is brought back.
diff --git a/NEWS b/NEWS
index 70bd175..79ed080 100644
--- a/NEWS
+++ b/NEWS
@@ -45,6 +45,7 @@ v2.13.0 - 14 Feb 2020
      * Add hardware offload support for output, drop, set of MAC, IPv4 and
        TCP/UDP ports actions (experimental).
      * Add experimental support for TSO.
+     * Add 'dpdkvdpa' port type.
    - RSTP:
      * The rstp_statistics column in Port table will only be updated every
        stats-update-interval configured in Open_vSwitch table.
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 44ebf96..ce7ed7e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -54,6 +54,7 @@
 #include "fatal-signal.h"
 #include "if-notifier.h"
 #include "netdev-provider.h"
+#include "netdev-dpdk-vdpa.h"
 #include "netdev-vport.h"
 #include "odp-util.h"
 #include "openvswitch/dynamic-string.h"
@@ -532,6 +533,8 @@ struct netdev_dpdk {
         int rte_xstats_ids_size;
         uint64_t *rte_xstats_ids;
     );
+
+    struct netdev_dpdk_vdpa_relay *relay;
 };
 
 struct netdev_rxq_dpdk {
@@ -541,6 +544,7 @@ struct netdev_rxq_dpdk {
 
 static void netdev_dpdk_destruct(struct netdev *netdev);
 static void netdev_dpdk_vhost_destruct(struct netdev *netdev);
+static void netdev_dpdk_vdpa_destruct(struct netdev *netdev);
 
 static int netdev_dpdk_get_sw_custom_stats(const struct netdev *,
                                            struct netdev_custom_stats *);
@@ -555,7 +559,8 @@ static bool
 is_dpdk_class(const struct netdev_class *class)
 {
     return class->destruct == netdev_dpdk_destruct
-           || class->destruct == netdev_dpdk_vhost_destruct;
+           || class->destruct == netdev_dpdk_vhost_destruct
+           || class->destruct == netdev_dpdk_vdpa_destruct;
 }
 
 /* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
@@ -1432,6 +1437,30 @@ netdev_dpdk_construct(struct netdev *netdev)
     return err;
 }
 
+static int
+netdev_dpdk_vdpa_construct(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev;
+    int err;
+
+    err = netdev_dpdk_construct(netdev);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_construct failed. Port: %s", netdev->name);
+        goto out;
+    }
+
+    ovs_mutex_lock(&dpdk_mutex);
+    dev = netdev_dpdk_cast(netdev);
+    dev->relay = netdev_dpdk_vdpa_alloc_relay();
+    if (!dev->relay) {
+        err = ENOMEM;
+    }
+
+    ovs_mutex_unlock(&dpdk_mutex);
+out:
+    return err;
+}
+
 static void
 common_destruct(struct netdev_dpdk *dev)
     OVS_REQUIRES(dpdk_mutex)
@@ -1515,6 +1544,19 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
 }
 
 static void
+netdev_dpdk_vdpa_destruct(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    ovs_mutex_lock(&dpdk_mutex);
+    netdev_dpdk_vdpa_destruct_impl(dev->relay);
+    rte_free(dev->relay);
+    ovs_mutex_unlock(&dpdk_mutex);
+
+    netdev_dpdk_destruct(netdev);
+}
+
+static void
 netdev_dpdk_vhost_destruct(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -2018,6 +2060,50 @@ out:
 }
 
 static int
+
+netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct smap *args,
+                            char **errp)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    const char *vdpa_accelerator_devargs =
+                smap_get(args, "vdpa-accelerator-devargs");
+    const char *vdpa_socket_path =
+                smap_get(args, "vdpa-socket-path");
+    int vdpa_max_queues = smap_get_int(args, "vdpa-max-queues", -1);
+    int err = EINVAL;
+
+    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path == NULL)) {
+        VLOG_ERR("netdev_dpdk_vdpa_set_config failed. "
+                 "Required arguments are missing for VDPA port %s",
+                 netdev->name);
+        goto free_relay;
+    }
+
+    err = netdev_dpdk_set_config(netdev, args, errp);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev->name);
+        goto free_relay;
+    }
+
+    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
+                                       vdpa_socket_path,
+                                       vdpa_accelerator_devargs,
+                                       vdpa_max_queues);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
+                 netdev->name);
+        goto free_relay;
+    }
+
+    goto out;
+
+free_relay:
+    rte_free(dev->relay);
+out:
+    return err;
+}
+
+static int
 netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
                                     const struct smap *args,
                                     char **errp OVS_UNUSED)
@@ -2479,6 +2565,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
     return 0;
 }
 
+static int
+netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
+                          struct dp_packet_batch *batch,
+                          int *qfill)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
+    int fwd_rx;
+    int ret;
+
+    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
+    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
+    if ((ret == EAGAIN) && fwd_rx) {
+        return 0;
+    }
+    return ret;
+}
+
 static inline int
 netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
                     int cnt, bool should_steal)
@@ -3244,6 +3347,26 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
 }
 
 static int
+netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
+                                  struct netdev_custom_stats *custom_stats)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
+                                                 custom_stats);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed. "
+                 "Port %s", netdev->name);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+    return err;
+}
+
+static int
 netdev_dpdk_get_features(const struct netdev *netdev,
                          enum netdev_features *current,
                          enum netdev_features *advertised,
@@ -5022,6 +5145,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
 }
 
 static int
+netdev_dpdk_vdpa_reconfigure(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err;
+
+    err = netdev_dpdk_reconfigure(netdev);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev->name);
+        goto out;
+    }
+
+    ovs_mutex_lock(&dev->mutex);
+    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp->mp,
+                                        dev->up.n_rxq);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
+                 netdev->name);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+out:
+    return err;
+}
+
+static int
 netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -5310,10 +5458,24 @@ static const struct netdev_class dpdk_vhost_client_class = {
     .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
 };
 
+static const struct netdev_class dpdk_vdpa_class = {
+    .type = "dpdkvdpa",
+    NETDEV_DPDK_CLASS_COMMON,
+    .construct = netdev_dpdk_vdpa_construct,
+    .destruct = netdev_dpdk_vdpa_destruct,
+    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
+    .set_config = netdev_dpdk_vdpa_set_config,
+    .reconfigure = netdev_dpdk_vdpa_reconfigure,
+    .get_stats = netdev_dpdk_get_stats,
+    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
+    .send = netdev_dpdk_eth_send
+};
+
 void
 netdev_dpdk_register(void)
 {
     netdev_register_provider(&dpdk_class);
     netdev_register_provider(&dpdk_vhost_class);
     netdev_register_provider(&dpdk_vhost_client_class);
+    netdev_register_provider(&dpdk_vdpa_class);
 }
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index f9339af..e7715f5 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2671,6 +2671,13 @@

+
dpdkvdpa
+
+ The dpdk vDPA port allows forwarding bi-directional traffic between + SR-IOV virtual functions (VFs) and VirtIO devices in virtual + machines (VMs). +
+ @@ -3219,6 +3226,24 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \

+ +

+ The value specifies the path to the socket associated with a VDPA + port that will be created by QEMU. + Only supported by dpdkvdpa interfaces. +

+
+ + +

+ The value specifies the PCI address associated with the virtual + function. + Only supported by dpdkvdpa interfaces. +

+
+
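
The return-code handling in netdev_dpdk_vdpa_rxq_recv() above can be looked at in isolation. The following is a minimal sketch and not part of the patch: vdpa_merge_recv_status() is a hypothetical name, fwd_rx stands in for the value returned by netdev_dpdk_vdpa_rxq_recv_impl() (assumed here to be nonzero when the relay forwarded packets), and ret stands in for the value returned by netdev_dpdk_rxq_recv().

```c
#include <errno.h>

/* Hypothetical helper, not part of the patch. It mirrors how
 * netdev_dpdk_vdpa_rxq_recv() combines its two results.
 *
 * fwd_rx: assumed to be nonzero when the relay forwarded packets between
 *         the VF and the vhost-user socket during this poll.
 * ret:    result of the regular receive path; EAGAIN when no packets
 *         were received on the representor queue. */
static int
vdpa_merge_recv_status(int fwd_rx, int ret)
{
    if (ret == EAGAIN && fwd_rx) {
        /* The relay did work even though the regular receive was empty,
         * so report success instead of EAGAIN. */
        return 0;
    }
    /* Otherwise pass the regular receive result through unchanged. */
    return ret;
}
```

Only EAGAIN is masked: any other error from the regular receive path is returned as-is, whether or not the relay forwarded packets.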