From patchwork Mon May 21 15:44:13 2018
X-Patchwork-Submitter: Ciara Loftus
X-Patchwork-Id: 917705
X-Patchwork-Delegate: ian.stokes@intel.com
From: Ciara Loftus
To: dev@openvswitch.org
Date: Mon, 21 May 2018 16:44:13 +0100
Message-Id: <1526917453-17997-1-git-send-email-ciara.loftus@intel.com>
Subject: [ovs-dev] [RFC PATCH] netdev-dpdk: Integrate vHost User PMD

The vHost PMD brings the vHost User port types ('dpdkvhostuser' and
'dpdkvhostuserclient') under the control of DPDK's librte_ether API, like
all other DPDK netdev types ('dpdk' and 'dpdkr'). In doing so, direct calls
to DPDK's librte_vhost library are removed and replaced with librte_ether
API calls, for which most of the infrastructure is already in place.

This change has a number of benefits, including:

* Reduced codebase (~200 LOC removed).
* More features automatically enabled for vHost ports, e.g. custom stats
  and additional get_status information.
* OVS no longer needs to track changes in the librte_vhost API between DPDK
  releases, potentially making upgrades easier and the OVS codebase less
  susceptible to change.
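
As a rough, hedged illustration of the last point (not part of the diff
below; the helper names are invented for the example), the shape of the
receive-path change is that the vhost-specific dequeue keyed on a
librte_vhost device id gives way to the same generic ethdev burst receive
already used for 'dpdk' and 'dpdkr' ports:

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_vhost.h>

#define EXAMPLE_BURST 32

/* Before this patch: direct librte_vhost call. The guest TX ring that OVS
 * receives from is 'rxq * VIRTIO_QNUM + VIRTIO_TXQ', i.e. rxq * 2 + 1. */
static uint16_t
example_rx_via_librte_vhost(int vid, uint16_t rxq, struct rte_mempool *mp,
                            struct rte_mbuf **pkts)
{
    return rte_vhost_dequeue_burst(vid, rxq * 2 + 1, mp, pkts,
                                   EXAMPLE_BURST);
}

/* With the vHost PMD: the generic librte_ether call used for every other
 * DPDK port type, keyed on the ethdev port id. */
static uint16_t
example_rx_via_vhost_pmd(uint16_t port_id, uint16_t rxq,
                         struct rte_mbuf **pkts)
{
    return rte_eth_rx_burst(port_id, rxq, pkts, EXAMPLE_BURST);
}

Either helper would be called once per rx queue from the netdev rx hook; in
the patch itself both port types end up going through common_recv().
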
The sum of all DPDK port types must not exceed RTE_MAX_ETHPORTS which is set and can be modified in the DPDK configuration. Prior to this patch this only applied to 'dpdk' and 'dpdkr' ports, but now applies to all DPDK port types including vHost User. Performance (pps) of the different topologies p2p, pvp, pvvp and vv has been measured to remain within a +/- 5% margin of existing performance. Signed-off-by: Ciara Loftus --- To function correctly, this patch requires the following patches to be applied to DPDK and OVS respectively: 1. http://dpdk.org/dev/patchwork/patch/39315/ (due to be backported to DPDK 17.11.3) 2. https://patchwork.ozlabs.org/patch/914653/ NEWS | 3 + lib/dpdk.c | 11 + lib/dpdk.h | 1 + lib/netdev-dpdk.c | 920 ++++++++++++++++++++------------------------------- tests/system-dpdk.at | 1 + 5 files changed, 372 insertions(+), 564 deletions(-) diff --git a/NEWS b/NEWS index ec548b0..55dc513 100644 --- a/NEWS +++ b/NEWS @@ -30,6 +30,9 @@ Post-v2.9.0 * New 'check-dpdk' Makefile target to run a new system testsuite. See Testing topic for the details. * Add LSC interrupt support for DPDK physical devices. + * Use DPDK's vHost PMD instead of direct library calls. This means the + maximum number of vHost ports is equal to RTE_MAX_ETHPORTS as defined + in the DPDK configuration. - Userspace datapath: * Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD * Detailed PMD performance metrics available with new command diff --git a/lib/dpdk.c b/lib/dpdk.c index 00dd974..6cfc6fc 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -22,6 +22,7 @@ #include #include +#include #include #include #include @@ -32,6 +33,7 @@ #include "dirs.h" #include "fatal-signal.h" +#include "id-pool.h" #include "netdev-dpdk.h" #include "openvswitch/dynamic-string.h" #include "openvswitch/vlog.h" @@ -43,6 +45,7 @@ static FILE *log_stream = NULL; /* Stream for DPDK log redirection */ static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */ static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU support */ +static struct id_pool *vhost_driver_ids; /* Pool of IDs for vHost PMDs */ static int process_vhost_flags(char *flag, const char *default_val, int size, @@ -457,6 +460,8 @@ dpdk_init__(const struct smap *ovs_other_config) } #endif + vhost_driver_ids = id_pool_create(0, RTE_MAX_ETHPORTS); + /* Finally, register the dpdk classes */ netdev_dpdk_register(); } @@ -498,6 +503,12 @@ dpdk_vhost_iommu_enabled(void) return vhost_iommu_enabled; } +struct id_pool * +dpdk_get_vhost_id_pool(void) +{ + return vhost_driver_ids; +} + void dpdk_set_lcore_id(unsigned cpu) { diff --git a/lib/dpdk.h b/lib/dpdk.h index b041535..c7143f7 100644 --- a/lib/dpdk.h +++ b/lib/dpdk.h @@ -39,5 +39,6 @@ void dpdk_set_lcore_id(unsigned cpu); const char *dpdk_get_vhost_sock_dir(void); bool dpdk_vhost_iommu_enabled(void); void print_dpdk_version(void); +struct id_pool *dpdk_get_vhost_id_pool(void); #endif /* dpdk.h */ diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index afddf6d..defc51d 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -44,6 +45,7 @@ #include "dpdk.h" #include "dpif-netdev.h" #include "fatal-signal.h" +#include "id-pool.h" #include "netdev-provider.h" #include "netdev-vport.h" #include "odp-util.h" @@ -63,6 +65,7 @@ #include "unixctl.h" enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; +enum {VHOST_SERVER_MODE, VHOST_CLIENT_MODE}; VLOG_DEFINE_THIS_MODULE(netdev_dpdk); static struct vlog_rate_limit rl = 
VLOG_RATE_LIMIT_INIT(5, 20); @@ -122,6 +125,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); #define XSTAT_RX_BROADCAST_PACKETS "rx_broadcast_packets" #define XSTAT_TX_BROADCAST_PACKETS "tx_broadcast_packets" #define XSTAT_RX_UNDERSIZED_ERRORS "rx_undersized_errors" +#define XSTAT_RX_UNDERSIZE_PACKETS "rx_undersize_packets" #define XSTAT_RX_OVERSIZE_ERRORS "rx_oversize_errors" #define XSTAT_RX_FRAGMENTED_ERRORS "rx_fragmented_errors" #define XSTAT_RX_JABBER_ERRORS "rx_jabber_errors" @@ -135,7 +139,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); /* Maximum size of Physical NIC Queues */ #define NIC_PORT_MAX_Q_SIZE 4096 -#define OVS_VHOST_MAX_QUEUE_NUM 1024 /* Maximum number of vHost TX queues. */ +#define OVS_VHOST_MAX_QUEUE_NUM RTE_MAX_QUEUES_PER_PORT /* Max vHost TXQs */ #define OVS_VHOST_QUEUE_MAP_UNKNOWN (-1) /* Mapping not initialized. */ #define OVS_VHOST_QUEUE_DISABLED (-2) /* Queue was disabled by guest and not * yet mapped to another queue. */ @@ -170,21 +174,6 @@ static const struct rte_eth_conf port_conf = { }, }; -/* - * These callbacks allow virtio-net devices to be added to vhost ports when - * configuration has been fully completed. - */ -static int new_device(int vid); -static void destroy_device(int vid); -static int vring_state_changed(int vid, uint16_t queue_id, int enable); -static const struct vhost_device_ops virtio_net_device_ops = -{ - .new_device = new_device, - .destroy_device = destroy_device, - .vring_state_changed = vring_state_changed, - .features_changed = NULL -}; - enum { DPDK_RING_SIZE = 256 }; BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE)); enum { DRAIN_TSC = 200000ULL }; @@ -379,6 +368,8 @@ struct netdev_dpdk { char *devargs; /* Device arguments for dpdk ports */ struct dpdk_tx_queue *tx_q; struct rte_eth_link link; + /* ID of vhost user port given to the PMD driver */ + int32_t vhost_pmd_id; ); PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1, @@ -472,7 +463,12 @@ static void netdev_dpdk_vhost_destruct(struct netdev *netdev); static void netdev_dpdk_clear_xstats(struct netdev_dpdk *dev); -int netdev_dpdk_get_vid(const struct netdev_dpdk *dev); +static int link_status_changed_callback(dpdk_port_t port_id, + enum rte_eth_event_type type, void *param, void *ret_param); +static int vring_state_changed_callback(dpdk_port_t port_id, + enum rte_eth_event_type type, void *param, void *ret_param); +static void netdev_dpdk_remap_txqs(struct netdev_dpdk *dev); +static void netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev); struct ingress_policer * netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev); @@ -812,11 +808,13 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) break; } - diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu); - if (diag) { - VLOG_ERR("Interface %s MTU (%d) setup error: %s", - dev->up.name, dev->mtu, rte_strerror(-diag)); - break; + if (dev->type == DPDK_DEV_ETH) { + diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu); + if (diag) { + VLOG_ERR("Interface %s MTU (%d) setup error: %s", + dev->up.name, dev->mtu, rte_strerror(-diag)); + break; + } } for (i = 0; i < n_txq; i++) { @@ -851,8 +849,13 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) continue; } - dev->up.n_rxq = n_rxq; - dev->up.n_txq = n_txq; + /* Only set n_*xq for physical devices. vHost User devices will set + * this value correctly using info from the virtio backend. 
+ */ + if (dev->type == DPDK_DEV_ETH) { + dev->up.n_rxq = n_rxq; + dev->up.n_txq = n_txq; + } return 0; } @@ -893,8 +896,17 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) dev->hw_ol_features |= NETDEV_RX_CHECKSUM_OFFLOAD; } - n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); - n_txq = MIN(info.max_tx_queues, dev->up.n_txq); + if (dev->type != DPDK_DEV_ETH) { + /* We don't know how many queues QEMU will request so we need to + * provision for the maximum, as if we configure less up front than + * what QEMU configures later, those additional queues will never be + * available to us. */ + n_rxq = OVS_VHOST_MAX_QUEUE_NUM; + n_txq = OVS_VHOST_MAX_QUEUE_NUM; + } else { + n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); + n_txq = MIN(info.max_tx_queues, dev->up.n_txq); + } diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq); if (diag) { @@ -997,9 +1009,8 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no, dev->requested_mtu = ETHER_MTU; dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); dev->requested_lsc_interrupt_mode = 0; - ovsrcu_index_init(&dev->vid, -1); + dev->vhost_pmd_id = -1; dev->vhost_reconfigured = false; - dev->attached = false; ovsrcu_init(&dev->qos_conf, NULL); @@ -1057,19 +1068,62 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[], } static int -vhost_common_construct(struct netdev *netdev) - OVS_REQUIRES(dpdk_mutex) +dpdk_attach_vhost_pmd(struct netdev_dpdk *dev, int mode) { - int socket_id = rte_lcore_to_socket_id(rte_get_master_lcore()); - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + char *devargs; + int err = 0; + dpdk_port_t port_no = 0; + uint32_t driver_id = 0; + int iommu_enabled = 0; + int zc_enabled = 0; - dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM); - if (!dev->tx_q) { - return ENOMEM; + if (dev->vhost_driver_flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY) { + zc_enabled = 1; + } + + if (dpdk_vhost_iommu_enabled()) { + iommu_enabled = 1; } - return common_construct(netdev, DPDK_ETH_PORT_ID_INVALID, - DPDK_DEV_VHOST, socket_id); + if (id_pool_alloc_id(dpdk_get_vhost_id_pool(), &driver_id)) { + devargs = xasprintf("net_vhost%u,iface=%s,queues=%i,client=%i," + "dequeue-zero-copy=%i,iommu-support=%i", + driver_id, dev->vhost_id, OVS_VHOST_MAX_QUEUE_NUM, mode, + zc_enabled, iommu_enabled); + err = rte_eth_dev_attach(devargs, &port_no); + if (!err) { + dev->attached = true; + dev->port_id = port_no; + dev->vhost_pmd_id = driver_id; + err = rte_vhost_driver_disable_features(dev->vhost_id, + 1ULL << VIRTIO_NET_F_HOST_TSO4 + | 1ULL << VIRTIO_NET_F_HOST_TSO6 + | 1ULL << VIRTIO_NET_F_CSUM); + if (err) { + VLOG_ERR("rte_vhost_driver_disable_features failed for vhost " + "user client port: %s\n", dev->up.name); + } + + rte_eth_dev_callback_register(dev->port_id, + RTE_ETH_EVENT_QUEUE_STATE, + vring_state_changed_callback, + NULL); + rte_eth_dev_callback_register(dev->port_id, + RTE_ETH_EVENT_INTR_LSC, + link_status_changed_callback, + NULL); + } else { + id_pool_free_id(dpdk_get_vhost_id_pool(), driver_id); + VLOG_ERR("Failed to attach vhost-user device %s to DPDK", + dev->vhost_id); + } + } else { + VLOG_ERR("Unable to create vhost-user device %s - too many vhost-user " + "devices registered with PMD", dev->vhost_id); + err = ENODEV; + } + + return err; } static int @@ -1082,7 +1136,7 @@ netdev_dpdk_vhost_construct(struct netdev *netdev) /* 'name' is appended to 'vhost_sock_dir' and used to create a socket in * the file system. '/' or '\' would traverse directories, so they're not * acceptable in 'name'. 
*/ - if (strchr(name, '/') || strchr(name, '\\')) { + if (strchr(name, '/') || strchr(name, '\\') || strchr(name, ',')) { VLOG_ERR("\"%s\" is not a valid name for a vhost-user port. " "A valid name must not include '/' or '\\'", name); @@ -1097,46 +1151,23 @@ netdev_dpdk_vhost_construct(struct netdev *netdev) dpdk_get_vhost_sock_dir(), name); dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT; - err = rte_vhost_driver_register(dev->vhost_id, dev->vhost_driver_flags); - if (err) { - VLOG_ERR("vhost-user socket device setup failure for socket %s\n", - dev->vhost_id); - goto out; - } else { + err = dpdk_attach_vhost_pmd(dev, VHOST_SERVER_MODE); + if (!err) { fatal_signal_add_file_to_unlink(dev->vhost_id); VLOG_INFO("Socket %s created for vhost-user port %s\n", dev->vhost_id, name); - } - - err = rte_vhost_driver_callback_register(dev->vhost_id, - &virtio_net_device_ops); - if (err) { - VLOG_ERR("rte_vhost_driver_callback_register failed for vhost user " - "port: %s\n", name); - goto out; - } - - err = rte_vhost_driver_disable_features(dev->vhost_id, - 1ULL << VIRTIO_NET_F_HOST_TSO4 - | 1ULL << VIRTIO_NET_F_HOST_TSO6 - | 1ULL << VIRTIO_NET_F_CSUM); - if (err) { - VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user " - "port: %s\n", name); - goto out; - } - - err = rte_vhost_driver_start(dev->vhost_id); - if (err) { - VLOG_ERR("rte_vhost_driver_start failed for vhost user " - "port: %s\n", name); + } else { goto out; } - err = vhost_common_construct(netdev); + err = common_construct(&dev->up, dev->port_id, DPDK_DEV_VHOST, + rte_lcore_to_socket_id(rte_get_master_lcore())); if (err) { - VLOG_ERR("vhost_common_construct failed for vhost user " - "port: %s\n", name); + VLOG_ERR("common_construct failed for vhost user port: %s\n", name); + rte_eth_dev_detach(dev->port_id, dev->vhost_id); + if (dev->vhost_pmd_id >= 0) { + id_pool_free_id(dpdk_get_vhost_id_pool(), dev->vhost_pmd_id); + } } out: @@ -1149,12 +1180,14 @@ out: static int netdev_dpdk_vhost_client_construct(struct netdev *netdev) { + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int err; ovs_mutex_lock(&dpdk_mutex); - err = vhost_common_construct(netdev); + err = common_construct(&dev->up, DPDK_ETH_PORT_ID_INVALID, DPDK_DEV_VHOST, + rte_lcore_to_socket_id(rte_get_master_lcore())); if (err) { - VLOG_ERR("vhost_common_construct failed for vhost user client" + VLOG_ERR("common_construct failed for vhost user client" "port: %s\n", netdev->name); } ovs_mutex_unlock(&dpdk_mutex); @@ -1178,90 +1211,76 @@ common_destruct(struct netdev_dpdk *dev) OVS_REQUIRES(dpdk_mutex) OVS_EXCLUDED(dev->mutex) { - rte_free(dev->tx_q); - dpdk_mp_release(dev->mp); - - ovs_list_remove(&dev->list_node); - free(ovsrcu_get_protected(struct ingress_policer *, - &dev->ingress_policer)); - ovs_mutex_destroy(&dev->mutex); -} - -static void -netdev_dpdk_destruct(struct netdev *netdev) -{ - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); char devname[RTE_ETH_NAME_MAX_LEN]; - ovs_mutex_lock(&dpdk_mutex); - rte_eth_dev_stop(dev->port_id); dev->started = false; if (dev->attached) { rte_eth_dev_close(dev->port_id); if (rte_eth_dev_detach(dev->port_id, devname) < 0) { - VLOG_ERR("Device '%s' can not be detached", dev->devargs); + VLOG_ERR("Device '%s' can not be detached", devname); } else { VLOG_INFO("Device '%s' has been detached", devname); } } netdev_dpdk_clear_xstats(dev); - free(dev->devargs); - common_destruct(dev); - - ovs_mutex_unlock(&dpdk_mutex); + rte_free(dev->tx_q); + dpdk_mp_release(dev->mp); + ovs_list_remove(&dev->list_node); + 
free(ovsrcu_get_protected(struct ingress_policer *, + &dev->ingress_policer)); + ovs_mutex_destroy(&dev->mutex); } -/* rte_vhost_driver_unregister() can call back destroy_device(), which will - * try to acquire 'dpdk_mutex' and possibly 'dev->mutex'. To avoid a - * deadlock, none of the mutexes must be held while calling this function. */ -static int -dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED, - char *vhost_id) - OVS_EXCLUDED(dpdk_mutex) - OVS_EXCLUDED(dev->mutex) +static void +netdev_dpdk_destruct(struct netdev *netdev) { - return rte_vhost_driver_unregister(vhost_id); + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + + ovs_mutex_lock(&dpdk_mutex); + common_destruct(dev); + free(dev->devargs); + ovs_mutex_unlock(&dpdk_mutex); } static void netdev_dpdk_vhost_destruct(struct netdev *netdev) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - char *vhost_id; ovs_mutex_lock(&dpdk_mutex); /* Guest becomes an orphan if still attached. */ - if (netdev_dpdk_get_vid(dev) >= 0 - && !(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) { + check_link_status(dev); + if (dev->link.link_status == ETH_LINK_UP) { VLOG_ERR("Removing port '%s' while vhost device still attached.", netdev->name); VLOG_ERR("To restore connectivity after re-adding of port, VM on " "socket '%s' must be restarted.", dev->vhost_id); } - vhost_id = xstrdup(dev->vhost_id); - - common_destruct(dev); - - ovs_mutex_unlock(&dpdk_mutex); + rte_eth_dev_callback_unregister(dev->port_id, + RTE_ETH_EVENT_QUEUE_STATE, + vring_state_changed_callback, NULL); + rte_eth_dev_callback_unregister(dev->port_id, + RTE_ETH_EVENT_INTR_LSC, + link_status_changed_callback, NULL); - if (!vhost_id[0]) { - goto out; + if (dev->vhost_pmd_id >= 0) { + id_pool_free_id(dpdk_get_vhost_id_pool(), + dev->vhost_pmd_id); } - if (dpdk_vhost_driver_unregister(dev, vhost_id)) { - VLOG_ERR("%s: Unable to unregister vhost driver for socket '%s'.\n", - netdev->name, vhost_id); - } else if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) { - /* OVS server mode - remove this socket from list for deletion */ - fatal_signal_remove_file_to_unlink(vhost_id); + if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) { + /* OVS server mode - remove this socket from list for deletion */ + fatal_signal_remove_file_to_unlink(dev->vhost_id); } -out: - free(vhost_id); + + common_destruct(dev); + + ovs_mutex_unlock(&dpdk_mutex); } static void @@ -1846,12 +1865,6 @@ ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts, return cnt; } -static bool -is_vhost_running(struct netdev_dpdk *dev) -{ - return (netdev_dpdk_get_vid(dev) >= 0 && dev->vhost_reconfigured); -} - static inline void netdev_dpdk_vhost_update_rx_size_counters(struct netdev_stats *stats, unsigned int packet_size) @@ -1913,64 +1926,9 @@ netdev_dpdk_vhost_update_rx_counters(struct netdev_stats *stats, } } -/* - * The receive path for the vhost port is the TX path out from guest. 
- */ -static int -netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, - struct dp_packet_batch *batch, int *qfill) -{ - struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); - struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev); - uint16_t nb_rx = 0; - uint16_t dropped = 0; - int qid = rxq->queue_id * VIRTIO_QNUM + VIRTIO_TXQ; - int vid = netdev_dpdk_get_vid(dev); - - if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured - || !(dev->flags & NETDEV_UP))) { - return EAGAIN; - } - - nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->mp, - (struct rte_mbuf **) batch->packets, - NETDEV_MAX_BURST); - if (!nb_rx) { - return EAGAIN; - } - - if (qfill) { - if (nb_rx == NETDEV_MAX_BURST) { - /* The DPDK API returns a uint32_t which often has invalid bits in - * the upper 16-bits. Need to restrict the value to uint16_t. */ - *qfill = rte_vhost_rx_queue_count(vid, qid) & UINT16_MAX; - } else { - *qfill = 0; - } - } - - if (policer) { - dropped = nb_rx; - nb_rx = ingress_policer_run(policer, - (struct rte_mbuf **) batch->packets, - nb_rx, true); - dropped -= nb_rx; - } - - rte_spinlock_lock(&dev->stats_lock); - netdev_dpdk_vhost_update_rx_counters(&dev->stats, batch->packets, - nb_rx, dropped); - rte_spinlock_unlock(&dev->stats_lock); - - batch->count = nb_rx; - dp_packet_batch_init_packet_fields(batch); - - return 0; -} - static int -netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, - int *qfill) +common_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, + int *qfill) { struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq); struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); @@ -2018,6 +1976,30 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, return 0; } +/* + * The receive path for the vhost port is the TX path out from guest. 
+ */ +static int +netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, + struct dp_packet_batch *batch, + int *qfill) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); + + if (dev->vhost_reconfigured) { + return common_recv(rxq, batch, qfill); + } + + return EAGAIN; +} + +static int +netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, + int *qfill) +{ + return common_recv(rxq, batch, qfill); +} + static inline int netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts, int cnt, bool may_steal) @@ -2059,80 +2041,6 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts, return cnt; } -static inline void -netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats, - struct dp_packet **packets, - int attempted, - int dropped) -{ - int i; - int sent = attempted - dropped; - - stats->tx_packets += sent; - stats->tx_dropped += dropped; - - for (i = 0; i < sent; i++) { - stats->tx_bytes += dp_packet_size(packets[i]); - } -} - -static void -__netdev_dpdk_vhost_send(struct netdev *netdev, int qid, - struct dp_packet **pkts, int cnt) -{ - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts; - unsigned int total_pkts = cnt; - unsigned int dropped = 0; - int i, retries = 0; - int vid = netdev_dpdk_get_vid(dev); - - qid = dev->tx_q[qid % netdev->n_txq].map; - - if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured || qid < 0 - || !(dev->flags & NETDEV_UP))) { - rte_spinlock_lock(&dev->stats_lock); - dev->stats.tx_dropped+= cnt; - rte_spinlock_unlock(&dev->stats_lock); - goto out; - } - - rte_spinlock_lock(&dev->tx_q[qid].tx_lock); - - cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt); - /* Check has QoS has been configured for the netdev */ - cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true); - dropped = total_pkts - cnt; - - do { - int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ; - unsigned int tx_pkts; - - tx_pkts = rte_vhost_enqueue_burst(vid, vhost_qid, cur_pkts, cnt); - if (OVS_LIKELY(tx_pkts)) { - /* Packets have been sent.*/ - cnt -= tx_pkts; - /* Prepare for possible retry.*/ - cur_pkts = &cur_pkts[tx_pkts]; - } else { - /* No packets sent - do not retry.*/ - break; - } - } while (cnt && (retries++ <= VHOST_ENQ_RETRY_NUM)); - - rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); - - rte_spinlock_lock(&dev->stats_lock); - netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts, - cnt + dropped); - rte_spinlock_unlock(&dev->stats_lock); - -out: - for (i = 0; i < total_pkts - dropped; i++) { - dp_packet_delete(pkts[i]); - } -} - /* Tx function. 
Transmit packets indefinitely */ static void dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) @@ -2186,12 +2094,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) } if (OVS_LIKELY(txcnt)) { - if (dev->type == DPDK_DEV_VHOST) { - __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) pkts, - txcnt); - } else { - dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt); - } + dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt); } if (OVS_UNLIKELY(dropped)) { @@ -2201,21 +2104,6 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) } } -static int -netdev_dpdk_vhost_send(struct netdev *netdev, int qid, - struct dp_packet_batch *batch, - bool concurrent_txq OVS_UNUSED) -{ - - if (OVS_UNLIKELY(batch->packets[0]->source != DPBUF_DPDK)) { - dpdk_do_tx_copy(netdev, qid, batch); - dp_packet_delete_batch(batch, true); - } else { - __netdev_dpdk_vhost_send(netdev, qid, batch->packets, batch->count); - } - return 0; -} - static inline void netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, struct dp_packet_batch *batch, @@ -2226,8 +2114,7 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, return; } - if (OVS_UNLIKELY(concurrent_txq)) { - qid = qid % dev->up.n_txq; + if (concurrent_txq) { rte_spinlock_lock(&dev->tx_q[qid].tx_lock); } @@ -2254,7 +2141,7 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, } } - if (OVS_UNLIKELY(concurrent_txq)) { + if (concurrent_txq) { rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); } } @@ -2265,11 +2152,35 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid, { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + if (concurrent_txq) { + qid = qid % dev->up.n_txq; + } + netdev_dpdk_send__(dev, qid, batch, concurrent_txq); return 0; } static int +netdev_dpdk_vhost_send(struct netdev *netdev, int qid, + struct dp_packet_batch *batch, + bool concurrent_txq OVS_UNUSED) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + + qid = dev->tx_q[qid % netdev->n_txq].map; + if (qid == -1 || !dev->vhost_reconfigured) { + rte_spinlock_lock(&dev->stats_lock); + dev->stats.tx_dropped+= batch->count; + rte_spinlock_unlock(&dev->stats_lock); + dp_packet_delete_batch(batch, true); + } else { + netdev_dpdk_send__(dev, qid, batch, false); + } + + return 0; +} + +static int netdev_dpdk_set_etheraddr(struct netdev *netdev, const struct eth_addr mac) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -2343,41 +2254,6 @@ netdev_dpdk_set_mtu(struct netdev *netdev, int mtu) static int netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier); -static int -netdev_dpdk_vhost_get_stats(const struct netdev *netdev, - struct netdev_stats *stats) -{ - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - - ovs_mutex_lock(&dev->mutex); - - rte_spinlock_lock(&dev->stats_lock); - /* Supported Stats */ - stats->rx_packets = dev->stats.rx_packets; - stats->tx_packets = dev->stats.tx_packets; - stats->rx_dropped = dev->stats.rx_dropped; - stats->tx_dropped = dev->stats.tx_dropped; - stats->multicast = dev->stats.multicast; - stats->rx_bytes = dev->stats.rx_bytes; - stats->tx_bytes = dev->stats.tx_bytes; - stats->rx_errors = dev->stats.rx_errors; - stats->rx_length_errors = dev->stats.rx_length_errors; - - stats->rx_1_to_64_packets = dev->stats.rx_1_to_64_packets; - stats->rx_65_to_127_packets = dev->stats.rx_65_to_127_packets; - stats->rx_128_to_255_packets = dev->stats.rx_128_to_255_packets; - stats->rx_256_to_511_packets = dev->stats.rx_256_to_511_packets; - 
stats->rx_512_to_1023_packets = dev->stats.rx_512_to_1023_packets; - stats->rx_1024_to_1522_packets = dev->stats.rx_1024_to_1522_packets; - stats->rx_1523_to_max_packets = dev->stats.rx_1523_to_max_packets; - - rte_spinlock_unlock(&dev->stats_lock); - - ovs_mutex_unlock(&dev->mutex); - - return 0; -} - static void netdev_dpdk_convert_xstats(struct netdev_stats *stats, const struct rte_eth_xstat *xstats, @@ -2423,6 +2299,8 @@ netdev_dpdk_convert_xstats(struct netdev_stats *stats, stats->tx_broadcast_packets = xstats[i].value; } else if (strcmp(XSTAT_RX_UNDERSIZED_ERRORS, names[i].name) == 0) { stats->rx_undersized_errors = xstats[i].value; + } else if (strcmp(XSTAT_RX_UNDERSIZE_PACKETS, names[i].name) == 0) { + stats->rx_undersized_errors = xstats[i].value; } else if (strcmp(XSTAT_RX_FRAGMENTED_ERRORS, names[i].name) == 0) { stats->rx_fragmented_errors = xstats[i].value; } else if (strcmp(XSTAT_RX_JABBER_ERRORS, names[i].name) == 0) { @@ -2445,6 +2323,11 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) struct rte_eth_xstat_name *rte_xstats_names = NULL; int rte_xstats_len, rte_xstats_new_len, rte_xstats_ret; + if (!rte_eth_dev_is_valid_port(dev->port_id)) { + ovs_mutex_unlock(&dev->mutex); + return EPROTO; + } + if (rte_eth_stats_get(dev->port_id, &rte_stats)) { VLOG_ERR("Can't get ETH statistics for port: "DPDK_PORT_ID_FMT, dev->port_id); @@ -2521,6 +2404,10 @@ netdev_dpdk_get_custom_stats(const struct netdev *netdev, ovs_mutex_lock(&dev->mutex); + if (rte_eth_dev_is_valid_port(dev->port_id)) { + goto out; + } + if (netdev_dpdk_configure_xstats(dev)) { uint64_t *values = xcalloc(dev->rte_xstats_ids_size, sizeof(uint64_t)); @@ -2557,6 +2444,7 @@ netdev_dpdk_get_custom_stats(const struct netdev *netdev, free(values); } +out: ovs_mutex_unlock(&dev->mutex); return 0; @@ -2713,24 +2601,6 @@ netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier) return 0; } -static int -netdev_dpdk_vhost_get_carrier(const struct netdev *netdev, bool *carrier) -{ - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - - ovs_mutex_lock(&dev->mutex); - - if (is_vhost_running(dev)) { - *carrier = 1; - } else { - *carrier = 0; - } - - ovs_mutex_unlock(&dev->mutex); - - return 0; -} - static long long int netdev_dpdk_get_carrier_resets(const struct netdev *netdev) { @@ -2780,8 +2650,7 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, * running then change netdev's change_seq to trigger link state * update. */ - if ((NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off))) - && is_vhost_running(dev)) { + if ((NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off)))) { netdev_change_seq_changed(&dev->up); /* Clear statistics if device is getting up. 
*/ @@ -2811,18 +2680,41 @@ netdev_dpdk_update_flags(struct netdev *netdev, return error; } +static void +common_get_status(struct smap *args, struct netdev_dpdk *dev, + struct rte_eth_dev_info *dev_info) +{ + smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id); + smap_add_format(args, "numa_id", "%d", + rte_eth_dev_socket_id(dev->port_id)); + smap_add_format(args, "driver_name", "%s", dev_info->driver_name); + smap_add_format(args, "min_rx_bufsize", "%u", dev_info->min_rx_bufsize); + smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len); + smap_add_format(args, "max_rx_queues", "%u", dev_info->max_rx_queues); + smap_add_format(args, "max_tx_queues", "%u", dev_info->max_tx_queues); + smap_add_format(args, "max_mac_addrs", "%u", dev_info->max_mac_addrs); +} + static int netdev_dpdk_vhost_user_get_status(const struct netdev *netdev, struct smap *args) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct rte_eth_dev_info dev_info; + + if (!rte_eth_dev_is_valid_port(dev->port_id)) { + return ENODEV; + } ovs_mutex_lock(&dev->mutex); + rte_eth_dev_info_get(dev->port_id, &dev_info); + + common_get_status(args, dev, &dev_info); bool client_mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT; smap_add_format(args, "mode", "%s", client_mode ? "client" : "server"); - int vid = netdev_dpdk_get_vid(dev); + int vid = rte_eth_vhost_get_vid_from_port_id(dev->port_id);; if (vid < 0) { smap_add_format(args, "status", "disconnected"); ovs_mutex_unlock(&dev->mutex); @@ -2883,15 +2775,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) rte_eth_dev_info_get(dev->port_id, &dev_info); ovs_mutex_unlock(&dev->mutex); - smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id); - smap_add_format(args, "numa_id", "%d", - rte_eth_dev_socket_id(dev->port_id)); - smap_add_format(args, "driver_name", "%s", dev_info.driver_name); - smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize); - smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len); - smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues); - smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues); - smap_add_format(args, "max_mac_addrs", "%u", dev_info.max_mac_addrs); + common_get_status(args, dev, &dev_info); + smap_add_format(args, "max_hash_mac_addrs", "%u", dev_info.max_hash_mac_addrs); smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs); @@ -3070,19 +2955,6 @@ out: } /* - * Set virtqueue flags so that we do not receive interrupts. - */ -static void -set_irq_status(int vid) -{ - uint32_t i; - - for (i = 0; i < rte_vhost_get_vring_num(vid); i++) { - rte_vhost_enable_guest_notification(vid, i, 0); - } -} - -/* * Fixes mapping for vhost-user tx queues. Must be called after each * enabling/disabling of queues and n_txq modifications. */ @@ -3123,53 +2995,60 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev) free(enabled_queues); } -/* - * A new virtio-net device is added to a vhost port. - */ static int -new_device(int vid) +link_status_changed_callback(dpdk_port_t port_id, + enum rte_eth_event_type type OVS_UNUSED, + void *param OVS_UNUSED, + void *ret_param OVS_UNUSED) { struct netdev_dpdk *dev; bool exists = false; int newnode = 0; - char ifname[IF_NAME_SZ]; - - rte_vhost_get_ifname(vid, ifname, sizeof ifname); ovs_mutex_lock(&dpdk_mutex); /* Add device to the vhost port with the same name as that passed down. 
*/ LIST_FOR_EACH(dev, list_node, &dpdk_list) { ovs_mutex_lock(&dev->mutex); - if (strncmp(ifname, dev->vhost_id, IF_NAME_SZ) == 0) { - uint32_t qp_num = rte_vhost_get_vring_num(vid)/VIRTIO_QNUM; - - /* Get NUMA information */ - newnode = rte_vhost_get_numa_node(vid); - if (newnode == -1) { + if (port_id == dev->port_id) { + check_link_status(dev); + if (dev->link.link_status == ETH_LINK_UP) { + /* Device brought up */ + /* Get queue information */ + int vid = rte_eth_vhost_get_vid_from_port_id(dev->port_id); + uint32_t qp_num = rte_vhost_get_vring_num(vid) / VIRTIO_QNUM; + if (qp_num <= 0) { + qp_num = dev->requested_n_rxq; + } + /* Get NUMA information */ + newnode = rte_eth_dev_socket_id(dev->port_id); + if (newnode == -1) { #ifdef VHOST_NUMA - VLOG_INFO("Error getting NUMA info for vHost Device '%s'", - ifname); + VLOG_INFO("Error getting NUMA info for vHost Device '%s'", + dev->vhost_id); #endif - newnode = dev->socket_id; - } + newnode = dev->socket_id; + } + if (dev->requested_n_txq != qp_num + || dev->requested_n_rxq != qp_num + || dev->requested_socket_id != newnode) { + dev->requested_socket_id = newnode; + dev->requested_n_rxq = qp_num; + dev->requested_n_txq = qp_num; + netdev_request_reconfigure(&dev->up); + } else { + /* Reconfiguration not required. */ + dev->vhost_reconfigured = true; + } - if (dev->requested_n_txq != qp_num - || dev->requested_n_rxq != qp_num - || dev->requested_socket_id != newnode) { - dev->requested_socket_id = newnode; - dev->requested_n_rxq = qp_num; - dev->requested_n_txq = qp_num; - netdev_request_reconfigure(&dev->up); + VLOG_INFO("vHost Device '%s' has been added on numa node %i", + dev->vhost_id, newnode); } else { - /* Reconfiguration not required. */ - dev->vhost_reconfigured = true; + /* Device brought down */ + dev->vhost_reconfigured = false; + netdev_dpdk_txq_map_clear(dev); + VLOG_INFO("vHost Device '%s' has been removed", dev->vhost_id); } - - ovsrcu_index_set(&dev->vid, vid); exists = true; - - /* Disable notifications. */ - set_irq_status(vid); netdev_change_seq_changed(&dev->up); ovs_mutex_unlock(&dev->mutex); break; @@ -3179,14 +3058,11 @@ new_device(int vid) ovs_mutex_unlock(&dpdk_mutex); if (!exists) { - VLOG_INFO("vHost Device '%s' can't be added - name not found", ifname); + VLOG_INFO("vHost Device with port id %i not found", port_id); return -1; } - VLOG_INFO("vHost Device '%s' has been added on numa node %i", - ifname, newnode); - return 0; } @@ -3202,78 +3078,32 @@ netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev) } } -/* - * Remove a virtio-net device from the specific vhost port. Use dev->remove - * flag to stop any more packets from being sent or received to/from a VM and - * ensure all currently queued packets have been sent/received before removing - * the device. - */ -static void -destroy_device(int vid) -{ - struct netdev_dpdk *dev; - bool exists = false; - char ifname[IF_NAME_SZ]; - - rte_vhost_get_ifname(vid, ifname, sizeof ifname); - - ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - if (netdev_dpdk_get_vid(dev) == vid) { - - ovs_mutex_lock(&dev->mutex); - dev->vhost_reconfigured = false; - ovsrcu_index_set(&dev->vid, -1); - netdev_dpdk_txq_map_clear(dev); - - netdev_change_seq_changed(&dev->up); - ovs_mutex_unlock(&dev->mutex); - exists = true; - break; - } - } - - ovs_mutex_unlock(&dpdk_mutex); - - if (exists) { - /* - * Wait for other threads to quiesce after setting the 'virtio_dev' - * to NULL, before returning. 
- */ - ovsrcu_synchronize(); - /* - * As call to ovsrcu_synchronize() will end the quiescent state, - * put thread back into quiescent state before returning. - */ - ovsrcu_quiesce_start(); - VLOG_INFO("vHost Device '%s' has been removed", ifname); - } else { - VLOG_INFO("vHost Device '%s' not found", ifname); - } -} - static int -vring_state_changed(int vid, uint16_t queue_id, int enable) +vring_state_changed_callback(dpdk_port_t port_id, + enum rte_eth_event_type type OVS_UNUSED, + void *param OVS_UNUSED, + void *ret_param OVS_UNUSED) { struct netdev_dpdk *dev; bool exists = false; - int qid = queue_id / VIRTIO_QNUM; + int vid = -1; char ifname[IF_NAME_SZ]; + struct rte_eth_vhost_queue_event event; + int err = 0; - rte_vhost_get_ifname(vid, ifname, sizeof ifname); - - if (queue_id % VIRTIO_QNUM == VIRTIO_TXQ) { + err = rte_eth_vhost_get_queue_event(port_id, &event); + if (err || event.rx) { return 0; } ovs_mutex_lock(&dpdk_mutex); LIST_FOR_EACH (dev, list_node, &dpdk_list) { ovs_mutex_lock(&dev->mutex); - if (strncmp(ifname, dev->vhost_id, IF_NAME_SZ) == 0) { - if (enable) { - dev->tx_q[qid].map = qid; + if (port_id == dev->port_id) { + if (event.enable) { + dev->tx_q[event.queue_id].map = event.queue_id; } else { - dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED; + dev->tx_q[event.queue_id].map = OVS_VHOST_QUEUE_DISABLED; } netdev_dpdk_remap_txqs(dev); exists = true; @@ -3284,10 +3114,13 @@ vring_state_changed(int vid, uint16_t queue_id, int enable) } ovs_mutex_unlock(&dpdk_mutex); + vid = rte_eth_vhost_get_vid_from_port_id(dev->port_id); + rte_vhost_get_ifname(vid, ifname, sizeof ifname); + if (exists) { - VLOG_INFO("State of queue %d ( tx_qid %d ) of vhost device '%s'" - "changed to \'%s\'", queue_id, qid, ifname, - (enable == 1) ? "enabled" : "disabled"); + VLOG_INFO("State of tx_qid %d of vhost device '%s'" + "changed to \'%s\'", event.queue_id, ifname, + (event.enable == 1) ? "enabled" : "disabled"); } else { VLOG_INFO("vHost Device '%s' not found", ifname); return -1; @@ -3296,25 +3129,6 @@ vring_state_changed(int vid, uint16_t queue_id, int enable) return 0; } -/* - * Retrieve the DPDK virtio device ID (vid) associated with a vhostuser - * or vhostuserclient netdev. - * - * Returns a value greater or equal to zero for a valid vid or '-1' if - * there is no valid vid associated. A vid of '-1' must not be used in - * rte_vhost_ APi calls. - * - * Once obtained and validated, a vid can be used by a PMD for multiple - * subsequent rte_vhost API calls until the PMD quiesces. A PMD should - * not fetch the vid again for each of a series of API calls. 
- */ - -int -netdev_dpdk_get_vid(const struct netdev_dpdk *dev) -{ - return ovsrcu_index_get(&dev->vid); -} - struct ingress_policer * netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev) { @@ -3681,13 +3495,12 @@ static const struct dpdk_qos_ops egress_policer_ops = { }; static int -netdev_dpdk_reconfigure(struct netdev *netdev) +common_reconfigure(struct netdev *netdev) + OVS_REQUIRES(dev->mutex) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int err = 0; - ovs_mutex_lock(&dev->mutex); - if (netdev->n_txq == dev->requested_n_txq && netdev->n_rxq == dev->requested_n_rxq && dev->mtu == dev->requested_mtu @@ -3727,17 +3540,36 @@ netdev_dpdk_reconfigure(struct netdev *netdev) netdev_change_seq_changed(netdev); out: + return err; +} + +static int +netdev_dpdk_reconfigure(struct netdev *netdev) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + int err = 0; + + ovs_mutex_lock(&dev->mutex); + err = common_reconfigure(netdev); ovs_mutex_unlock(&dev->mutex); + return err; } static int -dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) +dpdk_vhost_reconfigure_helper(struct netdev *netdev) OVS_REQUIRES(dev->mutex) { + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + int err; + dev->up.n_txq = dev->requested_n_txq; dev->up.n_rxq = dev->requested_n_rxq; - int err; + + err = common_reconfigure(netdev); + if (err) { + return err; + } /* Enable TX queue 0 by default if it wasn't disabled. */ if (dev->tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) { @@ -3746,14 +3578,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) netdev_dpdk_remap_txqs(dev); - err = netdev_dpdk_mempool_configure(dev); - if (!err) { - /* A new mempool was created. */ - netdev_change_seq_changed(&dev->up); - } else if (err != EEXIST){ - return err; - } - if (netdev_dpdk_get_vid(dev) >= 0) { + if (rte_eth_vhost_get_vid_from_port_id(dev->port_id) >= 0) { if (dev->vhost_reconfigured == false) { dev->vhost_reconfigured = true; /* Carrier status may need updating. */ @@ -3771,7 +3596,7 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev) int err; ovs_mutex_lock(&dev->mutex); - err = dpdk_vhost_reconfigure_helper(dev); + err = dpdk_vhost_reconfigure_helper(netdev); ovs_mutex_unlock(&dev->mutex); return err; @@ -3781,9 +3606,8 @@ static int netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - int err; - uint64_t vhost_flags = 0; - bool zc_enabled; + int err = 0; + int sid = -1; ovs_mutex_lock(&dev->mutex); @@ -3794,64 +3618,50 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) */ if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT) && strlen(dev->vhost_id)) { - /* Register client-mode device. */ - vhost_flags |= RTE_VHOST_USER_CLIENT; + /* First time once-only configuration */ + err = dpdk_attach_vhost_pmd(dev, VHOST_CLIENT_MODE); + + if (!err) { + sid = rte_eth_dev_socket_id(dev->port_id); + dev->socket_id = sid < 0 ? SOCKET0 : sid; + dev->vhost_driver_flags |= RTE_VHOST_USER_CLIENT; + + if (dev->requested_socket_id != dev->socket_id + || dev->requested_mtu != dev->mtu) { + err = netdev_dpdk_mempool_configure(dev); + if (err && err != EEXIST) { + goto unlock; + } + } - /* Enable IOMMU support, if explicitly requested. 
*/ - if (dpdk_vhost_iommu_enabled()) { - vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT; - } + netdev->n_txq = dev->requested_n_txq; + netdev->n_rxq = dev->requested_n_rxq; + + rte_free(dev->tx_q); + err = dpdk_eth_dev_init(dev); + dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); + if (!dev->tx_q) { + rte_eth_dev_detach(dev->port_id, dev->vhost_id); + if (dev->vhost_pmd_id >= 0) { + id_pool_free_id(dpdk_get_vhost_id_pool(), + dev->vhost_pmd_id); + } + err = ENOMEM; + goto unlock; + } - zc_enabled = dev->vhost_driver_flags - & RTE_VHOST_USER_DEQUEUE_ZERO_COPY; - /* Enable zero copy flag, if requested */ - if (zc_enabled) { - vhost_flags |= RTE_VHOST_USER_DEQUEUE_ZERO_COPY; - } + netdev_change_seq_changed(netdev); - err = rte_vhost_driver_register(dev->vhost_id, vhost_flags); - if (err) { - VLOG_ERR("vhost-user device setup failure for device %s\n", - dev->vhost_id); - goto unlock; - } else { - /* Configuration successful */ - dev->vhost_driver_flags |= vhost_flags; VLOG_INFO("vHost User device '%s' created in 'client' mode, " "using client socket '%s'", dev->up.name, dev->vhost_id); - if (zc_enabled) { - VLOG_INFO("Zero copy enabled for vHost port %s", dev->up.name); - } - } - - err = rte_vhost_driver_callback_register(dev->vhost_id, - &virtio_net_device_ops); - if (err) { - VLOG_ERR("rte_vhost_driver_callback_register failed for " - "vhost user client port: %s\n", dev->up.name); - goto unlock; - } - - err = rte_vhost_driver_disable_features(dev->vhost_id, - 1ULL << VIRTIO_NET_F_HOST_TSO4 - | 1ULL << VIRTIO_NET_F_HOST_TSO6 - | 1ULL << VIRTIO_NET_F_CSUM); - if (err) { - VLOG_ERR("rte_vhost_driver_disable_features failed for vhost user " - "client port: %s\n", dev->up.name); - goto unlock; - } - - err = rte_vhost_driver_start(dev->vhost_id); - if (err) { - VLOG_ERR("rte_vhost_driver_start failed for vhost user " - "client port: %s\n", dev->up.name); - goto unlock; } + goto unlock; } - err = dpdk_vhost_reconfigure_helper(dev); + if (rte_eth_dev_is_valid_port(dev->port_id)) { + err = dpdk_vhost_reconfigure_helper(netdev); + } unlock: ovs_mutex_unlock(&dev->mutex); @@ -3861,9 +3671,7 @@ unlock: #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \ SET_CONFIG, SET_TX_MULTIQ, SEND, \ - GET_CARRIER, GET_STATS, \ - GET_CUSTOM_STATS, \ - GET_FEATURES, GET_STATUS, \ + GET_STATUS, \ RECONFIGURE, RXQ_RECV) \ { \ NAME, \ @@ -3893,12 +3701,12 @@ unlock: netdev_dpdk_get_mtu, \ netdev_dpdk_set_mtu, \ netdev_dpdk_get_ifindex, \ - GET_CARRIER, \ + netdev_dpdk_get_carrier, \ netdev_dpdk_get_carrier_resets, \ netdev_dpdk_set_miimon, \ - GET_STATS, \ - GET_CUSTOM_STATS, \ - GET_FEATURES, \ + netdev_dpdk_get_stats, \ + netdev_dpdk_get_custom_stats, \ + netdev_dpdk_get_features, \ NULL, /* set_advertisements */ \ NULL, /* get_pt_mode */ \ \ @@ -3945,10 +3753,6 @@ static const struct netdev_class dpdk_class = netdev_dpdk_set_config, netdev_dpdk_set_tx_multiq, netdev_dpdk_eth_send, - netdev_dpdk_get_carrier, - netdev_dpdk_get_stats, - netdev_dpdk_get_custom_stats, - netdev_dpdk_get_features, netdev_dpdk_get_status, netdev_dpdk_reconfigure, netdev_dpdk_rxq_recv); @@ -3962,10 +3766,6 @@ static const struct netdev_class dpdk_ring_class = netdev_dpdk_ring_set_config, netdev_dpdk_set_tx_multiq, netdev_dpdk_ring_send, - netdev_dpdk_get_carrier, - netdev_dpdk_get_stats, - netdev_dpdk_get_custom_stats, - netdev_dpdk_get_features, netdev_dpdk_get_status, netdev_dpdk_reconfigure, netdev_dpdk_rxq_recv); @@ -3979,10 +3779,6 @@ static const struct netdev_class dpdk_vhost_class = NULL, NULL, netdev_dpdk_vhost_send, - 
netdev_dpdk_vhost_get_carrier, - netdev_dpdk_vhost_get_stats, - NULL, - NULL, netdev_dpdk_vhost_user_get_status, netdev_dpdk_vhost_reconfigure, netdev_dpdk_vhost_rxq_recv); @@ -3995,10 +3791,6 @@ static const struct netdev_class dpdk_vhost_client_class = netdev_dpdk_vhost_client_set_config, NULL, netdev_dpdk_vhost_send, - netdev_dpdk_vhost_get_carrier, - netdev_dpdk_vhost_get_stats, - NULL, - NULL, netdev_dpdk_vhost_user_get_status, netdev_dpdk_vhost_client_reconfigure, netdev_dpdk_vhost_rxq_recv); diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at index 3d21b01..baefa2b 100644 --- a/tests/system-dpdk.at +++ b/tests/system-dpdk.at @@ -68,6 +68,7 @@ OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel module is probably n /failed to connect to \/tmp\/dpdkvhostclient0: No such file or directory/d /Global register is changed during/d /EAL: No free hugepages reported in hugepages-1048576kB/d +/Rx checksum offload is not supported/d ") AT_CLEANUP dnl --------------------------------------------------------------------------
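
Postscript for reviewers (illustrative only, not part of the patch): a
minimal, hedged sketch of how dpdk_attach_vhost_pmd() above builds its
devargs string and attaches the vHost PMD through rte_eth_dev_attach(). The
function name, socket path and values are made up for the example.

#include <stdio.h>
#include <rte_ethdev.h>

/* Assumes rte_eal_init() has already run. Returns the new ethdev port id
 * backing the vhost-user socket, or -1 on failure. */
static int
example_attach_vhost_pmd(void)
{
    uint16_t port_id;
    char devargs[256];

    /* Same format string as the patch: driver id 0, server mode (client=0),
     * zero-copy and IOMMU support disabled; the socket path is only an
     * example. 'queues' is provisioned to the maximum, as in the patch. */
    snprintf(devargs, sizeof devargs,
             "net_vhost%u,iface=%s,queues=%i,client=%i,"
             "dequeue-zero-copy=%i,iommu-support=%i",
             0u, "/usr/local/var/run/openvswitch/vhost0",
             RTE_MAX_QUEUES_PER_PORT, 0, 0, 0);

    if (rte_eth_dev_attach(devargs, &port_id) != 0) {
        return -1;   /* Attach failed; no ethdev port was created. */
    }

    return port_id;
}
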