From patchwork Tue May 1 17:02:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907144 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b7B56l58z9s35 for ; Wed, 2 May 2018 03:07:09 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 16E379D0; Tue, 1 May 2018 17:03:00 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 18108957 for ; Tue, 1 May 2018 17:02:59 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 4CB4E6D0 for ; Tue, 1 May 2018 17:02:58 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485484" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:57 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:14 +0100 Message-Id: <1525194134-248371-9-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC v5 8/8] netdev-dpdk: support multi-segment jumbo frames X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Mark Kavanagh Currently, jumbo frame support for OvS-DPDK is implemented by increasing the size of mbufs within a mempool, such that each mbuf within the pool is large enough to contain an entire jumbo frame of a user-defined size. Typically, for each user-defined MTU, 'requested_mtu', a new mempool is created, containing mbufs of size ~requested_mtu. With the multi-segment approach, a port uses a single mempool, (containing standard/default-sized mbufs of ~2k bytes), irrespective of the user-requested MTU value. To accommodate jumbo frames, mbufs are chained together, where each mbuf in the chain stores a portion of the jumbo frame. Each mbuf in the chain is termed a segment, hence the name. == Enabling multi-segment mbufs == Multi-segment and single-segment mbufs are mutually exclusive, and the user must decide on which approach to adopt on init. The introduction of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a global boolean value, which determines how jumbo frames are represented across all DPDK ports. In the absence of a user-supplied value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must be explicitly enabled / single-segment mbufs remain the default. Setting the field is identical to setting existing DPDK-specific OVSDB fields: ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10 ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0 ==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- NEWS | 1 + lib/dpdk.c | 7 +++++++ lib/netdev-dpdk.c | 52 +++++++++++++++++++++++++++++++++++++++++++++------- lib/netdev-dpdk.h | 1 + vswitchd/vswitch.xml | 20 ++++++++++++++++++++ 5 files changed, 74 insertions(+), 7 deletions(-) diff --git a/NEWS b/NEWS index d22ad14..e6752d6 100644 --- a/NEWS +++ b/NEWS @@ -92,6 +92,7 @@ v2.9.0 - 19 Feb 2018 pmd assignments. * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'. * Add support for vHost dequeue zero copy (experimental) + * Add support for multi-segment mbufs - Userspace datapath: * Output packet batching support. - vswitchd: diff --git a/lib/dpdk.c b/lib/dpdk.c index 00dd974..1447724 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -459,6 +459,13 @@ dpdk_init__(const struct smap *ovs_other_config) /* Finally, register the dpdk classes */ netdev_dpdk_register(); + + bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config, + "dpdk-multi-seg-mbufs", false); + if (multi_seg_mbufs_enable) { + VLOG_INFO("DPDK multi-segment mbufs enabled\n"); + netdev_dpdk_multi_segment_mbufs_enable(); + } } void diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 4c6a3c0..5746ae0 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; VLOG_DEFINE_THIS_MODULE(netdev_dpdk); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); +static bool dpdk_multi_segment_mbufs = false; #define DPDK_PORT_WATCHDOG_INTERVAL 5 @@ -593,6 +594,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len) + dev->requested_n_txq * dev->requested_txq_size + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST + MIN_NB_MBUF; + /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */ ovs_mutex_lock(&dpdk_mp_mutex); do { @@ -693,7 +695,13 @@ dpdk_mp_release(struct rte_mempool *mp) /* Tries to allocate a new mempool - or re-use an existing one where * appropriate - on requested_socket_id with a size determined by - * requested_mtu and requested Rx/Tx queues. + * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's + * elements are dependent on the value of 'dpdk_multi_segment_mbufs': + * - if 'true', then the mempool contains standard-sized mbufs that are chained + * together to accommodate packets of size 'requested_mtu'. + * - if 'false', then the members of the allocated mempool are + * non-standard-sized mbufs. Each mbuf in the mempool is large enough to + * fully accomdate packets of size 'requested_mtu'. * On success - or when re-using an existing mempool - the new configuration * will be applied. * On error, device will be left unchanged. */ @@ -701,10 +709,18 @@ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { - uint16_t buf_size = dpdk_buf_size(dev->requested_mtu); + uint16_t buf_size = 0; struct rte_mempool *mp; int ret = 0; + /* Contiguous mbufs in use - permit oversized mbufs */ + if (!dpdk_multi_segment_mbufs) { + buf_size = dpdk_buf_size(dev->requested_mtu); + } else { + /* multi-segment mbufs - use standard mbuf size */ + buf_size = dpdk_buf_size(ETHER_MTU); + } + dpdk_mp_sweep(); mp = dpdk_mp_create(dev, buf_size); @@ -786,11 +802,25 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) int diag = 0; int i; struct rte_eth_conf conf = port_conf; + struct rte_eth_txconf txconf; + + /* Multi-segment-mbuf-specific setup. */ + if (dpdk_multi_segment_mbufs) { + struct rte_eth_dev_info dev_info; + + /* DPDK PMDs typically attempt to use simple or vectorized + * transmit functions, neither of which are compatible with + * multi-segment mbufs. Ensure that these are disabled when + * multi-segment mbufs are enabled. + */ + rte_eth_dev_info_get(dev->port_id, &dev_info); + txconf = dev_info.default_txconf; + txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS; - /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly - * enabled. */ - if (dev->mtu > ETHER_MTU) { - conf.rxmode.enable_scatter = 1; + /* For some NICs (e.g. Niantic), scattered_rx mode (required for + * ingress jumbo frames when multi-segments are enabled) needs to + * be explicitly enabled. */ + conf.rxmode.enable_scatter = 1; } conf.rxmode.hw_ip_checksum = (dev->hw_ol_features & @@ -821,7 +851,9 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) for (i = 0; i < n_txq; i++) { diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, - dev->socket_id, NULL); + dev->socket_id, + dpdk_multi_segment_mbufs ? &txconf + : NULL); if (diag) { VLOG_INFO("Interface %s unable to setup txq(%d): %s", dev->up.name, i, rte_strerror(-diag)); @@ -3868,6 +3900,12 @@ unlock: return err; } +void +netdev_dpdk_multi_segment_mbufs_enable(void) +{ + dpdk_multi_segment_mbufs = true; +} + #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \ SET_CONFIG, SET_TX_MULTIQ, SEND, \ GET_CARRIER, GET_STATS, \ diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index b7d02a7..a3339fe 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -25,6 +25,7 @@ struct dp_packet; #ifdef DPDK_NETDEV +void netdev_dpdk_multi_segment_mbufs_enable(void); void netdev_dpdk_register(void); void free_dpdk_buf(struct dp_packet *); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 9c2a826..5ef0926 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -331,6 +331,26 @@

+ +

+ Specifies if DPDK uses multi-segment mbufs for handling jumbo frames. +

+

+ If true, DPDK allocates a single mempool per port, irrespective + of the ports' requested MTU sizes. The elements of this mempool are + 'standard'-sized mbufs (typically 2k MB), which may be chained + together to accommodate jumbo frames. In this approach, each mbuf + typically stores a fragment of the overall jumbo frame. +

+

+ If not specified, defaults to false, in which case, + the size of each mbuf within a DPDK port's mempool will be grown to + accommodate jumbo frames within a single mbuf. +

+
+ +