From patchwork Wed May 23 16:47:27 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 919204
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 40rdsH4B6lz9s15
for ;
Thu, 24 May 2018 02:54:27 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 42A91DD6;
Wed, 23 May 2018 16:48:12 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id EFDB4DDF
for ; Wed, 23 May 2018 16:48:08 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id EE815284
for ; Wed, 23 May 2018 16:48:07 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga004.jf.intel.com ([10.7.209.38])
by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
23 May 2018 09:48:07 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,433,1520924400"; d="scan'208";a="201846115"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by orsmga004.jf.intel.com with ESMTP; 23 May 2018 09:48:06 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Wed, 23 May 2018 17:47:27 +0100
Message-Id: <1527094048-105084-13-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1527094048-105084-1-git-send-email-tiago.lam@intel.com>
References: <1527094048-105084-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Subject: [ovs-dev] [RFC v7 12/13] netdev-dpdk: support multi-segment jumbo
frames
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
From: Mark Kavanagh
Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.
With the multi-segment approach, a port uses a single mempool,
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.
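The chaining described above can be sketched with a simplified stand-in struct (illustrative only; DPDK's real `struct rte_mbuf` carries its data room, `data_len`, `pkt_len` and `next` pointer differently, and segments come from the mempool rather than `calloc`):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a DPDK mbuf segment; field names are
 * illustrative, not DPDK's. */
struct seg {
    char data[2048];      /* standard-sized data room (~2KB) */
    size_t data_len;      /* bytes stored in this segment */
    struct seg *next;     /* next segment in the chain, or NULL */
};

/* Split a frame of 'frame_len' bytes across a chain of standard-sized
 * segments, mimicking how a jumbo frame spans multiple chained mbufs.
 * Returns the head of the chain and stores the segment count. */
static struct seg *
chain_frame(size_t frame_len, size_t *nb_segs)
{
    struct seg *head = NULL, **tail = &head;
    *nb_segs = 0;
    while (frame_len > 0) {
        struct seg *s = calloc(1, sizeof *s);
        s->data_len = frame_len < sizeof s->data ? frame_len
                                                 : sizeof s->data;
        frame_len -= s->data_len;
        *tail = s;
        tail = &s->next;
        (*nb_segs)++;
    }
    return head;
}
```

For example, a 9000-byte jumbo frame split across 2048-byte segments occupies five segments: four full ones plus an 808-byte tail.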
== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt at init time. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.
Setting the field is identical to setting existing DPDK-specific OVSDB
fields:
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
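The effect of the new field on mempool sizing can be sketched as follows. Note that `dpdk_buf_size_sketch()` below is a simplified stand-in: the real `dpdk_buf_size()` in lib/netdev-dpdk.c also accounts for mbuf headroom and alignment, and the overhead constant here is an illustrative assumption (Ethernet header plus CRC):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETHER_MTU 1500
/* Illustrative overhead only: 14B Ethernet header + 4B CRC. */
#define FRAME_OVERHEAD (14 + 4)

static uint16_t
dpdk_buf_size_sketch(int mtu)
{
    return (uint16_t)(mtu + FRAME_OVERHEAD);
}

/* Mirrors the decision made in netdev_dpdk_mempool_configure(): with
 * multi-segment mbufs the mempool always holds standard-sized buffers,
 * chained at runtime as needed; otherwise each buffer must be large
 * enough for the full requested MTU. */
static uint16_t
pick_buf_size(bool multi_seg, int requested_mtu)
{
    return multi_seg ? dpdk_buf_size_sketch(ETHER_MTU)
                     : dpdk_buf_size_sketch(requested_mtu);
}
```

With a requested MTU of 9000, the single-segment path asks for ~9018-byte buffers, while the multi-segment path sticks to ~1518-byte buffers regardless of MTU.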
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
NEWS | 1 +
lib/dpdk.c | 7 +++++++
lib/netdev-dpdk.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
lib/netdev-dpdk.h | 2 ++
vswitchd/vswitch.xml | 20 ++++++++++++++++++++
5 files changed, 76 insertions(+), 3 deletions(-)
diff --git a/NEWS b/NEWS
index ec548b0..3d4defd 100644
--- a/NEWS
+++ b/NEWS
@@ -102,6 +102,7 @@ v2.9.0 - 19 Feb 2018
pmd assignments.
* Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
* Add support for vHost dequeue zero copy (experimental)
+ * Add support for multi-segment mbufs
- Userspace datapath:
* Output packet batching support.
- vswitchd:
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 00dd974..1447724 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -459,6 +459,13 @@ dpdk_init__(const struct smap *ovs_other_config)
/* Finally, register the dpdk classes */
netdev_dpdk_register();
+
+ bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+ "dpdk-multi-seg-mbufs", false);
+ if (multi_seg_mbufs_enable) {
+ VLOG_INFO("DPDK multi-segment mbufs enabled");
+ netdev_dpdk_multi_segment_mbufs_enable();
+ }
}
void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c6dfe6d..2bddc0b 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
#define DPDK_PORT_WATCHDOG_INTERVAL 5
@@ -632,6 +633,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
+ dev->requested_n_txq * dev->requested_txq_size
+ MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
+ MIN_NB_MBUF;
+ /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */
ovs_mutex_lock(&dpdk_mp_mutex);
do {
@@ -731,7 +733,13 @@ dpdk_mp_release(struct rte_mempool *mp)
/* Tries to allocate a new mempool - or re-use an existing one where
* appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ * together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ * non-standard-sized mbufs. Each mbuf in the mempool is large enough to
+ * fully accommodate packets of size 'requested_mtu'.
* On success - or when re-using an existing mempool - the new configuration
* will be applied.
* On error, device will be left unchanged. */
@@ -739,10 +747,18 @@ static int
netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
OVS_REQUIRES(dev->mutex)
{
- uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+ uint16_t buf_size = 0;
struct rte_mempool *mp;
int ret = 0;
+ /* Contiguous mbufs in use - permit oversized mbufs */
+ if (!dpdk_multi_segment_mbufs) {
+ buf_size = dpdk_buf_size(dev->requested_mtu);
+ } else {
+ /* multi-segment mbufs - use standard mbuf size */
+ buf_size = dpdk_buf_size(ETHER_MTU);
+ }
+
dpdk_mp_sweep();
mp = dpdk_mp_create(dev, buf_size);
@@ -824,6 +840,7 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
int diag = 0;
int i;
struct rte_eth_conf conf = port_conf;
+ struct rte_eth_txconf txconf;
struct rte_eth_dev_info info;
/* As of DPDK 17.11.1 a few PMDs require to explicitly enable
@@ -839,6 +856,18 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
}
}
+ /* Multi-segment-mbuf-specific setup. */
+ if (dpdk_multi_segment_mbufs) {
+ /* DPDK PMDs typically attempt to use simple or vectorized
+ * transmit functions, neither of which is compatible with
+ * multi-segment mbufs. Ensure that these are disabled when
+ * multi-segment mbufs are enabled.
+ */
+ rte_eth_dev_info_get(dev->port_id, &info);
+ txconf = info.default_txconf;
+ txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
+ }
+
conf.intr_conf.lsc = dev->lsc_interrupt_mode;
conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
@@ -868,7 +897,9 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
for (i = 0; i < n_txq; i++) {
diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
- dev->socket_id, NULL);
+ dev->socket_id,
+ dpdk_multi_segment_mbufs ? &txconf
+ : NULL);
if (diag) {
VLOG_INFO("Interface %s unable to setup txq(%d): %s",
dev->up.name, i, rte_strerror(-diag));
@@ -3961,6 +3992,18 @@ unlock:
return err;
}
+bool
+netdev_dpdk_is_multi_segment_mbufs_enabled(void)
+{
+ return dpdk_multi_segment_mbufs;
+}
+
+void
+netdev_dpdk_multi_segment_mbufs_enable(void)
+{
+ dpdk_multi_segment_mbufs = true;
+}
+
#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \
SET_CONFIG, SET_TX_MULTIQ, SEND, \
GET_CARRIER, GET_STATS, \
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index e627553..78871ca 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -26,6 +26,8 @@ struct dp_packet;
#ifdef DPDK_NETDEV
struct rte_mempool;
+bool netdev_dpdk_is_multi_segment_mbufs_enabled(void);
+void netdev_dpdk_multi_segment_mbufs_enable(void);
void netdev_dpdk_register(void);
struct dp_packet *
dpdk_buf_alloc(struct rte_mempool *mp);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 3940b8d..12a2de4 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -331,6 +331,26 @@
+      <column name="other_config" key="dpdk-multi-seg-mbufs"
+              type='{"type": "boolean"}'>
+        <p>
+          Specifies if DPDK uses multi-segment mbufs for handling jumbo frames.
+        </p>
+        <p>
+          If true, DPDK allocates a single mempool per port, irrespective of
+          the ports' requested MTU sizes. The elements of this mempool are
+          'standard'-sized mbufs (typically 2KB), which may be chained
+          together to accommodate jumbo frames. In this approach, each mbuf
+          typically stores a fragment of the overall jumbo frame.
+        </p>
+        <p>
+          If not specified, defaults to <code>false</code>, in which case the
+          size of each mbuf within a DPDK port's mempool will be grown to
+          accommodate jumbo frames within a single mbuf.
+        </p>
+      </column>