From patchwork Tue May 1 17:02:14 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907144
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:14 +0100
Message-Id: <1525194134-248371-9-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
Subject: [ovs-dev] [RFC v5 8/8] netdev-dpdk: support multi-segment jumbo
frames
From: Mark Kavanagh
Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.
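To make the single-segment cost concrete, here is a minimal sketch of how a per-MTU buffer size might be derived. The constants (26 bytes of L2 overhead, DPDK's default RTE_PKTMBUF_HEADROOM of 128 bytes, 1024-byte rounding) and the helper name single_seg_buf_size() are illustrative assumptions loosely modeled on dpdk_buf_size(). The exact formula in netdev-dpdk.c may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed L2 overhead: 14-byte Ethernet header + 4-byte CRC
 * + two 4-byte VLAN tags. Illustrative, not copied from OvS. */
#define SKETCH_ETHER_HDR_MAX_LEN 26u
/* DPDK's default RTE_PKTMBUF_HEADROOM. */
#define SKETCH_MBUF_HEADROOM 128u
/* Assumed alignment used when rounding buffer sizes. */
#define SKETCH_MBUF_ALIGN 1024u

/* Rounds 'x' up to the next multiple of 'align' (which must be non-zero). */
static uint32_t
round_up(uint32_t x, uint32_t align)
{
    assert(align != 0);
    return (x + align - 1) / align * align;
}

/* Hypothetical helper: buffer size for one mbuf in a single-segment
 * mempool, i.e. large enough for a whole frame of the given MTU plus
 * headroom. */
static uint32_t
single_seg_buf_size(uint32_t mtu)
{
    return round_up(mtu + SKETCH_ETHER_HDR_MAX_LEN + SKETCH_MBUF_HEADROOM,
                    SKETCH_MBUF_ALIGN);
}
```

Under these assumptions, a 1500-byte MTU yields 2048-byte buffers, while a 9000-byte MTU yields 9216-byte buffers. This is why each distinct MTU ends up with its own mempool.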
With the multi-segment approach, a port uses a single mempool
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, with each mbuf in the chain storing a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.
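The chaining can be illustrated with a simplified model. The struct below is hypothetical. Its field names mirror rte_mbuf's next, data_len, pkt_len and nb_segs, but it is not DPDK's type. The 2048-byte data room per segment is an assumption based on DPDK's default RTE_MBUF_DEFAULT_DATAROOM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed data room per segment (DPDK's default RTE_MBUF_DEFAULT_DATAROOM). */
#define SEG_DATA_ROOM 2048u

/* Hypothetical, simplified model of a chained mbuf; NOT struct rte_mbuf. */
struct seg {
    struct seg *next;   /* Next segment in the chain, or NULL. */
    uint16_t data_len;  /* Bytes of frame data held by this segment. */
    uint32_t pkt_len;   /* Total frame length; set in the head only. */
    uint16_t nb_segs;   /* Number of segments; set in the head only. */
};

/* Frees every segment in the chain starting at 'head'. */
static void
seg_chain_free(struct seg *head)
{
    while (head) {
        struct seg *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds a chain large enough for a 'frame_len'-byte frame, filling each
 * segment with up to SEG_DATA_ROOM bytes. Returns NULL if 'frame_len' is 0
 * or an allocation fails (any partial chain is freed). */
static struct seg *
seg_chain_build(uint32_t frame_len)
{
    struct seg *head = NULL;
    struct seg **tail = &head;
    uint32_t remaining = frame_len;
    uint16_t n_segs = 0;

    while (remaining > 0) {
        struct seg *s = calloc(1, sizeof *s);
        if (!s) {
            seg_chain_free(head);
            return NULL;
        }
        s->data_len = (uint16_t) (remaining > SEG_DATA_ROOM
                                  ? SEG_DATA_ROOM : remaining);
        remaining -= s->data_len;
        *tail = s;              /* Append to the end of the chain. */
        tail = &s->next;
        n_segs++;
    }
    /* Sanity check: the chain is empty only for a zero-length frame. */
    assert((head == NULL) == (frame_len == 0));
    if (head) {
        head->pkt_len = frame_len;
        head->nb_segs = n_segs;
    }
    return head;
}

/* Sums data_len across the chain; should equal the head's pkt_len. */
static uint32_t
seg_chain_total(const struct seg *head)
{
    uint32_t total = 0;

    for (; head; head = head->next) {
        total += head->data_len;
    }
    return total;
}
```

Under these assumptions, a 9026-byte frame (a 9000-byte MTU plus 26 bytes of L2 overhead) occupies five segments: four full 2048-byte segments and a final segment of 834 bytes.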
== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt at initialization. A new
OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. It is a global
boolean value that determines how jumbo frames are represented across
all DPDK ports. In the absence of a user-supplied value,
'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs
must be explicitly enabled and single-segment mbufs remain the
default.
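The default-to-false behaviour can be sketched as follows. This assumes that smap_get_bool() called with a false default treats only a case-insensitive "true" as true. That is an assumption about its semantics, so the real lib/smap.c should be checked. The helper names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII case-insensitive equality; avoids depending on POSIX strcasecmp(). */
static bool
str_eq_nocase(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) {
            return false;
        }
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/* Hypothetical helper: decides whether multi-segment mbufs are requested,
 * given the raw 'dpdk-multi-seg-mbufs' value from other_config ('value' is
 * NULL when the key is absent). With a false default, only "true" (in any
 * letter case) enables the feature; anything else leaves it off. */
static bool
multi_seg_mbufs_requested(const char *value)
{
    return value != NULL && str_eq_nocase(value, "true");
}
```

Under this sketch, other_config:dpdk-multi-seg-mbufs=TRUE would enable the feature. A value such as "yes", or an absent key, would leave single-segment mbufs in place.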
Setting the field is identical to setting existing DPDK-specific OVSDB
fields:
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
NEWS | 1 +
lib/dpdk.c | 7 +++++++
lib/netdev-dpdk.c | 52 +++++++++++++++++++++++++++++++++++++++++++++-------
lib/netdev-dpdk.h | 1 +
vswitchd/vswitch.xml | 20 ++++++++++++++++++++
5 files changed, 74 insertions(+), 7 deletions(-)
diff --git a/NEWS b/NEWS
index d22ad14..e6752d6 100644
--- a/NEWS
+++ b/NEWS
@@ -92,6 +92,7 @@ v2.9.0 - 19 Feb 2018
pmd assignments.
* Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
* Add support for vHost dequeue zero copy (experimental)
+ * Add support for multi-segment mbufs
- Userspace datapath:
* Output packet batching support.
- vswitchd:
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 00dd974..1447724 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -459,6 +459,13 @@ dpdk_init__(const struct smap *ovs_other_config)
/* Finally, register the dpdk classes */
netdev_dpdk_register();
+
+ bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+ "dpdk-multi-seg-mbufs", false);
+ if (multi_seg_mbufs_enable) {
+ VLOG_INFO("DPDK multi-segment mbufs enabled");
+ netdev_dpdk_multi_segment_mbufs_enable();
+ }
}
void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4c6a3c0..5746ae0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
#define DPDK_PORT_WATCHDOG_INTERVAL 5
@@ -593,6 +594,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
+ dev->requested_n_txq * dev->requested_txq_size
+ MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
+ MIN_NB_MBUF;
+ /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */
ovs_mutex_lock(&dpdk_mp_mutex);
do {
@@ -693,7 +695,13 @@ dpdk_mp_release(struct rte_mempool *mp)
/* Tries to allocate a new mempool - or re-use an existing one where
* appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ * together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ * non-standard-sized mbufs. Each mbuf in the mempool is large enough to
+ * fully accommodate packets of size 'requested_mtu'.
* On success - or when re-using an existing mempool - the new configuration
* will be applied.
* On error, device will be left unchanged. */
@@ -701,10 +709,18 @@ static int
netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
OVS_REQUIRES(dev->mutex)
{
- uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+ uint16_t buf_size = 0;
struct rte_mempool *mp;
int ret = 0;
+ /* Contiguous mbufs in use - permit oversized mbufs */
+ if (!dpdk_multi_segment_mbufs) {
+ buf_size = dpdk_buf_size(dev->requested_mtu);
+ } else {
+ /* multi-segment mbufs - use standard mbuf size */
+ buf_size = dpdk_buf_size(ETHER_MTU);
+ }
+
dpdk_mp_sweep();
mp = dpdk_mp_create(dev, buf_size);
@@ -786,11 +802,25 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
int diag = 0;
int i;
struct rte_eth_conf conf = port_conf;
+ struct rte_eth_txconf txconf;
+
+ /* Multi-segment-mbuf-specific setup. */
+ if (dpdk_multi_segment_mbufs) {
+ struct rte_eth_dev_info dev_info;
+
+ /* DPDK PMDs typically attempt to use simple or vectorized
+ * transmit functions, neither of which is compatible with
+ * multi-segment mbufs. Ensure that these are disabled when
+ * multi-segment mbufs are enabled.
+ */
+ rte_eth_dev_info_get(dev->port_id, &dev_info);
+ txconf = dev_info.default_txconf;
+ txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
- /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
- * enabled. */
- if (dev->mtu > ETHER_MTU) {
- conf.rxmode.enable_scatter = 1;
+ /* For some NICs (e.g. Niantic), scattered_rx mode (required for
+ * ingress jumbo frames when multi-segment mbufs are enabled) needs to
+ * be explicitly enabled. */
+ conf.rxmode.enable_scatter = 1;
}
conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
@@ -821,7 +851,9 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
for (i = 0; i < n_txq; i++) {
diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
- dev->socket_id, NULL);
+ dev->socket_id,
+ dpdk_multi_segment_mbufs ? &txconf
+ : NULL);
if (diag) {
VLOG_INFO("Interface %s unable to setup txq(%d): %s",
dev->up.name, i, rte_strerror(-diag));
@@ -3868,6 +3900,12 @@ unlock:
return err;
}
+void
+netdev_dpdk_multi_segment_mbufs_enable(void)
+{
+ dpdk_multi_segment_mbufs = true;
+}
+
#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \
SET_CONFIG, SET_TX_MULTIQ, SEND, \
GET_CARRIER, GET_STATS, \
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..a3339fe 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -25,6 +25,7 @@ struct dp_packet;
#ifdef DPDK_NETDEV
+void netdev_dpdk_multi_segment_mbufs_enable(void);
void netdev_dpdk_register(void);
void free_dpdk_buf(struct dp_packet *);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 9c2a826..5ef0926 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -331,6 +331,26 @@
+
+
+ Specifies whether DPDK uses multi-segment mbufs for handling jumbo frames.
+
+
+ If true, DPDK allocates a single mempool per port, irrespective
+ of the port's requested MTU size. The elements of this mempool are
+ 'standard'-sized mbufs (typically ~2 KB), which may be chained
+ together to accommodate jumbo frames. In this approach, each mbuf
+ typically stores a fragment of the overall jumbo frame.
+
+
+ If not specified, defaults to false, in which case
+ the size of each mbuf within a DPDK port's mempool will be grown to
+ accommodate jumbo frames within a single mbuf.
+
+
+
+