From patchwork Mon Aug 20 17:44:22 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959854
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 41vLmw35Shz9s8T
for ;
Tue, 21 Aug 2018 03:45:20 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id AAA4AD35;
Mon, 20 Aug 2018 17:44:52 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id DFC2DCE4
for ; Mon, 20 Aug 2018 17:44:50 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 3FE8D67F
for ; Mon, 20 Aug 2018 17:44:50 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
20 Aug 2018 10:44:48 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,266,1531810800"; d="scan'208";a="81880324"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga004.fm.intel.com with ESMTP; 20 Aug 2018 10:44:44 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:22 +0100
Message-Id: <1534787075-139132-2-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 01/14] netdev-dpdk: fix mbuf sizing
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
From: Mark Kavanagh
There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance with Rx
buffer alignment (typically 1024B). So, for example, in order to
successfully receive and capture a 1500B packet, mbufs with a
data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
* the dp_packet, which includes a struct rte_mbuf (704B)
* RTE_PKTMBUF_HEADROOM (128B)
* packet data (aligned to 1k, as previously described)
* RTE_PKTMBUF_TAILROOM (typically 0)
Some PMDs require that the total mbuf size (i.e. the total sum of all
of the above-listed components' lengths) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:
mbuf.data_len == mbuf.buf_len - mbuf.data_off
This creates a problem in the case of multi-segment mbufs, where that
property is assumed to hold for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion, when
aligning mbuf size to cachelines.
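As a rough sketch of the arithmetic described above (not the code in this patch), the helper below rounds the total element size up to a cache line and folds the resulting padding into the private area, leaving the data room untouched. The struct and function names are hypothetical, and all sizes are passed in as parameters rather than taken from real DPDK headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the sizes a pool creator would be given. */
struct mbuf_layout {
    uint32_t priv_len;   /* private area, including any alignment padding */
    uint32_t data_room;  /* headroom + packet data, never padded */
    uint32_t total;      /* whole element, a multiple of the cache line */
};

/* Round 'x' up to the next multiple of 'align' (align must be non-zero). */
static uint32_t
round_up_u32(uint32_t x, uint32_t align)
{
    assert(align != 0);
    return (x + align - 1) / align * align;
}

/* Compute the layout so that only the private area absorbs the padding. */
static struct mbuf_layout
mbuf_layout_compute(uint32_t dp_packet_size, uint32_t rte_mbuf_size,
                    uint32_t headroom, uint32_t data_size,
                    uint32_t cache_line)
{
    struct mbuf_layout l;
    uint32_t pkt_size = dp_packet_size + headroom + data_size;
    uint32_t aligned = round_up_u32(pkt_size, cache_line);

    assert(dp_packet_size >= rte_mbuf_size);
    l.priv_len = (dp_packet_size - rte_mbuf_size) + (aligned - pkt_size);
    l.data_room = headroom + data_size;
    l.total = aligned;
    return l;
}
```

Assuming a 704B dp_packet, a 128B struct rte_mbuf (an assumption, not stated above), 128B of headroom and 2048B of data, a 64B cache line needs no padding (2880B total), while a 128B cache line adds 64B to the private area (2944B total). The data room stays at 2176B in both cases.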
Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla
Signed-off-by: Mark Kavanagh
Acked-by: Santosh Shukla
Acked-by: Eelco Chaudron
---
lib/netdev-dpdk.c | 56 +++++++++++++++++++++++++++++++++++++------------------
1 file changed, 38 insertions(+), 18 deletions(-)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ac02a09..0cd9ff6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
#define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)
#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \
- ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu) ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
- + sizeof(struct dp_packet) \
- + RTE_PKTMBUF_HEADROOM), \
- RTE_CACHE_LINE_SIZE)
#define NETDEV_DPDK_MBUF_ALIGN 1024
#define NETDEV_DPDK_MAX_PKT_LEN 9728
@@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
char mp_name[RTE_MEMPOOL_NAMESIZE];
const char *netdev_name = netdev_get_name(&dev->up);
int socket_id = dev->requested_socket_id;
- uint32_t n_mbufs;
+ uint32_t n_mbufs = 0;
+ uint32_t mbuf_size = 0;
+ uint32_t aligned_mbuf_size = 0;
+ uint32_t mbuf_priv_data_len = 0;
+ uint32_t pkt_size = 0;
uint32_t hash = hash_string(netdev_name, 0);
struct dpdk_mp *dmp = NULL;
int ret;
@@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
dmp->mtu = mtu;
dmp->refcount = 1;
+ /* Get the size of each mbuf, based on the MTU */
+ mbuf_size = dpdk_buf_size(dev->requested_mtu);
+
n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
do {
@@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
* so this is not an issue for tasks such as debugging.
*/
ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
- "ovs%08x%02d%05d%07u",
- hash, socket_id, mtu, n_mbufs);
+ "ovs%08x%02d%05d%07u",
+ hash, socket_id, mtu, n_mbufs);
if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
VLOG_DBG("snprintf returned %d. "
"Failed to generate a mempool name for \"%s\". "
@@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
break;
}
- VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
- "on socket %d for %d Rx and %d Tx queues.",
- netdev_name, n_mbufs, socket_id,
- dev->requested_n_rxq, dev->requested_n_txq);
-
- dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
- MP_CACHE_SZ,
- sizeof (struct dp_packet)
- - sizeof (struct rte_mbuf),
- MBUF_SIZE(mtu)
- - sizeof(struct dp_packet),
+ VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
+ "on socket %d for %d Rx and %d Tx queues, "
+ "cache line size of %u",
+ netdev_name, n_mbufs, mbuf_size, socket_id,
+ dev->requested_n_rxq, dev->requested_n_txq,
+ RTE_CACHE_LINE_SIZE);
+
+ mbuf_priv_data_len = sizeof(struct dp_packet) -
+ sizeof(struct rte_mbuf);
+ /* The size of the entire dp_packet. */
+ pkt_size = sizeof (struct dp_packet) +
+ mbuf_size + RTE_PKTMBUF_HEADROOM;
+ /* mbuf size, rounded up to cacheline size. */
+ aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE);
+ /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
+ * This maintains mbuf size cache alignment, while also honoring RX
+ * buffer alignment in the data portion of the mbuf. If this adjustment
+ * is not made, there is a possibility later on that for an element of
+ * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
+ * This is problematic in the case of multi-segment mbufs, particularly
+ * when an mbuf segment needs to be resized (when [push|pop]ping a VLAN
+ * header, for example).
+ */
+ mbuf_priv_data_len += (aligned_mbuf_size - pkt_size);
+
+ dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
+ mbuf_priv_data_len,
+ mbuf_size + RTE_PKTMBUF_HEADROOM,
socket_id);
if (dmp->mp) {
From patchwork Mon Aug 20 17:44:23 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959853
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 41vLmR3G9mz9s8T
for ;
Tue, 21 Aug 2018 03:44:54 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id D2281CDA;
Mon, 20 Aug 2018 17:44:50 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 3A91F89C
for ; Mon, 20 Aug 2018 17:44:50 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 3546167F
for ; Mon, 20 Aug 2018 17:44:49 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
20 Aug 2018 10:44:48 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,266,1531810800"; d="scan'208";a="81880329"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga004.fm.intel.com with ESMTP; 20 Aug 2018 10:44:46 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:23 +0100
Message-Id: <1534787075-139132-3-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 02/14] dp-packet: Init specific mbuf fields.
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
From: Mark Kavanagh
dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible that the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is therefore critical that these fields be
initialized to known values.
This patch ensures that the following mbuf fields are initialized to
appropriate values on creation of a new dp_packet:
- ol_flags=0
- nb_segs=1
- tx_offload=0
- packet_type=0
- next=NULL
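The effect of the initialization listed above can be sketched with a stand-in structure. The `fake_mbuf` type and both functions below are hypothetical and only mirror the five fields named in the list; a junk fill pattern stands in for whatever an uninitialized allocation might contain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the handful of mbuf fields named above. */
struct fake_mbuf {
    uint64_t ol_flags;
    uint64_t tx_offload;
    uint32_t packet_type;
    uint16_t nb_segs;
    struct fake_mbuf *next;
};

/* Put the segment/offload fields into a known state. */
static void
fake_mbuf_init(struct fake_mbuf *m)
{
    m->ol_flags = 0;
    m->tx_offload = 0;
    m->packet_type = 0;
    m->nb_segs = 1;     /* A lone buffer is a chain of exactly one segment. */
    m->next = NULL;
}

/* Allocate a buffer, dirty it to mimic uninitialized heap memory, then
 * initialize the fields that must not carry random values. */
static struct fake_mbuf *
fake_mbuf_new(void)
{
    struct fake_mbuf *m = malloc(sizeof *m);

    assert(m != NULL);
    memset(m, 0xAB, sizeof *m);
    fake_mbuf_init(m);
    return m;
}
```

Without the call to `fake_mbuf_init()`, `next` would hold a non-NULL junk pointer and `nb_segs` a junk count, which is the kind of state that would mislead any code walking the chain.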
Adapted from an idea by Michael Qiu :
https://patchwork.ozlabs.org/patch/777570/
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
lib/dp-packet.h | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index ba91e58..b948fe1 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -625,14 +625,15 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED)
}
/* This initialization is needed for packets that do not come
- * from DPDK interfaces, when vswitchd is built with --with-dpdk.
- * The DPDK rte library will still otherwise manage the mbuf.
- * We only need to initialize the mbuf ol_flags. */
+ * from DPDK interfaces, when vswitchd is built with --with-dpdk. */
static inline void
dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
{
#ifdef DPDK_NETDEV
- p->mbuf.ol_flags = 0;
+ struct rte_mbuf *mbuf = &(p->mbuf);
+ mbuf->ol_flags = mbuf->tx_offload = mbuf->packet_type = 0;
+ mbuf->nb_segs = 1;
+ mbuf->next = NULL;
#endif
}
From patchwork Mon Aug 20 17:44:24 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959855
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 41vLnT189Nz9s8T
for ;
Tue, 21 Aug 2018 03:45:49 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 6EEDBD4A;
Mon, 20 Aug 2018 17:44:53 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 4F0AFD07
for ; Mon, 20 Aug 2018 17:44:51 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id E48FC67F
for ; Mon, 20 Aug 2018 17:44:50 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
20 Aug 2018 10:44:50 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,266,1531810800"; d="scan'208";a="81880338"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga004.fm.intel.com with ESMTP; 20 Aug 2018 10:44:48 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:24 +0100
Message-Id: <1534787075-139132-4-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 03/14] dp-packet: Fix allocated size on DPDK
init.
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
When DPDK is enabled, OvS deals with two types of packets: those
coming from the mempool and those created locally by OvS, which are
copied to mempool mbufs before output. In the latter case, the space is
allocated from the system, while in the former the mbufs are allocated
from a mempool, which takes care of initialising them appropriately.
In the current implementation, during mempool's initialisation of mbufs,
dp_packet_set_allocated() is called from dp_packet_init_dpdk() without
considering that the allocated space, in the case of multi-segment
mbufs, might be greater than a single mbuf. Furthermore, given that
dp_packet_init_dpdk() is on the code path that's called upon mempool's
initialisation, a call to dp_packet_set_allocated() is redundant, since
mempool takes care of initialising it.
To fix this, dp_packet_set_allocated() is no longer called after
initialisation of a mempool, only in dp_packet_init__(), which is still
called by OvS when initialising locally created packets.
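To illustrate why a single mbuf's buf_len may understate the allocated space, here is a minimal sketch using a hypothetical `fake_seg` type (not a DPDK or OvS structure): the capacity of a chain is the sum over its segments, so the first segment's buf_len alone is only correct for single-segment packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment of a chained buffer. */
struct fake_seg {
    uint16_t buf_len;           /* Size of this segment's buffer. */
    struct fake_seg *next;      /* Next segment, or NULL at the end. */
};

/* Total buffer space across the whole chain starting at 'seg'. */
static size_t
fake_chain_capacity(const struct fake_seg *seg)
{
    size_t total = 0;

    for (; seg; seg = seg->next) {
        total += seg->buf_len;
    }
    return total;
}
```

For a two-segment chain the result is twice what the head segment alone would report, which is the mismatch described above.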
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
lib/dp-packet.c | 3 +--
lib/dp-packet.h | 2 +-
lib/netdev-dpdk.c | 2 +-
3 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..782e7c2 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -99,9 +99,8 @@ dp_packet_use_const(struct dp_packet *b, const void *data, size_t size)
* buffer. Here, non-transient ovs dp-packet fields are initialized for
* packets that are part of a DPDK memory pool. */
void
-dp_packet_init_dpdk(struct dp_packet *b, size_t allocated)
+dp_packet_init_dpdk(struct dp_packet *b)
{
- dp_packet_set_allocated(b, allocated);
b->source = DPBUF_DPDK;
}
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index b948fe1..6376039 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -114,7 +114,7 @@ void dp_packet_use(struct dp_packet *, void *, size_t);
void dp_packet_use_stub(struct dp_packet *, void *, size_t);
void dp_packet_use_const(struct dp_packet *, const void *, size_t);
-void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
+void dp_packet_init_dpdk(struct dp_packet *);
void dp_packet_init(struct dp_packet *, size_t);
void dp_packet_uninit(struct dp_packet *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0cd9ff6..ebd55e9 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -550,7 +550,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
{
struct rte_mbuf *pkt = _p;
- dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len);
+ dp_packet_init_dpdk((struct dp_packet *) pkt);
}
static int
From patchwork Mon Aug 20 17:44:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959857
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 41vLpq5Ddbz9s9F
for ;
Tue, 21 Aug 2018 03:46:59 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 3A9FAD3E;
Mon, 20 Aug 2018 17:45:18 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 39686CDD
for ; Mon, 20 Aug 2018 17:45:17 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id C4AC4772
for ; Mon, 20 Aug 2018 17:45:16 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
20 Aug 2018 10:45:16 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,266,1531810800"; d="scan'208";a="81880346"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga004.fm.intel.com with ESMTP; 20 Aug 2018 10:44:50 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:25 +0100
Message-Id: <1534787075-139132-5-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 04/14] netdev-dpdk: Serialise non-pmds mbufs'
alloc/free.
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
A new mutex, 'nonpmd_mp_mutex', has been introduced to serialise
allocation and free operations by non-pmd threads on a given mempool.
free_dpdk_buf() has been modified to make use of the introduced mutex.
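The locking pattern described above can be sketched outside of OvS with a plain pthread mutex. Everything here is hypothetical: `fake_pool_put()` stands in for returning a buffer to a pool, and the caller passes in whether it is a pmd thread instead of querying DPDK for it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t fake_nonpmd_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int fake_locked_frees;    /* Frees done under the mutex. */
static unsigned int fake_unlocked_frees;  /* Frees done without it. */

/* Hypothetical stand-in for handing a buffer back to a pool. */
static void
fake_pool_put(unsigned int *counter)
{
    (*counter)++;
}

/* Non-pmd callers serialise on the mutex; pmd callers take the fast path. */
static void
fake_free_buf(bool thread_is_pmd)
{
    if (!thread_is_pmd) {
        int rc = pthread_mutex_lock(&fake_nonpmd_mutex);

        assert(rc == 0);
        (void) rc;
        fake_pool_put(&fake_locked_frees);
        pthread_mutex_unlock(&fake_nonpmd_mutex);
        return;
    }
    fake_pool_put(&fake_unlocked_frees);
}
```

The design choice this mirrors is that only the threads that may be preempted pay for the lock, while pinned pmd threads keep their lock-free path.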
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
lib/netdev-dpdk.c | 33 ++++++++++++++++++++++++++++++---
1 file changed, 30 insertions(+), 3 deletions(-)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ebd55e9..aee8e20 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -322,6 +322,16 @@ static struct ovs_mutex dpdk_mp_mutex OVS_ACQ_AFTER(dpdk_mutex)
static struct ovs_list dpdk_mp_list OVS_GUARDED_BY(dpdk_mp_mutex)
= OVS_LIST_INITIALIZER(&dpdk_mp_list);
+/* This mutex must be used by non-pmd threads when allocating or freeing
+ * mbufs through mempools, when outside of the `non_pmd_mutex` mutex, in struct
+ * dp_netdev.
+ * The reason, as pointed out in the "Known Issues" section in DPDK's EAL docs,
+ * is that the implementation on which mempool is based is non-preemptible.
+ * Since non-pmd threads may not be pinned, one could be preempted by another
+ * while both operate on the same mempool, which could lead to memory
+ * corruption. */
+static struct ovs_mutex nonpmd_mp_mutex = OVS_MUTEX_INITIALIZER;
+
struct dpdk_mp {
struct rte_mempool *mp;
int mtu;
@@ -492,6 +502,8 @@ struct netdev_rxq_dpdk {
dpdk_port_t port_id;
};
+static bool dpdk_thread_is_pmd(void);
+
static void netdev_dpdk_destruct(struct netdev *netdev);
static void netdev_dpdk_vhost_destruct(struct netdev *netdev);
@@ -525,6 +537,12 @@ dpdk_buf_size(int mtu)
NETDEV_DPDK_MBUF_ALIGN);
}
+static bool
+dpdk_thread_is_pmd(void)
+{
+ return rte_lcore_id() != NON_PMD_CORE_ID;
+}
+
/* Allocates an area of 'sz' bytes from DPDK. The memory is zero'ed.
*
* Unlike xmalloc(), this function can return NULL on failure. */
@@ -535,11 +553,20 @@ dpdk_rte_mzalloc(size_t sz)
}
void
-free_dpdk_buf(struct dp_packet *p)
+free_dpdk_buf(struct dp_packet *packet)
{
- struct rte_mbuf *pkt = (struct rte_mbuf *) p;
+ /* If non-pmd we need to lock on nonpmd_mp_mutex mutex */
+ if (!dpdk_thread_is_pmd()) {
+ ovs_mutex_lock(&nonpmd_mp_mutex);
+
+ rte_pktmbuf_free(&packet->mbuf);
+
+ ovs_mutex_unlock(&nonpmd_mp_mutex);
+
+ return;
+ }
- rte_pktmbuf_free(pkt);
+ rte_pktmbuf_free(&packet->mbuf);
}
static void
From patchwork Mon Aug 20 17:44:26 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959862
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 41vLs82LlQz9s0n
for ;
Tue, 21 Aug 2018 03:49:00 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 29357DAB;
Mon, 20 Aug 2018 17:45:24 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id EF24AD6F
for ; Mon, 20 Aug 2018 17:45:18 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 5F129773
for ; Mon, 20 Aug 2018 17:45:18 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
20 Aug 2018 10:45:16 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,266,1531810800"; d="scan'208";a="81880356"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga004.fm.intel.com with ESMTP; 20 Aug 2018 10:44:52 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:26 +0100
Message-Id: <1534787075-139132-6-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: Marcin Ksiadz ,
Przemyslaw Lal ,
Michael Qiu , i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 05/14] dp-packet: Fix data_len handling
multi-seg mbufs.
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
When a dp_packet is from a DPDK source, and it contains multi-segment
mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
the data_len of each mbuf in the chain should be considered while
distributing the new (provided) size.
To account for the above, dp_packet_set_size() has been changed so that,
in the multi-segment mbufs case, only the data_len on the last mbuf of
the chain and the total size of the packet, pkt_len, are changed. The
data_len on the intermediate mbufs preceding the last mbuf is not
changed by dp_packet_set_size(). Furthermore, in some cases
dp_packet_set_size() may be used to set a smaller size than the current
packet size, thus effectively trimming the end of the packet. In the
multi-segment mbufs case this may lead to lingering mbufs that may need
freeing.
__packet_set_data() now also updates an mbuf's data_len after setting
the data offset. This is so that both fields are always in sync for each
mbuf in a chain.
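The resizing behaviour described above can be sketched on a stand-in chain. This is an illustrative reimplementation under stated assumptions, not the code in this patch: `chain_seg`, `chain_pkt` and all functions are hypothetical, segments are heap-allocated, and an oversized request is reported by a return value instead of an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct chain_seg {
    uint16_t buf_len;   /* Capacity of this segment's buffer. */
    uint16_t data_off;  /* Offset of the data within the buffer. */
    uint16_t data_len;  /* Bytes of data held by this segment. */
    struct chain_seg *next;
};

struct chain_pkt {
    struct chain_seg *head;
    uint32_t pkt_len;   /* Sum of data_len over the chain. */
    uint16_t nb_segs;
};

static struct chain_seg *
chain_seg_new(uint16_t buf_len, uint16_t data_len)
{
    struct chain_seg *s = calloc(1, sizeof *s);

    assert(s != NULL && data_len <= buf_len);
    s->buf_len = buf_len;
    s->data_len = data_len;
    return s;
}

static void
chain_free(struct chain_seg *s)
{
    while (s) {
        struct chain_seg *next = s->next;
        free(s);
        s = next;
    }
}

/* Set the packet's total size to 'v'. Intermediate segments keep their
 * data_len; only the segment that ends up last is adjusted, and any
 * segments after it are freed. Returns false, changing nothing, if the
 * chain does not have room for 'v' bytes. */
static bool
chain_pkt_set_size(struct chain_pkt *p, uint32_t v)
{
    struct chain_seg *s;
    uint32_t room = 0, remaining = v;
    uint16_t nb_segs = 0;

    /* Only the last segment may grow, up to the end of its buffer. */
    for (s = p->head; s; s = s->next) {
        room += s->next ? s->data_len
                        : (uint32_t) (s->buf_len - s->data_off);
    }
    if (v > room) {
        return false;
    }

    for (s = p->head; s; s = s->next) {
        nb_segs++;
        if (remaining <= s->data_len || !s->next) {
            s->data_len = (uint16_t) remaining;  /* New last segment. */
            chain_free(s->next);
            s->next = NULL;
            break;
        }
        remaining -= s->data_len;
    }
    p->pkt_len = v;
    p->nb_segs = nb_segs;
    return true;
}
```

For example, shrinking a chain of three full 100-byte segments to 150 bytes leaves the first segment at 100 bytes, trims the second to 50, and frees the third.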
Co-authored-by: Michael Qiu
Co-authored-by: Mark Kavanagh
Co-authored-by: Przemyslaw Lal
Co-authored-by: Marcin Ksiadz
Co-authored-by: Yuanhan Liu
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Przemyslaw Lal
Signed-off-by: Marcin Ksiadz
Signed-off-by: Yuanhan Liu
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
lib/dp-packet.h | 76 ++++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 64 insertions(+), 12 deletions(-)
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 6376039..d2803af 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -429,17 +429,49 @@ dp_packet_size(const struct dp_packet *b)
static inline void
dp_packet_set_size(struct dp_packet *b, uint32_t v)
{
- /* netdev-dpdk does not currently support segmentation; consequently, for
- * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
- * be used interchangably.
- *
- * On the datapath, it is expected that the size of packets
- * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
- * loss of accuracy in assigning 'v' to 'data_len'.
- */
- b->mbuf.data_len = (uint16_t)v; /* Current seg length. */
- b->mbuf.pkt_len = v; /* Total length of all segments linked to
- * this segment. */
+ if (b->source == DPBUF_DPDK) {
+ struct rte_mbuf *mbuf = &b->mbuf;
+ uint16_t new_len = v;
+ uint16_t data_len;
+ uint16_t nb_segs = 0;
+ uint16_t pkt_len = 0;
+
+ /* Trim the chained buffers from the end so that the packet is 'v' bytes
+ * long, freeing any buffers that may be left floating. */
+ while (mbuf) {
+ data_len = MIN(new_len, mbuf->data_len);
+ mbuf->data_len = data_len;
+
+ if (new_len - data_len <= 0) {
+ /* Free the rest of chained mbufs */
+ free_dpdk_buf(CONTAINER_OF(mbuf->next, struct dp_packet,
+ mbuf));
+ mbuf->next = NULL;
+ } else if (!mbuf->next) {
+ /* Don't assign more than what we have available */
+ mbuf->data_len = MIN(new_len,
+ mbuf->buf_len - mbuf->data_off);
+ }
+
+ new_len -= data_len;
+ nb_segs += 1;
+ pkt_len += mbuf->data_len;
+ mbuf = mbuf->next;
+ }
+
+ /* pkt_len != v would effectively mean that pkt_len < 'v' (as being
+ * bigger is logically impossible). Being < 'v' would mean the 'v'
+ * provided was bigger than the available room; it is the caller's
+ * responsibility to make sure there is enough room. */
+ ovs_assert(pkt_len == v);
+
+ b->mbuf.nb_segs = nb_segs;
+ b->mbuf.pkt_len = pkt_len;
+ } else {
+ b->mbuf.data_len = v;
+ /* Total length of all segments linked to this segment. */
+ b->mbuf.pkt_len = v;
+ }
}
static inline uint16_t
@@ -451,7 +483,27 @@ __packet_data(const struct dp_packet *b)
static inline void
__packet_set_data(struct dp_packet *b, uint16_t v)
{
- b->mbuf.data_off = v;
+ if (b->source == DPBUF_DPDK) {
+ /* Moving data_off away from the first mbuf in the chain is not a
+ * possibility using DPBUF_DPDK dp_packets */
+ ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len);
+
+ uint16_t prev_ofs = b->mbuf.data_off;
+ b->mbuf.data_off = v;
+ int16_t ofs_diff = prev_ofs - b->mbuf.data_off;
+
+ /* When dealing with DPDK mbufs, keep data_off and data_len in sync.
+ * Thus, update data_len if the length changes with the move of
+ * data_off. However, if data_len is 0, there's no data to move and
+ * data_len should remain 0. */
+
+ if (b->mbuf.data_len != 0) {
+ b->mbuf.data_len = MIN(b->mbuf.data_len + ofs_diff,
+ b->mbuf.buf_len - b->mbuf.data_off);
+ }
+ } else {
+ b->mbuf.data_off = v;
+ }
}
static inline uint16_t
From patchwork Mon Aug 20 17:44:27 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959858
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:27 +0100
Message-Id: <1534787075-139132-7-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 06/14] dp-packet: Handle multi-seg mbufs in
helper funcs.
Most helper functions in dp-packet assume that the data held by a
dp_packet is contiguous, and perform operations such as pointer
arithmetic under that assumption. However, with the introduction of
multi-segment mbufs, where data is non-contiguous, such assumptions no
longer hold. Some examples of such helper functions are
dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(),
dp_packet_get_allocated() and dp_packet_at().
Thus, instead of assuming contiguous data in dp_packet, they now
iterate over the (non-contiguous) data in mbufs to perform their
calculations.
Finally, dp_packet_use__() has also been modified to perform the
initialisation of the packet (and setting the source) before continuing
to set its size and data length, which now depends on the type of
packet.
Co-authored-by: Mark Kavanagh
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
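For illustration only, the chain walk that the helpers now perform can
be sketched outside of OVS and DPDK as follows. 'struct seg' and the
seg_*() functions are hypothetical stand-ins for the rte_mbuf fields
(data pointer, data_len, next) that the helpers rely on; they are not
part of either code base. The sketch treats an offset equal to a
segment's data_len as the first byte of the next segment, which is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_mbuf fields used by the helpers. */
struct seg {
    char *data;          /* Start of this segment's data. */
    uint16_t data_len;   /* Bytes of data held by this segment. */
    struct seg *next;    /* Next segment in the chain, or NULL. */
};

/* Returns the number of data bytes held by the whole chain. */
static size_t
seg_chain_len(const struct seg *s)
{
    size_t len = 0;

    for (; s; s = s->next) {
        len += s->data_len;
    }
    return len;
}

/* Returns a pointer to byte 'offset' of the chain, or NULL if the chain does
 * not hold at least 'offset + size' bytes.  Only the first byte is located;
 * the 'size' bytes may still span more than one segment. */
static char *
seg_at(const struct seg *s, size_t offset, size_t size)
{
    if (offset + size > seg_chain_len(s)) {
        return NULL;
    }

    /* Skip whole segments until 'offset' falls inside the current one. */
    while (s && offset >= s->data_len) {
        offset -= s->data_len;
        s = s->next;
    }
    return s ? s->data + offset : NULL;
}
```

A caller that needs 'size' contiguous bytes would still have to check
that they fit in the segment returned, or copy them out.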
lib/dp-packet.c | 4 +-
lib/dp-packet.h | 150 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 140 insertions(+), 14 deletions(-)
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 782e7c2..2aaeaae 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -41,11 +41,11 @@ static void
dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
enum dp_packet_source source)
{
+ dp_packet_init__(b, allocated, source);
+
dp_packet_set_base(b, base);
dp_packet_set_data(b, base);
dp_packet_set_size(b, 0);
-
- dp_packet_init__(b, allocated, source);
}
/* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index d2803af..48be19b 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -185,9 +185,25 @@ dp_packet_delete(struct dp_packet *b)
static inline void *
dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
{
- return offset + size <= dp_packet_size(b)
- ? (char *) dp_packet_data(b) + offset
- : NULL;
+ if (offset + size > dp_packet_size(b)) {
+ return NULL;
+ }
+
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+ while (buf && offset > buf->data_len) {
+ offset -= buf->data_len;
+
+ buf = buf->next;
+ }
+
+ return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL;
+ }
+#endif
+
+ return (char *) dp_packet_data(b) + offset;
}
/* Returns a pointer to byte 'offset' in 'b', which must contain at least
@@ -196,13 +212,23 @@ static inline void *
dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
{
ovs_assert(offset + size <= dp_packet_size(b));
- return ((char *) dp_packet_data(b)) + offset;
+ return dp_packet_at(b, offset, size);
}
/* Returns a pointer to byte following the last byte of data in use in 'b'. */
static inline void *
dp_packet_tail(const struct dp_packet *b)
{
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+ /* Find last segment where data ends, meaning the tail of the chained
+ * mbufs must be there */
+ buf = rte_pktmbuf_lastseg(buf);
+
+ return rte_pktmbuf_mtod_offset(buf, void *, buf->data_len);
+ }
+#endif
return (char *) dp_packet_data(b) + dp_packet_size(b);
}
@@ -211,6 +237,15 @@ dp_packet_tail(const struct dp_packet *b)
static inline void *
dp_packet_end(const struct dp_packet *b)
{
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &(b->mbuf));
+
+ buf = rte_pktmbuf_lastseg(buf);
+
+ return (char *) buf->buf_addr + buf->buf_len;
+ }
+#endif
return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
}
@@ -236,6 +271,15 @@ dp_packet_tailroom(const struct dp_packet *b)
static inline void
dp_packet_clear(struct dp_packet *b)
{
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ /* sets pkt_len and data_len to zero and frees unused mbufs */
+ dp_packet_set_size(b, 0);
+ rte_pktmbuf_reset(&b->mbuf);
+
+ return;
+ }
+#endif
dp_packet_set_data(b, dp_packet_base(b));
dp_packet_set_size(b, 0);
}
@@ -252,24 +296,38 @@ dp_packet_pull(struct dp_packet *b, size_t size)
return data;
}
+#ifdef DPDK_NETDEV
+/* Similar to dp_packet_try_pull() but doesn't actually pull any data, only
+ * checks if it could and returns true or false accordingly.
+ *
+ * Valid for dp_packets carrying mbufs only. */
+static inline bool
+dp_packet_mbuf_may_pull(const struct dp_packet *b, size_t size) {
+ if (size > b->mbuf.data_len) {
+ return false;
+ }
+
+ return true;
+}
+#endif
+
/* If 'b' has at least 'size' bytes of data, removes that many bytes from the
* head end of 'b' and returns the first byte removed. Otherwise, returns a
* null pointer without modifying 'b'. */
static inline void *
dp_packet_try_pull(struct dp_packet *b, size_t size)
{
+#ifdef DPDK_NETDEV
+ if (!dp_packet_mbuf_may_pull(b, size)) {
+ return NULL;
+ }
+#endif
+
return dp_packet_size(b) - dp_packet_l2_pad_size(b) >= size
? dp_packet_pull(b, size) : NULL;
}
static inline bool
-dp_packet_equal(const struct dp_packet *a, const struct dp_packet *b)
-{
- return dp_packet_size(a) == dp_packet_size(b) &&
- !memcmp(dp_packet_data(a), dp_packet_data(b), dp_packet_size(a));
-}
-
-static inline bool
dp_packet_is_eth(const struct dp_packet *b)
{
return b->packet_type == htonl(PT_ETH);
@@ -311,6 +369,12 @@ dp_packet_set_l2_pad_size(struct dp_packet *b, uint8_t pad_size)
static inline void *
dp_packet_l2_5(const struct dp_packet *b)
{
+#ifdef DPDK_NETDEV
+ if (!dp_packet_mbuf_may_pull(b, b->l2_5_ofs)) {
+ return NULL;
+ }
+#endif
+
return b->l2_5_ofs != UINT16_MAX
? (char *) dp_packet_data(b) + b->l2_5_ofs
: NULL;
@@ -327,6 +391,12 @@ dp_packet_set_l2_5(struct dp_packet *b, void *l2_5)
static inline void *
dp_packet_l3(const struct dp_packet *b)
{
+#ifdef DPDK_NETDEV
+ if (!dp_packet_mbuf_may_pull(b, b->l3_ofs)) {
+ return NULL;
+ }
+#endif
+
return b->l3_ofs != UINT16_MAX
? (char *) dp_packet_data(b) + b->l3_ofs
: NULL;
@@ -341,6 +411,12 @@ dp_packet_set_l3(struct dp_packet *b, void *l3)
static inline void *
dp_packet_l4(const struct dp_packet *b)
{
+#ifdef DPDK_NETDEV
+ if (!dp_packet_mbuf_may_pull(b, b->l4_ofs)) {
+ return NULL;
+ }
+#endif
+
return b->l4_ofs != UINT16_MAX
? (char *) dp_packet_data(b) + b->l4_ofs
: NULL;
@@ -355,6 +431,27 @@ dp_packet_set_l4(struct dp_packet *b, void *l4)
static inline size_t
dp_packet_l4_size(const struct dp_packet *b)
{
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ if (!dp_packet_mbuf_may_pull(b, b->l4_ofs)) {
+ return 0;
+ }
+
+ struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+ size_t l4_size = mbuf->data_len - b->l4_ofs;
+
+ mbuf = mbuf->next;
+ while (mbuf) {
+ l4_size += mbuf->data_len;
+
+ mbuf = mbuf->next;
+ }
+
+ l4_size -= dp_packet_l2_pad_size(b);
+
+ return l4_size;
+ }
+#endif
return b->l4_ofs != UINT16_MAX
? (const char *)dp_packet_tail(b) - (const char *)dp_packet_l4(b)
- dp_packet_l2_pad_size(b)
@@ -408,6 +505,28 @@ dp_packet_get_nd_payload(const struct dp_packet *b)
#ifdef DPDK_NETDEV
BUILD_ASSERT_DECL(offsetof(struct dp_packet, mbuf) == 0);
+static inline bool
+dp_packet_equal(const struct dp_packet *a, const struct dp_packet *b)
+{
+ const struct rte_mbuf *m_a = &a->mbuf;
+ const struct rte_mbuf *m_b = &b->mbuf;
+ if (dp_packet_size(a) != dp_packet_size(b) ||
+ m_a->nb_segs != m_b->nb_segs) {
+ return false;
+ }
+
+ while (m_a != NULL && m_b != NULL) {
+ if (m_a->data_len != m_b->data_len ||
+ memcmp(rte_pktmbuf_mtod(m_a, const void *),
+ rte_pktmbuf_mtod(m_b, const void *), m_a->data_len)) {
+ return false;
+ }
+
+ m_a = m_a->next;
+ m_b = m_b->next;
+ }
+ return true;
+}
+
static inline void *
dp_packet_base(const struct dp_packet *b)
{
@@ -509,7 +628,7 @@ __packet_set_data(struct dp_packet *b, uint16_t v)
static inline uint16_t
dp_packet_get_allocated(const struct dp_packet *b)
{
- return b->mbuf.buf_len;
+ return b->mbuf.nb_segs * b->mbuf.buf_len;
}
static inline void
@@ -518,6 +637,13 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
b->mbuf.buf_len = s;
}
#else
+static inline bool
+dp_packet_equal(const struct dp_packet *a, const struct dp_packet *b)
+{
+ return dp_packet_size(a) == dp_packet_size(b) &&
+ !memcmp(dp_packet_data(a), dp_packet_data(b), dp_packet_size(a));
+}
+
static inline void *
dp_packet_base(const struct dp_packet *b)
{
From patchwork Mon Aug 20 17:44:28 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959861
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:28 +0100
Message-Id: <1534787075-139132-8-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 07/14] dp-packet: Handle multi-seg mbufs in
shift() func.
In its current implementation dp_packet_shift() is also unaware of
multi-seg mbufs (which hold data non-contiguously in memory) and assumes
that data is contiguous in memory, memmove'ing it to perform the
shift.
To add support for multi-seg mbufs, a new set of functions was
introduced: dp_packet_mbuf_shift() and dp_packet_mbuf_write(). These
functions are used by dp_packet_shift(), when handling multi-seg mbufs,
to shift and write data within a chain of mbufs.
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
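As a rough sketch of the "write across a chain" idea described above,
the following stand-alone code spills a buffer over consecutive
segments. 'struct wseg' and wseg_write() are hypothetical names used
only here; the real function operates on struct rte_mbuf. The sketch
records 'ofs + n' as the data length of each segment it touches, which
is a simplification of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one mbuf segment. */
struct wseg {
    char *data;          /* Start of the writable area. */
    size_t room;         /* Writable bytes available from 'data'. */
    size_t data_len;     /* Bytes of valid data, updated on write. */
    struct wseg *next;   /* Next segment in the chain, or NULL. */
};

/* Writes 'len' bytes from 'src' starting 'ofs' bytes into segment 's',
 * continuing at the start of each following segment when one fills up.
 * Returns the number of bytes written, which is less than 'len' if the
 * chain runs out of room. */
static size_t
wseg_write(struct wseg *s, size_t ofs, size_t len, const void *src)
{
    const char *p = src;
    size_t written = 0;

    while (s && len > 0) {
        if (ofs > s->room) {
            break;                  /* Offset lies outside this segment. */
        }

        size_t n = s->room - ofs;
        if (n > len) {
            n = len;
        }

        /* memmove() because 'src' may overlap the destination. */
        memmove(s->data + ofs, p, n);
        s->data_len = ofs + n;

        p += n;
        len -= n;
        written += n;
        ofs = 0;                    /* Later segments are written from 0. */
        s = s->next;
    }
    return written;
}
```

Returning the number of bytes written lets a caller detect a chain
that is too short, instead of silently truncating.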
lib/dp-packet.c | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
lib/dp-packet.h | 10 ++++++
2 files changed, 110 insertions(+)
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 2aaeaae..167bf43 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -294,6 +294,100 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t size)
}
}
+#ifdef DPDK_NETDEV
+/* Write len data bytes in a mbuf at specified offset.
+ *
+ * 'mbuf', pointer to the destination mbuf where 'ofs' is, and the mbuf where
+ * the data will first be written.
+ * 'ofs', the offset within the provided 'mbuf' where 'data' is to be written.
+ * 'len', the size of the to be written 'data'.
+ * 'data', pointer to the to be written bytes.
+ *
+ * XXX: This function is the counterpart of the `rte_pktmbuf_read()` function
+ * available with DPDK, in the rte_mbuf.h */
+void
+dp_packet_mbuf_write(struct rte_mbuf *mbuf, int16_t ofs, uint32_t len,
+ const void *data)
+{
+ char *dst_addr;
+ uint16_t data_len;
+ int len_copy;
+ while (mbuf) {
+ if (len == 0) {
+ break;
+ }
+
+ dst_addr = rte_pktmbuf_mtod_offset(mbuf, char *, ofs);
+ data_len = MBUF_BUF_END(mbuf->buf_addr, mbuf->buf_len) - dst_addr;
+
+ len_copy = MIN(len, data_len);
+ /* We don't know if 'data' is the result of a rte_pktmbuf_read() call,
+ * in which case we may end up writing to the same region of memory we
+ * are reading from and overlapping. Hence the use of memmove() here */
+ memmove(dst_addr, data, len_copy);
+
+ data = ((char *) data) + len_copy;
+ len -= len_copy;
+ ofs = 0;
+
+ mbuf->data_len = len_copy;
+ mbuf = mbuf->next;
+ }
+}
+
+static void
+dp_packet_mbuf_shift_(struct rte_mbuf *dbuf, int16_t dst_ofs,
+ const struct rte_mbuf *sbuf, uint16_t src_ofs, int len)
+{
+ char *rd = xmalloc(sizeof(*rd) * len);
+ const char *wd = rte_pktmbuf_read(sbuf, src_ofs, len, rd);
+
+ ovs_assert(wd);
+
+ dp_packet_mbuf_write(dbuf, dst_ofs, len, wd);
+
+ free(rd);
+}
+
+/* Similarly to dp_packet_shift(), shifts the data within the mbufs of a
+ * dp_packet of DPBUF_DPDK source by 'delta' bytes.
+ * Caller must make sure of the following conditions:
+ * - When shifting left, delta can't be bigger than the data_len available in
+ * the last mbuf;
+ * - When shifting right, delta can't be bigger than the space available in the
+ * first mbuf (buf_len - data_off).
+ * Both these conditions guarantee that a shift operation doesn't fall outside
+ * the bounds of the existing mbufs, so that the first and last mbufs (when
+ * using multi-segment mbufs), remain the same. */
+static void
+dp_packet_mbuf_shift(struct dp_packet *b, int delta)
+{
+ uint16_t src_ofs;
+ int16_t dst_ofs;
+
+ struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+ struct rte_mbuf *tmbuf = rte_pktmbuf_lastseg(mbuf);
+
+ if (delta < 0) {
+ ovs_assert(-delta <= tmbuf->data_len);
+ } else {
+ ovs_assert(delta < (mbuf->buf_len - mbuf->data_off));
+ }
+
+ /* Set the destination and source offsets to copy to */
+ dst_ofs = delta;
+ src_ofs = 0;
+
+ /* Shift data from src mbuf and offset to dst mbuf and offset */
+ dp_packet_mbuf_shift_(mbuf, dst_ofs, mbuf, src_ofs,
+ rte_pktmbuf_pkt_len(mbuf));
+
+ /* Update mbufs' properties, and if using multi-segment mbufs, first and
+ * last mbuf's data_len also needs to be adjusted */
+ mbuf->data_off = mbuf->data_off + dst_ofs;
+}
+#endif
+
/* Shifts all of the data within the allocated space in 'b' by 'delta' bytes.
* For example, a 'delta' of 1 would cause each byte of data to move one byte
* forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each
@@ -306,6 +400,12 @@ dp_packet_shift(struct dp_packet *b, int delta)
: true);
if (delta != 0) {
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ dp_packet_mbuf_shift(b, delta);
+ return;
+ }
+#endif
char *dst = (char *) dp_packet_data(b) + delta;
memmove(dst, dp_packet_data(b), dp_packet_size(b));
dp_packet_set_data(b, dst);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 48be19b..3a99044 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -80,6 +80,11 @@ struct dp_packet {
};
};
+#ifdef DPDK_NETDEV
+#define MBUF_BUF_END(BUF_ADDR, BUF_LEN) \
+ (char *) (((char *) BUF_ADDR) + BUF_LEN)
+#endif
+
static inline void *dp_packet_data(const struct dp_packet *);
static inline void dp_packet_set_data(struct dp_packet *, void *);
static inline void *dp_packet_base(const struct dp_packet *);
@@ -133,6 +138,11 @@ static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
size_t size);
static inline void *dp_packet_at_assert(const struct dp_packet *,
size_t offset, size_t size);
+#ifdef DPDK_NETDEV
+void
+dp_packet_mbuf_write(struct rte_mbuf *mbuf, int16_t ofs, uint32_t len,
+ const void *data);
+#endif
static inline void *dp_packet_tail(const struct dp_packet *);
static inline void *dp_packet_end(const struct dp_packet *);
From patchwork Mon Aug 20 17:44:29 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959863
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:29 +0100
Message-Id: <1534787075-139132-9-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: Michael Qiu , i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 08/14] dp-packet: copy data from multi-seg.
DPDK mbuf
From: Michael Qiu
When cloning a packet whose source is the DPDK driver, multi-segment
mbufs must be taken into account, and the data of each segment must be
copied in turn.
Also, much of the DPDK mbuf's information, such as the packet type and
ol_flags, is lost during a copy. That information is important for DPDK
packet processing.
Co-authored-by: Mark Kavanagh
Co-authored-by: Tiago Lam
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
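A minimal stand-alone sketch of the clone described above, assuming
simplified types: 'struct cseg', 'struct pkt', 'struct flat_pkt' and
pkt_clone_flat() are hypothetical and only mirror the idea of copying
each segment in turn and then carrying over the offload metadata. The
sketch frees the partially built clone when the copy fails.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a segment chain and its packet metadata. */
struct cseg {
    const char *data;
    size_t data_len;
    const struct cseg *next;
};

struct pkt {
    const struct cseg *first;   /* Head of the segment chain. */
    size_t pkt_len;             /* Total length claimed for the chain. */
    uint64_t ol_flags;          /* Offload flags to preserve. */
    uint32_t packet_type;       /* Packet type to preserve. */
};

struct flat_pkt {
    char *data;                 /* Contiguous copy of the data. */
    size_t len;
    uint64_t ol_flags;
    uint32_t packet_type;
};

/* Frees a clone returned by pkt_clone_flat(). */
static void
flat_pkt_free(struct flat_pkt *p)
{
    if (p) {
        free(p->data);
        free(p);
    }
}

/* Returns a contiguous copy of 'src' with its metadata, or NULL if the
 * segment lengths do not add up to 'src->pkt_len' or allocation fails. */
static struct flat_pkt *
pkt_clone_flat(const struct pkt *src)
{
    struct flat_pkt *dst = calloc(1, sizeof *dst);
    size_t copied = 0;

    if (!dst) {
        return NULL;
    }
    dst->data = malloc(src->pkt_len ? src->pkt_len : 1);
    if (!dst->data) {
        flat_pkt_free(dst);
        return NULL;
    }

    /* Copy the segments one by one into the contiguous buffer. */
    for (const struct cseg *s = src->first; s; s = s->next) {
        if (copied + s->data_len > src->pkt_len) {
            flat_pkt_free(dst);     /* Do not leak the clone on failure. */
            return NULL;
        }
        memcpy(dst->data + copied, s->data, s->data_len);
        copied += s->data_len;
    }
    if (copied != src->pkt_len) {
        flat_pkt_free(dst);
        return NULL;
    }

    dst->len = copied;
    dst->ol_flags = src->ol_flags;
    dst->packet_type = src->packet_type;
    return dst;
}
```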
lib/dp-packet.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++---------
lib/dp-packet.h | 3 +++
lib/netdev-dpdk.c | 1 +
3 files changed, 62 insertions(+), 11 deletions(-)
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 167bf43..806640b 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -48,6 +48,22 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
dp_packet_set_size(b, 0);
}
+#ifdef DPDK_NETDEV
+void
+dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src)
+{
+ ovs_assert(dst != NULL && src != NULL);
+ struct rte_mbuf *buf_dst = &(dst->mbuf);
+ struct rte_mbuf buf_src = src->mbuf;
+
+ buf_dst->ol_flags = buf_src.ol_flags;
+ buf_dst->packet_type = buf_src.packet_type;
+ buf_dst->tx_offload = buf_src.tx_offload;
+}
+#else
+#define dp_packet_copy_mbuf_flags(arg1, arg2)
+#endif
+
/* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
* memory starting at 'base'. 'base' should be the first byte of a region
* obtained from malloc(). It will be freed (with free()) if 'b' is resized or
@@ -158,6 +174,44 @@ dp_packet_clone(const struct dp_packet *buffer)
return dp_packet_clone_with_headroom(buffer, 0);
}
+#ifdef DPDK_NETDEV
+struct dp_packet *
+dp_packet_clone_with_headroom(const struct dp_packet *b, size_t headroom) {
+ struct dp_packet *new_buffer;
+ uint32_t pkt_len = dp_packet_size(b);
+
+ /* copy multi-seg data */
+ if (b->source == DPBUF_DPDK && !rte_pktmbuf_is_contiguous(&b->mbuf)) {
+ void *dst = NULL;
+ struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+ new_buffer = dp_packet_new_with_headroom(pkt_len, headroom);
+ dst = dp_packet_data(new_buffer);
+ dp_packet_set_size(new_buffer, pkt_len);
+
+ if (!rte_pktmbuf_read(mbuf, 0, pkt_len, dst)) {
+ dp_packet_delete(new_buffer);
+ return NULL;
+ }
+ } else {
+ new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(b),
+ dp_packet_size(b),
+ headroom);
+ }
+
+ /* Copy the following fields into the returned buffer: l2_pad_size,
+ * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+ memcpy(&new_buffer->l2_pad_size, &b->l2_pad_size,
+ sizeof(struct dp_packet) -
+ offsetof(struct dp_packet, l2_pad_size));
+
+ dp_packet_copy_mbuf_flags(new_buffer, b);
+ if (dp_packet_rss_valid(new_buffer)) {
+ new_buffer->mbuf.hash.rss = b->mbuf.hash.rss;
+ }
+
+ return new_buffer;
+}
+#else
/* Creates and returns a new dp_packet whose data are copied from 'buffer'.
* The returned dp_packet will additionally have 'headroom' bytes of
* headroom. */
@@ -165,32 +219,25 @@ struct dp_packet *
dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
{
struct dp_packet *new_buffer;
+ uint32_t pkt_len = dp_packet_size(buffer);
new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
- dp_packet_size(buffer),
- headroom);
+ pkt_len, headroom);
+
/* Copy the following fields into the returned buffer: l2_pad_size,
* l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
sizeof(struct dp_packet) -
offsetof(struct dp_packet, l2_pad_size));
-#ifdef DPDK_NETDEV
- new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
-#else
new_buffer->rss_hash_valid = buffer->rss_hash_valid;
-#endif
-
if (dp_packet_rss_valid(new_buffer)) {
-#ifdef DPDK_NETDEV
- new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
-#else
new_buffer->rss_hash = buffer->rss_hash;
-#endif
}
return new_buffer;
}
+#endif
/* Creates and returns a new dp_packet that initially contains a copy of the
* 'size' bytes of data starting at 'data' with no headroom or tailroom. */
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 3a99044..022e420 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -124,6 +124,9 @@ void dp_packet_init_dpdk(struct dp_packet *);
void dp_packet_init(struct dp_packet *, size_t);
void dp_packet_uninit(struct dp_packet *);
+void dp_packet_copy_mbuf_flags(struct dp_packet *dst,
+ const struct dp_packet *src);
+
struct dp_packet *dp_packet_new(size_t);
struct dp_packet *dp_packet_new_with_headroom(size_t, size_t headroom);
struct dp_packet *dp_packet_clone(const struct dp_packet *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index aee8e20..e005d00 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2364,6 +2364,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
dp_packet_data(packet), size);
dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
+ dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
txcnt++;
}
From patchwork Mon Aug 20 17:44:30 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959864
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:30 +0100
Message-Id: <1534787075-139132-10-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 09/14] dp-packet: Add support for data
"linearization".
Previous commits have added support to the dp_packet API to handle
multi-segmented packets, where data is not stored contiguously in
memory. However, in some cases data must unavoidably be provided
contiguously. Examples of such cases are when performing csums
over the entire packet data, or when write()'ing to a file descriptor
(for a tap interface, for example). For such cases, the dp_packet API
has been extended to provide a way to transform a multi-segmented
DPBUF_DPDK packet into a DPBUF_MALLOC system packet (at the expense of a
copy of memory). If the packet's data is already stored in memory
contiguously then there's no need to convert the packet.
Additionally, the main use cases that were assuming that a dp_packet's
data is always held contiguously in memory were changed to make use of
the new "linear functions" in the dp_packet API when there's a need to
traverse the entire packet's data. Per the examples above, when the
packet's data needs to be written to the tap's file descriptor with
write(), or when the conntrack module needs to verify a packet's
checksum, the data is now linearized.
Two new functions have also been added to the packets module to perform
the checksum over a dp_packet's data (using the existing csum API).
Initially, this is just a way to abstract the data's linearization, but
in the future this could be optimized to perform the checksum over the
multi-segmented packets, without the need to copy.
Signed-off-by: Tiago Lam
---
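The possible future optimization mentioned above, checksumming without
a copy, could look roughly like the sketch below. It is not part of
this patch: 'struct seg' and seg_csum() are hypothetical, and the
function returns the RFC 1071 Internet checksum as a host-order
number rather than using the OVS csum API. A parity flag carries over
segments so that an odd-length segment does not misalign the 16-bit
words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mbuf segment. */
struct seg {
    const uint8_t *data;
    uint16_t data_len;
    const struct seg *next;
};

/* Returns the Internet checksum (RFC 1071) of all data in the chain, as a
 * host-order number, without copying the data into one buffer. */
static uint16_t
seg_csum(const struct seg *s)
{
    uint32_t sum = 0;
    int odd = 0;    /* Nonzero when the next byte is the low half of a word. */

    for (; s; s = s->next) {
        for (size_t i = 0; i < s->data_len; i++) {
            uint32_t byte = s->data[i];

            sum += odd ? byte : byte << 8;
            odd = !odd;
        }
        /* Fold once per segment so that 'sum' cannot overflow. */
        sum = (sum & 0xffff) + (sum >> 16);
    }

    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```

The byte-at-a-time loop favours clarity over speed; a real
implementation would likely process words and only special-case the
segment boundaries.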
lib/bfd.c | 3 +-
lib/conntrack.c | 17 +++++----
lib/dp-packet.c | 18 +++++++++
lib/dp-packet.h | 89 +++++++++++++++++++++++++++++++++++++++----
lib/dpif-netlink.c | 2 +-
lib/dpif.c | 2 +-
lib/netdev-bsd.c | 2 +-
lib/netdev-dummy.c | 5 ++-
lib/netdev-linux.c | 5 ++-
lib/netdev-native-tnl.c | 24 ++++++------
lib/odp-execute.c | 2 +-
lib/ofp-print.c | 2 +-
lib/ovs-lldp.c | 3 +-
lib/packets.c | 20 +++++++++-
lib/packets.h | 3 ++
ofproto/ofproto-dpif-sflow.c | 2 +-
ofproto/ofproto-dpif-upcall.c | 2 +-
ofproto/ofproto-dpif-xlate.c | 12 ++++--
18 files changed, 168 insertions(+), 45 deletions(-)
diff --git a/lib/bfd.c b/lib/bfd.c
index 5308262..d50d2da 100644
--- a/lib/bfd.c
+++ b/lib/bfd.c
@@ -722,7 +722,8 @@ bfd_process_packet(struct bfd *bfd, const struct flow *flow,
if (!msg) {
VLOG_INFO_RL(&rl, "%s: Received too-short BFD control message (only "
"%"PRIdPTR" bytes long, at least %d required).",
- bfd->name, (uint8_t *) dp_packet_tail(p) - l7,
+ bfd->name, dp_packet_size(p) -
+ (l7 - (uint8_t *) dp_packet_data(p)),
BFD_PACKET_LEN);
goto out;
}
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 974f985..15d1ed2 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -636,6 +636,8 @@ reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn)
static void
reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
{
+ void *l3 = dp_packet_linear_ofs(pkt, pkt->l3_ofs);
+ void *l4 = dp_packet_linear_ofs(pkt, pkt->l4_ofs);
char *tail = dp_packet_tail(pkt);
char pad = dp_packet_l2_pad_size(pkt);
struct conn_key inner_key;
@@ -644,8 +646,8 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
uint16_t orig_l4_ofs = pkt->l4_ofs;
if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
- struct ip_header *nh = dp_packet_l3(pkt);
- struct icmp_header *icmp = dp_packet_l4(pkt);
+ struct ip_header *nh = l3;
+ struct icmp_header *icmp = l4;
struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1);
extract_l3_ipv4(&inner_key, inner_l3, tail - ((char *)inner_l3) - pad,
&inner_l4, false);
@@ -664,8 +666,8 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn)
icmp->icmp_csum = 0;
icmp->icmp_csum = csum(icmp, tail - (char *) icmp - pad);
} else {
- struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt);
- struct icmp6_error_header *icmp6 = dp_packet_l4(pkt);
+ struct ovs_16aligned_ip6_hdr *nh6 = l3;
+ struct icmp6_error_header *icmp6 = l4;
struct ovs_16aligned_ip6_hdr *inner_l3_6 =
(struct ovs_16aligned_ip6_hdr *) (icmp6 + 1);
extract_l3_ipv6(&inner_key, inner_l3_6,
@@ -1320,6 +1322,7 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
write_ct_md(packet, zone, NULL, NULL, NULL);
continue;
}
+
process_one(ct, packet, &ctx, zone, force, commit, now, setmark,
setlabel, nat_action_info, tp_src, tp_dst, helper);
}
@@ -1902,8 +1905,8 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type,
struct conn_lookup_ctx *ctx, uint16_t zone)
{
const struct eth_header *l2 = dp_packet_eth(pkt);
- const struct ip_header *l3 = dp_packet_l3(pkt);
- const char *l4 = dp_packet_l4(pkt);
+ const struct ip_header *l3 = dp_packet_linear_ofs(pkt, pkt->l3_ofs);
+ const char *l4 = dp_packet_linear_ofs(pkt, pkt->l4_ofs);
memset(ctx, 0, sizeof *ctx);
@@ -3167,7 +3170,7 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx,
const struct conn *conn_for_expectation,
long long now, enum ftp_ctl_pkt ftp_ctl, bool nat)
{
- struct ip_header *l3_hdr = dp_packet_l3(pkt);
+ struct ip_header *l3_hdr = dp_packet_linear_ofs(pkt, pkt->l3_ofs);
ovs_be32 v4_addr_rep = 0;
struct ct_addr v6_addr_rep;
size_t addr_offset_from_ftp_data_start;
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 806640b..b8f5242 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -118,6 +118,9 @@ void
dp_packet_init_dpdk(struct dp_packet *b)
{
b->source = DPBUF_DPDK;
+#ifdef DPDK_NETDEV
+ b->mstate = NULL;
+#endif
}
/* Initializes 'b' as an empty dp_packet with an initial capacity of 'size'
@@ -135,6 +138,21 @@ dp_packet_uninit(struct dp_packet *b)
if (b) {
if (b->source == DPBUF_MALLOC) {
free(dp_packet_base(b));
+
+#ifdef DPDK_NETDEV
+ /* Packet has been "linearized" */
+ if (b->mstate) {
+ b->source = DPBUF_DPDK;
+ b->mbuf.buf_addr = b->mstate->addr;
+ b->mbuf.buf_len = b->mstate->len;
+ b->mbuf.data_off = b->mstate->off;
+
+ free(b->mstate);
+ b->mstate = NULL;
+
+ free_dpdk_buf((struct dp_packet *) b);
+ }
+#endif
} else if (b->source == DPBUF_DPDK) {
#ifdef DPDK_NETDEV
/* If this dp_packet was allocated by DPDK it must have been
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 022e420..7f7b5f5 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -46,6 +46,16 @@ enum OVS_PACKED_ENUM dp_packet_source {
#define DP_PACKET_CONTEXT_SIZE 64
+#ifdef DPDK_NETDEV
+/* Struct to save data for when a DPBUF_DPDK packet is converted to
+ * DPBUF_MALLOC. */
+struct mbuf_state {
+ void *addr;
+ uint16_t len;
+ uint16_t off;
+};
+#endif
+
/* Buffer for holding packet data. A dp_packet is automatically reallocated
* as necessary if it grows too large for the available memory.
* By default the packet type is set to Ethernet (PT_ETH).
@@ -53,6 +63,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
struct dp_packet {
#ifdef DPDK_NETDEV
struct rte_mbuf mbuf; /* DPDK mbuf */
+ struct mbuf_state *mstate; /* Used when packet has been "linearized" */
#else
void *base_; /* First byte of allocated space. */
uint16_t allocated_; /* Number of bytes allocated. */
@@ -86,6 +97,7 @@ struct dp_packet {
#endif
static inline void *dp_packet_data(const struct dp_packet *);
+static inline void *dp_packet_linear_data(const struct dp_packet *b);
static inline void dp_packet_set_data(struct dp_packet *, void *);
static inline void *dp_packet_base(const struct dp_packet *);
static inline void dp_packet_set_base(struct dp_packet *, void *);
@@ -139,6 +151,8 @@ static inline void dp_packet_delete(struct dp_packet *);
static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
size_t size);
+static inline void *dp_packet_linear_ofs(const struct dp_packet *b,
+ uint16_t ofs);
static inline void *dp_packet_at_assert(const struct dp_packet *,
size_t offset, size_t size);
#ifdef DPDK_NETDEV
@@ -181,15 +195,11 @@ static inline void
dp_packet_delete(struct dp_packet *b)
{
if (b) {
- if (b->source == DPBUF_DPDK) {
- /* If this dp_packet was allocated by DPDK it must have been
- * created as a dp_packet */
- free_dpdk_buf((struct dp_packet*) b);
- return;
- }
-
dp_packet_uninit(b);
- free(b);
+
+ if (b->source != DPBUF_DPDK) {
+ free(b);
+ }
}
}
@@ -747,6 +757,68 @@ dp_packet_data(const struct dp_packet *b)
? (char *) dp_packet_base(b) + __packet_data(b) : NULL;
}
+/* Linearizes the data held by 'b', if and only if its content is
+ * non-contiguous, and returns a pointer to the byte 'ofs' within linearized
+ * 'b', if 'ofs' has been set (!= UINT16_MAX). Otherwise, returns a null
+ * pointer. */
+static inline void *
+dp_packet_linear_ofs(const struct dp_packet *b, uint16_t ofs)
+{
+ /* "Linearize" the data in the packet, iff needed */
+ dp_packet_linear_data(b);
+
+ return ofs != UINT16_MAX
+ ? (char *) dp_packet_data(b) + ofs
+ : NULL;
+}
+
+/* Copies the content of the DPDK packet 'b', if and only if its content is
+ * distributed amongst multiple segments, into system's memory, so that data is
+ * stored linearly. A pointer to the newly allocated (copied) data is returned.
+ *
+ * This is an expensive operation which should only be performed as a last
+ * resort, when multi-segments are under use but data must be accessed
+ * linearly. Otherwise dp_packet_data() should be used instead. */
+static inline void *
+dp_packet_linear_data(const struct dp_packet *b)
+{
+ if (b->source == DPBUF_DPDK) {
+#ifdef DPDK_NETDEV
+ if (!rte_pktmbuf_is_contiguous(&b->mbuf)) {
+ struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+ struct dp_packet *pkt = CONST_CAST(struct dp_packet *, b);
+ uint32_t pkt_len = dp_packet_size(pkt);
+ struct mbuf_state *mstate = NULL;
+ void *dst = xmalloc(pkt_len);
+
+ /* Copy packet's data to system's memory */
+ if (!rte_pktmbuf_read(mbuf, 0, pkt_len, dst)) {
+ return NULL;
+ }
+
+ /* Free all mbufs except for the first */
+ dp_packet_clear(pkt);
+
+ /* Save mbuf's buf_addr to restore later */
+ mstate = xmalloc(sizeof(*mstate));
+ mstate->addr = pkt->mbuf.buf_addr;
+ mstate->len = pkt->mbuf.buf_len;
+ mstate->off = pkt->mbuf.data_off;
+ pkt->mstate = mstate;
+
+ /* Transform DPBUF_DPDK packet into a DPBUF_MALLOC packet */
+ pkt->source = DPBUF_MALLOC;
+ pkt->mbuf.buf_addr = dst;
+ pkt->mbuf.buf_len = pkt_len;
+ pkt->mbuf.data_off = 0;
+ dp_packet_set_size(pkt, pkt_len);
+ }
+#endif
+ }
+
+ return dp_packet_data(b);
+}
+
static inline void
dp_packet_set_data(struct dp_packet *b, void *data)
{
@@ -825,6 +897,7 @@ dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
mbuf->ol_flags = mbuf->tx_offload = mbuf->packet_type = 0;
mbuf->nb_segs = 1;
mbuf->next = NULL;
+ p->mstate = NULL;
#endif
}
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index e6d5a6e..14e6b3e 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -1875,7 +1875,7 @@ dpif_netlink_encode_execute(int dp_ifindex, const struct dpif_execute *d_exec,
k_exec->dp_ifindex = dp_ifindex;
nl_msg_put_unspec(buf, OVS_PACKET_ATTR_PACKET,
- dp_packet_data(d_exec->packet),
+ dp_packet_linear_data(d_exec->packet),
dp_packet_size(d_exec->packet));
key_ofs = nl_msg_start_nested(buf, OVS_PACKET_ATTR_KEY);
diff --git a/lib/dpif.c b/lib/dpif.c
index d799f97..9cd0d07 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1830,7 +1830,7 @@ log_execute_message(const struct dpif *dpif,
uint64_t stub[1024 / 8];
struct ofpbuf md = OFPBUF_STUB_INITIALIZER(stub);
- packet = ofp_packet_to_string(dp_packet_data(execute->packet),
+ packet = ofp_packet_to_string(dp_packet_linear_data(execute->packet),
dp_packet_size(execute->packet),
execute->packet->packet_type);
odp_key_from_dp_packet(&md, execute->packet);
diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c
index a153aa2..71dc87f 100644
--- a/lib/netdev-bsd.c
+++ b/lib/netdev-bsd.c
@@ -701,7 +701,7 @@ netdev_bsd_send(struct netdev *netdev_, int qid OVS_UNUSED,
}
for (i = 0; i < batch->count; i++) {
- const void *data = dp_packet_data(batch->packets[i]);
+ const void *data = dp_packet_linear_data(batch->packets[i]);
size_t size = dp_packet_size(batch->packets[i]);
while (!error) {
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index d498467..eef169c 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -233,7 +233,8 @@ dummy_packet_stream_run(struct netdev_dummy *dev, struct dummy_packet_stream *s)
ASSIGN_CONTAINER(txbuf_node, ovs_list_front(&s->txq), list_node);
txbuf = txbuf_node->pkt;
- retval = stream_send(s->stream, dp_packet_data(txbuf), dp_packet_size(txbuf));
+ retval = stream_send(s->stream, dp_packet_linear_data(txbuf),
+ dp_packet_size(txbuf));
if (retval > 0) {
dp_packet_pull(txbuf, retval);
@@ -1088,7 +1089,7 @@ netdev_dummy_send(struct netdev *netdev, int qid OVS_UNUSED,
struct dp_packet *packet;
DP_PACKET_BATCH_FOR_EACH(i, packet, batch) {
- const void *buffer = dp_packet_data(packet);
+ const void *buffer = dp_packet_linear_data(packet);
size_t size = dp_packet_size(packet);
if (batch->packets[i]->packet_type != htonl(PT_ETH)) {
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 0c42268..e490ed9 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1380,7 +1380,7 @@ netdev_linux_sock_batch_send(int sock, int ifindex,
struct dp_packet *packet;
DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
- iov[i].iov_base = dp_packet_data(packet);
+ iov[i].iov_base = dp_packet_linear_data(packet);
iov[i].iov_len = dp_packet_size(packet);
mmsg[i].msg_hdr = (struct msghdr) { .msg_name = &sll,
.msg_namelen = sizeof sll,
@@ -1434,7 +1434,8 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,
int error;
do {
- retval = write(netdev->tap_fd, dp_packet_data(packet), size);
+ retval = write(netdev->tap_fd, dp_packet_linear_data(packet),
+ size);
error = retval < 0 ? errno : 0;
} while (error == EINTR);
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index 56baaa2..bb2fca1 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -87,7 +87,8 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
ovs_be32 ip_src, ip_dst;
if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
- if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
+ if (packet_csum(packet, packet->l3_ofs,
+ IP_IHL(ip->ip_ihl_ver) * 4)) {
VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
return NULL;
}
@@ -196,11 +197,10 @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
csum = packet_csum_pseudoheader(dp_packet_l3(packet));
}
- csum = csum_continue(csum, udp, dp_packet_size(packet) -
- ((const unsigned char *)udp -
- (const unsigned char *)dp_packet_eth(packet)
- ));
- if (csum_finish(csum)) {
+ size_t csize = dp_packet_size(packet) -
+ ((const unsigned char *)udp -
+ (const unsigned char *)dp_packet_eth(packet));
+ if (packet_csum_partial(packet, csum, packet->l4_ofs, csize)) {
return NULL;
}
}
@@ -236,8 +236,8 @@ netdev_tnl_push_udp_header(const struct netdev *netdev OVS_UNUSED,
csum = packet_csum_pseudoheader(netdev_tnl_ip_hdr(dp_packet_data(packet)));
}
- csum = csum_continue(csum, udp, ip_tot_size);
- udp->udp_csum = csum_finish(csum);
+ udp->udp_csum = packet_csum_partial(packet, csum, packet->l4_ofs,
+ ip_tot_size);
if (!udp->udp_csum) {
udp->udp_csum = htons(0xffff);
@@ -372,10 +372,10 @@ parse_gre_header(struct dp_packet *packet,
options = (ovs_16aligned_be32 *)(greh + 1);
if (greh->flags & htons(GRE_CSUM)) {
ovs_be16 pkt_csum;
+ size_t csize = dp_packet_size(packet) - ((const unsigned char *)greh -
+ (const unsigned char *)dp_packet_eth(packet));
- pkt_csum = csum(greh, dp_packet_size(packet) -
- ((const unsigned char *)greh -
- (const unsigned char *)dp_packet_eth(packet)));
+ pkt_csum = packet_csum(packet, packet->l4_ofs, csize);
if (pkt_csum) {
return -EINVAL;
}
@@ -449,7 +449,7 @@ netdev_gre_push_header(const struct netdev *netdev,
if (greh->flags & htons(GRE_CSUM)) {
ovs_be16 *csum_opt = (ovs_be16 *) (greh + 1);
- *csum_opt = csum(greh, ip_tot_size);
+ *csum_opt = packet_csum(packet, packet->l4_ofs, ip_tot_size);
}
if (greh->flags & htons(GRE_SEQ)) {
diff --git a/lib/odp-execute.c b/lib/odp-execute.c
index 5831d1f..e4d2604 100644
--- a/lib/odp-execute.c
+++ b/lib/odp-execute.c
@@ -231,7 +231,7 @@ static void
odp_set_nd(struct dp_packet *packet, const struct ovs_key_nd *key,
const struct ovs_key_nd *mask)
{
- const struct ovs_nd_msg *ns = dp_packet_l4(packet);
+ const struct ovs_nd_msg *ns = dp_packet_linear_ofs(packet, packet->l4_ofs);
const struct ovs_nd_lla_opt *lla_opt = dp_packet_get_nd_payload(packet);
if (OVS_LIKELY(ns && lla_opt)) {
diff --git a/lib/ofp-print.c b/lib/ofp-print.c
index cf93d2e..459e59f 100644
--- a/lib/ofp-print.c
+++ b/lib/ofp-print.c
@@ -111,7 +111,7 @@ ofp_packet_to_string(const void *data, size_t len, ovs_be32 packet_type)
char *
ofp_dp_packet_to_string(const struct dp_packet *packet)
{
- return ofp_packet_to_string(dp_packet_data(packet),
+ return ofp_packet_to_string(dp_packet_linear_data(packet),
dp_packet_size(packet),
packet->packet_type);
}
diff --git a/lib/ovs-lldp.c b/lib/ovs-lldp.c
index 05c1dd4..21605c6 100644
--- a/lib/ovs-lldp.c
+++ b/lib/ovs-lldp.c
@@ -668,7 +668,8 @@ lldp_process_packet(struct lldp *lldp, const struct dp_packet *p)
{
if (lldp) {
lldpd_recv(lldp->lldpd, lldpd_first_hardware(lldp->lldpd),
- (char *) dp_packet_data(p), dp_packet_size(p));
+ (char *) dp_packet_linear_data(p),
+ dp_packet_size(p));
}
}
diff --git a/lib/packets.c b/lib/packets.c
index 38bfb60..cb3ac30 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1180,7 +1180,8 @@ packet_set_ipv6(struct dp_packet *packet, const struct in6_addr *src,
const struct in6_addr *dst, uint8_t key_tc, ovs_be32 key_fl,
uint8_t key_hl)
{
- struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
+ struct ovs_16aligned_ip6_hdr *nh = dp_packet_linear_ofs(packet,
+ packet->l3_ofs);
uint8_t proto = 0;
bool rh_present;
@@ -1645,6 +1646,23 @@ packet_csum_pseudoheader(const struct ip_header *ip)
return partial;
}
+uint32_t
+packet_csum_partial(const struct dp_packet *b, uint32_t partial,
+ uint16_t data_ofs, size_t len)
+{
+ void *data = dp_packet_linear_ofs(b, data_ofs);
+
+ uint32_t csum = csum_continue(partial, data, len);
+
+ return csum_finish(csum);
+}
+
+uint32_t
+packet_csum(const struct dp_packet *b, uint16_t data_ofs, size_t len)
+{
+ return packet_csum_partial(b, 0, data_ofs, len);
+}
+
#ifndef __CHECKER__
uint32_t
packet_csum_pseudoheader6(const struct ovs_16aligned_ip6_hdr *ip6)
diff --git a/lib/packets.h b/lib/packets.h
index 7645a9d..f2fcfdc 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -1552,6 +1552,9 @@ void packet_put_ra_prefix_opt(struct dp_packet *,
ovs_be32 preferred_lifetime,
const ovs_be128 router_prefix);
uint32_t packet_csum_pseudoheader(const struct ip_header *);
+uint32_t packet_csum_partial(const struct dp_packet *b, uint32_t partial,
+ uint16_t data_ofs, size_t len);
+uint32_t packet_csum(const struct dp_packet *b, uint16_t data_ofs, size_t len);
void IP_ECN_set_ce(struct dp_packet *pkt, bool is_ipv6);
#define DNS_HEADER_LEN 12
diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
index d17d7a8..3167ee4 100644
--- a/ofproto/ofproto-dpif-sflow.c
+++ b/ofproto/ofproto-dpif-sflow.c
@@ -1319,7 +1319,7 @@ dpif_sflow_received(struct dpif_sflow *ds, const struct dp_packet *packet,
header->stripped = 4;
header->header_length = MIN(dp_packet_size(packet),
sampler->sFlowFsMaximumHeaderSize);
- header->header_bytes = dp_packet_data(packet);
+ header->header_bytes = dp_packet_linear_data(packet);
/* Add extended switch element. */
memset(&switchElem, 0, sizeof(switchElem));
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 6222207..ec55e69 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1455,7 +1455,7 @@ process_upcall(struct udpif *udpif, struct upcall *upcall,
.pin = {
.up = {
.base = {
- .packet = xmemdup(dp_packet_data(packet),
+ .packet = xmemdup(dp_packet_linear_data(packet),
dp_packet_size(packet)),
.packet_len = dp_packet_size(packet),
.reason = cookie->controller.reason,
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index e26f6c8..a9d2547 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1735,7 +1735,8 @@ stp_process_packet(const struct xport *xport, const struct dp_packet *packet)
}
if (dp_packet_try_pull(&payload, ETH_HEADER_LEN + LLC_HEADER_LEN)) {
- stp_received_bpdu(sp, dp_packet_data(&payload), dp_packet_size(&payload));
+ stp_received_bpdu(sp, dp_packet_linear_data(&payload),
+ dp_packet_size(&payload));
}
}
@@ -1786,7 +1787,8 @@ rstp_process_packet(const struct xport *xport, const struct dp_packet *packet)
}
if (dp_packet_try_pull(&payload, ETH_HEADER_LEN + LLC_HEADER_LEN)) {
- rstp_port_received_bpdu(xport->rstp_port, dp_packet_data(&payload),
+ rstp_port_received_bpdu(xport->rstp_port,
+ dp_packet_linear_data(&payload),
dp_packet_size(&payload));
}
}
@@ -2559,7 +2561,8 @@ update_mcast_snooping_table4__(const struct xlate_ctx *ctx,
size_t offset;
ovs_be32 ip4 = flow->igmp_group_ip4;
- offset = (char *) dp_packet_l4(packet) - (char *) dp_packet_data(packet);
+ offset = (char *) dp_packet_linear_ofs(packet, packet->l4_ofs) -
+ (char *) dp_packet_linear_data(packet);
igmp = dp_packet_at(packet, offset, IGMP_HEADER_LEN);
if (!igmp || csum(igmp, dp_packet_l4_size(packet)) != 0) {
xlate_report_debug(ctx, OFT_DETAIL,
@@ -2618,7 +2621,8 @@ update_mcast_snooping_table6__(const struct xlate_ctx *ctx,
int count;
size_t offset;
- offset = (char *) dp_packet_l4(packet) - (char *) dp_packet_data(packet);
+ offset = (char *) dp_packet_linear_ofs(packet, packet->l4_ofs) -
+ (char *) dp_packet_linear_data(packet);
mld = dp_packet_at(packet, offset, MLD_HEADER_LEN);
if (!mld ||
From patchwork Mon Aug 20 17:44:31 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959859
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:31 +0100
Message-Id: <1534787075-139132-11-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
Cc: Michael Qiu , i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 10/14] netdev-dpdk: copy large packet to
multi-seg. mbufs
From: Mark Kavanagh
Currently, packets are only copied to a single segment in the function
dpdk_do_tx_copy(). This could be an issue in the case of jumbo frames,
particularly when multi-segment mbufs are involved.
This patch calculates the number of segments needed by a packet and
copies the data to each segment.
A new function, dpdk_buf_alloc(), has also been introduced as a wrapper
around the nonpmd_mp_mutex to serialise allocations from a non-pmd
context.
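The segment calculation and copy described above can be sketched in
isolation as follows. This is an illustrative example only, not the code in
the patch: struct out_seg, SEG_CAP, segs_needed() and copy_to_segs() are
hypothetical names, and the tiny fixed capacity merely stands in for the
data room of a real mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity segment; real mbufs carry far more state. */
#define SEG_CAP 4

struct out_seg {
    uint8_t data[SEG_CAP];
    size_t len;             /* Number of bytes used in 'data'. */
};

/* Number of segments of 'seg_cap' bytes needed to hold 'size' bytes, that
 * is, size / seg_cap rounded up. */
static size_t
segs_needed(size_t size, size_t seg_cap)
{
    assert(seg_cap > 0);
    return size / seg_cap + (size % seg_cap ? 1 : 0);
}

/* Spreads 'size' bytes from 'src' over the first segments of 'segs'.
 * Returns the number of segments used, which is 0 if 'size' is 0 or if
 * 'n_segs' is too small to hold the data. */
static size_t
copy_to_segs(struct out_seg *segs, size_t n_segs,
             const uint8_t *src, size_t size)
{
    size_t need = segs_needed(size, SEG_CAP);
    size_t i;

    if (need > n_segs) {
        return 0;
    }
    for (i = 0; i < need; i++) {
        size_t chunk = size < SEG_CAP ? size : SEG_CAP;

        memcpy(segs[i].data, src, chunk);
        segs[i].len = chunk;
        src += chunk;
        size -= chunk;
    }
    return need;
}
```

Only the last segment can be partially filled, so the per-segment lengths
always add up to the original packet size.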
Co-authored-by: Michael Qiu
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Michael Qiu
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
lib/netdev-dpdk.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 75 insertions(+), 9 deletions(-)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e005d00..e2df825 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -552,6 +552,27 @@ dpdk_rte_mzalloc(size_t sz)
return rte_zmalloc(OVS_VPORT_DPDK, sz, OVS_CACHE_LINE_SIZE);
}
+static struct rte_mbuf *
+dpdk_buf_alloc(struct rte_mempool *mp)
+{
+ struct rte_mbuf *mbuf = NULL;
+
+ /* If non-pmd we need to lock on nonpmd_mp_mutex mutex */
+ if (!dpdk_thread_is_pmd()) {
+ ovs_mutex_lock(&nonpmd_mp_mutex);
+
+ mbuf = rte_pktmbuf_alloc(mp);
+
+ ovs_mutex_unlock(&nonpmd_mp_mutex);
+
+ return mbuf;
+ }
+
+ mbuf = rte_pktmbuf_alloc(mp);
+
+ return mbuf;
+}
+
void
free_dpdk_buf(struct dp_packet *packet)
{
@@ -2316,6 +2337,49 @@ out:
}
}
+static int
+dpdk_copy_dp_packet_to_mbuf(struct dp_packet *packet, struct rte_mbuf **head,
+ struct rte_mempool *mp)
+{
+ struct rte_mbuf *mbuf, *fmbuf;
+ uint32_t size = dp_packet_size(packet);
+ uint16_t max_data_len;
+ uint32_t nb_segs = 0;
+
+ /* Allocate first mbuf to know the size of data available */
+ fmbuf = mbuf = *head = dpdk_buf_alloc(mp);
+ if (OVS_UNLIKELY(!mbuf)) {
+ return ENOMEM;
+ }
+
+ /* All newly allocated mbufs have the same max data len */
+ max_data_len = mbuf->buf_len - mbuf->data_off;
+
+ /* Calculate # of output mbufs. */
+ nb_segs = size / max_data_len;
+ if (size % max_data_len) {
+ nb_segs = nb_segs + 1;
+ }
+
+ /* Allocate additional mbufs, less the one already allocated above */
+ for (int i = 1; i < nb_segs; i++) {
+ mbuf->next = dpdk_buf_alloc(mp);
+ if (!mbuf->next) {
+ free_dpdk_buf(CONTAINER_OF(fmbuf, struct dp_packet, mbuf));
+ fmbuf = NULL;
+ return ENOMEM;
+ }
+ mbuf = mbuf->next;
+ }
+
+ fmbuf->nb_segs = nb_segs;
+ fmbuf->pkt_len = size;
+
+ dp_packet_mbuf_write(fmbuf, 0, size, dp_packet_data(packet));
+
+ return 0;
+}
+
/* Tx function. Transmit packets indefinitely */
static void
dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
@@ -2332,6 +2396,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
uint32_t cnt = batch_cnt;
uint32_t dropped = 0;
+ uint32_t i;
if (dev->type != DPDK_DEV_VHOST) {
/* Check if QoS has been configured for this netdev. */
@@ -2342,28 +2407,29 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
uint32_t txcnt = 0;
- for (uint32_t i = 0; i < cnt; i++) {
+ for (i = 0; i < cnt; i++) {
struct dp_packet *packet = batch->packets[i];
uint32_t size = dp_packet_size(packet);
+ int err = 0;
if (OVS_UNLIKELY(size > dev->max_packet_len)) {
VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
size, dev->max_packet_len);
-
dropped++;
continue;
}
- pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
- if (OVS_UNLIKELY(!pkts[txcnt])) {
+ err = dpdk_copy_dp_packet_to_mbuf(packet, &pkts[txcnt],
+ dev->dpdk_mp->mp);
+ if (err != 0) {
+ if (err == ENOMEM) {
+ VLOG_ERR_RL(&rl, "Failed to alloc mbufs! %u packets dropped",
+ cnt - i);
+ }
+
dropped += cnt - i;
break;
}
-
- /* We have to do a copy for now */
- memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
- dp_packet_data(packet), size);
- dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
txcnt++;
From patchwork Mon Aug 20 17:44:32 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959866
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:32 +0100
Message-Id: <1534787075-139132-12-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 11/14] netdev-dpdk: support multi-segment jumbo
frames
From: Mark Kavanagh
Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.
With the multi-segment approach, a port uses a single mempool
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.
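To make the chaining concrete, the following standalone sketch shows how a
byte at a given packet offset can be located by walking a chain of segments.
It is an assumption-laden illustration: struct chain_seg and chain_at() are
hypothetical names that only mimic the 'next' linkage of real mbufs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment in a chain of buffers. */
struct chain_seg {
    const uint8_t *data;            /* Bytes stored in this segment. */
    size_t len;                     /* Number of valid bytes in 'data'. */
    const struct chain_seg *next;   /* Next segment, or NULL if last. */
};

/* Returns a pointer to the byte at packet offset 'ofs' within the chain
 * starting at 's', or NULL if 'ofs' lies past the end of the chain. */
static const uint8_t *
chain_at(const struct chain_seg *s, size_t ofs)
{
    for (; s; s = s->next) {
        if (ofs < s->len) {
            return s->data + ofs;
        }
        ofs -= s->len;   /* Skip this segment and keep looking. */
    }
    return NULL;
}
```

A pointer returned this way is only safe to use up to the end of its own
segment, which is why code that expects contiguous data cannot operate
directly on a multi-segment packet.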
== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.
Setting the field is identical to setting existing DPDK-specific OVSDB
fields:
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
Documentation/topics/dpdk/jumbo-frames.rst | 67 ++++++++++++++++++++++++++++++
Documentation/topics/dpdk/memory.rst | 36 ++++++++++++++++
NEWS | 1 +
lib/dpdk.c | 8 ++++
lib/netdev-dpdk.c | 66 +++++++++++++++++++++++++----
lib/netdev-dpdk.h | 2 +
vswitchd/vswitch.xml | 22 ++++++++++
7 files changed, 194 insertions(+), 8 deletions(-)
diff --git a/Documentation/topics/dpdk/jumbo-frames.rst b/Documentation/topics/dpdk/jumbo-frames.rst
index 00360b4..07bf3ca 100644
--- a/Documentation/topics/dpdk/jumbo-frames.rst
+++ b/Documentation/topics/dpdk/jumbo-frames.rst
@@ -71,3 +71,70 @@ Jumbo frame support has been validated against 9728B frames, which is the
largest frame size supported by Fortville NIC using the DPDK i40e driver, but
larger frames and other DPDK NIC drivers may be supported. These cases are
common for use cases involving East-West traffic only.
+
+-------------------
+Multi-segment mbufs
+-------------------
+
+Instead of increasing the size of mbufs within a mempool, such that each mbuf
+within the pool is large enough to contain an entire jumbo frame of a
+user-defined size, mbufs can be chained together. In this approach each
+mbuf in the chain stores a portion of the jumbo frame, by default ~2K bytes,
+irrespective of the user-requested MTU value. Since each mbuf in the chain is
+termed a segment, this approach is named "multi-segment mbufs".
+
+This approach may bring more flexibility in use cases where the maximum packet
+length may be hard to guess. For example, in cases where packets originate from
+sources marked for offload (such as TSO), each packet may be larger than the
+MTU, and as such, when forwarding it to a DPDK port a single mbuf may not be
+enough to hold all of the packet's data.
+
+Multi-segment and single-segment mbufs are mutually exclusive, and the user
+must decide which approach to adopt on initialisation. To enable
+multi-segment mbufs, use the following command::
+
+ $ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
+
+Single-segment mbufs still remain the default when using OvS-DPDK, and the
+above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if
+multi-segment mbufs are to be used.
+
+~~~~~~~~~~~~~~~~~
+Performance notes
+~~~~~~~~~~~~~~~~~
+
+When using multi-segment mbufs some PMDs may not support vectorized Tx
+functions, due to their non-contiguous nature. As a result this can hit
+performance for smaller packet sizes. For example, on a setup sending 64B
+packets at line rate, a decrease of ~20% has been observed. The performance
+impact stops being noticeable for larger packet sizes, although the exact size
+will vary between PMDs, and depending on the architecture in use.
+
+Tests performed with the i40e PMD only showed this limitation for 64B
+packets, and the same rate was observed when comparing multi-segment mbufs and
+single-segment mbufs for 128B packets. In other words, the 20% drop in
+performance was not observed for packets >= 128B during this test case.
+
+Because of this, using multi-segment mbufs is not advised for smaller
+packet sizes, such as 64B.
+
+Also, note that using multi-segment mbufs won't improve memory usage. For a
+packet of 9000B, for example, which would be stored on a single mbuf when using
+the single-segment approach, 5 mbufs (9000/2176) of 2176B would be needed to
+store the same data using the multi-segment mbufs approach (refer to
+:doc:`/topics/dpdk/memory` for examples).
+
+~~~~~~~~~~~
+Limitations
+~~~~~~~~~~~
+
+Because multi-segment mbufs store the data non-contiguously in memory, a
+performance drop is expected when they are used across DPDK and non-DPDK
+ports, as the mbufs' content needs to be copied into a contiguous region in
+memory to be used by operations such as write(). Exchanging traffic between
+DPDK ports (such as vhost and physical ports) doesn't have this limitation.
+
+Other operations may have a hit in performance as well, under the current
+implementation. For example, operations that require a checksum to be performed
+on the data, such as pushing / popping a VXLAN header, will also require a copy
+of the data (if it hasn't been copied before).
diff --git a/Documentation/topics/dpdk/memory.rst b/Documentation/topics/dpdk/memory.rst
index e5fb166..d8a952a 100644
--- a/Documentation/topics/dpdk/memory.rst
+++ b/Documentation/topics/dpdk/memory.rst
@@ -82,6 +82,14 @@ Users should be aware of the following:
Below are a number of examples of memory requirement calculations for both
shared and per port memory models.
+.. note::
+
+ If multi-segment mbufs are enabled (:doc:`/topics/dpdk/jumbo-frames`), both
+ the **number of mbufs** and the **size of each mbuf** might be adjusted,
+ which might slightly change the amount of memory required for a given
+ mempool. Examples of how these calculations are performed are also provided
+ below, for the higher MTU case of each memory model.
+
Shared Memory Calculations
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -142,6 +150,20 @@ Example 4
Mbuf size = 10176 Bytes
Memory required = 262144 * 10176 = 2667 MB
+Example 5 (multi-segment mbufs enabled)
++++++++++++++++++++++++++++++++++++++++
+::
+
+ MTU = 9000 Bytes
+ Number of mbufs = 262144
+ Mbuf size = 2176 Bytes
+ Memory required = 262144 * (2176 * 5) = 2852 MB
+
+.. note::
+
+ In order to hold 9000B of data, 5 mbufs of 2176B each will be needed, hence
+ the "5" above in 2176 * 5.
+
Per Port Memory Calculations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -214,3 +236,17 @@ Example 3: (2 rxq, 2 PMD, 9000 MTU)
Number of mbufs = (2 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 26656
Mbuf size = 10176 Bytes
Memory required = 26656 * 10176 = 271 MB
+
+Example 4: (2 rxq, 2 PMD, 9000 MTU, multi-segment mbufs enabled)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+::
+
+ MTU = 9000
+ Number of mbufs = (2 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 26656
+ Mbuf size = 2176 Bytes
+ Memory required = 26656 * (2176 * 5) = 290 MB
+
+.. note::
+
+ In order to hold 9000B of data, 5 mbufs of 2176B each will be needed, hence
+ the "5" above in 2176 * 5.
diff --git a/NEWS b/NEWS
index 8987f9a..0d3b6c7 100644
--- a/NEWS
+++ b/NEWS
@@ -51,6 +51,7 @@ v2.10.0 - xx xxx xxxx
* Allow init to fail and record DPDK status/version in OVS database.
* Add experimental flow hardware offload support
* Support both shared and per port mempools for DPDK devices.
+ * Add support for multi-segment mbufs.
- Userspace datapath:
* Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD
* Detailed PMD performance metrics available with new command
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 0ee3e19..ac89fd8 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -497,6 +497,14 @@ dpdk_init__(const struct smap *ovs_other_config)
/* Finally, register the dpdk classes */
netdev_dpdk_register();
+
+ bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+ "dpdk-multi-seg-mbufs", false);
+ if (multi_seg_mbufs_enable) {
+ VLOG_INFO("DPDK multi-segment mbufs enabled");
+ netdev_dpdk_multi_segment_mbufs_enable();
+ }
+
return true;
}
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e2df825..12c27b4 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -70,6 +70,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
#define DPDK_PORT_WATCHDOG_INTERVAL 5
@@ -521,6 +522,18 @@ is_dpdk_class(const struct netdev_class *class)
|| class->destruct == netdev_dpdk_vhost_destruct;
}
+bool
+netdev_dpdk_is_multi_segment_mbufs_enabled(void)
+{
+ return dpdk_multi_segment_mbufs;
+}
+
+void
+netdev_dpdk_multi_segment_mbufs_enable(void)
+{
+ dpdk_multi_segment_mbufs = true;
+}
+
/* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
* aligned at 1k or less. If a declared mbuf size is not a multiple of this
* value, insufficient buffers are allocated to accomodate the packet in its
@@ -636,14 +649,17 @@ dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
}
}
-/* Calculating the required number of mbufs differs depending on the
- * mempool model being used. Check if per port memory is in use before
- * calculating.
- */
+/* Calculating the required number of mbufs differs depending on the mempool
+ * model (per port vs shared mempools) being used.
+ * In case multi-segment mbufs are being used, the number of mbufs is also
+ * increased, to account for the multiple mbufs needed to hold each packet's
+ * data. */
static uint32_t
-dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu, uint32_t mbuf_size,
+ bool per_port_mp)
{
uint32_t n_mbufs;
+ uint16_t max_frame_len = 0;
if (!per_port_mp) {
/* Shared memory are being used.
@@ -672,6 +688,22 @@ dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+ MIN_NB_MBUF;
}
+ /* If multi-segment mbufs are used, we also increase the number of
+ * mbufs used. This is done by calculating how many mbufs are needed to
+ * hold the data on a single packet of MTU size. For example, for a
+ * received packet of 9000B, 5 mbufs (9000 / 2048) are needed to hold
+ * the data - 4 more than with single-mbufs (as mbufs' size is extended
+ * to hold all data) */
+ max_frame_len = MTU_TO_MAX_FRAME_LEN(dev->requested_mtu);
+ if (dpdk_multi_segment_mbufs && mbuf_size < max_frame_len) {
+ uint16_t nb_segs = max_frame_len / mbuf_size;
+ if (max_frame_len % mbuf_size) {
+ nb_segs += 1;
+ }
+
+ n_mbufs *= nb_segs;
+ }
+
return n_mbufs;
}
@@ -700,8 +732,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
/* Get the size of each mbuf, based on the MTU */
mbuf_size = dpdk_buf_size(dev->requested_mtu);
+ /* multi-segment mbufs - use standard mbuf size */
+ if (dpdk_multi_segment_mbufs) {
+ mbuf_size = dpdk_buf_size(ETHER_MTU);
+ }
- n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
+ n_mbufs = dpdk_calculate_mbufs(dev, mtu, mbuf_size, per_port_mp);
do {
/* Full DPDK memory pool name must be unique and cannot be
@@ -959,6 +995,7 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
int diag = 0;
int i;
struct rte_eth_conf conf = port_conf;
+ struct rte_eth_txconf txconf;
struct rte_eth_dev_info info;
uint16_t conf_mtu;
@@ -975,6 +1012,18 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
}
}
+ /* Multi-segment-mbuf-specific setup. */
+ if (dpdk_multi_segment_mbufs) {
+ /* DPDK PMDs typically attempt to use simple or vectorized
+ * transmit functions, neither of which are compatible with
+ * multi-segment mbufs. Ensure that these are disabled when
+ * multi-segment mbufs are enabled.
+ */
+ rte_eth_dev_info_get(dev->port_id, &info);
+ txconf = info.default_txconf;
+ txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
+ }
+
conf.intr_conf.lsc = dev->lsc_interrupt_mode;
conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
@@ -1019,7 +1068,9 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
for (i = 0; i < n_txq; i++) {
diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
- dev->socket_id, NULL);
+ dev->socket_id,
+ dpdk_multi_segment_mbufs ? &txconf
+ : NULL);
if (diag) {
VLOG_INFO("Interface %s unable to setup txq(%d): %s",
dev->up.name, i, rte_strerror(-diag));
@@ -4108,7 +4159,6 @@ unlock:
return err;
}
-
/* Find rte_flow with @ufid */
static struct rte_flow *
ufid_to_rte_flow_find(const ovs_u128 *ufid) {
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..19aa5c6 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -25,6 +25,8 @@ struct dp_packet;
#ifdef DPDK_NETDEV
+bool netdev_dpdk_is_multi_segment_mbufs_enabled(void);
+void netdev_dpdk_multi_segment_mbufs_enable(void);
void netdev_dpdk_register(void);
void free_dpdk_buf(struct dp_packet *);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 0cd8520..253cfc9 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -338,6 +338,28 @@
+
+
+ Specifies if DPDK uses multi-segment mbufs for handling jumbo frames.
+
+
+ If true, DPDK allocates a single mempool per port, irrespective of
+ the ports' requested MTU sizes. The elements of this mempool are
+ 'standard'-sized mbufs (typically 2 KB), which may be chained
+ together to accommodate jumbo frames. In this approach, each mbuf
+ typically stores a fragment of the overall jumbo frame.
+
+
+ If not specified, defaults to false, in which case, the
+ size of each mbuf within a DPDK port's mempool will be grown to
+ accommodate jumbo frames within a single mbuf.
+
+
+ Changing this value requires restarting the daemon.
+
+
+
From patchwork Mon Aug 20 17:44:33 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959867
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 41vLvF4Djxz9s0n
for ;
Tue, 21 Aug 2018 03:50:49 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id D3A76DCB;
Mon, 20 Aug 2018 17:45:28 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id C336BDA5
for ; Mon, 20 Aug 2018 17:45:22 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id A93B077E
for ; Mon, 20 Aug 2018 17:45:19 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
20 Aug 2018 10:45:16 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,266,1531810800"; d="scan'208";a="81880471"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga004.fm.intel.com with ESMTP; 20 Aug 2018 10:45:09 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:33 +0100
Message-Id: <1534787075-139132-13-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 12/14] dpdk-tests: Add unit-tests for multi-seg
mbufs.
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

In order to create a minimal environment that allows the tests to get
mbufs from an existing mempool, the following approach is taken:
- EAL is initialised (by using the main dpdk_init()) and a (very) small
  mempool is instantiated (mimicking the logic in dpdk_mp_create()).
  This mempool instance is global and used by all the tests;
- Packets are then allocated from the instantiated mempool, and tested
  on, by running some operations on them and manipulating data.

The tests introduced focus on testing DPDK dp_packets (where
source=DPBUF_DPDK), linked with a single or multiple mbufs, across
several operations, such as:
- dp_packet_put();
- dp_packet_shift();
- dp_packet_reserve();
- dp_packet_push_uninit();
- dp_packet_clear();
- dp_packet_equal();
- dp_packet_linear_data();
- And as a consequence of some of these, dp_packet_put_uninit() and
  dp_packet_resize__().

Finally, this has also been integrated with the new DPDK testsuite.
Thus, when running `$sudo make check-dpdk` one will also be running
these tests.

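For illustration only (not part of this patch): the segment counts the
tests assert on follow from a ceiling division of the payload length by
the data room of each mbuf. A minimal standalone sketch, assuming a 2048B
data room per mbuf as in MBUF_DATA_LEN; the helper name segs_needed() is
hypothetical and exists in neither OVS nor DPDK:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: number of fixed-size
 * segments needed to hold 'len' bytes when each segment stores at most
 * 'seg_room' bytes.  Returns 0 if 'len' or 'seg_room' is 0. */
static uint16_t
segs_needed(size_t len, size_t seg_room)
{
    if (seg_room == 0) {
        return 0;
    }

    /* Ceiling division: a partially filled last segment still counts. */
    return (uint16_t) ((len + seg_room - 1) / seg_room);
}
```

Under that assumption, a payload of 2050B needs two segments and the
5000B test pattern needs three, which matches the 'nb_segs' values
passed to assert_multiple_mbufs() in the tests. Note that an empty
packet still occupies its first mbuf, even though the helper returns 0
for a zero-length payload.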
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
tests/automake.mk | 10 +-
tests/dpdk-packet-mbufs.at | 7 +
tests/system-dpdk-testsuite.at | 1 +
tests/test-dpdk-mbufs.c | 619 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 636 insertions(+), 1 deletion(-)
create mode 100644 tests/dpdk-packet-mbufs.at
create mode 100644 tests/test-dpdk-mbufs.c
diff --git a/tests/automake.mk b/tests/automake.mk
index 49ceb41..f484f69 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -135,7 +135,8 @@ SYSTEM_DPDK_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-dpdk-macros.at \
tests/system-dpdk-testsuite.at \
- tests/system-dpdk.at
+ tests/system-dpdk.at \
+ tests/dpdk-packet-mbufs.at
check_SCRIPTS += tests/atlocal
@@ -392,6 +393,10 @@ tests_ovstest_SOURCES = \
tests/test-vconn.c \
tests/test-aa.c \
tests/test-stopwatch.c
+if DPDK_NETDEV
+tests_ovstest_SOURCES += \
+ tests/test-dpdk-mbufs.c
+endif
if !WIN32
tests_ovstest_SOURCES += \
@@ -404,6 +409,9 @@ tests_ovstest_SOURCES += \
endif
tests_ovstest_LDADD = lib/libopenvswitch.la ovn/lib/libovn.la
+if DPDK_NETDEV
+tests_ovstest_LDFLAGS = $(AM_LDFLAGS) $(DPDK_vswitchd_LDFLAGS)
+endif
noinst_PROGRAMS += tests/test-strtok_r
tests_test_strtok_r_SOURCES = tests/test-strtok_r.c
diff --git a/tests/dpdk-packet-mbufs.at b/tests/dpdk-packet-mbufs.at
new file mode 100644
index 0000000..f28e4fc
--- /dev/null
+++ b/tests/dpdk-packet-mbufs.at
@@ -0,0 +1,7 @@
+AT_BANNER([OVS-DPDK dp_packet unit tests])
+
+AT_SETUP([OVS-DPDK dp_packet - mbufs allocation])
+AT_KEYWORDS([dp_packet, multi-seg, mbufs])
+AT_CHECK(ovstest test-dpdk-packet, [], [ignore], [ignore])
+
+AT_CLEANUP
diff --git a/tests/system-dpdk-testsuite.at b/tests/system-dpdk-testsuite.at
index 382f09e..f5edf58 100644
--- a/tests/system-dpdk-testsuite.at
+++ b/tests/system-dpdk-testsuite.at
@@ -23,3 +23,4 @@ m4_include([tests/system-common-macros.at])
m4_include([tests/system-dpdk-macros.at])
m4_include([tests/system-dpdk.at])
+m4_include([tests/dpdk-packet-mbufs.at])
diff --git a/tests/test-dpdk-mbufs.c b/tests/test-dpdk-mbufs.c
new file mode 100644
index 0000000..19081a3
--- /dev/null
+++ b/tests/test-dpdk-mbufs.c
@@ -0,0 +1,619 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "dp-packet.h"
+#include "ovstest.h"
+#include "dpdk.h"
+#include "smap.h"
+
+#define N_MBUFS 1024
+#define MBUF_DATA_LEN 2048
+
+static int num_tests = 0;
+
+/* Global var to hold a mempool instance, "test-mp", used in all of the tests
+ * below. This instance is instantiated in dpdk_setup_eal_with_mp(). */
+static struct rte_mempool *mp;
+
+/* Test data used to fill the packets with data. Note that this isn't a string
+ * that represents a valid packet, by any means. The pattern is generated in
+ * set_testing_pattern_str() and the sole purpose is to verify the data remains
+ * the same after inserting and operating on multi-segment mbufs. */
+static char *test_str;
+
+/* Asserts a dp_packet that holds a single mbuf, where:
+ * - nb_segs must be 1;
+ * - pkt_len must be equal to data_len which in turn must equal the provided
+ * 'pkt_len';
+ * - data_off must start at the provided 'data_ofs';
+ * - next must be NULL. */
+static void
+assert_single_mbuf(struct dp_packet *pkt, uint16_t data_ofs,
+ uint32_t pkt_len) {
+ struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &pkt->mbuf);
+ ovs_assert(mbuf->nb_segs == 1);
+ ovs_assert(mbuf->data_off == data_ofs);
+ ovs_assert(mbuf->pkt_len == mbuf->data_len);
+ ovs_assert(mbuf->pkt_len == pkt_len);
+ ovs_assert(mbuf->next == NULL);
+}
+
+/* Asserts a dp_packet that holds multiple mbufs, where:
+ * - nb_segs must be > 1 and equal to the provided 'nb_segs';
+ * - data_off must start at the provided 'data_ofs';
+ * - pkt_len must be equal to the provided 'pkt_len' and the sum of each
+ * mbuf's 'data_len' must equal the pkt_len;
+ * - next must not be NULL. */
+static void
+assert_multiple_mbufs(struct dp_packet *pkt, uint16_t data_ofs,
+ uint32_t pkt_len, uint16_t nb_segs) {
+ struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &pkt->mbuf);
+ ovs_assert(mbuf->nb_segs > 1 && mbuf->nb_segs == nb_segs);
+ ovs_assert(mbuf->data_off == data_ofs);
+ ovs_assert(mbuf->pkt_len != mbuf->data_len);
+ ovs_assert(mbuf->next != NULL);
+ ovs_assert(mbuf->pkt_len == pkt_len);
+ /* Make sure pkt_len equals the sum of all segments data_len */
+ while (mbuf) {
+ pkt_len -= rte_pktmbuf_data_len(mbuf);
+ mbuf = mbuf->next;
+ }
+ ovs_assert(pkt_len == 0);
+}
+
+/* Asserts that the data existing in a packet, starting at 'data_ofs' of the
+ * first mbuf and of length 'data_len' matches the global test_str used,
+ * starting at index 0 and of the same length. */
+static void
+assert_data(struct dp_packet *pkt, uint16_t data_ofs, uint16_t data_len) {
+ struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &pkt->mbuf);
+
+ char *data = xmalloc(sizeof(*data) * data_len);
+ const char *rd = rte_pktmbuf_read(mbuf, data_ofs, data_len, data);
+
+ ovs_assert(rd != NULL);
+ ovs_assert(memcmp(rd, test_str, data_len) == 0);
+
+ free(data);
+}
+
+static void
+set_testing_pattern_str(void) {
+ static const char *pattern = "1234567890";
+
+ /* Pattern will be of size 5000B */
+ size_t test_str_len = 5000;
+ test_str = xmalloc(test_str_len * sizeof(*test_str) + 1);
+
+ for (size_t i = 0; i < test_str_len; i += strlen(pattern)) {
+ memcpy(test_str + i, pattern, strlen(pattern));
+ }
+
+ test_str[test_str_len] = 0;
+}
+
+static void
+dpdk_eal_init(void) {
+ struct smap other_config;
+ smap_init(&other_config);
+
+ printf("Initialising EAL...\n");
+ smap_add(&other_config, "dpdk-init", "true");
+ smap_add(&other_config, "dpdk-lcore-mask", "10");
+ smap_add(&other_config, "dpdk-socket-mem", "2048,0");
+ smap_add(&other_config, "dpdk-multi-seg-mbufs", "true");
+
+ dpdk_init(&other_config);
+}
+
+/* The allocation of mbufs here mimics the logic in dpdk_mp_create in
+ * netdev-dpdk.c. */
+static struct rte_mempool *
+dpdk_mp_create(char *mp_name) {
+ uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len;
+
+ mbuf_size = sizeof (struct dp_packet) +
+ MBUF_DATA_LEN + RTE_PKTMBUF_HEADROOM;
+ aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE);
+ mbuf_priv_data_len = sizeof(struct dp_packet) - sizeof(struct rte_mbuf) +
+ (aligned_mbuf_size - mbuf_size);
+
+ struct rte_mempool *mpool = rte_pktmbuf_pool_create(
+ mp_name, N_MBUFS,
+ RTE_MEMPOOL_CACHE_MAX_SIZE,
+ mbuf_priv_data_len,
+ MBUF_DATA_LEN +
+ RTE_PKTMBUF_HEADROOM /* defaults 128B */,
+ SOCKET_ID_ANY);
+ if (mpool) {
+ printf("Allocated \"%s\" mempool with %u mbufs\n", mp_name, N_MBUFS);
+ } else {
+ printf("Failed mempool \"%s\" create request of %u mbufs: %s.\n",
+ mp_name, N_MBUFS, rte_strerror(rte_errno));
+
+ ovs_assert(mpool != NULL);
+ }
+
+ return mpool;
+}
+
+static void
+dpdk_setup_eal_with_mp(void) {
+ dpdk_eal_init();
+
+ mp = dpdk_mp_create("test-mp");
+ ovs_assert(mp != NULL);
+}
+
+static struct dp_packet *
+dpdk_mp_alloc_pkt(struct rte_mempool *mpool) {
+ struct rte_mbuf *mbuf = rte_pktmbuf_alloc(mpool);
+
+ struct dp_packet *pkt = (struct dp_packet *) mbuf;
+ pkt->source = DPBUF_DPDK;
+
+ return pkt;
+}
+
+/* Similar to dp_packet_put() in dp-packet.c, appends the 'size' bytes of data
+ * in 'p' to the tail end of 'pkt', allocating new mbufs if needed. */
+static struct dp_packet *
+dpdk_pkt_put(struct dp_packet *pkt, void *p, size_t size) {
+ uint16_t max_data_len, nb_segs;
+ struct rte_mbuf *mbuf, *fmbuf;
+
+ mbuf = CONST_CAST(struct rte_mbuf *, &pkt->mbuf);
+
+ /* All new allocated mbuf's max data len is the same */
+ max_data_len = mbuf->buf_len - mbuf->data_off;
+
+ /* Calculate # of needed mbufs to accommodate 'size' */
+ nb_segs = size / max_data_len;
+ if (size % max_data_len) {
+ nb_segs += 1;
+ }
+
+ /* Proceed with the allocation of new mbufs */
+ mp = mbuf->pool;
+ fmbuf = mbuf;
+ mbuf = rte_pktmbuf_lastseg(mbuf);
+
+ for (int i = 0; i < nb_segs; i++) {
+ /* This takes care of initialising buf_len, data_len and other
+ * fields properly */
+ mbuf->next = rte_pktmbuf_alloc(mp);
+ if (!mbuf->next) {
+ printf("Problem allocating more mbufs for tests.\n");
+ /* Free the whole chain, starting at the first mbuf */
+ rte_pktmbuf_free(fmbuf);
+ return NULL;
+ }
+
+ fmbuf->nb_segs += 1;
+
+ mbuf = mbuf->next;
+ }
+
+ dp_packet_mbuf_write(fmbuf, 0, size, p);
+
+ dp_packet_set_size(pkt, size);
+
+ return pkt;
+}
+
+static int
+test_dpdk_packet_insert_headroom(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Reserve 512B of headroom */
+ size_t str_len = 512;
+ dp_packet_reserve(pkt, str_len);
+ char *p = dp_packet_push_uninit(pkt, str_len);
+ ovs_assert(p != NULL);
+ /* Put the first 512B of "test_str" in the allocated header */
+ memcpy(p, test_str, str_len);
+
+ /* Check properties and data are as expected */
+ assert_single_mbuf(pkt, RTE_PKTMBUF_HEADROOM, str_len);
+ assert_data(pkt, 0, str_len);
+
+ dp_packet_uninit(pkt);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_insert_tailroom_and_headroom(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Reserve 256B of header */
+ size_t head_len = 256;
+ dp_packet_reserve(pkt, head_len);
+
+ /* Put the first 512B of "test_str" in the packet's tail */
+ size_t str_len = 512;
+ char *p = dp_packet_put(pkt, test_str, str_len);
+ ovs_assert(p != NULL);
+
+ /* Fill the reserved 256B of header */
+ p = dp_packet_push_uninit(pkt, head_len);
+ ovs_assert(p != NULL);
+
+ /* Check properties and data are as expected */
+ assert_single_mbuf(pkt, RTE_PKTMBUF_HEADROOM, str_len + head_len);
+
+ /* Check the data inserted in the packet is correct */
+ char *data = xmalloc(sizeof(*data) * (str_len + head_len));
+ const char *rd = rte_pktmbuf_read(&pkt->mbuf, 0, str_len + head_len, data);
+ ovs_assert(rd != NULL);
+ /* Because of the headroom inserted, the data now begins at offset 256 */
+ ovs_assert(memcmp(rd + head_len, test_str, str_len) == 0);
+
+ dp_packet_uninit(pkt);
+ free(data);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_insert_tailroom_and_headroom_multiple_mbufs(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put the first 2050B of "test_str" in the packet, just enough to
+ * allocate two mbufs */
+ size_t str_len = MBUF_DATA_LEN + 2;
+ pkt = dpdk_pkt_put(pkt, test_str, str_len);
+ ovs_assert(pkt != NULL);
+
+ /* Put the first 512B of "test_str" in the packet's tail */
+ size_t tail_len = 512;
+ char *p = dp_packet_put(pkt, test_str, tail_len);
+ ovs_assert(p != NULL);
+
+ /* Fill the entire headroom */
+ size_t head_len = RTE_PKTMBUF_HEADROOM;
+ p = dp_packet_push_uninit(pkt, head_len);
+ ovs_assert(p != NULL);
+ /* Copy the data to the reserved headroom */
+ memcpy(p, test_str, head_len);
+
+ /* Check properties and data are as expected */
+ size_t pkt_len = head_len + str_len + tail_len;
+ uint16_t nb_segs = 2;
+ assert_multiple_mbufs(pkt, 0, pkt_len, nb_segs);
+
+ /* Check the data inserted in the packet is correct */
+ char *data = xmalloc(sizeof(*data) * pkt_len);
+ const char *rd = rte_pktmbuf_read(&pkt->mbuf, 0, pkt_len, data);
+ ovs_assert(rd != NULL);
+ ovs_assert(memcmp(rd, test_str, head_len) == 0);
+ ovs_assert(memcmp(rd + head_len + str_len, test_str, tail_len) == 0);
+
+ dp_packet_uninit(pkt);
+ free(data);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_insert_tailroom_multiple_mbufs(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put the first 2050B of "test_str" in the packet, just enough to
+ * allocate two mbufs */
+ size_t str_len = MBUF_DATA_LEN + 2;
+ pkt = dpdk_pkt_put(pkt, test_str, str_len);
+ ovs_assert(pkt != NULL);
+
+ /* Put the first 2000B of "test_str" in the packet's end */
+ size_t tail_len = 2000;
+ char *p = dp_packet_put(pkt, test_str, tail_len);
+ ovs_assert(p != NULL);
+
+ /* Check properties and data are as expected */
+ char *data = xmalloc(sizeof(*data) * (str_len + tail_len));
+ const char *rd = rte_pktmbuf_read(&pkt->mbuf, 0, str_len + tail_len, data);
+ ovs_assert(rd != NULL);
+ /* The data put at the tail begins at offset 'str_len' */
+ ovs_assert(memcmp(rd + str_len, test_str, tail_len) == 0);
+
+ dp_packet_uninit(pkt);
+ free(data);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_insert_headroom_multiple_mbufs(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put the first 2050B of "test_str" in the packet, just enough to
+ * allocate two mbufs */
+ size_t str_len = MBUF_DATA_LEN + 2;
+ pkt = dpdk_pkt_put(pkt, test_str, str_len);
+
+ /* Fill the entire headroom */
+ size_t head_len = RTE_PKTMBUF_HEADROOM;
+ char *p = dp_packet_push_uninit(pkt, head_len);
+ ovs_assert(p != NULL);
+
+ /* Check properties and data are as expected */
+ char *data = xmalloc(sizeof(*data) * (str_len + head_len));
+ const char *rd = rte_pktmbuf_read(&pkt->mbuf, 0, str_len + head_len, data);
+ ovs_assert(rd != NULL);
+ /* Because of the headroom inserted, the data is at offset 'head_len' */
+ ovs_assert(memcmp(rd + head_len, test_str, str_len) == 0);
+
+ dp_packet_uninit(pkt);
+ free(data);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_change_size(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put enough data in the packet that spans three mbufs (5000B) */
+ size_t str_len = strlen(test_str);
+ pkt = dpdk_pkt_put(pkt, test_str, str_len);
+ ovs_assert(pkt != NULL);
+
+ /* Check properties and data are as expected */
+ uint16_t nb_segs = 3;
+ assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM, str_len, nb_segs);
+
+ /* Change the size of the packet to fit in a single mbuf */
+ dp_packet_clear(pkt);
+
+ assert_single_mbuf(pkt, RTE_PKTMBUF_HEADROOM, 0);
+
+ dp_packet_uninit(pkt);
+
+ return 0;
+}
+
+/* Shift() tests */
+
+static int
+test_dpdk_packet_shift_single_mbuf(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put the first 1024B of "test_str" in the packet */
+ size_t str_len = 1024;
+ char *p = dp_packet_put(pkt, test_str, str_len);
+ ovs_assert(p != NULL);
+
+ /* Shift data right by 512B */
+ uint16_t shift_len = 512;
+ dp_packet_shift(pkt, shift_len);
+
+ /* Check properties and data are as expected */
+ assert_single_mbuf(pkt, RTE_PKTMBUF_HEADROOM + shift_len, str_len);
+ assert_data(pkt, 0, str_len);
+
+ dp_packet_uninit(pkt);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_shift_multiple_mbufs(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put the data in "test_str" in the packet */
+ size_t str_len = strlen(test_str);
+ pkt = dpdk_pkt_put(pkt, test_str, str_len);
+ ovs_assert(pkt != NULL);
+
+ /* Check properties and data are as expected */
+ uint16_t nb_segs = 3;
+ assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM, str_len, nb_segs);
+
+ /* Shift data right by 1024B */
+ uint16_t shift_len = 1024;
+ dp_packet_shift(pkt, shift_len);
+
+ /* Check the data has been inserted correctly */
+ assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM + shift_len, str_len,
+ nb_segs);
+ assert_data(pkt, 0, str_len);
+
+ dp_packet_uninit(pkt);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_shift_right_then_left(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put the data in "test_str" in the packet */
+ size_t str_len = strlen(test_str);
+ pkt = dpdk_pkt_put(pkt, test_str, str_len);
+ ovs_assert(pkt != NULL);
+
+ /* Shift data right by 1024B */
+ int16_t shift_len = 1024;
+ dp_packet_shift(pkt, shift_len);
+
+ /* Check properties and data are as expected */
+ uint16_t nb_segs = 3;
+ assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM + shift_len, str_len,
+ nb_segs);
+
+ /* Shift data left by 1024B */
+ dp_packet_shift(pkt, -shift_len);
+
+ /* After shifting back by -shift_len, data_off is the headroom again */
+ assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM, str_len,
+ nb_segs);
+ assert_data(pkt, 0, str_len);
+
+ dp_packet_uninit(pkt);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_equal_multiple_mbufs(void) {
+ /* Allocate first packet for comparison */
+ struct dp_packet *pkt1 = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt1 != NULL);
+
+ /* Put the data in "test_str" in the packet */
+ size_t str_len = strlen(test_str);
+ pkt1 = dpdk_pkt_put(pkt1, test_str, str_len);
+ ovs_assert(pkt1 != NULL);
+
+ /* Check properties and data are as expected */
+ uint16_t nb_segs = 3;
+ assert_multiple_mbufs(pkt1, RTE_PKTMBUF_HEADROOM, str_len, nb_segs);
+
+ /* Allocate second packet for comparison */
+ struct dp_packet *pkt2 = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt2 != NULL);
+
+ /* Put the data in "test_str" in the packet */
+ pkt2 = dpdk_pkt_put(pkt2, test_str, str_len);
+ ovs_assert(pkt2 != NULL);
+
+ /* Check properties and data are as expected */
+ assert_multiple_mbufs(pkt2, RTE_PKTMBUF_HEADROOM, str_len, nb_segs);
+
+ ovs_assert(dp_packet_equal(pkt1, pkt2));
+
+ dp_packet_uninit(pkt1);
+ dp_packet_uninit(pkt2);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_single_mbuf_to_linear_malloc(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put the first 1024B of "test_str" in the packet */
+ size_t str_len = 1024;
+ char *p = dp_packet_put(pkt, test_str, str_len);
+ ovs_assert(p != NULL);
+
+ char *paddr = rte_pktmbuf_mtod(&pkt->mbuf, char *);
+ /* Request linear data; a single-mbuf DPBUF_DPDK packet is already linear */
+ char *d = dp_packet_linear_data(pkt);
+
+ /* Check properties and data are as expected, namely:
+ * - The packet is still a DPBUF_DPDK packet;
+ * - The returned address is still an address in the mbuf;
+ * - Single mbuf properties still hold. */
+ ovs_assert(d != NULL);
+ ovs_assert(pkt->source == DPBUF_DPDK);
+ ovs_assert(d == paddr);
+ assert_single_mbuf(pkt, RTE_PKTMBUF_HEADROOM, str_len);
+
+ dp_packet_uninit(pkt);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet_multiple_mbufs_to_linear_malloc(void) {
+ struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+ ovs_assert(pkt != NULL);
+
+ /* Put the data in "test_str" in the packet */
+ size_t str_len = strlen(test_str);
+ pkt = dpdk_pkt_put(pkt, test_str, str_len);
+ ovs_assert(pkt != NULL);
+
+ /* Check properties and data are as expected */
+ uint16_t nb_segs = 3;
+ assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM, str_len, nb_segs);
+
+ char *paddr = rte_pktmbuf_mtod(&pkt->mbuf, char *);
+ /* Convert DPBUF_DPDK packet into a linear DPBUF_MALLOC packet */
+ char *d = dp_packet_linear_data(pkt);
+
+ /* Check properties and data are as expected, namely:
+ * - The packet is now a DPBUF_MALLOC packet;
+ * - The returned address is a new address;
+ * - All expected data is now in the new address. */
+ ovs_assert(d != NULL);
+ ovs_assert(pkt->source == DPBUF_MALLOC);
+ ovs_assert(d != paddr);
+ ovs_assert(memcmp(d, test_str, str_len) == 0);
+
+ dp_packet_uninit(pkt);
+
+ return 0;
+}
+
+static int
+test_dpdk_packet(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
+{
+ /* Setup environment for tests */
+ dpdk_setup_eal_with_mp();
+ set_testing_pattern_str();
+
+ test_dpdk_packet_insert_headroom();
+ num_tests++;
+ test_dpdk_packet_insert_tailroom_and_headroom();
+ num_tests++;
+ test_dpdk_packet_insert_tailroom_multiple_mbufs();
+ num_tests++;
+ test_dpdk_packet_insert_headroom_multiple_mbufs();
+ num_tests++;
+ test_dpdk_packet_insert_tailroom_and_headroom_multiple_mbufs();
+ num_tests++;
+ test_dpdk_packet_change_size();
+ num_tests++;
+ test_dpdk_packet_shift_single_mbuf();
+ num_tests++;
+ test_dpdk_packet_shift_multiple_mbufs();
+ num_tests++;
+ test_dpdk_packet_shift_right_then_left();
+ num_tests++;
+ test_dpdk_packet_equal_multiple_mbufs();
+ num_tests++;
+ test_dpdk_packet_single_mbuf_to_linear_malloc();
+ num_tests++;
+ test_dpdk_packet_multiple_mbufs_to_linear_malloc();
+ num_tests++;
+
+ printf("Executed %d tests\n", num_tests);
+
+ exit(0);
+}
+
+OVSTEST_REGISTER("test-dpdk-packet", test_dpdk_packet);
From patchwork Mon Aug 20 17:44:34 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959860
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 41vLrN2MjVz9s0n
for ;
Tue, 21 Aug 2018 03:48:20 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 9817EDA1;
Mon, 20 Aug 2018 17:45:22 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 84C54D66
for ; Mon, 20 Aug 2018 17:45:18 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 219FF772
for ; Mon, 20 Aug 2018 17:45:18 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
20 Aug 2018 10:45:16 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,266,1531810800"; d="scan'208";a="81880474"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga004.fm.intel.com with ESMTP; 20 Aug 2018 10:45:11 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:34 +0100
Message-Id: <1534787075-139132-14-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 13/14] dpdk-tests: Accept other configs in
OVS_DPDK_START
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
As it stands, OVS_DPDK_START() won't allow other configs to be set
before starting the ovs-vswitchd daemon. This is a problem since some
configs, such as "dpdk-multi-seg-mbufs=true" for enabling multi-segment
mbufs, need to be set prior to starting OvS.
To support other options, OVS_DPDK_START() has been modified to accept
extra configs in the form "$config_name=$config_value". It then uses
ovs-vsctl to set the configs.
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
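Note (illustration only, not part of the patch): the handling of the
"$config_name=$config_value" arguments described above can be approximated
by the shell sketch below. The helper name build_other_config_cmds is
hypothetical, and the sketch only prints the ovs-vsctl invocations instead
of running them; the real macro does the iteration in m4 at testsuite
generation time.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): print one ovs-vsctl command per
# whitespace-separated "name=value" token. Nothing is executed here.
build_other_config_cmds() {
    # $1 is intentionally left unquoted so the shell splits it on whitespace,
    # similar to how m4_foreach_w walks a whitespace-separated list.
    for opt in $1; do
        printf 'ovs-vsctl --no-wait set Open_vSwitch . other_config:%s\n' "$opt"
    done
}

# Example: the argument list used by a multi-segment mbufs test.
build_other_config_cmds "per-port-memory=true dpdk-multi-seg-mbufs=true"
```

Each token ends up as its own other_config key, so an empty argument list
produces no extra commands and leaves the previous behaviour unchanged.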
tests/system-dpdk-macros.at | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
index 0762ee0..7c65834 100644
--- a/tests/system-dpdk-macros.at
+++ b/tests/system-dpdk-macros.at
@@ -21,7 +21,7 @@ m4_define([OVS_DPDK_PRE_CHECK],
])
-# OVS_DPDK_START()
+# OVS_DPDK_START([other-conf-args])
#
# Create an empty database and start ovsdb-server. Add special configuration
# dpdk-init to enable DPDK functionality. Start ovs-vswitchd connected to that
@@ -48,6 +48,10 @@ m4_define([OVS_DPDK_START],
AT_CHECK([lscpu], [], [stdout])
AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while (c++<$(3)) {printf "1024,"}; print "1024"}' > SOCKET_MEM])
AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="$(cat SOCKET_MEM)"])
+ dnl Iterate through $other-conf-args list and include them
+ m4_foreach_w(opt, $1, [
+ AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:opt])
+ ])
dnl Start ovs-vswitchd.
AT_CHECK([ovs-vswitchd --detach --no-chdir --pidfile --log-file -vvconn -vofproto_dpif -vunixctl], [0], [stdout], [stderr])
From patchwork Mon Aug 20 17:44:35 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 959865
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=fail (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 41vLtR14F1z9s0n
for ;
Tue, 21 Aug 2018 03:50:06 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 74F3EDC4;
Mon, 20 Aug 2018 17:45:27 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 1D475D8D
for ; Mon, 20 Aug 2018 17:45:21 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id A52A9773
for ; Mon, 20 Aug 2018 17:45:20 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
20 Aug 2018 10:45:17 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,266,1531810800"; d="scan'208";a="81880476"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga004.fm.intel.com with ESMTP; 20 Aug 2018 10:45:13 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Mon, 20 Aug 2018 18:44:35 +0100
Message-Id: <1534787075-139132-15-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
References: <1534787075-139132-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: i.maximets@samsung.com
Subject: [ovs-dev] [PATCH v8 14/14] dpdk-tests: End-to-end tests for
multi-seg mbufs.
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
The following tests are added to the DPDK testsuite to add some
coverage for the multi-segment mbufs:
- Check that multi-segment mbufs are disabled by default;
- Check that providing `other_config:dpdk-multi-seg-mbufs=true` indeed
enables multi-segment mbufs;
- Using a DPDK port, send a random packet out and check that `ofctl
dump-flows` shows the correct amount of packets and bytes sent.
Signed-off-by: Tiago Lam
Acked-by: Eelco Chaudron
---
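Note (illustration only, not part of the patch): the byte count checked by
the Tx test can be derived from the hex strings the test builds, since two
hex characters encode one byte. The sketch below redoes that arithmetic;
the variable names BODY_UNIT and BODY_REPEATS are introduced here for
illustration and do not appear in the test.

```shell
#!/bin/sh
# Hex strings copied from the test; two hex characters encode one byte.
ARP_HEADER="000009000B00000009000A00080600010800060400010000000000010A0000\
010000000000020A000002"
BODY_UNIT="0102030405"   # pattern repeated by the printf in the test
BODY_REPEATS=1750        # upper bound of the {1..1750} range in the test

header_bytes=$(( ${#ARP_HEADER} / 2 ))
body_bytes=$(( ${#BODY_UNIT} / 2 * BODY_REPEATS ))
total_bytes=$(( header_bytes + body_bytes ))

echo "header=$header_bytes body=$body_bytes total=$total_bytes"
```

This prints "header=42 body=8750 total=8792", which matches the n_bytes
value the test expects from `ovs-ofctl dump-flows`.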
tests/system-dpdk.at | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 3d21b01..af8de8c 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -71,3 +71,68 @@ OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel module is probably n
")
AT_CLEANUP
dnl --------------------------------------------------------------------------
+
+AT_SETUP([Jumbo frames - Multi-segment disabled by default])
+OVS_DPDK_START()
+
+AT_CHECK([grep "multi-segment mbufs enabled" ovs-vswitchd.log], [1], [])
+OVS_VSWITCHD_STOP("/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
+
+AT_SETUP([Jumbo frames - Multi-segment enabled])
+OVS_DPDK_START([dpdk-multi-seg-mbufs=true])
+AT_CHECK([grep "multi-segment mbufs enabled" ovs-vswitchd.log], [], [stdout])
+OVS_VSWITCHD_STOP("/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP
+
+AT_SETUP([Jumbo frames - Multi-segment mbufs Tx])
+OVS_DPDK_PRE_CHECK()
+OVS_DPDK_START([per-port-memory=true dpdk-multi-seg-mbufs=true])
+
+dnl Add userspace bridge and attach it to OVS
+AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
+AT_CHECK([ovs-vsctl add-port br10 dpdk0 \
+ -- set Interface dpdk0 type=dpdk options:dpdk-devargs=$(cat PCI_ADDR) \
+ -- set Interface dpdk0 mtu_request=9000], [], [stdout], [stderr])
+
+AT_CHECK([ovs-vsctl show], [], [stdout])
+
+dnl Add flows to send packets out from the 'dpdk0' port
+AT_CHECK([
+ovs-ofctl del-flows br10
+ovs-ofctl add-flow br10 in_port=LOCAL,actions=output:dpdk0
+], [], [stdout])
+
+AT_CHECK([ovs-ofctl dump-flows br10], [], [stdout])
+
+dnl Send packet out of the 'dpdk0' port
+AT_CHECK([
+ARP_HEADER="000009000B00000009000A00080600010800060400010000000000010A0000\
+010000000000020A000002"
+dnl Build a repeating hex pattern to append to the ARP_HEADER
+RANDOM_BODY=$(printf '0102030405%.0s' {1..1750})
+dnl 8792B ARP packet
+RANDOM_ARP="$ARP_HEADER$RANDOM_BODY"
+
+ovs-ofctl packet-out br10 "packet=$RANDOM_ARP,action=resubmit:LOCAL"
+], [], [stdout])
+
+AT_CHECK([ovs-ofctl dump-flows br10], [0], [stdout])
+
+dnl Confirm the single packet has been sent with the correct size
+AT_CHECK([ovs-ofctl dump-flows br10 | ofctl_strip | grep in_port], [0], [dnl
+ n_packets=1, n_bytes=8792, in_port=LOCAL actions=output:1
+])
+
+dnl Clean up
+OVS_VSWITCHD_STOP("/does not exist. The Open vSwitch kernel module is probably not loaded./d
+/Failed to enable flow control/d
+/failed to connect to \/tmp\/dpdkvhostclient0: No such file or directory/d
+/Global register is changed during/d
+/EAL: No free hugepages reported in hugepages-1048576kB/d
+")
+AT_CLEANUP