From patchwork Tue May 1 17:02:07 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907134
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=none (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 40b75S2fKhz9s35
for ;
Wed, 2 May 2018 03:03:08 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 655AB901;
Tue, 1 May 2018 17:02:40 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 1C3C38DC
for ; Tue, 1 May 2018 17:02:39 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 416996D0
for ; Tue, 1 May 2018 17:02:38 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
01 May 2018 10:02:37 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485386"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:36 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:07 +0100
Message-Id: <1525194134-248371-2-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Subject: [ovs-dev] [RFC v5 1/8] netdev-dpdk: fix mbuf sizing
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
From: Mark Kavanagh
There are numerous factors that must be considered when calculating
the size of an mbuf:
- the data portion of the mbuf must be sized in accordance with Rx
  buffer alignment (typically 1024B). So, for example, in order to
  successfully receive and capture a 1500B packet, mbufs with a
  data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
  * the dp_packet, which includes a struct rte_mbuf (704B)
  * RTE_PKTMBUF_HEADROOM (128B)
  * packet data (aligned to 1k, as previously described)
  * RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the sum of the lengths
of all of the above-listed components) is cache-aligned. To satisfy
this requirement, it may be necessary to round up the total mbuf size
with respect to cacheline size. In doing so, it's possible that the
dp_packet's data portion is inadvertently increased in size, such that
it no longer adheres to Rx buffer alignment. Consequently, the
following property of the mbuf no longer holds true:

    mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that
property is assumed to hold for all but the final segment in an
mbuf chain. Resolve this issue by adjusting the size of the mbuf's
private data portion, as opposed to the packet data portion, when
aligning mbuf size to cachelines.
Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
CC: Santosh Shukla
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
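Note: the sizing rule in the commit message can be sketched as standalone
C. This is a minimal illustration under stated assumptions, not the code in
this patch: sketch_mbuf_layout(), SKETCH_CACHE_LINE and SKETCH_HEADROOM are
hypothetical stand-ins for the dpdk_mp_create() logic, RTE_CACHE_LINE_SIZE
and RTE_PKTMBUF_HEADROOM, and the 64B and 128B values are assumed rather
than taken from a real DPDK build.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; real builds take these from DPDK. */
#define SKETCH_CACHE_LINE 64u    /* stand-in for RTE_CACHE_LINE_SIZE */
#define SKETCH_HEADROOM   128u   /* stand-in for RTE_PKTMBUF_HEADROOM */

struct sketch_layout {
    uint32_t priv_len;    /* private area, including any padding */
    uint32_t data_room;   /* headroom + packet data, never padded */
    uint32_t total;       /* whole element size, cache-line aligned */
};

/* Round 'x' up to the next multiple of 'align' (which must be non-zero). */
static uint32_t
sketch_round_up(uint32_t x, uint32_t align)
{
    return (x + align - 1) / align * align;
}

/* 'pkt_struct_len' plays the role of sizeof(struct dp_packet) and
 * 'mbuf_struct_len' the role of sizeof(struct rte_mbuf). */
static struct sketch_layout
sketch_mbuf_layout(uint32_t pkt_struct_len, uint32_t mbuf_struct_len,
                   uint32_t pkt_data_len)
{
    struct sketch_layout l;
    uint32_t unaligned;

    assert(pkt_struct_len >= mbuf_struct_len);

    l.data_room = pkt_data_len + SKETCH_HEADROOM;
    unaligned = pkt_struct_len + l.data_room;
    l.total = sketch_round_up(unaligned, SKETCH_CACHE_LINE);

    /* The padding is absorbed by the private area, so the data room keeps
     * its Rx buffer alignment. */
    l.priv_len = (pkt_struct_len - mbuf_struct_len) + (l.total - unaligned);
    return l;
}
```

For example, with assumed struct sizes of 700B and 128B and a 2048B data
length, the unaligned total is 2876B and the aligned total is 2880B. The 4B
of padding lands in the private area (576B) while the data room stays at
2176B.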
lib/netdev-dpdk.c | 46 ++++++++++++++++++++++++++++++----------------
1 file changed, 30 insertions(+), 16 deletions(-)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 3306b19..648a1de 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -82,12 +82,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+ (2 * VLAN_HEADER_LEN))
#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
#define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)
-#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \
- - ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu) ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
- + sizeof(struct dp_packet) \
- + RTE_PKTMBUF_HEADROOM), \
- RTE_CACHE_LINE_SIZE)
#define NETDEV_DPDK_MBUF_ALIGN 1024
#define NETDEV_DPDK_MAX_PKT_LEN 9728
@@ -486,7 +480,7 @@ is_dpdk_class(const struct netdev_class *class)
* behaviour, which reduces performance. To prevent this, use a buffer size
* that is closest to 'mtu', but which satisfies the aforementioned criteria.
*/
-static uint32_t
+static uint16_t
dpdk_buf_size(int mtu)
{
return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
@@ -577,7 +571,7 @@ dpdk_mp_do_not_free(struct rte_mempool *mp) OVS_REQUIRES(dpdk_mp_mutex)
* - a new mempool was just created;
* - a matching mempool already exists. */
static struct rte_mempool *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
+dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
{
char mp_name[RTE_MEMPOOL_NAMESIZE];
const char *netdev_name = netdev_get_name(&dev->up);
@@ -585,6 +579,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
uint32_t n_mbufs;
uint32_t hash = hash_string(netdev_name, 0);
struct rte_mempool *mp = NULL;
+ uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len;
/*
* XXX: rough estimation of number of mbufs required for this port:
@@ -604,12 +599,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
* longer than RTE_MEMPOOL_NAMESIZE. */
int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
"ovs%08x%02d%05d%07u",
- hash, socket_id, mtu, n_mbufs);
+ hash, socket_id, mbuf_pkt_data_len, n_mbufs);
if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
VLOG_DBG("snprintf returned %d. "
"Failed to generate a mempool name for \"%s\". "
- "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
- ret, netdev_name, hash, socket_id, mtu, n_mbufs);
+ "Hash:0x%x, socket_id: %d, pkt data room:%d, mbufs:%u.",
+ ret, netdev_name, hash, socket_id, mbuf_pkt_data_len,
+ n_mbufs);
break;
}
@@ -618,13 +614,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
netdev_name, n_mbufs, socket_id,
dev->requested_n_rxq, dev->requested_n_txq);
+ mbuf_priv_data_len = sizeof(struct dp_packet) -
+ sizeof(struct rte_mbuf);
+ /* The size of the entire mbuf. */
+ mbuf_size = sizeof (struct dp_packet) +
+ mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM;
+ /* mbuf size, rounded up to cacheline size. */
+ aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE);
+ /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
+ * This maintains mbuf size cache alignment, while also honoring RX
+ * buffer alignment in the data portion of the mbuf. If this adjustment
+ * is not made, there is a possibility later on that for an element of
+ * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
+ * This is problematic in the case of multi-segment mbufs, particularly
+ * when an mbuf segment needs to be resized (when [push|popp]ing a VLAN
+ * header, for example).
+ */
+ mbuf_priv_data_len += (aligned_mbuf_size - mbuf_size);
+
mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
- sizeof (struct dp_packet) - sizeof (struct rte_mbuf),
- MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id);
+ mbuf_priv_data_len,
+ mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM, socket_id);
if (mp) {
VLOG_DBG("Allocated \"%s\" mempool with %u mbufs",
- mp_name, n_mbufs);
+ mp_name, n_mbufs);
/* rte_pktmbuf_pool_create has done some initialization of the
* rte_mbuf part of each dp_packet. Some OvS specific fields
* of the packet still need to be initialized by
@@ -685,13 +699,13 @@ static int
netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
OVS_REQUIRES(dev->mutex)
{
- uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
+ uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
struct rte_mempool *mp;
int ret = 0;
dpdk_mp_sweep();
- mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size));
+ mp = dpdk_mp_create(dev, buf_size);
if (!mp) {
VLOG_ERR("Failed to create memory pool for netdev "
"%s, with MTU %d on socket %d: %s\n",
From patchwork Tue May 1 17:02:08 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907135
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:08 +0100
Message-Id: <1525194134-248371-3-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
Subject: [ovs-dev] [RFC v5 2/8] dp-packet: init specific mbuf fields to 0
From: Mark Kavanagh
dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible that the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is therefore critical that these fields are
initialized to 0.

This patch ensures that the following mbuf fields are initialized to 0,
on creation of a new dp_packet:
- ol_flags
- nb_segs
- tx_offload
- packet_type

Adapted from an idea by Michael Qiu:
https://patchwork.ozlabs.org/patch/777570/
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
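Note: a minimal sketch of the initialization pattern described above,
assuming a hypothetical struct sketch_mbuf that carries only the four
fields named in the commit message plus one unrelated field. It is not the
real struct rte_mbuf layout, and plain malloc() stands in for xmalloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rte_mbuf fields named above. */
struct sketch_mbuf {
    uint64_t ol_flags;
    uint16_t nb_segs;
    uint64_t tx_offload;
    uint32_t packet_type;
    uint16_t data_len;      /* managed elsewhere; not reset here */
};

/* Reset only the fields whose stale values could be misread later. */
static void
sketch_mbuf_init(struct sketch_mbuf *m)
{
    m->ol_flags = 0;
    m->nb_segs = 0;
    m->tx_offload = 0;
    m->packet_type = 0;
}

/* Allocate a packet the way a non-DPDK path would: the memory returned by
 * malloc() has indeterminate contents, so the fields are reset at once. */
static struct sketch_mbuf *
sketch_mbuf_new(void)
{
    struct sketch_mbuf *m = malloc(sizeof *m);

    if (m) {
        sketch_mbuf_init(m);
    }
    return m;
}
```

Resetting individual fields rather than clearing the whole struct mirrors
the patch, which leaves the remaining mbuf fields to the code that already
manages them.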
lib/dp-packet.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 21c8ca5..9bfb7b7 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -626,13 +626,13 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED)
/* This initialization is needed for packets that do not come
* from DPDK interfaces, when vswitchd is built with --with-dpdk.
- * The DPDK rte library will still otherwise manage the mbuf.
- * We only need to initialize the mbuf ol_flags. */
+ * The DPDK rte library will still otherwise manage the mbuf. */
static inline void
dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
{
#ifdef DPDK_NETDEV
- p->mbuf.ol_flags = 0;
+ struct rte_mbuf *mbuf = &(p->mbuf);
+ mbuf->ol_flags = mbuf->nb_segs = mbuf->tx_offload = mbuf->packet_type = 0;
#endif
}
From patchwork Tue May 1 17:02:09 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907136
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:09 +0100
Message-Id: <1525194134-248371-4-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
Subject: [ovs-dev] [RFC v5 3/8] dp-packet: Add support for multi-seg mbufs
From: Mark Kavanagh
Some functions in dp-packet assume that the data held by a dp_packet is
contiguous, and perform operations such as pointer arithmetic under that
assumption. However, with the introduction of multi-segment mbufs, where
data is non-contiguous, such assumptions no longer hold. Thus,
dp_packet_put_uninit(), dp_packet_shift(), dp_packet_tail(),
dp_packet_end() and dp_packet_at() were modified to take multi-segment
mbufs into account.

Both dp_packet_put_uninit() and dp_packet_shift() are, in their current
implementation, operating on the data buffer of a dp_packet as if it
were contiguous, which in the case of multi-segment mbufs means they
operate on the first mbuf in the chain. However, in the case of
dp_packet_put_uninit(), for example, it is the data length of the last
mbuf in the mbuf chain that should be adjusted. Both functions have thus
been modified to support multi-segment mbufs.

Finally, dp_packet_tail(), dp_packet_end() and dp_packet_at() were also
modified to operate differently when dealing with multi-segment mbufs,
and now iterate over the non-contiguous data buffers for their
calculations.
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
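Note: the chain walk described above can be sketched without DPDK. This is
an illustration under stated assumptions: struct chain_seg is a
hypothetical stand-in for the next, data and data_len parts of an mbuf, and
chain_at_offset() and chain_tail() only approximate what
dp_packet_at_offset() and dp_packet_tail() do. The sketch treats an offset
equal to a segment's data_len as the first byte of the next segment,
whereas the patch compares with '>'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of an mbuf used when locating data. */
struct chain_seg {
    struct chain_seg *next;
    char *data;             /* first byte in use in this segment */
    uint16_t data_len;      /* bytes in use in this segment */
};

/* Returns a pointer to byte 'offset' of the data held by the chain, or NULL
 * if the chain holds 'offset' bytes or fewer. */
static char *
chain_at_offset(struct chain_seg *seg, size_t offset)
{
    while (seg && offset >= seg->data_len) {
        offset -= seg->data_len;
        seg = seg->next;
    }
    return seg ? seg->data + offset : NULL;
}

/* Returns the last segment of a non-empty chain. */
static struct chain_seg *
chain_last_seg(struct chain_seg *seg)
{
    assert(seg != NULL);
    while (seg->next) {
        seg = seg->next;
    }
    return seg;
}

/* Returns a pointer to the byte following the last byte in use, which lives
 * in the last segment rather than the first. */
static char *
chain_tail(struct chain_seg *head)
{
    struct chain_seg *last = chain_last_seg(head);

    return last->data + last->data_len;
}
```

With a first segment holding "abcd" and a second holding "efg", offset 4
resolves to 'e' in the second segment, and the tail is one byte past 'g'.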
lib/dp-packet.c | 44 ++++++++++++++++-
lib/dp-packet.h | 142 ++++++++++++++++++++++++++++++++++++++----------------
lib/netdev-dpdk.c | 8 +--
3 files changed, 147 insertions(+), 47 deletions(-)
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..fd9fad0 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -298,10 +298,33 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t size)
/* Shifts all of the data within the allocated space in 'b' by 'delta' bytes.
* For example, a 'delta' of 1 would cause each byte of data to move one byte
* forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each
- * byte to move one byte backward (from 'p' to 'p-1'). */
+ * byte to move one byte backward (from 'p' to 'p-1').
+ * Note for DPBUF_DPDK(XXX): The shift can only move by at most
+ * RTE_PKTMBUF_HEADROOM bytes (usually defined as 128 bytes), to either the
+ * left or the right.
+ */
void
dp_packet_shift(struct dp_packet *b, int delta)
{
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ ovs_assert(delta > 0 ? delta <= dp_packet_headroom(b)
+ : delta < 0 ? -delta <= dp_packet_headroom(b)
+ : true);
+
+ if (delta != 0) {
+ struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+ if (delta > 0) {
+ rte_pktmbuf_prepend(mbuf, delta);
+ } else {
+ rte_pktmbuf_prepend(mbuf, delta);
+ }
+ }
+
+ return;
+ }
+#endif
ovs_assert(delta > 0 ? delta <= dp_packet_tailroom(b)
: delta < 0 ? -delta <= dp_packet_headroom(b)
: true);
@@ -315,14 +338,31 @@ dp_packet_shift(struct dp_packet *b, int delta)
/* Appends 'size' bytes of data to the tail end of 'b', reallocating and
* copying its data if necessary. Returns a pointer to the first byte of the
- * new data, which is left uninitialized. */
+ * new data, which is left uninitialized.
+ * Note for DPBUF_DPDK(XXX): In this case there must be enough tailroom to put
+ * the data in, otherwise this will result in a call to ovs_abort(). */
void *
dp_packet_put_uninit(struct dp_packet *b, size_t size)
{
void *p;
dp_packet_prealloc_tailroom(b, size);
p = dp_packet_tail(b);
+#ifdef DPDK_NETDEV
+ if (b->source == DPBUF_DPDK) {
+ /* In the case of multi-segment mbufs, the data length of the last mbuf
+ * should be adjusted by 'size' bytes. The packet length of the entire
+ * mbuf chain (stored in the first mbuf of said chain) is adjusted in
+ * the normal execution path below.
+ */
+ struct rte_mbuf *buf = &(b->mbuf);
+ buf = rte_pktmbuf_lastseg(buf);
+
+ buf->data_len += size;
+ }
+#endif
+
dp_packet_set_size(b, dp_packet_size(b) + size);
+
return p;
}
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 9bfb7b7..d6512cf 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -55,12 +55,12 @@ struct dp_packet {
struct rte_mbuf mbuf; /* DPDK mbuf */
#else
void *base_; /* First byte of allocated space. */
- uint16_t allocated_; /* Number of bytes allocated. */
uint16_t data_ofs; /* First byte actually in use. */
uint32_t size_; /* Number of bytes in use. */
uint32_t rss_hash; /* Packet hash. */
bool rss_hash_valid; /* Is the 'rss_hash' valid? */
#endif
+ uint16_t allocated_; /* Number of bytes allocated. */
enum dp_packet_source source; /* Source of memory allocated as 'base'. */
/* All the following elements of this struct are copied in a single call
@@ -133,6 +133,8 @@ static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
size_t size);
static inline void *dp_packet_at_assert(const struct dp_packet *,
size_t offset, size_t size);
+static inline void *dp_packet_at_offset(const struct dp_packet *b,
+ size_t offset);
static inline void *dp_packet_tail(const struct dp_packet *);
static inline void *dp_packet_end(const struct dp_packet *);
@@ -180,40 +182,6 @@ dp_packet_delete(struct dp_packet *b)
}
}
-/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
- * byte 'offset'. Otherwise, returns a null pointer. */
-static inline void *
-dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
-{
- return offset + size <= dp_packet_size(b)
- ? (char *) dp_packet_data(b) + offset
- : NULL;
-}
-
-/* Returns a pointer to byte 'offset' in 'b', which must contain at least
- * 'offset + size' bytes of data. */
-static inline void *
-dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
-{
- ovs_assert(offset + size <= dp_packet_size(b));
- return ((char *) dp_packet_data(b)) + offset;
-}
-
-/* Returns a pointer to byte following the last byte of data in use in 'b'. */
-static inline void *
-dp_packet_tail(const struct dp_packet *b)
-{
- return (char *) dp_packet_data(b) + dp_packet_size(b);
-}
-
-/* Returns a pointer to byte following the last byte allocated for use (but
- * not necessarily in use) in 'b'. */
-static inline void *
-dp_packet_end(const struct dp_packet *b)
-{
- return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
-}
-
/* Returns the number of bytes of headroom in 'b', that is, the number of bytes
* of unused space in dp_packet 'b' before the data that is in use. (Most
* commonly, the data in a dp_packet is at its beginning, and thus the
@@ -454,18 +422,107 @@ __packet_set_data(struct dp_packet *b, uint16_t v)
b->mbuf.data_off = v;
}
-static inline uint16_t
-dp_packet_get_allocated(const struct dp_packet *b)
+static inline void *
+dp_packet_at_offset(const struct dp_packet *b, size_t offset)
{
- return b->mbuf.buf_len;
+ if (b->source == DPBUF_DPDK) {
+ struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+ while (buf && offset > buf->data_len) {
+ offset -= buf->data_len;
+
+ buf = buf->next;
+ }
+ return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL;
+ } else {
+ return (char *) dp_packet_data(b) + offset;
+ }
}
-static inline void
-dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
+/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
+ * byte 'offset'. Otherwise, returns a null pointer. */
+static inline void *
+dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
+{
+ return offset + size <= dp_packet_size(b)
+ ? dp_packet_at_offset(b, offset)
+ : NULL;
+}
+
+/* Returns a pointer to byte 'offset' in 'b', which must contain at least
+ * 'offset + size' bytes of data. */
+static inline void *
+dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
{
- b->mbuf.buf_len = s;
+ ovs_assert(offset + size <= dp_packet_size(b));
+ return dp_packet_at_offset(b, offset);
+}
+
+/* Returns a pointer to byte following the last byte of data in use in 'b'. */
+static inline void *
+dp_packet_tail(const struct dp_packet *b)
+{
+ if (b->source == DPBUF_DPDK) {
+ struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+ buf = rte_pktmbuf_lastseg(buf);
+
+ return rte_pktmbuf_mtod_offset(buf, char *, buf->data_len);
+ } else {
+ return (char *) dp_packet_data(b) + dp_packet_size(b);
+ }
+}
+
+/* Returns a pointer to byte following the last byte allocated for use (but
+ * not necessarily in use) in 'b'. */
+static inline void *
+dp_packet_end(const struct dp_packet *b)
+{
+ if (b->source == DPBUF_DPDK) {
+ struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &(b->mbuf));
+
+ buf = rte_pktmbuf_lastseg(buf);
+
+ return (char *) buf->buf_addr + buf->buf_len;
+ } else {
+ return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
+ }
}
#else
+/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
+ * byte 'offset'. Otherwise, returns a null pointer. */
+static inline void *
+dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
+{
+ return offset + size <= dp_packet_size(b)
+ ? (char *) dp_packet_data(b) + offset
+ : NULL;
+}
+
+/* Returns a pointer to byte 'offset' in 'b', which must contain at least
+ * 'offset + size' bytes of data. */
+static inline void *
+dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
+{
+ ovs_assert(offset + size <= dp_packet_size(b));
+ return ((char *) dp_packet_data(b)) + offset;
+}
+
+/* Returns a pointer to byte following the last byte of data in use in 'b'. */
+static inline void *
+dp_packet_tail(const struct dp_packet *b)
+{
+ return (char *) dp_packet_data(b) + dp_packet_size(b);
+}
+
+/* Returns a pointer to byte following the last byte allocated for use (but
+ * not necessarily in use) in 'b'. */
+static inline void *
+dp_packet_end(const struct dp_packet *b)
+{
+ return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
+}
+
static inline void *
dp_packet_base(const struct dp_packet *b)
{
@@ -502,6 +559,8 @@ __packet_set_data(struct dp_packet *b, uint16_t v)
b->data_ofs = v;
}
+#endif
+
static inline uint16_t
dp_packet_get_allocated(const struct dp_packet *b)
{
@@ -513,7 +572,6 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
{
b->allocated_ = s;
}
-#endif
static inline void
dp_packet_reset_cutlen(struct dp_packet *b)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 648a1de..7008492 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -506,13 +506,14 @@ free_dpdk_buf(struct dp_packet *p)
static void
ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
- void *opaque_arg OVS_UNUSED,
+ void *opaque_arg,
void *_p,
unsigned i OVS_UNUSED)
{
struct rte_mbuf *pkt = _p;
+ uint16_t allocated = *(uint16_t *) opaque_arg;
- dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len);
+ dp_packet_init_dpdk((struct dp_packet *) pkt, allocated);
}
static int
@@ -643,7 +644,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
* rte_mbuf part of each dp_packet. Some OvS specific fields
* of the packet still need to be initialized by
* ovs_rte_pktmbuf_init. */
- rte_mempool_obj_iter(mp, ovs_rte_pktmbuf_init, NULL);
+ uint16_t allocated_bytes = dpdk_buf_size(dev->requested_mtu);
+ rte_mempool_obj_iter(mp, ovs_rte_pktmbuf_init, &allocated_bytes);
} else if (rte_errno == EEXIST) {
/* A mempool with the same name already exists. We just
* retrieve its pointer to be returned to the caller. */
From patchwork Tue May 1 17:02:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907140
X-Patchwork-Delegate: ian.stokes@intel.com
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:10 +0100
Message-Id: <1525194134-248371-5-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
Cc: Marcin Ksiadz ,
Przemyslaw Lal ,
Michael Qiu
Subject: [ovs-dev] [RFC v5 4/8] dp-packet: Fix data_len issue with multi-seg
mbufs
When a dp_packet is from a DPDK source, and it contains multi-segment
mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
the data_len of each mbuf in the chain should be considered while
distributing the new (provided) size.

To account for the above, dp_packet_set_size() has been changed so that,
in the multi-segment mbufs case, only the data_len on the last mbuf of
the chain and the total size of the packet, pkt_len, are changed. The
data_len on the intermediate mbufs preceding the last mbuf is not
changed by dp_packet_set_size(). Furthermore, in some cases
dp_packet_set_size() may be used to set a smaller size than the current
packet size, thus effectively trimming the end of the packet. In the
multi-segment mbufs case this may leave lingering mbufs at the end of
the chain, which need to be freed.

__packet_set_data() now also updates an mbuf's data_len after setting
the data offset. This is so that both fields are always in sync for each
mbuf in a chain.
Co-authored-by: Michael Qiu
Co-authored-by: Mark Kavanagh
Co-authored-by: Przemyslaw Lal
Co-authored-by: Marcin Ksiadz
Co-authored-by: Yuanhan Liu
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Przemyslaw Lal
Signed-off-by: Marcin Ksiadz
Signed-off-by: Yuanhan Liu
Signed-off-by: Tiago Lam
---
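Note: the trimming behaviour described above can be sketched on a plain
linked list. This is a minimal illustration under stated assumptions:
struct trim_seg and struct trim_pkt are hypothetical stand-ins for an mbuf
chain and its pkt_len, malloc() and free() replace the mempool, and
trim_chain() handles only the shrinking case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rte_mbuf fields that matter when trimming. */
struct trim_seg {
    struct trim_seg *next;
    uint16_t data_len;      /* bytes in use in this segment */
};

struct trim_pkt {
    struct trim_seg *head;
    uint32_t pkt_len;       /* bytes in use across the whole chain */
};

static struct trim_seg *
trim_seg_new(uint16_t data_len, struct trim_seg *next)
{
    struct trim_seg *seg = malloc(sizeof *seg);

    if (seg) {
        seg->data_len = data_len;
        seg->next = next;
    }
    return seg;
}

static void
trim_free_chain(struct trim_seg *seg)
{
    while (seg) {
        struct trim_seg *next = seg->next;

        free(seg);
        seg = next;
    }
}

/* Shrinks 'pkt' to 'new_len' bytes. Segments before the cut keep their
 * data_len, the segment containing the cut is shortened, and every segment
 * after it is freed. */
static void
trim_chain(struct trim_pkt *pkt, uint32_t new_len)
{
    struct trim_seg *seg = pkt->head;
    uint32_t remaining = new_len;

    assert(new_len <= pkt->pkt_len);
    while (seg) {
        uint16_t keep = remaining < seg->data_len
                        ? (uint16_t) remaining : seg->data_len;

        seg->data_len = keep;
        remaining -= keep;
        if (remaining == 0) {
            trim_free_chain(seg->next);
            seg->next = NULL;
        }
        seg = seg->next;
    }
    pkt->pkt_len = new_len;
}
```

Trimming a 100B + 100B + 50B chain to 120B leaves the first segment at
100B, shortens the second to 20B and frees the third.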
lib/dp-packet.h | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index d6512cf..93b0aaf 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -397,17 +397,31 @@ dp_packet_size(const struct dp_packet *b)
static inline void
dp_packet_set_size(struct dp_packet *b, uint32_t v)
{
- /* netdev-dpdk does not currently support segmentation; consequently, for
- * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
- * be used interchangably.
- *
- * On the datapath, it is expected that the size of packets
- * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
- * loss of accuracy in assigning 'v' to 'data_len'.
- */
- b->mbuf.data_len = (uint16_t)v; /* Current seg length. */
- b->mbuf.pkt_len = v; /* Total length of all segments linked to
- * this segment. */
+ if (b->source == DPBUF_DPDK) {
+ struct rte_mbuf *seg = &b->mbuf;
+ uint32_t pkt_len = v;
+ uint16_t seg_len;
+
+ /* Trim the chain to 'v' bytes in total, freeing any trailing buffers
+ * that are left floating past the new end. */
+ while (seg) {
+ seg_len = MIN(pkt_len, seg->data_len);
+ seg->data_len = seg_len;
+
+ pkt_len -= seg_len;
+ if (pkt_len == 0) {
+ /* Free the rest of chained mbufs */
+ rte_pktmbuf_free(seg->next);
+ seg->next = NULL;
+ }
+ seg = seg->next;
+ }
+ } else {
+ b->mbuf.data_len = v;
+ }
+
+ /* Total length of all segments linked to this segment. */
+ b->mbuf.pkt_len = v;
}
static inline uint16_t
@@ -420,6 +434,8 @@ static inline void
__packet_set_data(struct dp_packet *b, uint16_t v)
{
b->mbuf.data_off = v;
+ /* When dealing with DPDK mbufs, keep data_off and data_len in sync */
+ b->mbuf.data_len = b->mbuf.buf_len - b->mbuf.data_off;
}
static inline void *
From patchwork Tue May 1 17:02:11 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907141
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=none (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 40b78C3Kzkz9s1w
for ;
Wed, 2 May 2018 03:05:31 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 464C3900;
Tue, 1 May 2018 17:02:53 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id E32FD900
for ; Tue, 1 May 2018 17:02:51 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 3CF9E62A
for ; Tue, 1 May 2018 17:02:51 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
01 May 2018 10:02:50 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485451"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:50 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:11 +0100
Message-Id: <1525194134-248371-6-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: Michael Qiu
Subject: [ovs-dev] [RFC v5 5/8] dp-packet: copy mbuf info for packet copy
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
From: Michael Qiu
Currently, when a packet is copied, much of the DPDK mbuf's
information, such as the packet type and ol_flags, is lost.
This information is very important for DPDK packet
processing.
Co-authored-by: Mark Kavanagh
[mark.b.kavanagh@intel.com rebased]
Co-authored-by: Tiago Lam
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
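As a rough illustration of the problem described in the commit message,
the sketch below copies a packet's payload and then carries the offload
metadata across explicitly. 'struct pkt', 'struct pkt_meta' and both
functions are hypothetical stand-ins, not OVS or DPDK definitions; the
field names mirror the rte_mbuf fields touched by the patch. nb_segs is
left out because the stand-in packet has a single buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the rte_mbuf metadata named in the patch. */
struct pkt_meta {
    uint64_t ol_flags;      /* Offload flags. */
    uint64_t tx_offload;    /* Tx offload header lengths, packed. */
    uint32_t packet_type;   /* Parsed L2/L3/L4 packet type. */
};

/* Hypothetical single-buffer packet. */
struct pkt {
    struct pkt_meta meta;
    uint32_t len;
    uint8_t data[64];
};

/* Copies only the metadata; the payload is left untouched. */
static void
pkt_copy_meta(struct pkt *dst, const struct pkt *src)
{
    assert(dst != NULL && src != NULL);
    dst->meta = src->meta;
}

/* Copies the payload, then the metadata.  Without the second step the
 * copy would carry the bytes but none of the offload information.
 * Returns 0 on success, -1 if the payload does not fit. */
static int
pkt_copy(struct pkt *dst, const struct pkt *src)
{
    if (src->len > sizeof dst->data) {
        return -1;
    }
    memcpy(dst->data, src->data, src->len);
    dst->len = src->len;
    pkt_copy_meta(dst, src);
    return 0;
}
```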
lib/dp-packet.c | 25 +++++++++++++++++++------
lib/dp-packet.h | 3 +++
lib/netdev-dpdk.c | 1 +
3 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index fd9fad0..a2793f7 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -48,6 +48,22 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
dp_packet_init__(b, allocated, source);
}
+#ifdef DPDK_NETDEV
+void
+dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src) {
+ ovs_assert(dst != NULL && src != NULL);
+ struct rte_mbuf *buf_dst = &(dst->mbuf);
+ struct rte_mbuf buf_src = src->mbuf;
+
+ buf_dst->nb_segs = buf_src.nb_segs;
+ buf_dst->ol_flags = buf_src.ol_flags;
+ buf_dst->packet_type = buf_src.packet_type;
+ buf_dst->tx_offload = buf_src.tx_offload;
+}
+#else
+#define dp_packet_copy_mbuf_flags(arg1, arg2)
+#endif
+
/* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
* memory starting at 'base'. 'base' should be the first byte of a region
* obtained from malloc(). It will be freed (with free()) if 'b' is resized or
@@ -177,15 +193,12 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
offsetof(struct dp_packet, l2_pad_size));
#ifdef DPDK_NETDEV
- new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
-#else
- new_buffer->rss_hash_valid = buffer->rss_hash_valid;
-#endif
-
+ dp_packet_copy_mbuf_flags(new_buffer, buffer);
if (dp_packet_rss_valid(new_buffer)) {
-#ifdef DPDK_NETDEV
new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
#else
+ new_buffer->rss_hash_valid = buffer->rss_hash_valid;
+ if (dp_packet_rss_valid(new_buffer)) {
new_buffer->rss_hash = buffer->rss_hash;
#endif
}
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 93b0aaf..4607699 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -119,6 +119,9 @@ void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
void dp_packet_init(struct dp_packet *, size_t);
void dp_packet_uninit(struct dp_packet *);
+void dp_packet_copy_mbuf_flags(struct dp_packet *dst,
+ const struct dp_packet *src);
+
struct dp_packet *dp_packet_new(size_t);
struct dp_packet *dp_packet_new_with_headroom(size_t, size_t headroom);
struct dp_packet *dp_packet_clone(const struct dp_packet *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7008492..c9de742 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2149,6 +2149,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
dp_packet_data(packet), size);
dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
+ dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
txcnt++;
}
From patchwork Tue May 1 17:02:12 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907142
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=none (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 40b78m4gLBz9s3D
for ;
Wed, 2 May 2018 03:06:00 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 2102B9CD;
Tue, 1 May 2018 17:02:57 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id C6BB298B
for ; Tue, 1 May 2018 17:02:55 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id BE66D67D
for ; Tue, 1 May 2018 17:02:53 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
01 May 2018 10:02:53 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485460"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:52 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:12 +0100
Message-Id: <1525194134-248371-7-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: Michael Qiu
Subject: [ovs-dev] [RFC v5 6/8] dp-packet: copy data from multi-seg. DPDK
mbuf
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
From: Michael Qiu
When cloning a packet whose source is a DPDK driver, multi-segment
mbufs must be taken into account: the data of each segment is
copied, one segment at a time.
Co-authored-by: Mark Kavanagh
Co-authored-by: Tiago Lam
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
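The segment-by-segment copy described in the commit message amounts to
linearising a chain into one contiguous buffer. A minimal sketch,
assuming a hypothetical 'struct seg' in place of rte_mbuf and plain
memcpy() in place of rte_memcpy():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one segment of a multi-segment mbuf. */
struct seg {
    const struct seg *next;
    const uint8_t *data;    /* Start of this segment's payload. */
    uint16_t data_len;      /* Payload bytes in this segment. */
};

/* Copies the payload of every segment in the chain, in order, into 'dst',
 * which can hold 'cap' bytes.  Returns the number of bytes copied, or
 * SIZE_MAX if the chain does not fit in 'dst'. */
static size_t
seg_chain_linearize(const struct seg *head, uint8_t *dst, size_t cap)
{
    size_t offset = 0;

    assert(dst != NULL);
    for (const struct seg *s = head; s; s = s->next) {
        if (s->data_len > cap - offset) {
            return SIZE_MAX;
        }
        if (s->data_len) {
            memcpy(dst + offset, s->data, s->data_len);
        }
        offset += s->data_len;
    }
    return offset;
}
```

The running 'offset' is what keeps each segment's data adjacent to the
previous one in the destination; the capacity check guards against a
destination that was sized from a stale total length.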
lib/dp-packet.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 47 insertions(+), 8 deletions(-)
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index a2793f7..85db57a 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -175,6 +175,49 @@ dp_packet_clone(const struct dp_packet *buffer)
return dp_packet_clone_with_headroom(buffer, 0);
}
+#ifdef DPDK_NETDEV
+struct dp_packet *
+dp_packet_clone_with_headroom(const struct dp_packet *buffer,
+ size_t headroom) {
+ struct dp_packet *new_buffer;
+ uint32_t pkt_len = dp_packet_size(buffer);
+
+ /* copy multi-seg data */
+ if (buffer->source == DPBUF_DPDK && buffer->mbuf.nb_segs > 1) {
+ uint32_t offset = 0;
+ void *dst = NULL;
+ struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *,
+ &(buffer->mbuf));
+
+ new_buffer = dp_packet_new_with_headroom(pkt_len, headroom);
+ dp_packet_set_size(new_buffer, pkt_len + headroom);
+ dst = dp_packet_tail(new_buffer);
+
+ while (tmbuf) {
+ rte_memcpy((char *)dst + offset,
+ rte_pktmbuf_mtod(tmbuf, void *), tmbuf->data_len);
+ offset += tmbuf->data_len;
+ tmbuf = tmbuf->next;
+ }
+ } else {
+ new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
+ pkt_len, headroom);
+ }
+
+ /* Copy the following fields into the returned buffer: l2_pad_size,
+ * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+ memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
+ sizeof(struct dp_packet) -
+ offsetof(struct dp_packet, l2_pad_size));
+
+ dp_packet_copy_mbuf_flags(new_buffer, buffer);
+ if (dp_packet_rss_valid(new_buffer)) {
+ new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
+ }
+
+ return new_buffer;
+}
+#else
/* Creates and returns a new dp_packet whose data are copied from 'buffer'.
* The returned dp_packet will additionally have 'headroom' bytes of
* headroom. */
@@ -182,29 +225,25 @@ struct dp_packet *
dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
{
struct dp_packet *new_buffer;
+ uint32_t pkt_len = dp_packet_size(buffer);
new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
- dp_packet_size(buffer),
- headroom);
+ pkt_len, headroom);
+
/* Copy the following fields into the returned buffer: l2_pad_size,
* l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
sizeof(struct dp_packet) -
offsetof(struct dp_packet, l2_pad_size));
-#ifdef DPDK_NETDEV
- dp_packet_copy_mbuf_flags(new_buffer, buffer);
- if (dp_packet_rss_valid(new_buffer)) {
- new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss;
-#else
new_buffer->rss_hash_valid = buffer->rss_hash_valid;
if (dp_packet_rss_valid(new_buffer)) {
new_buffer->rss_hash = buffer->rss_hash;
-#endif
}
return new_buffer;
}
+#endif
/* Creates and returns a new dp_packet that initially contains a copy of the
* 'size' bytes of data starting at 'data' with no headroom or tailroom. */
From patchwork Tue May 1 17:02:13 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907143
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=none (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 40b79N1dgyz9s1w
for ;
Wed, 2 May 2018 03:06:32 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 0110B9D1;
Tue, 1 May 2018 17:02:58 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 7CD4C957
for ; Tue, 1 May 2018 17:02:56 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 0EFD967D
for ; Tue, 1 May 2018 17:02:56 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
01 May 2018 10:02:55 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485473"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:55 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:13 +0100
Message-Id: <1525194134-248371-8-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Cc: Michael Qiu
Subject: [ovs-dev] [RFC v5 7/8] netdev-dpdk: copy large packet to multi-seg.
mbufs
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
From: Mark Kavanagh
Currently, dpdk_do_tx_copy() only copies a packet to a single
segment. This could be an issue in the case of jumbo frames,
particularly when multi-segment mbufs are involved, since a
single mbuf may then be too small to hold the whole frame.
This patch calculates the number of segments needed by a
packet and copies the data to each segment.
Co-authored-by: Michael Qiu
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Michael Qiu
Signed-off-by: Tiago Lam
---
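The two steps named in the commit message, counting the segments a
packet needs and then copying its data into them, can be sketched as
below. This is an illustration only: 'struct seg', SEG_CAP and both
functions are hypothetical, the segments come from a caller-supplied
array rather than a mempool, and SEG_CAP is deliberately tiny so the
arithmetic is easy to follow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SEG_CAP = 8 };   /* Hypothetical per-segment data capacity. */

/* Hypothetical fixed-capacity segment. */
struct seg {
    uint16_t data_len;
    uint8_t data[SEG_CAP];
};

/* Returns how many segments of 'seg_cap' bytes are needed for 'size'
 * bytes.  A zero-length packet still occupies its head segment. */
static uint32_t
segs_needed(uint32_t size, uint16_t seg_cap)
{
    assert(seg_cap > 0);
    if (size == 0) {
        return 1;
    }
    return size / seg_cap + (size % seg_cap != 0);
}

/* Spreads 'size' bytes from 'src' across 'segs', filling each segment
 * before moving to the next.  Returns the number of segments used, or 0
 * if 'n_segs' is too small (nothing is written in that case). */
static uint32_t
scatter_copy(const uint8_t *src, uint32_t size,
             struct seg *segs, uint32_t n_segs)
{
    uint32_t needed = segs_needed(size, SEG_CAP);
    uint32_t offset = 0;

    if (needed > n_segs) {
        return 0;
    }
    for (uint32_t i = 0; i < needed; i++) {
        uint32_t left = size - offset;
        uint16_t chunk = left < SEG_CAP ? (uint16_t) left : SEG_CAP;

        if (chunk) {
            memcpy(segs[i].data, src + offset, chunk);
        }
        segs[i].data_len = chunk;
        offset += chunk;
    }
    return needed;
}
```

With SEG_CAP set to 8, a 20-byte packet needs three segments holding 8,
8 and 4 bytes.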
lib/netdev-dpdk.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 68 insertions(+), 10 deletions(-)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c9de742..4c6a3c0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2101,6 +2101,71 @@ out:
}
}
+static int
+dpdk_prep_tx_buf(struct dp_packet *packet, struct rte_mbuf **head,
+ struct rte_mempool *mp)
+{
+ struct rte_mbuf *temp;
+ uint32_t size = dp_packet_size(packet);
+ uint16_t max_data_len, data_len;
+ uint32_t nb_segs = 0;
+ int i;
+
+ temp = *head = rte_pktmbuf_alloc(mp);
+ if (OVS_UNLIKELY(!temp)) {
+ return 1;
+ }
+
+ /* All newly allocated mbufs have the same max data length. */
+ max_data_len = temp->buf_len - temp->data_off;
+
+ /* Calculate # of output mbufs. */
+ nb_segs = size / max_data_len;
+ if (size % max_data_len) {
+ nb_segs = nb_segs + 1;
+ }
+
+ /* Allocate additional mbufs when multiple output mbufs required. */
+ for (i = 1; i < nb_segs; i++) {
+ temp->next = rte_pktmbuf_alloc(mp);
+ if (!temp->next) {
+ rte_pktmbuf_free(*head);
+ *head = NULL;
+ return 1;
+ }
+ temp = temp->next;
+ }
+ /* We have to do a copy for now */
+ rte_pktmbuf_pkt_len(*head) = size;
+ temp = *head;
+
+ data_len = size < max_data_len ? size: max_data_len;
+ if (packet->source == DPBUF_DPDK) {
+ *head = &(packet->mbuf);
+ while (temp && head && size > 0) {
+ rte_memcpy(rte_pktmbuf_mtod(temp, void *),
+ dp_packet_data((struct dp_packet *)head), data_len);
+ rte_pktmbuf_data_len(temp) = data_len;
+ *head = (*head)->next;
+ size = size - data_len;
+ data_len = size < max_data_len ? size: max_data_len;
+ temp = temp->next;
+ }
+ } else {
+ int offset = 0;
+ while (temp && size > 0) {
+ memcpy(rte_pktmbuf_mtod(temp, void *),
+ dp_packet_at(packet, offset, data_len), data_len);
+ rte_pktmbuf_data_len(temp) = data_len;
+ temp = temp->next;
+ size = size - data_len;
+ offset += data_len;
+ data_len = size < max_data_len ? size: max_data_len;
+ }
+ }
+ return 0;
+}
+
/* Tx function. Transmit packets indefinitely */
static void
dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
@@ -2117,6 +2182,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
uint32_t cnt = batch_cnt;
uint32_t dropped = 0;
+ uint32_t i;
if (dev->type != DPDK_DEV_VHOST) {
/* Check if QoS has been configured for this netdev. */
@@ -2127,27 +2193,19 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
uint32_t txcnt = 0;
- for (uint32_t i = 0; i < cnt; i++) {
+ for (i = 0; i < cnt; i++) {
struct dp_packet *packet = batch->packets[i];
uint32_t size = dp_packet_size(packet);
-
if (OVS_UNLIKELY(size > dev->max_packet_len)) {
VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
size, dev->max_packet_len);
-
dropped++;
continue;
}
-
- pkts[txcnt] = rte_pktmbuf_alloc(dev->mp);
- if (OVS_UNLIKELY(!pkts[txcnt])) {
+ if (dpdk_prep_tx_buf(packet, &pkts[txcnt], dev->mp)) {
dropped += cnt - i;
break;
}
-
- /* We have to do a copy for now */
- memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
- dp_packet_data(packet), size);
dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet);
From patchwork Tue May 1 17:02:14 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 907144
X-Patchwork-Delegate: ian.stokes@intel.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
spf=pass (mailfrom) smtp.mailfrom=openvswitch.org
(client-ip=140.211.169.12; helo=mail.linuxfoundation.org;
envelope-from=ovs-dev-bounces@openvswitch.org;
receiver=)
Authentication-Results: ozlabs.org;
dmarc=none (p=none dis=none) header.from=intel.com
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org
[140.211.169.12])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
bits)) (No client certificate requested)
by ozlabs.org (Postfix) with ESMTPS id 40b7B56l58z9s35
for ;
Wed, 2 May 2018 03:07:09 +1000 (AEST)
Received: from mail.linux-foundation.org (localhost [127.0.0.1])
by mail.linuxfoundation.org (Postfix) with ESMTP id 16E379D0;
Tue, 1 May 2018 17:03:00 +0000 (UTC)
X-Original-To: ovs-dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org
[172.17.192.35])
by mail.linuxfoundation.org (Postfix) with ESMTPS id 18108957
for ; Tue, 1 May 2018 17:02:59 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 4CB4E6D0
for ; Tue, 1 May 2018 17:02:58 +0000 (UTC)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
01 May 2018 10:02:57 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485484"
Received: from silpixa00399125.ir.intel.com ([10.237.223.34])
by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:57 -0700
From: Tiago Lam
To: ovs-dev@openvswitch.org
Date: Tue, 1 May 2018 18:02:14 +0100
Message-Id: <1525194134-248371-9-git-send-email-tiago.lam@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
smtp1.linux-foundation.org
Subject: [ovs-dev] [RFC v5 8/8] netdev-dpdk: support multi-segment jumbo
frames
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org
From: Mark Kavanagh
Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.
With the multi-segment approach, a port uses a single mempool
(containing standard/default-sized mbufs of ~2k bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.
== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt at initialization. A new OVSDB
field, 'dpdk-multi-seg-mbufs', facilitates this. This is a global
boolean value, which determines how jumbo frames are represented across
all DPDK ports. In the absence of a user-supplied value,
'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must
be explicitly enabled and single-segment mbufs remain the default.
Setting the field is identical to setting existing DPDK-specific OVSDB
fields:
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
Co-authored-by: Tiago Lam
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
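The difference between the two modes described in the commit message
can be made concrete with a little arithmetic. The sketch below is a
simplification under stated assumptions: it counts only the Ethernet
header and CRC on top of the MTU, ignores mbuf headroom, VLAN tags and
alignment, and sizes multi-segment mbufs for a 1500-byte MTU rather
than the ~2k bytes mentioned above. All names are hypothetical and are
not the OVS helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    ETH_HDR_LEN = 14,   /* Ethernet header, no VLAN tag. */
    ETH_CRC_LEN = 4,    /* Frame check sequence. */
    STD_MTU = 1500      /* Standard Ethernet MTU. */
};

/* On-wire frame length for a given MTU (simplified). */
static uint32_t
frame_len(uint32_t mtu)
{
    return mtu + ETH_HDR_LEN + ETH_CRC_LEN;
}

/* Data room each mbuf needs.  Single-segment mode grows the mbuf to fit
 * the requested MTU; multi-segment mode keeps the standard size. */
static uint32_t
mbuf_data_size(uint32_t requested_mtu, bool multi_seg)
{
    return frame_len(multi_seg ? STD_MTU : requested_mtu);
}

/* Number of chained mbufs needed to hold one maximum-sized frame. */
static uint32_t
segs_per_frame(uint32_t requested_mtu, bool multi_seg)
{
    uint32_t need = frame_len(requested_mtu);
    uint32_t cap = mbuf_data_size(requested_mtu, multi_seg);

    return need / cap + (need % cap != 0);
}
```

Under these assumptions a 9000-byte MTU needs one 9018-byte mbuf in
single-segment mode, or a chain of six 1518-byte mbufs in multi-segment
mode.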
NEWS | 1 +
lib/dpdk.c | 7 +++++++
lib/netdev-dpdk.c | 52 +++++++++++++++++++++++++++++++++++++++++++++-------
lib/netdev-dpdk.h | 1 +
vswitchd/vswitch.xml | 20 ++++++++++++++++++++
5 files changed, 74 insertions(+), 7 deletions(-)
diff --git a/NEWS b/NEWS
index d22ad14..e6752d6 100644
--- a/NEWS
+++ b/NEWS
@@ -92,6 +92,7 @@ v2.9.0 - 19 Feb 2018
pmd assignments.
* Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
* Add support for vHost dequeue zero copy (experimental)
+ * Add support for multi-segment mbufs
- Userspace datapath:
* Output packet batching support.
- vswitchd:
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 00dd974..1447724 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -459,6 +459,13 @@ dpdk_init__(const struct smap *ovs_other_config)
/* Finally, register the dpdk classes */
netdev_dpdk_register();
+
+ bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+ "dpdk-multi-seg-mbufs", false);
+ if (multi_seg_mbufs_enable) {
+ VLOG_INFO("DPDK multi-segment mbufs enabled");
+ netdev_dpdk_multi_segment_mbufs_enable();
+ }
}
void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4c6a3c0..5746ae0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
#define DPDK_PORT_WATCHDOG_INTERVAL 5
@@ -593,6 +594,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
+ dev->requested_n_txq * dev->requested_txq_size
+ MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
+ MIN_NB_MBUF;
+ /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */
ovs_mutex_lock(&dpdk_mp_mutex);
do {
@@ -693,7 +695,13 @@ dpdk_mp_release(struct rte_mempool *mp)
/* Tries to allocate a new mempool - or re-use an existing one where
* appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ * together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ * non-standard-sized mbufs. Each mbuf in the mempool is large enough to
+ * fully accommodate packets of size 'requested_mtu'.
* On success - or when re-using an existing mempool - the new configuration
* will be applied.
* On error, device will be left unchanged. */
@@ -701,10 +709,18 @@ static int
netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
OVS_REQUIRES(dev->mutex)
{
- uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+ uint16_t buf_size = 0;
struct rte_mempool *mp;
int ret = 0;
+ /* Contiguous mbufs in use - permit oversized mbufs */
+ if (!dpdk_multi_segment_mbufs) {
+ buf_size = dpdk_buf_size(dev->requested_mtu);
+ } else {
+ /* multi-segment mbufs - use standard mbuf size */
+ buf_size = dpdk_buf_size(ETHER_MTU);
+ }
+
dpdk_mp_sweep();
mp = dpdk_mp_create(dev, buf_size);
@@ -786,11 +802,25 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
int diag = 0;
int i;
struct rte_eth_conf conf = port_conf;
+ struct rte_eth_txconf txconf;
+
+ /* Multi-segment-mbuf-specific setup. */
+ if (dpdk_multi_segment_mbufs) {
+ struct rte_eth_dev_info dev_info;
+
+ /* DPDK PMDs typically attempt to use simple or vectorized
+ * transmit functions, neither of which are compatible with
+ * multi-segment mbufs. Ensure that these are disabled when
+ * multi-segment mbufs are enabled.
+ */
+ rte_eth_dev_info_get(dev->port_id, &dev_info);
+ txconf = dev_info.default_txconf;
+ txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
- /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
- * enabled. */
- if (dev->mtu > ETHER_MTU) {
- conf.rxmode.enable_scatter = 1;
+ /* For some NICs (e.g. Niantic), scattered_rx mode (required for
+ * ingress jumbo frames when multi-segments are enabled) needs to
+ * be explicitly enabled. */
+ conf.rxmode.enable_scatter = 1;
}
conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
@@ -821,7 +851,9 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
for (i = 0; i < n_txq; i++) {
diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
- dev->socket_id, NULL);
+ dev->socket_id,
+ dpdk_multi_segment_mbufs ? &txconf
+ : NULL);
if (diag) {
VLOG_INFO("Interface %s unable to setup txq(%d): %s",
dev->up.name, i, rte_strerror(-diag));
@@ -3868,6 +3900,12 @@ unlock:
return err;
}
+void
+netdev_dpdk_multi_segment_mbufs_enable(void)
+{
+ dpdk_multi_segment_mbufs = true;
+}
+
#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \
SET_CONFIG, SET_TX_MULTIQ, SEND, \
GET_CARRIER, GET_STATS, \
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..a3339fe 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -25,6 +25,7 @@ struct dp_packet;
#ifdef DPDK_NETDEV
+void netdev_dpdk_multi_segment_mbufs_enable(void);
void netdev_dpdk_register(void);
void free_dpdk_buf(struct dp_packet *);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 9c2a826..5ef0926 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -331,6 +331,26 @@
+
+
+ Specifies if DPDK uses multi-segment mbufs for handling jumbo frames.
+
+
+ If true, DPDK allocates a single mempool per port, irrespective
+ of the ports' requested MTU sizes. The elements of this mempool are
+ 'standard'-sized mbufs (typically 2k bytes), which may be chained
+ together to accommodate jumbo frames. In this approach, each mbuf
+ typically stores a fragment of the overall jumbo frame.
+
+
+ If not specified, defaults to false, in which case,
+ the size of each mbuf within a DPDK port's mempool will be grown to
+ accommodate jumbo frames within a single mbuf.
+
+
+
+