From patchwork Tue May 1 17:02:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907134 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b75S2fKhz9s35 for ; Wed, 2 May 2018 03:03:08 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 655AB901; Tue, 1 May 2018 17:02:40 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 1C3C38DC for ; Tue, 1 May 2018 17:02:39 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 416996D0 for ; Tue, 1 May 2018 17:02:38 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485386" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:36 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:07 +0100 Message-Id: <1525194134-248371-2-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC v5 1/8] netdev-dpdk: fix mbuf sizing X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Mark Kavanagh There are numerous factors that must be considered when calculating the size of an mbuf: - the data portion of the mbuf must be sized in accordance With Rx buffer alignment (typically 1024B). So, for example, in order to successfully receive and capture a 1500B packet, mbufs with a data portion of size 2048B must be used. - in OvS, the elements that comprise an mbuf are: * the dp packet, which includes a struct rte mbuf (704B) * RTE_PKTMBUF_HEADROOM (128B) * packet data (aligned to 1k, as previously described) * RTE_PKTMBUF_TAILROOM (typically 0) Some PMDs require that the total mbuf size (i.e. the total sum of all of the above-listed components' lengths) is cache-aligned. To satisfy this requirement, it may be necessary to round up the total mbuf size with respect to cacheline size. In doing so, it's possible that the dp_packet's data portion is inadvertently increased in size, such that it no longer adheres to Rx buffer alignment. Consequently, the following property of the mbuf no longer holds true: mbuf.data_len == mbuf.buf_len - mbuf.data_off This creates a problem in the case of multi-segment mbufs, where that assumption is assumed to be true for all but the final segment in an mbuf chain. Resolve this issue by adjusting the size of the mbuf's private data portion, as opposed to the packet data portion when aligning mbuf size to cachelines. Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization") Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size") CC: Santosh Shukla Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- lib/netdev-dpdk.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 3306b19..648a1de 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -82,12 +82,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + (2 * VLAN_HEADER_LEN)) #define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN) #define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN) -#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \ - - ETHER_HDR_LEN - ETHER_CRC_LEN) -#define MBUF_SIZE(mtu) ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \ - + sizeof(struct dp_packet) \ - + RTE_PKTMBUF_HEADROOM), \ - RTE_CACHE_LINE_SIZE) #define NETDEV_DPDK_MBUF_ALIGN 1024 #define NETDEV_DPDK_MAX_PKT_LEN 9728 @@ -486,7 +480,7 @@ is_dpdk_class(const struct netdev_class *class) * behaviour, which reduces performance. To prevent this, use a buffer size * that is closest to 'mtu', but which satisfies the aforementioned criteria. */ -static uint32_t +static uint16_t dpdk_buf_size(int mtu) { return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM), @@ -577,7 +571,7 @@ dpdk_mp_do_not_free(struct rte_mempool *mp) OVS_REQUIRES(dpdk_mp_mutex) * - a new mempool was just created; * - a matching mempool already exists. */ static struct rte_mempool * -dpdk_mp_create(struct netdev_dpdk *dev, int mtu) +dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len) { char mp_name[RTE_MEMPOOL_NAMESIZE]; const char *netdev_name = netdev_get_name(&dev->up); @@ -585,6 +579,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) uint32_t n_mbufs; uint32_t hash = hash_string(netdev_name, 0); struct rte_mempool *mp = NULL; + uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len; /* * XXX: rough estimation of number of mbufs required for this port: @@ -604,12 +599,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) * longer than RTE_MEMPOOL_NAMESIZE. */ int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs%08x%02d%05d%07u", - hash, socket_id, mtu, n_mbufs); + hash, socket_id, mbuf_pkt_data_len, n_mbufs); if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) { VLOG_DBG("snprintf returned %d. " "Failed to generate a mempool name for \"%s\". " - "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.", - ret, netdev_name, hash, socket_id, mtu, n_mbufs); + "Hash:0x%x, socket_id: %d, pkt data room:%d, mbufs:%u.", + ret, netdev_name, hash, socket_id, mbuf_pkt_data_len, + n_mbufs); break; } @@ -618,13 +614,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) netdev_name, n_mbufs, socket_id, dev->requested_n_rxq, dev->requested_n_txq); + mbuf_priv_data_len = sizeof(struct dp_packet) - + sizeof(struct rte_mbuf); + /* The size of the entire mbuf. */ + mbuf_size = sizeof (struct dp_packet) + + mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM; + /* mbuf size, rounded up to cacheline size. */ + aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE); + /* If there is a size discrepancy, add padding to mbuf_priv_data_len. + * This maintains mbuf size cache alignment, while also honoring RX + * buffer alignment in the data portion of the mbuf. If this adjustment + * is not made, there is a possiblity later on that for an element of + * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off). + * This is problematic in the case of multi-segment mbufs, particularly + * when an mbuf segment needs to be resized (when [push|popp]ing a VLAN + * header, for example. + */ + mbuf_priv_data_len += (aligned_mbuf_size - mbuf_size); + mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ, - sizeof (struct dp_packet) - sizeof (struct rte_mbuf), - MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id); + mbuf_priv_data_len, + mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM, socket_id); if (mp) { VLOG_DBG("Allocated \"%s\" mempool with %u mbufs", - mp_name, n_mbufs); + mp_name, n_mbufs); /* rte_pktmbuf_pool_create has done some initialization of the * rte_mbuf part of each dp_packet. Some OvS specific fields * of the packet still need to be initialized by @@ -685,13 +699,13 @@ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { - uint32_t buf_size = dpdk_buf_size(dev->requested_mtu); + uint16_t buf_size = dpdk_buf_size(dev->requested_mtu); struct rte_mempool *mp; int ret = 0; dpdk_mp_sweep(); - mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size)); + mp = dpdk_mp_create(dev, buf_size); if (!mp) { VLOG_ERR("Failed to create memory pool for netdev " "%s, with MTU %d on socket %d: %s\n", From patchwork Tue May 1 17:02:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907135 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b76L4NHLz9s1w for ; Wed, 2 May 2018 03:03:54 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 6F71A927; Tue, 1 May 2018 17:02:45 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id A8E80927 for ; Tue, 1 May 2018 17:02:43 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 7F9E762A for ; Tue, 1 May 2018 17:02:41 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485399" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:40 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:08 +0100 Message-Id: <1525194134-248371-3-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC v5 2/8] dp-packet: init specific mbuf fields to 0 X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Mark Kavanagh dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's possible the the resultant mbuf portion of the dp_packet contains random data. For some mbuf fields, specifically those related to multi-segment mbufs and/or offload features, random values may cause unexpected behaviour, should the dp_packet's contents be later copied to a DPDK mbuf. It is critical therefore, that these fields should be initialized to 0. This patch ensures that the following mbuf fields are initialized to 0, on creation of a new dp_packet: - ol_flags - nb_segs - tx_offload - packet_type Adapted from an idea by Michael Qiu : https://patchwork.ozlabs.org/patch/777570/ Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- lib/dp-packet.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 21c8ca5..9bfb7b7 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -626,13 +626,13 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED) /* This initialization is needed for packets that do not come * from DPDK interfaces, when vswitchd is built with --with-dpdk. - * The DPDK rte library will still otherwise manage the mbuf. - * We only need to initialize the mbuf ol_flags. */ + * The DPDK rte library will still otherwise manage the mbuf. */ static inline void dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED) { #ifdef DPDK_NETDEV - p->mbuf.ol_flags = 0; + struct rte_mbuf *mbuf = &(p->mbuf); + mbuf->ol_flags = mbuf->nb_segs = mbuf->tx_offload = mbuf->packet_type = 0; #endif } From patchwork Tue May 1 17:02:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907136 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b7721Qjgz9s1w for ; Wed, 2 May 2018 03:04:30 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 578BF949; Tue, 1 May 2018 17:02:47 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 72F6592B for ; Tue, 1 May 2018 17:02:45 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 8021B6D0 for ; Tue, 1 May 2018 17:02:44 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485426" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:43 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:09 +0100 Message-Id: <1525194134-248371-4-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC v5 3/8] dp-packet: Add support for multi-seg mbufs X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Mark Kavanagh Some functions in dp-packet assume that the data held by a dp_packet is contiguous, and perform operations such as pointer arithmetic under that assumption. However, with the introduction of multi-segment mbufs, where data is non-contiguous, such assumptions are no longer possible. Thus, dp_packet_put_init(), dp_packet_shift(), dp_packet_tail(), dp_packet_end() and dp_packet_at() were modified to take multi-segment mbufs into account. Both dp_packet_put_uninit() and dp_packet_shift() are, in their current implementation, operating on the data buffer of a dp_packet as if it were contiguous, which in the case of multi-segment mbufs means they operate on the first mbuf in the chain. However, in the case of dp_packet_put_uninit(), for example, it is the data length of the last mbuf in the mbuf chain that should be adjusted. Both functions have thus been modified to support multi-segment mbufs. Finally, dp_packet_tail(), dp_packet_end() and dp_packet_at() were also modified to operate differently when dealing with multi-segment mbufs, and now iterate over the non-contiguous data buffers for their calculations. Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- lib/dp-packet.c | 44 ++++++++++++++++- lib/dp-packet.h | 142 ++++++++++++++++++++++++++++++++++++++---------------- lib/netdev-dpdk.c | 8 +-- 3 files changed, 147 insertions(+), 47 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 443c225..fd9fad0 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -298,10 +298,33 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t size) /* Shifts all of the data within the allocated space in 'b' by 'delta' bytes. * For example, a 'delta' of 1 would cause each byte of data to move one byte * forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each - * byte to move one byte backward (from 'p' to 'p-1'). */ + * byte to move one byte backward (from 'p' to 'p-1'). + * Note for DPBUF_DPDK(XXX): The shift can only move within a size of RTE_ + * PKTMBUF_HEADROOM, to either left or right, which is usually defined as 128 + * bytes. + */ void dp_packet_shift(struct dp_packet *b, int delta) { +#ifdef DPDK_NETDEV + if (b->source == DPBUF_DPDK) { + ovs_assert(delta > 0 ? delta <= dp_packet_headroom(b) + : delta < 0 ? -delta <= dp_packet_headroom(b) + : true); + + if (delta != 0) { + struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf); + + if (delta > 0) { + rte_pktmbuf_prepend(mbuf, delta); + } else { + rte_pktmbuf_prepend(mbuf, delta); + } + } + + return; + } +#endif ovs_assert(delta > 0 ? delta <= dp_packet_tailroom(b) : delta < 0 ? -delta <= dp_packet_headroom(b) : true); @@ -315,14 +338,31 @@ dp_packet_shift(struct dp_packet *b, int delta) /* Appends 'size' bytes of data to the tail end of 'b', reallocating and * copying its data if necessary. Returns a pointer to the first byte of the - * new data, which is left uninitialized. */ + * new data, which is left uninitialized. + * Note for DPBUF_DPDK(XXX): In this case there must be enough tailroom to put + * the data in, otherwise this will result in a call to ovs_abort(). */ void * dp_packet_put_uninit(struct dp_packet *b, size_t size) { void *p; dp_packet_prealloc_tailroom(b, size); p = dp_packet_tail(b); +#ifdef DPDK_NETDEV + if (b->source == DPBUF_DPDK) { + /* In the case of multi-segment mbufs, the data length of the last mbuf + * should be adjusted by 'size' bytes. The packet length of the entire + * mbuf chain (stored in the first mbuf of said chain) is adjusted in + * the normal execution path below. + */ + struct rte_mbuf *buf = &(b->mbuf); + buf = rte_pktmbuf_lastseg(buf); + + buf->data_len += size; + } +#endif + dp_packet_set_size(b, dp_packet_size(b) + size); + return p; } diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 9bfb7b7..d6512cf 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -55,12 +55,12 @@ struct dp_packet { struct rte_mbuf mbuf; /* DPDK mbuf */ #else void *base_; /* First byte of allocated space. */ - uint16_t allocated_; /* Number of bytes allocated. */ uint16_t data_ofs; /* First byte actually in use. */ uint32_t size_; /* Number of bytes in use. */ uint32_t rss_hash; /* Packet hash. */ bool rss_hash_valid; /* Is the 'rss_hash' valid? */ #endif + uint16_t allocated_; /* Number of bytes allocated. */ enum dp_packet_source source; /* Source of memory allocated as 'base'. */ /* All the following elements of this struct are copied in a single call @@ -133,6 +133,8 @@ static inline void *dp_packet_at(const struct dp_packet *, size_t offset, size_t size); static inline void *dp_packet_at_assert(const struct dp_packet *, size_t offset, size_t size); +static inline void * dp_packet_at_offset(const struct dp_packet *b, + size_t offset); static inline void *dp_packet_tail(const struct dp_packet *); static inline void *dp_packet_end(const struct dp_packet *); @@ -180,40 +182,6 @@ dp_packet_delete(struct dp_packet *b) } } -/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to - * byte 'offset'. Otherwise, returns a null pointer. */ -static inline void * -dp_packet_at(const struct dp_packet *b, size_t offset, size_t size) -{ - return offset + size <= dp_packet_size(b) - ? (char *) dp_packet_data(b) + offset - : NULL; -} - -/* Returns a pointer to byte 'offset' in 'b', which must contain at least - * 'offset + size' bytes of data. */ -static inline void * -dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size) -{ - ovs_assert(offset + size <= dp_packet_size(b)); - return ((char *) dp_packet_data(b)) + offset; -} - -/* Returns a pointer to byte following the last byte of data in use in 'b'. */ -static inline void * -dp_packet_tail(const struct dp_packet *b) -{ - return (char *) dp_packet_data(b) + dp_packet_size(b); -} - -/* Returns a pointer to byte following the last byte allocated for use (but - * not necessarily in use) in 'b'. */ -static inline void * -dp_packet_end(const struct dp_packet *b) -{ - return (char *) dp_packet_base(b) + dp_packet_get_allocated(b); -} - /* Returns the number of bytes of headroom in 'b', that is, the number of bytes * of unused space in dp_packet 'b' before the data that is in use. (Most * commonly, the data in a dp_packet is at its beginning, and thus the @@ -454,18 +422,107 @@ __packet_set_data(struct dp_packet *b, uint16_t v) b->mbuf.data_off = v; } -static inline uint16_t -dp_packet_get_allocated(const struct dp_packet *b) +static inline void * +dp_packet_at_offset(const struct dp_packet *b, size_t offset) { - return b->mbuf.buf_len; + if (b->source == DPBUF_DPDK) { + struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf); + + while (buf && offset > buf->data_len) { + offset -= buf->data_len; + + buf = buf->next; + } + return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL; + } else { + return (char *) dp_packet_data(b) + offset; + } } -static inline void -dp_packet_set_allocated(struct dp_packet *b, uint16_t s) +/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to + * byte 'offset'. Otherwise, returns a null pointer. */ +static inline void * +dp_packet_at(const struct dp_packet *b, size_t offset, size_t size) +{ + return offset + size <= dp_packet_size(b) + ? dp_packet_at_offset(b, offset) + : NULL; +} + +/* Returns a pointer to byte 'offset' in 'b', which must contain at least + * 'offset + size' bytes of data. */ +static inline void * +dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size) { - b->mbuf.buf_len = s; + ovs_assert(offset + size <= dp_packet_size(b)); + return dp_packet_at_offset(b, offset); +} + +/* Returns a pointer to byte following the last byte of data in use in 'b'. */ +static inline void * +dp_packet_tail(const struct dp_packet *b) +{ + if (b->source == DPBUF_DPDK) { + struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf); + + buf = rte_pktmbuf_lastseg(buf); + + return rte_pktmbuf_mtod_offset(buf, char *, buf->data_len); + } else { + return (char *) dp_packet_data(b) + dp_packet_size(b); + } +} + +/* Returns a pointer to byte following the last byte allocated for use (but + * not necessarily in use) in 'b'. */ +static inline void * +dp_packet_end(const struct dp_packet *b) +{ + if (b->source == DPBUF_DPDK) { + struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &(b->mbuf)); + + buf = rte_pktmbuf_lastseg(buf); + + return rte_pktmbuf_mtod(buf, char *) + buf->buf_len; + } else { + return (char *) dp_packet_base(b) + dp_packet_get_allocated(b); + } } #else +/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to + * byte 'offset'. Otherwise, returns a null pointer. */ +static inline void * +dp_packet_at(const struct dp_packet *b, size_t offset, size_t size) +{ + return offset + size <= dp_packet_size(b) + ? (char *) dp_packet_data(b) + offset + : NULL; +} + +/* Returns a pointer to byte 'offset' in 'b', which must contain at least + * 'offset + size' bytes of data. */ +static inline void * +dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size) +{ + ovs_assert(offset + size <= dp_packet_size(b)); + return ((char *) dp_packet_data(b)) + offset; +} + +/* Returns a pointer to byte following the last byte of data in use in 'b'. */ +static inline void * +dp_packet_tail(const struct dp_packet *b) +{ + return (char *) dp_packet_data(b) + dp_packet_size(b); +} + +/* Returns a pointer to byte following the last byte allocated for use (but + * not necessarily in use) in 'b'. */ +static inline void * +dp_packet_end(const struct dp_packet *b) +{ + return (char *) dp_packet_base(b) + dp_packet_get_allocated(b); +} + static inline void * dp_packet_base(const struct dp_packet *b) { @@ -502,6 +559,8 @@ __packet_set_data(struct dp_packet *b, uint16_t v) b->data_ofs = v; } +#endif + static inline uint16_t dp_packet_get_allocated(const struct dp_packet *b) { @@ -513,7 +572,6 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s) { b->allocated_ = s; } -#endif static inline void dp_packet_reset_cutlen(struct dp_packet *b) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 648a1de..7008492 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -506,13 +506,14 @@ free_dpdk_buf(struct dp_packet *p) static void ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED, - void *opaque_arg OVS_UNUSED, + void *opaque_arg, void *_p, unsigned i OVS_UNUSED) { struct rte_mbuf *pkt = _p; + uint16_t allocated = *(uint16_t *) opaque_arg; - dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len); + dp_packet_init_dpdk((struct dp_packet *) pkt, allocated); } static int @@ -643,7 +644,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len) * rte_mbuf part of each dp_packet. Some OvS specific fields * of the packet still need to be initialized by * ovs_rte_pktmbuf_init. */ - rte_mempool_obj_iter(mp, ovs_rte_pktmbuf_init, NULL); + uint16_t allocated_bytes = dpdk_buf_size(dev->requested_mtu); + rte_mempool_obj_iter(mp, ovs_rte_pktmbuf_init, &allocated_bytes); } else if (rte_errno == EEXIST) { /* A mempool with the same name already exists. We just * retrieve its pointer to be returned to the caller. */ From patchwork Tue May 1 17:02:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907140 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b77b5t2Rz9s1w for ; Wed, 2 May 2018 03:04:59 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 36A9E950; Tue, 1 May 2018 17:02:50 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 6755B8F5 for ; Tue, 1 May 2018 17:02:49 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id C120E6D8 for ; Tue, 1 May 2018 17:02:48 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485438" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:47 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:10 +0100 Message-Id: <1525194134-248371-5-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Marcin Ksiadz , Przemyslaw Lal , Michael Qiu Subject: [ovs-dev] [RFC v5 4/8] dp-packet: Fix data_len issue with multi-seg mbufs X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org When a dp_packet is from a DPDK source, and it contains multi-segment mbufs, the data_len is not equal to the packet size, pkt_len. Instead, the data_len of each mbuf in the chain should be considered while distributing the new (provided) size. To account for the above dp_packet_set_size() has been changed so that, in the multi-segment mbufs case, only the data_len on the last mbuf of the chain and the total size of the packet, pkt_len, are changed. The data_len on the intermediate mbufs preceeding the last mbuf is not changed by dp_packet_set_size(). Furthermore, in some cases dp_packet_set_size() may be used to set a smaller size than the current packet size, thus effectively trimming the end of the packet. In the multi-segment mbufs case this may lead to lingering mbufs that may need freeing. __dp_packet_set_data() now also updates an mbufs' data_len after setting the data offset. This is so that both fields are always in sync for each mbuf in a chain. Co-authored-by: Michael Qiu Co-authored-by: Mark Kavanagh Co-authored-by: Przemyslaw Lal Co-authored-by: Marcin Ksiadz Co-authored-by: Yuanhan Liu Signed-off-by: Michael Qiu Signed-off-by: Mark Kavanagh Signed-off-by: Przemyslaw Lal Signed-off-by: Marcin Ksiadz Signed-off-by: Yuanhan Liu Signed-off-by: Tiago Lam --- lib/dp-packet.h | 38 +++++++++++++++++++++++++++----------- 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index d6512cf..93b0aaf 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -397,17 +397,31 @@ dp_packet_size(const struct dp_packet *b) static inline void dp_packet_set_size(struct dp_packet *b, uint32_t v) { - /* netdev-dpdk does not currently support segmentation; consequently, for - * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may - * be used interchangably. - * - * On the datapath, it is expected that the size of packets - * (and thus 'v') will always be <= UINT16_MAX; this means that there is no - * loss of accuracy in assigning 'v' to 'data_len'. - */ - b->mbuf.data_len = (uint16_t)v; /* Current seg length. */ - b->mbuf.pkt_len = v; /* Total length of all segments linked to - * this segment. */ + if (b->source == DPBUF_DPDK) { + struct rte_mbuf *seg = &b->mbuf; + uint16_t pkt_len = v; + uint16_t seg_len; + + /* Trim 'v' length bytes from the end of the chained buffers, freeing + any buffers that may be left floating */ + while (seg) { + seg_len = MIN(pkt_len, seg->data_len); + seg->data_len = seg_len; + + pkt_len -= seg_len; + if (pkt_len <= 0) { + /* Free the rest of chained mbufs */ + rte_pktmbuf_free(seg->next); + seg->next = NULL; + } + seg = seg->next; + } + } else { + b->mbuf.data_len = v; + } + + /* Total length of all segments linked to this segment. */ + b->mbuf.pkt_len = v; } static inline uint16_t @@ -420,6 +434,8 @@ static inline void __packet_set_data(struct dp_packet *b, uint16_t v) { b->mbuf.data_off = v; + /* When dealing with DPDK mbufs, keep data_off and data_len in sync */ + b->mbuf.data_len = b->mbuf.buf_len - b->mbuf.data_off; } static inline void * From patchwork Tue May 1 17:02:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907141 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b78C3Kzkz9s1w for ; Wed, 2 May 2018 03:05:31 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 464C3900; Tue, 1 May 2018 17:02:53 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id E32FD900 for ; Tue, 1 May 2018 17:02:51 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 3CF9E62A for ; Tue, 1 May 2018 17:02:51 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485451" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:50 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:11 +0100 Message-Id: <1525194134-248371-6-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Michael Qiu Subject: [ovs-dev] [RFC v5 5/8] dp-packet: copy mbuf info for packet copy X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Michael Qiu Currently, when doing packet copy, lots of DPDK mbuf's info will be missed, like packet type, ol_flags, etc. Those information is very important for DPDK to do packets processing. Co-authored-by: Mark Kavanagh [mark.b.kavanagh@intel.com rebased] Co-authored-by: Tiago Lam Signed-off-by: Michael Qiu Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- lib/dp-packet.c | 25 +++++++++++++++++++------ lib/dp-packet.h | 3 +++ lib/netdev-dpdk.c | 1 + 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index fd9fad0..a2793f7 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -48,6 +48,22 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t allocated, dp_packet_init__(b, allocated, source); } +#ifdef DPDK_NETDEV +void +dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src) { + ovs_assert(dst != NULL && src != NULL); + struct rte_mbuf *buf_dst = &(dst->mbuf); + struct rte_mbuf buf_src = src->mbuf; + + buf_dst->nb_segs = buf_src.nb_segs; + buf_dst->ol_flags = buf_src.ol_flags; + buf_dst->packet_type = buf_src.packet_type; + buf_dst->tx_offload = buf_src.tx_offload; +} +#else +#define dp_packet_copy_mbuf_flags(arg1, arg2) +#endif + /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of * memory starting at 'base'. 'base' should be the first byte of a region * obtained from malloc(). It will be freed (with free()) if 'b' is resized or @@ -177,15 +193,12 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom) offsetof(struct dp_packet, l2_pad_size)); #ifdef DPDK_NETDEV - new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags; -#else - new_buffer->rss_hash_valid = buffer->rss_hash_valid; -#endif - + dp_packet_copy_mbuf_flags(new_buffer, buffer); if (dp_packet_rss_valid(new_buffer)) { -#ifdef DPDK_NETDEV new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss; #else + new_buffer->rss_hash_valid = buffer->rss_hash_valid; + if (dp_packet_rss_valid(new_buffer)) { new_buffer->rss_hash = buffer->rss_hash; #endif } diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 93b0aaf..4607699 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -119,6 +119,9 @@ void dp_packet_init_dpdk(struct dp_packet *, size_t allocated); void dp_packet_init(struct dp_packet *, size_t); void dp_packet_uninit(struct dp_packet *); +void dp_packet_copy_mbuf_flags(struct dp_packet *dst, + const struct dp_packet *src); + struct dp_packet *dp_packet_new(size_t); struct dp_packet *dp_packet_new_with_headroom(size_t, size_t headroom); struct dp_packet *dp_packet_clone(const struct dp_packet *); diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 7008492..c9de742 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2149,6 +2149,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *), dp_packet_data(packet), size); dp_packet_set_size((struct dp_packet *)pkts[txcnt], size); + dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet); txcnt++; } From patchwork Tue May 1 17:02:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907142 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b78m4gLBz9s3D for ; Wed, 2 May 2018 03:06:00 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 2102B9CD; Tue, 1 May 2018 17:02:57 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id C6BB298B for ; Tue, 1 May 2018 17:02:55 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id BE66D67D for ; Tue, 1 May 2018 17:02:53 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485460" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:52 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:12 +0100 Message-Id: <1525194134-248371-7-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Michael Qiu Subject: [ovs-dev] [RFC v5 6/8] dp-packet: copy data from multi-seg. DPDK mbuf X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Michael Qiu When doing packet clone, if packet source is from DPDK driver, multi-segment must be considered, and copy the segment's data one by one. Co-authored-by: Mark Kavanagh Co-authored-by: Tiago Lam Signed-off-by: Michael Qiu Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- lib/dp-packet.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 47 insertions(+), 8 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index a2793f7..85db57a 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -175,6 +175,49 @@ dp_packet_clone(const struct dp_packet *buffer) return dp_packet_clone_with_headroom(buffer, 0); } +#ifdef DPDK_NETDEV +struct dp_packet * +dp_packet_clone_with_headroom(const struct dp_packet *buffer, + size_t headroom) { + struct dp_packet *new_buffer; + uint32_t pkt_len = dp_packet_size(buffer); + + /* copy multi-seg data */ + if (buffer->source == DPBUF_DPDK && buffer->mbuf.nb_segs > 1) { + uint32_t offset = 0; + void *dst = NULL; + struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *, + &(buffer->mbuf)); + + new_buffer = dp_packet_new_with_headroom(pkt_len, headroom); + dp_packet_set_size(new_buffer, pkt_len + headroom); + dst = dp_packet_tail(new_buffer); + + while (tmbuf) { + rte_memcpy((char *)dst + offset, + rte_pktmbuf_mtod(tmbuf, void *), tmbuf->data_len); + offset += tmbuf->data_len; + tmbuf = tmbuf->next; + } + } else { + new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer), + pkt_len, headroom); + } + + /* Copy the following fields into the returned buffer: l2_pad_size, + * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */ + memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size, + sizeof(struct dp_packet) - + offsetof(struct dp_packet, l2_pad_size)); + + dp_packet_copy_mbuf_flags(new_buffer, buffer); + if (dp_packet_rss_valid(new_buffer)) { + new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss; + } + + return new_buffer; +} +#else /* Creates and returns a new dp_packet whose data are copied from 'buffer'. * The returned dp_packet will additionally have 'headroom' bytes of * headroom. */ @@ -182,29 +225,25 @@ struct dp_packet * dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom) { struct dp_packet *new_buffer; + uint32_t pkt_len = dp_packet_size(buffer); new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer), - dp_packet_size(buffer), - headroom); + pkt_len, headroom); + /* Copy the following fields into the returned buffer: l2_pad_size, * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */ memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size, sizeof(struct dp_packet) - offsetof(struct dp_packet, l2_pad_size)); -#ifdef DPDK_NETDEV - dp_packet_copy_mbuf_flags(new_buffer, buffer); - if (dp_packet_rss_valid(new_buffer)) { - new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss; -#else new_buffer->rss_hash_valid = buffer->rss_hash_valid; if (dp_packet_rss_valid(new_buffer)) { new_buffer->rss_hash = buffer->rss_hash; -#endif } return new_buffer; } +#endif /* Creates and returns a new dp_packet that initially contains a copy of the * 'size' bytes of data starting at 'data' with no headroom or tailroom. */ From patchwork Tue May 1 17:02:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907143 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b79N1dgyz9s1w for ; Wed, 2 May 2018 03:06:32 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 0110B9D1; Tue, 1 May 2018 17:02:58 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 7CD4C957 for ; Tue, 1 May 2018 17:02:56 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 0EFD967D for ; Tue, 1 May 2018 17:02:56 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485473" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:55 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:13 +0100 Message-Id: <1525194134-248371-8-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Michael Qiu Subject: [ovs-dev] [RFC v5 7/8] netdev-dpdk: copy large packet to multi-seg. mbufs X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Mark Kavanagh Currently, packets are only copied to a single segment in the function dpdk_do_tx_copy(). This could be an issue in the case of jumbo frames, particularly when multi-segment mbufs are involved. This patch calculates the number of segments needed by a packet and copies the data to each segment. Co-authored-by: Michael Qiu Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Michael Qiu Signed-off-by: Tiago Lam --- lib/netdev-dpdk.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 68 insertions(+), 10 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index c9de742..4c6a3c0 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2101,6 +2101,71 @@ out: } } +static int +dpdk_prep_tx_buf(struct dp_packet *packet, struct rte_mbuf **head, + struct rte_mempool *mp) +{ + struct rte_mbuf *temp; + uint32_t size = dp_packet_size(packet); + uint16_t max_data_len, data_len; + uint32_t nb_segs = 0; + int i; + + temp = *head = rte_pktmbuf_alloc(mp); + if (OVS_UNLIKELY(!temp)) { + return 1; + } + + /* All new allocated mbuf's max data len is the same */ + max_data_len = temp->buf_len - temp->data_off; + + /* Calculate # of output mbufs. */ + nb_segs = size / max_data_len; + if (size % max_data_len) { + nb_segs = nb_segs + 1; + } + + /* Allocate additional mbufs when multiple output mbufs required. */ + for (i = 1; i < nb_segs; i++) { + temp->next = rte_pktmbuf_alloc(mp); + if (!temp->next) { + rte_pktmbuf_free(*head); + *head = NULL; + break; + } + temp = temp->next; + } + /* We have to do a copy for now */ + rte_pktmbuf_pkt_len(*head) = size; + temp = *head; + + data_len = size < max_data_len ? size: max_data_len; + if (packet->source == DPBUF_DPDK) { + *head = &(packet->mbuf); + while (temp && head && size > 0) { + rte_memcpy(rte_pktmbuf_mtod(temp, void *), + dp_packet_data((struct dp_packet *)head), data_len); + rte_pktmbuf_data_len(temp) = data_len; + *head = (*head)->next; + size = size - data_len; + data_len = size < max_data_len ? size: max_data_len; + temp = temp->next; + } + } else { + int offset = 0; + while (temp && size > 0) { + memcpy(rte_pktmbuf_mtod(temp, void *), + dp_packet_at(packet, offset, data_len), data_len); + rte_pktmbuf_data_len(temp) = data_len; + temp = temp->next; + size = size - data_len; + offset += data_len; + data_len = size < max_data_len ? size: max_data_len; + } + } + return 0; +} + /* Tx function. Transmit packets indefinitely */ static void dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) @@ -2117,6 +2182,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) struct rte_mbuf *pkts[PKT_ARRAY_SIZE]; uint32_t cnt = batch_cnt; uint32_t dropped = 0; + uint32_t i; if (dev->type != DPDK_DEV_VHOST) { /* Check if QoS has been configured for this netdev. */ @@ -2127,27 +2193,19 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) uint32_t txcnt = 0; - for (uint32_t i = 0; i < cnt; i++) { + for (i = 0; i < cnt; i++) { struct dp_packet *packet = batch->packets[i]; uint32_t size = dp_packet_size(packet); - if (OVS_UNLIKELY(size > dev->max_packet_len)) { VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d", size, dev->max_packet_len); - dropped++; continue; } - - pkts[txcnt] = rte_pktmbuf_alloc(dev->mp); - if (OVS_UNLIKELY(!pkts[txcnt])) { + if (!dpdk_prep_tx_buf(packet, &pkts[txcnt], dev->mp)) { dropped += cnt - i; break; } - - /* We have to do a copy for now */ - memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *), - dp_packet_data(packet), size); dp_packet_set_size((struct dp_packet *)pkts[txcnt], size); dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet); From patchwork Tue May 1 17:02:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 907144 X-Patchwork-Delegate: ian.stokes@intel.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40b7B56l58z9s35 for ; Wed, 2 May 2018 03:07:09 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 16E379D0; Tue, 1 May 2018 17:03:00 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 18108957 for ; Tue, 1 May 2018 17:02:59 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 4CB4E6D0 for ; Tue, 1 May 2018 17:02:58 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 01 May 2018 10:02:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,351,1520924400"; d="scan'208";a="51485484" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga001.fm.intel.com with ESMTP; 01 May 2018 10:02:57 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Tue, 1 May 2018 18:02:14 +0100 Message-Id: <1525194134-248371-9-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> References: <1525194134-248371-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC v5 8/8] netdev-dpdk: support multi-segment jumbo frames X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Mark Kavanagh Currently, jumbo frame support for OvS-DPDK is implemented by increasing the size of mbufs within a mempool, such that each mbuf within the pool is large enough to contain an entire jumbo frame of a user-defined size. Typically, for each user-defined MTU, 'requested_mtu', a new mempool is created, containing mbufs of size ~requested_mtu. With the multi-segment approach, a port uses a single mempool, (containing standard/default-sized mbufs of ~2k bytes), irrespective of the user-requested MTU value. To accommodate jumbo frames, mbufs are chained together, where each mbuf in the chain stores a portion of the jumbo frame. Each mbuf in the chain is termed a segment, hence the name. == Enabling multi-segment mbufs == Multi-segment and single-segment mbufs are mutually exclusive, and the user must decide on which approach to adopt on init. The introduction of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a global boolean value, which determines how jumbo frames are represented across all DPDK ports. In the absence of a user-supplied value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must be explicitly enabled / single-segment mbufs remain the default. Setting the field is identical to setting existing DPDK-specific OVSDB fields: ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10 ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0 ==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- NEWS | 1 + lib/dpdk.c | 7 +++++++ lib/netdev-dpdk.c | 52 +++++++++++++++++++++++++++++++++++++++++++++------- lib/netdev-dpdk.h | 1 + vswitchd/vswitch.xml | 20 ++++++++++++++++++++ 5 files changed, 74 insertions(+), 7 deletions(-) diff --git a/NEWS b/NEWS index d22ad14..e6752d6 100644 --- a/NEWS +++ b/NEWS @@ -92,6 +92,7 @@ v2.9.0 - 19 Feb 2018 pmd assignments. * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'. * Add support for vHost dequeue zero copy (experimental) + * Add support for multi-segment mbufs - Userspace datapath: * Output packet batching support. - vswitchd: diff --git a/lib/dpdk.c b/lib/dpdk.c index 00dd974..1447724 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -459,6 +459,13 @@ dpdk_init__(const struct smap *ovs_other_config) /* Finally, register the dpdk classes */ netdev_dpdk_register(); + + bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config, + "dpdk-multi-seg-mbufs", false); + if (multi_seg_mbufs_enable) { + VLOG_INFO("DPDK multi-segment mbufs enabled\n"); + netdev_dpdk_multi_segment_mbufs_enable(); + } } void diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 4c6a3c0..5746ae0 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; VLOG_DEFINE_THIS_MODULE(netdev_dpdk); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); +static bool dpdk_multi_segment_mbufs = false; #define DPDK_PORT_WATCHDOG_INTERVAL 5 @@ -593,6 +594,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len) + dev->requested_n_txq * dev->requested_txq_size + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST + MIN_NB_MBUF; + /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */ ovs_mutex_lock(&dpdk_mp_mutex); do { @@ -693,7 +695,13 @@ dpdk_mp_release(struct rte_mempool *mp) /* Tries to allocate a new mempool - or re-use an existing one where * appropriate - on requested_socket_id with a size determined by - * requested_mtu and requested Rx/Tx queues. + * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's + * elements are dependent on the value of 'dpdk_multi_segment_mbufs': + * - if 'true', then the mempool contains standard-sized mbufs that are chained + * together to accommodate packets of size 'requested_mtu'. + * - if 'false', then the members of the allocated mempool are + * non-standard-sized mbufs. Each mbuf in the mempool is large enough to + * fully accomdate packets of size 'requested_mtu'. * On success - or when re-using an existing mempool - the new configuration * will be applied. * On error, device will be left unchanged. */ @@ -701,10 +709,18 @@ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { - uint16_t buf_size = dpdk_buf_size(dev->requested_mtu); + uint16_t buf_size = 0; struct rte_mempool *mp; int ret = 0; + /* Contiguous mbufs in use - permit oversized mbufs */ + if (!dpdk_multi_segment_mbufs) { + buf_size = dpdk_buf_size(dev->requested_mtu); + } else { + /* multi-segment mbufs - use standard mbuf size */ + buf_size = dpdk_buf_size(ETHER_MTU); + } + dpdk_mp_sweep(); mp = dpdk_mp_create(dev, buf_size); @@ -786,11 +802,25 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) int diag = 0; int i; struct rte_eth_conf conf = port_conf; + struct rte_eth_txconf txconf; + + /* Multi-segment-mbuf-specific setup. */ + if (dpdk_multi_segment_mbufs) { + struct rte_eth_dev_info dev_info; + + /* DPDK PMDs typically attempt to use simple or vectorized + * transmit functions, neither of which are compatible with + * multi-segment mbufs. Ensure that these are disabled when + * multi-segment mbufs are enabled. + */ + rte_eth_dev_info_get(dev->port_id, &dev_info); + txconf = dev_info.default_txconf; + txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS; - /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly - * enabled. */ - if (dev->mtu > ETHER_MTU) { - conf.rxmode.enable_scatter = 1; + /* For some NICs (e.g. Niantic), scattered_rx mode (required for + * ingress jumbo frames when multi-segments are enabled) needs to + * be explicitly enabled. */ + conf.rxmode.enable_scatter = 1; } conf.rxmode.hw_ip_checksum = (dev->hw_ol_features & @@ -821,7 +851,9 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq) for (i = 0; i < n_txq; i++) { diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, - dev->socket_id, NULL); + dev->socket_id, + dpdk_multi_segment_mbufs ? &txconf + : NULL); if (diag) { VLOG_INFO("Interface %s unable to setup txq(%d): %s", dev->up.name, i, rte_strerror(-diag)); @@ -3868,6 +3900,12 @@ unlock: return err; } +void +netdev_dpdk_multi_segment_mbufs_enable(void) +{ + dpdk_multi_segment_mbufs = true; +} + #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \ SET_CONFIG, SET_TX_MULTIQ, SEND, \ GET_CARRIER, GET_STATS, \ diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index b7d02a7..a3339fe 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -25,6 +25,7 @@ struct dp_packet; #ifdef DPDK_NETDEV +void netdev_dpdk_multi_segment_mbufs_enable(void); void netdev_dpdk_register(void); void free_dpdk_buf(struct dp_packet *); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 9c2a826..5ef0926 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -331,6 +331,26 @@

+ +

+ Specifies if DPDK uses multi-segment mbufs for handling jumbo frames. +

+

+ If true, DPDK allocates a single mempool per port, irrespective + of the ports' requested MTU sizes. The elements of this mempool are + 'standard'-sized mbufs (typically 2k MB), which may be chained + together to accommodate jumbo frames. In this approach, each mbuf + typically stores a fragment of the overall jumbo frame. +

+

+ If not specified, defaults to false, in which case, + the size of each mbuf within a DPDK port's mempool will be grown to + accommodate jumbo frames within a single mbuf. +

+
+ +