From patchwork Mon Jun 11 16:21:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927769 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JFS0NCmz9s01 for ; Tue, 12 Jun 2018 02:22:20 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id C7F241050; Mon, 11 Jun 2018 16:21:48 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id B6273FCC for ; Mon, 11 Jun 2018 16:21:47 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id EA6A571C for ; Mon, 11 Jun 2018 16:21:46 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:21:46 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210639" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:21:45 -0700 From: 
Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:17 +0100 Message-Id: <1528734090-220990-2-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 01/13] netdev-dpdk: fix mbuf sizing X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org

From: Mark Kavanagh

There are numerous factors that must be considered when calculating the size of an mbuf:

- the data portion of the mbuf must be sized in accordance with Rx buffer alignment (typically 1024B). So, for example, in order to successfully receive and capture a 1500B packet, mbufs with a data portion of size 2048B must be used.
- in OvS, the elements that comprise an mbuf are:
  * the dp_packet, which includes a struct rte_mbuf (704B)
  * RTE_PKTMBUF_HEADROOM (128B)
  * packet data (aligned to 1024B, as previously described)
  * RTE_PKTMBUF_TAILROOM (typically 0)

Some PMDs require that the total mbuf size (i.e. the total sum of all of the above-listed components' lengths) is cache-aligned. To satisfy this requirement, it may be necessary to round up the total mbuf size with respect to cacheline size. In doing so, it's possible that the dp_packet's data portion is inadvertently increased in size, such that it no longer adheres to Rx buffer alignment.
Consequently, the following property of the mbuf no longer holds true:

    mbuf.data_len == mbuf.buf_len - mbuf.data_off

This creates a problem in the case of multi-segment mbufs, where that property is assumed to hold for all but the final segment in an mbuf chain. Resolve this issue by adjusting the size of the mbuf's private data portion, as opposed to the packet data portion, when aligning mbuf size to cachelines.

Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization") Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size") CC: Santosh Shukla Signed-off-by: Mark Kavanagh Acked-by: Santosh Shukla --- lib/netdev-dpdk.c | 48 +++++++++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 17 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 2e2f568..468ab36 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -82,12 +82,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + (2 * VLAN_HEADER_LEN)) #define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN) #define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN) -#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \ - - ETHER_HDR_LEN - ETHER_CRC_LEN) -#define MBUF_SIZE(mtu) ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \ - + sizeof(struct dp_packet) \ - + RTE_PKTMBUF_HEADROOM), \ - RTE_CACHE_LINE_SIZE) #define NETDEV_DPDK_MBUF_ALIGN 1024 #define NETDEV_DPDK_MAX_PKT_LEN 9728 @@ -493,7 +487,7 @@ is_dpdk_class(const struct netdev_class *class) * behaviour, which reduces performance. To prevent this, use a buffer size * that is closest to 'mtu', but which satisfies the aforementioned criteria. */ -static uint32_t +static uint16_t dpdk_buf_size(int mtu) { return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM), @@ -585,7 +579,7 @@ dpdk_mp_do_not_free(struct rte_mempool *mp) OVS_REQUIRES(dpdk_mp_mutex) * - a new mempool was just created; * - a matching mempool already exists. 
*/ static struct rte_mempool * -dpdk_mp_create(struct netdev_dpdk *dev, int mtu) +dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len) { char mp_name[RTE_MEMPOOL_NAMESIZE]; const char *netdev_name = netdev_get_name(&dev->up); @@ -593,6 +587,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) uint32_t n_mbufs; uint32_t hash = hash_string(netdev_name, 0); struct rte_mempool *mp = NULL; + uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len; /* * XXX: rough estimation of number of mbufs required for this port: @@ -611,13 +606,14 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) /* Full DPDK memory pool name must be unique and cannot be * longer than RTE_MEMPOOL_NAMESIZE. */ int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, - "ovs%08x%02d%05d%07u", - hash, socket_id, mtu, n_mbufs); + "ovs%08x%02d%05u%07u", + hash, socket_id, mbuf_pkt_data_len, n_mbufs); if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) { VLOG_DBG("snprintf returned %d. " "Failed to generate a mempool name for \"%s\". " - "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.", - ret, netdev_name, hash, socket_id, mtu, n_mbufs); + "Hash:0x%x, socket_id: %d, pkt data room:%u, mbufs:%u.", + ret, netdev_name, hash, socket_id, mbuf_pkt_data_len, + n_mbufs); break; } @@ -626,13 +622,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) netdev_name, n_mbufs, socket_id, dev->requested_n_rxq, dev->requested_n_txq); + mbuf_priv_data_len = sizeof(struct dp_packet) - + sizeof(struct rte_mbuf); + /* The size of the entire mbuf. */ + mbuf_size = sizeof (struct dp_packet) + + mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM; + /* mbuf size, rounded up to cacheline size. */ + aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE); + /* If there is a size discrepancy, add padding to mbuf_priv_data_len. + * This maintains mbuf size cache alignment, while also honoring RX + * buffer alignment in the data portion of the mbuf. 
If this adjustment + * is not made, there is a possibility later on that for an element of + * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off). + * This is problematic in the case of multi-segment mbufs, particularly + * when an mbuf segment needs to be resized (when [push|pop]ping a VLAN + * header, for example). + */ + mbuf_priv_data_len += (aligned_mbuf_size - mbuf_size); + mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ, - sizeof (struct dp_packet) - sizeof (struct rte_mbuf), - MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id); + mbuf_priv_data_len, + mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM, socket_id); if (mp) { VLOG_DBG("Allocated \"%s\" mempool with %u mbufs", - mp_name, n_mbufs); + mp_name, n_mbufs); /* rte_pktmbuf_pool_create has done some initialization of the * rte_mbuf part of each dp_packet. Some OvS specific fields * of the packet still need to be initialized by @@ -693,13 +707,13 @@ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { - uint32_t buf_size = dpdk_buf_size(dev->requested_mtu); + uint16_t buf_size = dpdk_buf_size(dev->requested_mtu); struct rte_mempool *mp; int ret = 0; dpdk_mp_sweep(); - mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size)); + mp = dpdk_mp_create(dev, buf_size); if (!mp) { VLOG_ERR("Failed to create memory pool for netdev " "%s, with MTU %d on socket %d: %s\n", From patchwork Mon Jun 11 16:21:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927770 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: 
from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JGF626Xz9rvt for ; Tue, 12 Jun 2018 02:23:01 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id B18AD10BB; Mon, 11 Jun 2018 16:21:50 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 971771081 for ; Mon, 11 Jun 2018 16:21:49 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 1891671E for ; Mon, 11 Jun 2018 16:21:49 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:21:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210648" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:21:47 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:18 +0100 Message-Id: <1528734090-220990-3-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 02/13] dp-packet: Init specific mbuf 
fields. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org

From: Mark Kavanagh

dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's possible that the resultant mbuf portion of the dp_packet contains random data. For some mbuf fields, specifically those related to multi-segment mbufs and/or offload features, random values may cause unexpected behaviour, should the dp_packet's contents be later copied to a DPDK mbuf. It is therefore critical that these fields be initialized to known values.

This patch ensures that the following mbuf fields are initialized to appropriate values on creation of a new dp_packet:

- ol_flags=0
- nb_segs=1
- tx_offload=0
- packet_type=0
- next=NULL

Adapted from an idea by Michael Qiu: https://patchwork.ozlabs.org/patch/777570/

Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- lib/dp-packet.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 596cfe6..82e45ad 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -626,13 +626,15 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED) /* This initialization is needed for packets that do not come * from DPDK interfaces, when vswitchd is built with --with-dpdk. - * The DPDK rte library will still otherwise manage the mbuf. - * We only need to initialize the mbuf ol_flags. */ + * The DPDK rte library will still otherwise manage the mbuf. 
*/ static inline void dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED) { #ifdef DPDK_NETDEV - p->mbuf.ol_flags = 0; + struct rte_mbuf *mbuf = &(p->mbuf); + mbuf->ol_flags = mbuf->tx_offload = mbuf->packet_type = 0; + mbuf->nb_segs = 1; + mbuf->next = NULL; #endif } From patchwork Mon Jun 11 16:21:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927771 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JGr2Szmz9rvt for ; Tue, 12 Jun 2018 02:23:32 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 6A2F71102; Mon, 11 Jun 2018 16:21:52 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 4D90F113F for ; Mon, 11 Jun 2018 16:21:51 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id DAF5C71E for ; Mon, 11 Jun 2018 16:21:50 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with 
ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:21:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210654" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:21:49 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:19 +0100 Message-Id: <1528734090-220990-4-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 03/13] dp-packet: Fix allocated size on DPDK init. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org

When DPDK is enabled, OvS deals with two types of packets: those coming from a mempool, and those created locally by OvS, which are copied to mempool mbufs before output. For the latter, the space is allocated from the system, while for the former the mbufs are allocated from a mempool, which takes care of initialising them appropriately.

In the current implementation, during mempool's initialisation of mbufs, dp_packet_set_allocated() is called from dp_packet_init_dpdk() without considering that the allocated space, in the case of multi-segment mbufs, might be greater than a single mbuf. Furthermore, given that dp_packet_init_dpdk() is on the code path that's called upon mempool's initialisation, a call to dp_packet_set_allocated() is redundant, since mempool takes care of initialising it.
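
As a rough illustration of why a single mbuf's buf_len can understate the allocated space of a multi-segment packet, here is a minimal sketch. It is not code from this series: 'struct seg' and chain_allocated() are hypothetical stand-ins that mirror only the 'next' and 'buf_len' fields of an mbuf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rte_mbuf fields used here. */
struct seg {
    struct seg *next;   /* Next segment in the chain, or NULL. */
    uint16_t buf_len;   /* Size of this segment's buffer, in bytes. */
};

/* Returns the total buffer space across every segment in the chain. */
static size_t
chain_allocated(const struct seg *s)
{
    size_t total = 0;

    for (; s; s = s->next) {
        total += s->buf_len;
    }
    return total;
}
```

For a chain of two equally sized segments, the head's buf_len alone accounts for only half of what the loop above reports.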
To fix this, dp_packet_set_allocated() is no longer called after initialisation of a mempool, only in dp_packet_init__(), which is still called by OvS when initialising locally created packets. Signed-off-by: Tiago Lam Acked-by: Eelco Chaudron --- lib/dp-packet.c | 3 +-- lib/dp-packet.h | 2 +- lib/netdev-dpdk.c | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 443c225..782e7c2 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -99,9 +99,8 @@ dp_packet_use_const(struct dp_packet *b, const void *data, size_t size) * buffer. Here, non-transient ovs dp-packet fields are initialized for * packets that are part of a DPDK memory pool. */ void -dp_packet_init_dpdk(struct dp_packet *b, size_t allocated) +dp_packet_init_dpdk(struct dp_packet *b) { - dp_packet_set_allocated(b, allocated); b->source = DPBUF_DPDK; } diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 82e45ad..4c104b6 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -114,7 +114,7 @@ void dp_packet_use(struct dp_packet *, void *, size_t); void dp_packet_use_stub(struct dp_packet *, void *, size_t); void dp_packet_use_const(struct dp_packet *, const void *, size_t); -void dp_packet_init_dpdk(struct dp_packet *, size_t allocated); +void dp_packet_init_dpdk(struct dp_packet *); void dp_packet_init(struct dp_packet *, size_t); void dp_packet_uninit(struct dp_packet *); diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 468ab36..f546507 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -519,7 +519,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED, { struct rte_mbuf *pkt = _p; - dp_packet_init_dpdk((struct dp_packet *) pkt, pkt->buf_len); + dp_packet_init_dpdk((struct dp_packet *) pkt); } static int From patchwork Mon Jun 11 16:21:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927772 Return-Path: X-Original-To: 
incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JHP5WNwz9rvt for ; Tue, 12 Jun 2018 02:24:01 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 26BA2114F; Mon, 11 Jun 2018 16:21:55 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id E88B91055 for ; Mon, 11 Jun 2018 16:21:53 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 880D671C for ; Mon, 11 Jun 2018 16:21:53 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:21:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210663" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:21:52 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:20 +0100 Message-Id: <1528734090-220990-5-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: 
<1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 04/13] netdev-dpdk: Serialise non-pmds mbufs' alloc/free. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org A new mutex, 'nonpmd_mp_mutex', has been introduced to serialise allocation and free operations by non-pmd threads on a given mempool. free_dpdk_buf() has been modified to make use of the introduced mutex. Signed-off-by: Tiago Lam --- lib/netdev-dpdk.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index f546507..efd7c20 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -294,6 +294,10 @@ static struct ovs_mutex dpdk_mp_mutex OVS_ACQ_AFTER(dpdk_mutex) static struct ovs_list dpdk_mp_free_list OVS_GUARDED_BY(dpdk_mp_mutex) = OVS_LIST_INITIALIZER(&dpdk_mp_free_list); +/* This mutex must be used by non pmd threads when allocating or freeing + * mbufs through mempools. */ +static struct ovs_mutex nonpmd_mp_mutex = OVS_MUTEX_INITIALIZER; + /* Wrapper for a mempool released but not yet freed. 
*/ struct dpdk_mp { struct rte_mempool *mp; @@ -461,6 +465,8 @@ struct netdev_rxq_dpdk { dpdk_port_t port_id; }; +static bool dpdk_thread_is_pmd(void); + static void netdev_dpdk_destruct(struct netdev *netdev); static void netdev_dpdk_vhost_destruct(struct netdev *netdev); @@ -494,6 +500,12 @@ dpdk_buf_size(int mtu) NETDEV_DPDK_MBUF_ALIGN); } +static bool +dpdk_thread_is_pmd(void) +{ + return rte_lcore_id() != NON_PMD_CORE_ID; +} + /* Allocates an area of 'sz' bytes from DPDK. The memory is zero'ed. * * Unlike xmalloc(), this function can return NULL on failure. */ @@ -506,9 +518,16 @@ dpdk_rte_mzalloc(size_t sz) void free_dpdk_buf(struct dp_packet *p) { - struct rte_mbuf *pkt = (struct rte_mbuf *) p; + if (!dpdk_thread_is_pmd()) { + ovs_mutex_lock(&nonpmd_mp_mutex); + } + struct rte_mbuf *pkt = (struct rte_mbuf *) p; rte_pktmbuf_free(pkt); + + if (!dpdk_thread_is_pmd()) { + ovs_mutex_unlock(&nonpmd_mp_mutex); + } } static void From patchwork Mon Jun 11 16:21:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927774 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JHw1WWhz9s2t for ; Tue, 12 Jun 2018 02:24:28 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 2C13E1207; Mon, 11 Jun 2018 16:21:59 +0000 (UTC) X-Original-To: 
ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id E0B4E1055 for ; Mon, 11 Jun 2018 16:21:57 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 379D5717 for ; Mon, 11 Jun 2018 16:21:56 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:21:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210669" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:21:53 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:21 +0100 Message-Id: <1528734090-220990-6-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Marcin Ksiadz , Przemyslaw Lal , Michael Qiu , i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 05/13] dp-packet: Fix data_len handling multi-seg mbufs. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org When a dp_packet is from a DPDK source, and it contains multi-segment mbufs, the data_len is not equal to the packet size, pkt_len. 
Instead, the data_len of each mbuf in the chain should be considered while distributing the new (provided) size. To account for the above, dp_packet_set_size() has been changed so that, in the multi-segment mbufs case, only the data_len on the last mbuf of the chain and the total size of the packet, pkt_len, are changed. The data_len on the intermediate mbufs preceding the last mbuf is not changed by dp_packet_set_size(). Furthermore, in some cases dp_packet_set_size() may be used to set a smaller size than the current packet size, thus effectively trimming the end of the packet. In the multi-segment mbufs case this may lead to lingering mbufs that may need freeing. __packet_set_data() now also updates an mbuf's data_len after setting the data offset. This is so that both fields are always in sync for each mbuf in a chain. Co-authored-by: Michael Qiu Co-authored-by: Mark Kavanagh Co-authored-by: Przemyslaw Lal Co-authored-by: Marcin Ksiadz Co-authored-by: Yuanhan Liu Signed-off-by: Michael Qiu Signed-off-by: Mark Kavanagh Signed-off-by: Przemyslaw Lal Signed-off-by: Marcin Ksiadz Signed-off-by: Yuanhan Liu Signed-off-by: Tiago Lam --- lib/dp-packet.h | 56 +++++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 45 insertions(+), 11 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 4c104b6..c301ed5 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -429,17 +429,39 @@ dp_packet_size(const struct dp_packet *b) static inline void dp_packet_set_size(struct dp_packet *b, uint32_t v) { - /* netdev-dpdk does not currently support segmentation; consequently, for - * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may - * be used interchangably. - * - * On the datapath, it is expected that the size of packets - * (and thus 'v') will always be <= UINT16_MAX; this means that there is no - * loss of accuracy in assigning 'v' to 'data_len'. - */ - b->mbuf.data_len = (uint16_t)v; /* Current seg length. 
*/ - b->mbuf.pkt_len = v; /* Total length of all segments linked to - * this segment. */ + if (b->source == DPBUF_DPDK) { + struct rte_mbuf *seg = &b->mbuf; + struct rte_mbuf *fmbuf = seg; + uint16_t pkt_len = v; + uint16_t seg_len; + uint16_t nb_segs = 0; + + /* Trim the chained buffers to 'v' bytes in total, freeing + any buffers that may be left floating. */ + while (seg) { + seg_len = MIN(pkt_len, seg->data_len); + seg->data_len = seg_len; + + pkt_len -= seg_len; + if (pkt_len <= 0) { + /* Free the rest of chained mbufs */ + free_dpdk_buf((struct dp_packet *) seg->next); + seg->next = NULL; + } else if (!seg->next) { + seg->data_len = pkt_len; + } + + nb_segs += 1; + seg = seg->next; + } + + fmbuf->nb_segs = nb_segs; + } else { + b->mbuf.data_len = v; + } + + /* Total length of all segments linked to this segment. */ + b->mbuf.pkt_len = v; } static inline uint16_t @@ -451,7 +473,19 @@ __packet_data(const struct dp_packet *b) static inline void __packet_set_data(struct dp_packet *b, uint16_t v) { + uint16_t prev_ofs = b->mbuf.data_off; b->mbuf.data_off = v; + int16_t ofs_diff = prev_ofs - b->mbuf.data_off; + + /* When dealing with DPDK mbufs, keep data_off and data_len in sync. Thus, + * update data_len if the length changes with the move of data_off. + * However, if data_len is 0, there's no data to move and data_len should + * remain 0. 
*/ + + if (b->mbuf.data_len != 0) { + b->mbuf.data_len = MIN(b->mbuf.data_len + ofs_diff, + b->mbuf.buf_len - b->mbuf.data_off); + } } static inline uint16_t From patchwork Mon Jun 11 16:21:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927775 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JJQ4BRhz9rvt for ; Tue, 12 Jun 2018 02:24:54 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id EB5FB1153; Mon, 11 Jun 2018 16:22:02 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id E92211062 for ; Mon, 11 Jun 2018 16:22:01 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 2F5A6711 for ; Mon, 11 Jun 2018 16:22:00 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:21:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; 
d="scan'208";a="62210730" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:21:58 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:22 +0100 Message-Id: <1528734090-220990-7-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 06/13] dp-packet: Handle multi-seg mbufs in helper funcs. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org Most helper functions in dp-packet assume that the data held by a dp_packet is contiguous, and perform operations such as pointer arithmetic under that assumption. However, with the introduction of multi-segment mbufs, where data is non-contiguous, such assumptions are no longer possible. Some examples of Such helper functions are dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(), dp_packet_get_allocated() and dp_packet_at(). Thus, instead of assuming contiguous data in dp_packet, they now iterate over the (non-contiguous) data in mbufs to perform their calculations. Finally, dp_packet_use__() has also been modified to perform the initialisation of the packet (and setting the source) before continuing to set its size and data length, which now depends on the type of packet. 
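
To illustrate the iteration described above, here is a minimal
standalone sketch of locating a byte at a given offset in a chain of
segments. 'struct seg' and seg_at_offset() are hypothetical stand-ins
written for this example; they are not DPDK's struct rte_mbuf nor any
function from this series. The sketch also treats an offset equal to a
segment's data_len as belonging to the next segment, which is an
assumption of the example rather than a statement about the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment of a chained buffer; this is
 * not the real struct rte_mbuf. */
struct seg {
    char *data;          /* Start of the data held by this segment. */
    uint16_t data_len;   /* Number of data bytes in this segment. */
    struct seg *next;    /* Next segment in the chain, or NULL. */
};

/* Returns a pointer to byte 'offset' of the data spread over the chain
 * starting at 's', or NULL if the chain holds 'offset' bytes or fewer. */
static char *
seg_at_offset(struct seg *s, size_t offset)
{
    /* Skip whole segments until 'offset' falls inside the current one. */
    while (s && offset >= s->data_len) {
        offset -= s->data_len;
        s = s->next;
    }

    return s ? s->data + offset : NULL;
}
```

With two segments holding 4 and 3 bytes, offsets 0 to 3 resolve into
the first segment, offsets 4 to 6 into the second, and offset 7 yields
NULL.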
Co-authored-by: Mark Kavanagh
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 lib/dp-packet.c |   4 +-
 lib/dp-packet.h | 242 +++++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 197 insertions(+), 49 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 782e7c2..2aaeaae 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -41,11 +41,11 @@ static void
 dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
              enum dp_packet_source source)
 {
+    dp_packet_init__(b, allocated, source);
+
     dp_packet_set_base(b, base);
     dp_packet_set_data(b, base);
     dp_packet_set_size(b, 0);
-
-    dp_packet_init__(b, allocated, source);
 }
 
 /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index c301ed5..272597f 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -133,6 +133,10 @@ static inline void *dp_packet_at(const struct dp_packet *, size_t offset,
                                  size_t size);
 static inline void *dp_packet_at_assert(const struct dp_packet *,
                                         size_t offset, size_t size);
+#ifdef DPDK_NETDEV
+static inline void * dp_packet_at_offset(const struct dp_packet *b,
+                                         size_t offset);
+#endif
 static inline void *dp_packet_tail(const struct dp_packet *);
 static inline void *dp_packet_end(const struct dp_packet *);
 
@@ -180,40 +184,6 @@ dp_packet_delete(struct dp_packet *b)
     }
 }
 
-/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
- * byte 'offset'. Otherwise, returns a null pointer. */
-static inline void *
-dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
-{
-    return offset + size <= dp_packet_size(b)
-           ? (char *) dp_packet_data(b) + offset
-           : NULL;
-}
-
-/* Returns a pointer to byte 'offset' in 'b', which must contain at least
- * 'offset + size' bytes of data. */
-static inline void *
-dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
-{
-    ovs_assert(offset + size <= dp_packet_size(b));
-    return ((char *) dp_packet_data(b)) + offset;
-}
-
-/* Returns a pointer to byte following the last byte of data in use in 'b'. */
-static inline void *
-dp_packet_tail(const struct dp_packet *b)
-{
-    return (char *) dp_packet_data(b) + dp_packet_size(b);
-}
-
-/* Returns a pointer to byte following the last byte allocated for use (but
- * not necessarily in use) in 'b'. */
-static inline void *
-dp_packet_end(const struct dp_packet *b)
-{
-    return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
-}
-
 /* Returns the number of bytes of headroom in 'b', that is, the number of bytes
  * of unused space in dp_packet 'b' before the data that is in use. (Most
  * commonly, the data in a dp_packet is at its beginning, and thus the
@@ -229,6 +199,14 @@ dp_packet_headroom(const struct dp_packet *b)
 static inline size_t
 dp_packet_tailroom(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+        tmbuf = rte_pktmbuf_lastseg(tmbuf);
+
+        return rte_pktmbuf_tailroom(tmbuf);
+    }
+#endif
     return (char *) dp_packet_end(b) - (char *) dp_packet_tail(b);
 }
 
@@ -236,8 +214,13 @@ dp_packet_tailroom(const struct dp_packet *b)
 static inline void
 dp_packet_clear(struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+    dp_packet_set_size(b, 0);
+    rte_pktmbuf_reset(&b->mbuf);
+#else
     dp_packet_set_data(b, dp_packet_base(b));
     dp_packet_set_size(b, 0);
+#endif
 }
 
 /* Removes 'size' bytes from the head end of 'b', which must contain at least
@@ -252,12 +235,32 @@ dp_packet_pull(struct dp_packet *b, size_t size)
     return data;
 }
 
+#ifdef DPDK_NETDEV
+/* Similar to dp_packet_try_pull() but doesn't actually pull any code, only
+ * returns true or false if it would be possible to actually pull any code.
+ * Valid for dp_packets carrying mbufs only. */
+static inline bool
+dp_packet_mbuf_may_pull(const struct dp_packet *b, size_t size) {
+    if (size > b->mbuf.data_len) {
+        return false;
+    }
+
+    return true;
+}
+#endif
+
 /* If 'b' has at least 'size' bytes of data, removes that many bytes from the
  * head end of 'b' and returns the first byte removed. Otherwise, returns a
  * null pointer without modifying 'b'. */
 static inline void *
 dp_packet_try_pull(struct dp_packet *b, size_t size)
 {
+#ifdef DPDK_NETDEV
+    if (!dp_packet_mbuf_may_pull(b, size)) {
+        return NULL;
+    }
+#endif
+
     return dp_packet_size(b) - dp_packet_l2_pad_size(b) >= size
         ? dp_packet_pull(b, size) : NULL;
 }
@@ -311,6 +314,12 @@ dp_packet_set_l2_pad_size(struct dp_packet *b, uint8_t pad_size)
 static inline void *
 dp_packet_l2_5(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+    if (!dp_packet_mbuf_may_pull(b, b->l2_5_ofs)) {
+        return NULL;
+    }
+#endif
+
     return b->l2_5_ofs != UINT16_MAX
            ? (char *) dp_packet_data(b) + b->l2_5_ofs
            : NULL;
@@ -327,6 +336,12 @@ dp_packet_set_l2_5(struct dp_packet *b, void *l2_5)
 static inline void *
 dp_packet_l3(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+    if (!dp_packet_mbuf_may_pull(b, b->l3_ofs)) {
+        return NULL;
+    }
+#endif
+
     return b->l3_ofs != UINT16_MAX
            ? (char *) dp_packet_data(b) + b->l3_ofs
            : NULL;
@@ -341,6 +356,12 @@ dp_packet_set_l3(struct dp_packet *b, void *l3)
 static inline void *
 dp_packet_l4(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+    if (!dp_packet_mbuf_may_pull(b, b->l4_ofs)) {
+        return NULL;
+    }
+#endif
+
     return b->l4_ofs != UINT16_MAX
            ? (char *) dp_packet_data(b) + b->l4_ofs
            : NULL;
@@ -355,10 +376,37 @@ dp_packet_set_l4(struct dp_packet *b, void *l4)
 static inline size_t
 dp_packet_l4_size(const struct dp_packet *b)
 {
+#ifdef DPDK_NETDEV
+    if (b->l4_ofs == UINT16_MAX) {
+        return 0;
+    }
+
+    struct rte_mbuf *hmbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+    struct rte_mbuf *tmbuf = dp_packet_tail(b);
+
+    size_t l4_size = 0;
+    if (hmbuf == tmbuf) {
+        l4_size = hmbuf->data_len - b->l4_ofs;
+    } else {
+        l4_size = hmbuf->data_len - b->l4_ofs;
+        hmbuf = hmbuf->next;
+
+        while (hmbuf) {
+            l4_size += tmbuf->data_len;
+
+            hmbuf = hmbuf->next;
+        }
+    }
+
+    l4_size -= dp_packet_l2_pad_size(b);
+
+    return l4_size;
+#else
     return b->l4_ofs != UINT16_MAX
         ? (const char *)dp_packet_tail(b) - (const char *)dp_packet_l4(b)
         - dp_packet_l2_pad_size(b)
         : 0;
+#endif
 }
 
 static inline const void *
@@ -491,7 +539,7 @@ __packet_set_data(struct dp_packet *b, uint16_t v)
 static inline uint16_t
 dp_packet_get_allocated(const struct dp_packet *b)
 {
-    return b->mbuf.buf_len;
+    return b->mbuf.nb_segs * b->mbuf.buf_len;
 }
 
 static inline void
@@ -499,7 +547,107 @@ dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
 {
     b->mbuf.buf_len = s;
 }
+
+static inline void *
+dp_packet_at_offset(const struct dp_packet *b, size_t offset)
+{
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+
+        while (buf && offset > buf->data_len) {
+            offset -= buf->data_len;
+
+            buf = buf->next;
+        }
+        return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) : NULL;
+    } else {
+        return (char *) dp_packet_data(b) + offset;
+    }
+}
+
+/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
+ * byte 'offset'. Otherwise, returns a null pointer. */
+static inline void *
+dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
+{
+    return offset + size <= dp_packet_size(b)
+        ? dp_packet_at_offset(b, offset)
+        : NULL;
+}
+
+/* Returns a pointer to byte 'offset' in 'b', which must contain at least
+ * 'offset + size' bytes of data. */
+static inline void *
+dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
+{
+    ovs_assert(offset + size <= dp_packet_size(b));
+    return dp_packet_at_offset(b, offset);
+}
+
+/* Returns a pointer to byte following the last byte of data in use in 'b'. */
+static inline void *
+dp_packet_tail(const struct dp_packet *b)
+{
+    struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+    if (b->source == DPBUF_DPDK) {
+        /* Find last segment where data ends, meaning the tail of the chained
+           mbufs is there */
+        buf = rte_pktmbuf_lastseg(buf);
+    }
+
+    return rte_pktmbuf_mtod_offset(buf, void *, buf->data_len);
+}
+
+/* Returns a pointer to byte following the last byte allocated for use (but
+ * not necessarily in use) in 'b'. */
+static inline void *
+dp_packet_end(const struct dp_packet *b)
+{
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &(b->mbuf));
+
+        buf = rte_pktmbuf_lastseg(buf);
+
+        return (char *) buf->buf_addr + buf->buf_len;
+    } else {
+        return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
+    }
+}
 #else
+/* If 'b' contains at least 'offset + size' bytes of data, returns a pointer to
+ * byte 'offset'. Otherwise, returns a null pointer. */
+static inline void *
+dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
+{
+    return offset + size <= dp_packet_size(b)
+           ? (char *) dp_packet_data(b) + offset
+           : NULL;
+}
+
+/* Returns a pointer to byte 'offset' in 'b', which must contain at least
+ * 'offset + size' bytes of data. */
+static inline void *
+dp_packet_at_assert(const struct dp_packet *b, size_t offset, size_t size)
+{
+    ovs_assert(offset + size <= dp_packet_size(b));
+    return ((char *) dp_packet_data(b)) + offset;
+}
+
+/* Returns a pointer to byte following the last byte of data in use in 'b'. */
+static inline void *
+dp_packet_tail(const struct dp_packet *b)
+{
+    return (char *) dp_packet_data(b) + dp_packet_size(b);
+}
+
+/* Returns a pointer to byte following the last byte allocated for use (but
+ * not necessarily in use) in 'b'. */
+static inline void *
+dp_packet_end(const struct dp_packet *b)
+{
+    return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
+}
+
 static inline void *
 dp_packet_base(const struct dp_packet *b)
 {
@@ -518,34 +666,34 @@ dp_packet_size(const struct dp_packet *b)
     return b->size_;
 }
 
-static inline void
-dp_packet_set_size(struct dp_packet *b, uint32_t v)
+static inline uint16_t
+dp_packet_get_allocated(const struct dp_packet *b)
 {
-    b->size_ = v;
+    return b->allocated_;
 }
 
-static inline uint16_t
-__packet_data(const struct dp_packet *b)
+static inline void
+dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
 {
-    return b->data_ofs;
+    b->allocated_ = s;
 }
 
 static inline void
-__packet_set_data(struct dp_packet *b, uint16_t v)
+dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-    b->data_ofs = v;
+    b->size_ = v;
 }
 
 static inline uint16_t
-dp_packet_get_allocated(const struct dp_packet *b)
+__packet_data(const struct dp_packet *b)
 {
-    return b->allocated_;
+    return b->data_ofs;
 }
 
 static inline void
-dp_packet_set_allocated(struct dp_packet *b, uint16_t s)
+__packet_set_data(struct dp_packet *b, uint16_t v)
 {
-    b->allocated_ = s;
+    b->data_ofs = v;
 }
 #endif
 

From patchwork Mon Jun 11 16:21:23 2018
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 927776
From: Tiago Lam
To: ovs-dev@openvswitch.org
Cc: i.maximets@samsung.com
Date: Mon, 11 Jun 2018 17:21:23 +0100
Message-Id: <1528734090-220990-8-git-send-email-tiago.lam@intel.com>
In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com>
References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com>
Subject: [ovs-dev] [PATCH v8 07/13] dp-packet: Handle multi-seg mbufs in put*() funcs.

The dp_packet_put*() functions - dp_packet_put_uninit(),
dp_packet_put() and dp_packet_put_zeros() - are, in their current
implementation, operating on the data buffer of a dp_packet as if it
were contiguous, which in the case of multi-segment mbufs means they
operate on the first mbuf in the chain. However, in the case of
dp_packet_put_uninit(), for example, it is the data length of the last
mbuf in the mbuf chain that should be adjusted. These functions have
thus been modified to support multi-segment mbufs.

Additionally, most of the core logic in dp_packet_put_uninit() was
moved to a new helper function, dp_packet_put_uninit()_, to abstract
the implementation details from the API, since in the case of multi-seg
mbufs a new struct is returned that holds the mbuf and offset that
constitute the tail. For the single mbuf case, a pointer to the byte
that constitutes the tail is still returned.
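
As a rough illustration of why the last segment is the one to adjust,
the sketch below appends to a chain by growing only the final segment's
data_len and the packet's total length. 'struct buf_seg', 'struct pkt'
and pkt_put_uninit() are hypothetical names invented for this example
and are not part of OvS or DPDK. Unlike dp_packet_put_uninit(), which
calls dp_packet_prealloc_tailroom() first, this sketch simply returns
NULL when the last segment has too little tailroom.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment; not the real struct rte_mbuf. */
struct buf_seg {
    char *buf;             /* Start of the segment's buffer. */
    uint16_t buf_len;      /* Size of the buffer. */
    uint16_t data_off;     /* Offset of the data within the buffer. */
    uint16_t data_len;     /* Bytes of data held by this segment. */
    struct buf_seg *next;  /* Next segment, or NULL. */
};

/* Hypothetical packet: head of the chain plus the total data length. */
struct pkt {
    struct buf_seg *head;
    uint32_t pkt_len;
};

static struct buf_seg *
seg_last(struct buf_seg *s)
{
    while (s->next) {
        s = s->next;
    }
    return s;
}

/* Reserves 'size' bytes at the tail of 'p' and returns a pointer to the
 * first reserved byte, or NULL if the last segment lacks the room. */
static char *
pkt_put_uninit(struct pkt *p, uint16_t size)
{
    struct buf_seg *last = seg_last(p->head);
    uint16_t tailroom = last->buf_len - last->data_off - last->data_len;

    if (size > tailroom) {
        return NULL;
    }

    char *tail = last->buf + last->data_off + last->data_len;

    /* Only the last segment and the packet total grow. */
    last->data_len += size;
    p->pkt_len += size;

    return tail;
}
```

The first segment's data_len is left untouched, which is the behaviour
the commit message asks for in the multi-segment case.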
Co-authored-by: Mark Kavanagh
Signed-off-by: Mark Kavanagh
Signed-off-by: Tiago Lam
---
 lib/dp-packet.c | 34 ++++++++++++++++++++++++----------
 lib/dp-packet.h |  5 +++++
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 2aaeaae..9b97dd4 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -321,29 +321,43 @@ dp_packet_put_uninit(struct dp_packet *b, size_t size)
     void *p;
     dp_packet_prealloc_tailroom(b, size);
     p = dp_packet_tail(b);
+#ifdef DPDK_NETDEV
+    struct rte_mbuf *mbuf;
+
+    if (b->source == DPBUF_DPDK) {
+        mbuf = rte_pktmbuf_lastseg(&b->mbuf);
+    } else {
+        mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+    }
+
+    mbuf->data_len += size;
+#endif
     dp_packet_set_size(b, dp_packet_size(b) + size);
+
     return p;
 }
 
-/* Appends 'size' zeroed bytes to the tail end of 'b'. Data in 'b' is
- * reallocated and copied if necessary. Returns a pointer to the first byte of
- * the data's location in the dp_packet. */
+/* Appends the 'size' bytes of data in 'p' to the tail end of 'b'. Data in 'b'
+ * is reallocated and copied if necessary. Returns a pointer to the first
+ * byte of the data's location in the dp_packet. */
 void *
-dp_packet_put_zeros(struct dp_packet *b, size_t size)
+dp_packet_put(struct dp_packet *b, const void *p, size_t size)
 {
     void *dst = dp_packet_put_uninit(b, size);
-    memset(dst, 0, size);
+    memcpy(dst, p, size);
+
     return dst;
 }
 
-/* Appends the 'size' bytes of data in 'p' to the tail end of 'b'. Data in 'b'
- * is reallocated and copied if necessary. Returns a pointer to the first
- * byte of the data's location in the dp_packet. */
+/* Appends 'size' zeroed bytes to the tail end of 'b'. Data in 'b' is
+ * reallocated and copied if necessary. Returns a pointer to the first byte of
+ * the data's location in the dp_packet. */
 void *
-dp_packet_put(struct dp_packet *b, const void *p, size_t size)
+dp_packet_put_zeros(struct dp_packet *b, size_t size)
 {
     void *dst = dp_packet_put_uninit(b, size);
-    memcpy(dst, p, size);
+    memset(dst, 0, size);
+
     return dst;
 }
 
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 21fa05e..8c0e23c 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -83,6 +83,11 @@ struct dp_packet {
 #ifdef DPDK_NETDEV
 #define MBUF_BUF_END(BUF_ADDR, BUF_LEN) \
     (char *) (((char *) BUF_ADDR) + BUF_LEN)
+
+struct mbuf_tail {
+    struct rte_mbuf *mbuf;
+    uint16_t ofs;
+};
 #endif
 
 static inline void *dp_packet_data(const struct dp_packet *);

From patchwork Mon Jun 11 16:21:25 2018
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 927778
From: Tiago Lam
To: ovs-dev@openvswitch.org
Cc: i.maximets@samsung.com
Date: Mon, 11 Jun 2018 17:21:25 +0100
Message-Id: <1528734090-220990-10-git-send-email-tiago.lam@intel.com>
In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com>
References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com>
Subject: [ovs-dev] [PATCH v8 08/13] dp-packet: Handle multi-seg mubfs in shift() func.

In its current implementation dp_packet_shift() is also unaware of
multi-seg mbufs (which hold data in memory non-contiguously) and
assumes that data exists contiguously in memory, memmove'ing data to
perform the shift.

To add support for multi-seg mbufs a new set of functions was
introduced, dp_packet_mbuf_shift() and dp_packet_mbuf_write(). These
functions are used by dp_packet_shift(), when handling multi-seg mbufs,
to shift and write data within a chain of mbufs.
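
The write half of that pair can be pictured with the following
standalone sketch, which copies a flat buffer into a chain of segments
and spills over into the next segment when the current one is full.
'struct wseg' and wseg_write() are hypothetical names made up for this
example; the real dp_packet_mbuf_write() operates on struct rte_mbuf
and returns void, whereas this sketch returns the number of bytes
written so the result is easy to check. memmove() is used, as in the
patch, on the assumption that source and destination may overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one segment; not the real struct rte_mbuf. */
struct wseg {
    char *buf;           /* Start of the segment's buffer. */
    uint16_t buf_len;    /* Size of the buffer. */
    uint16_t data_off;   /* Offset of the data within the buffer. */
    struct wseg *next;   /* Next segment, or NULL. */
};

/* Writes up to 'len' bytes from 'data' into the chain starting at 's',
 * beginning 'ofs' bytes past the first segment's data offset. Returns
 * the number of bytes actually written. */
static uint32_t
wseg_write(struct wseg *s, uint16_t ofs, uint32_t len, const void *data)
{
    const char *src = data;
    uint32_t written = 0;

    while (s && len > 0) {
        uint32_t space = s->buf_len - s->data_off;

        if (ofs > space) {
            break;              /* Offset lies outside this segment. */
        }

        uint32_t room = space - ofs;
        uint32_t n = len < room ? len : room;

        /* Source and destination may overlap, hence memmove(). */
        memmove(s->buf + s->data_off + ofs, src, n);

        src += n;
        len -= n;
        written += n;
        ofs = 0;                /* Later segments are written from 0. */
        s = s->next;
    }

    return written;
}
```

Writing 5 bytes at offset 2 into two 4-byte segments places 2 bytes at
the end of the first segment and 3 at the start of the second.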
Signed-off-by: Tiago Lam
---
 lib/dp-packet.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/dp-packet.h |   8 +++++
 2 files changed, 110 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 9f8503e..399fadb 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -294,6 +294,102 @@ dp_packet_prealloc_headroom(struct dp_packet *b, size_t size)
     }
 }
 
+#ifdef DPDK_NETDEV
+/* Write len data bytes in a mbuf at specified offset.
+ *
+ * 'mbuf', pointer to the destination mbuf where 'ofs' is, and the mbuf where
+ * the data will first be written.
+ * 'ofs', the offset within the provided 'mbuf' where 'data' is to be written.
+ * 'len', the size of the to be written 'data'.
+ * 'data', pointer to the to be written bytes.
+ *
+ * XXX: This function is the counterpart of the `rte_pktmbuf_read()` function
+ * available with DPDK, in the rte_mbuf.h */
+void
+dp_packet_mbuf_write(struct rte_mbuf *mbuf, int16_t ofs, uint32_t len,
+                     const void *data)
+{
+    char *dst_addr;
+    uint16_t data_len;
+    int len_copy;
+    while (mbuf) {
+        if (len == 0) {
+            break;
+        }
+
+        dst_addr = rte_pktmbuf_mtod_offset(mbuf, char *, ofs);
+        data_len = MBUF_BUF_END(mbuf->buf_addr, mbuf->buf_len) - dst_addr;
+
+        len_copy = MIN(len, data_len);
+        /* We don't know if 'data' is the result of a rte_pktmbuf_read() call,
+         * in which case we may end up writing to the same region of memory we
+         * are reading from and overlapping. Hence the use of memmove() here */
+        memmove(dst_addr, data, len_copy);
+
+        data = ((char *) data) + len_copy;
+        len -= len_copy;
+        ofs = 0;
+
+        mbuf = mbuf->next;
+    }
+}
+
+static void
+dp_packet_mbuf_shift_(struct rte_mbuf *dbuf, int16_t dst_ofs,
+                      const struct rte_mbuf *sbuf, uint16_t src_ofs, int len)
+{
+    char rd[len];
+    const char *wd = rte_pktmbuf_read(sbuf, src_ofs, len, rd);
+
+    ovs_assert(wd);
+
+    dp_packet_mbuf_write(dbuf, dst_ofs, len, wd);
+}
+
+/* Similarly to dp_packet_shift(), shifts the data within the mbufs of a
+ * dp_packet of DPBUF_DPDK source by 'delta' bytes.
+ * Caller must make sure of the following conditions:
+ * - When shifting left, delta can't be bigger than the data_len available in
+ *   the last mbuf;
+ * - When shifting right, delta can't be bigger than the space available in the
+ *   first mbuf (buf_len - data_off).
+ * Both these conditions guarantee that a shift operation doesn't fall outside
+ * the bounds of the existing mbufs, so that the first and last mbufs (when
+ * using multi-segment mbufs), remain the same. */
+static void
+dp_packet_mbuf_shift(struct dp_packet *b, int delta)
+{
+    uint16_t src_ofs;
+    int16_t dst_ofs;
+
+    struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
+    struct rte_mbuf *tmbuf = rte_pktmbuf_lastseg(mbuf);
+
+    if (delta < 0) {
+        ovs_assert(-delta <= tmbuf->data_len);
+    } else {
+        ovs_assert(delta < (mbuf->buf_len - mbuf->data_off));
+    }
+
+    /* Set the destination and source offsets to copy to */
+    dst_ofs = delta;
+    src_ofs = 0;
+
+    /* Shift data from src mbuf and offset to dst mbuf and offset */
+    dp_packet_mbuf_shift_(mbuf, dst_ofs, mbuf, src_ofs,
+                          rte_pktmbuf_pkt_len(mbuf));
+
+    /* Update mbufs' properties, and if using multi-segment mbufs, first and
+     * last mbuf's data_len also needs to be adjusted */
+    mbuf->data_off = mbuf->data_off + dst_ofs;
+
+    if (mbuf != tmbuf) {
+        tmbuf->data_len += delta;
+        mbuf->data_len -= delta;
+    }
+}
+#endif
+
 /* Shifts all of the data within the allocated space in 'b' by 'delta' bytes.
  * For example, a 'delta' of 1 would cause each byte of data to move one byte
  * forward (from address 'p' to 'p+1'), and a 'delta' of -1 would cause each
@@ -306,6 +402,12 @@ dp_packet_shift(struct dp_packet *b, int delta)
                : true);
 
     if (delta != 0) {
+#ifdef DPDK_NETDEV
+        if (b->source == DPBUF_DPDK) {
+            dp_packet_mbuf_shift(b, delta);
+            return;
+        }
+#endif
         char *dst = (char *) dp_packet_data(b) + delta;
         memmove(dst, dp_packet_data(b), dp_packet_size(b));
         dp_packet_set_data(b, dst);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 272597f..4946fa3 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -80,6 +80,11 @@ struct dp_packet {
     };
 };
 
+#ifdef DPDK_NETDEV
+#define MBUF_BUF_END(BUF_ADDR, BUF_LEN) \
+    (char *) (((char *) BUF_ADDR) + BUF_LEN)
+#endif
+
 static inline void *dp_packet_data(const struct dp_packet *);
 static inline void dp_packet_set_data(struct dp_packet *, void *);
 static inline void *dp_packet_base(const struct dp_packet *);
@@ -136,6 +141,9 @@ static inline void *dp_packet_at_assert(const struct dp_packet *,
 #ifdef DPDK_NETDEV
 static inline void * dp_packet_at_offset(const struct dp_packet *b,
                                          size_t offset);
+void
+dp_packet_mbuf_write(struct rte_mbuf *mbuf, int16_t ofs, uint32_t len,
+                     const void *data);
 #endif
 static inline void *dp_packet_tail(const struct dp_packet *);
 static inline void *dp_packet_end(const struct dp_packet *);

From patchwork Mon Jun 11 16:21:26 2018
X-Patchwork-Submitter: "Lam, Tiago"
X-Patchwork-Id: 927779
From: Tiago Lam
To: ovs-dev@openvswitch.org
Cc: i.maximets@samsung.com
Date: Mon, 11 Jun 2018 17:21:26 +0100
Message-Id: <1528734090-220990-11-git-send-email-tiago.lam@intel.com>
In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com>
References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com>
Subject: [ovs-dev] [PATCH v8 09/13] dp-packet: Handle multi-seg mbufs in resize__().

When enabled with DPDK, OvS relies on mbufs allocated by mempools to
receive and output data on DPDK ports. Until now, each OvS dp_packet
has had only one mbuf associated, which is allocated with the maximum
possible size, taking the MTU into account. This approach, however,
doesn't allow us to increase the allocated size in an mbuf, if needed,
since an mbuf is allocated and initialised upon mempool creation. Thus,
in the current implementation this is dealt with by calling
OVS_NOT_REACHED() and terminating OvS.

To avoid this, and allow the (already) allocated space to be better
used, dp_packet_resize__() now tries to use the available room, both
the tailroom and the headroom, to make enough space for the new data.
Since this happens for packets of source DPBUF_DPDK, the single-segment mbuf case mentioned above is also covered by this new aproach in resize__(). Signed-off-by: Tiago Lam --- lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 46 insertions(+), 2 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 399fadb..d0fab94 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom new_allocated = new_headroom + dp_packet_size(b) + new_tailroom; switch (b->source) { + /* When resizing mbufs, both a single mbuf and multi-segment mbufs (where + * data is not contigously held in memory), both the headroom and the + * tailroom available will be used to make more space for where data needs + * to be inserted. I.e if there's not enough headroom, data may be shifted + * right if there's enough tailroom. + * However, this is not bulletproof and in some cases the space available + * won't be enough - in those cases, an error should be returned and the + * packet dropped. */ case DPBUF_DPDK: - OVS_NOT_REACHED(); + { + size_t miss_len; + + if (new_headroom == dp_packet_headroom(b)) { + /* This is a tailroom adjustment. Since there's no tailroom space + * left, try and shift data towards the head to free up tail space, + * if there's enough headroom */ + + miss_len = new_tailroom - dp_packet_tailroom(b); + + if (miss_len <= new_headroom) { + dp_packet_shift(b, -miss_len); + } else { + /* XXX: Handle error case and report error to caller */ + OVS_NOT_REACHED(); + } + } else { + /* Otherwise, this is a headroom adjustment. 
Try to shift data + * towards the tail to free up head space, if there's enough + * tailroom */ + + miss_len = new_headroom - dp_packet_headroom(b); + + if (miss_len <= new_tailroom) { + dp_packet_shift(b, miss_len); + } else { + /* XXX: Handle error case and report error to caller */ + OVS_NOT_REACHED(); + } + } + + new_base = dp_packet_base(b); + + break; + } case DPBUF_MALLOC: if (new_headroom == dp_packet_headroom(b)) { new_base = xrealloc(dp_packet_base(b), new_allocated); @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom OVS_NOT_REACHED(); } - dp_packet_set_allocated(b, new_allocated); + if (b->source != DPBUF_DPDK) { + dp_packet_set_allocated(b, new_allocated); + } dp_packet_set_base(b, new_base); new_data = (char *) new_base + new_headroom; From patchwork Mon Jun 11 16:21:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927780 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JMS1zTDz9rvt for ; Tue, 12 Jun 2018 02:27:32 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id A68B51337; Mon, 11 Jun 2018 16:22:16 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org 
(smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 261B012A5 for ; Mon, 11 Jun 2018 16:22:15 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 916EA718 for ; Mon, 11 Jun 2018 16:22:14 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:22:12 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210832" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:22:11 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:27 +0100 Message-Id: <1528734090-220990-12-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com, Michael Qiu Subject: [ovs-dev] [PATCH v8 10/13] dp-packet: copy data from multi-seg. DPDK mbuf X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Michael Qiu When cloning a packet whose source is the DPDK driver, multi-segment mbufs must be considered and each segment's data copied one by one. Also, much of the DPDK mbuf's information is lost during a copy, such as packet type, ol_flags, etc. 
That information is very important for DPDK to do packets processing. Co-authored-by: Mark Kavanagh Signed-off-by: Michael Qiu Signed-off-by: Mark Kavanagh --- lib/dp-packet.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++-------- lib/dp-packet.h | 3 +++ lib/netdev-dpdk.c | 1 + 3 files changed, 69 insertions(+), 11 deletions(-) diff --git a/lib/dp-packet.c b/lib/dp-packet.c index d0fab94..2e65b82 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -48,6 +48,23 @@ dp_packet_use__(struct dp_packet *b, void *base, size_t allocated, dp_packet_set_size(b, 0); } +#ifdef DPDK_NETDEV +void +dp_packet_copy_mbuf_flags(struct dp_packet *dst, const struct dp_packet *src) +{ + ovs_assert(dst != NULL && src != NULL); + struct rte_mbuf *buf_dst = &(dst->mbuf); + struct rte_mbuf buf_src = src->mbuf; + + buf_dst->nb_segs = buf_src.nb_segs; + buf_dst->ol_flags = buf_src.ol_flags; + buf_dst->packet_type = buf_src.packet_type; + buf_dst->tx_offload = buf_src.tx_offload; +} +#else +#define dp_packet_copy_mbuf_flags(arg1, arg2) +#endif + /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of * memory starting at 'base'. 'base' should be the first byte of a region * obtained from malloc(). 
It will be freed (with free()) if 'b' is resized or @@ -158,6 +175,50 @@ dp_packet_clone(const struct dp_packet *buffer) return dp_packet_clone_with_headroom(buffer, 0); } +#ifdef DPDK_NETDEV +struct dp_packet * +dp_packet_clone_with_headroom(const struct dp_packet *buffer, + size_t headroom) { + struct dp_packet *new_buffer; + uint32_t pkt_len = dp_packet_size(buffer); + + /* copy multi-seg data */ + if (buffer->source == DPBUF_DPDK && buffer->mbuf.nb_segs > 1) { + uint32_t offset = 0; + void *dst = NULL; + struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *, + &(buffer->mbuf)); + + new_buffer = dp_packet_new_with_headroom(pkt_len, headroom); + dp_packet_set_size(new_buffer, pkt_len + headroom); + dst = dp_packet_tail(new_buffer); + + while (tmbuf) { + rte_memcpy((char *)dst + offset, + rte_pktmbuf_mtod(tmbuf, void *), tmbuf->data_len); + offset += tmbuf->data_len; + tmbuf = tmbuf->next; + } + } else { + new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer), + dp_packet_size(buffer), + headroom); + } + + /* Copy the following fields into the returned buffer: l2_pad_size, + * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */ + memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size, + sizeof(struct dp_packet) - + offsetof(struct dp_packet, l2_pad_size)); + + dp_packet_copy_mbuf_flags(new_buffer, buffer); + if (dp_packet_rss_valid(new_buffer)) { + new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss; + } + + return new_buffer; +} +#else /* Creates and returns a new dp_packet whose data are copied from 'buffer'. * The returned dp_packet will additionally have 'headroom' bytes of * headroom. 
*/ @@ -165,32 +226,25 @@ struct dp_packet * dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom) { struct dp_packet *new_buffer; + uint32_t pkt_len = dp_packet_size(buffer); new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer), - dp_packet_size(buffer), - headroom); + pkt_len, headroom); + /* Copy the following fields into the returned buffer: l2_pad_size, * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */ memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size, sizeof(struct dp_packet) - offsetof(struct dp_packet, l2_pad_size)); -#ifdef DPDK_NETDEV - new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags; -#else new_buffer->rss_hash_valid = buffer->rss_hash_valid; -#endif - if (dp_packet_rss_valid(new_buffer)) { -#ifdef DPDK_NETDEV - new_buffer->mbuf.hash.rss = buffer->mbuf.hash.rss; -#else new_buffer->rss_hash = buffer->rss_hash; -#endif } return new_buffer; } +#endif /* Creates and returns a new dp_packet that initially contains a copy of the * 'size' bytes of data starting at 'data' with no headroom or tailroom. 
*/ diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 4946fa3..3852756 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -124,6 +124,9 @@ void dp_packet_init_dpdk(struct dp_packet *); void dp_packet_init(struct dp_packet *, size_t); void dp_packet_uninit(struct dp_packet *); +void dp_packet_copy_mbuf_flags(struct dp_packet *dst, + const struct dp_packet *src); + struct dp_packet *dp_packet_new(size_t); struct dp_packet *dp_packet_new_with_headroom(size_t, size_t headroom); struct dp_packet *dp_packet_clone(const struct dp_packet *); diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index efd7c20..9b1fb9a 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -2215,6 +2215,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *), dp_packet_data(packet), size); dp_packet_set_size((struct dp_packet *)pkts[txcnt], size); + dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet); txcnt++; } From patchwork Mon Jun 11 16:21:28 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927781 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JN62gXKz9s01 for ; Tue, 12 Jun 2018 02:28:06 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 7F6A61369; 
Mon, 11 Jun 2018 16:22:19 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id A3B9F1356 for ; Mon, 11 Jun 2018 16:22:17 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 30926718 for ; Mon, 11 Jun 2018 16:22:16 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:22:15 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210840" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:22:13 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:28 +0100 Message-Id: <1528734090-220990-13-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: Michael Qiu , i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 11/13] netdev-dpdk: copy large packet to multi-seg. mbufs X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Mark Kavanagh Currently, packets are only copied to a single segment in the function dpdk_do_tx_copy(). 
This could be an issue in the case of jumbo frames, particularly when multi-segment mbufs are involved. This patch calculates the number of segments needed by a packet and copies the data to each segment. A new function, dpdk_buf_alloc(), has also been introduced as a wrapper around the nonpmd_mp_mutex to serialise allocations from a non-pmd context. Co-authored-by: Michael Qiu Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Michael Qiu Signed-off-by: Tiago Lam --- lib/netdev-dpdk.c | 94 +++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 84 insertions(+), 10 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 9b1fb9a..0079e28 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -515,6 +515,22 @@ dpdk_rte_mzalloc(size_t sz) return rte_zmalloc(OVS_VPORT_DPDK, sz, OVS_CACHE_LINE_SIZE); } +static struct rte_mbuf * +dpdk_buf_alloc(struct rte_mempool *mp) +{ + if (!dpdk_thread_is_pmd()) { + ovs_mutex_lock(&nonpmd_mp_mutex); + } + + struct rte_mbuf *mbuf = rte_pktmbuf_alloc(mp); + + if (!dpdk_thread_is_pmd()) { + ovs_mutex_unlock(&nonpmd_mp_mutex); + } + + return mbuf; +} + void free_dpdk_buf(struct dp_packet *p) { @@ -2167,6 +2183,71 @@ out: } } +static int +dpdk_prep_tx_buf(struct dp_packet *packet, struct rte_mbuf **head, + struct rte_mempool *mp) +{ + struct rte_mbuf *temp; + uint32_t size = dp_packet_size(packet); + uint16_t max_data_len, data_len; + uint32_t nb_segs = 0; + int i; + + temp = *head = dpdk_buf_alloc(mp); + if (OVS_UNLIKELY(!temp)) { + return 1; + } + + /* All new allocated mbuf's max data len is the same */ + max_data_len = temp->buf_len - temp->data_off; + + /* Calculate # of output mbufs. */ + nb_segs = size / max_data_len; + if (size % max_data_len) { + nb_segs = nb_segs + 1; + } + + /* Allocate additional mbufs when multiple output mbufs required. 
*/ + for (i = 1; i < nb_segs; i++) { + temp->next = dpdk_buf_alloc(mp); + if (!temp->next) { + free_dpdk_buf((struct dp_packet *) *head); + *head = NULL; + break; + } + temp = temp->next; + } + /* We have to do a copy for now */ + rte_pktmbuf_pkt_len(*head) = size; + temp = *head; + + data_len = size < max_data_len ? size: max_data_len; + if (packet->source == DPBUF_DPDK) { + *head = &(packet->mbuf); + while (temp && head && size > 0) { + rte_memcpy(rte_pktmbuf_mtod(temp, void *), + dp_packet_data((struct dp_packet *)head), data_len); + rte_pktmbuf_data_len(temp) = data_len; + *head = (*head)->next; + size = size - data_len; + data_len = size < max_data_len ? size: max_data_len; + temp = temp->next; + } + } else { + int offset = 0; + while (temp && size > 0) { + memcpy(rte_pktmbuf_mtod(temp, void *), + dp_packet_at(packet, offset, data_len), data_len); + rte_pktmbuf_data_len(temp) = data_len; + temp = temp->next; + size = size - data_len; + offset += data_len; + data_len = size < max_data_len ? size: max_data_len; + } + } + return 0; +} + /* Tx function. Transmit packets indefinitely */ static void dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) @@ -2183,6 +2264,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) struct rte_mbuf *pkts[PKT_ARRAY_SIZE]; uint32_t cnt = batch_cnt; uint32_t dropped = 0; + uint32_t i; if (dev->type != DPDK_DEV_VHOST) { /* Check if QoS has been configured for this netdev. 
*/ @@ -2193,27 +2275,19 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) uint32_t txcnt = 0; - for (uint32_t i = 0; i < cnt; i++) { + for (i = 0; i < cnt; i++) { struct dp_packet *packet = batch->packets[i]; uint32_t size = dp_packet_size(packet); - if (OVS_UNLIKELY(size > dev->max_packet_len)) { VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d", size, dev->max_packet_len); - dropped++; continue; } - - pkts[txcnt] = rte_pktmbuf_alloc(dev->mp); - if (OVS_UNLIKELY(!pkts[txcnt])) { + if (dpdk_prep_tx_buf(packet, &pkts[txcnt], dev->mp)) { dropped += cnt - i; break; } - - /* We have to do a copy for now */ - memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *), - dp_packet_data(packet), size); dp_packet_set_size((struct dp_packet *)pkts[txcnt], size); dp_packet_copy_mbuf_flags((struct dp_packet *)pkts[txcnt], packet); From patchwork Mon Jun 11 16:21:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927782 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JNf1mKKz9rvt for ; Tue, 12 Jun 2018 02:28:34 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 2854A136C; Mon, 11 Jun 2018 16:22:20 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from 
smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 2D5FA105F for ; Mon, 11 Jun 2018 16:22:19 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 42E3F71C for ; Mon, 11 Jun 2018 16:22:18 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:22:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210848" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:22:16 -0700 From: Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:29 +0100 Message-Id: <1528734090-220990-14-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 12/13] netdev-dpdk: support multi-segment jumbo frames X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Mark Kavanagh Currently, jumbo frame support for OvS-DPDK is implemented by increasing the size of mbufs within a mempool, such that each mbuf within the pool is large enough to contain an entire jumbo frame of a user-defined size. 
Typically, for each user-defined MTU, 'requested_mtu', a new mempool is created, containing mbufs of size ~requested_mtu. With the multi-segment approach, a port uses a single mempool, (containing standard/default-sized mbufs of ~2k bytes), irrespective of the user-requested MTU value. To accommodate jumbo frames, mbufs are chained together, where each mbuf in the chain stores a portion of the jumbo frame. Each mbuf in the chain is termed a segment, hence the name. == Enabling multi-segment mbufs == Multi-segment and single-segment mbufs are mutually exclusive, and the user must decide on which approach to adopt on init. The introduction of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This is a global boolean value, which determines how jumbo frames are represented across all DPDK ports. In the absence of a user-supplied value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment mbufs must be explicitly enabled / single-segment mbufs remain the default. Setting the field is identical to setting existing DPDK-specific OVSDB fields: ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10 ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0 ==> ovs-vsctl set Open_vSwitch . 
other_config:dpdk-multi-seg-mbufs=true Co-authored-by: Tiago Lam Signed-off-by: Mark Kavanagh Signed-off-by: Tiago Lam --- Documentation/topics/dpdk/jumbo-frames.rst | 41 +++++++++++++++++++++++++ NEWS | 1 + lib/dpdk.c | 8 +++++ lib/netdev-dpdk.c | 49 ++++++++++++++++++++++++++++-- lib/netdev-dpdk.h | 2 ++ vswitchd/vswitch.xml | 20 ++++++++++++ 6 files changed, 118 insertions(+), 3 deletions(-) diff --git a/Documentation/topics/dpdk/jumbo-frames.rst b/Documentation/topics/dpdk/jumbo-frames.rst index 00360b4..4f98e83 100644 --- a/Documentation/topics/dpdk/jumbo-frames.rst +++ b/Documentation/topics/dpdk/jumbo-frames.rst @@ -71,3 +71,44 @@ Jumbo frame support has been validated against 9728B frames, which is the largest frame size supported by Fortville NIC using the DPDK i40e driver, but larger frames and other DPDK NIC drivers may be supported. These cases are common for use cases involving East-West traffic only. + +------------------- +Multi-segment mbufs +------------------- + +Instead of increasing the size of mbufs within a mempool, such that each mbuf +within the pool is large enough to contain an entire jumbo frame of a +user-defined size, mbufs can be chained together instead. In this approach each +mbuf in the chain stores a portion of the jumbo frame, by default ~2K bytes, +irrespective of the user-requested MTU value. Since each mbuf in the chain is +termed a segment, this approach is named "multi-segment mbufs". + +This approach may bring more flexibility in use cases where the maximum packet +length may be hard to guess. For example, in cases where packets originate from +sources marked for offload (such as TSO), each packet may be larger than the +MTU, and as such, when forwarding it to a DPDK port a single mbuf may not be +enough to hold all of the packet's data. + +Multi-segment and single-segment mbufs are mutually exclusive, and the user +must decide on which approach to adopt on initialisation. 
If multi-segment +mbufs is to be enabled, it can be done so with the following command:: + + $ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true + +Single-segment mbufs still remain the default when using OvS-DPDK, and the +above option `dpdk-multi-seg-mbufs` must be explicitly set to `true` if +multi-segment mbufs are to be used. + +~~~~~~~~~~~~~~~~~ +Performance notes +~~~~~~~~~~~~~~~~~ + +When using multi-segment mbufs some PMDs may not support vectorized Tx +functions, due to its non-contiguous nature. As a result this can hit +performance for smaller packet sizes. For example, on a setup sending 64B +packets at line rate, a decrease of ~20% has been observed. The performance +impact stops being noticeable for larger packet sizes, although the exact size +will vary. + +Because of this, multi-segment mbufs is not advised to be used with smaller +packet sizes, such as 64B. diff --git a/NEWS b/NEWS index 484c6dc..7588cf0 100644 --- a/NEWS +++ b/NEWS @@ -110,6 +110,7 @@ v2.9.0 - 19 Feb 2018 pmd assignments. * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'. * Add support for vHost dequeue zero copy (experimental) + * Add support for multi-segment mbufs - Userspace datapath: * Output packet batching support. 
- vswitchd: diff --git a/lib/dpdk.c b/lib/dpdk.c index 09afd8c..7fbaf9f 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -465,6 +465,14 @@ dpdk_init__(const struct smap *ovs_other_config) /* Finally, register the dpdk classes */ netdev_dpdk_register(); + + bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config, + "dpdk-multi-seg-mbufs", false); + if (multi_seg_mbufs_enable) { + VLOG_INFO("DPDK multi-segment mbufs enabled\n"); + netdev_dpdk_multi_segment_mbufs_enable(); + } + return true; } diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 0079e28..fde95e3 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; VLOG_DEFINE_THIS_MODULE(netdev_dpdk); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); +static bool dpdk_multi_segment_mbufs = false; #define DPDK_PORT_WATCHDOG_INTERVAL 5 @@ -635,6 +636,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len) + dev->requested_n_txq * dev->requested_txq_size + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST + MIN_NB_MBUF; + /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */ ovs_mutex_lock(&dpdk_mp_mutex); do { @@ -734,7 +736,13 @@ dpdk_mp_release(struct rte_mempool *mp) /* Tries to allocate a new mempool - or re-use an existing one where * appropriate - on requested_socket_id with a size determined by - * requested_mtu and requested Rx/Tx queues. + * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's + * elements are dependent on the value of 'dpdk_multi_segment_mbufs': + * - if 'true', then the mempool contains standard-sized mbufs that are chained + * together to accommodate packets of size 'requested_mtu'. + * - if 'false', then the members of the allocated mempool are + * non-standard-sized mbufs. Each mbuf in the mempool is large enough to + * fully accommodate packets of size 'requested_mtu'. 
* On success - or when re-using an existing mempool - the new configuration * will be applied. * On error, device will be left unchanged. */ @@ -742,10 +750,18 @@ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { - uint16_t buf_size = dpdk_buf_size(dev->requested_mtu); + uint16_t buf_size = 0; struct rte_mempool *mp; int ret = 0; + /* Contiguous mbufs in use - permit oversized mbufs */ + if (!dpdk_multi_segment_mbufs) { + buf_size = dpdk_buf_size(dev->requested_mtu); + } else { + /* multi-segment mbufs - use standard mbuf size */ + buf_size = dpdk_buf_size(ETHER_MTU); + } + dpdk_mp_sweep(); mp = dpdk_mp_create(dev, buf_size); @@ -827,6 +843,7 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) int diag = 0; int i; struct rte_eth_conf conf = port_conf; + struct rte_eth_txconf txconf; struct rte_eth_dev_info info; /* As of DPDK 17.11.1 a few PMDs require to explicitly enable @@ -842,6 +859,18 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) } } + /* Multi-segment-mbuf-specific setup. */ + if (dpdk_multi_segment_mbufs) { + /* DPDK PMDs typically attempt to use simple or vectorized + * transmit functions, neither of which are compatible with + * multi-segment mbufs. Ensure that these are disabled when + * multi-segment mbufs are enabled. + */ + rte_eth_dev_info_get(dev->port_id, &info); + txconf = info.default_txconf; + txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS; + } + conf.intr_conf.lsc = dev->lsc_interrupt_mode; conf.rxmode.hw_ip_checksum = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0; @@ -871,7 +900,9 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) for (i = 0; i < n_txq; i++) { diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, - dev->socket_id, NULL); + dev->socket_id, + dpdk_multi_segment_mbufs ? 
&txconf + : NULL); if (diag) { VLOG_INFO("Interface %s unable to setup txq(%d): %s", dev->up.name, i, rte_strerror(-diag)); @@ -3968,6 +3999,18 @@ unlock: return err; } +bool +netdev_dpdk_is_multi_segment_mbufs_enabled(void) +{ + return dpdk_multi_segment_mbufs == true; +} + +void +netdev_dpdk_multi_segment_mbufs_enable(void) +{ + dpdk_multi_segment_mbufs = true; +} + #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, \ SET_CONFIG, SET_TX_MULTIQ, SEND, \ GET_CARRIER, GET_STATS, \ diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index b7d02a7..19aa5c6 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -25,6 +25,8 @@ struct dp_packet; #ifdef DPDK_NETDEV +bool netdev_dpdk_is_multi_segment_mbufs_enabled(void); +void netdev_dpdk_multi_segment_mbufs_enable(void); void netdev_dpdk_register(void); void free_dpdk_buf(struct dp_packet *); diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 1e27a02..cbe4650 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -331,6 +331,26 @@

+ +

+ Specifies if DPDK uses multi-segment mbufs for handling jumbo frames. +

+

+ If true, DPDK allocates a single mempool per port, irrespective + of the ports' requested MTU sizes. The elements of this mempool are + 'standard'-sized mbufs (typically ~2k bytes), which may be chained + together to accommodate jumbo frames. In this approach, each mbuf + typically stores a fragment of the overall jumbo frame. +

+

+ If not specified, defaults to false, in which case, + the size of each mbuf within a DPDK port's mempool will be grown to + accommodate jumbo frames within a single mbuf. +

+
+ +

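As an aside on the chaining described in the dpdk-multi-seg-mbufs documentation above, the following is a minimal, standalone sketch of how a frame length maps onto fixed-size segments. It does not use the DPDK or OvS APIs: segs_needed() and split_into_segs() are hypothetical helper names, and the 2048-byte per-segment capacity used in the note below is an assumed value standing in for an mbuf's usable data room.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many fixed-capacity segments are needed to hold 'pkt_len'
 * bytes. 'seg_capacity' is the usable data room of one segment; this is an
 * assumption for illustration, not a value read from a real mbuf. */
static uint32_t
segs_needed(uint32_t pkt_len, uint16_t seg_capacity)
{
    assert(seg_capacity > 0);
    /* Integer ceiling division: one extra segment for any remainder. */
    return pkt_len / seg_capacity + (pkt_len % seg_capacity != 0);
}

/* Writes the number of bytes each segment would carry into 'seg_len'.
 * Returns the number of segments used, or -1 if 'max_segs' entries are not
 * enough to describe the whole packet. */
static int
split_into_segs(uint32_t pkt_len, uint16_t seg_capacity,
                uint16_t *seg_len, uint32_t max_segs)
{
    uint32_t n = segs_needed(pkt_len, seg_capacity);

    if (n > max_segs) {
        return -1;
    }

    for (uint32_t i = 0; i < n; i++) {
        /* Every segment is filled completely except possibly the last. */
        uint32_t chunk = pkt_len < seg_capacity ? pkt_len : seg_capacity;

        seg_len[i] = (uint16_t) chunk;
        pkt_len -= chunk;
    }

    return (int) n;
}
```

Under these assumptions, a 9000-byte frame with 2048 bytes of room per segment would occupy five segments: four full ones and a final one carrying the remaining 808 bytes.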
From patchwork Mon Jun 11 16:21:30 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Lam, Tiago" X-Patchwork-Id: 927783 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=intel.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 414JPG41x3z9rvt for ; Tue, 12 Jun 2018 02:29:06 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id E87EC137B; Mon, 11 Jun 2018 16:22:22 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id DD061137A for ; Mon, 11 Jun 2018 16:22:21 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 7BD0D718 for ; Mon, 11 Jun 2018 16:22:20 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Jun 2018 09:22:20 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,502,1520924400"; d="scan'208";a="62210856" Received: from silpixa00399125.ir.intel.com ([10.237.223.34]) by fmsmga004.fm.intel.com with ESMTP; 11 Jun 2018 09:22:18 -0700 From: 
Tiago Lam To: ovs-dev@openvswitch.org Date: Mon, 11 Jun 2018 17:21:30 +0100 Message-Id: <1528734090-220990-15-git-send-email-tiago.lam@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> References: <1528734090-220990-1-git-send-email-tiago.lam@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com Subject: [ovs-dev] [PATCH v8 13/13] dpdk-tests: Add test coverage for multi-seg mbufs. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org In order to create a minimal environment that allows the tests to get mbufs from an existing mempool, the following approach is taken: - EAL is initialised (by using the main dpdk_init()) and a (very) small mempool is instantiated (mimicking the logic in dpdk_mp_create()). This mempool instance is global and used by all the tests; - Packets are then allocated from the instantiated mempool, and tested on, by running some operations on them and manipulating data. The tests introduced focus on testing DPDK dp_packets (where source=DPBUF_DPDK), linked with a single or multiple mbufs, across several operations, such as: - dp_packet_put(); - dp_packet_shift(); - dp_packet_reserve(); - dp_packet_push_uninit(); - dp_packet_clear(); - And as a consequence of some of these, dp_packet_put_uninit() and dp_packet_resize__(). Finally, this has also been integrated with the new DPDK testsuite. Thus, when running `$sudo make check-dpdk` one will also be running these tests. 
Signed-off-by: Tiago Lam
---
 tests/automake.mk              |  10 +-
 tests/dpdk-packet-mbufs.at     |   7 +
 tests/system-dpdk-testsuite.at |   1 +
 tests/test-dpdk-mbufs.c        | 518 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 535 insertions(+), 1 deletion(-)
 create mode 100644 tests/dpdk-packet-mbufs.at
 create mode 100644 tests/test-dpdk-mbufs.c

diff --git a/tests/automake.mk b/tests/automake.mk
index c420b29..74841a8 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -134,7 +134,8 @@ SYSTEM_DPDK_TESTSUITE_AT = \
 	tests/system-common-macros.at \
 	tests/system-dpdk-macros.at \
 	tests/system-dpdk-testsuite.at \
-	tests/system-dpdk.at
+	tests/system-dpdk.at \
+	tests/dpdk-packet-mbufs.at
 
 check_SCRIPTS += tests/atlocal
 
@@ -384,6 +385,10 @@ tests_ovstest_SOURCES = \
 	tests/test-vconn.c \
 	tests/test-aa.c \
 	tests/test-stopwatch.c
+if DPDK_NETDEV
+tests_ovstest_SOURCES += \
+	tests/test-dpdk-mbufs.c
+endif
 
 if !WIN32
 tests_ovstest_SOURCES += \
@@ -396,6 +401,9 @@ tests_ovstest_SOURCES += \
 endif
 
 tests_ovstest_LDADD = lib/libopenvswitch.la ovn/lib/libovn.la
+if DPDK_NETDEV
+tests_ovstest_LDFLAGS = $(AM_LDFLAGS) $(DPDK_vswitchd_LDFLAGS)
+endif
 
 noinst_PROGRAMS += tests/test-strtok_r
 tests_test_strtok_r_SOURCES = tests/test-strtok_r.c
diff --git a/tests/dpdk-packet-mbufs.at b/tests/dpdk-packet-mbufs.at
new file mode 100644
index 0000000..f28e4fc
--- /dev/null
+++ b/tests/dpdk-packet-mbufs.at
@@ -0,0 +1,7 @@
+AT_BANNER([OVS-DPDK dp_packet unit tests])
+
+AT_SETUP([OVS-DPDK dp_packet - mbufs allocation])
+AT_KEYWORDS([dp_packet, multi-seg, mbufs])
+AT_CHECK(ovstest test-dpdk-packet, [], [ignore], [ignore])
+
+AT_CLEANUP
diff --git a/tests/system-dpdk-testsuite.at b/tests/system-dpdk-testsuite.at
index 382f09e..f5edf58 100644
--- a/tests/system-dpdk-testsuite.at
+++ b/tests/system-dpdk-testsuite.at
@@ -23,3 +23,4 @@
 m4_include([tests/system-common-macros.at])
 m4_include([tests/system-dpdk-macros.at])
 m4_include([tests/system-dpdk.at])
+m4_include([tests/dpdk-packet-mbufs.at])
diff --git a/tests/test-dpdk-mbufs.c b/tests/test-dpdk-mbufs.c
new file mode 100644
index 0000000..6c3bfdc
--- /dev/null
+++ b/tests/test-dpdk-mbufs.c
@@ -0,0 +1,518 @@
+/*
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "dp-packet.h"
+#include "ovstest.h"
+#include "dpdk.h"
+#include "smap.h"
+
+#define N_MBUFS 1024
+#define MBUF_DATA_LEN 2048
+
+int num_tests = 0;
+
+/* Global var to hold a mempool instance, "test-mp", used in all of the tests
+ * below. This instance is instantiated in dpdk_setup_eal_with_mp(). */
+static struct rte_mempool *mp;
+
+/* Test data used to fill the packets with data. Note that this isn't a string
+ * that represents a valid packet, by any means. The pattern is generated in
+ * set_testing_pattern_str() and the sole purpose is to verify the data remains
+ * the same after inserting and operating on multi-segment mbufs. */
+static char *test_str;
+
+/* Asserts a dp_packet that holds a single mbuf, where:
+ * - nb_segs must be 1;
+ * - pkt_len must be equal to data_len which in turn must equal the provided
+ *   'pkt_len';
+ * - data_off must start at the provided 'data_ofs';
+ * - next must be NULL.
+ */
+static void
+assert_single_mbuf(struct dp_packet *pkt, uint16_t data_ofs,
+                   uint32_t pkt_len) {
+    struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &pkt->mbuf);
+    ovs_assert(mbuf->nb_segs == 1);
+    ovs_assert(mbuf->data_off == data_ofs);
+    ovs_assert(mbuf->pkt_len == mbuf->data_len);
+    ovs_assert(mbuf->pkt_len == pkt_len);
+    ovs_assert(mbuf->next == NULL);
+}
+
+/* Asserts a dp_packet that holds multiple mbufs, where:
+ * - nb_segs must be > 1 and equal to the provided 'nb_segs';
+ * - data_off must start at the provided 'data_ofs';
+ * - pkt_len must be equal to the provided 'pkt_len' and the sum of each
+ *   mbuf's 'data_len' must equal the pkt_len;
+ * - next must not be NULL. */
+static void
+assert_multiple_mbufs(struct dp_packet *pkt, uint16_t data_ofs,
+                      uint32_t pkt_len, uint16_t nb_segs) {
+    struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &pkt->mbuf);
+    ovs_assert(mbuf->nb_segs > 1 && mbuf->nb_segs == nb_segs);
+    ovs_assert(mbuf->data_off == data_ofs);
+    ovs_assert(mbuf->pkt_len != mbuf->data_len);
+    ovs_assert(mbuf->next != NULL);
+    ovs_assert(mbuf->pkt_len == pkt_len);
+    /* Make sure pkt_len equals the sum of all segments' data_len */
+    while (mbuf) {
+        pkt_len -= rte_pktmbuf_data_len(mbuf);
+        mbuf = mbuf->next;
+    }
+    ovs_assert(pkt_len == 0);
+}
+
+/* Asserts that the data existing in a packet, starting at 'data_ofs' of the
+ * first mbuf and of length 'data_len' matches the global test_str used,
+ * starting at index 0 and of the same length.
+ */
+static void
+assert_data(struct dp_packet *pkt, uint16_t data_ofs, uint16_t data_len) {
+    struct rte_mbuf *mbuf = CONST_CAST(struct rte_mbuf *, &pkt->mbuf);
+
+    char data[data_len];
+    const char *rd = rte_pktmbuf_read(mbuf, data_ofs, data_len, data);
+
+    ovs_assert(rd != NULL);
+    ovs_assert(memcmp(rd, test_str, data_len) == 0);
+}
+
+static void
+set_testing_pattern_str(void) {
+    static const char *pattern = "1234567890";
+
+    /* Pattern will be of size 5000B */
+    size_t test_str_len = 5000;
+    test_str = xmalloc(test_str_len * sizeof(*test_str) + 1);
+
+    for (size_t i = 0; i < test_str_len; i += strlen(pattern)) {
+        memcpy(test_str + i, pattern, strlen(pattern));
+    }
+
+    test_str[test_str_len] = 0;
+}
+
+static void
+dpdk_eal_init(void) {
+    struct smap other_config;
+    smap_init(&other_config);
+
+    printf("Initialising EAL...\n");
+    smap_add(&other_config, "dpdk-init", "true");
+    smap_add(&other_config, "dpdk-lcore-mask", "10");
+    smap_add(&other_config, "dpdk-socket-mem", "2048,0");
+    smap_add(&other_config, "dpdk-multi-seg-mbufs", "true");
+
+    dpdk_init(&other_config);
+}
+
+/* The allocation of mbufs here mimics the logic in dpdk_mp_create in
+ * netdev-dpdk.c.
+ */
+static struct rte_mempool *
+dpdk_mp_create(char *mp_name) {
+    uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len;
+
+    mbuf_size = sizeof (struct dp_packet) +
+                MBUF_DATA_LEN + RTE_PKTMBUF_HEADROOM;
+    aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE);
+    mbuf_priv_data_len = sizeof(struct dp_packet) - sizeof(struct rte_mbuf) +
+                         (aligned_mbuf_size - mbuf_size);
+
+    struct rte_mempool *mpool = rte_pktmbuf_pool_create(
+                                    mp_name, N_MBUFS,
+                                    RTE_MEMPOOL_CACHE_MAX_SIZE,
+                                    mbuf_priv_data_len,
+                                    MBUF_DATA_LEN +
+                                    RTE_PKTMBUF_HEADROOM /* defaults 128B */,
+                                    SOCKET_ID_ANY);
+    if (mpool) {
+        printf("Allocated \"%s\" mempool with %u mbufs\n", mp_name, N_MBUFS);
+    } else {
+        printf("Failed mempool \"%s\" create request of %u mbufs: %s.\n",
+               mp_name, N_MBUFS, rte_strerror(rte_errno));
+
+        ovs_assert(mpool != NULL);
+    }
+
+    return mpool;
+}
+
+static void
+dpdk_setup_eal_with_mp(void) {
+    dpdk_eal_init();
+
+    mp = dpdk_mp_create("test-mp");
+    ovs_assert(mp != NULL);
+}
+
+static struct dp_packet *
+dpdk_mp_alloc_pkt(struct rte_mempool *mpool) {
+    struct rte_mbuf *mbuf = rte_pktmbuf_alloc(mpool);
+
+    struct dp_packet *pkt = (struct dp_packet *) mbuf;
+    pkt->source = DPBUF_DPDK;
+
+    return pkt;
+}
+
+/* Similar to dp_packet_put() in dp-packet.c, appends the 'size' bytes of data
+ * in 'p' to the tail end of 'pkt', allocating new mbufs if needed.
+ */
+static struct dp_packet *
+dpdk_pkt_put(struct dp_packet *pkt, void *p, size_t size) {
+    uint16_t max_data_len, nb_segs;
+    struct rte_mbuf *mbuf, *fmbuf;
+
+    mbuf = CONST_CAST(struct rte_mbuf *, &pkt->mbuf);
+
+    /* All newly allocated mbufs have the same max data len */
+    max_data_len = mbuf->buf_len - mbuf->data_off;
+
+    /* Calculate # of needed mbufs to accommodate 'size' */
+    nb_segs = size / max_data_len;
+    if (size % max_data_len) {
+        nb_segs += 1;
+    }
+
+    /* Proceed with the allocation of new mbufs */
+    mp = mbuf->pool;
+    fmbuf = mbuf;
+    mbuf = rte_pktmbuf_lastseg(mbuf);
+
+    for (int i = 0; i < nb_segs; i++) {
+        /* This takes care of initialising buf_len, data_len and other
+         * fields properly */
+        mbuf->next = rte_pktmbuf_alloc(mp);
+        if (!mbuf->next) {
+            printf("Problem allocating more mbufs for tests.\n");
+            /* Free the whole chain, starting at the first mbuf */
+            rte_pktmbuf_free(fmbuf);
+            fmbuf = NULL;
+            return NULL;
+        }
+
+        fmbuf->nb_segs += 1;
+
+        mbuf = mbuf->next;
+    }
+
+    dp_packet_mbuf_write(fmbuf, 0, size, p);
+
+    /* Adjust size of intermediate mbufs from current tail to end */
+    size_t pkt_len = size;
+    while (fmbuf && pkt_len > 0) {
+        fmbuf->data_len = MIN(pkt_len, fmbuf->buf_len - fmbuf->data_off);
+        pkt_len -= fmbuf->data_len;
+
+        fmbuf = fmbuf->next;
+    }
+
+    dp_packet_set_size(pkt, size);
+
+    return pkt;
+}
+
+static int
+test_dpdk_packet_insert_headroom(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Reserve 512B of header */
+    size_t str_len = 512;
+    dp_packet_reserve(pkt, str_len);
+    char *p = dp_packet_push_uninit(pkt, str_len);
+    ovs_assert(p != NULL);
+    /* Put the first 512B of "test_str" in the allocated header */
+    memcpy(p, test_str, str_len);
+
+    /* Check properties and data are as expected */
+    assert_single_mbuf(pkt, RTE_PKTMBUF_HEADROOM, str_len);
+    assert_data(pkt, 0, str_len);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+static int
+test_dpdk_packet_insert_tailroom_and_headroom(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Put the first 512B of "test_str" in the packet's tail */
+    size_t str_len = 512;
+    char *p = dp_packet_put(pkt, test_str, str_len);
+    ovs_assert(p != NULL);
+    /* Allocate extra 256B of header */
+    size_t head_len = 256;
+    p = dp_packet_push_uninit(pkt, head_len);
+    ovs_assert(p != NULL);
+
+    /* Check properties and data are as expected */
+    assert_single_mbuf(pkt, 0, str_len + head_len);
+
+    /* Check the data inserted in the packet is correct */
+    char data[str_len + head_len];
+    const char *rd = rte_pktmbuf_read(&pkt->mbuf, 0, str_len + head_len, data);
+    ovs_assert(rd != NULL);
+    /* Because of the headroom inserted, the data now begins at offset 256 */
+    ovs_assert(memcmp(rd + head_len, test_str, str_len) == 0);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+static int
+test_dpdk_packet_insert_tailroom_and_headroom_multiple_mbufs(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Put the first 2050B of "test_str" in the packet, just enough to
+     * allocate two mbufs */
+    size_t str_len = MBUF_DATA_LEN + 2;
+    pkt = dpdk_pkt_put(pkt, test_str, str_len);
+    ovs_assert(pkt != NULL);
+
+    /* Put the first 512B of "test_str" in the packet's tail */
+    size_t tail_len = 512;
+    char *p = dp_packet_put(pkt, test_str, tail_len);
+    ovs_assert(p != NULL);
+
+    /* Allocate extra 256B of header */
+    size_t head_len = 256;
+    p = dp_packet_push_uninit(pkt, head_len);
+    ovs_assert(p != NULL);
+    /* Copy the data to the reserved headroom */
+    memcpy(p, test_str, head_len);
+
+    /* Check properties and data are as expected */
+    size_t pkt_len = head_len + str_len + tail_len;
+    uint16_t nb_segs = 2;
+    assert_multiple_mbufs(pkt, 0, pkt_len, nb_segs);
+
+    /* Check the data inserted in the packet is correct */
+    char data[pkt_len];
+    const char *rd = rte_pktmbuf_read(&pkt->mbuf, 0, pkt_len, data);
+    ovs_assert(rd != NULL);
+    ovs_assert(memcmp(rd, test_str, head_len) == 0);
+    ovs_assert(memcmp(rd + head_len + str_len,
+                      test_str, tail_len) == 0);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+static int
+test_dpdk_packet_insert_tailroom_multiple_mbufs(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Put the first 2050B of "test_str" in the packet, just enough to
+     * allocate two mbufs */
+    size_t str_len = MBUF_DATA_LEN + 2;
+    pkt = dpdk_pkt_put(pkt, test_str, str_len);
+    ovs_assert(pkt != NULL);
+
+    /* Put the first 2000B of "test_str" in the packet's end */
+    size_t tail_len = 2000;
+    char *p = dp_packet_put(pkt, test_str, tail_len);
+    ovs_assert(p != NULL);
+
+    /* Check properties and data are as expected */
+    char data[str_len + tail_len];
+    const char *rd = rte_pktmbuf_read(&pkt->mbuf, 0, str_len + tail_len, data);
+    ovs_assert(rd != NULL);
+    /* The appended data begins right after the first 'str_len' bytes */
+    ovs_assert(memcmp(rd + str_len, test_str, tail_len) == 0);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+static int
+test_dpdk_packet_insert_headroom_multiple_mbufs(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Put the first 2050B of "test_str" in the packet, just enough to
+     * allocate two mbufs */
+    size_t str_len = MBUF_DATA_LEN + 2;
+    pkt = dpdk_pkt_put(pkt, test_str, str_len);
+    ovs_assert(pkt != NULL);
+
+    /* Allocate an extra 2000B of (uninitialised) header */
+    size_t head_len = 2000;
+    char *p = dp_packet_push_uninit(pkt, head_len);
+    ovs_assert(p != NULL);
+
+    /* Check properties and data are as expected */
+    char data[str_len + head_len];
+    const char *rd = rte_pktmbuf_read(&pkt->mbuf, 0, str_len + head_len, data);
+    ovs_assert(rd != NULL);
+    /* Because of the headroom inserted, the data is at offset 'head_len' */
+    ovs_assert(memcmp(rd + head_len, test_str, str_len) == 0);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+static int
+test_dpdk_packet_change_size(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Put enough data in the packet that
+     * spans three mbufs (5000B, the full length of "test_str") */
+    size_t str_len = strlen(test_str);
+    pkt = dpdk_pkt_put(pkt, test_str, str_len);
+    ovs_assert(pkt != NULL);
+
+    /* Check properties and data are as expected */
+    uint16_t nb_segs = 3;
+    assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM, str_len, nb_segs);
+
+    /* Change the size of the packet to fit in a single mbuf */
+    dp_packet_clear(pkt);
+
+    assert_single_mbuf(pkt, RTE_PKTMBUF_HEADROOM, 0);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+/* Shift() tests */
+
+static int
+test_dpdk_packet_shift_single_mbuf(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Put the first 1024B of "test_str" in the packet */
+    size_t str_len = 1024;
+    char *p = dp_packet_put(pkt, test_str, str_len);
+    ovs_assert(p != NULL);
+
+    /* Shift data right by 512B */
+    uint16_t shift_len = 512;
+    dp_packet_shift(pkt, shift_len);
+
+    /* Check properties and data are as expected */
+    assert_single_mbuf(pkt, RTE_PKTMBUF_HEADROOM + shift_len, str_len);
+    assert_data(pkt, 0, str_len);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+static int
+test_dpdk_packet_shift_multiple_mbufs(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Put the data in "test_str" in the packet */
+    size_t str_len = strlen(test_str);
+    pkt = dpdk_pkt_put(pkt, test_str, str_len);
+    ovs_assert(pkt != NULL);
+
+    /* Check properties and data are as expected */
+    uint16_t nb_segs = 3;
+    assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM, str_len, nb_segs);
+
+    /* Shift data right by 1024B */
+    uint16_t shift_len = 1024;
+    dp_packet_shift(pkt, shift_len);
+
+    /* Check the data has been shifted correctly */
+    assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM + shift_len, str_len,
+                          nb_segs);
+    assert_data(pkt, 0, str_len);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+static int
+test_dpdk_packet_shift_right_then_left(void) {
+    struct dp_packet *pkt = dpdk_mp_alloc_pkt(mp);
+    ovs_assert(pkt != NULL);
+
+    /* Put the data in "test_str" in the packet */
+    size_t str_len = strlen(test_str);
+    pkt = dpdk_pkt_put(pkt, test_str, str_len);
+    ovs_assert(pkt != NULL);
+
+    /* Shift data right by 1024B */
+    int16_t shift_len = 1024;
+    dp_packet_shift(pkt, shift_len);
+
+    /* Check properties and data are as expected */
+    uint16_t nb_segs = 3;
+    assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM + shift_len, str_len,
+                          nb_segs);
+
+    /* Shift data left by 1024B, back to where it started */
+    dp_packet_shift(pkt, -shift_len);
+
+    /* The data must now start at the default headroom again */
+    assert_multiple_mbufs(pkt, RTE_PKTMBUF_HEADROOM, str_len,
+                          nb_segs);
+    assert_data(pkt, 0, str_len);
+
+    dp_packet_uninit(pkt);
+
+    return 0;
+}
+
+static void
+test_dpdk_packet(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
+{
+    /* Setup environment for tests */
+    dpdk_setup_eal_with_mp();
+    set_testing_pattern_str();
+
+    test_dpdk_packet_insert_headroom();
+    num_tests++;
+    test_dpdk_packet_insert_tailroom_and_headroom();
+    num_tests++;
+    test_dpdk_packet_insert_tailroom_multiple_mbufs();
+    num_tests++;
+    test_dpdk_packet_insert_headroom_multiple_mbufs();
+    num_tests++;
+    test_dpdk_packet_insert_tailroom_and_headroom_multiple_mbufs();
+    num_tests++;
+    test_dpdk_packet_change_size();
+    num_tests++;
+    test_dpdk_packet_shift_single_mbuf();
+    num_tests++;
+    test_dpdk_packet_shift_multiple_mbufs();
+    num_tests++;
+    test_dpdk_packet_shift_right_then_left();
+    num_tests++;
+
+    printf("Executed %d tests\n", num_tests);
+
+    exit(0);
+}
+
+OVSTEST_REGISTER("test-dpdk-packet", test_dpdk_packet);
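
Note on the segment arithmetic: the number of mbufs that dpdk_pkt_put()
requests, and the per-segment data_len values it assigns afterwards, are
plain ceiling-division and clamping. The standalone sketch below reproduces
that arithmetic without DPDK so it can be checked in isolation. It is an
illustration only and not part of the patch: segs_needed() and
split_lengths() are hypothetical helper names, and a fixed per-segment
capacity is assumed (in the tests above that would be MBUF_DATA_LEN, 2048B).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of segments of capacity 'seg_cap' needed to
 * hold 'size' bytes, i.e. ceil(size / seg_cap). 'seg_cap' must be > 0. */
static inline size_t
segs_needed(size_t size, size_t seg_cap)
{
    assert(seg_cap > 0);
    return size / seg_cap + (size % seg_cap ? 1 : 0);
}

/* Hypothetical helper: stores in 'lens' the amount of data each segment
 * would carry when 'size' bytes are laid out front to back, filling every
 * segment up to 'seg_cap' before moving to the next one.
 *
 * Returns the number of segments used, or 0 if 'size' does not fit in
 * 'max_segs' segments (or if 'size' is 0). */
static inline size_t
split_lengths(size_t size, size_t seg_cap, size_t *lens, size_t max_segs)
{
    size_t n = 0;

    assert(seg_cap > 0);
    while (size > 0) {
        if (n == max_segs) {
            return 0;           /* Ran out of segments. */
        }
        lens[n] = size < seg_cap ? size : seg_cap;
        size -= lens[n];
        n++;
    }
    return n;
}
```

With a capacity of 2048B, the 5000B test pattern comes out as three
segments of 2048B, 2048B and 904B, which is consistent with the
nb_segs == 3 that the shift tests assert.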