From patchwork Tue Nov 21 18:29:10 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840148
X-Patchwork-Delegate: ian.stokes@intel.com
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=)
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yhDdx37rnz9t31 for ; Wed, 22 Nov 2017 05:29:57 +1100 (AEDT)
Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 36B80AE1; Tue, 21 Nov 2017 18:29:24 +0000 (UTC)
X-Original-To: dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id D4AEEAD2 for ; Tue, 21 Nov 2017 18:29:21 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 37A548A for ; Tue, 21 Nov 2017 18:29:21 +0000 (UTC)
Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 21 Nov 2017 10:29:21 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.44,432,1505804400"; d="scan'208";a="7930083"
Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga001.jf.intel.com with ESMTP; 21 Nov 2017 10:29:19 -0800
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Tue, 21 Nov 2017 18:29:10 +0000
Message-Id: <1511288957-68599-2-git-send-email-mark.b.kavanagh@intel.com>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=ham version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org
Subject: [ovs-dev] [RFC PATCH v3 1/8] netdev-dpdk: simplify mbuf sizing
X-BeenThere: ovs-dev@openvswitch.org
X-Mailman-Version: 2.1.12
Precedence: list
Sender: ovs-dev-bounces@openvswitch.org
Errors-To: ovs-dev-bounces@openvswitch.org

When calculating the mbuf data_room_size (i.e. the size of actual
packet data that an mbuf can accommodate), it is possible to simply
use the value calculated by dpdk_buf_size() as a parameter to
rte_pktmbuf_pool_create(). This simplifies mbuf sizing considerably.
This patch removes the related size conversions and macros, which
are no longer needed.

The benefits of this approach are threefold:
- the mbuf sizing code is much simpler, and more readable.
- mbuf size will always be cache-aligned [1], satisfying that
  requirement of specific PMDs (vNIC thunderx, for example).
- the maximum amount of data that each mbuf contains may now be
  calculated as mbuf->buf_len - mbuf->data_off. This is important
  in the case of multi-segment jumbo frames.

[1] (this is true since mbuf size is now always a multiple of 1024,
    + 128B RTE_PKTMBUF_HEADROOM + 704B dp_packet).
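
The sizing arithmetic described above can be sketched as follows. This
is an illustration only and is not part of the patch: every sketch_*
name is hypothetical, and the constants are assumptions (a 128B default
RTE_PKTMBUF_HEADROOM, a 64B cache line, and ETHER_HDR_MAX_LEN taken as
14B Ethernet header + 4B CRC + 2 x 4B VLAN tags).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real ones come from DPDK and OVS headers. */
#define SKETCH_PKTMBUF_HEADROOM   128u   /* RTE_PKTMBUF_HEADROOM default. */
#define SKETCH_ETHER_HDR_MAX_LEN  26u    /* 14 + 4 + (2 * 4). */
#define SKETCH_MBUF_ALIGN         1024u  /* NETDEV_DPDK_MBUF_ALIGN. */
#define SKETCH_CACHE_LINE_SIZE    64u    /* Assumed RTE_CACHE_LINE_SIZE. */

/* Rounds 'x' up to the nearest multiple of 'y'. */
static inline uint32_t
sketch_round_up(uint32_t x, uint32_t y)
{
    return (x + y - 1) / y * y;
}

/* Mirrors the rounding done in dpdk_buf_size(): max frame length plus
 * headroom, rounded up to a multiple of 1024. */
static inline uint32_t
sketch_buf_size(uint32_t mtu)
{
    return sketch_round_up(mtu + SKETCH_ETHER_HDR_MAX_LEN
                           + SKETCH_PKTMBUF_HEADROOM,
                           SKETCH_MBUF_ALIGN);
}

/* The data_room_size that the patch passes to rte_pktmbuf_pool_create():
 * the rounded buffer size plus the headroom. */
static inline uint32_t
sketch_data_room_size(uint32_t mtu)
{
    return sketch_buf_size(mtu) + SKETCH_PKTMBUF_HEADROOM;
}
```

Under these assumptions an MTU of 1500 yields a 2048B buffer and a
2176B data room; both are multiples of 64B, which is consistent with
the cache-alignment point made in [1].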
Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
Signed-off-by: Mark Kavanagh
---
 lib/netdev-dpdk.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8906423..c5eb851 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -81,12 +81,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
                                      + (2 * VLAN_HEADER_LEN))
 #define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
 #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
-#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)                    \
-                                     - ETHER_HDR_LEN - ETHER_CRC_LEN)
-#define MBUF_SIZE(mtu)              ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
-                                             + sizeof(struct dp_packet) \
-                                             + RTE_PKTMBUF_HEADROOM),   \
-                                             RTE_CACHE_LINE_SIZE)
 #define NETDEV_DPDK_MBUF_ALIGN      1024
 #define NETDEV_DPDK_MAX_PKT_LEN     9728
 
@@ -447,7 +441,7 @@ is_dpdk_class(const struct netdev_class *class)
  * behaviour, which reduces performance. To prevent this, use a buffer size
  * that is closest to 'mtu', but which satisfies the aforementioned criteria.
  */
-static uint32_t
+static uint16_t
 dpdk_buf_size(int mtu)
 {
     return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
@@ -486,7 +480,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
  * - a new mempool was just created;
  * - a matching mempool already exists. */
 static struct rte_mempool *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
+dpdk_mp_create(struct netdev_dpdk *dev, uint16_t frame_len)
 {
     char mp_name[RTE_MEMPOOL_NAMESIZE];
     const char *netdev_name = netdev_get_name(&dev->up);
@@ -513,12 +507,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
          * longer than RTE_MEMPOOL_NAMESIZE. */
         int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
                            "ovs%08x%02d%05d%07u",
-                           hash, socket_id, mtu, n_mbufs);
+                           hash, socket_id, frame_len, n_mbufs);
         if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
             VLOG_DBG("snprintf returned %d. "
                      "Failed to generate a mempool name for \"%s\". "
-                     "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
-                     ret, netdev_name, hash, socket_id, mtu, n_mbufs);
+                     "Hash:0x%x, socket_id: %d, frame length:%d, mbufs:%u.",
+                     ret, netdev_name, hash, socket_id, frame_len, n_mbufs);
             break;
         }
 
@@ -529,7 +523,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
 
         mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
                  sizeof (struct dp_packet) - sizeof (struct rte_mbuf),
-                 MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id);
+                 frame_len + RTE_PKTMBUF_HEADROOM, socket_id);
 
         if (mp) {
             VLOG_DBG("Allocated \"%s\" mempool with %u mbufs",
@@ -582,11 +576,11 @@ static int
 netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
-    uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
+    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
     struct rte_mempool *mp;
     int ret = 0;
 
-    mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size));
+    mp = dpdk_mp_create(dev, buf_size);
     if (!mp) {
         VLOG_ERR("Failed to create memory pool for netdev "
                  "%s, with MTU %d on socket %d: %s\n",

From patchwork Tue Nov 21 18:29:11 2017
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840149
X-Patchwork-Delegate: ian.stokes@intel.com
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Tue, 21 Nov 2017 18:29:11 +0000
Message-Id: <1511288957-68599-3-git-send-email-mark.b.kavanagh@intel.com>
In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
Subject: [ovs-dev] [RFC PATCH v3 2/8] lib/dp-packet: init specific mbuf fields to 0

dp_packets are created using xmalloc(); in the case of OvS-DPDK, it's
possible that the resultant mbuf portion of the dp_packet contains
random data. For some mbuf fields, specifically those related to
multi-segment mbufs and/or offload features, random values may cause
unexpected behaviour, should the dp_packet's contents be later copied
to a DPDK mbuf. It is critical, therefore, that these fields be
initialized to 0.

This patch ensures that the following mbuf fields are initialized to 0,
on creation of a new dp_packet:
- ol_flags
- nb_segs
- tx_offload
- packet_type

Adapted from an idea by Michael Qiu:
https://patchwork.ozlabs.org/patch/777570/

Signed-off-by: Mark Kavanagh
Acked-by: Michael Qiu
---
 lib/dp-packet.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index b4b721c..7aa440f 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -626,13 +626,13 @@ dp_packet_mbuf_rss_flag_reset(struct dp_packet *p OVS_UNUSED)
 
 /* This initialization is needed for packets that do not come
  * from DPDK interfaces, when vswitchd is built with --with-dpdk.
- * The DPDK rte library will still otherwise manage the mbuf.
- * We only need to initialize the mbuf ol_flags. */
+ * The DPDK rte library will still otherwise manage the mbuf. */
 static inline void
 dp_packet_mbuf_init(struct dp_packet *p OVS_UNUSED)
 {
 #ifdef DPDK_NETDEV
-    p->mbuf.ol_flags = 0;
+    struct rte_mbuf *mbuf = &(p->mbuf);
+    mbuf->ol_flags = mbuf->nb_segs = mbuf->tx_offload = mbuf->packet_type = 0;
 #endif
 }
 

From patchwork Tue Nov 21 18:29:12 2017
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840150
X-Patchwork-Delegate: ian.stokes@intel.com
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Tue, 21 Nov 2017 18:29:12 +0000
Message-Id: <1511288957-68599-4-git-send-email-mark.b.kavanagh@intel.com>
In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
Subject: [ovs-dev] [RFC PATCH v3 3/8] lib/dp-packet: copy mbuf info for packet copy

From: Michael Qiu

Currently, when a packet is copied, much of the DPDK mbuf's
information is lost, such as packet type, ol_flags, etc.
This information is very important for DPDK packet processing.

Signed-off-by: Michael Qiu
[mark.b.kavanagh@intel.com rebased]
Signed-off-by: Mark Kavanagh
---
 lib/dp-packet.c   | 3 +++
 lib/netdev-dpdk.c | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 443c225..5078211 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -178,6 +178,9 @@ dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 
 #ifdef DPDK_NETDEV
     new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
+    new_buffer->mbuf.tx_offload = buffer->mbuf.tx_offload;
+    new_buffer->mbuf.packet_type = buffer->mbuf.packet_type;
+    new_buffer->mbuf.nb_segs = buffer->mbuf.nb_segs;
 #else
     new_buffer->rss_hash_valid = buffer->rss_hash_valid;
 #endif
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c5eb851..61a0dca 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1860,6 +1860,10 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
         memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
                dp_packet_data(packet), size);
         dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
+        pkts[txcnt]->nb_segs = packet->mbuf.nb_segs;
+        pkts[txcnt]->ol_flags = packet->mbuf.ol_flags;
+        pkts[txcnt]->packet_type = packet->mbuf.packet_type;
+        pkts[txcnt]->tx_offload = packet->mbuf.tx_offload;
 
         txcnt++;
     }

From patchwork Tue Nov 21 18:29:13 2017
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840151
X-Patchwork-Delegate: ian.stokes@intel.com
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Cc: Marcin Ksiadz, Przemyslaw Lal
Date: Tue, 21 Nov 2017 18:29:13 +0000
Message-Id: <1511288957-68599-5-git-send-email-mark.b.kavanagh@intel.com>
In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
Subject: [ovs-dev] [RFC PATCH v3 4/8] lib/dp-packet: Fix data_len issue with multi-segs

From: Michael Qiu

When a packet comes from a DPDK source and contains multiple
segments, data_len is not equal to the packet size. This patch
fixes this issue.

Co-authored-by: Mark Kavanagh
Co-authored-by: Przemyslaw Lal
Co-authored-by: Marcin Ksiadz
Co-authored-by: Yuanhan Liu
Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
Signed-off-by: Przemyslaw Lal
Signed-off-by: Marcin Ksiadz
Signed-off-by: Yuanhan Liu
---
 lib/dp-packet.h | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 7aa440f..c2736d3 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -23,6 +23,7 @@
 #ifdef DPDK_NETDEV
 #include
 #include
+#include "rte_ether.h"
 #endif
 
 #include "netdev-dpdk.h"
@@ -429,17 +430,14 @@ dp_packet_size(const struct dp_packet *b)
 static inline void
 dp_packet_set_size(struct dp_packet *b, uint32_t v)
 {
-    /* netdev-dpdk does not currently support segmentation; consequently, for
-     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len' (32 bit) may
-     * be used interchangably.
-     *
-     * On the datapath, it is expected that the size of packets
-     * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
-     * loss of accuracy in assigning 'v' to 'data_len'.
+    /*
+     * Assign current segment length. If total length is greater than
+     * max data length in a segment, additional calculation is needed
      */
-    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
-    b->mbuf.pkt_len = v;             /* Total length of all segments linked to
-                                      * this segment. */
+    b->mbuf.data_len = MIN(v, b->mbuf.buf_len - b->mbuf.data_off);
+
+    /* Total length of all segments linked to this segment. */
+    b->mbuf.pkt_len = v;
 }
 
 static inline uint16_t

From patchwork Tue Nov 21 18:29:14 2017
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840152
X-Patchwork-Delegate: ian.stokes@intel.com
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Tue, 21 Nov 2017 18:29:14 +0000
Message-Id: <1511288957-68599-6-git-send-email-mark.b.kavanagh@intel.com>
In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
Subject: [ovs-dev] [RFC PATCH v3 5/8] lib/dp-packet: fix dp_packet_put_uninit for multi-seg mbufs

dp_packet_put_uninit(dp_packet, size) appends 'size' bytes to the tail
of a dp_packet. In the case of multi-segment mbufs, it is the data
length of the last mbuf in the mbuf chain that should be adjusted by
'size' bytes.

In its current implementation, dp_packet_put_uninit() adjusts the
dp_packet's size via a call to dp_packet_set_size(); however, this
adjusts the data length of the first mbuf in the chain, which is
incorrect in the case of multi-segment mbufs. Instead, traverse the
mbuf chain to locate the final mbuf of said chain, and update its
data_len [1]. To finish, increase the packet length of the entire
mbuf chain [2] by 'size'.

[1] In the case of a single-segment mbuf, this is the mbuf itself.
[2] This is stored in the first mbuf of an mbuf chain.
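
The traversal described above can be sketched in isolation as follows.
This is an illustration only and is not part of the patch: struct
sketch_mbuf is a hypothetical stand-in holding just the three
struct rte_mbuf fields involved, and sketch_chain_grow_tail() is an
invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct rte_mbuf fields used here. */
struct sketch_mbuf {
    struct sketch_mbuf *next;   /* Next segment in the chain, or NULL. */
    uint16_t data_len;          /* Bytes of data held by this segment. */
    uint32_t pkt_len;           /* Total bytes in the chain; only
                                 * meaningful in the first segment. */
};

/* Accounts for 'size' bytes appended to the tail of the chain headed by
 * 'head': the last segment's data_len and the head's pkt_len both grow
 * by 'size'. Returns the last segment. */
static inline struct sketch_mbuf *
sketch_chain_grow_tail(struct sketch_mbuf *head, uint16_t size)
{
    struct sketch_mbuf *seg = head;

    while (seg->next) {
        seg = seg->next;
    }
    seg->data_len += size;
    head->pkt_len += size;

    return seg;
}
```

For a single-segment chain the loop body never runs, so the head is
also the tail and both of its fields are updated, matching note [1].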

Signed-off-by: Mark Kavanagh
---
 lib/dp-packet.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 5078211..5c590e5 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -325,6 +325,22 @@ dp_packet_put_uninit(struct dp_packet *b, size_t size)
     void *p;
     dp_packet_prealloc_tailroom(b, size);
     p = dp_packet_tail(b);
+#ifdef DPDK_NETDEV
+    if (b->source == DPBUF_DPDK) {
+        struct rte_mbuf *buf = &(b->mbuf);
+        /* In the case of multi-segment mbufs, the data length of the last mbuf
+         * should be adjusted by 'size' bytes. A call to dp_packet_size() would
+         * adjust the data length of the first mbuf in the segment, so we avoid
+         * invoking same; as a result, the packet length of the entire mbuf
+         * chain (stored in the first mbuf of said chain) must be adjusted here
+         * instead.
+         */
+        while (buf->next)
+            buf = buf->next;
+        buf->data_len += size;
+        b->mbuf.pkt_len += size;
+    } else
+#endif
     dp_packet_set_size(b, dp_packet_size(b) + size);
     return p;
 }

From patchwork Tue Nov 21 18:29:15 2017
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840153
X-Patchwork-Delegate: ian.stokes@intel.com
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Tue, 21 Nov 2017 18:29:15 +0000
Message-Id: <1511288957-68599-7-git-send-email-mark.b.kavanagh@intel.com>
In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
Subject: [ovs-dev] [RFC PATCH v3 6/8] lib/dp-packet: copy data from multi-seg. DPDK mbuf

From: Michael Qiu

When cloning a packet whose source is a DPDK driver, multi-segment
mbufs must be considered, and the data of each segment copied in turn.
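
A segment-by-segment copy into one contiguous buffer can be sketched as
follows. This is an illustration only and is not the code in this
patch: struct sketch_seg and sketch_chain_linearize() are hypothetical
names, and the destination bounds check is an addition made for the
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one segment of an mbuf chain. */
struct sketch_seg {
    const struct sketch_seg *next;  /* Next segment, or NULL. */
    const uint8_t *data;            /* Start of this segment's data. */
    uint16_t data_len;              /* Bytes of data in this segment. */
};

/* Copies the data of every segment, in chain order, into 'dst', which
 * can hold 'dst_len' bytes. Returns the number of bytes copied, or
 * SIZE_MAX if the chain does not fit in 'dst'. */
static inline size_t
sketch_chain_linearize(const struct sketch_seg *seg, uint8_t *dst,
                       size_t dst_len)
{
    size_t offset = 0;

    for (; seg; seg = seg->next) {
        if (seg->data_len > dst_len - offset) {
            return SIZE_MAX;
        }
        memcpy(dst + offset, seg->data, seg->data_len);
        offset += seg->data_len;
    }

    return offset;
}
```

The running 'offset' plays the same role as the 'offset' variable in
the clone path: each segment's data lands directly after the data of
the segment before it.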

Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
---
 lib/dp-packet.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 5c590e5..26fff02 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -166,10 +166,30 @@ struct dp_packet *
 dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t headroom)
 {
     struct dp_packet *new_buffer;
+    uint32_t pkt_len = dp_packet_size(buffer);
 
+#ifdef DPDK_NETDEV
+    /* copy multi-seg data */
+    if (buffer->source == DPBUF_DPDK && buffer->mbuf.nb_segs > 1) {
+        uint32_t offset = 0;
+        void *dst = NULL;
+        struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *, &(buffer->mbuf));
+
+        new_buffer = dp_packet_new_with_headroom(pkt_len, headroom);
+        dp_packet_set_size(new_buffer, pkt_len + headroom);
+        dst = dp_packet_tail(new_buffer);
+
+        while (tmbuf) {
+            rte_memcpy((char *)dst + offset,
+                       rte_pktmbuf_mtod(tmbuf, void *), tmbuf->data_len);
+            offset += tmbuf->data_len;
+            tmbuf = tmbuf->next;
+        }
+    }
+    else
+#endif
     new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
-                                                 dp_packet_size(buffer),
-                                                 headroom);
+                                                    pkt_len, headroom);
     /* Copy the following fields into the returned buffer: l2_pad_size,
      * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
     memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,

From patchwork Tue Nov 21 18:29:16 2017
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840154
X-Patchwork-Delegate: ian.stokes@intel.com
From: Mark Kavanagh
To: dev@openvswitch.org, qiudayu@chinac.com
Date: Tue, 21 Nov 2017 18:29:16 +0000
Message-Id: <1511288957-68599-8-git-send-email-mark.b.kavanagh@intel.com>
In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com>
Subject: [ovs-dev] [RFC PATCH v3 7/8] netdev-dpdk: copy large packet to multi-seg. mbufs

From: Michael Qiu

Currently, packets are only copied to a single segment in
the function dpdk_do_tx_copy(). This could be an issue in
the case of jumbo frames, particularly when multi-segment
mbufs are involved.

This patch calculates the number of segments needed by a
packet and copies the data to each segment.

Signed-off-by: Michael Qiu
Signed-off-by: Mark Kavanagh
---
 lib/netdev-dpdk.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 61a0dca..36275bd 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1824,8 +1824,10 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
 #endif
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
     struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
+    struct rte_mbuf *temp, *head = NULL;
     uint32_t cnt = batch_cnt;
     uint32_t dropped = 0;
+    uint32_t i, j, nb_segs;
 
     if (dev->type != DPDK_DEV_VHOST) {
         /* Check if QoS has been configured for this netdev. */
@@ -1838,9 +1840,10 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
 
     uint32_t txcnt = 0;
 
-    for (uint32_t i = 0; i < cnt; i++) {
+    for (i = 0; i < cnt; i++) {
         struct dp_packet *packet = batch->packets[i];
         uint32_t size = dp_packet_size(packet);
+        uint16_t max_data_len, data_len;
 
         if (OVS_UNLIKELY(size > dev->max_packet_len)) {
             VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
@@ -1850,15 +1853,59 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
             continue;
         }
 
-        pkts[txcnt] = rte_pktmbuf_alloc(dev->mp);
+        temp = pkts[txcnt] = rte_pktmbuf_alloc(dev->mp);
         if (OVS_UNLIKELY(!pkts[txcnt])) {
             dropped += cnt - i;
             break;
         }
 
+        /* All new allocated mbuf's max data len is the same */
+        max_data_len = temp->buf_len - temp->data_off;
+
+        /* Calculate # of output mbufs. */
+        nb_segs = size / max_data_len;
+        if (size % max_data_len)
+            nb_segs = nb_segs + 1;
+
+        /* Allocate additional mbufs when multiple output mbufs required. */
+        for (j = 1; j < nb_segs; j++) {
+            temp->next = rte_pktmbuf_alloc(dev->mp);
+            if (!temp->next) {
+                rte_pktmbuf_free(pkts[txcnt]);
+                pkts[txcnt] = NULL;
+                break;
+            }
+            temp = temp->next;
+        }
         /* We have to do a copy for now */
-        memcpy(rte_pktmbuf_mtod(pkts[txcnt], void *),
-               dp_packet_data(packet), size);
+        rte_pktmbuf_pkt_len(pkts[txcnt]) = size;
+        temp = pkts[txcnt];
+
+        data_len = size < max_data_len ? size: max_data_len;
+        if (packet->source == DPBUF_DPDK) {
+            head = &(packet->mbuf);
+            while (temp && head && size > 0) {
+                rte_memcpy(rte_pktmbuf_mtod(temp, void*),
+                    dp_packet_data((struct dp_packet *)head), data_len);
+                rte_pktmbuf_data_len(temp) = data_len;
+                head = head->next;
+                size = size - data_len;
+                data_len = size < max_data_len ? size: max_data_len;
+                temp = temp->next;
+            }
+        } else {
+            int offset = 0;
+            while (temp && size > 0) {
+                memcpy(rte_pktmbuf_mtod(temp, void *),
+                    dp_packet_at(packet, offset, data_len), data_len);
+                rte_pktmbuf_data_len(temp) = data_len;
+                temp = temp->next;
+                size = size - data_len;
+                offset += data_len;
+                data_len = size < max_data_len ? size: max_data_len;
+            }
+        }
+
         dp_packet_set_size((struct dp_packet *)pkts[txcnt], size);
         pkts[txcnt]->nb_segs = packet->mbuf.nb_segs;
         pkts[txcnt]->ol_flags = packet->mbuf.ol_flags;

From patchwork Tue Nov 21 18:29:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mark Kavanagh
X-Patchwork-Id: 840155
X-Patchwork-Delegate: ian.stokes@intel.com
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=)
Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yhDkd4y0bz9sRW for ; Wed, 22 Nov 2017 05:34:01 +1100 (AEDT)
Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 18861C26; Tue, 21 Nov 2017 18:29:37 +0000 (UTC)
X-Original-To: dev@openvswitch.org
Delivered-To: ovs-dev@mail.linuxfoundation.org
Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 364C0BF2 for ; Tue, 21 Nov 2017 18:29:34 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 8BF064E9 for ; Tue, 21 Nov
2017 18:29:33 +0000 (UTC) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 21 Nov 2017 10:29:33 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.44,432,1505804400"; d="scan'208";a="7930169" Received: from silpixa00380299.ir.intel.com ([10.237.222.17]) by orsmga001.jf.intel.com with ESMTP; 21 Nov 2017 10:29:32 -0800 From: Mark Kavanagh To: dev@openvswitch.org, qiudayu@chinac.com Date: Tue, 21 Nov 2017 18:29:17 +0000 Message-Id: <1511288957-68599-9-git-send-email-mark.b.kavanagh@intel.com> X-Mailer: git-send-email 1.9.3 In-Reply-To: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com> References: <1511288957-68599-1-git-send-email-mark.b.kavanagh@intel.com> X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCH v3 8/8] netdev-dpdk: support multi-segment jumbo frames X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org Currently, jumbo frame support for OvS-DPDK is implemented by increasing the size of mbufs within a mempool, such that each mbuf within the pool is large enough to contain an entire jumbo frame of a user-defined size. Typically, for each user-defined MTU, 'requested_mtu', a new mempool is created, containing mbufs of size ~requested_mtu. With the multi-segment approach, a port uses a single mempool, (containing standard/default-sized mbufs of ~2k bytes), irrespective of the user-requested MTU value. To accommodate jumbo frames, mbufs are chained together, where each mbuf in the chain stores a portion of the jumbo frame. Each mbuf in the chain is termed a segment, hence the name. 
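
As a rough illustration of the chaining described above, the sketch
below shows how a frame length maps onto fixed-size segments. It is not
code from this series: segs_needed() and seg_data_len() are hypothetical
helper names, and 'seg_room' is an assumed stand-in for the usable data
room of one standard mbuf (it must be non-zero).

```c
#include <stdint.h>

/* Hypothetical helper (not part of this series): number of fixed-size
 * segments needed to hold 'frame_len' bytes, i.e. a ceiling division.
 * 'seg_room' is the assumed usable data room per mbuf; must be > 0. */
static uint32_t
segs_needed(uint32_t frame_len, uint32_t seg_room)
{
    return (frame_len + seg_room - 1) / seg_room;
}

/* Hypothetical helper: number of bytes stored in segment 'idx'
 * (0-based). Every segment is full except possibly the last one;
 * indices past the end of the frame hold nothing. */
static uint32_t
seg_data_len(uint32_t frame_len, uint32_t seg_room, uint32_t idx)
{
    uint32_t offset = idx * seg_room;

    if (offset >= frame_len) {
        return 0;
    }

    uint32_t remaining = frame_len - offset;
    return remaining < seg_room ? remaining : seg_room;
}
```

Under these assumptions, a 9000-byte frame with 2048 bytes of room per
segment needs five segments: four full ones and a final segment holding
the remaining 808 bytes.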

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide on which approach to adopt on init. The introduction
of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this. This
is a global boolean value, which determines how jumbo frames are
represented across all DPDK ports. In the absence of a user-supplied
value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
mbufs must be explicitly enabled / single-segment mbufs remain the
default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Signed-off-by: Mark Kavanagh
---
 NEWS                 |  1 +
 lib/dpdk.c           |  7 +++++++
 lib/netdev-dpdk.c    | 43 ++++++++++++++++++++++++++++++++++++++++---
 lib/netdev-dpdk.h    |  1 +
 vswitchd/vswitch.xml | 20 ++++++++++++++++++++
 5 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index c15dc24..657b598 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,7 @@ Post-v2.8.0
    - DPDK:
      * Add support for DPDK v17.11
      * Add support for vHost IOMMU feature
+     * Add support for multi-segment mbufs
 
 v2.8.0 - 31 Aug 2017
 --------------------
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 8da6c32..4c28bd0 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -450,6 +450,13 @@ dpdk_init__(const struct smap *ovs_other_config)
 
     /* Finally, register the dpdk classes */
     netdev_dpdk_register();
+
+    bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+            "dpdk-multi-seg-mbufs", false);
+    if (multi_seg_mbufs_enable) {
+        VLOG_INFO("DPDK multi-segment mbufs enabled\n");
+        netdev_dpdk_multi_segment_mbufs_enable();
+    }
 }
 
 void
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 36275bd..293edad 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -65,6 +65,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 
 VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
 
 #define DPDK_PORT_WATCHDOG_INTERVAL 5
 
@@ -500,6 +501,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t frame_len)
               + dev->requested_n_txq * dev->requested_txq_size
               + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
               + MIN_NB_MBUF;
+    /* XXX (RFC) - should n_mbufs be increased if multi-seg mbufs are used? */
 
     ovs_mutex_lock(&dpdk_mp_mutex);
     do {
@@ -568,7 +570,13 @@ dpdk_mp_free(struct rte_mempool *mp)
 
 /* Tries to allocate a new mempool - or re-use an existing one where
  * appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ *   together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ *   non-standard-sized mbufs. Each mbuf in the mempool is large enough to fully
+ *   accommodate packets of size 'requested_mtu'.
  * On success - or when re-using an existing mempool - the new configuration
  * will be applied.
  * On error, device will be left unchanged. */
@@ -576,10 +584,18 @@ static int
 netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
-    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+    uint16_t buf_size = 0;
     struct rte_mempool *mp;
     int ret = 0;
 
+    /* Contiguous mbufs in use - permit oversized mbufs */
+    if (!dpdk_multi_segment_mbufs) {
+        buf_size = dpdk_buf_size(dev->requested_mtu);
+    } else {
+        /* multi-segment mbufs - use standard mbuf size */
+        buf_size = dpdk_buf_size(ETHER_MTU);
+    }
+
     mp = dpdk_mp_create(dev, buf_size);
     if (!mp) {
         VLOG_ERR("Failed to create memory pool for netdev "
@@ -657,6 +673,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
     int diag = 0;
     int i;
     struct rte_eth_conf conf = port_conf;
+    struct rte_eth_txconf txconf;
 
     /* For some NICs (e.g. Niantic), scatter_rx mode needs to be explicitly
      * enabled. */
@@ -690,9 +707,23 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
             break;
         }
 
+        /* DPDK PMDs typically attempt to use simple or vectorized
+         * transmit functions, neither of which are compatible with
+         * multi-segment mbufs. Ensure that these are disabled
+         * when multi-segment mbufs are enabled.
+         */
+        if (dpdk_multi_segment_mbufs) {
+            struct rte_eth_dev_info dev_info;
+            rte_eth_dev_info_get(dev->port_id, &dev_info);
+            txconf = dev_info.default_txconf;
+            txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
+        }
+
         for (i = 0; i < n_txq; i++) {
             diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
-                                          dev->socket_id, NULL);
+                                          dev->socket_id,
+                                          dpdk_multi_segment_mbufs ? &txconf
+                                                                   : NULL);
             if (diag) {
                 VLOG_INFO("Interface %s txq(%d) setup error: %s",
                           dev->up.name, i, rte_strerror(-diag));
@@ -3380,6 +3411,12 @@ unlock:
     return err;
 }
 
+void
+netdev_dpdk_multi_segment_mbufs_enable(void)
+{
+    dpdk_multi_segment_mbufs = true;
+}
+
 #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT,    \
                           SET_CONFIG, SET_TX_MULTIQ, SEND,    \
                           GET_CARRIER, GET_STATS,             \
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..a3339fe 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -25,6 +25,7 @@ struct dp_packet;
 
 #ifdef DPDK_NETDEV
 
+void netdev_dpdk_multi_segment_mbufs_enable(void);
 void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
 
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index a633226..2b71c4a 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -331,6 +331,26 @@

+ +

+ Specifies if DPDK uses multi-segment mbufs for handling jumbo frames. +

+

+          If true, DPDK allocates a single mempool per port, irrespective
+          of the ports' requested MTU sizes. The elements of this mempool are
+          'standard'-sized mbufs (typically ~2 KB), which may be chained
+          together to accommodate jumbo frames. In this approach, each mbuf
+          typically stores a fragment of the overall jumbo frame.
+

+

+          If not specified, defaults to false, in which case, the size
+          of each mbuf within a DPDK port's mempool will be grown to accommodate
+          jumbo frames within a single mbuf.
+

+
+ +