From patchwork Wed Jul 1 09:15:29 2020
X-Patchwork-Submitter: yang_y_yi
X-Patchwork-Id: 1320320
From: yang_y_yi@163.com
To: ovs-dev@openvswitch.org
Date: Wed, 1 Jul 2020 17:15:29 +0800
Message-Id: <20200701091533.221552-2-yang_y_yi@163.com>
In-Reply-To: <20200701091533.221552-1-yang_y_yi@163.com>
References: <20200701091533.221552-1-yang_y_yi@163.com>
1Uf129KBjvdXoW7Jw4Dtr13Ary3JryrWF1fXrb_yoWftwbEgr 4DZw1vvryDKrs7XF1UAr4DKw1Uuw1xAFyvgFsxJF93Ka4SgrZ5WrWvvFs3ZFnxuw1UKFW0 ga1kJFWYyr13tjkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8Yxn0WfASr-VFAUDa7-sFnT 9fnUUvcSsGvfC2KfnxnUUI43ZEXa7IUUvPfJUUUUU== X-Originating-IP: [111.207.123.55] X-CM-SenderInfo: 51dqwsp1b1xqqrwthudrp/1tbiFhhUi144LwBcsAAAs0 Cc: yang_y_yi@163.com, fbl@sysclose.org Subject: [ovs-dev] [PATCH v2 1/5] Fix dp_packet_set_size error for multi-seg mbuf X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Yi Yang For multi-seg mbuf, pkt_len isn't equal to data_len, data_len is data_len of the first seg, pkt_len is sum of data_len of all the segs, so for such packets, dp_packet_set_size shouldn't change data_len. Signed-off-by: Yi Yang --- lib/dp-packet.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 0430cca..070d111 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -575,7 +575,9 @@ dp_packet_set_size(struct dp_packet *b, uint32_t v) * (and thus 'v') will always be <= UINT16_MAX; this means that there is no * loss of accuracy in assigning 'v' to 'data_len'. */ - b->mbuf.data_len = (uint16_t)v; /* Current seg length. */ + if (b->mbuf.nb_segs <= 1) { + b->mbuf.data_len = (uint16_t)v; /* Current seg length. */ + } b->mbuf.pkt_len = v; /* Total length of all segments linked to * this segment. */ } From patchwork Wed Jul 1 09:15:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yang_y_yi X-Patchwork-Id: 1320323 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=163.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=163.com header.i=@163.com header.a=rsa-sha256 header.s=s110527 header.b=LJkNh+rS; dkim-atps=neutral Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49xbF92w1hz9s1x for ; Wed, 1 Jul 2020 19:16:13 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id AB9F389475; Wed, 1 Jul 2020 09:16:10 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Z0voB+MzfeST; Wed, 1 Jul 2020 09:16:04 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id 463C88AC72; Wed, 1 Jul 2020 09:16:04 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 2ABB2C08A8; Wed, 1 Jul 2020 09:16:04 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 
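The dp_packet_set_size() fix in patch 1 relies on the rte_mbuf convention that data_len describes a single segment while pkt_len is the total across the whole chain. The following is a minimal sketch, not part of the patch (the helper name is made up), that states the invariant the change preserves for multi-segment mbufs:

    #include <stdbool.h>
    #include <rte_mbuf.h>

    /* Illustrative only; not code from the patch.  pkt_len must equal the sum
     * of data_len over all chained segments, which is why only single-segment
     * mbufs may mirror the new packet size into data_len. */
    static bool
    mbuf_chain_is_consistent(const struct rte_mbuf *m)
    {
        uint32_t total = 0;
        uint16_t segs = 0;
        const struct rte_mbuf *seg;

        for (seg = m; seg != NULL; seg = seg->next) {
            total += seg->data_len;    /* bytes in this segment only */
            segs++;
        }
        return total == m->pkt_len && segs == m->nb_segs;
    }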
629C4C0733 for ; Wed, 1 Jul 2020 09:16:01 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id 5090F8915F for ; Wed, 1 Jul 2020 09:16:01 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sJyDi70As1Rs for ; Wed, 1 Jul 2020 09:15:59 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mail-m973.mail.163.com (mail-m973.mail.163.com [123.126.97.3]) by fraxinus.osuosl.org (Postfix) with ESMTPS id CB51D8914B for ; Wed, 1 Jul 2020 09:15:58 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-Id:MIME-Version; bh=iJDk1 n5b3RjDwhPFqa1iz3a52mJFlEUGM3i5AIOb5No=; b=LJkNh+rSJzpTzKKc+mDLB BEfmp6Ibf3N/9EFy5PrDud0CXQ+LbR/ldOrfwwN4FvTGwh0yO2cLhEGPxuOivPiS oZR8r3+quaUe9VMgV3FP8UAcTaMDBaRroGu/94a6ydBjL7jAmNRF7v9y7qc3PE8R 4GWx5KaKL+ANulDJpvTgKI= Received: from yangyi0100.home.langchao.com (unknown [111.207.123.55]) by smtp3 (Coremail) with SMTP id G9xpCgA3OWM1VPxeaI88EQ--.82S4; Wed, 01 Jul 2020 17:15:36 +0800 (CST) From: yang_y_yi@163.com To: ovs-dev@openvswitch.org Date: Wed, 1 Jul 2020 17:15:30 +0800 Message-Id: <20200701091533.221552-3-yang_y_yi@163.com> X-Mailer: git-send-email 2.19.2.windows.1 In-Reply-To: <20200701091533.221552-1-yang_y_yi@163.com> References: <20200701091533.221552-1-yang_y_yi@163.com> MIME-Version: 1.0 X-CM-TRANSID: G9xpCgA3OWM1VPxeaI88EQ--.82S4 X-Coremail-Antispam: 1Uf129KBjvAXoW3KFW7uF4xGFW5Gw1DWF13twb_yoW8Xw1DXo Z7Gr43u3WrWr1kA3y8KFyUWF4vqw40kF4093ZYq3W5ua4ayr1DX3yfCay3Aa13Zr13Ar4D Aw4Utas3ZrZrJry8n29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UbIYCTnIWIevJa73UjIFyTuYvjxUbUUUDUUUU X-Originating-IP: [111.207.123.55] X-CM-SenderInfo: 51dqwsp1b1xqqrwthudrp/1tbiMxhUi1Xl4x7uMwAAsk Cc: yang_y_yi@163.com, fbl@sysclose.org Subject: [ovs-dev] [PATCH v2 2/5] Enable VXLAN TSO for DPDK datapath X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Yi Yang Many NICs can support VXLAN TSO which can help improve across-compute-node VM-to-VM performance in case that MTU is set to 1500. This patch allows dpdkvhostuserclient interface and veth/tap interface to leverage NICs' offload capability to maximize across-compute-node TCP performance, with it applied, OVS DPDK can reach linespeed for across-compute-node VM-to-VM TCP performance. Signed-off-by: Yi Yang --- lib/dp-packet.h | 61 +++++++++++++++++ lib/netdev-dpdk.c | 193 +++++++++++++++++++++++++++++++++++++++++++++++++---- lib/netdev-linux.c | 20 ++++++ lib/netdev.c | 14 ++-- 4 files changed, 271 insertions(+), 17 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 070d111..07af124 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -1034,6 +1034,67 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b) *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG; } +#ifdef DPDK_NETDEV +/* Mark packet 'b' for VXLAN TCP segmentation offloading. 
*/ +static inline void +dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b) +{ + b->mbuf.ol_flags |= PKT_TX_TUNNEL_VXLAN; + b->mbuf.l2_len += sizeof(struct udp_header) + + sizeof(struct vxlanhdr); + b->mbuf.outer_l2_len = ETH_HEADER_LEN; + b->mbuf.outer_l3_len = IP_HEADER_LEN; +} + +/* Set l2_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len) +{ + b->mbuf.l2_len = l2_len; +} + +/* Set l3_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_l3_len(struct dp_packet *b, int l3_len) +{ + b->mbuf.l3_len = l3_len; +} + +/* Set l4_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len) +{ + b->mbuf.l4_len = l4_len; +} +#else +/* Mark packet 'b' for VXLAN TCP segmentation offloading. */ +static inline void +dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b OVS_UNUSED) +{ +} + +/* Set l2_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_l2_len(struct dp_packet *b OVS_UNUSED, + int l2_len OVS_UNUSED) +{ +} + +/* Set l3_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_l3_len(struct dp_packet *b OVS_UNUSED, + int l3_len OVS_UNUSED) +{ +} + +/* Set l4_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_l4_len(struct dp_packet *b OVS_UNUSED, + int l4_len OVS_UNUSED) +{ +} +#endif /* DPDK_NETDEV */ + static inline bool dp_packet_ip_checksum_valid(const struct dp_packet *p) { diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 44ebf96..bf5fa63 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -44,6 +44,7 @@ #include #include #include +#include #include "cmap.h" #include "coverage.h" @@ -87,6 +88,7 @@ COVERAGE_DEFINE(vhost_notification); #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE #define OVS_VPORT_DPDK "ovs_dpdk" +#define DPDK_RTE_HDR_OFFSET 1 /* * need to reserve tons of extra space in the mbufs so we can align the @@ -405,6 +407,7 @@ enum dpdk_hw_ol_features { NETDEV_RX_HW_SCATTER = 1 << 2, NETDEV_TX_TSO_OFFLOAD = 1 << 3, NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4, + NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD = 1 << 5, }; /* @@ -988,6 +991,12 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { conf.txmode.offloads |= DPDK_TX_TSO_OFFLOAD_FLAGS; + /* Enable VXLAN TSO support if available */ + if (dev->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) { + conf.txmode.offloads |= DEV_TX_OFFLOAD_VXLAN_TNL_TSO; + conf.txmode.offloads |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM; + conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS; + } if (dev->hw_ol_features & NETDEV_TX_SCTP_CHECKSUM_OFFLOAD) { conf.txmode.offloads |= DEV_TX_OFFLOAD_SCTP_CKSUM; } @@ -1126,6 +1135,10 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) if ((info.tx_offload_capa & tx_tso_offload_capa) == tx_tso_offload_capa) { dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; + /* Enable VXLAN TSO support if available */ + if (info.tx_offload_capa & DEV_TX_OFFLOAD_VXLAN_TNL_TSO) { + dev->hw_ol_features |= NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD; + } if (info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM) { dev->hw_ol_features |= NETDEV_TX_SCTP_CHECKSUM_OFFLOAD; } else { @@ -2137,29 +2150,96 @@ static bool netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) { struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); + uint16_t l4_proto = 0; + struct rte_ether_hdr *eth_hdr = + rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *); + struct rte_ipv4_hdr *ip_hdr; + struct rte_ipv6_hdr *ip6_hdr; + + if 
(mbuf->ol_flags & PKT_TX_TUNNEL_VXLAN) { + /* Handle VXLAN TSO */ + struct rte_udp_hdr *udp_hdr; + + if (mbuf->ol_flags & PKT_TX_IPV4) { + ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + udp_hdr = (struct rte_udp_hdr *)(ip_hdr + DPDK_RTE_HDR_OFFSET); + + /* outer IP checksum offload */ + ip_hdr->hdr_checksum = 0; + mbuf->ol_flags |= PKT_TX_OUTER_IP_CKSUM; + mbuf->ol_flags |= PKT_TX_OUTER_IPV4; + + ip_hdr = (struct rte_ipv4_hdr *) + ((uint8_t *)udp_hdr + mbuf->l2_len); + l4_proto = ip_hdr->next_proto_id; + + /* inner IP checksum offload */ + ip_hdr->hdr_checksum = 0; + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + } else if (mbuf->ol_flags & PKT_TX_IPV6) { + ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + udp_hdr = (struct rte_udp_hdr *)(ip_hdr + DPDK_RTE_HDR_OFFSET); + + /* outer IP checksum offload */ + ip_hdr->hdr_checksum = 0; + mbuf->ol_flags |= PKT_TX_OUTER_IP_CKSUM; + mbuf->ol_flags |= PKT_TX_OUTER_IPV4; + + ip6_hdr = (struct rte_ipv6_hdr *) + ((uint8_t *)udp_hdr + mbuf->l2_len); + l4_proto = ip6_hdr->proto; + + /* inner IP checksum offload offload */ + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + } + } else if (mbuf->ol_flags & PKT_TX_L4_MASK) { + /* Handle VLAN TSO */ + /* no inner IP checksum for IPV6 */ + if (mbuf->ol_flags & PKT_TX_IPV4) { + ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + l4_proto = ip_hdr->next_proto_id; + + /* IP checksum offload */ + ip_hdr->hdr_checksum = 0; + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + } else if (mbuf->ol_flags & PKT_TX_IPV6) { + ip6_hdr = (struct rte_ipv6_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + l4_proto = ip6_hdr->proto; + + /* IP checksum offload */ + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + } - if (mbuf->ol_flags & PKT_TX_L4_MASK) { mbuf->l2_len = (char *)dp_packet_l3(pkt) - (char *)dp_packet_eth(pkt); mbuf->l3_len = (char *)dp_packet_l4(pkt) - (char *)dp_packet_l3(pkt); mbuf->outer_l2_len = 0; mbuf->outer_l3_len = 0; } - if (mbuf->ol_flags & PKT_TX_TCP_SEG) { - struct tcp_header *th = dp_packet_l4(pkt); - - if (!th) { + if ((mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM) { + if (l4_proto != IPPROTO_UDP) { + VLOG_WARN_RL(&rl, "%s: UDP packet without L4 header" + " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); + return false; + } + /* VXLAN GSO can be done here */ + } else if (mbuf->ol_flags & PKT_TX_TCP_SEG || + mbuf->ol_flags & PKT_TX_TCP_CKSUM) { + if (l4_proto != IPPROTO_TCP) { VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header" " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); return false; } - mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4; - mbuf->ol_flags |= PKT_TX_TCP_CKSUM; - mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; + if (mbuf->pkt_len - mbuf->l2_len > 1450) { + dp_packet_hwol_set_tcp_seg(pkt); + } - if (mbuf->ol_flags & PKT_TX_IPV4) { - mbuf->ol_flags |= PKT_TX_IP_CKSUM; + mbuf->ol_flags |= PKT_TX_TCP_CKSUM; + if (mbuf->ol_flags & PKT_TX_TCP_SEG) { + mbuf->tso_segsz = 1450 - mbuf->l3_len - mbuf->l4_len; + } else { + mbuf->tso_segsz = 0; } } return true; @@ -2365,6 +2445,71 @@ netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev, } } +static void +netdev_linux_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto) +{ + struct rte_mbuf *mbuf = (struct rte_mbuf *)pkt; + struct rte_ether_hdr *eth_hdr = + rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *); + ovs_be16 eth_type; + int l2_len; + int l3_len = 0; + int l4_len = 0; + + l2_len = ETH_HEADER_LEN; + eth_type = (OVS_FORCE ovs_be16) eth_hdr->ether_type; + if (eth_type_vlan(eth_type)) { + struct rte_vlan_hdr 
*vlan_hdr = + (struct rte_vlan_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + + eth_type = (OVS_FORCE ovs_be16) vlan_hdr->eth_proto; + l2_len += VLAN_HEADER_LEN; + } + + dp_packet_hwol_set_l2_len(pkt, l2_len); + + if (eth_type == htons(ETH_TYPE_IP)) { + struct rte_ipv4_hdr *ipv4_hdr = (struct rte_ipv4_hdr *) + ((char *)eth_hdr + l2_len); + + l3_len = IP_HEADER_LEN; + dp_packet_hwol_set_tx_ipv4(pkt); + *l4_proto = ipv4_hdr->next_proto_id; + } else if (eth_type == htons(RTE_ETHER_TYPE_IPV6)) { + struct rte_ipv6_hdr *ipv6_hdr = (struct rte_ipv6_hdr *) + ((char *)eth_hdr + l2_len); + l3_len = IPV6_HEADER_LEN; + dp_packet_hwol_set_tx_ipv6(pkt); + *l4_proto = ipv6_hdr->proto; + } + + dp_packet_hwol_set_l3_len(pkt, l3_len); + + if (*l4_proto == IPPROTO_TCP) { + struct rte_tcp_hdr *tcp_hdr = (struct rte_tcp_hdr *) + ((char *)eth_hdr + l2_len + l3_len); + + l4_len = (tcp_hdr->data_off & 0xf0) >> 2; + dp_packet_hwol_set_l4_len(pkt, l4_len); + } +} + +static void +netdev_dpdk_parse_hdr(struct dp_packet *b) +{ + uint16_t l4_proto = 0; + + netdev_linux_parse_l2(b, &l4_proto); + + if (l4_proto == IPPROTO_TCP) { + dp_packet_hwol_set_csum_tcp(b); + } else if (l4_proto == IPPROTO_UDP) { + dp_packet_hwol_set_csum_udp(b); + } else if (l4_proto == IPPROTO_SCTP) { + dp_packet_hwol_set_csum_sctp(b); + } +} + /* * The receive path for the vhost port is the TX path out from guest. */ @@ -2378,6 +2523,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, uint16_t qos_drops = 0; int qid = rxq->queue_id * VIRTIO_QNUM + VIRTIO_TXQ; int vid = netdev_dpdk_get_vid(dev); + struct dp_packet *packet; if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured || !(dev->flags & NETDEV_UP))) { @@ -2417,6 +2563,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, batch->count = nb_rx; dp_packet_batch_init_packet_fields(batch); + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + struct rte_mbuf *mbuf = (struct rte_mbuf *)packet; + + /* Clear ol_flags and set it by parsing header */ + mbuf->ol_flags = 0; + netdev_dpdk_parse_hdr(packet); + } + return 0; } @@ -2737,13 +2891,18 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet *pkt_orig) mbuf_dest->tx_offload = pkt_orig->mbuf.tx_offload; mbuf_dest->packet_type = pkt_orig->mbuf.packet_type; - mbuf_dest->ol_flags |= (pkt_orig->mbuf.ol_flags & - ~(EXT_ATTACHED_MBUF | IND_ATTACHED_MBUF)); + mbuf_dest->ol_flags |= pkt_orig->mbuf.ol_flags; + mbuf_dest->l2_len = pkt_orig->mbuf.l2_len; + mbuf_dest->l3_len = pkt_orig->mbuf.l3_len; + mbuf_dest->l4_len = pkt_orig->mbuf.l4_len; + mbuf_dest->outer_l2_len = pkt_orig->mbuf.outer_l2_len; + mbuf_dest->outer_l3_len = pkt_orig->mbuf.outer_l3_len; memcpy(&pkt_dest->l2_pad_size, &pkt_orig->l2_pad_size, sizeof(struct dp_packet) - offsetof(struct dp_packet, l2_pad_size)); - if (mbuf_dest->ol_flags & PKT_TX_L4_MASK) { + if ((mbuf_dest->outer_l2_len == 0) && + (mbuf_dest->ol_flags & PKT_TX_L4_MASK)) { mbuf_dest->l2_len = (char *)dp_packet_l3(pkt_dest) - (char *)dp_packet_eth(pkt_dest); mbuf_dest->l3_len = (char *)dp_packet_l4(pkt_dest) @@ -2773,6 +2932,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) uint32_t tx_failure = 0; uint32_t mtu_drops = 0; uint32_t qos_drops = 0; + struct rte_mbuf *mbuf; if (dev->type != DPDK_DEV_VHOST) { /* Check if QoS has been configured for this netdev. 
*/ @@ -2801,6 +2961,9 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) break; } + mbuf = (struct rte_mbuf *)pkts[txcnt]; + netdev_dpdk_prep_hwol_packet(dev, mbuf); + txcnt++; } @@ -4949,6 +5112,10 @@ netdev_dpdk_reconfigure(struct netdev *netdev) netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; netdev->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM; netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; + /* Enable VXLAN TSO support if available */ + if (dev->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) { + netdev->ol_flags |= NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD; + } if (dev->hw_ol_features & NETDEV_TX_SCTP_CHECKSUM_OFFLOAD) { netdev->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CKSUM; } diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 6269c24..f6e80fc 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -6500,6 +6500,8 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) struct eth_header *eth_hdr; ovs_be16 eth_type; int l2_len; + int l3_len = 0; + int l4_len = 0; eth_hdr = dp_packet_at(b, 0, ETH_HEADER_LEN); if (!eth_hdr) { @@ -6519,6 +6521,8 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) l2_len += VLAN_HEADER_LEN; } + dp_packet_hwol_set_l2_len(b, l2_len); + if (eth_type == htons(ETH_TYPE_IP)) { struct ip_header *ip_hdr = dp_packet_at(b, l2_len, IP_HEADER_LEN); @@ -6526,6 +6530,7 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) return -EINVAL; } + l3_len = IP_HEADER_LEN; *l4proto = ip_hdr->ip_proto; dp_packet_hwol_set_tx_ipv4(b); } else if (eth_type == htons(ETH_TYPE_IPV6)) { @@ -6536,10 +6541,25 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) return -EINVAL; } + l3_len = IPV6_HEADER_LEN; *l4proto = nh6->ip6_ctlun.ip6_un1.ip6_un1_nxt; dp_packet_hwol_set_tx_ipv6(b); } + dp_packet_hwol_set_l3_len(b, l3_len); + + if (*l4proto == IPPROTO_TCP) { + struct tcp_header *tcp_hdr = dp_packet_at(b, l2_len + l3_len, + sizeof(struct tcp_header)); + + if (!tcp_hdr) { + return -EINVAL; + } + + l4_len = TCP_OFFSET(tcp_hdr->tcp_ctl) * 4; + dp_packet_hwol_set_l4_len(b, l4_len); + } + return 0; } diff --git a/lib/netdev.c b/lib/netdev.c index 90962ee..b437caf 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -960,15 +960,21 @@ netdev_push_header(const struct netdev *netdev, size_t i, size = dp_packet_batch_size(batch); DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) { - if (OVS_UNLIKELY(dp_packet_hwol_is_tso(packet) - || dp_packet_hwol_l4_mask(packet))) { + if (OVS_UNLIKELY((dp_packet_hwol_is_tso(packet) + || dp_packet_hwol_l4_mask(packet)) + && (data->tnl_type != OVS_VPORT_TYPE_VXLAN))) { COVERAGE_INC(netdev_push_header_drops); dp_packet_delete(packet); - VLOG_WARN_RL(&rl, "%s: Tunneling packets with HW offload flags is " - "not supported: packet dropped", + VLOG_WARN_RL(&rl, + "%s: non-VxLAN Tunneling packets with HW offload " + "flags is not supported: packet dropped", netdev_get_name(netdev)); } else { netdev->netdev_class->push_header(netdev, packet, data); + if ((data->tnl_type == OVS_VPORT_TYPE_VXLAN) + && dp_packet_hwol_is_tso(packet)) { + dp_packet_hwol_set_vxlan_tcp_seg(packet); + } pkt_metadata_init(&packet->md, data->out_port); dp_packet_batch_refill(batch, packet, i); } From patchwork Wed Jul 1 09:15:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yang_y_yi X-Patchwork-Id: 1320324 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; 
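For the VXLAN TSO path added above, the mbuf header-length fields follow DPDK's tunnel TSO convention: outer_l2_len and outer_l3_len cover the outer Ethernet and IP headers, while l2_len absorbs the outer UDP header, the VXLAN header and the inner Ethernet header. A minimal sketch of that layout for an outer-IPv4 / inner-IPv4 TCP packet follows; the helper name and the fixed header sizes are illustrative assumptions, not code from the patch:

    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_udp.h>
    #include <rte_vxlan.h>

    /* Illustrative only; not code from the patch.  Lay out the mbuf length
     * fields for VXLAN TSO of an outer-IPv4 / inner-IPv4 TCP packet. */
    static void
    set_vxlan_tso_lengths(struct rte_mbuf *m, uint16_t inner_tcp_hdr_len,
                          uint16_t segsz)
    {
        m->outer_l2_len = RTE_ETHER_HDR_LEN;            /* outer Ethernet */
        m->outer_l3_len = sizeof(struct rte_ipv4_hdr);  /* outer IPv4 */
        m->l2_len = sizeof(struct rte_udp_hdr)          /* outer UDP */
                    + sizeof(struct rte_vxlan_hdr)      /* VXLAN header */
                    + RTE_ETHER_HDR_LEN;                /* inner Ethernet */
        m->l3_len = sizeof(struct rte_ipv4_hdr);        /* inner IPv4 */
        m->l4_len = inner_tcp_hdr_len;                  /* inner TCP header */
        m->tso_segsz = segsz;                           /* inner payload per segment */
        m->ol_flags |= PKT_TX_TUNNEL_VXLAN | PKT_TX_OUTER_IPV4
                       | PKT_TX_OUTER_IP_CKSUM | PKT_TX_IPV4 | PKT_TX_IP_CKSUM
                       | PKT_TX_TCP_CKSUM | PKT_TX_TCP_SEG;
    }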
From: yang_y_yi@163.com
To: ovs-dev@openvswitch.org
Date: Wed, 1 Jul 2020 17:15:31 +0800
Message-Id: <20200701091533.221552-4-yang_y_yi@163.com>
In-Reply-To: <20200701091533.221552-1-yang_y_yi@163.com>
References: <20200701091533.221552-1-yang_y_yi@163.com>
51dqwsp1b1xqqrwthudrp/1tbiFhhUi144LwBcuwAAs- Cc: yang_y_yi@163.com, fbl@sysclose.org Subject: [ovs-dev] [PATCH v2 3/5] Add GSO support for DPDK data path X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Yi Yang GSO(Generic Segment Offload) can segment large UDP and TCP packet to small packets per MTU of destination , especially for the case that physical NIC can't do hardware offload VXLAN TSO and VXLAN UFO, GSO can make sure userspace TSO can still work but not drop. In addition, GSO can help improve UDP performane when UFO is enabled in VM. GSO can support TCP, UDP, VXLAN TCP, VXLAN UDP, it is done in Tx function of physical NIC. Signed-off-by: Yi Yang --- lib/dp-packet.h | 21 +++++- lib/netdev-dpdk.c | 200 ++++++++++++++++++++++++++++++++++++++++++++++++----- lib/netdev-linux.c | 17 ++++- 3 files changed, 216 insertions(+), 22 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 07af124..282d374 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -81,6 +81,8 @@ enum dp_packet_offload_mask { DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_CKSUM, PKT_TX_UDP_CKSUM, 0x400), /* Offload SCTP checksum. */ DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, PKT_TX_SCTP_CKSUM, 0x800), + /* UDP Segmentation Offload. */ + DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_SEG, PKT_TX_UDP_SEG, 0x1000), /* Adding new field requires adding to DP_PACKET_OL_SUPPORTED_MASK. */ }; @@ -95,7 +97,8 @@ enum dp_packet_offload_mask { DP_PACKET_OL_TX_IPV6 | \ DP_PACKET_OL_TX_TCP_CKSUM | \ DP_PACKET_OL_TX_UDP_CKSUM | \ - DP_PACKET_OL_TX_SCTP_CKSUM) + DP_PACKET_OL_TX_SCTP_CKSUM | \ + DP_PACKET_OL_TX_UDP_SEG) #define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \ DP_PACKET_OL_TX_UDP_CKSUM | \ @@ -956,6 +959,13 @@ dp_packet_hwol_is_tso(const struct dp_packet *b) return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_TCP_SEG); } +/* Returns 'true' if packet 'b' is marked for UDP segmentation offloading. */ +static inline bool +dp_packet_hwol_is_uso(const struct dp_packet *b) +{ + return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_UDP_SEG); +} + /* Returns 'true' if packet 'b' is marked for IPv4 checksum offloading. */ static inline bool dp_packet_hwol_is_ipv4(const struct dp_packet *b) @@ -1034,6 +1044,15 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b) *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG; } +/* Mark packet 'b' for UDP segmentation offloading. It implies that + * either the packet 'b' is marked for IPv4 or IPv6 checksum offloading + * and also for UDP checksum offloading. */ +static inline void +dp_packet_hwol_set_udp_seg(struct dp_packet *b) +{ + *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_UDP_SEG; +} + #ifdef DPDK_NETDEV /* Mark packet 'b' for VXLAN TCP segmentation offloading. 
*/ static inline void diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index bf5fa63..50fa11d 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -38,13 +38,15 @@ #include #include #include +#include +#include #include #include #include #include #include #include -#include +#include #include "cmap.h" #include "coverage.h" @@ -162,6 +164,7 @@ typedef uint16_t dpdk_port_t; | DEV_TX_OFFLOAD_UDP_CKSUM \ | DEV_TX_OFFLOAD_IPV4_CKSUM) +#define MAX_GSO_MBUFS 64 static const struct rte_eth_conf port_conf = { .rxmode = { @@ -2144,6 +2147,16 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq) rte_free(rx); } +static uint16_t +get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype) +{ + if (ethertype == htons(RTE_ETHER_TYPE_IPV4)) { + return rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr); + } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ + return rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr); + } +} + /* Prepare the packet for HWOL. * Return True if the packet is OK to continue. */ static bool @@ -2216,6 +2229,10 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->outer_l3_len = 0; } + if ((l4_proto != IPPROTO_UDP) && (l4_proto != IPPROTO_TCP)) { + return true; + } + if ((mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM) { if (l4_proto != IPPROTO_UDP) { VLOG_WARN_RL(&rl, "%s: UDP packet without L4 header" @@ -2227,7 +2244,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->ol_flags & PKT_TX_TCP_CKSUM) { if (l4_proto != IPPROTO_TCP) { VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header" - " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); + " pkt len: %"PRIu32" l4_proto = %d", + dev->up.name, mbuf->pkt_len, l4_proto); return false; } @@ -2242,6 +2260,50 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->tso_segsz = 0; } } + + /* UDP GSO if necessary */ + if (l4_proto == IPPROTO_UDP) { + /* VXLAN GSO can be done here */ + if ((mbuf->ol_flags & PKT_TX_UDP_SEG) || + (mbuf->pkt_len > (1450 + mbuf->outer_l2_len + mbuf->outer_l3_len + + mbuf->l2_len))) { + dp_packet_hwol_set_udp_seg(pkt); + + /* For UDP GSO, udp checksum must be calculated by software */ + if ((mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM) { + void *l3_hdr, *l4_hdr; + struct rte_udp_hdr *udp_hdr; + + /* PKT_TX_UDP_CKSUM must be cleaned for GSO because + * udp checksum only can be caculated by software for + * GSO case. + */ + mbuf->ol_flags &= ~PKT_TX_UDP_CKSUM; + + eth_hdr = (struct rte_ether_hdr *) + ((uint8_t *)eth_hdr + mbuf->outer_l2_len + + mbuf->outer_l3_len + + sizeof(struct udp_header) + + sizeof(struct vxlanhdr)); + l3_hdr = (uint8_t *)eth_hdr + mbuf->l2_len - + sizeof(struct udp_header) - + sizeof(struct vxlanhdr); + l4_hdr = (uint8_t *)l3_hdr + mbuf->l3_len; + udp_hdr = (struct rte_udp_hdr *)l4_hdr; + udp_hdr->dgram_cksum = 0; + udp_hdr->dgram_cksum = + get_udptcp_checksum(l3_hdr, l4_hdr, eth_hdr->ether_type); + } + + /* FOR GSO, gso_size includes l2_len + l3_len */ + mbuf->tso_segsz = 1450 + mbuf->outer_l2_len + mbuf->outer_l3_len + + mbuf->l2_len; + if (mbuf->tso_segsz > dev->mtu) { + mbuf->tso_segsz = dev->mtu; + } + } + } + return true; } @@ -2272,24 +2334,19 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk *dev, struct rte_mbuf **pkts, return cnt; } -/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes ownership of - * 'pkts', even in case of failure. - * - * Returns the number of packets that weren't transmitted. 
*/ static inline int -netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, - struct rte_mbuf **pkts, int cnt) +__netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, + struct rte_mbuf **pkts, int cnt) { uint32_t nb_tx = 0; - uint16_t nb_tx_prep = cnt; + uint32_t nb_tx_prep; - if (userspace_tso_enabled()) { - nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); - if (nb_tx_prep != cnt) { - VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " - "Only %u/%u are valid: %s", dev->up.name, nb_tx_prep, - cnt, rte_strerror(rte_errno)); - } + nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); + if (nb_tx_prep != cnt) { + VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " + "Only %u/%u are valid: %s", + dev->up.name, nb_tx_prep, + cnt, rte_strerror(rte_errno)); } while (nb_tx != nb_tx_prep) { @@ -2317,6 +2374,88 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, return cnt - nb_tx; } +/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes ownership of + * 'pkts', even in case of failure. + * + * Returns the number of packets that weren't transmitted. */ +static inline int +netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, + struct rte_mbuf **pkts, int cnt) +{ + uint32_t nb_tx = 0; + int i; + int ret; + int failures = 0; + + if (userspace_tso_enabled()) { + /* The best point to do gso */ + struct rte_gso_ctx gso_ctx; + struct rte_mbuf *gso_mbufs[MAX_GSO_MBUFS]; + int tx_start = -1; + + /* Setup gso context */ + gso_ctx.direct_pool = dev->dpdk_mp->mp; + gso_ctx.indirect_pool = dev->dpdk_mp->mp; + gso_ctx.gso_types = 0; + gso_ctx.gso_size = 0; + gso_ctx.flag = 0; + + /* Do GSO if needed */ + for (i = 0; i < cnt; i++) { + if (pkts[i]->ol_flags & PKT_TX_UDP_SEG) { + /* Send non GSO packets before pkts[i] */ + if (tx_start != -1) { + failures += __netdev_dpdk_eth_tx_burst( + dev, qid, + pkts + tx_start, + i - tx_start); + } + tx_start = -1; + + if (pkts[i]->ol_flags & PKT_TX_TUNNEL_VXLAN) { + gso_ctx.gso_types = DEV_TX_OFFLOAD_VXLAN_TNL_TSO | + DEV_TX_OFFLOAD_UDP_TSO; + } else { + gso_ctx.gso_types = DEV_TX_OFFLOAD_UDP_TSO; + } + gso_ctx.gso_size = pkts[i]->tso_segsz; + ret = rte_gso_segment(pkts[i], /* packet to segment */ + &gso_ctx, /* gso context */ + /* gso output mbufs */ + (struct rte_mbuf **)&gso_mbufs, + MAX_GSO_MBUFS); + if (ret < 0) { + rte_pktmbuf_free(pkts[i]); + } else { + int j, k; + struct rte_mbuf * next_part; + nb_tx = ret; + for (j = 0; j < nb_tx; j++) { + next_part = gso_mbufs[j]; + for (k = 0; k < gso_mbufs[j]->nb_segs; k++) { + next_part = next_part->next; + } + } + __netdev_dpdk_eth_tx_burst(dev, qid, gso_mbufs, nb_tx); + } + continue; + } + if (tx_start == -1) { + tx_start = i; + } + } + + if (tx_start != -1) { + /* Send non GSO packets before pkts[i] */ + failures += __netdev_dpdk_eth_tx_burst(dev, qid, pkts + tx_start, + i - tx_start); + } + return failures; + } + + return __netdev_dpdk_eth_tx_burst(dev, qid, pkts, cnt); +} + static inline bool netdev_dpdk_srtcm_policer_pkt_handle(struct rte_meter_srtcm *meter, struct rte_meter_srtcm_profile *profile, @@ -2446,7 +2585,7 @@ netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev, } static void -netdev_linux_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto) +netdev_dpdk_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto, int *is_frag) { struct rte_mbuf *mbuf = (struct rte_mbuf *)pkt; struct rte_ether_hdr *eth_hdr = @@ -2456,6 +2595,7 @@ netdev_linux_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto) int l3_len = 0; int l4_len = 0; + 
*is_frag = 0; l2_len = ETH_HEADER_LEN; eth_type = (OVS_FORCE ovs_be16) eth_hdr->ether_type; if (eth_type_vlan(eth_type)) { @@ -2475,9 +2615,11 @@ netdev_linux_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto) l3_len = IP_HEADER_LEN; dp_packet_hwol_set_tx_ipv4(pkt); *l4_proto = ipv4_hdr->next_proto_id; + *is_frag = rte_ipv4_frag_pkt_is_fragmented(ipv4_hdr); } else if (eth_type == htons(RTE_ETHER_TYPE_IPV6)) { struct rte_ipv6_hdr *ipv6_hdr = (struct rte_ipv6_hdr *) ((char *)eth_hdr + l2_len); + l3_len = IPV6_HEADER_LEN; dp_packet_hwol_set_tx_ipv6(pkt); *l4_proto = ipv6_hdr->proto; @@ -2491,6 +2633,12 @@ netdev_linux_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto) l4_len = (tcp_hdr->data_off & 0xf0) >> 2; dp_packet_hwol_set_l4_len(pkt, l4_len); + } else if (*l4_proto == IPPROTO_UDP) { + struct rte_udp_hdr *udp_hdr = (struct rte_udp_hdr *) + ((char *)eth_hdr + l2_len + l3_len); + + l4_len = sizeof(*udp_hdr); + dp_packet_hwol_set_l4_len(pkt, l4_len); } } @@ -2498,13 +2646,16 @@ static void netdev_dpdk_parse_hdr(struct dp_packet *b) { uint16_t l4_proto = 0; + int is_frag = 0; - netdev_linux_parse_l2(b, &l4_proto); + netdev_dpdk_parse_l2(b, &l4_proto, &is_frag); if (l4_proto == IPPROTO_TCP) { dp_packet_hwol_set_csum_tcp(b); } else if (l4_proto == IPPROTO_UDP) { - dp_packet_hwol_set_csum_udp(b); + if (is_frag == 0) { + dp_packet_hwol_set_csum_udp(b); + } } else if (l4_proto == IPPROTO_SCTP) { dp_packet_hwol_set_csum_sctp(b); } @@ -5195,6 +5346,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) int err; uint64_t vhost_flags = 0; uint64_t vhost_unsup_flags; + uint64_t vhost_supported_flags; bool zc_enabled; ovs_mutex_lock(&dev->mutex); @@ -5280,6 +5432,16 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) goto unlock; } + err = rte_vhost_driver_get_features(dev->vhost_id, + &vhost_supported_flags); + if (err) { + VLOG_ERR("rte_vhost_driver_get_features failed for " + "vhost user client port: %s\n", dev->up.name); + goto unlock; + } + VLOG_INFO("vhostuserclient port %s features: 0x%016lx", + dev->up.name, vhost_supported_flags); + err = rte_vhost_driver_start(dev->vhost_id); if (err) { VLOG_ERR("rte_vhost_driver_start failed for vhost user " diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index f6e80fc..c95f40f 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -6558,6 +6558,16 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) l4_len = TCP_OFFSET(tcp_hdr->tcp_ctl) * 4; dp_packet_hwol_set_l4_len(b, l4_len); + } else if (*l4proto == IPPROTO_UDP) { + struct udp_header *udp_hdr = dp_packet_at(b, l2_len + l3_len, + sizeof(struct udp_header)); + + if (!udp_hdr) { + return -EINVAL; + } + + l4_len = sizeof(struct udp_header); + dp_packet_hwol_set_l4_len(b, l4_len); } return 0; @@ -6573,9 +6583,9 @@ netdev_linux_parse_vnet_hdr(struct dp_packet *b) return -EINVAL; } - if (vnet->flags == 0 && vnet->gso_type == VIRTIO_NET_HDR_GSO_NONE) { + /*if (vnet->flags == 0 && vnet->gso_type == VIRTIO_NET_HDR_GSO_NONE) { return 0; - } + }*/ if (netdev_linux_parse_l2(b, &l4proto)) { return -EINVAL; @@ -6601,6 +6611,9 @@ netdev_linux_parse_vnet_hdr(struct dp_packet *b) || type == VIRTIO_NET_HDR_GSO_TCPV6) { dp_packet_hwol_set_tcp_seg(b); } + if (type == VIRTIO_NET_HDR_GSO_UDP) { + dp_packet_hwol_set_udp_seg(b); + } } return 0; From patchwork Wed Jul 1 09:15:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yang_y_yi X-Patchwork-Id: 1320322 Return-Path: X-Original-To: 
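The GSO transmit path added in patch 3 builds on DPDK's librte_gso. The following stripped-down sketch shows the same call sequence for a single oversized UDP packet; the function name, 'pool', 'port_id' and 'queue_id' are placeholders, and the error handling is reduced to what the patch itself does:

    #include <rte_ethdev.h>
    #include <rte_gso.h>
    #include <rte_mbuf.h>

    #define GSO_MAX_OUT 64

    /* Illustrative only; not code from the patch.  Software-segment one
     * oversized UDP mbuf and transmit the resulting packets. */
    static int
    gso_and_send(struct rte_mbuf *pkt, struct rte_mempool *pool,
                 uint16_t port_id, uint16_t queue_id)
    {
        struct rte_gso_ctx ctx = {
            .direct_pool = pool,
            .indirect_pool = pool,
            .gso_types = DEV_TX_OFFLOAD_UDP_TSO,   /* UDP GSO, as in the patch */
            .gso_size = pkt->tso_segsz,            /* segment size set earlier */
            .flag = 0,
        };
        struct rte_mbuf *out[GSO_MAX_OUT];
        int nb = rte_gso_segment(pkt, &ctx, out, GSO_MAX_OUT);

        if (nb < 0) {
            rte_pktmbuf_free(pkt);                 /* segmentation failed */
            return nb;
        }
        return rte_eth_tx_burst(port_id, queue_id, out, nb);
    }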
From: yang_y_yi@163.com
To: ovs-dev@openvswitch.org
Date: Wed, 1 Jul 2020 17:15:32 +0800
Message-Id: <20200701091533.221552-5-yang_y_yi@163.com>
In-Reply-To: <20200701091533.221552-1-yang_y_yi@163.com>
References: <20200701091533.221552-1-yang_y_yi@163.com>
9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x0zR4KZAUUUUU= X-Originating-IP: [111.207.123.55] X-CM-SenderInfo: 51dqwsp1b1xqqrwthudrp/xtbB0hBUi1UMXRanfQABsU Cc: yang_y_yi@163.com, fbl@sysclose.org Subject: [ovs-dev] [PATCH v2 4/5] Add VXLAN TCP and UDP GRO support for DPDK data path X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Yi Yang GRO(Generic Receive Offload) can help improve performance when TSO (TCP Segment Offload) or VXLAN TSO is enabled on transmit side, this can avoid overhead of ovs DPDK data path and enqueue vhost for VM by merging many small packets to large packets (65535 bytes at most) once it receives packets from physical NIC. It can work for both VXLAN and vlan case. Signed-off-by: Yi Yang --- lib/dp-packet.h | 26 ++++++++ lib/netdev-dpdk.c | 178 +++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 195 insertions(+), 9 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 282d374..3ddee36 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -1085,6 +1085,20 @@ dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len) { b->mbuf.l4_len = l4_len; } + +/* Set outer_l2_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_outer_l2_len(struct dp_packet *b, int outer_l2_len) +{ + b->mbuf.outer_l2_len = outer_l2_len; +} + +/* Set outer_l3_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_outer_l3_len(struct dp_packet *b, int outer_l3_len) +{ + b->mbuf.outer_l3_len = outer_l3_len; +} #else /* Mark packet 'b' for VXLAN TCP segmentation offloading. */ static inline void @@ -1112,6 +1126,18 @@ dp_packet_hwol_set_l4_len(struct dp_packet *b OVS_UNUSED, int l4_len OVS_UNUSED) { } + +/* Set outer_l2_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_outer_l2_len(struct dp_packet *b, int outer_l2_len) +{ +} + +/* Set outer_l3_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_outer_l3_len(struct dp_packet *b, int outer_l3_len) +{ +} #endif /* DPDK_NETDEV */ static inline bool diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 50fa11d..61c2c62 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -25,6 +25,7 @@ #include #include #include +#include /* Include rte_compat.h first to allow experimental API's needed for the * rte_meter.h rfc4115 functions. Once they are no longer marked as @@ -47,6 +48,7 @@ #include #include #include +#include #include "cmap.h" #include "coverage.h" @@ -2157,6 +2159,8 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype) } } +#define UDP_VXLAN_ETH_HDR_SIZE 30 + /* Prepare the packet for HWOL. * Return True if the packet is OK to continue. */ static bool @@ -2169,6 +2173,42 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) struct rte_ipv4_hdr *ip_hdr; struct rte_ipv6_hdr *ip6_hdr; + /* ol_flags is cleaned after vxlan pop, so need reset for those packets. + * Such packets are only for local VMs or namespaces, so need to return + * after ol_flags, l2_len, l3_len and tso_segsz are set. 
+ */ + if (((mbuf->ol_flags & PKT_TX_TUNNEL_VXLAN) == 0) && + (mbuf->l2_len == UDP_VXLAN_ETH_HDR_SIZE) && + (mbuf->pkt_len > 1464)) { + mbuf->ol_flags = 0; + mbuf->l2_len -= sizeof(struct udp_header) + + sizeof(struct vxlanhdr); + if (mbuf->l3_len == IP_HEADER_LEN) { + mbuf->ol_flags |= PKT_TX_IPV4; + ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1); + l4_proto = ip_hdr->next_proto_id; + } else if (mbuf->l3_len == IPV6_HEADER_LEN) { + mbuf->ol_flags |= PKT_TX_IPV6; + ip6_hdr = (struct rte_ipv6_hdr *)(eth_hdr + 1); + l4_proto = ip6_hdr->proto; + } + + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + if (l4_proto == IPPROTO_TCP) { + mbuf->ol_flags |= PKT_TX_TCP_SEG; + mbuf->ol_flags |= PKT_TX_TCP_CKSUM; + } else if (l4_proto == IPPROTO_UDP) { + mbuf->ol_flags |= PKT_TX_UDP_SEG; + mbuf->ol_flags |= PKT_TX_UDP_CKSUM; + } + mbuf->tso_segsz = 1450; + if (mbuf->tso_segsz > dev->mtu) { + mbuf->tso_segsz = dev->mtu; + } + + return true; + } + if (mbuf->ol_flags & PKT_TX_TUNNEL_VXLAN) { /* Handle VXLAN TSO */ struct rte_udp_hdr *udp_hdr; @@ -2584,18 +2624,26 @@ netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev, } } +#define VXLAN_DST_PORT 4789 + static void -netdev_dpdk_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto, int *is_frag) +__netdev_dpdk_parse_hdr(struct dp_packet *pkt, int offset, + uint16_t *l4_proto, int *is_frag) { struct rte_mbuf *mbuf = (struct rte_mbuf *)pkt; struct rte_ether_hdr *eth_hdr = - rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *); + rte_pktmbuf_mtod_offset(mbuf, struct rte_ether_hdr *, offset); ovs_be16 eth_type; int l2_len; int l3_len = 0; int l4_len = 0; + uint16_t inner_l4_proto = 0; + int inner_is_frag = 0; - *is_frag = 0; + if (offset == 0) { + *is_frag = 0; + } + mbuf->packet_type = 0; l2_len = ETH_HEADER_LEN; eth_type = (OVS_FORCE ovs_be16) eth_hdr->ether_type; if (eth_type_vlan(eth_type)) { @@ -2616,6 +2664,7 @@ netdev_dpdk_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto, int *is_frag) dp_packet_hwol_set_tx_ipv4(pkt); *l4_proto = ipv4_hdr->next_proto_id; *is_frag = rte_ipv4_frag_pkt_is_fragmented(ipv4_hdr); + mbuf->packet_type |= RTE_PTYPE_L3_IPV4; } else if (eth_type == htons(RTE_ETHER_TYPE_IPV6)) { struct rte_ipv6_hdr *ipv6_hdr = (struct rte_ipv6_hdr *) ((char *)eth_hdr + l2_len); @@ -2626,6 +2675,8 @@ netdev_dpdk_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto, int *is_frag) } dp_packet_hwol_set_l3_len(pkt, l3_len); + dp_packet_hwol_set_outer_l2_len(pkt, 0); + dp_packet_hwol_set_outer_l3_len(pkt, 0); if (*l4_proto == IPPROTO_TCP) { struct rte_tcp_hdr *tcp_hdr = (struct rte_tcp_hdr *) @@ -2633,12 +2684,38 @@ netdev_dpdk_parse_l2(struct dp_packet *pkt, uint16_t *l4_proto, int *is_frag) l4_len = (tcp_hdr->data_off & 0xf0) >> 2; dp_packet_hwol_set_l4_len(pkt, l4_len); + mbuf->packet_type |= RTE_PTYPE_L4_TCP; } else if (*l4_proto == IPPROTO_UDP) { struct rte_udp_hdr *udp_hdr = (struct rte_udp_hdr *) ((char *)eth_hdr + l2_len + l3_len); - l4_len = sizeof(*udp_hdr); + l4_len = sizeof(struct rte_udp_hdr); dp_packet_hwol_set_l4_len(pkt, l4_len); + mbuf->packet_type |= RTE_PTYPE_L4_UDP; + + /* Need to parse inner packet if needed */ + if (ntohs(udp_hdr->dst_port) == VXLAN_DST_PORT) { + __netdev_dpdk_parse_hdr(pkt, + l2_len + l3_len + l4_len + + sizeof(struct vxlanhdr), + &inner_l4_proto, + &inner_is_frag); + mbuf->l2_len += sizeof(struct rte_udp_hdr) + + sizeof(struct vxlanhdr); + dp_packet_hwol_set_outer_l2_len(pkt, l2_len); + dp_packet_hwol_set_outer_l3_len(pkt, l3_len); + + /* Set packet_type, it is necessary for GRO */ + mbuf->packet_type |= RTE_PTYPE_TUNNEL_VXLAN; 
+ if (mbuf->l3_len == IP_HEADER_LEN) { + mbuf->packet_type |= RTE_PTYPE_INNER_L3_IPV4; + } + if (inner_l4_proto == IPPROTO_TCP) { + mbuf->packet_type |= RTE_PTYPE_INNER_L4_TCP; + } else if (inner_l4_proto == IPPROTO_UDP) { + mbuf->packet_type |= RTE_PTYPE_INNER_L4_UDP; + } + } } } @@ -2648,7 +2725,7 @@ netdev_dpdk_parse_hdr(struct dp_packet *b) uint16_t l4_proto = 0; int is_frag = 0; - netdev_dpdk_parse_l2(b, &l4_proto, &is_frag); + __netdev_dpdk_parse_hdr(b, 0, &l4_proto, &is_frag); if (l4_proto == IPPROTO_TCP) { dp_packet_hwol_set_csum_tcp(b); @@ -2733,6 +2810,8 @@ netdev_dpdk_vhost_rxq_enabled(struct netdev_rxq *rxq) return dev->vhost_rxq_enabled[rxq->queue_id]; } +static RTE_DEFINE_PER_LCORE(void *, _gro_ctx); + static int netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, int *qfill) @@ -2742,6 +2821,36 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev); int nb_rx; int dropped = 0; + struct rte_gro_param gro_param; + struct dp_packet *packet; + struct dp_packet *udp_pkts[NETDEV_MAX_BURST]; + struct dp_packet *other_pkts[NETDEV_MAX_BURST]; + + int nb_udp_rx = 0; + int nb_other_rx = 0; + + /* Initialize GRO parameters */ + gro_param.gro_types = RTE_GRO_TCP_IPV4 | + RTE_GRO_UDP_IPV4 | + RTE_GRO_IPV4_VXLAN_TCP_IPV4 | + RTE_GRO_IPV4_VXLAN_UDP_IPV4; + gro_param.max_flow_num = 1024; + /* There are 46 fragments for a 64K big packet */ + gro_param.max_item_per_flow = NETDEV_MAX_BURST * 2; + + /* Initialize GRO context */ + if (RTE_PER_LCORE(_gro_ctx) == NULL) { + uint32_t cpu, node; + int ret; + + ret = syscall(__NR_getcpu, &cpu, &node, NULL); + if (ret == 0) { + gro_param.socket_id = node; + } else { + gro_param.socket_id = 0; + } + RTE_PER_LCORE(_gro_ctx) = rte_gro_ctx_create(&gro_param); + } if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) { return EAGAIN; @@ -2770,7 +2879,58 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, rte_spinlock_unlock(&dev->stats_lock); } + /* Need to parse packet header and set necessary fields in mbuf for GRO */ batch->count = nb_rx; + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + uint16_t l4_proto = 0; + int is_frag = 0; + + __netdev_dpdk_parse_hdr(packet, 0, &l4_proto, &is_frag); + if (packet->mbuf.packet_type & RTE_PTYPE_TUNNEL_VXLAN) { + if (packet->mbuf.packet_type & RTE_PTYPE_INNER_L4_UDP) { + udp_pkts[nb_udp_rx++] = packet; + } else { + other_pkts[nb_other_rx++] = packet; + } + } else { + if (packet->mbuf.packet_type & RTE_PTYPE_L4_UDP) { + udp_pkts[nb_udp_rx++] = packet; + } else { + other_pkts[nb_other_rx++] = packet; + } + } + } + + /* Do GRO here if needed, note: IP fragment can be out of order */ + if (nb_udp_rx) { + /* UDP packet must use heavy rte_gro_reassemble */ + nb_udp_rx = rte_gro_reassemble((struct rte_mbuf **) udp_pkts, + nb_udp_rx, RTE_PER_LCORE(_gro_ctx)); + nb_udp_rx += rte_gro_timeout_flush(RTE_PER_LCORE(_gro_ctx), 10000, + RTE_GRO_UDP_IPV4 + | RTE_GRO_IPV4_VXLAN_UDP_IPV4, + (struct rte_mbuf **)&udp_pkts[nb_udp_rx], + NETDEV_MAX_BURST - nb_udp_rx); + } + + if (nb_other_rx) { + /* TCP packet is better for lightweigh rte_gro_reassemble_burst */ + nb_other_rx = rte_gro_reassemble_burst((struct rte_mbuf **) other_pkts, + nb_other_rx, + &gro_param); + } + + batch->count = nb_udp_rx + nb_other_rx; + if (nb_udp_rx) { + memcpy(batch->packets, udp_pkts, + nb_udp_rx * sizeof(struct dp_packet *)); + } + + if (nb_other_rx) { + memcpy(&batch->packets[nb_udp_rx], other_pkts, + nb_other_rx * 
sizeof(struct dp_packet *)); + } + dp_packet_batch_init_packet_fields(batch); if (qfill) { @@ -2811,10 +2971,11 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts, for (i = 0; i < pkt_cnt; i++) { pkt = pkts[i]; if (OVS_UNLIKELY((pkt->pkt_len > dev->max_packet_len) - && !(pkt->ol_flags & PKT_TX_TCP_SEG))) { + && !(pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)))) { VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " " - "max_packet_len %d", dev->up.name, pkt->pkt_len, - dev->max_packet_len); + "max_packet_len %d ol_flags 0x%016lx", + dev->up.name, pkt->pkt_len, + dev->max_packet_len, pkt->ol_flags); rte_pktmbuf_free(pkt); continue; } @@ -3144,7 +3305,6 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid, struct dp_packet_batch *batch, bool concurrent_txq OVS_UNUSED) { - if (OVS_UNLIKELY(batch->packets[0]->source != DPBUF_DPDK)) { dpdk_do_tx_copy(netdev, qid, batch); dp_packet_delete_batch(batch, true); From patchwork Wed Jul 1 09:15:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: yang_y_yi X-Patchwork-Id: 1320321 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=hemlock.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=163.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=163.com header.i=@163.com header.a=rsa-sha256 header.s=s110527 header.b=fOa7FwdG; dkim-atps=neutral Received: from hemlock.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 49xbF447p3z9sTZ for ; Wed, 1 Jul 2020 19:16:07 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by hemlock.osuosl.org (Postfix) with ESMTP id DBD318ABBA; Wed, 1 Jul 2020 09:16:04 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from hemlock.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id y4K5Id+Qb6aj; Wed, 1 Jul 2020 09:16:03 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by hemlock.osuosl.org (Postfix) with ESMTP id 9A07D8ABF5; Wed, 1 Jul 2020 09:16:03 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 645F7C08A9; Wed, 1 Jul 2020 09:16:03 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from silver.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id D548FC0890 for ; Wed, 1 Jul 2020 09:16:00 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by silver.osuosl.org (Postfix) with ESMTP id C66792F953 for ; Wed, 1 Jul 2020 09:16:00 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from silver.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id epZGTFILsAWR for ; Wed, 1 Jul 2020 09:15:58 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mail-m973.mail.163.com (mail-m973.mail.163.com [123.126.97.3]) by silver.osuosl.org (Postfix) with ESMTPS id 
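The receive-side GRO added in patch 4 uses both flavours of DPDK's librte_gro: the context-based rte_gro_reassemble()/rte_gro_timeout_flush() pair for UDP, and the lightweight per-burst API for TCP. A minimal sketch of the per-burst path is below; the helper name and flow limits are illustrative, and it assumes packet_type and the l2/l3/l4 length fields were already filled in by header parsing, as the patch does:

    #include <rte_gro.h>
    #include <rte_mbuf.h>

    /* Illustrative only; not code from the patch.  Coalesce one burst of
     * already-parsed TCP packets in place; returns the new packet count. */
    static uint16_t
    gro_coalesce_tcp_burst(struct rte_mbuf **pkts, uint16_t nb_pkts,
                           uint16_t socket_id)
    {
        struct rte_gro_param param = {
            .gro_types = RTE_GRO_TCP_IPV4 | RTE_GRO_IPV4_VXLAN_TCP_IPV4,
            .max_flow_num = 1024,       /* same flow bound the patch uses */
            .max_item_per_flow = 64,    /* NETDEV_MAX_BURST * 2 in the patch */
            .socket_id = socket_id,
        };

        /* Packets that cannot be merged are left in 'pkts' unchanged. */
        return rte_gro_reassemble_burst(pkts, nb_pkts, &param);
    }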
01AAB2FC50 for ; Wed, 1 Jul 2020 09:15:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-Id:MIME-Version; bh=hte1C 7s9Qjw2fDGfElA0oE0wKTvZh8hOMdOaaIU7mbs=; b=fOa7FwdGgApN3G+B+d6K8 9kSUmv51EX1aMmgPfBM0mD+QPTogfvxuw95reJsWa85CLbn9RTyWWzRnRfpAmlX1 DiAQ3jueYDi8iCjMXZLXh6MiM1zdRa0xoMen81sETlM22j2UBazXSvT7U9YQ7EB9 9KXn1fLAJZcNfFLqh/xO3A= Received: from yangyi0100.home.langchao.com (unknown [111.207.123.55]) by smtp3 (Coremail) with SMTP id G9xpCgA3OWM1VPxeaI88EQ--.82S7; Wed, 01 Jul 2020 17:15:37 +0800 (CST) From: yang_y_yi@163.com To: ovs-dev@openvswitch.org Date: Wed, 1 Jul 2020 17:15:33 +0800 Message-Id: <20200701091533.221552-6-yang_y_yi@163.com> X-Mailer: git-send-email 2.19.2.windows.1 In-Reply-To: <20200701091533.221552-1-yang_y_yi@163.com> References: <20200701091533.221552-1-yang_y_yi@163.com> MIME-Version: 1.0 X-CM-TRANSID: G9xpCgA3OWM1VPxeaI88EQ--.82S7 X-Coremail-Antispam: 1Uf129KBjvJXoW7WryfWFW7JF4kAFW3Jry8uFg_yoW8ZFyUpa y5urWIvrnIq3yjg34kXr17Xr1IgFW8Cay7CFnrta4YvanxJa4qvryUK3WYg3WUJFW3Jayr ZF1qyFy5uan8ArUanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x0zR4mh7UUUUU= X-Originating-IP: [111.207.123.55] X-CM-SenderInfo: 51dqwsp1b1xqqrwthudrp/xtbB0hBUi1UMXRanfQACsX Cc: yang_y_yi@163.com, fbl@sysclose.org Subject: [ovs-dev] [PATCH v2 5/5] Update Documentation/topics/userspace-tso.rst X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Yi Yang With GSO and GRO enabled, OVS DPDK can do GSO by software if NIC can't support TSO or VXLAN TSO hardware offload. Signed-off-by: Yi Yang --- Documentation/topics/userspace-tso.rst | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/Documentation/topics/userspace-tso.rst b/Documentation/topics/userspace-tso.rst index 0fbac93..71625eb 100644 --- a/Documentation/topics/userspace-tso.rst +++ b/Documentation/topics/userspace-tso.rst @@ -87,8 +87,8 @@ used to enable same:: Limitations ~~~~~~~~~~~ -The current OvS userspace `TSO` implementation supports flat and VLAN networks -only (i.e. no support for `TSO` over tunneled connection [VxLAN, GRE, IPinIP, +The current OvS userspace `TSO` implementation supports flat, VLAN and VXLAN +networks only (i.e. no support for `TSO` over tunneled connection [GRE, IPinIP, etc.]). The NIC driver must support and advertise checksum offload for TCP and UDP. @@ -98,11 +98,10 @@ in Open vSwitch. Currently, if the NIC supports that, then the feature is enabled, otherwise TSO can still be enabled but SCTP packets sent to the NIC will be dropped. -There is no software implementation of TSO, so all ports attached to the -datapath must support TSO or packets using that feature will be dropped -on ports without TSO support. That also means guests using vhost-user -in client mode will receive TSO packet regardless of TSO being enabled -or disabled within the guest. +There is software implementation of TSO, which is called as GSO (Generic +Segment Offload), so all ports attached to the datapath mustn't support TSO. +That also means guests using vhost-user in client mode can receive TSO packet +regardless of TSO being enabled or disabled within the guest. When the NIC performing the segmentation is using the i40e DPDK PMD, a fix must be included in the DPDK build, otherwise TSO will not work. The fix can