From patchwork Fri Aug 7 10:56:45 2020
X-Patchwork-Submitter: yang_y_yi
X-Patchwork-Id: 1342217
From: yang_y_yi@163.com
To: ovs-dev@openvswitch.org
Cc: yang_y_yi@163.com, fbl@sysclose.org
Date: Fri, 7 Aug 2020 18:56:45 +0800
Message-Id: <20200807105648.94860-2-yang_y_yi@163.com>
In-Reply-To: <20200807105648.94860-1-yang_y_yi@163.com>
References: <20200807105648.94860-1-yang_y_yi@163.com>
Subject: [ovs-dev] [PATCH V3 1/4] Enable VXLAN TSO for DPDK datapath

From: Yi Yang

Many NICs support VXLAN TSO, which helps improve cross-compute-node
VM-to-VM performance when the MTU is set to 1500. This patch allows
dpdkvhostuserclient and veth/tap interfaces to leverage the NIC's
offload capability to maximize cross-compute-node TCP performance;
with it applied, OVS DPDK can reach line speed for cross-compute-node
VM-to-VM TCP traffic.

Signed-off-by: Yi Yang
---
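For readers unfamiliar with the DPDK conventions this patch relies on,
here is a minimal standalone sketch (not part of the patch) of the mbuf
fields a PMD expects for VXLAN tunnel TSO. The header lengths are
assumptions for an untagged IPv4/UDP/VXLAN outer and an option-less
inner IPv4/TCP packet; the helper name is ours:

    /* Sketch only: mark an rte_mbuf for VXLAN tunnel TSO.  Length
     * values are illustrative assumptions, not taken from the patch. */
    #include <rte_mbuf.h>

    static void
    sketch_mark_vxlan_tso(struct rte_mbuf *m, uint16_t mss)
    {
        m->ol_flags |= PKT_TX_TUNNEL_VXLAN | PKT_TX_OUTER_IPV4
                       | PKT_TX_OUTER_IP_CKSUM | PKT_TX_IPV4
                       | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM
                       | PKT_TX_TCP_SEG;
        m->outer_l2_len = 14;      /* outer Ethernet */
        m->outer_l3_len = 20;      /* outer IPv4, no options */
        m->l2_len = 8 + 8 + 14;    /* outer UDP + VXLAN + inner Ethernet */
        m->l3_len = 20;            /* inner IPv4, no options */
        m->l4_len = 20;            /* inner TCP, no options */
        m->tso_segsz = mss;        /* inner TCP payload bytes per segment */
    }

Note that once PKT_TX_TUNNEL_VXLAN is set, l2_len covers everything from
the end of the outer L3 header to the start of the inner L3 header,
which is exactly why dp_packet_hwol_set_vxlan_tcp_seg() below adds the
UDP and VXLAN header sizes to l2_len.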
 lib/dp-packet.h       |  76 ++++++++++++++++++++
 lib/netdev-dpdk.c     | 188 ++++++++++++++++++++++++++++++++++++++++++++++----
 lib/netdev-linux.c    |  20 ++++++
 lib/netdev-provider.h |   1 +
 lib/netdev.c          |  69 ++++++++++++++++--
 5 files changed, 338 insertions(+), 16 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 0430cca..79895f2 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -81,6 +81,8 @@ enum dp_packet_offload_mask {
     DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_CKSUM, PKT_TX_UDP_CKSUM, 0x400),
     /* Offload SCTP checksum. */
     DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, PKT_TX_SCTP_CKSUM, 0x800),
+    /* VXLAN TCP Segmentation Offload. */
+    DEF_OL_FLAG(DP_PACKET_OL_TX_TUNNEL_VXLAN, PKT_TX_TUNNEL_VXLAN, 0x1000),
     /* Adding new field requires adding to DP_PACKET_OL_SUPPORTED_MASK. */
 };
@@ -1032,6 +1034,80 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b)
     *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG;
 }
+#ifdef DPDK_NETDEV
+/* Mark packet 'b' for VXLAN TCP segmentation offloading. */
+static inline void
+dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b)
+{
+    b->mbuf.ol_flags |= DP_PACKET_OL_TX_TUNNEL_VXLAN;
+    b->mbuf.l2_len += sizeof(struct udp_header) +
+                      sizeof(struct vxlanhdr);
+    b->mbuf.outer_l2_len = ETH_HEADER_LEN;
+    b->mbuf.outer_l3_len = IP_HEADER_LEN;
+}
+
+/* Check if it is a VXLAN packet. */
+static inline bool
+dp_packet_hwol_is_vxlan_tcp_seg(struct dp_packet *b)
+{
+    return (b->mbuf.ol_flags & DP_PACKET_OL_TX_TUNNEL_VXLAN);
+}
+
+/* Set l2_len for the packet 'b'. */
+static inline void
+dp_packet_hwol_set_l2_len(struct dp_packet *b, int l2_len)
+{
+    b->mbuf.l2_len = l2_len;
+}
+
+/* Set l3_len for the packet 'b'. */
+static inline void
+dp_packet_hwol_set_l3_len(struct dp_packet *b, int l3_len)
+{
+    b->mbuf.l3_len = l3_len;
+}
+
+/* Set l4_len for the packet 'b'. */
+static inline void
+dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len)
+{
+    b->mbuf.l4_len = l4_len;
+}
+#else
+/* Mark packet 'b' for VXLAN TCP segmentation offloading. */
+static inline void
+dp_packet_hwol_set_vxlan_tcp_seg(struct dp_packet *b OVS_UNUSED)
+{
+}
+
+/* Check if it is a VXLAN packet.  (A bool function must return a
+ * value; the empty stub as originally posted did not.) */
+static inline bool
+dp_packet_hwol_is_vxlan_tcp_seg(struct dp_packet *b OVS_UNUSED)
+{
+    return false;
+}
+
+/* Set l2_len for the packet 'b'. */
+static inline void
+dp_packet_hwol_set_l2_len(struct dp_packet *b OVS_UNUSED,
+                          int l2_len OVS_UNUSED)
+{
+}
+
+/* Set l3_len for the packet 'b'. */
+static inline void
+dp_packet_hwol_set_l3_len(struct dp_packet *b OVS_UNUSED,
+                          int l3_len OVS_UNUSED)
+{
+}
+
+/* Set l4_len for the packet 'b'. */
+static inline void
+dp_packet_hwol_set_l4_len(struct dp_packet *b OVS_UNUSED,
+                          int l4_len OVS_UNUSED)
+{
+}
+#endif /* DPDK_NETDEV */
+
 static inline bool
 dp_packet_ip_checksum_valid(const struct dp_packet *p)
 {
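The non-DPDK branch stubs these helpers out so callers need no #ifdefs.
As a usage illustration of the new helpers (a sketch with assumed fixed
header sizes, not code from the series), a parser that has located the
inner headers of a packet about to be VXLAN-encapsulated would call:

    /* Sketch only: expected call sequence after parsing an inner
     * Ethernet/IPv4/TCP packet (option-less headers assumed). */
    static void
    sketch_mark_parsed_packet(struct dp_packet *b)
    {
        dp_packet_hwol_set_l2_len(b, ETH_HEADER_LEN);  /* inner L2 */
        dp_packet_hwol_set_l3_len(b, IP_HEADER_LEN);   /* inner IPv4 */
        dp_packet_hwol_set_l4_len(b, TCP_HEADER_LEN);  /* inner TCP */
        /* Adds the UDP + VXLAN header sizes to l2_len and fills in
         * outer_l2_len / outer_l3_len. */
        dp_packet_hwol_set_vxlan_tcp_seg(b);
    }
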
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 44ebf96..30493ed 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include
 #include "cmap.h"
 #include "coverage.h"
@@ -87,6 +88,7 @@ COVERAGE_DEFINE(vhost_notification);
 #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
 #define OVS_VPORT_DPDK "ovs_dpdk"
+#define DPDK_RTE_HDR_OFFSET 1
 /*
  * need to reserve tons of extra space in the mbufs so we can align the
@@ -405,6 +407,7 @@ enum dpdk_hw_ol_features {
     NETDEV_RX_HW_SCATTER = 1 << 2,
     NETDEV_TX_TSO_OFFLOAD = 1 << 3,
     NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4,
+    NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD = 1 << 5,
 };
 /*
@@ -986,8 +989,17 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
         conf.rxmode.offloads |= DEV_RX_OFFLOAD_KEEP_CRC;
     }
+    if (info.tx_offload_capa & DEV_TX_OFFLOAD_MULTI_SEGS) {
+        conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
+    }
+
     if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
         conf.txmode.offloads |= DPDK_TX_TSO_OFFLOAD_FLAGS;
+        /* Enable VXLAN TSO support if available.  (Note: this is a TX
+         * capability, so tx_offload_capa is checked above, not
+         * rx_offload_capa as originally posted.) */
+        if (dev->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) {
+            conf.txmode.offloads |= DEV_TX_OFFLOAD_VXLAN_TNL_TSO;
+            conf.txmode.offloads |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
+        }
         if (dev->hw_ol_features & NETDEV_TX_SCTP_CHECKSUM_OFFLOAD) {
             conf.txmode.offloads |= DEV_TX_OFFLOAD_SCTP_CKSUM;
         }
@@ -1126,6 +1138,10 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
     if ((info.tx_offload_capa & tx_tso_offload_capa)
         == tx_tso_offload_capa) {
         dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD;
+        /* Enable VXLAN TSO support if available. */
+        if (info.tx_offload_capa & DEV_TX_OFFLOAD_VXLAN_TNL_TSO) {
+            dev->hw_ol_features |= NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD;
+        }
         if (info.tx_offload_capa & DEV_TX_OFFLOAD_SCTP_CKSUM) {
             dev->hw_ol_features |= NETDEV_TX_SCTP_CHECKSUM_OFFLOAD;
         } else {
@@ -2131,35 +2147,166 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
     rte_free(rx);
 }
+static inline bool
+is_local_to_local(uint16_t src_port_id, struct netdev_dpdk *dev)
+{
+    bool ret = false;
+    struct netdev_dpdk *src_dev;
+
+    if (src_port_id == UINT16_MAX) {
+        ret = true;
+    } else {
+        src_dev = netdev_dpdk_lookup_by_port_id(src_port_id);
+        if (src_dev && (netdev_dpdk_get_vid(src_dev) >= 0)) {
+            ret = true;
+        }
+    }
+
+    if (ret && (netdev_dpdk_get_vid(dev) < 0)) {
+        ret = false;
+    }
+
+    return ret;
+}
+
 /* Prepare the packet for HWOL.
  * Return True if the packet is OK to continue.
*/ static bool netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) { struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); + uint16_t l4_proto = 0; + uint8_t *l3_hdr_ptr = NULL; + struct rte_ether_hdr *eth_hdr = + rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *); + struct rte_ipv4_hdr *ip_hdr; + struct rte_ipv6_hdr *ip6_hdr; + + /* Return directly if source and destitation of mbuf are local ports + * because mbuf has already set ol_flags and l*_len correctly. + */ + if (is_local_to_local(mbuf->port, dev)) { + if (mbuf->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)) { + mbuf->tso_segsz = 1450 - mbuf->l3_len - mbuf->l4_len; + } + return true; + } + + if (mbuf->ol_flags & PKT_TX_TUNNEL_VXLAN) { + /* Handle VXLAN TSO */ + struct rte_udp_hdr *udp_hdr; + + /* small packets whose size is less than or equal to MTU needn't + * VXLAN TSO. In addtion, if hardware can't support VXLAN TSO, it + * also can't be handled. So PKT_TX_TUNNEL_VXLAN must be cleared + * outer_l2_len and outer_l3_len must be zeroed. + */ + if (!(dev->up.ol_flags & NETDEV_TX_OFFLOAD_VXLAN_TSO) + || (!(mbuf->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)) + && (mbuf->pkt_len <= 1450 + mbuf->outer_l2_len + mbuf->outer_l3_len + + mbuf->l2_len))) { + mbuf->ol_flags &= ~PKT_TX_TUNNEL_VXLAN; + mbuf->l2_len -= sizeof(struct udp_header) + + sizeof(struct vxlanhdr); + mbuf->outer_l2_len = 0; + mbuf->outer_l3_len = 0; + return true; + } + + if (mbuf->ol_flags & PKT_TX_IPV4) { + ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + udp_hdr = (struct rte_udp_hdr *)(ip_hdr + DPDK_RTE_HDR_OFFSET); + + /* outer IP checksum offload */ + ip_hdr->hdr_checksum = 0; + mbuf->ol_flags |= PKT_TX_OUTER_IP_CKSUM; + mbuf->ol_flags |= PKT_TX_OUTER_IPV4; + + ip_hdr = (struct rte_ipv4_hdr *) + ((uint8_t *)udp_hdr + mbuf->l2_len); + l4_proto = ip_hdr->next_proto_id; + l3_hdr_ptr = (uint8_t *)ip_hdr; + + /* inner IP checksum offload */ + ip_hdr->hdr_checksum = 0; + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + } else if (mbuf->ol_flags & PKT_TX_IPV6) { + ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + udp_hdr = (struct rte_udp_hdr *)(ip_hdr + DPDK_RTE_HDR_OFFSET); + + /* outer IP checksum offload */ + ip_hdr->hdr_checksum = 0; + mbuf->ol_flags |= PKT_TX_OUTER_IP_CKSUM; + mbuf->ol_flags |= PKT_TX_OUTER_IPV4; + + ip6_hdr = (struct rte_ipv6_hdr *) + ((uint8_t *)udp_hdr + mbuf->l2_len); + l4_proto = ip6_hdr->proto; + l3_hdr_ptr = (uint8_t *)ip6_hdr; + + /* inner IP checksum offload offload */ + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + } + } else if (mbuf->ol_flags & PKT_TX_L4_MASK) { + /* Handle VLAN TSO */ + /* no inner IP checksum for IPV6 */ + if (mbuf->ol_flags & PKT_TX_IPV4) { + ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + l4_proto = ip_hdr->next_proto_id; + l3_hdr_ptr = (uint8_t *)ip_hdr; + + /* IP checksum offload */ + ip_hdr->hdr_checksum = 0; + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + } else if (mbuf->ol_flags & PKT_TX_IPV6) { + ip6_hdr = (struct rte_ipv6_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + l4_proto = ip6_hdr->proto; + l3_hdr_ptr = (uint8_t *)ip6_hdr; + + /* IP checksum offload */ + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + } - if (mbuf->ol_flags & PKT_TX_L4_MASK) { mbuf->l2_len = (char *)dp_packet_l3(pkt) - (char *)dp_packet_eth(pkt); mbuf->l3_len = (char *)dp_packet_l4(pkt) - (char *)dp_packet_l3(pkt); mbuf->outer_l2_len = 0; mbuf->outer_l3_len = 0; } - if (mbuf->ol_flags & PKT_TX_TCP_SEG) { - struct tcp_header *th = dp_packet_l4(pkt); + /* It is possible that l4_len isn't set for 
vhostuserclient */ + if ((l3_hdr_ptr != NULL) && (l4_proto == IPPROTO_TCP) + && (mbuf->l4_len < 20)) { + struct rte_tcp_hdr *tcp_hdr = (struct rte_tcp_hdr *) + (l3_hdr_ptr + mbuf->l3_len); - if (!th) { + mbuf->l4_len = (tcp_hdr->data_off & 0xf0) >> 2; + } + + if ((mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM) { + if (l4_proto != IPPROTO_UDP) { + VLOG_WARN_RL(&rl, "%s: UDP packet without L4 header" + " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); + return false; + } + } else if (mbuf->ol_flags & PKT_TX_TCP_SEG || + mbuf->ol_flags & PKT_TX_TCP_CKSUM) { + if (l4_proto != IPPROTO_TCP) { VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header" " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); return false; } - mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4; - mbuf->ol_flags |= PKT_TX_TCP_CKSUM; - mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len; + if (mbuf->pkt_len - mbuf->l2_len > 1450) { + dp_packet_hwol_set_tcp_seg(pkt); + } - if (mbuf->ol_flags & PKT_TX_IPV4) { - mbuf->ol_flags |= PKT_TX_IP_CKSUM; + mbuf->ol_flags |= PKT_TX_TCP_CKSUM; + if (mbuf->ol_flags & PKT_TX_TCP_SEG) { + mbuf->tso_segsz = 1450 - mbuf->l3_len - mbuf->l4_len; + } else { + mbuf->tso_segsz = 0; } } return true; @@ -2737,19 +2884,27 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet *pkt_orig) mbuf_dest->tx_offload = pkt_orig->mbuf.tx_offload; mbuf_dest->packet_type = pkt_orig->mbuf.packet_type; - mbuf_dest->ol_flags |= (pkt_orig->mbuf.ol_flags & - ~(EXT_ATTACHED_MBUF | IND_ATTACHED_MBUF)); + mbuf_dest->ol_flags |= pkt_orig->mbuf.ol_flags; + mbuf_dest->l2_len = pkt_orig->mbuf.l2_len; + mbuf_dest->l3_len = pkt_orig->mbuf.l3_len; + mbuf_dest->l4_len = pkt_orig->mbuf.l4_len; + mbuf_dest->outer_l2_len = pkt_orig->mbuf.outer_l2_len; + mbuf_dest->outer_l3_len = pkt_orig->mbuf.outer_l3_len; memcpy(&pkt_dest->l2_pad_size, &pkt_orig->l2_pad_size, sizeof(struct dp_packet) - offsetof(struct dp_packet, l2_pad_size)); - if (mbuf_dest->ol_flags & PKT_TX_L4_MASK) { + if ((mbuf_dest->outer_l2_len == 0) && + (mbuf_dest->ol_flags & PKT_TX_L4_MASK)) { mbuf_dest->l2_len = (char *)dp_packet_l3(pkt_dest) - (char *)dp_packet_eth(pkt_dest); mbuf_dest->l3_len = (char *)dp_packet_l4(pkt_dest) - (char *) dp_packet_l3(pkt_dest); } + /* Mark it as non-DPDK port */ + mbuf_dest->port = UINT16_MAX; + return pkt_dest; } @@ -2808,6 +2963,11 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch) if (dev->type == DPDK_DEV_VHOST) { __netdev_dpdk_vhost_send(netdev, qid, pkts, txcnt); } else { + if (userspace_tso_enabled()) { + txcnt = netdev_dpdk_prep_hwol_batch(dev, + (struct rte_mbuf **)pkts, + txcnt); + } tx_failure += netdev_dpdk_eth_tx_burst(dev, qid, (struct rte_mbuf **)pkts, txcnt); @@ -4949,6 +5109,10 @@ netdev_dpdk_reconfigure(struct netdev *netdev) netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; netdev->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM; netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; + /* Enable VXLAN TSO support if available */ + if (dev->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) { + netdev->ol_flags |= NETDEV_TX_OFFLOAD_VXLAN_TSO; + } if (dev->hw_ol_features & NETDEV_TX_SCTP_CHECKSUM_OFFLOAD) { netdev->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CKSUM; } diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index fe7fb9b..9f830b4 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -6508,6 +6508,8 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) struct eth_header *eth_hdr; ovs_be16 eth_type; int l2_len; + int l3_len = 0; + int l4_len = 0; eth_hdr = 
dp_packet_at(b, 0, ETH_HEADER_LEN); if (!eth_hdr) { @@ -6527,6 +6529,8 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) l2_len += VLAN_HEADER_LEN; } + dp_packet_hwol_set_l2_len(b, l2_len); + if (eth_type == htons(ETH_TYPE_IP)) { struct ip_header *ip_hdr = dp_packet_at(b, l2_len, IP_HEADER_LEN); @@ -6534,6 +6538,7 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) return -EINVAL; } + l3_len = IP_HEADER_LEN; *l4proto = ip_hdr->ip_proto; dp_packet_hwol_set_tx_ipv4(b); } else if (eth_type == htons(ETH_TYPE_IPV6)) { @@ -6544,10 +6549,25 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) return -EINVAL; } + l3_len = IPV6_HEADER_LEN; *l4proto = nh6->ip6_ctlun.ip6_un1.ip6_un1_nxt; dp_packet_hwol_set_tx_ipv6(b); } + dp_packet_hwol_set_l3_len(b, l3_len); + + if (*l4proto == IPPROTO_TCP) { + struct tcp_header *tcp_hdr = dp_packet_at(b, l2_len + l3_len, + sizeof(struct tcp_header)); + + if (!tcp_hdr) { + return -EINVAL; + } + + l4_len = TCP_OFFSET(tcp_hdr->tcp_ctl) * 4; + dp_packet_hwol_set_l4_len(b, l4_len); + } + return 0; } diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index 73dce2f..d616d79 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -43,6 +43,7 @@ enum netdev_ol_flags { NETDEV_TX_OFFLOAD_UDP_CKSUM = 1 << 2, NETDEV_TX_OFFLOAD_SCTP_CKSUM = 1 << 3, NETDEV_TX_OFFLOAD_TCP_TSO = 1 << 4, + NETDEV_TX_OFFLOAD_VXLAN_TSO = 1 << 5, }; /* A network device (e.g. an Ethernet device). diff --git a/lib/netdev.c b/lib/netdev.c index 91e9195..64583d1 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -33,6 +33,7 @@ #include "cmap.h" #include "coverage.h" +#include "csum.h" #include "dpif.h" #include "dp-packet.h" #include "openvswitch/dynamic-string.h" @@ -785,6 +786,36 @@ netdev_get_pt_mode(const struct netdev *netdev) : NETDEV_PT_LEGACY_L2); } +static inline void +calculate_tcpudp_checksum(struct dp_packet *p) +{ + uint32_t pseudo_hdr_csum; + struct ip_header *ip = dp_packet_l3(p); + size_t l4_len = (char *) dp_packet_tail(p) - (char *) dp_packet_l4(p); + uint16_t l4_proto = 0; + + l4_proto = ip->ip_proto; + ip->ip_csum = 0; + ip->ip_csum = csum(ip, sizeof *ip); + pseudo_hdr_csum = packet_csum_pseudoheader(ip); + if (l4_proto == IPPROTO_TCP) { + struct tcp_header *tcp = dp_packet_l4(p); + + tcp->tcp_csum = 0; + tcp->tcp_csum = csum_finish(csum_continue(pseudo_hdr_csum, + tcp, l4_len)); + } else if (l4_proto == IPPROTO_UDP) { + struct udp_header *udp = dp_packet_l4(p); + + udp->udp_csum = 0; + udp->udp_csum = csum_finish(csum_continue(pseudo_hdr_csum, + udp, l4_len)); + if (!udp->udp_csum) { + udp->udp_csum = htons(0xffff); + } + } +} + /* Check if a 'packet' is compatible with 'netdev_flags'. * If a packet is incompatible, return 'false' with the 'errormsg' * pointing to a reason. */ @@ -794,6 +825,14 @@ netdev_send_prepare_packet(const uint64_t netdev_flags, { uint64_t l4_mask; + if (dp_packet_hwol_is_vxlan_tcp_seg(packet) + && (dp_packet_hwol_is_tso(packet) || dp_packet_hwol_l4_mask(packet)) + && !(netdev_flags & NETDEV_TX_OFFLOAD_VXLAN_TSO)) { + /* Fall back to GSO in software. */ + VLOG_ERR_BUF(errormsg, "No VXLAN TSO support"); + return false; + } + if (dp_packet_hwol_is_tso(packet) && !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) { /* Fall back to GSO in software. 
 */
@@ -960,15 +999,37 @@ netdev_push_header(const struct netdev *netdev,
     size_t i, size = dp_packet_batch_size(batch);
     DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) {
-        if (OVS_UNLIKELY(dp_packet_hwol_is_tso(packet)
-                         || dp_packet_hwol_l4_mask(packet))) {
+        if (OVS_UNLIKELY((dp_packet_hwol_is_tso(packet)
+                          || dp_packet_hwol_l4_mask(packet))
+                         && (data->tnl_type != OVS_VPORT_TYPE_VXLAN))) {
             COVERAGE_INC(netdev_push_header_drops);
             dp_packet_delete(packet);
-            VLOG_WARN_RL(&rl, "%s: Tunneling packets with HW offload flags is "
-                         "not supported: packet dropped",
+            VLOG_WARN_RL(&rl,
+                         "%s: non-VXLAN tunneled packets with HW offload "
+                         "flags are not supported: packet dropped",
                          netdev_get_name(netdev));
         } else {
+            if (data->tnl_type == OVS_VPORT_TYPE_VXLAN) {
+                /* VXLAN offload can't support UDP checksum offload for
+                 * the inner UDP packet, so the UDP checksum must be set
+                 * before pushing the header so that the outer checksum
+                 * can be set correctly.
+                 */
+                if (dp_packet_hwol_l4_is_udp(packet)) {
+                    packet->mbuf.ol_flags &= ~DP_PACKET_OL_TX_UDP_CKSUM;
+                    calculate_tcpudp_checksum(packet);
+                }
+            }
             netdev->netdev_class->push_header(netdev, packet, data);
+            if (data->tnl_type == OVS_VPORT_TYPE_VXLAN) {
+                /* Just identify it as a VXLAN packet.  Here netdev is
+                 * vxlan_sys_*, so netdev->ol_flags can't indicate
+                 * whether the final physical output port supports VXLAN
+                 * TSO; netdev_send_prepare_packet will drop the packet
+                 * if that port can't.
+                 */
+                dp_packet_hwol_set_vxlan_tcp_seg(packet);
+            }
             pkt_metadata_init(&packet->md, data->out_port);
             dp_packet_batch_refill(batch, packet, i);
         }
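For reference, the capability test this patch adds in
dpdk_eth_dev_init() can be exercised standalone. A minimal probe (a
sketch against the DPDK 19.11-era API this series targets; the helper
name is ours) looks like:

    /* Sketch only: probe a port for VXLAN tunnel TSO capability. */
    #include <stdbool.h>
    #include <rte_ethdev.h>

    static bool
    sketch_port_supports_vxlan_tso(uint16_t port_id)
    {
        struct rte_eth_dev_info info;

        if (rte_eth_dev_info_get(port_id, &info) != 0) {
            return false;
        }
        /* Both tunnel TSO and outer IPv4 checksum are needed, matching
         * the pair of txmode offloads the patch enables together. */
        return (info.tx_offload_capa & DEV_TX_OFFLOAD_VXLAN_TNL_TSO)
               && (info.tx_offload_capa & DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM);
    }
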
From patchwork Fri Aug 7 10:56:46 2020
X-Patchwork-Submitter: yang_y_yi
X-Patchwork-Id: 1342218
From: yang_y_yi@163.com
To: ovs-dev@openvswitch.org
Cc: yang_y_yi@163.com, fbl@sysclose.org
Date: Fri, 7 Aug 2020 18:56:46 +0800
Message-Id: <20200807105648.94860-3-yang_y_yi@163.com>
In-Reply-To: <20200807105648.94860-1-yang_y_yi@163.com>
References: <20200807105648.94860-1-yang_y_yi@163.com>
Subject: [ovs-dev] [PATCH V3 2/4] Add GSO support for DPDK data path

From: Yi Yang

GSO (Generic Segmentation Offload) segments large UDP and TCP packets
into small packets per the destination's MTU. Especially when the
physical NIC can't do VXLAN TSO and VXLAN UFO in hardware, GSO makes
sure userspace TSO still works instead of dropping packets. In
addition, GSO helps improve UDP performance when UFO is enabled in the
VM. GSO supports TCP, UDP, VXLAN TCP, and VXLAN UDP; it is done in the
Tx function of the physical NIC.

Signed-off-by: Yi Yang
---
 lib/dp-packet.h    |  21 +++-
 lib/netdev-dpdk.c  | 358 +++++++++++++++++++++++++++++++++++++++++++++++++----
 lib/netdev-linux.c |  17 ++-
 lib/netdev.c       |  67 +++++++---
 4 files changed, 417 insertions(+), 46 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 79895f2..c33868d 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -83,6 +83,8 @@ enum dp_packet_offload_mask {
     DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, PKT_TX_SCTP_CKSUM, 0x800),
     /* VXLAN TCP Segmentation Offload. */
     DEF_OL_FLAG(DP_PACKET_OL_TX_TUNNEL_VXLAN, PKT_TX_TUNNEL_VXLAN, 0x1000),
+    /* UDP Segmentation Offload. */
+    DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_SEG, PKT_TX_UDP_SEG, 0x2000),
     /* Adding new field requires adding to DP_PACKET_OL_SUPPORTED_MASK.
*/ }; @@ -97,7 +99,8 @@ enum dp_packet_offload_mask { DP_PACKET_OL_TX_IPV6 | \ DP_PACKET_OL_TX_TCP_CKSUM | \ DP_PACKET_OL_TX_UDP_CKSUM | \ - DP_PACKET_OL_TX_SCTP_CKSUM) + DP_PACKET_OL_TX_SCTP_CKSUM | \ + DP_PACKET_OL_TX_UDP_SEG) #define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \ DP_PACKET_OL_TX_UDP_CKSUM | \ @@ -956,6 +959,13 @@ dp_packet_hwol_is_tso(const struct dp_packet *b) return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_TCP_SEG); } +/* Returns 'true' if packet 'b' is marked for UDP segmentation offloading. */ +static inline bool +dp_packet_hwol_is_uso(const struct dp_packet *b) +{ + return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_UDP_SEG); +} + /* Returns 'true' if packet 'b' is marked for IPv4 checksum offloading. */ static inline bool dp_packet_hwol_is_ipv4(const struct dp_packet *b) @@ -1034,6 +1044,15 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b) *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG; } +/* Mark packet 'b' for UDP segmentation offloading. It implies that + * either the packet 'b' is marked for IPv4 or IPv6 checksum offloading + * and also for UDP checksum offloading. */ +static inline void +dp_packet_hwol_set_udp_seg(struct dp_packet *b) +{ + *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_UDP_SEG; +} + #ifdef DPDK_NETDEV /* Mark packet 'b' for VXLAN TCP segmentation offloading. */ static inline void diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 30493ed..888a45e 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -38,13 +38,15 @@ #include #include #include +#include +#include #include #include #include #include #include #include -#include +#include #include "cmap.h" #include "coverage.h" @@ -162,6 +164,7 @@ typedef uint16_t dpdk_port_t; | DEV_TX_OFFLOAD_UDP_CKSUM \ | DEV_TX_OFFLOAD_IPV4_CKSUM) +#define MAX_GSO_MBUFS 64 static const struct rte_eth_conf port_conf = { .rxmode = { @@ -2171,6 +2174,16 @@ is_local_to_local(uint16_t src_port_id, struct netdev_dpdk *dev) return ret; } +static uint16_t +get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype) +{ + if (ethertype == htons(RTE_ETHER_TYPE_IPV4)) { + return rte_ipv4_udptcp_cksum(l3_hdr, l4_hdr); + } else { /* assume ethertype == RTE_ETHER_TYPE_IPV6 */ + return rte_ipv6_udptcp_cksum(l3_hdr, l4_hdr); + } +} + /* Prepare the packet for HWOL. * Return True if the packet is OK to continue. */ static bool @@ -2203,10 +2216,9 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) * also can't be handled. So PKT_TX_TUNNEL_VXLAN must be cleared * outer_l2_len and outer_l3_len must be zeroed. 
*/ - if (!(dev->up.ol_flags & NETDEV_TX_OFFLOAD_VXLAN_TSO) - || (!(mbuf->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)) + if (!(mbuf->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)) && (mbuf->pkt_len <= 1450 + mbuf->outer_l2_len + mbuf->outer_l3_len - + mbuf->l2_len))) { + + mbuf->l2_len)) { mbuf->ol_flags &= ~PKT_TX_TUNNEL_VXLAN; mbuf->l2_len -= sizeof(struct udp_header) + sizeof(struct vxlanhdr); @@ -2249,7 +2261,7 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) /* inner IP checksum offload offload */ mbuf->ol_flags |= PKT_TX_IP_CKSUM; } - } else if (mbuf->ol_flags & PKT_TX_L4_MASK) { + } else if (mbuf->ol_flags & (PKT_TX_IPV4 | PKT_TX_IPV6)) { /* Handle VLAN TSO */ /* no inner IP checksum for IPV6 */ if (mbuf->ol_flags & PKT_TX_IPV4) { @@ -2273,6 +2285,18 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->l3_len = (char *)dp_packet_l4(pkt) - (char *)dp_packet_l3(pkt); mbuf->outer_l2_len = 0; mbuf->outer_l3_len = 0; + + /* In case of GRO, PKT_TX_TCP_SEG or PKT_TX_UDP_SEG wasn't set by GRO + * APIs, here is a place we can mark it. + */ + if ((mbuf->pkt_len > 1464) + && (!(mbuf->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)))) { + if (l4_proto == IPPROTO_UDP) { + mbuf->ol_flags |= PKT_TX_UDP_SEG; + } else if (l4_proto == IPPROTO_TCP) { + mbuf->ol_flags |= PKT_TX_TCP_SEG; + } + } } /* It is possible that l4_len isn't set for vhostuserclient */ @@ -2284,6 +2308,10 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->l4_len = (tcp_hdr->data_off & 0xf0) >> 2; } + if ((l4_proto != IPPROTO_UDP) && (l4_proto != IPPROTO_TCP)) { + return true; + } + if ((mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM) { if (l4_proto != IPPROTO_UDP) { VLOG_WARN_RL(&rl, "%s: UDP packet without L4 header" @@ -2294,11 +2322,13 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->ol_flags & PKT_TX_TCP_CKSUM) { if (l4_proto != IPPROTO_TCP) { VLOG_WARN_RL(&rl, "%s: TCP Segmentation without L4 header" - " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len); + " pkt len: %"PRIu32" l4_proto = %d", + dev->up.name, mbuf->pkt_len, l4_proto); return false; } - if (mbuf->pkt_len - mbuf->l2_len > 1450) { + if (mbuf->pkt_len > 1450 + mbuf->outer_l2_len + mbuf->outer_l3_len + + mbuf->l2_len) { dp_packet_hwol_set_tcp_seg(pkt); } @@ -2308,7 +2338,66 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) } else { mbuf->tso_segsz = 0; } + + if (!(dev->up.ol_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) { + /* PKT_TX_TCP_CKSUM must be cleaned for GSO because + * tcp checksum only can be caculated by software for + * GSO case. + */ + mbuf->ol_flags &= ~PKT_TX_TCP_CKSUM; + } } + + /* UDP GSO if necessary */ + if (l4_proto == IPPROTO_UDP) { + /* VXLAN GSO can be done here */ + if ((mbuf->ol_flags & PKT_TX_UDP_SEG) || + (mbuf->pkt_len > (1450 + mbuf->outer_l2_len + mbuf->outer_l3_len + + mbuf->l2_len))) { + dp_packet_hwol_set_udp_seg(pkt); + + /* For UDP GSO, udp checksum must be calculated by software */ + if ((mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM) { + void *l3_hdr, *l4_hdr; + struct rte_udp_hdr *udp_hdr; + + /* PKT_TX_UDP_CKSUM must be cleaned for GSO because + * udp checksum only can be caculated by software for + * GSO case. 
+ */ + mbuf->ol_flags &= ~PKT_TX_UDP_CKSUM; + + eth_hdr = (struct rte_ether_hdr *) + ((uint8_t *)eth_hdr + mbuf->outer_l2_len + + mbuf->outer_l3_len + + sizeof(struct udp_header) + + sizeof(struct vxlanhdr)); + l3_hdr = (uint8_t *)eth_hdr + mbuf->l2_len - + sizeof(struct udp_header) - + sizeof(struct vxlanhdr); + l4_hdr = (uint8_t *)l3_hdr + mbuf->l3_len; + ip_hdr = (struct rte_ipv4_hdr *)l3_hdr; + ip_hdr->hdr_checksum = 0; + ip_hdr->hdr_checksum = rte_ipv4_cksum(ip_hdr); + /* Don't touch UDP checksum if it is ip fragment */ + if (!rte_ipv4_frag_pkt_is_fragmented(ip_hdr)) { + udp_hdr = (struct rte_udp_hdr *)l4_hdr; + udp_hdr->dgram_cksum = 0; + udp_hdr->dgram_cksum = + get_udptcp_checksum(l3_hdr, l4_hdr, + eth_hdr->ether_type); + } + } + + /* FOR GSO, gso_size includes l2_len + l3_len */ + mbuf->tso_segsz = 1450 + mbuf->outer_l2_len + mbuf->outer_l3_len + + mbuf->l2_len; + if (mbuf->tso_segsz > dev->mtu) { + mbuf->tso_segsz = dev->mtu; + } + } + } + return true; } @@ -2339,24 +2428,19 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk *dev, struct rte_mbuf **pkts, return cnt; } -/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes ownership of - * 'pkts', even in case of failure. - * - * Returns the number of packets that weren't transmitted. */ static inline int -netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, - struct rte_mbuf **pkts, int cnt) +__netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, + struct rte_mbuf **pkts, int cnt) { uint32_t nb_tx = 0; - uint16_t nb_tx_prep = cnt; + uint32_t nb_tx_prep; - if (userspace_tso_enabled()) { - nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); - if (nb_tx_prep != cnt) { - VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " - "Only %u/%u are valid: %s", dev->up.name, nb_tx_prep, - cnt, rte_strerror(rte_errno)); - } + nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); + if (nb_tx_prep != cnt) { + VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " + "Only %u/%u are valid: %s", + dev->up.name, nb_tx_prep, + cnt, rte_strerror(rte_errno)); } while (nb_tx != nb_tx_prep) { @@ -2384,6 +2468,200 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, return cnt - nb_tx; } +static inline void +set_multiseg_udptcp_cksum(struct rte_mbuf *mbuf) +{ + uint16_t l3_offset = mbuf->outer_l2_len + mbuf->outer_l3_len + + mbuf->l2_len; + struct rte_ipv4_hdr *ipv4_hdr = (struct rte_ipv4_hdr *) + (rte_pktmbuf_mtod(mbuf, char *) + l3_offset); + struct rte_tcp_hdr *tcp_hdr; + uint32_t l4_hdr_len; + uint8_t *l4_hdr; + struct rte_mbuf *next = mbuf->next; + uint32_t cksum = 0; + uint16_t l4_proto; + uint32_t inner_cksum; + + l4_proto = ipv4_hdr->next_proto_id; + if ((l4_proto != IPPROTO_UDP) && (l4_proto != IPPROTO_TCP)) { + return; + } + + if (l4_proto == IPPROTO_TCP) { + /* For TCP GSO, inner TCP header is in every seg, + * TCP checksum has to be calculated by software. 
+ */ + + l4_hdr_len = mbuf->data_len - l3_offset + - sizeof(struct rte_ipv4_hdr); + l4_hdr = (uint8_t *)(ipv4_hdr + 1); + tcp_hdr = (struct rte_tcp_hdr *)l4_hdr; + tcp_hdr->cksum = 0; + } + + /* Set inner ip checksum */ + ipv4_hdr->hdr_checksum = 0; + ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr); + + if (l4_proto == IPPROTO_TCP) { + cksum = rte_raw_cksum(l4_hdr, l4_hdr_len); + } else if (l4_proto == IPPROTO_UDP) { + if (next == NULL) { + /* It wasn't GSOed */ + cksum = rte_raw_cksum(ipv4_hdr + 1, + ntohs(ipv4_hdr->total_length) + - sizeof(struct rte_ipv4_hdr)); + } else { + cksum = 0; + } + } + + /* It was GSOed */ + while (next) { + cksum += rte_raw_cksum(rte_pktmbuf_mtod(next, char *), next->data_len); + next = next->next; + } + + /* Save cksum to inner_cksum, outer udp checksum needs it */ + inner_cksum = cksum; + + cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0); + cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff); + cksum = (~cksum) & 0xffff; + if (cksum == 0) { + cksum = 0xffff; + } + + /* Set inner TCP checksum */ + if (l4_proto == IPPROTO_TCP) { + tcp_hdr->cksum = (uint16_t)cksum; + } + + /* Set outer udp checksum in case of VXLAN */ + if (mbuf->outer_l2_len != 0) { + ipv4_hdr = (struct rte_ipv4_hdr *) + (rte_pktmbuf_mtod(mbuf, char *) + mbuf->outer_l2_len); + struct rte_udp_hdr *udp_hdr = (struct rte_udp_hdr *) + (ipv4_hdr + 1); + + /* Set outer ip checksum */ + ipv4_hdr->hdr_checksum = 0; + ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr); + + udp_hdr->dgram_cksum = 0; + cksum = rte_ipv4_phdr_cksum(ipv4_hdr, 0); + cksum += rte_raw_cksum(udp_hdr, mbuf->l2_len + mbuf->l3_len); + cksum += inner_cksum; + if (l4_proto == IPPROTO_TCP) { + cksum += tcp_hdr->cksum; + } + cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff); + cksum = (~cksum) & 0xffff; + if (cksum == 0) { + cksum = 0xffff; + } + udp_hdr->dgram_cksum = (uint16_t)cksum; + } +} + +/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'. Takes ownership of + * 'pkts', even in case of failure. + * + * Returns the number of packets that weren't transmitted. 
*/ +static inline int +netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, + struct rte_mbuf **pkts, int cnt) +{ + uint32_t nb_tx = 0; + int i; + int ret; + int failures = 0; + + if (userspace_tso_enabled()) { + /* The best point to do gso */ + struct rte_gso_ctx gso_ctx; + struct rte_mbuf *gso_mbufs[MAX_GSO_MBUFS]; + int tx_start = -1; + + /* Setup gso context */ + gso_ctx.direct_pool = dev->dpdk_mp->mp; + gso_ctx.indirect_pool = dev->dpdk_mp->mp; + + /* Do GSO if needed */ + for (i = 0; i < cnt; i++) { + if (((pkts[i]->ol_flags & PKT_TX_UDP_SEG) && + !(dev->hw_ol_features & DEV_TX_OFFLOAD_UDP_TSO)) || + ((pkts[i]->ol_flags & PKT_TX_TCP_SEG) && + ((!(dev->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) + && (pkts[i]->ol_flags & PKT_TX_TUNNEL_VXLAN)) + || !(dev->hw_ol_features & DEV_TX_OFFLOAD_TCP_TSO)))) { + /* Send non GSO packets before pkts[i] */ + if (tx_start != -1) { + failures += __netdev_dpdk_eth_tx_burst( + dev, qid, + pkts + tx_start, + i - tx_start); + } + tx_start = -1; + + gso_ctx.gso_types = 0; + gso_ctx.gso_size = pkts[i]->tso_segsz; + gso_ctx.flag = 0; + if (pkts[i]->ol_flags & PKT_TX_TUNNEL_VXLAN) { + gso_ctx.gso_types |= DEV_TX_OFFLOAD_VXLAN_TNL_TSO; + } + if (pkts[i]->ol_flags & PKT_TX_UDP_SEG) { + gso_ctx.gso_types |= DEV_TX_OFFLOAD_UDP_TSO; + } else if (pkts[i]->ol_flags & PKT_TX_TCP_SEG) { + gso_ctx.gso_types |= DEV_TX_OFFLOAD_TCP_TSO; + pkts[i]->ol_flags &= ~PKT_TX_TCP_CKSUM; + } + ret = rte_gso_segment(pkts[i], /* packet to segment */ + &gso_ctx, /* gso context */ + /* gso output mbufs */ + (struct rte_mbuf **)&gso_mbufs, + MAX_GSO_MBUFS); + if (ret < 0) { + rte_pktmbuf_free(pkts[i]); + } else { + int j, k; + struct rte_mbuf * next_part; + nb_tx = ret; + for (j = 0; j < nb_tx; j++) { + set_multiseg_udptcp_cksum(gso_mbufs[j]); + /* Clear them because of no offload */ + gso_mbufs[j]->ol_flags = 0; + gso_mbufs[j]->outer_l2_len = 0; + gso_mbufs[j]->outer_l3_len = 0; + gso_mbufs[j]->l2_len = 0; + gso_mbufs[j]->l3_len = 0; + gso_mbufs[j]->l4_len = 0; + next_part = gso_mbufs[j]; + for (k = 0; k < gso_mbufs[j]->nb_segs; k++) { + next_part = next_part->next; + } + } + __netdev_dpdk_eth_tx_burst(dev, qid, gso_mbufs, nb_tx); + } + continue; + } + if (tx_start == -1) { + tx_start = i; + } + } + + if (tx_start != -1) { + /* Send non GSO packets before pkts[i] */ + failures += __netdev_dpdk_eth_tx_burst(dev, qid, pkts + tx_start, + i - tx_start); + } + return failures; + } + + return __netdev_dpdk_eth_tx_burst(dev, qid, pkts, cnt); +} + static inline bool netdev_dpdk_srtcm_policer_pkt_handle(struct rte_meter_srtcm *meter, struct rte_meter_srtcm_profile *profile, @@ -2786,10 +3064,24 @@ out: } } +struct shinfo_arg { + void * buf; + struct rte_mbuf *mbuf; +}; + +/* For GSO case, the extended mbuf only can be freed by + * netdev_dpdk_extbuf_free + */ static void -netdev_dpdk_extbuf_free(void *addr OVS_UNUSED, void *opaque) +netdev_dpdk_extbuf_free(struct rte_mbuf *m, void *opaque) { - rte_free(opaque); + struct shinfo_arg *arg = (struct shinfo_arg *)opaque; + + rte_free(arg->buf); + if (m != arg->mbuf) { + rte_pktmbuf_free(arg->mbuf); + } + free(arg); } static struct rte_mbuf * @@ -2821,8 +3113,11 @@ dpdk_pktmbuf_attach_extbuf(struct rte_mbuf *pkt, uint32_t data_len) /* Initialize shinfo. 
*/ if (shinfo) { + struct shinfo_arg *arg = xmalloc(sizeof(struct shinfo_arg)); + arg->buf = buf; + arg->mbuf = pkt; shinfo->free_cb = netdev_dpdk_extbuf_free; - shinfo->fcb_opaque = buf; + shinfo->fcb_opaque = arg; rte_mbuf_ext_refcnt_set(shinfo, 1); } else { shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len, @@ -2852,6 +3147,10 @@ dpdk_pktmbuf_alloc(struct rte_mempool *mp, uint32_t data_len) return NULL; } + if (unlikely(pkt->shinfo != NULL)) { + pkt->shinfo = NULL; + } + if (rte_pktmbuf_tailroom(pkt) >= data_len) { return pkt; } @@ -5192,6 +5491,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) int err; uint64_t vhost_flags = 0; uint64_t vhost_unsup_flags; + uint64_t vhost_supported_flags; bool zc_enabled; ovs_mutex_lock(&dev->mutex); @@ -5277,6 +5577,16 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) goto unlock; } + err = rte_vhost_driver_get_features(dev->vhost_id, + &vhost_supported_flags); + if (err) { + VLOG_ERR("rte_vhost_driver_get_features failed for " + "vhost user client port: %s\n", dev->up.name); + goto unlock; + } + VLOG_INFO("vhostuserclient port %s features: 0x%016lx", + dev->up.name, vhost_supported_flags); + err = rte_vhost_driver_start(dev->vhost_id); if (err) { VLOG_ERR("rte_vhost_driver_start failed for vhost user " diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 9f830b4..557f139 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -6566,6 +6566,16 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) l4_len = TCP_OFFSET(tcp_hdr->tcp_ctl) * 4; dp_packet_hwol_set_l4_len(b, l4_len); + } else if (*l4proto == IPPROTO_UDP) { + struct udp_header *udp_hdr = dp_packet_at(b, l2_len + l3_len, + sizeof(struct udp_header)); + + if (!udp_hdr) { + return -EINVAL; + } + + l4_len = sizeof(struct udp_header); + dp_packet_hwol_set_l4_len(b, l4_len); } return 0; @@ -6581,10 +6591,6 @@ netdev_linux_parse_vnet_hdr(struct dp_packet *b) return -EINVAL; } - if (vnet->flags == 0 && vnet->gso_type == VIRTIO_NET_HDR_GSO_NONE) { - return 0; - } - if (netdev_linux_parse_l2(b, &l4proto)) { return -EINVAL; } @@ -6609,6 +6615,9 @@ netdev_linux_parse_vnet_hdr(struct dp_packet *b) || type == VIRTIO_NET_HDR_GSO_TCPV6) { dp_packet_hwol_set_tcp_seg(b); } + if (type == VIRTIO_NET_HDR_GSO_UDP) { + dp_packet_hwol_set_udp_seg(b); + } } return 0; diff --git a/lib/netdev.c b/lib/netdev.c index 64583d1..02f28c8 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -825,23 +825,41 @@ netdev_send_prepare_packet(const uint64_t netdev_flags, { uint64_t l4_mask; - if (dp_packet_hwol_is_vxlan_tcp_seg(packet) - && (dp_packet_hwol_is_tso(packet) || dp_packet_hwol_l4_mask(packet)) - && !(netdev_flags & NETDEV_TX_OFFLOAD_VXLAN_TSO)) { - /* Fall back to GSO in software. */ - VLOG_ERR_BUF(errormsg, "No VXLAN TSO support"); - return false; - } - - if (dp_packet_hwol_is_tso(packet) - && !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) { - /* Fall back to GSO in software. */ - VLOG_ERR_BUF(errormsg, "No TSO support"); - return false; - } - + /* GSO can handle TSO by software even if device can't handle hardware + * offload, so needn't check it here. + */ l4_mask = dp_packet_hwol_l4_mask(packet); if (l4_mask) { + /* Calculate checksum for VLAN TSO case when no hardware offload + * feature is available. Note: for VXLAN TSO case, checksum has + * been calculated before here, so it won't be done here again + * because checksum flags in packet->m.ol_flags have been cleaned. 
+ */ + if (dp_packet_hwol_l4_is_tcp(packet) + && !dp_packet_hwol_is_vxlan_tcp_seg(packet) + && !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) { + packet->mbuf.ol_flags &= ~DP_PACKET_OL_TX_TCP_CKSUM; + /* Only calculate TCP checksum for non-TSO packet, + * it will be calculated after GSO for TSO packet. + */ + if (!(packet->mbuf.ol_flags & DP_PACKET_OL_TX_TCP_SEG)) { + calculate_tcpudp_checksum(packet); + } + return true; + } else if (dp_packet_hwol_l4_is_udp(packet) + && !dp_packet_hwol_is_vxlan_tcp_seg(packet) + && !(netdev_flags & NETDEV_TX_OFFLOAD_UDP_CKSUM)) { + packet->mbuf.ol_flags &= ~DP_PACKET_OL_TX_UDP_CKSUM; + /* Only calculate UDP checksum for non-UFO packet, + * it will be calculated immediately before GSO for + * UFO packet. + */ + if (!(packet->mbuf.ol_flags & DP_PACKET_OL_TX_UDP_SEG)) { + calculate_tcpudp_checksum(packet); + } + return true; + } + if (dp_packet_hwol_l4_is_tcp(packet)) { if (!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) { /* Fall back to TCP csum in software. */ @@ -1013,11 +1031,26 @@ netdev_push_header(const struct netdev *netdev, /* VXLAN offload can't support udp checksum offload * for inner udp packet, so udp checksum must be set * before push header in order that outer checksum can - * be set correctly. + * be set correctly. But GSO code will set udp checksum + * if packet->mbuf.ol_flags has DP_PACKET_OL_TX_UDP_SEG. */ if (dp_packet_hwol_l4_is_udp(packet)) { packet->mbuf.ol_flags &= ~DP_PACKET_OL_TX_UDP_CKSUM; - calculate_tcpudp_checksum(packet); + /* Only calculate UDP checksum for non-UFO packet, + * it will be calculated immediately before GSO for + * UFO packet. + */ + if (!(packet->mbuf.ol_flags & DP_PACKET_OL_TX_UDP_SEG)) { + calculate_tcpudp_checksum(packet); + } + } else if (dp_packet_hwol_l4_is_tcp(packet)) { + packet->mbuf.ol_flags &= ~DP_PACKET_OL_TX_TCP_CKSUM; + /* Only calculate TCP checksum for non-TSO packet, + * it will be calculated after GSO for TSO packet. 
+                 */
+                if (!(packet->mbuf.ol_flags & DP_PACKET_OL_TX_TCP_SEG)) {
+                    calculate_tcpudp_checksum(packet);
+                }
             }
         }
         netdev->netdev_class->push_header(netdev, packet, data);

From patchwork Fri Aug 7 10:56:47 2020
X-Patchwork-Submitter: yang_y_yi
X-Patchwork-Id: 1342214
From: yang_y_yi@163.com
To: ovs-dev@openvswitch.org
Cc: yang_y_yi@163.com, fbl@sysclose.org
Date: Fri, 7 Aug 2020 18:56:47 +0800
Message-Id: <20200807105648.94860-4-yang_y_yi@163.com>
In-Reply-To: <20200807105648.94860-1-yang_y_yi@163.com>
References: <20200807105648.94860-1-yang_y_yi@163.com>
Subject: [ovs-dev] [PATCH V3 3/4] Add VXLAN TCP and UDP GRO support for DPDK data path

From: Yi Yang

GRO (Generic Receive Offload) helps improve performance when TSO (TCP
Segmentation Offload) or VXLAN TSO is enabled on the transmit side. By
merging many small packets into large ones (65535 bytes at most) as
they are received from the physical NIC, it avoids per-packet overhead
in the OVS DPDK data path and in the vhost enqueue for the VM. It works
for both the VXLAN and the VLAN case.

Signed-off-by: Yi Yang
---
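For context, the lightweight burst-mode path this patch uses for TCP
flows boils down to the following standalone sketch (parameter values
are illustrative, not taken from the patch; the helper name is ours):

    /* Sketch only: merge a burst of TCP/IPv4 packets with librte_gro. */
    #include <rte_gro.h>

    static uint16_t
    sketch_gro_tcp_burst(struct rte_mbuf **pkts, uint16_t nb_pkts)
    {
        struct rte_gro_param param = {
            .gro_types = RTE_GRO_TCP_IPV4,
            .max_flow_num = 1024,
            .max_item_per_flow = 32,
        };

        /* Merges in place and returns the new packet count; packets
         * that can't be merged are passed through unchanged. */
        return rte_gro_reassemble_burst(pkts, nb_pkts, &param);
    }

UDP and out-of-order IP fragments need the heavier stateful
rte_gro_reassemble()/rte_gro_timeout_flush() pair, which is why the
patch below splits the received burst into UDP and non-UDP groups.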
 lib/dp-packet.h    |  37 ++++++++-
 lib/netdev-dpdk.c  | 227 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/netdev-linux.c | 112 ++++++++++++++++++++++++--
 3 files changed, 365 insertions(+), 11 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index c33868d..18307c0 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -580,7 +580,16 @@ dp_packet_set_size(struct dp_packet *b, uint32_t v)
      * (and thus 'v') will always be <= UINT16_MAX; this means that there is no
      * loss of accuracy in assigning 'v' to 'data_len'. */
-    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
+    if (b->mbuf.nb_segs <= 1) {
+        b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
+    } else {
+        /* For a multi-seg packet, if it is resized, data_len should be
+         * adjusted by the offset; this will happen in case of push or
+         * pop.
+         */
+        if (b->mbuf.pkt_len != 0) {
+            b->mbuf.data_len += v - b->mbuf.pkt_len;
+        }
+    }
     b->mbuf.pkt_len = v;             /* Total length of all segments linked to
                                       * this segment. */
 }
@@ -1092,6 +1101,20 @@ dp_packet_hwol_set_l4_len(struct dp_packet *b, int l4_len)
 {
     b->mbuf.l4_len = l4_len;
 }
+
+/* Set outer_l2_len for the packet 'b'. */
+static inline void
+dp_packet_hwol_set_outer_l2_len(struct dp_packet *b, int outer_l2_len)
+{
+    b->mbuf.outer_l2_len = outer_l2_len;
+}
+
+/* Set outer_l3_len for the packet 'b'. */
+static inline void
+dp_packet_hwol_set_outer_l3_len(struct dp_packet *b, int outer_l3_len)
+{
+    b->mbuf.outer_l3_len = outer_l3_len;
+}
 #else
 /* Mark packet 'b' for VXLAN TCP segmentation offloading.
*/ static inline void @@ -1125,6 +1148,18 @@ dp_packet_hwol_set_l4_len(struct dp_packet *b OVS_UNUSED, int l4_len OVS_UNUSED) { } + +/* Set outer_l2_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_outer_l2_len(struct dp_packet *b, int outer_l2_len) +{ +} + +/* Set outer_l3_len for the packet 'b' */ +static inline void +dp_packet_hwol_set_outer_l3_len(struct dp_packet *b, int outer_l3_len) +{ +} #endif /* DPDK_NETDEV */ static inline bool diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 888a45e..b6c57a6 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -25,6 +25,7 @@ #include #include #include +#include /* Include rte_compat.h first to allow experimental API's needed for the * rte_meter.h rfc4115 functions. Once they are no longer marked as @@ -47,6 +48,7 @@ #include #include #include +#include #include "cmap.h" #include "coverage.h" @@ -2184,6 +2186,8 @@ get_udptcp_checksum(void *l3_hdr, void *l4_hdr, uint16_t ethertype) } } +#define UDP_VXLAN_ETH_HDR_SIZE 30 + /* Prepare the packet for HWOL. * Return True if the packet is OK to continue. */ static bool @@ -2207,6 +2211,42 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) return true; } + /* ol_flags is cleaned after vxlan pop, so need reset for those packets. + * Such packets are only for local VMs or namespaces, so need to return + * after ol_flags, l2_len, l3_len and tso_segsz are set. + */ + if (((mbuf->ol_flags & PKT_TX_TUNNEL_VXLAN) == 0) && + (mbuf->l2_len == UDP_VXLAN_ETH_HDR_SIZE) && + (mbuf->pkt_len > 1464)) { + mbuf->ol_flags = 0; + mbuf->l2_len -= sizeof(struct udp_header) + + sizeof(struct vxlanhdr); + if (mbuf->l3_len == IP_HEADER_LEN) { + mbuf->ol_flags |= PKT_TX_IPV4; + ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1); + l4_proto = ip_hdr->next_proto_id; + } else if (mbuf->l3_len == IPV6_HEADER_LEN) { + mbuf->ol_flags |= PKT_TX_IPV6; + ip6_hdr = (struct rte_ipv6_hdr *)(eth_hdr + 1); + l4_proto = ip6_hdr->proto; + } + + mbuf->ol_flags |= PKT_TX_IP_CKSUM; + if (l4_proto == IPPROTO_TCP) { + mbuf->ol_flags |= PKT_TX_TCP_SEG; + mbuf->ol_flags |= PKT_TX_TCP_CKSUM; + } else if (l4_proto == IPPROTO_UDP) { + mbuf->ol_flags |= PKT_TX_UDP_SEG; + mbuf->ol_flags |= PKT_TX_UDP_CKSUM; + } + mbuf->tso_segsz = 1450; + if (mbuf->tso_segsz > dev->mtu) { + mbuf->tso_segsz = dev->mtu; + } + + return true; + } + if (mbuf->ol_flags & PKT_TX_TUNNEL_VXLAN) { /* Handle VXLAN TSO */ struct rte_udp_hdr *udp_hdr; @@ -2853,6 +2893,104 @@ netdev_dpdk_vhost_rxq_enabled(struct netdev_rxq *rxq) return dev->vhost_rxq_enabled[rxq->queue_id]; } +#define VXLAN_DST_PORT 4789 + +static void +netdev_dpdk_parse_hdr(struct dp_packet *pkt, int offset, uint16_t *l4_proto, + int *is_frag) +{ + struct rte_mbuf *mbuf = (struct rte_mbuf *)pkt; + struct rte_ether_hdr *eth_hdr = + rte_pktmbuf_mtod_offset(mbuf, struct rte_ether_hdr *, offset); + ovs_be16 eth_type; + int l2_len; + int l3_len = 0; + int l4_len = 0; + uint16_t inner_l4_proto = 0; + int inner_is_frag = 0; + + if (offset == 0) { + *is_frag = 0; + } + mbuf->packet_type = 0; + l2_len = ETH_HEADER_LEN; + eth_type = (OVS_FORCE ovs_be16) eth_hdr->ether_type; + if (eth_type_vlan(eth_type)) { + struct rte_vlan_hdr *vlan_hdr = + (struct rte_vlan_hdr *)(eth_hdr + DPDK_RTE_HDR_OFFSET); + + eth_type = (OVS_FORCE ovs_be16) vlan_hdr->eth_proto; + l2_len += VLAN_HEADER_LEN; + } + + dp_packet_hwol_set_l2_len(pkt, l2_len); + dp_packet_hwol_set_outer_l2_len(pkt, 0); + dp_packet_hwol_set_outer_l3_len(pkt, 0); + + if (eth_type == htons(ETH_TYPE_IP)) { + struct 
rte_ipv4_hdr *ipv4_hdr = (struct rte_ipv4_hdr *) + ((char *)eth_hdr + l2_len); + + l3_len = IP_HEADER_LEN; + dp_packet_hwol_set_tx_ipv4(pkt); + *l4_proto = ipv4_hdr->next_proto_id; + *is_frag = rte_ipv4_frag_pkt_is_fragmented(ipv4_hdr); + mbuf->packet_type |= RTE_PTYPE_L3_IPV4; + } else if (eth_type == htons(RTE_ETHER_TYPE_IPV6)) { + struct rte_ipv6_hdr *ipv6_hdr = (struct rte_ipv6_hdr *) + ((char *)eth_hdr + l2_len); + l3_len = IPV6_HEADER_LEN; + dp_packet_hwol_set_tx_ipv6(pkt); + *l4_proto = ipv6_hdr->proto; + } + + dp_packet_hwol_set_l3_len(pkt, l3_len); + + if (*l4_proto == IPPROTO_TCP) { + struct rte_tcp_hdr *tcp_hdr = (struct rte_tcp_hdr *) + ((char *)eth_hdr + l2_len + l3_len); + + l4_len = (tcp_hdr->data_off & 0xf0) >> 2; + dp_packet_hwol_set_l4_len(pkt, l4_len); + mbuf->packet_type |= RTE_PTYPE_L4_TCP; + } else if (*l4_proto == IPPROTO_UDP) { + struct rte_udp_hdr *udp_hdr = (struct rte_udp_hdr *) + ((char *)eth_hdr + l2_len + l3_len); + + l4_len = sizeof(*udp_hdr); + dp_packet_hwol_set_l4_len(pkt, l4_len); + mbuf->packet_type |= RTE_PTYPE_L4_UDP; + + /* Need to parse inner packet if needed */ + if (ntohs(udp_hdr->dst_port) == VXLAN_DST_PORT) { + netdev_dpdk_parse_hdr(pkt, + l2_len + l3_len + l4_len + + sizeof(struct vxlanhdr), + &inner_l4_proto, + &inner_is_frag); + mbuf->l2_len += sizeof(struct rte_udp_hdr) + + sizeof(struct vxlanhdr); + dp_packet_hwol_set_outer_l2_len(pkt, l2_len); + dp_packet_hwol_set_outer_l3_len(pkt, l3_len); + + /* Set packet_type, it is necessary for GRO */ + mbuf->packet_type |= RTE_PTYPE_TUNNEL_VXLAN; + if (mbuf->l3_len == IP_HEADER_LEN) { + mbuf->packet_type |= RTE_PTYPE_INNER_L3_IPV4; + } + if (inner_l4_proto == IPPROTO_TCP) { + mbuf->packet_type |= RTE_PTYPE_INNER_L4_TCP; + mbuf->packet_type |= RTE_PTYPE_L4_UDP; + } else if (inner_l4_proto == IPPROTO_UDP) { + mbuf->packet_type |= RTE_PTYPE_INNER_L4_UDP; + mbuf->packet_type |= RTE_PTYPE_L4_UDP; + } + } + } +} + +static RTE_DEFINE_PER_LCORE(void *, _gro_ctx); + static int netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, int *qfill) @@ -2862,6 +3000,36 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev); int nb_rx; int dropped = 0; + struct rte_gro_param gro_param; + struct dp_packet *packet; + struct dp_packet *udp_pkts[NETDEV_MAX_BURST]; + struct dp_packet *other_pkts[NETDEV_MAX_BURST]; + + int nb_udp_rx = 0; + int nb_other_rx = 0; + + /* Initialize GRO parameters */ + gro_param.gro_types = RTE_GRO_TCP_IPV4 | + RTE_GRO_UDP_IPV4 | + RTE_GRO_IPV4_VXLAN_TCP_IPV4 | + RTE_GRO_IPV4_VXLAN_UDP_IPV4; + gro_param.max_flow_num = 1024; + /* There are 46 fragments for a 64K big packet */ + gro_param.max_item_per_flow = NETDEV_MAX_BURST * 2; + + /* Initialize GRO context */ + if (RTE_PER_LCORE(_gro_ctx) == NULL) { + uint32_t cpu, node; + int ret; + + ret = syscall(__NR_getcpu, &cpu, &node, NULL); + if (ret == 0) { + gro_param.socket_id = node; + } else { + gro_param.socket_id = 0; + } + RTE_PER_LCORE(_gro_ctx) = rte_gro_ctx_create(&gro_param); + } if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) { return EAGAIN; @@ -2890,7 +3058,58 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, rte_spinlock_unlock(&dev->stats_lock); } + /* Need to parse packet header and set necessary fields in mbuf for GRO */ batch->count = nb_rx; + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + uint16_t l4_proto = 0; + int is_frag = 0; + + netdev_dpdk_parse_hdr(packet, 0, &l4_proto, &is_frag); + 
@@ -2890,7 +3058,58 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
         rte_spinlock_unlock(&dev->stats_lock);
     }
 
+    /* Parse packet headers and set the mbuf fields needed by GRO. */
     batch->count = nb_rx;
+    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+        uint16_t l4_proto = 0;
+        int is_frag = 0;
+
+        netdev_dpdk_parse_hdr(packet, 0, &l4_proto, &is_frag);
+
+        if (packet->mbuf.packet_type & RTE_PTYPE_TUNNEL_VXLAN) {
+            if (packet->mbuf.packet_type & RTE_PTYPE_INNER_L4_UDP) {
+                udp_pkts[nb_udp_rx++] = packet;
+            } else {
+                other_pkts[nb_other_rx++] = packet;
+            }
+        } else {
+            if (packet->mbuf.packet_type & RTE_PTYPE_L4_UDP) {
+                udp_pkts[nb_udp_rx++] = packet;
+            } else {
+                other_pkts[nb_other_rx++] = packet;
+            }
+        }
+    }
+
+    /* Do GRO if needed; note that IP fragments can arrive out of order. */
+    if (nb_udp_rx) {
+        /* UDP packets must use the heavyweight rte_gro_reassemble(). */
+        nb_udp_rx = rte_gro_reassemble((struct rte_mbuf **) udp_pkts,
+                                       nb_udp_rx, RTE_PER_LCORE(_gro_ctx));
+        nb_udp_rx += rte_gro_timeout_flush(RTE_PER_LCORE(_gro_ctx), 10000,
+                                           RTE_GRO_UDP_IPV4
+                                           | RTE_GRO_IPV4_VXLAN_UDP_IPV4,
+                                           (struct rte_mbuf **)&udp_pkts[nb_udp_rx],
+                                           NETDEV_MAX_BURST - nb_udp_rx);
+    }
+
+    if (nb_other_rx) {
+        /* TCP packets fit the lightweight rte_gro_reassemble_burst(). */
+        nb_other_rx = rte_gro_reassemble_burst((struct rte_mbuf **) other_pkts,
+                                               nb_other_rx,
+                                               &gro_param);
+    }
+
+    batch->count = nb_udp_rx + nb_other_rx;
+    if (nb_udp_rx) {
+        memcpy(batch->packets, udp_pkts,
+               nb_udp_rx * sizeof(struct dp_packet *));
+    }
+
+    if (nb_other_rx) {
+        memcpy(&batch->packets[nb_udp_rx], other_pkts,
+               nb_other_rx * sizeof(struct dp_packet *));
+    }
+
     dp_packet_batch_init_packet_fields(batch);
 
     if (qfill) {
@@ -2931,10 +3150,11 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
     for (i = 0; i < pkt_cnt; i++) {
         pkt = pkts[i];
         if (OVS_UNLIKELY((pkt->pkt_len > dev->max_packet_len)
-                         && !(pkt->ol_flags & PKT_TX_TCP_SEG))) {
+                         && !(pkt->ol_flags & (PKT_TX_TCP_SEG | PKT_TX_UDP_SEG)))) {
             VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " "
-                         "max_packet_len %d", dev->up.name, pkt->pkt_len,
-                         dev->max_packet_len);
+                         "max_packet_len %d ol_flags 0x%016" PRIx64,
+                         dev->up.name, pkt->pkt_len,
+                         dev->max_packet_len, pkt->ol_flags);
             rte_pktmbuf_free(pkt);
             continue;
         }
@@ -3289,7 +3509,6 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
                        struct dp_packet_batch *batch,
                        bool concurrent_txq OVS_UNUSED)
 {
-
     if (OVS_UNLIKELY(batch->packets[0]->source != DPBUF_DPDK)) {
         dpdk_do_tx_copy(netdev, qid, batch);
         dp_packet_delete_batch(batch, true);
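The receive path above uses the two distinct reassembly modes that librte_gro offers. A minimal sketch of the distinction (hypothetical function name, real rte_gro_* calls): the context-based rte_gro_reassemble() may keep out-of-order UDP fragments inside the context across bursts, so retained packets must be recovered with rte_gro_timeout_flush(), whereas rte_gro_reassemble_burst() merges only within one burst and returns everything immediately, which suits in-order TCP:

    #include <rte_gro.h>
    #include <rte_mbuf.h>

    static uint16_t
    example_gro(struct rte_mbuf **udp, uint16_t n_udp,
                struct rte_mbuf **tcp, uint16_t n_tcp,
                void *ctx, const struct rte_gro_param *param,
                uint16_t max_burst)
    {
        if (n_udp) {
            /* Some packets may be absorbed into the context tables... */
            n_udp = rte_gro_reassemble(udp, n_udp, ctx);
            /* ...so flush anything older than the cycle timeout back out. */
            n_udp += rte_gro_timeout_flush(ctx, 10000,
                                           RTE_GRO_UDP_IPV4
                                           | RTE_GRO_IPV4_VXLAN_UDP_IPV4,
                                           &udp[n_udp], max_burst - n_udp);
        }
        if (n_tcp) {
            /* Merges within this burst only; nothing is retained. */
            n_tcp = rte_gro_reassemble_burst(tcp, n_tcp, param);
        }
        return n_udp + n_tcp;
    }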
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 557f139..d8a035a 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -50,6 +50,7 @@
 #include
 
 #include "coverage.h"
+#include "csum.h"
 #include "dp-packet.h"
 #include "dpif-netlink.h"
 #include "dpif-netdev.h"
@@ -1549,8 +1550,31 @@ netdev_linux_sock_batch_send(int sock, int ifindex, bool tso, int mtu,
         if (tso) {
             netdev_linux_prepend_vnet_hdr(packet, mtu);
         }
-
-        iov[i].iov_base = dp_packet_data(packet);
+        /* A GROed packet has multiple segments, so it has to be merged
+         * into one linear buffer for sendmmsg() to handle it correctly.
+         */
+        if (packet->mbuf.nb_segs > 1) {
+            struct dp_packet *new_packet =
+                dp_packet_new(dp_packet_size(packet));
+            struct rte_mbuf *next = (struct rte_mbuf *)packet;
+            uint32_t offset = 0;
+
+            iov[i].iov_base = dp_packet_data(new_packet);
+            /* Copy multi-segment mbuf data into the linear buffer. */
+            while (next) {
+                memcpy((uint8_t *)dp_packet_data(new_packet) + offset,
+                       rte_pktmbuf_mtod(next, char *),
+                       next->data_len);
+                offset += next->data_len;
+                next = next->next;
+            }
+            dp_packet_set_size(new_packet, offset);
+            dp_packet_delete(packet);
+            batch->packets[i] = new_packet;
+            packet = new_packet;
+        } else {
+            iov[i].iov_base = dp_packet_data(packet);
+        }
         iov[i].iov_len = dp_packet_size(packet);
         mmsg[i].msg_hdr = (struct msghdr) { .msg_name = &sll,
                                             .msg_namelen = sizeof sll,
@@ -6624,17 +6648,93 @@ netdev_linux_parse_vnet_hdr(struct dp_packet *b)
 }
 
 static void
+netdev_linux_set_ol_flags_and_ip_cksum(struct dp_packet *b, int mtu)
+{
+    struct eth_header *eth_hdr;
+    uint16_t l4proto = 0;
+    ovs_be16 eth_type;
+    int l2_len;
+
+    eth_hdr = dp_packet_at(b, 0, ETH_HEADER_LEN);
+    if (!eth_hdr) {
+        return;
+    }
+
+    l2_len = ETH_HEADER_LEN;
+    eth_type = eth_hdr->eth_type;
+    if (eth_type_vlan(eth_type)) {
+        struct vlan_header *vlan = dp_packet_at(b, l2_len, VLAN_HEADER_LEN);
+
+        if (!vlan) {
+            return;
+        }
+
+        eth_type = vlan->vlan_next_type;
+        l2_len += VLAN_HEADER_LEN;
+    }
+
+    if (eth_type == htons(ETH_TYPE_IP)) {
+        struct ip_header *ip_hdr = dp_packet_at(b, l2_len, IP_HEADER_LEN);
+
+        if (!ip_hdr) {
+            return;
+        }
+
+        ip_hdr->ip_csum = 0;
+        ip_hdr->ip_csum = csum(ip_hdr, sizeof *ip_hdr);
+        l4proto = ip_hdr->ip_proto;
+        dp_packet_hwol_set_tx_ipv4(b);
+    } else if (eth_type == htons(ETH_TYPE_IPV6)) {
+        struct ovs_16aligned_ip6_hdr *nh6;
+
+        nh6 = dp_packet_at(b, l2_len, IPV6_HEADER_LEN);
+        if (!nh6) {
+            return;
+        }
+
+        l4proto = nh6->ip6_ctlun.ip6_un1.ip6_un1_nxt;
+        dp_packet_hwol_set_tx_ipv6(b);
+    }
+
+    if (l4proto == IPPROTO_TCP) {
+        /* Note: the TCP checksum need not be set here; it is offloaded. */
+        if (dp_packet_size(b) > mtu + b->mbuf.l2_len) {
+            dp_packet_hwol_set_tcp_seg(b);
+        }
+        dp_packet_hwol_set_csum_tcp(b);
+    } else if (l4proto == IPPROTO_UDP) {
+        if (dp_packet_size(b) > mtu + b->mbuf.l2_len) {
+            dp_packet_hwol_set_udp_seg(b);
+        }
+        dp_packet_hwol_set_csum_udp(b);
+    }
+}
+
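The helper above recomputes the IPv4 header checksum via OVS's csum(). For readers without the OVS tree at hand, a self-contained equivalent in the classic RFC 1071 style (hypothetical name; a reference sketch, not the OVS implementation) looks like this:

    #include <stdint.h>
    #include <stddef.h>

    static uint16_t
    example_ip_csum(const void *hdr, size_t len)  /* len == 20, no options */
    {
        const uint16_t *p = hdr;
        uint32_t sum = 0;

        for (; len > 1; len -= 2) {
            sum += *p++;                 /* 16-bit one's-complement sum... */
        }
        if (len) {
            sum += *(const uint8_t *) p; /* ...plus a possible odd tail byte. */
        }
        while (sum >> 16) {
            sum = (sum & 0xffff) + (sum >> 16);  /* Fold carries back in. */
        }
        return ~sum;    /* The checksum field must be zeroed before summing. */
    }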
+static void
 netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu)
 {
-    struct virtio_net_hdr *vnet = dp_packet_push_zeros(b, sizeof *vnet);
+    struct virtio_net_hdr *vnet;
+
+    /* ol_flags is not set correctly for packets that were received from a
+     * physical port and then GROed, so it has to be set again for the
+     * vnet_hdr to be prepended correctly.
+     */
+    if ((dp_packet_size(b) > mtu + b->mbuf.l2_len)
+         && !dp_packet_hwol_l4_mask(b)) {
+        netdev_linux_set_ol_flags_and_ip_cksum(b, mtu);
+    }
+
+    vnet = dp_packet_push_zeros(b, sizeof *vnet);
 
-    if (dp_packet_hwol_is_tso(b)) {
+    if (dp_packet_hwol_is_tso(b) || dp_packet_hwol_is_uso(b)) {
         uint16_t hdr_len = ((char *)dp_packet_l4(b)
                             - (char *)dp_packet_eth(b))
-                            + TCP_HEADER_LEN;
+                            + b->mbuf.l4_len;
 
         vnet->hdr_len = (OVS_FORCE __virtio16)hdr_len;
         vnet->gso_size = (OVS_FORCE __virtio16)(mtu - hdr_len);
-        if (dp_packet_hwol_is_ipv4(b)) {
+        if (dp_packet_hwol_is_uso(b)) {
+            vnet->gso_type = VIRTIO_NET_HDR_GSO_UDP;
+        } else if (dp_packet_hwol_is_ipv4(b)) {
             vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
         } else {
             vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
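To make the virtio_net_hdr logic above concrete, the following sketch (hypothetical names, assuming <linux/virtio_net.h>) fills a header the same way the patched function does for a GSO packet, with hdr_len covering Ethernet + IP + L4 headers and gso_size derived from the MTU:

    #include <stdint.h>
    #include <string.h>
    #include <linux/virtio_net.h>

    static void
    example_fill_vnet_hdr(struct virtio_net_hdr *vnet, int mtu,
                          uint16_t eth_ip_l4_hdr_len, int is_udp, int is_ipv4)
    {
        memset(vnet, 0, sizeof *vnet);
        /* Everything up to and including the L4 header is replicated into
         * each segment; the payload is cut into gso_size-byte pieces. */
        vnet->hdr_len = eth_ip_l4_hdr_len;
        vnet->gso_size = mtu - eth_ip_l4_hdr_len;
        vnet->gso_type = is_udp ? VIRTIO_NET_HDR_GSO_UDP
                       : is_ipv4 ? VIRTIO_NET_HDR_GSO_TCPV4
                                 : VIRTIO_NET_HDR_GSO_TCPV6;
        /* Real code would typically also set VIRTIO_NET_HDR_F_NEEDS_CSUM
         * together with csum_start/csum_offset; omitted in this sketch. */
    }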
From patchwork Fri Aug 7 10:56:48 2020
X-Patchwork-Submitter: yang_y_yi
X-Patchwork-Id: 1342213
From: yang_y_yi@163.com
To: ovs-dev@openvswitch.org
Cc: yang_y_yi@163.com, fbl@sysclose.org
Date: Fri, 7 Aug 2020 18:56:48 +0800
Message-Id: <20200807105648.94860-5-yang_y_yi@163.com>
In-Reply-To: <20200807105648.94860-1-yang_y_yi@163.com>
References: <20200807105648.94860-1-yang_y_yi@163.com>
Subject: [ovs-dev] [PATCH V3 4/4] Update Documentation/topics/userspace-tso.rst

From: Yi Yang

With GSO and GRO enabled, OVS DPDK can fall back to software GSO when the
NIC does not support TSO or VXLAN TSO hardware offload.

Signed-off-by: Yi Yang 
---
 Documentation/topics/userspace-tso.rst | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/Documentation/topics/userspace-tso.rst b/Documentation/topics/userspace-tso.rst
index aafa4a1..3a255cd 100644
--- a/Documentation/topics/userspace-tso.rst
+++ b/Documentation/topics/userspace-tso.rst
@@ -87,8 +87,8 @@ used to enable same::
 Limitations
 ~~~~~~~~~~~
 
-The current OvS userspace `TSO` implementation supports flat and VLAN networks
-only (i.e. no support for `TSO` over tunneled connection [VxLAN, GRE, IPinIP,
+The current OvS userspace `TSO` implementation supports flat, VLAN and VXLAN
+networks only (i.e. no support for `TSO` over tunneled connections [GRE, IPinIP,
 etc.]). The NIC driver must support and advertise checksum offload for TCP and
 UDP.
@@ -98,11 +98,10 @@ in Open vSwitch. Currently, if the NIC supports that, then the feature is
 enabled, otherwise TSO can still be enabled but SCTP packets sent to the NIC
 will be dropped.
 
-There is no software implementation of TSO, so all ports attached to the
-datapath must support TSO or packets using that feature will be dropped
-on ports without TSO support. That also means guests using vhost-user
-in client mode will receive TSO packet regardless of TSO being enabled
-or disabled within the guest.
+There is a software implementation of TSO, called GSO (Generic Segmentation
+Offload), so ports attached to the datapath no longer have to support TSO
+themselves. That also means guests using vhost-user in client mode can receive
+TSO packets regardless of whether TSO is enabled or disabled within the guest.
 
 ~~~~~~~~~~~~~~~~~~
 Performance Tuning