From patchwork Mon Mar 27 10:50:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Pattrick X-Patchwork-Id: 1761502 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=cDP5PO6D; dkim-atps=neutral Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4PlV2z59Nzz1yYh for ; Mon, 27 Mar 2023 21:50:35 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id C59BA81E8C; Mon, 27 Mar 2023 10:50:33 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org C59BA81E8C Authentication-Results: smtp1.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=cDP5PO6D X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id FXUnjXh8AO6C; Mon, 27 Mar 2023 10:50:32 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp1.osuosl.org (Postfix) with ESMTPS id C7F0481A4E; Mon, 27 Mar 2023 10:50:31 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org C7F0481A4E Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id B5532C0035; Mon, 27 Mar 2023 10:50:30 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 53EA0C008C for ; Mon, 27 Mar 2023 10:50:29 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 0F8D460F1A for ; Mon, 27 Mar 2023 10:50:29 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 0F8D460F1A Authentication-Results: smtp3.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=cDP5PO6D X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vEmMQN7Ek4dE for ; Mon, 27 Mar 2023 10:50:28 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org EF5F860ECA Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp3.osuosl.org (Postfix) with ESMTPS id EF5F860ECA for ; Mon, 27 Mar 2023 10:50:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679914226; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=iKNpam2DyoTumxiNxtRwF5Z8oBlwzUFC4JehygCq3j8=; b=cDP5PO6DSY5QJ8Mc9bJIAdhhMalwbQSEUbdrNOpYUzOfa60RyW4fvMia9ALOsFc1RQrM72 ry2WMNS0qWKa497ghYWPHHGDZGZbF7eSQ9HL0u7ObMKRShHkJyn33kZA71x/96+if0E7Vp 1qgjQEBlPv0OT+iqQ3EfrYbaHGAFdM8= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-644-KGO-wqorOZK7JlEmU2O4TQ-1; Mon, 27 Mar 2023 06:50:23 -0400 X-MC-Unique: KGO-wqorOZK7JlEmU2O4TQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2E938185A7AC; Mon, 27 Mar 2023 10:50:23 +0000 (UTC) Received: from mpattric.remote.csb (unknown [10.22.9.221]) by smtp.corp.redhat.com (Postfix) with ESMTP id A38C72166B2A; Mon, 27 Mar 2023 10:50:22 +0000 (UTC) From: Mike Pattrick To: dev@openvswitch.org, david.marchand@redhat.com Date: Mon, 27 Mar 2023 06:50:10 -0400 Message-Id: <20230327105013.491103-2-mkp@redhat.com> In-Reply-To: <20230327105013.491103-1-mkp@redhat.com> References: <20230327105013.491103-1-mkp@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: Flavio Leitner , i.maximets@ovn.org Subject: [ovs-dev] [PATCH v11 1/4] Documentation: Document netdev offload. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Flavio Leitner Document the implementation of netdev hardware offloading in userspace datapath. Signed-off-by: Flavio Leitner Co-authored-by: Mike Pattrick Signed-off-by: Mike Pattrick Reviewed-by: Simon Horman --- Since v9: - Renamed documentation to reflect the userspace checksum nature of this feature - Edited for formatting and clarity issues. Since v10: - No change --- Documentation/automake.mk | 1 + Documentation/topics/index.rst | 1 + .../topics/userspace-checksum-offloading.rst | 103 ++++++++++++++++++ 3 files changed, 105 insertions(+) create mode 100644 Documentation/topics/userspace-checksum-offloading.rst diff --git a/Documentation/automake.mk b/Documentation/automake.mk index cdf3c9926..8bd3dbb2b 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -57,6 +57,7 @@ DOC_SOURCE = \ Documentation/topics/record-replay.rst \ Documentation/topics/tracing.rst \ Documentation/topics/usdt-probes.rst \ + Documentation/topics/userspace-checksum-offloading.rst \ Documentation/topics/userspace-tso.rst \ Documentation/topics/userspace-tx-steering.rst \ Documentation/topics/windows.rst \ diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst index 90d4c66e6..f239fcf83 100644 --- a/Documentation/topics/index.rst +++ b/Documentation/topics/index.rst @@ -55,5 +55,6 @@ OVS userspace-tso idl-compound-indexes ovs-extensions + userspace-checksum-offloading userspace-tx-steering usdt-probes diff --git a/Documentation/topics/userspace-checksum-offloading.rst b/Documentation/topics/userspace-checksum-offloading.rst new file mode 100644 index 000000000..7ab258710 --- /dev/null +++ b/Documentation/topics/userspace-checksum-offloading.rst @@ -0,0 +1,103 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +======================================== +Userspace Datapath - Checksum Offloading +======================================== + +This document explains the internals of Open vSwitch support for checksum +offloading in the userspace datapath. + +Design +------ + +Open vSwitch strives to forward packets as they arrive regardless of whether +the checksum is correct or not. OVS is not responsible for fixing external +checksum issues. + +The checksum calculation can be offloaded to the NIC when the packet's checksum +is verified, known to be good, or known to be destined for an interface that +will recalculate the checksum anyways. + +In other cases, OVS will update the checksum if packet contents is modified in +a way that would also invalidate the checksum and the checksum status is not +known. + +For example, OVS can accept a packet with a corrupted IP checksum, and a flow +rule can change the IP destination address to another address. In that case, +OVS needs to partially recompute the checksum instead of offloading or +calculate all of it again which would fix the existing issue. + +The interface (internally referred to as a netdev) can set flags indicating if +the checksum is good or bad. The checksum is considered unverified if no flag +is set. + +When packets ingress into the datapath with good checksum, OVS should enable +checksum offload by default. This allows the data path to postpone checksum +updates until the packet egress the data path. + +When a packet egress the datapath, the packet flags and the egress interface +flags are verified to make sure all required NIC offload features to send out +the packet are available. If not, the data path will fall back to equivalent +software implementation. + + +Interface (a.k.a. Netdev) +------------------------- + +When the interface initiates, it should set the flags to tell the datapath +which offload features are supported. For example, if the driver supports IP +checksum offloading, then netdev->ol_flags should set the flag +NETDEV_TX_OFFLOAD_IPV4_CKSUM. + + +Rules +----- + +1) OVS should strive to forward all packets regardless of checksum. + +2) OVS must not correct a bad packet checksum. + +3) Packet with flag DP_PACKET_OL_RX_IP_CKSUM_GOOD means that the IP checksum is + present in the packet and it is good. + +4) Packet with flag DP_PACKET_OL_RX_IP_CKSUM_BAD means that the IP checksum is + present in the packet and it is bad. Extra care should be taken to not fix + the packet during data path processing. + +5) The ingress packet parser can only set DP_PACKET_OL_TX_IP_CKSUM if the + packet has DP_PACKET_OL_RX_IP_CKSUM_GOOD to not violate rule #2. + +6) Packet with flag DP_PACKET_OL_TX_IPV4 is an IPv4 packet. + +7) Packet with flag DP_PACKET_OL_TX_IPV6 is an IPv6 packet. + +8) Packet with flag DP_PACKET_OL_TX_IP_CKSUM tells the datapath to skip + updating the IP checksum if the packet is modified. The IP checksum will be + calculated by the egress interface if that supports IP checksum offload, + otherwise the IP checksum will be performed in software before handing over + the packet to the interface. + +9) When there are modifications to the packet that requires a checksum update, + the datapath needs to remove the DP_PACKET_OL_RX_IP_CKSUM_GOOD flag, + otherwise the checksum is assumed to be good in the packet. From patchwork Mon Mar 27 10:50:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Pattrick X-Patchwork-Id: 1761503 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=HLEH2fv6; dkim-atps=neutral Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4PlV313GZkz1yXq for ; Mon, 27 Mar 2023 21:50:37 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 4AD9E60FF2; Mon, 27 Mar 2023 10:50:35 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 4AD9E60FF2 Authentication-Results: smtp3.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=HLEH2fv6 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9eAVkeU6K3rQ; Mon, 27 Mar 2023 10:50:34 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp3.osuosl.org (Postfix) with ESMTPS id 35E6660F1A; Mon, 27 Mar 2023 10:50:33 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 35E6660F1A Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7CB78C0092; Mon, 27 Mar 2023 10:50:31 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id DD95BC0032 for ; Mon, 27 Mar 2023 10:50:29 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id C685F81A18 for ; Mon, 27 Mar 2023 10:50:29 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org C685F81A18 Authentication-Results: smtp1.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=HLEH2fv6 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kn1t36qYle3c for ; Mon, 27 Mar 2023 10:50:29 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org DC32F819F6 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id DC32F819F6 for ; Mon, 27 Mar 2023 10:50:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679914226; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Oao4WC4bBrG/x01ZlXlI+DO3BsWpAZdDNH7FKrZVKXg=; b=HLEH2fv6yTOsahKbN1yX5OkBDzB6r8eaDffHQi84K+YAZSjzROi331KJXmpTDaBnkWYxC6 /1pGkisYxLesIu4y/wweqRUJsIsnFg+Hndu/EaTRgHpyo/i7hLRhOJbZaokcudq1MxNCxr XlrhWkSoQYt6UtuX5kdd4i1rnp8tQEQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-397-8wLbINauMiGqkLnrdA94nw-1; Mon, 27 Mar 2023 06:50:24 -0400 X-MC-Unique: 8wLbINauMiGqkLnrdA94nw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4E08B801206; Mon, 27 Mar 2023 10:50:24 +0000 (UTC) Received: from mpattric.remote.csb (unknown [10.22.9.221]) by smtp.corp.redhat.com (Postfix) with ESMTP id C33532166B2B; Mon, 27 Mar 2023 10:50:23 +0000 (UTC) From: Mike Pattrick To: dev@openvswitch.org, david.marchand@redhat.com Date: Mon, 27 Mar 2023 06:50:11 -0400 Message-Id: <20230327105013.491103-3-mkp@redhat.com> In-Reply-To: <20230327105013.491103-2-mkp@redhat.com> References: <20230327105013.491103-1-mkp@redhat.com> <20230327105013.491103-2-mkp@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: Flavio Leitner , i.maximets@ovn.org Subject: [ovs-dev] [PATCH v11 2/4] dpif-netdev: Show netdev offloading flags. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Flavio Leitner This patch modifies netdev_get_status to include information about checksum offload status by port, allowing the user to gain insight into where checksum offloading is active. Signed-off-by: Flavio Leitner Co-authored-by: Mike Pattrick Signed-off-by: Mike Pattrick Reviewed-by: David Marchand Reviewed-by: Simon Horman --- Since v9: - Removed entire ovs-appctl dpif-netdev/offload-show command, replaced with a field in the netdev status. - Removed duplicative field tx_tso_offload from netdev-dpdk.c Since v10: - No change --- lib/dpif-netdev-unixctl.man | 6 ++++++ lib/netdev-dpdk.c | 5 ----- lib/netdev-provider.h | 1 + lib/netdev.c | 29 ++++++++++++++++++++++++++--- tests/dpif-netdev.at | 18 ++++++++++++++++++ 5 files changed, 51 insertions(+), 8 deletions(-) diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man index 8cd847416..2840d462e 100644 --- a/lib/dpif-netdev-unixctl.man +++ b/lib/dpif-netdev-unixctl.man @@ -262,3 +262,9 @@ PMDs in the case where no value is specified. By default "scalar" is used. \fIstudy_cnt\fR defaults to 128 and indicates the number of packets that the "study" miniflow implementation must parse before choosing an optimal implementation. +. +.IP "\fBdpif-netdev/offload-show\fR [\fIdp\fR] [\fInetdev\fR]" +Prints the hardware offloading features enabled in netdev \fInetdev\fR +attached to datapath \fIdp\fR. The datapath \fIdp\fR parameter can be +omitted if there is only one. All netdev ports are printed if the +parameter \fInetdev\fR is omitted. diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index fb0dd43f7..560694dbc 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -1753,11 +1753,6 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) } else { smap_add(args, "rx_csum_offload", "false"); } - if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { - smap_add(args, "tx_tso_offload", "true"); - } else { - smap_add(args, "tx_tso_offload", "false"); - } smap_add(args, "lsc_interrupt_mode", dev->lsc_interrupt_mode ? "true" : "false"); diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index b5420947d..fcf52bdd9 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -37,6 +37,7 @@ extern "C" { struct netdev_tnl_build_header_params; #define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC +/* Keep this enum updated with translation to string below. */ enum netdev_ol_flags { NETDEV_TX_OFFLOAD_IPV4_CKSUM = 1 << 0, NETDEV_TX_OFFLOAD_TCP_CKSUM = 1 << 1, diff --git a/lib/netdev.c b/lib/netdev.c index c79778378..818589246 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -43,6 +43,7 @@ #include "netdev-provider.h" #include "netdev-vport.h" #include "odp-netlink.h" +#include "openvswitch/json.h" #include "openflow/openflow.h" #include "packets.h" #include "openvswitch/ofp-print.h" @@ -1373,9 +1374,31 @@ netdev_get_next_hop(const struct netdev *netdev, int netdev_get_status(const struct netdev *netdev, struct smap *smap) { - return (netdev->netdev_class->get_status - ? netdev->netdev_class->get_status(netdev, smap) - : EOPNOTSUPP); + int err = EOPNOTSUPP; + + /* Set offload status only if relevant. */ + if (netdev_get_dpif_type(netdev) && + strcmp(netdev_get_dpif_type(netdev), "system")) { + +#define OL_ADD_STAT(name, bit) \ + smap_add(smap, name "_csum_offload", \ + netdev->ol_flags & bit ? "true" : "false"); + + OL_ADD_STAT("ip", NETDEV_TX_OFFLOAD_IPV4_CKSUM); + OL_ADD_STAT("tcp", NETDEV_TX_OFFLOAD_TCP_CKSUM); + OL_ADD_STAT("udp", NETDEV_TX_OFFLOAD_UDP_CKSUM); + OL_ADD_STAT("sctp", NETDEV_TX_OFFLOAD_SCTP_CKSUM); + OL_ADD_STAT("tso", NETDEV_TX_OFFLOAD_TCP_TSO); +#undef OL_ADD_STAT + + err = 0; + } + + if (!netdev->netdev_class->get_status) { + return err; + } + + return netdev->netdev_class->get_status(netdev, smap); } /* Returns all assigned IP address to 'netdev' and returns 0. diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at index baab60a22..8a663a4b6 100644 --- a/tests/dpif-netdev.at +++ b/tests/dpif-netdev.at @@ -650,6 +650,24 @@ AT_CHECK([ovs-appctl revalidator/resume]) OVS_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([dpif-netdev - check dpif-netdev/offload-show]) +OVS_VSWITCHD_START( + [add-port br0 p1 \ + -- set interface p1 type=dummy options:pstream=punix:$OVS_RUNDIR/p0.sock \ + -- set bridge br0 datapath-type=dummy \ + other-config:datapath-id=1234 fail-mode=secure]) + +AT_CHECK([ovs-vsctl list interface p1 | sed -n 's/^status.*{\(.*\).*}$/\1/p'], [0], [dnl +ip_csum_offload="false", sctp_csum_offload="false", tcp_csum_offload="false", tso_csum_offload="false", udp_csum_offload="false" +], []) + +AT_CHECK([ovs-vsctl list interface br0 | sed -n 's/^status.*{\(.*\).*}$/\1/p'], [0], [dnl +ip_csum_offload="false", sctp_csum_offload="false", tcp_csum_offload="false", tso_csum_offload="false", udp_csum_offload="false" +], []) + +OVS_VSWITCHD_STOP +AT_CLEANUP + # SEND_UDP_PKTS([p_name], [p_ofport]) # # Sends 128 packets to port 'p_name' with different UDP destination ports. From patchwork Mon Mar 27 10:50:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Pattrick X-Patchwork-Id: 1761504 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=d49oODT0; dkim-atps=neutral Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4PlV383gt2z1yXq for ; Mon, 27 Mar 2023 21:50:44 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 81FA7416DB; Mon, 27 Mar 2023 10:50:42 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 81FA7416DB Authentication-Results: smtp4.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=d49oODT0 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8aj0O1JlkqUb; Mon, 27 Mar 2023 10:50:36 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id 69A73415E2; Mon, 27 Mar 2023 10:50:35 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 69A73415E2 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 296B0C008F; Mon, 27 Mar 2023 10:50:35 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id BB3DFC0089 for ; Mon, 27 Mar 2023 10:50:32 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 8111240AAD for ; Mon, 27 Mar 2023 10:50:32 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 8111240AAD Authentication-Results: smtp2.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=d49oODT0 X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zrqs9Ds1k4AV for ; Mon, 27 Mar 2023 10:50:30 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 7735740105 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp2.osuosl.org (Postfix) with ESMTPS id 7735740105 for ; Mon, 27 Mar 2023 10:50:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679914229; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7qr8OK6MAndxWTSpFr2FBx3IvePFoB2S9WonZs3R6Ug=; b=d49oODT0IYKzlQtH1gJnB1Rt/XHXpA19EFbnyjlmV1rJfPJWe4Plfvc2hCHyXn2ZiSXIm9 9qQgtwuFfHz2S9WACB99hpkEvWj5D9Sx5jslkFmai5FZb0y9BM8pLe+f7rWGxcVv2/vlQo FUwpqCMA5IQeoyHINBYmTV14eigZG+8= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-662-5v4yavOtP_K65lJmQjiphw-1; Mon, 27 Mar 2023 06:50:26 -0400 X-MC-Unique: 5v4yavOtP_K65lJmQjiphw-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B8E6D3C11783; Mon, 27 Mar 2023 10:50:25 +0000 (UTC) Received: from mpattric.remote.csb (unknown [10.22.9.221]) by smtp.corp.redhat.com (Postfix) with ESMTP id CC0AC2166B26; Mon, 27 Mar 2023 10:50:24 +0000 (UTC) From: Mike Pattrick To: dev@openvswitch.org, david.marchand@redhat.com Date: Mon, 27 Mar 2023 06:50:12 -0400 Message-Id: <20230327105013.491103-4-mkp@redhat.com> In-Reply-To: <20230327105013.491103-3-mkp@redhat.com> References: <20230327105013.491103-1-mkp@redhat.com> <20230327105013.491103-2-mkp@redhat.com> <20230327105013.491103-3-mkp@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: Flavio Leitner , i.maximets@ovn.org Subject: [ovs-dev] [PATCH v11 3/4] userspace: Enable IP checksum offloading by default. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Flavio Leitner The netdev receiving packets is supposed to provide the flags indicating if the IP checksum was verified and it is GOOD or BAD, otherwise the stack will check when appropriate by software. If the packet comes with good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulate a packet with that flag, set the checksum of the inner IP header since that is not yet supported. Calculate the IP checksum when the packet is going to be sent over a device that doesn't support the feature. Linux devices don't support IP checksum offload alone, so the support is not enabled. Signed-off-by: Flavio Leitner Co-authored-by: Mike Pattrick Signed-off-by: Mike Pattrick Reviewed-by: Simon Horman --- Since v9: - Removed duplicative field tx_ip_csum_offload from netdev-dpdk.c - Left rx_csum_offload field as it is not duplicative - Moved system-userspace-offload.at tests to dpif-netdev.at - Various visual changes - Extended miniflow_extract changes into avx512 code Since v10: - avx512 checksum length corrected --- lib/conntrack.c | 19 ++++---- lib/dp-packet.c | 15 ++++++ lib/dp-packet.h | 62 +++++++++++++++++++++++-- lib/dpif-netdev-extract-avx512.c | 5 ++ lib/dpif-netdev.c | 2 + lib/flow.c | 15 ++++-- lib/ipf.c | 11 +++-- lib/netdev-dpdk.c | 71 +++++++++++++++++++---------- lib/netdev-dummy.c | 23 ++++++++++ lib/netdev-native-tnl.c | 21 ++++++--- lib/netdev.c | 16 +++++++ lib/odp-execute-avx512.c | 19 +++++--- lib/odp-execute.c | 21 +++++++-- lib/packets.c | 34 +++++++++++--- tests/dpif-netdev.at | 78 ++++++++++++++++++++++++++++++++ 15 files changed, 345 insertions(+), 67 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 8cf7779c6..54166c320 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -2027,16 +2027,15 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type, ctx->key.dl_type = dl_type; if (ctx->key.dl_type == htons(ETH_TYPE_IP)) { - bool hwol_bad_l3_csum = dp_packet_ip_checksum_bad(pkt); - if (hwol_bad_l3_csum) { + if (dp_packet_ip_checksum_bad(pkt)) { ok = false; COVERAGE_INC(conntrack_l3csum_err); } else { - bool hwol_good_l3_csum = dp_packet_ip_checksum_valid(pkt) - || dp_packet_hwol_is_ipv4(pkt); - /* Validate the checksum only when hwol is not supported. */ + /* Validate the checksum only when hwol is not supported and the + * packet's checksum status is not known. */ ok = extract_l3_ipv4(&ctx->key, l3, dp_packet_l3_size(pkt), NULL, - !hwol_good_l3_csum); + !dp_packet_hwol_is_ipv4(pkt) && + !dp_packet_ip_checksum_good(pkt)); } } else if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { ok = extract_l3_ipv6(&ctx->key, l3, dp_packet_l3_size(pkt), NULL); @@ -2047,8 +2046,8 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type, if (ok) { bool hwol_bad_l4_csum = dp_packet_l4_checksum_bad(pkt); if (!hwol_bad_l4_csum) { - bool hwol_good_l4_csum = dp_packet_l4_checksum_valid(pkt) - || dp_packet_hwol_tx_l4_checksum(pkt); + bool hwol_good_l4_csum = dp_packet_l4_checksum_good(pkt) + || dp_packet_hwol_tx_l4_checksum(pkt); /* Validate the checksum only when hwol is not supported. */ if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt), &ctx->icmp_related, l3, !hwol_good_l4_csum, @@ -3357,7 +3356,9 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, } if (seq_skew) { ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew; - if (!dp_packet_hwol_is_ipv4(pkt)) { + if (dp_packet_hwol_tx_ip_csum(pkt)) { + dp_packet_ol_reset_ip_csum_good(pkt); + } else { l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum, l3_hdr->ip_tot_len, htons(ip_len)); diff --git a/lib/dp-packet.c b/lib/dp-packet.c index ae8ab5800..61c36de44 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -21,6 +21,7 @@ #include "dp-packet.h" #include "netdev-afxdp.h" #include "netdev-dpdk.h" +#include "netdev-provider.h" #include "openvswitch/dynamic-string.h" #include "util.h" @@ -530,3 +531,17 @@ dp_packet_compare_offsets(struct dp_packet *b1, struct dp_packet *b2, } return true; } + +/* Checks if the packet 'p' is compatible with netdev_ol_flags 'flags' + * and if not, updates the packet with the software fall back. */ +void +dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t flags) +{ + if (dp_packet_ip_checksum_good(p) || !dp_packet_hwol_tx_ip_csum(p)) { + dp_packet_hwol_reset_tx_ip_csum(p); + } else if (!(flags & NETDEV_TX_OFFLOAD_IPV4_CKSUM)) { + dp_packet_ip_set_header_csum(p); + dp_packet_ol_set_ip_csum_good(p); + dp_packet_hwol_reset_tx_ip_csum(p); + } +} diff --git a/lib/dp-packet.h b/lib/dp-packet.h index b3e6a5d10..af0a2b7f0 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -25,6 +25,7 @@ #include #endif +#include "csum.h" #include "netdev-afxdp.h" #include "netdev-dpdk.h" #include "openvswitch/list.h" @@ -83,6 +84,8 @@ enum dp_packet_offload_mask { DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_CKSUM, RTE_MBUF_F_TX_UDP_CKSUM, 0x400), /* Offload SCTP checksum. */ DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, RTE_MBUF_F_TX_SCTP_CKSUM, 0x800), + /* Offload IP checksum. */ + DEF_OL_FLAG(DP_PACKET_OL_TX_IP_CKSUM, RTE_MBUF_F_TX_IP_CKSUM, 0x1000), /* Adding new field requires adding to DP_PACKET_OL_SUPPORTED_MASK. */ }; @@ -97,7 +100,8 @@ enum dp_packet_offload_mask { DP_PACKET_OL_TX_IPV6 | \ DP_PACKET_OL_TX_TCP_CKSUM | \ DP_PACKET_OL_TX_UDP_CKSUM | \ - DP_PACKET_OL_TX_SCTP_CKSUM) + DP_PACKET_OL_TX_SCTP_CKSUM | \ + DP_PACKET_OL_TX_IP_CKSUM) #define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \ DP_PACKET_OL_TX_UDP_CKSUM | \ @@ -239,6 +243,7 @@ static inline bool dp_packet_equal(const struct dp_packet *, bool dp_packet_compare_offsets(struct dp_packet *good, struct dp_packet *test, struct ds *err_str); +void dp_packet_ol_send_prepare(struct dp_packet *, uint64_t); /* Frees memory that 'b' points to, as well as 'b' itself. */ @@ -1030,6 +1035,26 @@ dp_packet_hwol_set_tx_ipv6(struct dp_packet *b) *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_IPV6; } +/* Returns 'true' if packet 'p' is marked for IPv4 checksum offloading. */ +static inline bool +dp_packet_hwol_tx_ip_csum(const struct dp_packet *p) +{ + return !!(*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_IP_CKSUM); +} + +/* Marks packet 'p' for IPv4 checksum offloading. */ +static inline void +dp_packet_hwol_set_tx_ip_csum(struct dp_packet *p) +{ + *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_TX_IP_CKSUM; +} + +static inline void +dp_packet_hwol_reset_tx_ip_csum(struct dp_packet *p) +{ + *dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_TX_IP_CKSUM; +} + /* Mark packet 'b' for TCP checksum offloading. It implies that either * the packet 'b' is marked for IPv4 or IPv6 checksum offloading. */ static inline void @@ -1063,13 +1088,31 @@ dp_packet_hwol_set_tcp_seg(struct dp_packet *b) *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_TCP_SEG; } +/* Returns 'true' if the IP header has good integrity and the + * checksum in it is complete. */ static inline bool -dp_packet_ip_checksum_valid(const struct dp_packet *p) +dp_packet_ip_checksum_good(const struct dp_packet *p) { return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_IP_CKSUM_MASK) == DP_PACKET_OL_RX_IP_CKSUM_GOOD; } +/* Marks packet 'p' with good IPv4 checksum. */ +static inline void +dp_packet_ol_set_ip_csum_good(struct dp_packet *p) +{ + *dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_RX_IP_CKSUM_BAD; + *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_RX_IP_CKSUM_GOOD; +} + +/* Resets IP good checksum flag in packet 'p'. */ +static inline void +dp_packet_ol_reset_ip_csum_good(struct dp_packet *p) +{ + *dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_RX_IP_CKSUM_GOOD; +} + +/* Marks packet 'p' with bad IPv4 checksum. */ static inline bool dp_packet_ip_checksum_bad(const struct dp_packet *p) { @@ -1077,8 +1120,21 @@ dp_packet_ip_checksum_bad(const struct dp_packet *p) DP_PACKET_OL_RX_IP_CKSUM_BAD; } +/* Calculate and set the IPv4 header checksum in packet 'p'. */ +static inline void +dp_packet_ip_set_header_csum(struct dp_packet *p) +{ + struct ip_header *ip = dp_packet_l3(p); + + ovs_assert(ip); + ip->ip_csum = 0; + ip->ip_csum = csum(ip, sizeof *ip); +} + +/* Returns 'true' if the packet 'p' has good integrity and the + * checksum in it is correct. */ static inline bool -dp_packet_l4_checksum_valid(const struct dp_packet *p) +dp_packet_l4_checksum_good(const struct dp_packet *p) { return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_L4_CKSUM_MASK) == DP_PACKET_OL_RX_L4_CKSUM_GOOD; diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c index 968845f2d..66884eaf0 100644 --- a/lib/dpif-netdev-extract-avx512.c +++ b/lib/dpif-netdev-extract-avx512.c @@ -698,6 +698,7 @@ mfex_ipv6_set_l2_pad_size(struct dp_packet *pkt, return -1; } dp_packet_set_l2_pad_size(pkt, len_from_ipv6 - (p_len + IPV6_HEADER_LEN)); + dp_packet_hwol_set_tx_ipv6(pkt); return 0; } @@ -728,6 +729,10 @@ mfex_ipv4_set_l2_pad_size(struct dp_packet *pkt, struct ip_header *nh, return -1; } dp_packet_set_l2_pad_size(pkt, len_from_ipv4 - ip_tot_len); + dp_packet_hwol_set_tx_ipv4(pkt); + if (dp_packet_ip_checksum_good(pkt)) { + dp_packet_hwol_set_tx_ip_csum(pkt); + } return 0; } diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index aed2c8fbb..152392313 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7913,6 +7913,8 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, ds_destroy(&ds); } + dp_packet_ol_send_prepare(packet_, 0); + return dp->upcall_cb(packet_, flow, ufid, pmd->core_id, type, userdata, actions, wc, put_actions, dp->upcall_aux); } diff --git a/lib/flow.c b/lib/flow.c index c3a3aa3ce..6c8bf7fc0 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -907,6 +907,10 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) nw_proto = nh->ip_proto; nw_frag = ipv4_get_nw_frag(nh); data_pull(&data, &size, ip_len); + dp_packet_hwol_set_tx_ipv4(packet); + if (dp_packet_ip_checksum_good(packet)) { + dp_packet_hwol_set_tx_ip_csum(packet); + } } else if (dl_type == htons(ETH_TYPE_IPV6)) { const struct ovs_16aligned_ip6_hdr *nh = data; ovs_be32 tc_flow; @@ -920,6 +924,7 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) } data_pull(&data, &size, sizeof *nh); + dp_packet_hwol_set_tx_ipv6(packet); plen = ntohs(nh->ip6_plen); dp_packet_set_l2_pad_size(packet, size - plen); size = plen; /* Never pull padding. */ @@ -3221,9 +3226,12 @@ packet_expand(struct dp_packet *p, const struct flow *flow, size_t size) struct ip_header *ip = dp_packet_l3(p); ip->ip_tot_len = htons(p->l4_ofs - p->l3_ofs + l4_len); - ip->ip_csum = 0; - ip->ip_csum = csum(ip, sizeof *ip); - + if (dp_packet_hwol_tx_ip_csum(p)) { + dp_packet_ol_reset_ip_csum_good(p); + } else { + dp_packet_ip_set_header_csum(p); + dp_packet_ol_set_ip_csum_good(p); + } pseudo_hdr_csum = packet_csum_pseudoheader(ip); } else { /* ETH_TYPE_IPV6 */ struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(p); @@ -3313,6 +3321,7 @@ flow_compose(struct dp_packet *p, const struct flow *flow, /* Checksum has already been zeroed by put_zeros call. */ ip->ip_csum = csum(ip, sizeof *ip); + dp_packet_ol_set_ip_csum_good(p); pseudo_hdr_csum = packet_csum_pseudoheader(ip); flow_compose_l4_csum(p, flow, pseudo_hdr_csum); } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) { diff --git a/lib/ipf.c b/lib/ipf.c index d45266374..18c98576a 100644 --- a/lib/ipf.c +++ b/lib/ipf.c @@ -433,7 +433,9 @@ ipf_reassemble_v4_frags(struct ipf_list *ipf_list) len += rest_len; l3 = dp_packet_l3(pkt); ovs_be16 new_ip_frag_off = l3->ip_frag_off & ~htons(IP_MORE_FRAGMENTS); - if (!dp_packet_hwol_is_ipv4(pkt)) { + if (dp_packet_hwol_tx_ip_csum(pkt)) { + dp_packet_ol_reset_ip_csum_good(pkt); + } else { l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_frag_off, new_ip_frag_off); l3->ip_csum = recalc_csum16(l3->ip_csum, l3->ip_tot_len, htons(len)); @@ -608,8 +610,7 @@ ipf_is_valid_v4_frag(struct ipf *ipf, struct dp_packet *pkt) goto invalid_pkt; } - if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(pkt) - && !dp_packet_hwol_is_ipv4(pkt) + if (OVS_UNLIKELY(!dp_packet_ip_checksum_good(pkt) && csum(l3, ip_hdr_len) != 0)) { COVERAGE_INC(ipf_l3csum_err); goto invalid_pkt; @@ -1185,7 +1186,9 @@ ipf_post_execute_reass_pkts(struct ipf *ipf, } else { struct ip_header *l3_frag = dp_packet_l3(frag_i->pkt); struct ip_header *l3_reass = dp_packet_l3(pkt); - if (!dp_packet_hwol_is_ipv4(frag_i->pkt)) { + if (dp_packet_hwol_tx_ip_csum(frag_i->pkt)) { + dp_packet_ol_reset_ip_csum_good(frag_i->pkt); + } else { ovs_be32 reass_ip = get_16aligned_be32(&l3_reass->ip_src); ovs_be32 frag_ip = diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 560694dbc..8c2c07898 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -411,8 +411,9 @@ enum dpdk_hw_ol_features { NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, NETDEV_RX_HW_CRC_STRIP = 1 << 1, NETDEV_RX_HW_SCATTER = 1 << 2, - NETDEV_TX_TSO_OFFLOAD = 1 << 3, - NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 4, + NETDEV_TX_IPV4_CKSUM_OFFLOAD = 1 << 3, + NETDEV_TX_TSO_OFFLOAD = 1 << 4, + NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 5, }; /* @@ -1039,6 +1040,10 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_KEEP_CRC; } + if (dev->hw_ol_features & NETDEV_TX_IPV4_CKSUM_OFFLOAD) { + conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_IPV4_CKSUM; + } + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { conf.txmode.offloads |= DPDK_TX_TSO_OFFLOAD_FLAGS; if (dev->hw_ol_features & NETDEV_TX_SCTP_CHECKSUM_OFFLOAD) { @@ -1179,6 +1184,12 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER; } + if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM) { + dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; + } else { + dev->hw_ol_features &= ~NETDEV_TX_IPV4_CKSUM_OFFLOAD; + } + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; if (userspace_tso_enabled()) { if ((info.tx_offload_capa & tx_tso_offload_capa) @@ -2195,13 +2206,16 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) { struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); - if (mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) { - mbuf->l2_len = (char *)dp_packet_l3(pkt) - (char *)dp_packet_eth(pkt); - mbuf->l3_len = (char *)dp_packet_l4(pkt) - (char *)dp_packet_l3(pkt); - mbuf->outer_l2_len = 0; - mbuf->outer_l3_len = 0; + if (!(mbuf->ol_flags & (RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_L4_MASK + | RTE_MBUF_F_TX_TCP_SEG))) { + return true; } + mbuf->l2_len = (char *) dp_packet_l3(pkt) - (char *) dp_packet_eth(pkt); + mbuf->l3_len = (char *) dp_packet_l4(pkt) - (char *) dp_packet_l3(pkt); + mbuf->outer_l2_len = 0; + mbuf->outer_l3_len = 0; + if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) { struct tcp_header *th = dp_packet_l4(pkt); @@ -2260,13 +2274,11 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, uint32_t nb_tx = 0; uint16_t nb_tx_prep = cnt; - if (userspace_tso_enabled()) { - nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); - if (nb_tx_prep != cnt) { - VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " - "Only %u/%u are valid: %s", dev->up.name, nb_tx_prep, - cnt, rte_strerror(rte_errno)); - } + nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); + if (nb_tx_prep != cnt) { + VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " + "Only %u/%u are valid: %s", netdev_get_name(&dev->up), + nb_tx_prep, cnt, rte_strerror(rte_errno)); } while (nb_tx != nb_tx_prep) { @@ -2605,11 +2617,19 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet *pkt_orig) memcpy(&pkt_dest->l2_pad_size, &pkt_orig->l2_pad_size, sizeof(struct dp_packet) - offsetof(struct dp_packet, l2_pad_size)); - if (mbuf_dest->ol_flags & RTE_MBUF_F_TX_L4_MASK) { - mbuf_dest->l2_len = (char *)dp_packet_l3(pkt_dest) - - (char *)dp_packet_eth(pkt_dest); - mbuf_dest->l3_len = (char *)dp_packet_l4(pkt_dest) + if (dp_packet_l3(pkt_dest)) { + if (dp_packet_eth(pkt_dest)) { + mbuf_dest->l2_len = (char *) dp_packet_l3(pkt_dest) + - (char *) dp_packet_eth(pkt_dest); + } else { + mbuf_dest->l2_len = 0; + } + if (dp_packet_l4(pkt_dest)) { + mbuf_dest->l3_len = (char *) dp_packet_l4(pkt_dest) - (char *) dp_packet_l3(pkt_dest); + } else { + mbuf_dest->l3_len = 0; + } } return pkt_dest; @@ -2667,11 +2687,9 @@ netdev_dpdk_common_send(struct netdev *netdev, struct dp_packet_batch *batch, pkt_cnt = cnt; /* Prepare each mbuf for hardware offloading. */ - if (userspace_tso_enabled()) { - cnt = netdev_dpdk_prep_hwol_batch(dev, pkts, pkt_cnt); - stats->tx_invalid_hwol_drops += pkt_cnt - cnt; - pkt_cnt = cnt; - } + cnt = netdev_dpdk_prep_hwol_batch(dev, pkts, pkt_cnt); + stats->tx_invalid_hwol_drops += pkt_cnt - cnt; + pkt_cnt = cnt; /* Apply Quality of Service policy. */ cnt = netdev_dpdk_qos_run(dev, pkts, pkt_cnt, true); @@ -5228,6 +5246,13 @@ netdev_dpdk_reconfigure(struct netdev *netdev) } err = dpdk_eth_dev_init(dev); + + if (dev->hw_ol_features & NETDEV_TX_IPV4_CKSUM_OFFLOAD) { + netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; + } else { + netdev->ol_flags &= ~NETDEV_TX_OFFLOAD_IPV4_CKSUM; + } + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c index 7467e9fbc..45a720e86 100644 --- a/lib/netdev-dummy.c +++ b/lib/netdev-dummy.c @@ -147,6 +147,11 @@ struct netdev_dummy { int requested_n_txq OVS_GUARDED; int requested_n_rxq OVS_GUARDED; int requested_numa_id OVS_GUARDED; + + /* Enable netdev IP csum offload. */ + bool ol_ip_csum OVS_GUARDED; + /* Flag RX packet with good csum. */ + bool ol_ip_csum_set_good OVS_GUARDED; }; /* Max 'recv_queue_len' in struct netdev_dummy. */ @@ -914,6 +919,13 @@ netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args, } } + netdev->ol_ip_csum_set_good = smap_get_bool(args, "ol_ip_csum_set_good", + false); + netdev->ol_ip_csum = smap_get_bool(args, "ol_ip_csum", false); + if (netdev->ol_ip_csum) { + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; + } + netdev_change_seq_changed(netdev_); /* 'dummy-pmd' specific config. */ @@ -1092,6 +1104,10 @@ netdev_dummy_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch, netdev->rxq_stats[rxq_->queue_id].bytes += dp_packet_size(packet); netdev->custom_stats[0].value++; netdev->custom_stats[1].value++; + if (netdev->ol_ip_csum_set_good) { + /* The netdev hardware sets the flag when the packet has good csum. */ + dp_packet_ol_set_ip_csum_good(packet); + } ovs_mutex_unlock(&netdev->mutex); dp_packet_batch_init_packet(batch, packet); @@ -1174,6 +1190,13 @@ netdev_dummy_send(struct netdev *netdev, int qid, } ovs_mutex_lock(&dev->mutex); + if (dp_packet_hwol_tx_ip_csum(packet)) { + if (!dp_packet_ip_checksum_good(packet)) { + dp_packet_ip_set_header_csum(packet); + dp_packet_ol_set_ip_csum_good(packet); + } + } + dev->stats.tx_packets++; dev->txq_stats[qid].packets++; dev->stats.tx_bytes += size; diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c index b89dfdd52..53055a254 100644 --- a/lib/netdev-native-tnl.c +++ b/lib/netdev-native-tnl.c @@ -88,7 +88,10 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl, ovs_be32 ip_src, ip_dst; - if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) { + /* A packet coming from a network device might have the + * csum already checked. In this case, skip the check. */ + if (OVS_UNLIKELY(!dp_packet_ip_checksum_good(packet)) + && !dp_packet_hwol_tx_ip_csum(packet)) { if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) { VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum"); return NULL; @@ -142,7 +145,8 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl, * * This function sets the IP header's ip_tot_len field (which should be zeroed * as part of 'header') and puts its value into '*ip_tot_size' as well. Also - * updates IP header checksum, as well as the l3 and l4 offsets in 'packet'. + * updates IP header checksum if not offloaded, as well as the l3 and l4 + * offsets in the 'packet'. * * Return pointer to the L4 header added to 'packet'. */ void * @@ -167,11 +171,16 @@ netdev_tnl_push_ip_header(struct dp_packet *packet, *ip_tot_size -= IPV6_HEADER_LEN; ip6->ip6_plen = htons(*ip_tot_size); packet->l4_ofs = dp_packet_size(packet) - *ip_tot_size; + dp_packet_hwol_set_tx_ipv6(packet); + dp_packet_ol_reset_ip_csum_good(packet); return ip6 + 1; } else { ip = netdev_tnl_ip_hdr(eth); ip->ip_tot_len = htons(*ip_tot_size); - ip->ip_csum = recalc_csum16(ip->ip_csum, 0, ip->ip_tot_len); + /* Postpone checksum to when the packet is pushed to the port. */ + dp_packet_hwol_set_tx_ipv4(packet); + dp_packet_hwol_set_tx_ip_csum(packet); + dp_packet_ol_reset_ip_csum_good(packet); *ip_tot_size -= IP_HEADER_LEN; packet->l4_ofs = dp_packet_size(packet) - *ip_tot_size; return ip + 1; @@ -190,7 +199,7 @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl, } if (udp->udp_csum) { - if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) { + if (OVS_UNLIKELY(!dp_packet_l4_checksum_good(packet))) { uint32_t csum; if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) { csum = packet_csum_pseudoheader6(dp_packet_l3(packet)); @@ -297,8 +306,8 @@ netdev_tnl_ip_build_header(struct ovs_action_push_tnl *data, ip->ip_frag_off = (params->flow->tunnel.flags & FLOW_TNL_F_DONT_FRAGMENT) ? htons(IP_DF) : 0; - /* Checksum has already been zeroed by eth_build_header. */ - ip->ip_csum = csum(ip, sizeof *ip); + /* The checksum will be calculated when the headers are pushed + * to the packet if offloading is not enabled. */ data->header_len += IP_HEADER_LEN; return ip + 1; diff --git a/lib/netdev.c b/lib/netdev.c index 818589246..13449cfc8 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -808,6 +808,14 @@ netdev_send_prepare_packet(const uint64_t netdev_flags, return false; } + /* Packet with IP csum offloading enabled was received with verified csum. + * Leave the IP csum offloading enabled even with good checksum to the + * netdev to decide what would be the best to do. + * Provide a software fallback in case the device doesn't support IP csum + * offloading. Note: Encapsulated packet must have the inner IP header + * csum already calculated. */ + dp_packet_ol_send_prepare(packet, netdev_flags); + l4_mask = dp_packet_hwol_l4_mask(packet); if (l4_mask) { if (dp_packet_hwol_l4_is_tcp(packet)) { @@ -975,7 +983,15 @@ netdev_push_header(const struct netdev *netdev, "not supported: packet dropped", netdev_get_name(netdev)); } else { + /* The packet is going to be encapsulated and there is + * no support yet for inner network header csum offloading. */ + if (dp_packet_hwol_tx_ip_csum(packet) + && !dp_packet_ip_checksum_good(packet)) { + dp_packet_ip_set_header_csum(packet); + } + netdev->netdev_class->push_header(netdev, packet, data); + pkt_metadata_init(&packet->md, data->out_port); dp_packet_batch_refill(batch, packet, i); } diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c index c28461ec1..93b6b6ccc 100644 --- a/lib/odp-execute-avx512.c +++ b/lib/odp-execute-avx512.c @@ -450,7 +450,6 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch, DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { struct ip_header *nh = dp_packet_l3(packet); - ovs_be16 old_csum = ~nh->ip_csum; /* Load the 20 bytes of the IPv4 header. Without options, which is the * most common case it's 20 bytes, but can be up to 60 bytes. */ @@ -463,13 +462,19 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch, * (v_pkt_masked). */ __m256i v_new_hdr = _mm256_or_si256(v_key_shuf, v_pkt_masked); - /* Update the IP checksum based on updated IP values. */ - uint16_t delta = avx512_ipv4_hdr_csum_delta(v_packet, v_new_hdr); - uint32_t new_csum = old_csum + delta; - delta = csum_finish(new_csum); + if (dp_packet_hwol_tx_ip_csum(packet)) { + dp_packet_ol_reset_ip_csum_good(packet); + } else { + ovs_be16 old_csum = ~nh->ip_csum; - /* Insert new checksum. */ - v_new_hdr = _mm256_insert_epi16(v_new_hdr, delta, 5); + /* Update the IP checksum based on updated IP values. */ + uint16_t delta = avx512_ipv4_hdr_csum_delta(v_packet, v_new_hdr); + uint32_t new_csum = old_csum + delta; + delta = csum_finish(new_csum); + + /* Insert new checksum. */ + v_new_hdr = _mm256_insert_epi16(v_new_hdr, delta, 5); + } /* If ip_src or ip_dst has been modified, L4 checksum needs to * be updated too. */ diff --git a/lib/odp-execute.c b/lib/odp-execute.c index 5cf6fbec0..37f0f717a 100644 --- a/lib/odp-execute.c +++ b/lib/odp-execute.c @@ -169,9 +169,14 @@ odp_set_ipv4(struct dp_packet *packet, const struct ovs_key_ipv4 *key, new_tos = key->ipv4_tos | (nh->ip_tos & ~mask->ipv4_tos); if (nh->ip_tos != new_tos) { - nh->ip_csum = recalc_csum16(nh->ip_csum, - htons((uint16_t) nh->ip_tos), - htons((uint16_t) new_tos)); + if (dp_packet_hwol_tx_ip_csum(packet)) { + dp_packet_ol_reset_ip_csum_good(packet); + } else { + nh->ip_csum = recalc_csum16(nh->ip_csum, + htons((uint16_t) nh->ip_tos), + htons((uint16_t) new_tos)); + } + nh->ip_tos = new_tos; } } @@ -180,8 +185,14 @@ odp_set_ipv4(struct dp_packet *packet, const struct ovs_key_ipv4 *key, new_ttl = key->ipv4_ttl | (nh->ip_ttl & ~mask->ipv4_ttl); if (OVS_LIKELY(nh->ip_ttl != new_ttl)) { - nh->ip_csum = recalc_csum16(nh->ip_csum, htons(nh->ip_ttl << 8), - htons(new_ttl << 8)); + if (dp_packet_hwol_tx_ip_csum(packet)) { + dp_packet_ol_reset_ip_csum_good(packet); + } else { + nh->ip_csum = recalc_csum16(nh->ip_csum, + htons(nh->ip_ttl << 8), + htons(new_ttl << 8)); + } + nh->ip_ttl = new_ttl; } } diff --git a/lib/packets.c b/lib/packets.c index 06f516cb1..36d9ec5b9 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -1144,7 +1144,12 @@ packet_set_ipv4_addr(struct dp_packet *packet, } } } - nh->ip_csum = recalc_csum32(nh->ip_csum, old_addr, new_addr); + + if (dp_packet_hwol_tx_ip_csum(packet)) { + dp_packet_ol_reset_ip_csum_good(packet); + } else { + nh->ip_csum = recalc_csum32(nh->ip_csum, old_addr, new_addr); + } put_16aligned_be32(addr, new_addr); } @@ -1311,16 +1316,26 @@ packet_set_ipv4(struct dp_packet *packet, ovs_be32 src, ovs_be32 dst, if (nh->ip_tos != tos) { uint8_t *field = &nh->ip_tos; - nh->ip_csum = recalc_csum16(nh->ip_csum, htons((uint16_t) *field), - htons((uint16_t) tos)); + if (dp_packet_hwol_tx_ip_csum(packet)) { + dp_packet_ol_reset_ip_csum_good(packet); + } else { + nh->ip_csum = recalc_csum16(nh->ip_csum, htons((uint16_t) *field), + htons((uint16_t) tos)); + } + *field = tos; } if (nh->ip_ttl != ttl) { uint8_t *field = &nh->ip_ttl; - nh->ip_csum = recalc_csum16(nh->ip_csum, htons(*field << 8), - htons(ttl << 8)); + if (dp_packet_hwol_tx_ip_csum(packet)) { + dp_packet_ol_reset_ip_csum_good(packet); + } else { + nh->ip_csum = recalc_csum16(nh->ip_csum, htons(*field << 8), + htons(ttl << 8)); + } + *field = ttl; } } @@ -1931,8 +1946,13 @@ IP_ECN_set_ce(struct dp_packet *pkt, bool is_ipv6) tos |= IP_ECN_CE; if (nh->ip_tos != tos) { - nh->ip_csum = recalc_csum16(nh->ip_csum, htons(nh->ip_tos), - htons((uint16_t) tos)); + if (dp_packet_hwol_tx_ip_csum(pkt)) { + dp_packet_ol_reset_ip_csum_good(pkt); + } else { + nh->ip_csum = recalc_csum16(nh->ip_csum, htons(nh->ip_tos), + htons((uint16_t) tos)); + } + nh->ip_tos = tos; } } diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at index 8a663a4b6..f0f1ccfe5 100644 --- a/tests/dpif-netdev.at +++ b/tests/dpif-netdev.at @@ -734,3 +734,81 @@ AT_CHECK([test `ovs-vsctl get Interface p2 statistics:tx_q0_packets` -gt 0 -a dn OVS_VSWITCHD_STOP AT_CLEANUP + +AT_SETUP([userspace offload - ip csum offload]) +OVS_VSWITCHD_START( + [add-br br1 -- set bridge br1 datapath-type=dummy -- \ + add-port br1 p1 -- \ + set Interface p1 type=dummy -- \ + add-port br1 p2 -- \ + set Interface p2 type=dummy --]) + +# Modify the ip_dst addr to force changing the IP csum. +AT_CHECK([ovs-ofctl add-flow br1 in_port=p1,actions=mod_nw_dst:192.168.1.1,output:p2]) + +# Check if no offload remains ok. +AT_CHECK([ovs-vsctl set Interface p2 options:tx_pcap=p2.pcap]) +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum=false]) +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum_set_good=false]) +AT_CHECK([ovs-appctl netdev-dummy/receive p1 \ +0a8f394fe0738abf7e2f058408004500003433e0400040068f8fc0a87b02c0a87b01d4781451a962ad5417ed297b801000e547fd00000101080a2524d2345c7fe1c4 +]) + +# Checksum should change to 0x990 with ip_dst changed to 192.168.1.1 +# by the datapath while processing the packet. +AT_CHECK([ovs-pcap p2.pcap > p2.pcap.txt 2>&1]) +AT_CHECK([tail -n 1 p2.pcap.txt], [0], [dnl +0a8f394fe0738abf7e2f058408004500003433e0400040060990c0a87b02c0a80101d4781451a962ad5417ed297b801000e5c1fd00000101080a2524d2345c7fe1c4 +]) + +# Check if packets entering the datapath with csum offloading +# enabled gets the csum updated properly by egress handling +# in the datapath and not by the netdev. +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum=false]) +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum_set_good=true]) +AT_CHECK([ovs-appctl netdev-dummy/receive p1 \ +0a8f394fe0738abf7e2f058408004500003433e0400040068f8fc0a87b02c0a87b01d4781451a962ad5417ed297b801000e547fd00000101080a2524d2345c7fe1c4 +]) +AT_CHECK([ovs-pcap p2.pcap > p2.pcap.txt 2>&1]) +AT_CHECK([tail -n 1 p2.pcap.txt], [0], [dnl +0a8f394fe0738abf7e2f058408004500003433e0400040060990c0a87b02c0a80101d4781451a962ad5417ed297b801000e5c1fd00000101080a2524d2345c7fe1c4 +]) + +# Check if packets entering the datapath with csum offloading +# enabled gets the csum updated properly by netdev and not +# by the datapath. +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum=true]) +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum_set_good=true]) +AT_CHECK([ovs-appctl netdev-dummy/receive p1 \ +0a8f394fe0738abf7e2f058408004500003433e0400040068f8fc0a87b02c0a87b01d4781451a962ad5417ed297b801000e547fd00000101080a2524d2345c7fe1c4 +]) +AT_CHECK([ovs-pcap p2.pcap > p2.pcap.txt 2>&1]) +AT_CHECK([tail -n 1 p2.pcap.txt], [0], [dnl +0a8f394fe0738abf7e2f058408004500003433e0400040060990c0a87b02c0a80101d4781451a962ad5417ed297b801000e5c1fd00000101080a2524d2345c7fe1c4 +]) + +# Push a packet with bad csum and offloading disabled to check +# if the datapath updates the csum, but does not fix the issue. +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum=false]) +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum_set_good=false]) +AT_CHECK([ovs-appctl netdev-dummy/receive p1 \ +0a8f394fe0738abf7e2f058408004500003433e0400040068f03c0a87b02c0a87b01d4781451a962ad5417ed297b801000e547fd00000101080a2524d2345c7fe1c4 +]) +AT_CHECK([ovs-pcap p2.pcap > p2.pcap.txt 2>&1]) +AT_CHECK([tail -n 1 p2.pcap.txt], [0], [dnl +0a8f394fe0738abf7e2f058408004500003433e0400040060904c0a87b02c0a80101d4781451a962ad5417ed297b801000e5c1fd00000101080a2524d2345c7fe1c4 +]) + +# Push a packet with bad csum and offloading enabled to check +# if the driver updates and fixes the csum. +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum=true]) +AT_CHECK([ovs-vsctl set Interface p1 options:ol_ip_csum_set_good=true]) +AT_CHECK([ovs-appctl netdev-dummy/receive p1 \ +0a8f394fe0738abf7e2f058408004500003433e0400040068f03c0a87b02c0a87b01d4781451a962ad5417ed297b801000e547fd00000101080a2524d2345c7fe1c4 +]) +AT_CHECK([ovs-pcap p2.pcap > p2.pcap.txt 2>&1]) +AT_CHECK([tail -n 1 p2.pcap.txt], [0], [dnl +0a8f394fe0738abf7e2f058408004500003433e0400040060990c0a87b02c0a80101d4781451a962ad5417ed297b801000e5c1fd00000101080a2524d2345c7fe1c4 +]) +OVS_VSWITCHD_STOP +AT_CLEANUP From patchwork Mon Mar 27 10:50:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Pattrick X-Patchwork-Id: 1761505 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=MhgLXrob; dkim-atps=neutral Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4PlV3H0Lxzz1yXq for ; Mon, 27 Mar 2023 21:50:50 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id BF99540AE2; Mon, 27 Mar 2023 10:50:48 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org BF99540AE2 Authentication-Results: smtp2.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=MhgLXrob X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 37Lg7rti2WNQ; Mon, 27 Mar 2023 10:50:41 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp2.osuosl.org (Postfix) with ESMTPS id 9896940AF0; Mon, 27 Mar 2023 10:50:39 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org 9896940AF0 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 32872C0032; Mon, 27 Mar 2023 10:50:39 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 41B1AC0032 for ; Mon, 27 Mar 2023 10:50:38 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 2CC3941625 for ; Mon, 27 Mar 2023 10:50:36 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 2CC3941625 Authentication-Results: smtp4.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=MhgLXrob X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 92tjl0xpPfhP for ; Mon, 27 Mar 2023 10:50:32 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org B2CEF409DB Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id B2CEF409DB for ; Mon, 27 Mar 2023 10:50:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679914230; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jRqBGvOsJLl9ubF4d1lrPTKS4u9gTfNMMT6IhnbOgbQ=; b=MhgLXrobmo6CxuKZUmlhq65hGq+rzXsw+nduPOMp2dYr7aD574pd15Pqvgljh04jDnp9rM B3QQPk7a4gemqbrA7bv2q1OQJKi5XYA8/wXXUZr0M2Q4ziCz7A89UriJDSeNXkXsFgBbDN buCeXTYFKo4dsAv82IlKDOfEL3dEod0= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-159-Y33Oc76xPcKY8lSlSBtgOA-1; Mon, 27 Mar 2023 06:50:27 -0400 X-MC-Unique: Y33Oc76xPcKY8lSlSBtgOA-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1E7FE185A7AC; Mon, 27 Mar 2023 10:50:27 +0000 (UTC) Received: from mpattric.remote.csb (unknown [10.22.9.221]) by smtp.corp.redhat.com (Postfix) with ESMTP id 680112166B26; Mon, 27 Mar 2023 10:50:26 +0000 (UTC) From: Mike Pattrick To: dev@openvswitch.org, david.marchand@redhat.com Date: Mon, 27 Mar 2023 06:50:13 -0400 Message-Id: <20230327105013.491103-5-mkp@redhat.com> In-Reply-To: <20230327105013.491103-4-mkp@redhat.com> References: <20230327105013.491103-1-mkp@redhat.com> <20230327105013.491103-2-mkp@redhat.com> <20230327105013.491103-3-mkp@redhat.com> <20230327105013.491103-4-mkp@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.6 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: Flavio Leitner , i.maximets@ovn.org Subject: [ovs-dev] [PATCH v11 4/4] userspace: Enable L4 checksum offloading by default. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Flavio Leitner The netdev receiving packets is supposed to provide the flags indicating if the L4 checksum was verified and it is OK or BAD, otherwise the stack will check when appropriate by software. If the packet comes with good checksum, then postpone the checksum calculation to the egress device if needed. When encapsulate a packet with that flag, set the checksum of the inner L4 header since that is not yet supported. Calculate the L4 checksum when the packet is going to be sent over a device that doesn't support the feature. Linux tap devices allows enabling L3 and L4 offload, so this patch enables the feature. However, Linux socket interface remains disabled because the API doesn't allow enabling those two features without enabling TSO too. Signed-off-by: Flavio Leitner Co-authored-by: Mike Pattrick Signed-off-by: Mike Pattrick Reviewed-by: Simon Horman --- Since v9: - Extended miniflow_extract changes into avx512 code - Formatting changes - Note that we cannot currently enable checksum offloading in CONFIGURE_VETH_OFFLOADS for check-system-userspace as netdev-linux.c currently only parses the vnet header if TSO is enabled. Since v10: - No change Signed-off-by: Mike Pattrick --- lib/conntrack.c | 15 +- lib/dp-packet.c | 25 ++++ lib/dp-packet.h | 78 +++++++++- lib/dpif-netdev-extract-avx512.c | 62 +++++++- lib/flow.c | 23 +++ lib/netdev-dpdk.c | 174 +++++++++++++++------- lib/netdev-linux.c | 242 +++++++++++++++++++++---------- lib/netdev-native-tnl.c | 32 +--- lib/netdev.c | 46 ++---- lib/odp-execute-avx512.c | 24 +-- lib/packets.c | 175 +++++++++++++++++----- lib/packets.h | 3 + 12 files changed, 645 insertions(+), 254 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 54166c320..75f3d7fee 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -2044,13 +2044,12 @@ conn_key_extract(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 dl_type, } if (ok) { - bool hwol_bad_l4_csum = dp_packet_l4_checksum_bad(pkt); - if (!hwol_bad_l4_csum) { - bool hwol_good_l4_csum = dp_packet_l4_checksum_good(pkt) - || dp_packet_hwol_tx_l4_checksum(pkt); + if (!dp_packet_l4_checksum_bad(pkt)) { /* Validate the checksum only when hwol is not supported. */ if (extract_l4(&ctx->key, l4, dp_packet_l4_size(pkt), - &ctx->icmp_related, l3, !hwol_good_l4_csum, + &ctx->icmp_related, l3, + !dp_packet_l4_checksum_good(pkt) && + !dp_packet_hwol_tx_l4_checksum(pkt), NULL)) { ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis); return true; @@ -3379,8 +3378,10 @@ handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, adj_seqnum(&th->tcp_seq, ec->seq_skew); } - th->tcp_csum = 0; - if (!dp_packet_hwol_tx_l4_checksum(pkt)) { + if (dp_packet_hwol_tx_l4_checksum(pkt)) { + dp_packet_ol_reset_l4_csum_good(pkt); + } else { + th->tcp_csum = 0; if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto, dp_packet_l4_size(pkt)); diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 61c36de44..dfedd0e9b 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -38,6 +38,9 @@ dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source so dp_packet_init_specific(b); /* By default assume the packet type to be Ethernet. */ b->packet_type = htonl(PT_ETH); + /* Reset csum start and offset. */ + b->csum_start = 0; + b->csum_offset = 0; } static void @@ -544,4 +547,26 @@ dp_packet_ol_send_prepare(struct dp_packet *p, uint64_t flags) dp_packet_ol_set_ip_csum_good(p); dp_packet_hwol_reset_tx_ip_csum(p); } + + if (dp_packet_l4_checksum_good(p) || !dp_packet_hwol_tx_l4_checksum(p)) { + dp_packet_hwol_reset_tx_l4_csum(p); + return; + } + + if (dp_packet_hwol_l4_is_tcp(p) + && !(flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) { + packet_tcp_complete_csum(p); + dp_packet_ol_set_l4_csum_good(p); + dp_packet_hwol_reset_tx_l4_csum(p); + } else if (dp_packet_hwol_l4_is_udp(p) + && !(flags & NETDEV_TX_OFFLOAD_UDP_CKSUM)) { + packet_udp_complete_csum(p); + dp_packet_ol_set_l4_csum_good(p); + dp_packet_hwol_reset_tx_l4_csum(p); + } else if (!(flags & NETDEV_TX_OFFLOAD_SCTP_CKSUM) + && dp_packet_hwol_l4_is_sctp(p)) { + packet_sctp_complete_csum(p); + dp_packet_ol_set_l4_csum_good(p); + dp_packet_hwol_reset_tx_l4_csum(p); + } } diff --git a/lib/dp-packet.h b/lib/dp-packet.h index af0a2b7f0..c37c85857 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -140,6 +140,8 @@ struct dp_packet { or UINT16_MAX. */ uint32_t cutlen; /* length in bytes to cut from the end. */ ovs_be32 packet_type; /* Packet type as defined in OpenFlow */ + uint16_t csum_start; /* Position to start checksumming from. */ + uint16_t csum_offset; /* Offset to place checksum. */ union { struct pkt_metadata md; uint64_t data[DP_PACKET_CONTEXT_SIZE / 8]; @@ -997,6 +999,13 @@ dp_packet_hwol_is_ipv4(const struct dp_packet *b) return !!(*dp_packet_ol_flags_ptr(b) & DP_PACKET_OL_TX_IPV4); } +/* Returns 'true' if packet 'p' is marked as IPv6. */ +static inline bool +dp_packet_hwol_tx_ipv6(const struct dp_packet *p) +{ + return !!(*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_TX_IPV6); +} + /* Returns 'true' if packet 'b' is marked for TCP checksum offloading. */ static inline bool dp_packet_hwol_l4_is_tcp(const struct dp_packet *b) @@ -1021,18 +1030,26 @@ dp_packet_hwol_l4_is_sctp(struct dp_packet *b) DP_PACKET_OL_TX_SCTP_CKSUM; } -/* Mark packet 'b' for IPv4 checksum offloading. */ static inline void -dp_packet_hwol_set_tx_ipv4(struct dp_packet *b) +dp_packet_hwol_reset_tx_l4_csum(struct dp_packet *p) +{ + *dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_TX_L4_MASK; +} + +/* Mark packet 'p' as IPv4. */ +static inline void +dp_packet_hwol_set_tx_ipv4(struct dp_packet *p) { - *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_IPV4; + *dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_TX_IPV6; + *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_TX_IPV4; } -/* Mark packet 'b' for IPv6 checksum offloading. */ +/* Mark packet 'a' as IPv6. */ static inline void -dp_packet_hwol_set_tx_ipv6(struct dp_packet *b) +dp_packet_hwol_set_tx_ipv6(struct dp_packet *a) { - *dp_packet_ol_flags_ptr(b) |= DP_PACKET_OL_TX_IPV6; + *dp_packet_ol_flags_ptr(a) &= ~DP_PACKET_OL_TX_IPV4; + *dp_packet_ol_flags_ptr(a) |= DP_PACKET_OL_TX_IPV6; } /* Returns 'true' if packet 'p' is marked for IPv4 checksum offloading. */ @@ -1147,6 +1164,55 @@ dp_packet_l4_checksum_bad(const struct dp_packet *p) DP_PACKET_OL_RX_L4_CKSUM_BAD; } +/* Returns 'true' if the packet has good integrity though the + * checksum in the packet 'p' is not complete. */ +static inline bool +dp_packet_ol_l4_csum_partial(const struct dp_packet *p) +{ + return (*dp_packet_ol_flags_ptr(p) & DP_PACKET_OL_RX_L4_CKSUM_MASK) == + DP_PACKET_OL_RX_L4_CKSUM_MASK; +} + +/* Marks packet 'p' with good integrity though the checksum in the + * packet is not complete. */ +static inline void +dp_packet_ol_set_l4_csum_partial(struct dp_packet *p) +{ + *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_RX_L4_CKSUM_MASK; +} + +/* Marks packet 'p' with good L4 checksum. */ +static inline void +dp_packet_ol_set_l4_csum_good(struct dp_packet *p) +{ + *dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_RX_L4_CKSUM_BAD; + *dp_packet_ol_flags_ptr(p) |= DP_PACKET_OL_RX_L4_CKSUM_GOOD; +} + +/* Marks packet 'p' with good L4 checksum as modified. */ +static inline void +dp_packet_ol_reset_l4_csum_good(struct dp_packet *p) +{ + if (!dp_packet_ol_l4_csum_partial(p)) { + *dp_packet_ol_flags_ptr(p) &= ~DP_PACKET_OL_RX_L4_CKSUM_GOOD; + } +} + +/* Marks packet 'p' with good integrity if the 'start' and 'offset' + * matches with the 'csum_start' and 'csum_offset' in packet 'p'. + * The 'start' is the offset from the begin of the packet headers. + * The 'offset' is the offset from start to place the checksum. + * The csum_start and csum_offset fields are set from the virtio_net_hdr + * struct that may be provided by a netdev on packet ingress. */ +static inline void +dp_packet_ol_vnet_csum_check(struct dp_packet *p, uint16_t start, + uint16_t offset) +{ + if (p->csum_start == start && p->csum_offset == offset) { + dp_packet_ol_set_l4_csum_partial(p); + } +} + static inline uint32_t ALWAYS_INLINE dp_packet_calc_hash_ipv4(const uint8_t *pkt, const uint16_t l3_ofs, uint32_t hash) diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c index 66884eaf0..a9299d229 100644 --- a/lib/dpif-netdev-extract-avx512.c +++ b/lib/dpif-netdev-extract-avx512.c @@ -698,7 +698,6 @@ mfex_ipv6_set_l2_pad_size(struct dp_packet *pkt, return -1; } dp_packet_set_l2_pad_size(pkt, len_from_ipv6 - (p_len + IPV6_HEADER_LEN)); - dp_packet_hwol_set_tx_ipv6(pkt); return 0; } @@ -729,10 +728,6 @@ mfex_ipv4_set_l2_pad_size(struct dp_packet *pkt, struct ip_header *nh, return -1; } dp_packet_set_l2_pad_size(pkt, len_from_ipv4 - ip_tot_len); - dp_packet_hwol_set_tx_ipv4(pkt); - if (dp_packet_ip_checksum_good(pkt)) { - dp_packet_hwol_set_tx_ip_csum(pkt); - } return 0; } @@ -763,6 +758,45 @@ mfex_check_tcp_data_offset(const struct tcp_header *tcp) return ret; } +static void +mfex_ipv4_set_hwol(struct dp_packet *pkt) +{ + dp_packet_hwol_set_tx_ipv4(pkt); + if (dp_packet_ip_checksum_good(pkt)) { + dp_packet_hwol_set_tx_ip_csum(pkt); + } +} + +static void +mfex_ipv6_set_hwol(struct dp_packet *pkt) +{ + dp_packet_hwol_set_tx_ipv6(pkt); +} + +static void +mfex_tcp_set_hwol(struct dp_packet *pkt) +{ + dp_packet_ol_vnet_csum_check(pkt, pkt->l4_ofs, + offsetof(struct tcp_header, + tcp_csum)); + if (dp_packet_l4_checksum_good(pkt) + || dp_packet_ol_l4_csum_partial(pkt)) { + dp_packet_hwol_set_csum_tcp(pkt); + } +} + +static void +mfex_udp_set_hwol(struct dp_packet *pkt) +{ + dp_packet_ol_vnet_csum_check(pkt, pkt->l4_ofs, + offsetof(struct udp_header, + udp_csum)); + if (dp_packet_l4_checksum_good(pkt) + || dp_packet_ol_l4_csum_partial(pkt)) { + dp_packet_hwol_set_csum_udp(pkt); + } +} + /* Generic loop to process any mfex profile. This code is specialized into * multiple actual MFEX implementation functions. Its marked ALWAYS_INLINE * to ensure the compiler specializes each instance. The code is marked "hot" @@ -864,6 +898,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, const struct tcp_header *tcp = (void *)&pkt[38]; mfex_handle_tcp_flags(tcp, &blocks[7]); dp_packet_update_rss_hash_ipv4_tcp_udp(packet); + mfex_ipv4_set_hwol(packet); + mfex_tcp_set_hwol(packet); } break; case PROFILE_ETH_VLAN_IPV4_UDP: { @@ -876,6 +912,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, continue; } dp_packet_update_rss_hash_ipv4_tcp_udp(packet); + mfex_ipv4_set_hwol(packet); + mfex_udp_set_hwol(packet); } break; case PROFILE_ETH_IPV4_TCP: { @@ -891,6 +929,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, continue; } dp_packet_update_rss_hash_ipv4_tcp_udp(packet); + mfex_ipv4_set_hwol(packet); + mfex_tcp_set_hwol(packet); } break; case PROFILE_ETH_IPV4_UDP: { @@ -902,6 +942,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, continue; } dp_packet_update_rss_hash_ipv4_tcp_udp(packet); + mfex_ipv4_set_hwol(packet); + mfex_udp_set_hwol(packet); } break; case PROFILE_ETH_IPV6_UDP: { @@ -920,6 +962,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, /* Process UDP header. */ mfex_handle_ipv6_l4((void *)&pkt[54], &blocks[9]); dp_packet_update_rss_hash_ipv6_tcp_udp(packet); + mfex_ipv6_set_hwol(packet); + mfex_udp_set_hwol(packet); } break; case PROFILE_ETH_IPV6_TCP: { @@ -943,6 +987,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, } mfex_handle_tcp_flags(tcp, &blocks[9]); dp_packet_update_rss_hash_ipv6_tcp_udp(packet); + mfex_ipv6_set_hwol(packet); + mfex_tcp_set_hwol(packet); } break; case PROFILE_ETH_VLAN_IPV6_TCP: { @@ -969,6 +1015,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, } mfex_handle_tcp_flags(tcp, &blocks[10]); dp_packet_update_rss_hash_ipv6_tcp_udp(packet); + mfex_ipv6_set_hwol(packet); + mfex_tcp_set_hwol(packet); } break; case PROFILE_ETH_VLAN_IPV6_UDP: { @@ -990,6 +1038,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, /* Process UDP header. */ mfex_handle_ipv6_l4((void *)&pkt[58], &blocks[10]); dp_packet_update_rss_hash_ipv6_tcp_udp(packet); + mfex_ipv6_set_hwol(packet); + mfex_udp_set_hwol(packet); } break; case PROFILE_ETH_IPV4_NVGRE: { @@ -1000,6 +1050,8 @@ mfex_avx512_process(struct dp_packet_batch *packets, continue; } dp_packet_update_rss_hash_ipv4(packet); + mfex_ipv4_set_hwol(packet); + mfex_udp_set_hwol(packet); } break; default: diff --git a/lib/flow.c b/lib/flow.c index 6c8bf7fc0..5aaf3b420 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -1027,6 +1027,13 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) } else if (dl_type == htons(ETH_TYPE_IPV6)) { dp_packet_update_rss_hash_ipv6_tcp_udp(packet); } + dp_packet_ol_vnet_csum_check(packet, packet->l4_ofs, + offsetof(struct tcp_header, + tcp_csum)); + if (dp_packet_l4_checksum_good(packet) + || dp_packet_ol_l4_csum_partial(packet)) { + dp_packet_hwol_set_csum_tcp(packet); + } } } } else if (OVS_LIKELY(nw_proto == IPPROTO_UDP)) { @@ -1042,6 +1049,13 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) } else if (dl_type == htons(ETH_TYPE_IPV6)) { dp_packet_update_rss_hash_ipv6_tcp_udp(packet); } + dp_packet_ol_vnet_csum_check(packet, packet->l4_ofs, + offsetof(struct udp_header, + udp_csum)); + if (dp_packet_l4_checksum_good(packet) + || dp_packet_ol_l4_csum_partial(packet)) { + dp_packet_hwol_set_csum_udp(packet); + } } } else if (OVS_LIKELY(nw_proto == IPPROTO_SCTP)) { if (OVS_LIKELY(size >= SCTP_HEADER_LEN)) { @@ -1051,6 +1065,13 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst) miniflow_push_be16(mf, tp_dst, sctp->sctp_dst); miniflow_push_be16(mf, ct_tp_src, ct_tp_src); miniflow_push_be16(mf, ct_tp_dst, ct_tp_dst); + dp_packet_ol_vnet_csum_check(packet, packet->l4_ofs, + offsetof(struct sctp_header, + sctp_csum)); + if (dp_packet_l4_checksum_good(packet) + || dp_packet_ol_l4_csum_partial(packet)) { + dp_packet_hwol_set_csum_sctp(packet); + } } } else if (OVS_LIKELY(nw_proto == IPPROTO_ICMP)) { if (OVS_LIKELY(size >= ICMP_HEADER_LEN)) { @@ -3170,6 +3191,7 @@ flow_compose_l4_csum(struct dp_packet *p, const struct flow *flow, tcp->tcp_csum = 0; tcp->tcp_csum = csum_finish(csum_continue(pseudo_hdr_csum, tcp, l4_len)); + dp_packet_ol_set_l4_csum_good(p); } else if (flow->nw_proto == IPPROTO_UDP) { struct udp_header *udp = dp_packet_l4(p); @@ -3179,6 +3201,7 @@ flow_compose_l4_csum(struct dp_packet *p, const struct flow *flow, if (!udp->udp_csum) { udp->udp_csum = htons(0xffff); } + dp_packet_ol_set_l4_csum_good(p); } else if (flow->nw_proto == IPPROTO_ICMP) { struct icmp_header *icmp = dp_packet_l4(p); diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 8c2c07898..31ac534f6 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -412,8 +412,10 @@ enum dpdk_hw_ol_features { NETDEV_RX_HW_CRC_STRIP = 1 << 1, NETDEV_RX_HW_SCATTER = 1 << 2, NETDEV_TX_IPV4_CKSUM_OFFLOAD = 1 << 3, - NETDEV_TX_TSO_OFFLOAD = 1 << 4, - NETDEV_TX_SCTP_CHECKSUM_OFFLOAD = 1 << 5, + NETDEV_TX_TCP_CKSUM_OFFLOAD = 1 << 4, + NETDEV_TX_UDP_CKSUM_OFFLOAD = 1 << 5, + NETDEV_TX_SCTP_CKSUM_OFFLOAD = 1 << 6, + NETDEV_TX_TSO_OFFLOAD = 1 << 7, }; /* @@ -1008,6 +1010,35 @@ dpdk_watchdog(void *dummy OVS_UNUSED) return NULL; } +static void +netdev_dpdk_update_netdev_flag(struct netdev_dpdk *dev, + enum dpdk_hw_ol_features hw_ol_features, + enum netdev_ol_flags flag) +{ + struct netdev *netdev = &dev->up; + + if (dev->hw_ol_features & hw_ol_features) { + netdev->ol_flags |= flag; + } else { + netdev->ol_flags &= ~flag; + } +} + +static void +netdev_dpdk_update_netdev_flags(struct netdev_dpdk *dev) +{ + netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_IPV4_CKSUM_OFFLOAD, + NETDEV_TX_OFFLOAD_IPV4_CKSUM); + netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_TCP_CKSUM_OFFLOAD, + NETDEV_TX_OFFLOAD_TCP_CKSUM); + netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_UDP_CKSUM_OFFLOAD, + NETDEV_TX_OFFLOAD_UDP_CKSUM); + netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_SCTP_CKSUM_OFFLOAD, + NETDEV_TX_OFFLOAD_SCTP_CKSUM); + netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_TSO_OFFLOAD, + NETDEV_TX_OFFLOAD_TCP_TSO); +} + static int dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) { @@ -1044,11 +1075,20 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq) conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_IPV4_CKSUM; } + if (dev->hw_ol_features & NETDEV_TX_TCP_CKSUM_OFFLOAD) { + conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_CKSUM; + } + + if (dev->hw_ol_features & NETDEV_TX_UDP_CKSUM_OFFLOAD) { + conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_UDP_CKSUM; + } + + if (dev->hw_ol_features & NETDEV_TX_SCTP_CKSUM_OFFLOAD) { + conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_SCTP_CKSUM; + } + if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { - conf.txmode.offloads |= DPDK_TX_TSO_OFFLOAD_FLAGS; - if (dev->hw_ol_features & NETDEV_TX_SCTP_CHECKSUM_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_SCTP_CKSUM; - } + conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_TSO; } /* Limit configured rss hash functions to only those supported @@ -1154,7 +1194,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) struct rte_ether_addr eth_addr; int diag; int n_rxq, n_txq; - uint32_t tx_tso_offload_capa = DPDK_TX_TSO_OFFLOAD_FLAGS; uint32_t rx_chksm_offload_capa = RTE_ETH_RX_OFFLOAD_UDP_CKSUM | RTE_ETH_RX_OFFLOAD_TCP_CKSUM | RTE_ETH_RX_OFFLOAD_IPV4_CKSUM; @@ -1190,18 +1229,28 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) dev->hw_ol_features &= ~NETDEV_TX_IPV4_CKSUM_OFFLOAD; } + if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) { + dev->hw_ol_features |= NETDEV_TX_TCP_CKSUM_OFFLOAD; + } else { + dev->hw_ol_features &= ~NETDEV_TX_TCP_CKSUM_OFFLOAD; + } + + if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_UDP_CKSUM) { + dev->hw_ol_features |= NETDEV_TX_UDP_CKSUM_OFFLOAD; + } else { + dev->hw_ol_features &= ~NETDEV_TX_UDP_CKSUM_OFFLOAD; + } + + if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_SCTP_CKSUM) { + dev->hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; + } else { + dev->hw_ol_features &= ~NETDEV_TX_SCTP_CKSUM_OFFLOAD; + } + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; if (userspace_tso_enabled()) { - if ((info.tx_offload_capa & tx_tso_offload_capa) - == tx_tso_offload_capa) { + if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_TSO) { dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_SCTP_CKSUM) { - dev->hw_ol_features |= NETDEV_TX_SCTP_CHECKSUM_OFFLOAD; - } else { - VLOG_WARN("%s: Tx SCTP checksum offload is not supported, " - "SCTP packets sent to this device will be dropped", - netdev_get_name(&dev->up)); - } } else { VLOG_WARN("%s: Tx TSO offload is not supported.", netdev_get_name(&dev->up)); @@ -2213,6 +2262,7 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) mbuf->l2_len = (char *) dp_packet_l3(pkt) - (char *) dp_packet_eth(pkt); mbuf->l3_len = (char *) dp_packet_l4(pkt) - (char *) dp_packet_l3(pkt); + mbuf->l4_len = 0; mbuf->outer_l2_len = 0; mbuf->outer_l3_len = 0; @@ -4149,6 +4199,7 @@ new_device(int vid) ovs_mutex_lock(&dev->mutex); if (nullable_string_is_equal(ifname, dev->vhost_id)) { uint32_t qp_num = rte_vhost_get_vring_num(vid) / VIRTIO_QNUM; + uint64_t features; /* Get NUMA information */ newnode = rte_vhost_get_numa_node(vid); @@ -4173,6 +4224,36 @@ new_device(int vid) dev->vhost_reconfigured = true; } + if (rte_vhost_get_negotiated_features(vid, &features)) { + VLOG_INFO("Error checking guest features for " + "vHost Device '%s'", dev->vhost_id); + } else { + if (features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)) { + dev->hw_ol_features |= NETDEV_TX_TCP_CKSUM_OFFLOAD; + dev->hw_ol_features |= NETDEV_TX_UDP_CKSUM_OFFLOAD; + dev->hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; + } + + if (userspace_tso_enabled()) { + if (features & (1ULL << VIRTIO_NET_F_GUEST_TSO4) + && features & (1ULL << VIRTIO_NET_F_GUEST_TSO6)) { + + dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; + VLOG_DBG("%s: TSO enabled on vhost port", + netdev_get_name(&dev->up)); + } else { + VLOG_WARN("%s: Tx TSO offload is not supported.", + netdev_get_name(&dev->up)); + } + } + } + + /* There is no support in virtio net to offload IPv4 csum, + * but the vhost library handles IPv4 csum offloading fine. */ + dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; + + netdev_dpdk_update_netdev_flags(dev); + ovsrcu_index_set(&dev->vid, vid); exists = true; @@ -4236,6 +4317,14 @@ destroy_device(int vid) dev->up.n_rxq * sizeof *dev->vhost_rxq_enabled); netdev_dpdk_txq_map_clear(dev); + /* Clear offload capabilities before next new_device. */ + dev->hw_ol_features &= ~NETDEV_TX_IPV4_CKSUM_OFFLOAD; + dev->hw_ol_features &= ~NETDEV_TX_TCP_CKSUM_OFFLOAD; + dev->hw_ol_features &= ~NETDEV_TX_UDP_CKSUM_OFFLOAD; + dev->hw_ol_features &= ~NETDEV_TX_SCTP_CKSUM_OFFLOAD; + dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; + netdev_dpdk_update_netdev_flags(dev); + netdev_change_seq_changed(&dev->up); ovs_mutex_unlock(&dev->mutex); exists = true; @@ -5246,22 +5335,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev) } err = dpdk_eth_dev_init(dev); - - if (dev->hw_ol_features & NETDEV_TX_IPV4_CKSUM_OFFLOAD) { - netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; - } else { - netdev->ol_flags &= ~NETDEV_TX_OFFLOAD_IPV4_CKSUM; - } - - if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { - netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; - if (dev->hw_ol_features & NETDEV_TX_SCTP_CHECKSUM_OFFLOAD) { - netdev->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CKSUM; - } - } + netdev_dpdk_update_netdev_flags(dev); /* If both requested and actual hwaddr were previously * unset (initialized to 0), then first device init above @@ -5308,11 +5382,6 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) memset(dev->sw_stats, 0, sizeof *dev->sw_stats); rte_spinlock_unlock(&dev->stats_lock); - if (userspace_tso_enabled()) { - dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; - VLOG_DBG("%s: TSO enabled on vhost port", netdev_get_name(&dev->up)); - } - netdev_dpdk_remap_txqs(dev); if (netdev_dpdk_get_vid(dev) >= 0) { @@ -5333,6 +5402,8 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) } } + netdev_dpdk_update_netdev_flags(dev); + return 0; } @@ -5354,8 +5425,6 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int err; - uint64_t vhost_flags = 0; - uint64_t vhost_unsup_flags; ovs_mutex_lock(&dev->mutex); @@ -5365,6 +5434,9 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) * 2. A path has been specified. */ if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT) && dev->vhost_id) { + uint64_t virtio_unsup_features = 0; + uint64_t vhost_flags = 0; + /* Register client-mode device. */ vhost_flags |= RTE_VHOST_USER_CLIENT; @@ -5411,22 +5483,22 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) } if (userspace_tso_enabled()) { - netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CKSUM; - netdev->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; - vhost_unsup_flags = 1ULL << VIRTIO_NET_F_HOST_ECN - | 1ULL << VIRTIO_NET_F_HOST_UFO; + virtio_unsup_features = 1ULL << VIRTIO_NET_F_HOST_ECN + | 1ULL << VIRTIO_NET_F_HOST_UFO; + VLOG_DBG("%s: TSO enabled on vhost port", + netdev_get_name(&dev->up)); } else { - /* This disables checksum offloading and all the features - * that depends on it (TSO, UFO, ECN) according to virtio - * specification. */ - vhost_unsup_flags = 1ULL << VIRTIO_NET_F_CSUM; + /* Advertise checksum offloading to the guest, but explicitly + * disable TSO and friends. + * NOTE: we can't disable HOST_ECN which may have been wrongly + * negotiated by a running guest. */ + virtio_unsup_features = 1ULL << VIRTIO_NET_F_HOST_TSO4 + | 1ULL << VIRTIO_NET_F_HOST_TSO6 + | 1ULL << VIRTIO_NET_F_HOST_UFO; } err = rte_vhost_driver_disable_features(dev->vhost_id, - vhost_unsup_flags); + virtio_unsup_features); if (err) { VLOG_ERR("rte_vhost_driver_disable_features failed for " "vhost user client port: %s\n", dev->up.name); diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 36620199e..8b4a327ae 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -938,14 +938,6 @@ netdev_linux_common_construct(struct netdev *netdev_) netnsid_unset(&netdev->netnsid); ovs_mutex_init(&netdev->mutex); - if (userspace_tso_enabled()) { - netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; - netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; - netdev_->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM; - netdev_->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CKSUM; - netdev_->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; - } - return 0; } @@ -959,6 +951,16 @@ netdev_linux_construct(struct netdev *netdev_) return error; } + /* The socket interface doesn't offer the option to enable only + * csum offloading without TSO. */ + if (userspace_tso_enabled()) { + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_CKSUM; + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_UDP_CKSUM; + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_SCTP_CKSUM; + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_IPV4_CKSUM; + } + error = get_flags(&netdev->up, &netdev->ifi_flags); if (error == ENODEV) { if (netdev->up.netdev_class != &netdev_internal_class) { @@ -987,6 +989,7 @@ netdev_linux_construct_tap(struct netdev *netdev_) struct netdev_linux *netdev = netdev_linux_cast(netdev_); static const char tap_dev[] = "/dev/net/tun"; const char *name = netdev_->name; + unsigned long oflags; struct ifreq ifr; int error = netdev_linux_common_construct(netdev_); @@ -1004,10 +1007,7 @@ netdev_linux_construct_tap(struct netdev *netdev_) /* Create tap device. */ get_flags(&netdev->up, &netdev->ifi_flags); - ifr.ifr_flags = IFF_TAP | IFF_NO_PI; - if (userspace_tso_enabled()) { - ifr.ifr_flags |= IFF_VNET_HDR; - } + ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR; ovs_strzcpy(ifr.ifr_name, name, sizeof ifr.ifr_name); if (ioctl(netdev->tap_fd, TUNSETIFF, &ifr) == -1) { @@ -1030,21 +1030,22 @@ netdev_linux_construct_tap(struct netdev *netdev_) goto error_close; } + oflags = TUN_F_CSUM; if (userspace_tso_enabled()) { - /* Old kernels don't support TUNSETOFFLOAD. If TUNSETOFFLOAD is - * available, it will return EINVAL when a flag is unknown. - * Therefore, try enabling offload with no flags to check - * if TUNSETOFFLOAD support is available or not. */ - if (ioctl(netdev->tap_fd, TUNSETOFFLOAD, 0) == 0 || errno != EINVAL) { - unsigned long oflags = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6; - - if (ioctl(netdev->tap_fd, TUNSETOFFLOAD, oflags) == -1) { - VLOG_WARN("%s: enabling tap offloading failed: %s", name, - ovs_strerror(errno)); - error = errno; - goto error_close; - } - } + oflags |= (TUN_F_TSO4 | TUN_F_TSO6); + } + + if (ioctl(netdev->tap_fd, TUNSETOFFLOAD, oflags) == 0) { + netdev_->ol_flags |= (NETDEV_TX_OFFLOAD_IPV4_CKSUM + | NETDEV_TX_OFFLOAD_TCP_CKSUM + | NETDEV_TX_OFFLOAD_UDP_CKSUM); + + if (userspace_tso_enabled()) { + netdev_->ol_flags |= NETDEV_TX_OFFLOAD_TCP_TSO; + } + } else { + VLOG_WARN("%s: Disabling hardware offloading: %s", name, + ovs_strerror(errno)); } netdev->present = true; @@ -1344,18 +1345,22 @@ netdev_linux_batch_rxq_recv_sock(struct netdev_rxq_linux *rx, int mtu, pkt = buffers[i]; } - if (virtio_net_hdr_size && netdev_linux_parse_vnet_hdr(pkt)) { - struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up); - struct netdev_linux *netdev = netdev_linux_cast(netdev_); + if (virtio_net_hdr_size) { + int ret = netdev_linux_parse_vnet_hdr(pkt); + if (OVS_UNLIKELY(ret)) { + struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up); + struct netdev_linux *netdev = netdev_linux_cast(netdev_); - /* Unexpected error situation: the virtio header is not present - * or corrupted. Drop the packet but continue in case next ones - * are correct. */ - dp_packet_delete(pkt); - netdev->rx_dropped += 1; - VLOG_WARN_RL(&rl, "%s: Dropped packet: Invalid virtio net header", - netdev_get_name(netdev_)); - continue; + /* Unexpected error situation: the virtio header is not + * present or corrupted or contains unsupported features. + * Drop the packet but continue in case next ones are + * correct. */ + dp_packet_delete(pkt); + netdev->rx_dropped += 1; + VLOG_WARN_RL(&rl, "%s: Dropped packet: %s", + netdev_get_name(netdev_), ovs_strerror(ret)); + continue; + } } for (cmsg = CMSG_FIRSTHDR(&mmsgs[i].msg_hdr); cmsg; @@ -1403,7 +1408,6 @@ static int netdev_linux_batch_rxq_recv_tap(struct netdev_rxq_linux *rx, int mtu, struct dp_packet_batch *batch) { - int virtio_net_hdr_size; ssize_t retval; size_t std_len; int iovlen; @@ -1413,16 +1417,14 @@ netdev_linux_batch_rxq_recv_tap(struct netdev_rxq_linux *rx, int mtu, /* Use the buffer from the allocated packet below to receive MTU * sized packets and an aux_buf for extra TSO data. */ iovlen = IOV_TSO_SIZE; - virtio_net_hdr_size = sizeof(struct virtio_net_hdr); } else { /* Use only the buffer from the allocated packet. */ iovlen = IOV_STD_SIZE; - virtio_net_hdr_size = 0; } /* The length here needs to be accounted in the same way when the * aux_buf is allocated so that it can be prepended to TSO buffer. */ - std_len = virtio_net_hdr_size + VLAN_ETH_HEADER_LEN + mtu; + std_len = sizeof(struct virtio_net_hdr) + VLAN_ETH_HEADER_LEN + mtu; for (i = 0; i < NETDEV_MAX_BURST; i++) { struct dp_packet *buffer; struct dp_packet *pkt; @@ -1462,7 +1464,7 @@ netdev_linux_batch_rxq_recv_tap(struct netdev_rxq_linux *rx, int mtu, pkt = buffer; } - if (virtio_net_hdr_size && netdev_linux_parse_vnet_hdr(pkt)) { + if (netdev_linux_parse_vnet_hdr(pkt)) { struct netdev *netdev_ = netdev_rxq_get_netdev(&rx->up); struct netdev_linux *netdev = netdev_linux_cast(netdev_); @@ -1611,7 +1613,7 @@ netdev_linux_sock_batch_send(int sock, int ifindex, bool tso, int mtu, * on other interface types because we attach a socket filter to the rx * socket. */ static int -netdev_linux_tap_batch_send(struct netdev *netdev_, bool tso, int mtu, +netdev_linux_tap_batch_send(struct netdev *netdev_, int mtu, struct dp_packet_batch *batch) { struct netdev_linux *netdev = netdev_linux_cast(netdev_); @@ -1632,9 +1634,7 @@ netdev_linux_tap_batch_send(struct netdev *netdev_, bool tso, int mtu, ssize_t retval; int error; - if (tso) { - netdev_linux_prepend_vnet_hdr(packet, mtu); - } + netdev_linux_prepend_vnet_hdr(packet, mtu); size = dp_packet_size(packet); do { @@ -1765,7 +1765,7 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED, error = netdev_linux_sock_batch_send(sock, ifindex, tso, mtu, batch); } else { - error = netdev_linux_tap_batch_send(netdev_, tso, mtu, batch); + error = netdev_linux_tap_batch_send(netdev_, mtu, batch); } if (error) { if (error == ENOBUFS) { @@ -6831,53 +6831,76 @@ netdev_linux_parse_l2(struct dp_packet *b, uint16_t *l4proto) return 0; } +/* Initializes packet 'b' with features enabled in the prepended + * struct virtio_net_hdr. Returns 0 if successful, otherwise a + * positive errno value. */ static int netdev_linux_parse_vnet_hdr(struct dp_packet *b) { struct virtio_net_hdr *vnet = dp_packet_pull(b, sizeof *vnet); - uint16_t l4proto = 0; if (OVS_UNLIKELY(!vnet)) { - return -EINVAL; + return EINVAL; } if (vnet->flags == 0 && vnet->gso_type == VIRTIO_NET_HDR_GSO_NONE) { return 0; } - if (netdev_linux_parse_l2(b, &l4proto)) { - return -EINVAL; - } - if (vnet->flags == VIRTIO_NET_HDR_F_NEEDS_CSUM) { - if (l4proto == IPPROTO_TCP) { - dp_packet_hwol_set_csum_tcp(b); - } else if (l4proto == IPPROTO_UDP) { + uint16_t l4proto = 0; + + if (netdev_linux_parse_l2(b, &l4proto)) { + return EINVAL; + } + + if (l4proto == IPPROTO_UDP) { dp_packet_hwol_set_csum_udp(b); - } else if (l4proto == IPPROTO_SCTP) { - dp_packet_hwol_set_csum_sctp(b); } + /* The packet has offloaded checksum. However, there is no + * additional information like the protocol used, so it would + * require to parse the packet here. The checksum starting point + * and offset are going to be verified when the packet headers + * are parsed during miniflow extraction. */ + b->csum_start = (OVS_FORCE uint16_t) vnet->csum_start; + b->csum_offset = (OVS_FORCE uint16_t) vnet->csum_offset; + } else { + b->csum_start = 0; + b->csum_offset = 0; } - if (l4proto && vnet->gso_type != VIRTIO_NET_HDR_GSO_NONE) { - uint8_t allowed_mask = VIRTIO_NET_HDR_GSO_TCPV4 - | VIRTIO_NET_HDR_GSO_TCPV6 - | VIRTIO_NET_HDR_GSO_UDP; - uint8_t type = vnet->gso_type & allowed_mask; + int ret = 0; + switch (vnet->gso_type) { + case VIRTIO_NET_HDR_GSO_TCPV4: + case VIRTIO_NET_HDR_GSO_TCPV6: + /* FIXME: The packet has offloaded TCP segmentation. The gso_size + * is given and needs to be respected. */ + dp_packet_hwol_set_tcp_seg(b); + break; - if (type == VIRTIO_NET_HDR_GSO_TCPV4 - || type == VIRTIO_NET_HDR_GSO_TCPV6) { - dp_packet_hwol_set_tcp_seg(b); - } + case VIRTIO_NET_HDR_GSO_UDP: + /* UFO is not supported. */ + VLOG_WARN_RL(&rl, "Received an unsupported packet with UFO enabled."); + ret = ENOTSUP; + break; + + case VIRTIO_NET_HDR_GSO_NONE: + break; + + default: + ret = ENOTSUP; + VLOG_WARN_RL(&rl, "Received an unsupported packet with GSO type: 0x%x", + vnet->gso_type); } - return 0; + return ret; } static void netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu) { - struct virtio_net_hdr *vnet = dp_packet_push_zeros(b, sizeof *vnet); + struct virtio_net_hdr v; + struct virtio_net_hdr *vnet = &v; if (dp_packet_hwol_is_tso(b)) { uint16_t hdr_len = ((char *)dp_packet_l4(b) - (char *)dp_packet_eth(b)) @@ -6887,30 +6910,91 @@ netdev_linux_prepend_vnet_hdr(struct dp_packet *b, int mtu) vnet->gso_size = (OVS_FORCE __virtio16)(mtu - hdr_len); if (dp_packet_hwol_is_ipv4(b)) { vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4; - } else { + } else if (dp_packet_hwol_tx_ipv6(b)) { vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV6; } } else { - vnet->flags = VIRTIO_NET_HDR_GSO_NONE; + vnet->hdr_len = 0; + vnet->gso_size = 0; + vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE; } - if (dp_packet_hwol_l4_mask(b)) { - vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; - vnet->csum_start = (OVS_FORCE __virtio16)((char *)dp_packet_l4(b) - - (char *)dp_packet_eth(b)); - + if (dp_packet_l4_checksum_good(b)) { + /* The packet has good L4 checksum. No need to validate again. */ + vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0; + vnet->flags = VIRTIO_NET_HDR_F_DATA_VALID; + } else if (dp_packet_hwol_tx_l4_checksum(b)) { + /* The csum calculation is offloaded. */ if (dp_packet_hwol_l4_is_tcp(b)) { + /* Virtual I/O Device (VIRTIO) Version 1.1 + * 5.1.6.2 Packet Transmission + * If the driver negotiated VIRTIO_NET_F_CSUM, it can skip + * checksumming the packet: + * - flags has the VIRTIO_NET_HDR_F_NEEDS_CSUM set, + * - csum_start is set to the offset within the packet + * to begin checksumming, and + * - csum_offset indicates how many bytes after the + * csum_start the new (16 bit ones complement) checksum + * is placed by the device. + * The TCP checksum field in the packet is set to the sum of + * the TCP pseudo header, so that replacing it by the ones + * complement checksum of the TCP header and body will give + * the correct result. */ + + struct tcp_header *tcp_hdr = dp_packet_l4(b); + ovs_be16 csum = 0; + if (dp_packet_hwol_is_ipv4(b)) { + const struct ip_header *ip_hdr = dp_packet_l3(b); + csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr)); + } else if (dp_packet_hwol_tx_ipv6(b)) { + const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b); + csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr)); + } + + tcp_hdr->tcp_csum = csum; + vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; + vnet->csum_start = (OVS_FORCE __virtio16) b->l4_ofs; vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof( struct tcp_header, tcp_csum); } else if (dp_packet_hwol_l4_is_udp(b)) { + struct udp_header *udp_hdr = dp_packet_l4(b); + ovs_be16 csum = 0; + + if (dp_packet_hwol_is_ipv4(b)) { + const struct ip_header *ip_hdr = dp_packet_l3(b); + csum = ~csum_finish(packet_csum_pseudoheader(ip_hdr)); + } else if (dp_packet_hwol_tx_ipv6(b)) { + const struct ovs_16aligned_ip6_hdr *ip6_hdr = dp_packet_l3(b); + csum = ~csum_finish(packet_csum_pseudoheader6(ip6_hdr)); + } + + udp_hdr->udp_csum = csum; + vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; + vnet->csum_start = (OVS_FORCE __virtio16) b->l4_ofs; vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof( struct udp_header, udp_csum); } else if (dp_packet_hwol_l4_is_sctp(b)) { - vnet->csum_offset = (OVS_FORCE __virtio16) __builtin_offsetof( - struct sctp_header, sctp_csum); + /* The Linux kernel networking stack only supports csum_start + * and csum_offset when SCTP GSO is enabled. See kernel's + * skb_csum_hwoffload_help(). Currently there is no SCTP + * segmentation offload support in OVS. */ + vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0; + vnet->flags = 0; } else { - VLOG_WARN_RL(&rl, "Unsupported L4 protocol"); + /* This should only happen when DP_PACKET_OL_TX_L4_MASK includes + * a new flag that is not covered in above checks. */ + VLOG_WARN_RL(&rl, "Unsupported L4 checksum offload. " + "Flags: %"PRIu64, + (uint64_t)*dp_packet_ol_flags_ptr(b)); + vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0; + vnet->flags = 0; } + } else { + /* Packet L4 csum is unknown. */ + vnet->csum_start = vnet->csum_offset = (OVS_FORCE __virtio16) 0; + vnet->flags = 0; } + + dp_packet_push(b, vnet, sizeof *vnet); } diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c index 53055a254..f16da6a19 100644 --- a/lib/netdev-native-tnl.c +++ b/lib/netdev-native-tnl.c @@ -224,28 +224,6 @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl, return udp + 1; } -static void -netdev_tnl_calc_udp_csum(struct udp_header *udp, struct dp_packet *packet, - int ip_tot_size) -{ - uint32_t csum; - - if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) { - csum = packet_csum_pseudoheader6(netdev_tnl_ipv6_hdr( - dp_packet_data(packet))); - } else { - csum = packet_csum_pseudoheader(netdev_tnl_ip_hdr( - dp_packet_data(packet))); - } - - csum = csum_continue(csum, udp, ip_tot_size); - udp->udp_csum = csum_finish(csum); - - if (!udp->udp_csum) { - udp->udp_csum = htons(0xffff); - } -} - void netdev_tnl_push_udp_header(const struct netdev *netdev OVS_UNUSED, struct dp_packet *packet, @@ -260,9 +238,9 @@ netdev_tnl_push_udp_header(const struct netdev *netdev OVS_UNUSED, udp->udp_src = netdev_tnl_get_src_port(packet); udp->udp_len = htons(ip_tot_size); - if (udp->udp_csum) { - netdev_tnl_calc_udp_csum(udp, packet, ip_tot_size); - } + /* Postpone checksum to the egress netdev. */ + dp_packet_hwol_set_csum_udp(packet); + dp_packet_ol_reset_l4_csum_good(packet); } static void * @@ -806,7 +784,9 @@ netdev_gtpu_push_header(const struct netdev *netdev, data->header_len, &ip_tot_size); udp->udp_src = netdev_tnl_get_src_port(packet); udp->udp_len = htons(ip_tot_size); - netdev_tnl_calc_udp_csum(udp, packet, ip_tot_size); + /* Postpone checksum to the egress netdev. */ + dp_packet_hwol_set_csum_udp(packet); + dp_packet_ol_reset_l4_csum_good(packet); gtpuh = ALIGNED_CAST(struct gtpuhdr *, udp + 1); diff --git a/lib/netdev.c b/lib/netdev.c index 13449cfc8..ec378de90 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -799,8 +799,6 @@ static bool netdev_send_prepare_packet(const uint64_t netdev_flags, struct dp_packet *packet, char **errormsg) { - uint64_t l4_mask; - if (dp_packet_hwol_is_tso(packet) && !(netdev_flags & NETDEV_TX_OFFLOAD_TCP_TSO)) { /* Fall back to GSO in software. */ @@ -813,36 +811,16 @@ netdev_send_prepare_packet(const uint64_t netdev_flags, * netdev to decide what would be the best to do. * Provide a software fallback in case the device doesn't support IP csum * offloading. Note: Encapsulated packet must have the inner IP header + * csum already calculated. + * Packet with L4 csum offloading enabled was received with verified csum. + * Leave the L4 csum offloading enabled even with good checksum for the + * netdev to decide what would be the best to do. + * Netdev that requires pseudo header csum needs to calculate that. + * Provide a software fallback in case the netdev doesn't support L4 csum + * offloading. Note: Encapsulated packet must have the inner L4 header * csum already calculated. */ dp_packet_ol_send_prepare(packet, netdev_flags); - l4_mask = dp_packet_hwol_l4_mask(packet); - if (l4_mask) { - if (dp_packet_hwol_l4_is_tcp(packet)) { - if (!(netdev_flags & NETDEV_TX_OFFLOAD_TCP_CKSUM)) { - /* Fall back to TCP csum in software. */ - VLOG_ERR_BUF(errormsg, "No TCP checksum support"); - return false; - } - } else if (dp_packet_hwol_l4_is_udp(packet)) { - if (!(netdev_flags & NETDEV_TX_OFFLOAD_UDP_CKSUM)) { - /* Fall back to UDP csum in software. */ - VLOG_ERR_BUF(errormsg, "No UDP checksum support"); - return false; - } - } else if (dp_packet_hwol_l4_is_sctp(packet)) { - if (!(netdev_flags & NETDEV_TX_OFFLOAD_SCTP_CKSUM)) { - /* Fall back to SCTP csum in software. */ - VLOG_ERR_BUF(errormsg, "No SCTP checksum support"); - return false; - } - } else { - VLOG_ERR_BUF(errormsg, "No L4 checksum support: mask: %"PRIu64, - l4_mask); - return false; - } - } - return true; } @@ -975,20 +953,16 @@ netdev_push_header(const struct netdev *netdev, size_t i, size = dp_packet_batch_size(batch); DP_PACKET_BATCH_REFILL_FOR_EACH (i, size, packet, batch) { - if (OVS_UNLIKELY(dp_packet_hwol_is_tso(packet) - || dp_packet_hwol_l4_mask(packet))) { + if (OVS_UNLIKELY(dp_packet_hwol_is_tso(packet))) { COVERAGE_INC(netdev_push_header_drops); dp_packet_delete(packet); - VLOG_WARN_RL(&rl, "%s: Tunneling packets with HW offload flags is " + VLOG_WARN_RL(&rl, "%s: Tunneling packets with TSO is " "not supported: packet dropped", netdev_get_name(netdev)); } else { /* The packet is going to be encapsulated and there is * no support yet for inner network header csum offloading. */ - if (dp_packet_hwol_tx_ip_csum(packet) - && !dp_packet_ip_checksum_good(packet)) { - dp_packet_ip_set_header_csum(packet); - } + dp_packet_ol_send_prepare(packet, 0); netdev->netdev_class->push_header(netdev, packet, data); diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c index 93b6b6ccc..ebb13d2d1 100644 --- a/lib/odp-execute-avx512.c +++ b/lib/odp-execute-avx512.c @@ -485,9 +485,11 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch, size_t l4_size = dp_packet_l4_size(packet); if (nh->ip_proto == IPPROTO_UDP && l4_size >= UDP_HEADER_LEN) { - /* New UDP checksum. */ struct udp_header *uh = dp_packet_l4(packet); - if (uh->udp_csum) { + if (dp_packet_hwol_l4_is_udp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + } else if (uh->udp_csum) { + /* New UDP checksum. */ uint16_t old_udp_checksum = ~uh->udp_csum; uint32_t udp_checksum = old_udp_checksum + delta_checksum; udp_checksum = csum_finish(udp_checksum); @@ -500,13 +502,17 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch, } } else if (nh->ip_proto == IPPROTO_TCP && l4_size >= TCP_HEADER_LEN) { - /* New TCP checksum. */ - struct tcp_header *th = dp_packet_l4(packet); - uint16_t old_tcp_checksum = ~th->tcp_csum; - uint32_t tcp_checksum = old_tcp_checksum + delta_checksum; - tcp_checksum = csum_finish(tcp_checksum); - - th->tcp_csum = tcp_checksum; + if (dp_packet_hwol_l4_is_tcp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + } else { + /* New TCP checksum. */ + struct tcp_header *th = dp_packet_l4(packet); + uint16_t old_tcp_checksum = ~th->tcp_csum; + uint32_t tcp_checksum = old_tcp_checksum + delta_checksum; + tcp_checksum = csum_finish(tcp_checksum); + + th->tcp_csum = tcp_checksum; + } } pkt_metadata_init_conn(&packet->md); diff --git a/lib/packets.c b/lib/packets.c index 36d9ec5b9..198098db1 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -1131,16 +1131,22 @@ packet_set_ipv4_addr(struct dp_packet *packet, pkt_metadata_init_conn(&packet->md); if (nh->ip_proto == IPPROTO_TCP && l4_size >= TCP_HEADER_LEN) { - struct tcp_header *th = dp_packet_l4(packet); - - th->tcp_csum = recalc_csum32(th->tcp_csum, old_addr, new_addr); + if (dp_packet_hwol_l4_is_tcp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + } else { + struct tcp_header *th = dp_packet_l4(packet); + th->tcp_csum = recalc_csum32(th->tcp_csum, old_addr, new_addr); + } } else if (nh->ip_proto == IPPROTO_UDP && l4_size >= UDP_HEADER_LEN ) { - struct udp_header *uh = dp_packet_l4(packet); - - if (uh->udp_csum) { - uh->udp_csum = recalc_csum32(uh->udp_csum, old_addr, new_addr); - if (!uh->udp_csum) { - uh->udp_csum = htons(0xffff); + if (dp_packet_hwol_l4_is_udp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + } else { + struct udp_header *uh = dp_packet_l4(packet); + if (uh->udp_csum) { + uh->udp_csum = recalc_csum32(uh->udp_csum, old_addr, new_addr); + if (!uh->udp_csum) { + uh->udp_csum = htons(0xffff); + } } } } @@ -1246,16 +1252,24 @@ packet_update_csum128(struct dp_packet *packet, uint8_t proto, size_t l4_size = dp_packet_l4_size(packet); if (proto == IPPROTO_TCP && l4_size >= TCP_HEADER_LEN) { - struct tcp_header *th = dp_packet_l4(packet); + if (dp_packet_hwol_l4_is_tcp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + } else { + struct tcp_header *th = dp_packet_l4(packet); - th->tcp_csum = recalc_csum128(th->tcp_csum, addr, new_addr); + th->tcp_csum = recalc_csum128(th->tcp_csum, addr, new_addr); + } } else if (proto == IPPROTO_UDP && l4_size >= UDP_HEADER_LEN) { - struct udp_header *uh = dp_packet_l4(packet); + if (dp_packet_hwol_l4_is_udp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + } else { + struct udp_header *uh = dp_packet_l4(packet); - if (uh->udp_csum) { - uh->udp_csum = recalc_csum128(uh->udp_csum, addr, new_addr); - if (!uh->udp_csum) { - uh->udp_csum = htons(0xffff); + if (uh->udp_csum) { + uh->udp_csum = recalc_csum128(uh->udp_csum, addr, new_addr); + if (!uh->udp_csum) { + uh->udp_csum = htons(0xffff); + } } } } else if (proto == IPPROTO_ICMPV6 && @@ -1375,7 +1389,9 @@ static void packet_set_port(ovs_be16 *port, ovs_be16 new_port, ovs_be16 *csum) { if (*port != new_port) { - *csum = recalc_csum16(*csum, *port, new_port); + if (csum) { + *csum = recalc_csum16(*csum, *port, new_port); + } *port = new_port; } } @@ -1387,9 +1403,16 @@ void packet_set_tcp_port(struct dp_packet *packet, ovs_be16 src, ovs_be16 dst) { struct tcp_header *th = dp_packet_l4(packet); + ovs_be16 *csum = NULL; + + if (dp_packet_hwol_l4_is_tcp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + } else { + csum = &th->tcp_csum; + } - packet_set_port(&th->tcp_src, src, &th->tcp_csum); - packet_set_port(&th->tcp_dst, dst, &th->tcp_csum); + packet_set_port(&th->tcp_src, src, csum); + packet_set_port(&th->tcp_dst, dst, csum); pkt_metadata_init_conn(&packet->md); } @@ -1401,17 +1424,21 @@ packet_set_udp_port(struct dp_packet *packet, ovs_be16 src, ovs_be16 dst) { struct udp_header *uh = dp_packet_l4(packet); - if (uh->udp_csum) { - packet_set_port(&uh->udp_src, src, &uh->udp_csum); - packet_set_port(&uh->udp_dst, dst, &uh->udp_csum); + if (dp_packet_hwol_l4_is_udp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + packet_set_port(&uh->udp_src, src, NULL); + packet_set_port(&uh->udp_dst, dst, NULL); + } else { + ovs_be16 *csum = uh->udp_csum ? &uh->udp_csum : NULL; + + packet_set_port(&uh->udp_src, src, csum); + packet_set_port(&uh->udp_dst, dst, csum); - if (!uh->udp_csum) { + if (csum && !uh->udp_csum) { uh->udp_csum = htons(0xffff); } - } else { - uh->udp_src = src; - uh->udp_dst = dst; } + pkt_metadata_init_conn(&packet->md); } @@ -1422,18 +1449,27 @@ void packet_set_sctp_port(struct dp_packet *packet, ovs_be16 src, ovs_be16 dst) { struct sctp_header *sh = dp_packet_l4(packet); - ovs_be32 old_csum, old_correct_csum, new_csum; - uint16_t tp_len = dp_packet_l4_size(packet); - old_csum = get_16aligned_be32(&sh->sctp_csum); - put_16aligned_be32(&sh->sctp_csum, 0); - old_correct_csum = crc32c((void *)sh, tp_len); + if (dp_packet_hwol_l4_is_sctp(packet)) { + dp_packet_ol_reset_l4_csum_good(packet); + sh->sctp_src = src; + sh->sctp_dst = dst; + } else { + ovs_be32 old_csum, old_correct_csum, new_csum; + uint16_t tp_len = dp_packet_l4_size(packet); - sh->sctp_src = src; - sh->sctp_dst = dst; + old_csum = get_16aligned_be32(&sh->sctp_csum); + put_16aligned_be32(&sh->sctp_csum, 0); + old_correct_csum = crc32c((void *) sh, tp_len); + + sh->sctp_src = src; + sh->sctp_dst = dst; + + new_csum = crc32c((void *) sh, tp_len); + put_16aligned_be32(&sh->sctp_csum, old_csum ^ old_correct_csum + ^ new_csum); + } - new_csum = crc32c((void *)sh, tp_len); - put_16aligned_be32(&sh->sctp_csum, old_csum ^ old_correct_csum ^ new_csum); pkt_metadata_init_conn(&packet->md); } @@ -1957,3 +1993,72 @@ IP_ECN_set_ce(struct dp_packet *pkt, bool is_ipv6) } } } + +/* Set TCP checksum field in packet 'p' with complete checksum. + * The packet must have the L3 and L4 offsets. */ +void +packet_tcp_complete_csum(struct dp_packet *p) +{ + struct tcp_header *tcp = dp_packet_l4(p); + + tcp->tcp_csum = 0; + if (dp_packet_hwol_is_ipv4(p)) { + struct ip_header *ip = dp_packet_l3(p); + + tcp->tcp_csum = csum_finish(csum_continue(packet_csum_pseudoheader(ip), + tcp, dp_packet_l4_size(p))); + } else if (dp_packet_hwol_tx_ipv6(p)) { + struct ovs_16aligned_ip6_hdr *ip6 = dp_packet_l3(p); + + tcp->tcp_csum = packet_csum_upperlayer6(ip6, tcp, ip6->ip6_nxt, + dp_packet_l4_size(p)); + } else { + OVS_NOT_REACHED(); + } +} + +/* Set UDP checksum field in packet 'p' with complete checksum. + * The packet must have the L3 and L4 offsets. */ +void +packet_udp_complete_csum(struct dp_packet *p) +{ + struct udp_header *udp = dp_packet_l4(p); + + /* Skip csum calculation if the udp_csum is zero. */ + if (!udp->udp_csum) { + return; + } + + udp->udp_csum = 0; + if (dp_packet_hwol_is_ipv4(p)) { + struct ip_header *ip = dp_packet_l3(p); + + udp->udp_csum = csum_finish(csum_continue(packet_csum_pseudoheader(ip), + udp, dp_packet_l4_size(p))); + } else if (dp_packet_hwol_tx_ipv6(p)) { + struct ovs_16aligned_ip6_hdr *ip6 = dp_packet_l3(p); + + udp->udp_csum = packet_csum_upperlayer6(ip6, udp, ip6->ip6_nxt, + dp_packet_l4_size(p)); + } else { + OVS_NOT_REACHED(); + } + + if (!udp->udp_csum) { + udp->udp_csum = htons(0xffff); + } +} + +/* Set SCTP checksum field in packet 'p' with complete checksum. + * The packet must have the L3 and L4 offsets. */ +void +packet_sctp_complete_csum(struct dp_packet *p) +{ + struct sctp_header *sh = dp_packet_l4(p); + uint16_t tp_len = dp_packet_l4_size(p); + ovs_be32 csum; + + put_16aligned_be32(&sh->sctp_csum, 0); + csum = crc32c((void *) sh, tp_len); + put_16aligned_be32(&sh->sctp_csum, csum); +} diff --git a/lib/packets.h b/lib/packets.h index 8626aac8d..eefc16b46 100644 --- a/lib/packets.h +++ b/lib/packets.h @@ -1645,6 +1645,9 @@ uint32_t packet_csum_pseudoheader(const struct ip_header *); bool packet_rh_present(struct dp_packet *packet, uint8_t *nexthdr, bool *first_frag); void IP_ECN_set_ce(struct dp_packet *pkt, bool is_ipv6); +void packet_tcp_complete_csum(struct dp_packet *); +void packet_udp_complete_csum(struct dp_packet *); +void packet_sctp_complete_csum(struct dp_packet *); #define DNS_HEADER_LEN 12 struct dns_header {