From patchwork Fri Jan 8 19:47:35 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Edward Cree X-Patchwork-Id: 565043 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 369D7140BB1 for ; Sat, 9 Jan 2016 06:47:55 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933102AbcAHTru (ORCPT ); Fri, 8 Jan 2016 14:47:50 -0500 Received: from nbfkord-smmo02.seg.att.com ([209.65.160.78]:27154 "EHLO nbfkord-smmo02.seg.att.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932778AbcAHTrs (ORCPT ); Fri, 8 Jan 2016 14:47:48 -0500 Received: from unknown [12.187.104.26] (EHLO webmail.solarflare.com) by nbfkord-smmo02.seg.att.com(mxl_mta-7.2.4-6) over TLS secured channel with ESMTP id 26210965.0.2719873.00-2369.6357081.nbfkord-smmo02.seg.att.com (envelope-from ); Fri, 08 Jan 2016 19:47:47 +0000 (UTC) X-MXL-Hash: 569012636cf4612e-983e6de02bdd66431480b89cb6ba3caabe6ea00a Received: from ec-desktop.uk.level5networks.com (10.17.20.45) by ocex03.SolarFlarecom.com (10.20.40.36) with Microsoft SMTP Server (TLS) id 15.0.1044.25; Fri, 8 Jan 2016 11:47:37 -0800 From: Edward Cree Subject: [PATCH net-next 8/8] Documentation/networking: add checksum-offloads.txt to explain LCO To: David Miller References: <56901197.8040808@solarflare.com> CC: , , , Message-ID: <56901257.20602@solarflare.com> Date: Fri, 8 Jan 2016 19:47:35 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <56901197.8040808@solarflare.com> X-Originating-IP: [10.17.20.45] X-ClientProxiedBy: ocex03.SolarFlarecom.com (10.20.40.36) To ocex03.SolarFlarecom.com (10.20.40.36) X-AnalysisOut: [v=2.0 cv=QrTpKyOd c=1 sm=1 a=8BlWFWvVlq5taO8ncb8nKg==:17 a] X-AnalysisOut: [=zRKbQ67AAAAA:8 a=fVG4DLb5TBsA:10 a=7aQ_Q-yQQ-AA:10 a=48vg] X-AnalysisOut: [C7mUAAAA:8 a=wE7toghjvCsghg530SUA:9 a=QEXdDO2ut3YA:10 a=5H] X-AnalysisOut: [PJuVnFyjPNE4Hy:21 a=tiDsZqGkDzPt5kH1:21] X-Spam: [F=0.2000000000; CM=0.500; S=0.200(2015072901)] X-MAIL-FROM: X-SOURCE-IP: [12.187.104.26] Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Signed-off-by: Edward Cree --- Documentation/networking/00-INDEX | 2 + Documentation/networking/checksum-offloads.txt | 119 +++++++++++++++++++++++++ include/linux/skbuff.h | 2 + 3 files changed, 123 insertions(+) create mode 100644 Documentation/networking/checksum-offloads.txt diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index df27a1a..415154a 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -44,6 +44,8 @@ can.txt - documentation on CAN protocol family. cdc_mbim.txt - 3G/LTE USB modem (Mobile Broadband Interface Model) +checksum-offloads.txt + - Explanation of checksum offloads; LCO, RCO cops.txt - info on the COPS LocalTalk Linux driver cs89x0.txt diff --git a/Documentation/networking/checksum-offloads.txt b/Documentation/networking/checksum-offloads.txt new file mode 100644 index 0000000..de2a327 --- /dev/null +++ b/Documentation/networking/checksum-offloads.txt @@ -0,0 +1,119 @@ +Checksum Offloads in the Linux Networking Stack + + +Introduction +============ + +This document describes a set of techniques in the Linux networking stack + to take advantage of checksum offload capabilities of various NICs. + +The following technologies are described: + * TX Checksum Offload + * LCO: Local Checksum Offload + * RCO: Remote Checksum Offload + +Things that should be documented here but aren't yet: + * RX Checksum Offload + * CHECKSUM_UNNECESSARY conversion + + +TX Checksum Offload +=================== + +The interface for offloading a transmit checksum to a device is explained + in detail in comments near the top of include/linux/skbuff.h. +In brief, it allows to request the device fill in a single ones-complement + checksum defined by the sk_buff fields skb->csum_start and + skb->csum_offset. The device should compute the 16-bit ones-complement + checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the + packet, and fill in the result at (csum_start + csum_offset). +Because csum_offset cannot be negative, this ensures that the previous + value of the checksum field is included in the checksum computation, thus + it can be used to supply any needed corrections to the checksum (such as + the sum of the pseudo-header for UDP or TCP). +This interface only allows a single checksum to be offloaded. Where + encapsulation is used, the packet may have multiple checksum fields in + different header layers, and the rest will have to be handled by another + mechanism such as LCO or RCO. +No offloading of the IP header checksum is performed; it is always done in + software. This is OK because when we build the IP header, we obviously + have it in cache, so summing it isn't expensive. It's also rather short. +The requirements for GSO are more complicated, because when segmenting an + encapsulated packet both the inner and outer checksums may need to be + edited or recomputed for each resulting segment. See the skbuff.h comment + (section 'E') for more details. + +A driver declares its offload capabilities in netdev->hw_features; see + Documentation/networking/netdev-features for more. Note that a device + which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start + and csum_offset given in the SKB; if it tries to deduce these itself in + hardware (as some NICs do) the driver should check that the values in the + SKB match those which the hardware will deduce, and if not, fall back to + checksumming in software instead (with skb_checksum_help or one of the + skb_csum_off_chk* functions as mentioned in include/linux/skbuff.h). This + is a pain, but that's what you get when hardware tries to be clever. + +The stack should, for the most part, assume that checksum offload is + supported by the underlying device. The only place that should check is + validate_xmit_skb(), and the functions it calls directly or indirectly. + That function compares the offload features requested by the SKB (which + may include other offloads besides TX Checksum Offload) and, if they are + not supported or enabled on the device (determined by netdev->features), + performs the corresponding offload in software. In the case of TX + Checksum Offload, that means calling skb_checksum_help(skb). + + +LCO: Local Checksum Offload +=========================== + +LCO is a technique for efficiently computing the outer checksum of an + encapsulated datagram when the inner checksum is due to be offloaded. +The ones-complement sum of a correctly checksummed TCP or UDP packet is + equal to the sum of the pseudo header, because everything else gets + 'cancelled out' by the checksum field. This is because the sum was + complemented before being written to the checksum field. +More generally, this holds in any case where the 'IP-style' ones complement + checksum is used, and thus any checksum that TX Checksum Offload supports. +That is, if we have set up TX Checksum Offload with a start/offset pair, we + know that _after the device has filled in that checksum_, the ones + complement sum from csum_start to the end of the packet will be equal to + _whatever value we put in the checksum field beforehand_. This allows us + to compute the outer checksum without looking at the payload: we simply + stop summing when we get to csum_start, then add the 16-bit word at + (csum_start + csum_offset). +Then, when the true inner checksum is filled in (either by hardware or by + skb_checksum_help()), the outer checksum will become correct by virtue of + the arithmetic. + +LCO is performed by the stack when constructing an outer UDP header for an + encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for + the IPv6 equivalents, in udp6_set_csum(). +It is also performed when constructing an IPv4 GRE header, in + net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when + constructing an IPv6 GRE header; the GRE checksum is computed over the + whole packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be + possible to use LCO here as IPv6 GRE still uses an IP-style checksum. +All of the LCO implementations use a helper function lco_csum(), in + include/linux/skbuff.h. + +LCO can safely be used for nested encapsulations; in this case, the outer + encapsulation layer will sum over both its own header and the 'middle' + header. This does mean that the 'middle' header will get summed multiple + times, but there doesn't seem to be a way to avoid that without incurring + bigger costs (e.g. in SKB bloat). + + +RCO: Remote Checksum Offload +============================ + +RCO is a technique for eliding the inner checksum of an encapsulated + datagram, allowing the outer checksum to be offloaded. It does, however, + involve a change to the encapsulation protocols, which the receiver must + also support. For this reason, it is disabled by default. +RCO is detailed in the following Internet-Drafts: +https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00 +https://tools.ietf.org/html/draft-herbert-vxlan-rco-00 +In Linux, RCO is implemented individually in each encapsulation protocol, + and most tunnel types have flags controlling its use. For instance, VXLAN + has the flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that + RCO should be used when transmitting to a given remote destination. diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ee063ff..448ea2e 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3668,6 +3668,8 @@ static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb) /* Local Checksum Offload. * Compute outer checksum based on the assumption that the * inner checksum will be offloaded later. + * See Documentation/networking/checksum-offloads.txt for + * explanation of how this works. * Fill in outer checksum adjustment (e.g. with sum of outer * pseudo-header) before calling. * Also ensure that inner checksum is in linear data area.