From patchwork Thu Jan 14 15:10:43 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Allen Simpson
X-Patchwork-Id: 42897
X-Patchwork-Delegate: davem@davemloft.net
Message-ID: <4B4F33F3.3000509@gmail.com>
Date: Thu, 14 Jan 2010 10:10:43 -0500
From: William Allen Simpson
To: Linux Kernel Developers
CC: Linux Kernel Network Developers, Ilpo Järvinen, Joe Perches, Andi Kleen
Subject: [PATCH v5] tcp: harmonize tcp_vx_rcv header length assumptions
References: <4B49D001.4000302@gmail.com>
In-Reply-To: <4B49D001.4000302@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

Harmonize tcp_v4_rcv() and tcp_v6_rcv() -- better document tcp doff
and header length assumptions, and carefully compare implementations.

Reduces multiply/shifts, marginally improving speed.

Removes redundant tcp header length checks before checksumming.
Instead, assumes (and documents) that any backlog processing and
transform policies will carefully preserve the header, and will
ensure the socket buffer length remains >= the header size.

Stand-alone patch, originally developed for TCPCT.

Signed-off-by: William.Allen.Simpson@gmail.com
---
 include/net/xfrm.h  |    7 ++++++
 net/ipv4/tcp_ipv4.c |   45 +++++++++++++++++++++-----------------
 net/ipv6/tcp_ipv6.c |   59 ++++++++++++++++++++++++++++----------------------
 3 files changed, 65 insertions(+), 46 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 6d85861..89c5465 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -975,6 +975,13 @@ xfrm_state_addr_cmp(struct xfrm_tmpl *tmpl, struct xfrm_state *x, unsigned short
 }
 
 #ifdef CONFIG_XFRM
+/*
+ * For transport, the policy is checked before the presumed more expensive
+ * checksum. The transport header has already been checked for size, and is
+ * guaranteed to be contiguous. These policies must not alter the header or
+ * its position in the buffer, and should not shorten the buffer length
+ * without ensuring the length remains >= the header size.
+ */
 extern int __xfrm_policy_check(struct sock *, int dir, struct sk_buff *skb, unsigned short family);
 
 static inline int __xfrm_policy_check2(struct sock *sk, int dir,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 65b8ebf..0a76e41 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1559,7 +1559,8 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 		return 0;
 	}
 
-	if (skb->len < tcp_hdrlen(skb) || tcp_checksum_complete(skb))
+	/* Assumes header and options unchanged since checksum_init() */
+	if (tcp_checksum_complete(skb))
 		goto csum_err;
 
 	if (sk->sk_state == TCP_LISTEN) {
@@ -1601,14 +1602,13 @@ csum_err:
 }
 
 /*
- *	From tcp_input.c
+ *	Called by ip_input.c: ip_local_deliver_finish()
  */
-
 int tcp_v4_rcv(struct sk_buff *skb)
 {
-	const struct iphdr *iph;
 	struct tcphdr *th;
 	struct sock *sk;
+	int tcp_header_len;
 	int ret;
 	struct net *net = dev_net(skb->dev);
 
@@ -1618,31 +1618,33 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	/* Count it even if it's bad */
 	TCP_INC_STATS_BH(net, TCP_MIB_INSEGS);
 
+	/* Check too short header */
 	if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
 		goto discard_it;
 
-	th = tcp_hdr(skb);
-
-	if (th->doff < sizeof(struct tcphdr) / 4)
+	/* Check bad doff, compare doff directly to constant value */
+	tcp_header_len = tcp_hdr(skb)->doff;
+	if (tcp_header_len < (sizeof(struct tcphdr) / 4))
 		goto bad_packet;
-	if (!pskb_may_pull(skb, th->doff * 4))
+
+	/* Check too short header and options */
+	tcp_header_len *= 4;
+	if (!pskb_may_pull(skb, tcp_header_len))
 		goto discard_it;
 
-	/* An explanation is required here, I think.
-	 * Packet length and doff are validated by header prediction,
-	 * provided case of th->doff==0 is eliminated.
-	 * So, we defer the checks. */
+	/* Packet length and doff are validated by header prediction,
+	 * provided case of th->doff == 0 is eliminated (above).
+	 */
 	if (!skb_csum_unnecessary(skb) && tcp_v4_checksum_init(skb))
 		goto bad_packet;
 
 	th = tcp_hdr(skb);
-	iph = ip_hdr(skb);
 	TCP_SKB_CB(skb)->seq = ntohl(th->seq);
 	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
-				    skb->len - th->doff * 4);
+				    skb->len - tcp_header_len);
 	TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
 	TCP_SKB_CB(skb)->when	 = 0;
-	TCP_SKB_CB(skb)->flags	 = iph->tos;
+	TCP_SKB_CB(skb)->flags	 = ip_hdr(skb)->tos;
 	TCP_SKB_CB(skb)->sacked	 = 0;
 
 	sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
@@ -1682,14 +1684,14 @@ process:
 	bh_unlock_sock(sk);
 
 	sock_put(sk);
-
 	return ret;
 
 no_tcp_socket:
 	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb))
 		goto discard_it;
 
-	if (skb->len < (th->doff << 2) || tcp_checksum_complete(skb)) {
+	/* Assumes header and options unchanged since checksum_init() */
+	if (tcp_checksum_complete(skb)) {
 bad_packet:
 		TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
 	} else {
@@ -1711,18 +1713,21 @@ do_time_wait:
 		goto discard_it;
 	}
 
-	if (skb->len < (th->doff << 2) || tcp_checksum_complete(skb)) {
+	/* Assumes header and options unchanged since checksum_init() */
+	if (tcp_checksum_complete(skb)) {
 		TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
 		inet_twsk_put(inet_twsk(sk));
 		goto discard_it;
 	}
+
 	switch (tcp_timewait_state_process(inet_twsk(sk), skb, th)) {
 	case TCP_TW_SYN: {
 		struct sock *sk2 = inet_lookup_listener(dev_net(skb->dev),
 							&tcp_hashinfo,
-							iph->daddr, th->dest,
+							ip_hdr(skb)->daddr,
+							th->dest,
 							inet_iif(skb));
-		if (sk2) {
+		if (sk2 != NULL) {
 			inet_twsk_deschedule(inet_twsk(sk), &tcp_death_row);
 			inet_twsk_put(inet_twsk(sk));
 			sk = sk2;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index febfd59..b76939a 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1594,7 +1594,8 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		return 0;
 	}
 
-	if (skb->len < tcp_hdrlen(skb) || tcp_checksum_complete(skb))
+	/* Assumes header and options unchanged since checksum_init() */
+	if (tcp_checksum_complete(skb))
 		goto csum_err;
 
 	if (sk->sk_state == TCP_LISTEN) {
@@ -1664,38 +1665,47 @@ ipv6_pktoptions:
 	return 0;
 }
 
+/*
+ *	Called by ip6_input.c: ip6_input_finish()
+ */
 static int tcp_v6_rcv(struct sk_buff *skb)
 {
 	struct tcphdr *th;
 	struct sock *sk;
+	int tcp_header_len;
 	int ret;
 	struct net *net = dev_net(skb->dev);
 
 	if (skb->pkt_type != PACKET_HOST)
 		goto discard_it;
 
-	/*
-	 *	Count it even if it's bad.
-	 */
+	/* Count it even if it's bad */
 	TCP_INC_STATS_BH(net, TCP_MIB_INSEGS);
 
+	/* Check too short header */
 	if (!pskb_may_pull(skb, sizeof(struct tcphdr)))
 		goto discard_it;
 
-	th = tcp_hdr(skb);
-
-	if (th->doff < sizeof(struct tcphdr)/4)
+	/* Check bad doff, compare doff directly to constant value */
+	tcp_header_len = tcp_hdr(skb)->doff;
+	if (tcp_header_len < (sizeof(struct tcphdr) / 4))
 		goto bad_packet;
-	if (!pskb_may_pull(skb, th->doff*4))
+
+	/* Check too short header and options */
+	tcp_header_len *= 4;
+	if (!pskb_may_pull(skb, tcp_header_len))
 		goto discard_it;
 
+	/* Packet length and doff are validated by header prediction,
+	 * provided case of th->doff == 0 is eliminated (above).
+	 */
 	if (!skb_csum_unnecessary(skb) && tcp_v6_checksum_init(skb))
 		goto bad_packet;
 
 	th = tcp_hdr(skb);
 	TCP_SKB_CB(skb)->seq = ntohl(th->seq);
 	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
-				    skb->len - th->doff*4);
+				    skb->len - tcp_header_len);
 	TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
 	TCP_SKB_CB(skb)->when = 0;
 	TCP_SKB_CB(skb)->flags = ipv6_get_dsfield(ipv6_hdr(skb));
@@ -1711,6 +1721,7 @@ process:
 	if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb))
 		goto discard_and_relse;
 
+	/* nf_reset(skb); in ip6_input.c ip6_input_finish() */
 	if (sk_filter(sk, skb))
 		goto discard_and_relse;
 
@@ -1743,7 +1754,8 @@ no_tcp_socket:
 	if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb))
 		goto discard_it;
 
-	if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) {
+	/* Assumes header and options unchanged since checksum_init() */
+	if (tcp_checksum_complete(skb)) {
 bad_packet:
 		TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
 	} else {
@@ -1751,11 +1763,7 @@ bad_packet:
 	}
 
 discard_it:
-
-	/*
-	 *	Discard frame
-	 */
-
+	/* Discard frame. */
 	kfree_skb(skb);
 	return 0;
 
@@ -1769,24 +1777,23 @@ do_time_wait:
 		goto discard_it;
 	}
 
-	if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) {
+	/* Assumes header and options unchanged since checksum_init() */
+	if (tcp_checksum_complete(skb)) {
 		TCP_INC_STATS_BH(net, TCP_MIB_INERRS);
 		inet_twsk_put(inet_twsk(sk));
 		goto discard_it;
 	}
 
 	switch (tcp_timewait_state_process(inet_twsk(sk), skb, th)) {
-	case TCP_TW_SYN:
-	{
-		struct sock *sk2;
-
-		sk2 = inet6_lookup_listener(dev_net(skb->dev), &tcp_hashinfo,
-					    &ipv6_hdr(skb)->daddr,
-					    ntohs(th->dest), inet6_iif(skb));
+	case TCP_TW_SYN: {
+		struct sock *sk2 = inet6_lookup_listener(dev_net(skb->dev),
+							 &tcp_hashinfo,
+							 &ipv6_hdr(skb)->daddr,
+							 ntohs(th->dest),
+							 inet6_iif(skb));
 		if (sk2 != NULL) {
-			struct inet_timewait_sock *tw = inet_twsk(sk);
-			inet_twsk_deschedule(tw, &tcp_death_row);
-			inet_twsk_put(tw);
+			inet_twsk_deschedule(inet_twsk(sk), &tcp_death_row);
+			inet_twsk_put(inet_twsk(sk));
 			sk = sk2;
 			goto process;
 		}
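
Illustration only, not part of the patch: the order of the three header
checks in tcp_v4_rcv() and tcp_v6_rcv() can be sketched in user space as
below. This is a minimal sketch under stated assumptions. The names
tcp_checked_header_len() and TCP_MIN_HDR_LEN are hypothetical, and a flat
byte buffer with an explicit length stands in for the skb and for
pskb_may_pull(). As in the patch, doff is compared to a constant while it
is still in 32-bit words, and is multiplied by 4 only once.

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_MIN_HDR_LEN 20	/* fixed TCP header, no options (assumed name) */

/*
 * Hypothetical user-space helper, not kernel code.
 * Returns the TCP header length in bytes (fixed header plus options),
 * or -1 if the segment is too short or doff is invalid.
 */
static int tcp_checked_header_len(const uint8_t *seg, size_t len)
{
	int hdr_len;

	/* Check too short header */
	if (len < TCP_MIN_HDR_LEN)
		return -1;

	/* doff is the high nibble of byte 12, counted in 32-bit words */
	hdr_len = seg[12] >> 4;
	if (hdr_len < TCP_MIN_HDR_LEN / 4)
		return -1;

	/* Convert to bytes once, then check too short header and options */
	hdr_len *= 4;
	if (len < (size_t)hdr_len)
		return -1;

	return hdr_len;
}
```

Once these checks pass, later code can reuse the byte count instead of
recomputing doff * 4 or doff << 2, which is the saving the changelog
describes.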