Message ID | 4B4C4962.8040207@gmail.com |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Le 12/01/2010 11:05, William Allen Simpson a écrit : > Harmonize tcp_v4_rcv() and tcp_v6_rcv() -- better document tcp doff and > header length assumptions. > > Reduces multiply/shifts, marginally improving speed. > > Retains redundant tcp header length checks before checksumming. > > Stand-alone patch, originally developed for TCPCT. > > Requires: > net: tcp_header_len_th and tcp_option_len_th > > Signed-off-by: William.Allen.Simpson@gmail.com > --- > net/ipv4/tcp_ipv4.c | 36 +++++++++++++++++++------------- > net/ipv6/tcp_ipv6.c | 56 > ++++++++++++++++++++++++++++---------------------- > 2 files changed, 52 insertions(+), 40 deletions(-) Seems fine, but : 1) What means the "Transformed ?" you wrote several times ? 2) This part : - th = tcp_hdr(skb); - - if (th->doff < sizeof(struct tcphdr)/4) + /* Check bad doff, compare doff directly to constant value */ + tcp_header_len = tcp_hdr(skb)->doff; + if (tcp_header_len < (sizeof(struct tcphdr) / 4)) goto bad_packet; - if (!pskb_may_pull(skb, th->doff*4)) + + /* Check too short header and options */ + tcp_header_len *= 4; + if (!pskb_may_pull(skb, tcp_header_len)) goto discard_it; could be : (no need for 4 multiplies/divides) tcp_header_len = tcp_header_len_th(tcp_hdr(skb)); if (tcp_header_len < sizeof(struct tcphdr)) goto bad_packet; /* Check too short header and options */ if (!pskb_may_pull(skb, tcp_header_len)) goto discard_it; -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > Seems fine, but : > > 1) What means the "Transformed ?" you wrote several times ? > The only reason that I've been able to figure out for having the skb->len test in those places is the preceding xfrm4_policy_check() or xfrm6_policy_check() must be able to shrink the skb->len? When I did the original transform stuff in other code circa 1995, I'd envisioned IP length or link layer (PPP) length shrinking (removing padding after block ciphers) -- and apparently this implementation extended that concept to transport layer, too. Personally, I'd prefer that a single test be placed in the appropriate spot in the xfrm* functions, instead. Anybody know where? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet wrote: > 2) This part : > > - th = tcp_hdr(skb); > - > - if (th->doff < sizeof(struct tcphdr)/4) > + /* Check bad doff, compare doff directly to constant value */ > + tcp_header_len = tcp_hdr(skb)->doff; > + if (tcp_header_len < (sizeof(struct tcphdr) / 4)) > goto bad_packet; > - if (!pskb_may_pull(skb, th->doff*4)) > + > + /* Check too short header and options */ > + tcp_header_len *= 4; > + if (!pskb_may_pull(skb, tcp_header_len)) > goto discard_it; > > > could be : (no need for 4 multiplies/divides) > > tcp_header_len = tcp_header_len_th(tcp_hdr(skb)); > if (tcp_header_len < sizeof(struct tcphdr)) > goto bad_packet; > /* Check too short header and options */ > if (!pskb_may_pull(skb, tcp_header_len)) > goto discard_it; > Actually, tcp_header_len_th() has a multiply by 4 in it, too. My code exactly parallels the existing code. That is slightly faster for bad packets, as it does the raw comparison first, saving the multiply by 4 until it has passed that test. My change just saves a store (and maybe a register load) over the existing code. This is supposed to be "fast path" after all.... Also adds comments that explain what we're doing. As I mentioned earlier in the thread: ... back on Nov 10th, Ilpo brought to my attention -- *hidden* inside the pskb_may_pull() -- the tcp header length is range checked for being too short (skb->len < th->doff * 4). Anytime I find the code isn't obvious to me, I figure the next person will benefit from some more comments.... (Of course, this patch also fixes existing comments that are not true!) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
William Allen Simpson wrote: > Eric Dumazet wrote: >> Seems fine, but : >> >> 1) What means the "Transformed ?" you wrote several times ? >> > The only reason that I've been able to figure out for having the > skb->len test in those places is the preceding xfrm4_policy_check() > or xfrm6_policy_check() must be able to shrink the skb->len? > > When I did the original transform stuff in other code circa 1995, I'd > envisioned IP length or link layer (PPP) length shrinking (removing > padding after block ciphers) -- and apparently this implementation > extended that concept to transport layer, too. > > Personally, I'd prefer that a single test be placed in the appropriate > spot in the xfrm* functions, instead. Anybody know where? > I've spent another day staring at the xfrm* functions. Since nobody familiar with them has answered my recent questions, it seems I'm on my own.... So, here are my conclusions: The current xfrm* code shouldn't change the TCP header. If anything did, the current tests wouldn't work anyway. For example: tcp_ipv4: tcp_v4_rcv() ... 1645 no_tcp_socket: 1646 if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) 1647 goto discard_it; 1648 1649 if (skb->len < (th->doff << 2) || tcp_checksum_complete(skb)) { This code depends on the *th pointer remaining unchanged. A pullup or skb clone could make the pointer invalid. Likewise, the checksum occurs after the xfrm* code. Thus, the xfrm* cannot alter, decrypt, or tunnel the input data. Therefore, I'll remove those existing extraneous skb->len tests. And I'll add these criteria to the include/net/xfrm.h for future reference. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 65b8ebf..7ea2cff 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1559,6 +1559,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) return 0; } + /* Transformed? re-check too short header and options before checksum */ if (skb->len < tcp_hdrlen(skb) || tcp_checksum_complete(skb)) goto csum_err; @@ -1601,14 +1602,14 @@ csum_err: } /* - * From tcp_input.c + * Called by ip_input.c: ip_local_deliver_finish() */ - int tcp_v4_rcv(struct sk_buff *skb) { const struct iphdr *iph; struct tcphdr *th; struct sock *sk; + int tcp_header_len; int ret; struct net *net = dev_net(skb->dev); @@ -1618,20 +1619,23 @@ int tcp_v4_rcv(struct sk_buff *skb) /* Count it even if it's bad */ TCP_INC_STATS_BH(net, TCP_MIB_INSEGS); + /* Check too short header */ if (!pskb_may_pull(skb, sizeof(struct tcphdr))) goto discard_it; - th = tcp_hdr(skb); - - if (th->doff < sizeof(struct tcphdr) / 4) + /* Check bad doff, compare doff directly to constant value */ + tcp_header_len = tcp_hdr(skb)->doff; + if (tcp_header_len < (sizeof(struct tcphdr) / 4)) goto bad_packet; - if (!pskb_may_pull(skb, th->doff * 4)) + + /* Check too short header and options */ + tcp_header_len *= 4; + if (!pskb_may_pull(skb, tcp_header_len)) goto discard_it; - /* An explanation is required here, I think. - * Packet length and doff are validated by header prediction, - * provided case of th->doff==0 is eliminated. - * So, we defer the checks. */ + /* Packet length and doff are validated by header prediction, + * provided case of th->doff == 0 is eliminated (above). + */ if (!skb_csum_unnecessary(skb) && tcp_v4_checksum_init(skb)) goto bad_packet; @@ -1639,7 +1643,7 @@ int tcp_v4_rcv(struct sk_buff *skb) iph = ip_hdr(skb); TCP_SKB_CB(skb)->seq = ntohl(th->seq); TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin + - skb->len - th->doff * 4); + skb->len - tcp_header_len); TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq); TCP_SKB_CB(skb)->when = 0; TCP_SKB_CB(skb)->flags = iph->tos; @@ -1682,14 +1686,14 @@ process: bh_unlock_sock(sk); sock_put(sk); - return ret; no_tcp_socket: if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) goto discard_it; - if (skb->len < (th->doff << 2) || tcp_checksum_complete(skb)) { + /* Transformed? re-check too short header and options before checksum */ + if (skb->len < tcp_header_len_th(th) || tcp_checksum_complete(skb)) { bad_packet: TCP_INC_STATS_BH(net, TCP_MIB_INERRS); } else { @@ -1711,18 +1715,20 @@ do_time_wait: goto discard_it; } - if (skb->len < (th->doff << 2) || tcp_checksum_complete(skb)) { + /* Transformed? re-check too short header and options before checksum */ + if (skb->len < tcp_header_len_th(th) || tcp_checksum_complete(skb)) { TCP_INC_STATS_BH(net, TCP_MIB_INERRS); inet_twsk_put(inet_twsk(sk)); goto discard_it; } + switch (tcp_timewait_state_process(inet_twsk(sk), skb, th)) { case TCP_TW_SYN: { struct sock *sk2 = inet_lookup_listener(dev_net(skb->dev), &tcp_hashinfo, iph->daddr, th->dest, inet_iif(skb)); - if (sk2) { + if (sk2 != NULL) { inet_twsk_deschedule(inet_twsk(sk), &tcp_death_row); inet_twsk_put(inet_twsk(sk)); sk = sk2; diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index febfd59..268bc8a 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1594,6 +1594,7 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb) return 0; } + /* Transformed? re-check too short header and options before checksum */ if (skb->len < tcp_hdrlen(skb) || tcp_checksum_complete(skb)) goto csum_err; @@ -1664,38 +1665,47 @@ ipv6_pktoptions: return 0; } +/* + * Called by ip6_input.c: ip6_input_finish() + */ static int tcp_v6_rcv(struct sk_buff *skb) { struct tcphdr *th; struct sock *sk; + int tcp_header_len; int ret; struct net *net = dev_net(skb->dev); if (skb->pkt_type != PACKET_HOST) goto discard_it; - /* - * Count it even if it's bad. - */ + /* Count it even if it's bad */ TCP_INC_STATS_BH(net, TCP_MIB_INSEGS); + /* Check too short header */ if (!pskb_may_pull(skb, sizeof(struct tcphdr))) goto discard_it; - th = tcp_hdr(skb); - - if (th->doff < sizeof(struct tcphdr)/4) + /* Check bad doff, compare doff directly to constant value */ + tcp_header_len = tcp_hdr(skb)->doff; + if (tcp_header_len < (sizeof(struct tcphdr) / 4)) goto bad_packet; - if (!pskb_may_pull(skb, th->doff*4)) + + /* Check too short header and options */ + tcp_header_len *= 4; + if (!pskb_may_pull(skb, tcp_header_len)) goto discard_it; + /* Packet length and doff are validated by header prediction, + * provided case of th->doff == 0 is eliminated (above). + */ if (!skb_csum_unnecessary(skb) && tcp_v6_checksum_init(skb)) goto bad_packet; th = tcp_hdr(skb); TCP_SKB_CB(skb)->seq = ntohl(th->seq); TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin + - skb->len - th->doff*4); + skb->len - tcp_header_len); TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq); TCP_SKB_CB(skb)->when = 0; TCP_SKB_CB(skb)->flags = ipv6_get_dsfield(ipv6_hdr(skb)); @@ -1743,7 +1753,8 @@ no_tcp_socket: if (!xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) goto discard_it; - if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) { + /* Transformed? re-check too short header and options before checksum */ + if (skb->len < tcp_header_len_th(th) || tcp_checksum_complete(skb)) { bad_packet: TCP_INC_STATS_BH(net, TCP_MIB_INERRS); } else { @@ -1751,11 +1762,7 @@ bad_packet: } discard_it: - - /* - * Discard frame - */ - + /* Discard frame. */ kfree_skb(skb); return 0; @@ -1769,24 +1776,23 @@ do_time_wait: goto discard_it; } - if (skb->len < (th->doff<<2) || tcp_checksum_complete(skb)) { + /* Transformed? re-check too short header and options before checksum */ + if (skb->len < tcp_header_len_th(th) || tcp_checksum_complete(skb)) { TCP_INC_STATS_BH(net, TCP_MIB_INERRS); inet_twsk_put(inet_twsk(sk)); goto discard_it; } switch (tcp_timewait_state_process(inet_twsk(sk), skb, th)) { - case TCP_TW_SYN: - { - struct sock *sk2; - - sk2 = inet6_lookup_listener(dev_net(skb->dev), &tcp_hashinfo, - &ipv6_hdr(skb)->daddr, - ntohs(th->dest), inet6_iif(skb)); + case TCP_TW_SYN: { + struct sock *sk2 = inet6_lookup_listener(dev_net(skb->dev), + &tcp_hashinfo, + &ipv6_hdr(skb)->daddr, + ntohs(th->dest), + inet6_iif(skb)); if (sk2 != NULL) { - struct inet_timewait_sock *tw = inet_twsk(sk); - inet_twsk_deschedule(tw, &tcp_death_row); - inet_twsk_put(tw); + inet_twsk_deschedule(inet_twsk(sk), &tcp_death_row); + inet_twsk_put(inet_twsk(sk)); sk = sk2; goto process; }
Harmonize tcp_v4_rcv() and tcp_v6_rcv() -- better document tcp doff and header length assumptions. Reduces multiply/shifts, marginally improving speed. Retains redundant tcp header length checks before checksumming. Stand-alone patch, originally developed for TCPCT. Requires: net: tcp_header_len_th and tcp_option_len_th Signed-off-by: William.Allen.Simpson@gmail.com --- net/ipv4/tcp_ipv4.c | 36 +++++++++++++++++++------------- net/ipv6/tcp_ipv6.c | 56 ++++++++++++++++++++++++++++---------------------- 2 files changed, 52 insertions(+), 40 deletions(-)