From patchwork Mon Nov 2 16:14:02 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Adayadil Thomas X-Patchwork-Id: 37437 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id 84C0E1007D3 for ; Tue, 3 Nov 2009 03:14:12 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755379AbZKBQN7 (ORCPT ); Mon, 2 Nov 2009 11:13:59 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1755336AbZKBQN7 (ORCPT ); Mon, 2 Nov 2009 11:13:59 -0500 Received: from mail-vw0-f192.google.com ([209.85.212.192]:48117 "EHLO mail-vw0-f192.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755324AbZKBQN6 (ORCPT ); Mon, 2 Nov 2009 11:13:58 -0500 Received: by vws30 with SMTP id 30so1313643vws.33 for ; Mon, 02 Nov 2009 08:14:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type; bh=v8s6lFvxVbFhFJSEQKbDeLDpEWaGKCY3vq80hFQRUDc=; b=B4Mi/AIBF08OsyHvnWE1bGhJb++7NQQilE13MyDKUsQ2xZl1VREbntM/hNAAHNoT0L U4MqGmBn/Tne5toUG1kMHYXipJeT4sRWQciJfd9FdQOSuP+rYrDT0JBWwm3zjZTW6uZx IIjfx6UUd44CcQ/R2Q9p53mqnAA1U9PRaU0b0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=M+WbtG506VKwfjBU02eZUA1QvTO7PkS9a1CtqBoalVcBOC4YOvgW564PwOeNQ9ouRB a8rmPLocFdW8G7eOiDNl3mYFZXWZs0GFJqsTVvdMkntBfB4FkoqAuAhth0Pj0O4vYzvd EPyPd2S91VbD3A4dnptdyFar9dRVdiuClYaCM= MIME-Version: 1.0 Received: by 10.220.121.131 with SMTP id h3mr5237066vcr.42.1257178442552; Mon, 02 Nov 2009 08:14:02 -0800 (PST) In-Reply-To: <4AEB75DD.8050204@candelatech.com> References: <20091030152054.GA7936@gondor.apana.org.au> <4AEB06E6.6020206@gmail.com> <4AEB75DD.8050204@candelatech.com> Date: Mon, 2 Nov 2009 11:14:02 -0500 Message-ID: Subject: Re: Connection tracking and vlan From: Adayadil Thomas To: Ben Greear Cc: "Eric W. Biederman" , Eric Dumazet , Herbert Xu , netdev@vger.kernel.org, Patrick McHardy Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If the vlan id is used for hash, it still may not avoid the problem completely, i.e. in case of both connections hashing to the same bucket. I was wondering about your opinion about adding an optional member to the tuple structure, vid (for vlan id). I have attached the patch for this change. I would be grateful for any comments such as dependencies on the rest of the system. Thanks much On Fri, Oct 30, 2009 at 6:25 PM, Ben Greear wrote: > On 10/30/2009 04:15 PM, Eric W. Biederman wrote: > >>> If ip_conntrack does not consider vlans, it is possible that all 5 >>> tuple are the same >>> and thus affect the connection tracking. >>> >>> I hope I have described the scenario well. If not I can explain in a >>> more detailed fashion. >> >> Unless you have multiple network namespaces linux assumes all packets are >> in the same ip space.  And 10.10.10.1 is the same machine no matter >> which interface you talk to it on. > > It only takes a relatively small patch that lets conn-track hash on a > skb->foo_mark, and allow that mark to be set on incoming packets > based on netdevice or whatever, (before the conn-track lookup is > done). > > This is logically somewhat similar to using multiple routing > tables and has been working well for me for several years.... > > Thanks, > Ben > > -- > Ben Greear > Candela Technologies Inc  http://www.candelatech.com > > --- linux-2.6.20.21/include/linux/netfilter_ipv4/ip_conntrack_core.h 2007-10-17 15:31:14.000000000 -0400 +++ linux-2.6.20.21.mod/include/linux/netfilter_ipv4/ip_conntrack_core.h 2009-11-02 09:53:44.000000000 -0500 @@ -24,7 +24,12 @@ const struct sk_buff *skb, unsigned int dataoff, struct ip_conntrack_tuple *tuple, +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN + const struct ip_conntrack_protocol *protocol, + unsigned short vid); +#else const struct ip_conntrack_protocol *protocol); +#endif extern int ip_ct_invert_tuple(struct ip_conntrack_tuple *inverse, --- linux-2.6.20.21/include/linux/netfilter_ipv4/ip_conntrack_tuple.h 2007-10-17 15:31:14.000000000 -0400 +++ linux-2.6.20.21.mod/include/linux/netfilter_ipv4/ip_conntrack_tuple.h 2009-11-02 09:56:09.000000000 -0500 @@ -79,6 +79,10 @@ /* The direction (for tuplehash) */ u_int8_t dir; } dst; +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN + __be16 vid; +#endif + }; /* This is optimized opposed to a memset of the whole structure. Everything we @@ -87,6 +91,9 @@ do { \ (tuple)->src.u.all = 0; \ (tuple)->dst.u.all = 0; \ +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN + (tuple)->vid = 0; \ +#endif } while (0) #ifdef __KERNEL__ @@ -125,10 +132,22 @@ && t1->dst.protonum == t2->dst.protonum; } +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN +static inline int ip_ct_tuple_vlan_equal(const struct ip_conntrack_tuple *t1, + const struct ip_conntrack_tuple *t2) +{ + return t1->vid == t2->vid; +} +#endif + static inline int ip_ct_tuple_equal(const struct ip_conntrack_tuple *t1, const struct ip_conntrack_tuple *t2) { +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN + return ip_ct_tuple_src_equal(t1, t2) && ip_ct_tuple_dst_equal(t1, t2) && ip_ct_tuple_vlan_equal(t1, t2); +#else return ip_ct_tuple_src_equal(t1, t2) && ip_ct_tuple_dst_equal(t1, t2); +#endif } static inline int ip_ct_tuple_mask_cmp(const struct ip_conntrack_tuple *t, --- linux-2.6.20.21/net/ipv4/netfilter/ip_conntrack_core.c 2009-11-02 10:57:49.000000000 -0500 +++ linux-2.6.20.21.mod/net/ipv4/netfilter/ip_conntrack_core.c 2009-11-02 09:57:57.000000000 -0500 @@ -45,6 +45,7 @@ #include #include #include +#include /* ip_conntrack_lock protects the main hash table, protocol/helper/expected registrations, conntrack timers*/ @@ -181,7 +182,8 @@ const struct sk_buff *skb, unsigned int dataoff, struct ip_conntrack_tuple *tuple, - const struct ip_conntrack_protocol *protocol) + const struct ip_conntrack_protocol *protocol, + unsigned short vid) { /* Never happen */ if (iph->frag_off & htons(IP_OFFSET)) { @@ -194,6 +196,7 @@ tuple->dst.ip = iph->daddr; tuple->dst.protonum = iph->protocol; tuple->dst.dir = IP_CT_DIR_ORIGINAL; + tuple->vid = vid; return protocol->pkt_to_tuple(skb, dataoff, tuple); } @@ -206,6 +209,7 @@ inverse->src.ip = orig->dst.ip; inverse->dst.ip = orig->src.ip; inverse->dst.protonum = orig->dst.protonum; + inverse->vid = orig->vid; inverse->dst.dir = !orig->dst.dir; return protocol->invert_tuple(inverse, orig); @@ -779,11 +783,26 @@ struct ip_conntrack_tuple tuple; struct ip_conntrack_tuple_hash *h; struct ip_conntrack *ct; +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN + struct vlan_ethhdr *vhdr; + unsigned short vlan_TCI, vid = 8192; + + vhdr = vlan_eth_hdr(skb); + if (vhdr && vhdr->h_vlan_proto == __constant_htons(ETH_P_8021Q)){ + vlan_TCI = ntohs(vhdr->h_vlan_TCI); + vid = (vlan_TCI & VLAN_VID_MASK); + } +#endif IP_NF_ASSERT((skb->nh.iph->frag_off & htons(IP_OFFSET)) == 0); +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN if (!ip_ct_get_tuple(skb->nh.iph, skb, skb->nh.iph->ihl*4, + &tuple,proto,vid)) +#else + if (!ip_ct_get_tuple(skb->nh.iph, skb, skb->nh.iph->ihl*4, &tuple,proto)) +#endif return NULL; /* look for tuple match */ --- linux-2.6.20.21/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2007-10-17 15:31:14.000000000 -0400 +++ linux-2.6.20.21.mod/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2009-11-02 09:58:52.000000000 -0500 @@ -20,6 +20,7 @@ #include #include #include +#include unsigned int ip_ct_icmp_timeout __read_mostly = 30*HZ; @@ -146,6 +147,16 @@ struct ip_conntrack_protocol *innerproto; struct ip_conntrack_tuple_hash *h; int dataoff; +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN + struct vlan_ethhdr *vhdr; + unsigned short vlan_TCI, vid = 8192; + + vhdr = vlan_eth_hdr(skb); + if (vhdr && vhdr->h_vlan_proto == __constant_htons(ETH_P_8021Q)){ + vlan_TCI = ntohs(vhdr->h_vlan_TCI); + vid = (vlan_TCI & VLAN_VID_MASK); + } +#endif IP_NF_ASSERT(skb->nfct == NULL); @@ -164,7 +175,11 @@ innerproto = ip_conntrack_proto_find_get(inside->ip.protocol); dataoff = skb->nh.iph->ihl*4 + sizeof(inside->icmp) + inside->ip.ihl*4; /* Are they talking about one of our connections? */ +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN + if (!ip_ct_get_tuple(&inside->ip, skb, dataoff, &origtuple, innerproto, vid)) { +#else if (!ip_ct_get_tuple(&inside->ip, skb, dataoff, &origtuple, innerproto)) { +#endif DEBUGP("icmp_error: ! get_tuple p=%u", inside->ip.protocol); ip_conntrack_proto_put(innerproto); return -NF_ACCEPT; --- linux-2.6.20.21/net/ipv4/netfilter/ip_nat_core.c 2009-11-02 10:57:49.000000000 -0500 +++ linux-2.6.20.21.mod/net/ipv4/netfilter/ip_nat_core.c 2009-11-02 10:01:08.000000000 -0500 @@ -23,6 +23,7 @@ #include #include #include +#include #include #include @@ -821,6 +822,17 @@ unsigned long statusbit; enum ip_nat_manip_type manip = HOOK2MANIP(hooknum); +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN + struct vlan_ethhdr *vhdr; + unsigned short vlan_TCI, vid = 8192; + + vhdr = vlan_eth_hdr(*pskb); + if (vhdr && vhdr->h_vlan_proto == __constant_htons(ETH_P_8021Q)){ + vlan_TCI = ntohs(vhdr->h_vlan_TCI); + vid = (vlan_TCI & VLAN_VID_MASK); + } +#endif + if (!skb_make_writable(pskb, hdrlen + sizeof(*inside))) return 0; @@ -850,10 +862,14 @@ DEBUGP("icmp_reply_translation: translating error %p manp %u dir %s\n", *pskb, manip, dir == IP_CT_DIR_ORIGINAL ? "ORIG" : "REPLY"); +#ifdef CONFIG_IP_NF_CONNTRACK_VLAN if (!ip_ct_get_tuple(&inside->ip, *pskb, (*pskb)->nh.iph->ihl*4 + sizeof(struct icmphdr) + inside->ip.ihl*4, &inner, + __ip_conntrack_proto_find(inside->ip.protocol), vid)) +#else __ip_conntrack_proto_find(inside->ip.protocol))) +#endif return 0; /* Change inner back to look like incoming packet. We do the