Patchwork Connection tracking and vlan

login
register
mail settings
Submitter Adayadil Thomas
Date Nov. 2, 2009, 4:14 p.m.
Message ID <fb7befa20911020814q2c4bcd1bj7e2b5a4c17ba0f89@mail.gmail.com>
Download mbox | patch
Permalink /patch/37437/
State RFC
Delegated to: David Miller
Headers show

Comments

Adayadil Thomas - Nov. 2, 2009, 4:14 p.m.
If the vlan id is used for hash, it still may not avoid the problem completely,
i.e. in case of both connections hashing to the same bucket.

I was wondering about your opinion about adding an optional member to the tuple
structure, vid (for vlan id).

I have attached the patch for this change. I would be grateful for any comments
such as dependencies on the rest of the system.


Thanks much



On Fri, Oct 30, 2009 at 6:25 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 10/30/2009 04:15 PM, Eric W. Biederman wrote:
>
>>> If ip_conntrack does not consider vlans, it is possible that all 5
>>> tuple are the same
>>> and thus affect the connection tracking.
>>>
>>> I hope I have described the scenario well. If not I can explain in a
>>> more detailed fashion.
>>
>> Unless you have multiple network namespaces linux assumes all packets are
>> in the same ip space.  And 10.10.10.1 is the same machine no matter
>> which interface you talk to it on.
>
> It only takes a relatively small patch that lets conn-track hash on a
> skb->foo_mark, and allow that mark to be set on incoming packets
> based on netdevice or whatever, (before the conn-track lookup is
> done).
>
> This is logically somewhat similar to using multiple routing
> tables and has been working well for me for several years....
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>
>
Patrick McHardy - Nov. 2, 2009, 4:33 p.m.
Adayadil Thomas wrote:
> If the vlan id is used for hash, it still may not avoid the problem completely,
> i.e. in case of both connections hashing to the same bucket.
> 
> I was wondering about your opinion about adding an optional member to the tuple
> structure, vid (for vlan id).
>
> I have attached the patch for this change. I would be grateful for any comments
> such as dependencies on the rest of the system.

Absolutely not, conntrack is not meant to deal with anything below
the network layer and I don't want to add any hacks for the bridge
netfilter "integration", which has already caused an endless amount
of problems. Additionally this is just one of many possible identifiers
people might want to use to distinguish similar entries and has a
number of practical issues, like breaking asymetric setups using
different VLANs for each direction.

I might be willing to consider a generically usable numerical
identifier to distinguish similar entries, something like
"conntrack zones". This could also help with the defragmentation
issue discussed earlier, the identifier would also be added to
the defragmentation identifier, for asymetric setups the interfaces
would be put in the same "zone".

But it would be preferrable if we could do this using network
namespaces somehow.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Nov. 2, 2009, 4:41 p.m.
Patrick McHardy a écrit :
> 
> But it would be preferrable if we could do this using network
> namespaces somehow.

eg eth0.100 and eth0.101 on namespace 1,  eth0.200 and eth0.201 on namespace 2 ?

Can we do that with current kernel ? (different vlans on an unique physical device)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - Nov. 2, 2009, 4:48 p.m.
Eric Dumazet wrote:
> Patrick McHardy a écrit :
>> But it would be preferrable if we could do this using network
>> namespaces somehow.
> 
> eg eth0.100 and eth0.101 on namespace 1,  eth0.200 and eth0.201 on namespace 2 ?
> 
> Can we do that with current kernel ? (different vlans on an unique physical device)

By default the underlying device needs to exist in the same namespace
in which the VLAN device is created. I believe it should be possible
to move the VLAN device to a different namespace after creating it,
but I'm not sure about that.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman - Nov. 2, 2009, 5:02 p.m.
Patrick McHardy <kaber@trash.net> writes:

> Eric Dumazet wrote:
>> Patrick McHardy a écrit :
>>> But it would be preferrable if we could do this using network
>>> namespaces somehow.
>> 
>> eg eth0.100 and eth0.101 on namespace 1,  eth0.200 and eth0.201 on namespace 2 ?
>> 
>> Can we do that with current kernel ? (different vlans on an unique physical device)
>
> By default the underlying device needs to exist in the same namespace
> in which the VLAN device is created. I believe it should be possible
> to move the VLAN device to a different namespace after creating it,
> but I'm not sure about that.

There should be no problem moving the vlan device after it is created.

Eric

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

--- linux-2.6.20.21/include/linux/netfilter_ipv4/ip_conntrack_core.h	2007-10-17 15:31:14.000000000 -0400
+++ linux-2.6.20.21.mod/include/linux/netfilter_ipv4/ip_conntrack_core.h	2009-11-02 09:53:44.000000000 -0500
@@ -24,7 +24,12 @@ 
 		const struct sk_buff *skb,
 		unsigned int dataoff,
 		struct ip_conntrack_tuple *tuple,
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+		const struct ip_conntrack_protocol *protocol,
+		unsigned short vid);
+#else
 		const struct ip_conntrack_protocol *protocol);
+#endif
 
 extern int
 ip_ct_invert_tuple(struct ip_conntrack_tuple *inverse,
--- linux-2.6.20.21/include/linux/netfilter_ipv4/ip_conntrack_tuple.h	2007-10-17 15:31:14.000000000 -0400
+++ linux-2.6.20.21.mod/include/linux/netfilter_ipv4/ip_conntrack_tuple.h	2009-11-02 09:56:09.000000000 -0500
@@ -79,6 +79,10 @@ 
 		/* The direction (for tuplehash) */
 		u_int8_t dir;
 	} dst;
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+	__be16 vid;
+#endif
+
 };
 
 /* This is optimized opposed to a memset of the whole structure.  Everything we
@@ -87,6 +91,9 @@ 
 	do {							\
 		(tuple)->src.u.all = 0;				\
 		(tuple)->dst.u.all = 0;				\
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+		(tuple)->vid = 0;				\
+#endif
 	} while (0)
 
 #ifdef __KERNEL__
@@ -125,10 +132,22 @@ 
 		&& t1->dst.protonum == t2->dst.protonum;
 }
 
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+static inline int ip_ct_tuple_vlan_equal(const struct ip_conntrack_tuple *t1,
+				         const struct ip_conntrack_tuple *t2)
+{
+	return t1->vid == t2->vid;
+}
+#endif
+
 static inline int ip_ct_tuple_equal(const struct ip_conntrack_tuple *t1,
 				    const struct ip_conntrack_tuple *t2)
 {
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+	return ip_ct_tuple_src_equal(t1, t2) && ip_ct_tuple_dst_equal(t1, t2) && ip_ct_tuple_vlan_equal(t1, t2);
+#else
 	return ip_ct_tuple_src_equal(t1, t2) && ip_ct_tuple_dst_equal(t1, t2);
+#endif
 }
 
 static inline int ip_ct_tuple_mask_cmp(const struct ip_conntrack_tuple *t,
--- linux-2.6.20.21/net/ipv4/netfilter/ip_conntrack_core.c	2009-11-02 10:57:49.000000000 -0500
+++ linux-2.6.20.21.mod/net/ipv4/netfilter/ip_conntrack_core.c	2009-11-02 09:57:57.000000000 -0500
@@ -45,6 +45,7 @@ 
 #include <linux/percpu.h>
 #include <linux/moduleparam.h>
 #include <linux/notifier.h>
+#include <linux/if_vlan.h>
 
 /* ip_conntrack_lock protects the main hash table, protocol/helper/expected
    registrations, conntrack timers*/
@@ -181,7 +182,8 @@ 
 		const struct sk_buff *skb,
 		unsigned int dataoff,
 		struct ip_conntrack_tuple *tuple,
-		const struct ip_conntrack_protocol *protocol)
+		const struct ip_conntrack_protocol *protocol,
+		unsigned short vid)
 {
 	/* Never happen */
 	if (iph->frag_off & htons(IP_OFFSET)) {
@@ -194,6 +196,7 @@ 
 	tuple->dst.ip = iph->daddr;
 	tuple->dst.protonum = iph->protocol;
 	tuple->dst.dir = IP_CT_DIR_ORIGINAL;
+	tuple->vid = vid;
 
 	return protocol->pkt_to_tuple(skb, dataoff, tuple);
 }
@@ -206,6 +209,7 @@ 
 	inverse->src.ip = orig->dst.ip;
 	inverse->dst.ip = orig->src.ip;
 	inverse->dst.protonum = orig->dst.protonum;
+	inverse->vid = orig->vid;
 	inverse->dst.dir = !orig->dst.dir;
 
 	return protocol->invert_tuple(inverse, orig);
@@ -779,11 +783,26 @@ 
 	struct ip_conntrack_tuple tuple;
 	struct ip_conntrack_tuple_hash *h;
 	struct ip_conntrack *ct;
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+	struct vlan_ethhdr *vhdr;
+	unsigned short vlan_TCI, vid = 8192;
+
+	vhdr = vlan_eth_hdr(skb);
+	if (vhdr && vhdr->h_vlan_proto == __constant_htons(ETH_P_8021Q)){
+		vlan_TCI = ntohs(vhdr->h_vlan_TCI);
+		vid = (vlan_TCI & VLAN_VID_MASK);
+	}
+#endif
 
 	IP_NF_ASSERT((skb->nh.iph->frag_off & htons(IP_OFFSET)) == 0);
 
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
 	if (!ip_ct_get_tuple(skb->nh.iph, skb, skb->nh.iph->ihl*4, 
+				&tuple,proto,vid))
+#else
+	if (!ip_ct_get_tuple(skb->nh.iph, skb, skb->nh.iph->ihl*4,
 				&tuple,proto))
+#endif
 		return NULL;
 
 	/* look for tuple match */
--- linux-2.6.20.21/net/ipv4/netfilter/ip_conntrack_proto_icmp.c	2007-10-17 15:31:14.000000000 -0400
+++ linux-2.6.20.21.mod/net/ipv4/netfilter/ip_conntrack_proto_icmp.c	2009-11-02 09:58:52.000000000 -0500
@@ -20,6 +20,7 @@ 
 #include <linux/netfilter_ipv4/ip_conntrack.h>
 #include <linux/netfilter_ipv4/ip_conntrack_core.h>
 #include <linux/netfilter_ipv4/ip_conntrack_protocol.h>
+#include <linux/if_vlan.h>
 
 unsigned int ip_ct_icmp_timeout __read_mostly = 30*HZ;
 
@@ -146,6 +147,16 @@ 
 	struct ip_conntrack_protocol *innerproto;
 	struct ip_conntrack_tuple_hash *h;
 	int dataoff;
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+	struct vlan_ethhdr *vhdr;
+	unsigned short vlan_TCI, vid = 8192;
+
+	vhdr = vlan_eth_hdr(skb);
+	if (vhdr && vhdr->h_vlan_proto == __constant_htons(ETH_P_8021Q)){
+		vlan_TCI = ntohs(vhdr->h_vlan_TCI);
+		vid = (vlan_TCI & VLAN_VID_MASK);
+	}
+#endif
 
 	IP_NF_ASSERT(skb->nfct == NULL);
 
@@ -164,7 +175,11 @@ 
 	innerproto = ip_conntrack_proto_find_get(inside->ip.protocol);
 	dataoff = skb->nh.iph->ihl*4 + sizeof(inside->icmp) + inside->ip.ihl*4;
 	/* Are they talking about one of our connections? */
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+	if (!ip_ct_get_tuple(&inside->ip, skb, dataoff, &origtuple, innerproto, vid)) {
+#else
 	if (!ip_ct_get_tuple(&inside->ip, skb, dataoff, &origtuple, innerproto)) {
+#endif
 		DEBUGP("icmp_error: ! get_tuple p=%u", inside->ip.protocol);
 		ip_conntrack_proto_put(innerproto);
 		return -NF_ACCEPT;
--- linux-2.6.20.21/net/ipv4/netfilter/ip_nat_core.c	2009-11-02 10:57:49.000000000 -0500
+++ linux-2.6.20.21.mod/net/ipv4/netfilter/ip_nat_core.c	2009-11-02 10:01:08.000000000 -0500
@@ -23,6 +23,7 @@ 
 #include <linux/icmp.h>
 #include <linux/udp.h>
 #include <linux/jhash.h>
+#include <linux/if_vlan.h>
 
 #include <linux/netfilter_ipv4/ip_conntrack.h>
 #include <linux/netfilter_ipv4/ip_conntrack_core.h>
@@ -821,6 +822,17 @@ 
 	unsigned long statusbit;
 	enum ip_nat_manip_type manip = HOOK2MANIP(hooknum);
 
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
+	struct vlan_ethhdr *vhdr;
+	unsigned short vlan_TCI, vid = 8192;
+
+	vhdr = vlan_eth_hdr(*pskb);
+	if (vhdr && vhdr->h_vlan_proto == __constant_htons(ETH_P_8021Q)){
+		vlan_TCI = ntohs(vhdr->h_vlan_TCI);
+		vid = (vlan_TCI & VLAN_VID_MASK);
+	}
+#endif
+
 	if (!skb_make_writable(pskb, hdrlen + sizeof(*inside)))
 		return 0;
 
@@ -850,10 +862,14 @@ 
 	DEBUGP("icmp_reply_translation: translating error %p manp %u dir %s\n",
 	       *pskb, manip, dir == IP_CT_DIR_ORIGINAL ? "ORIG" : "REPLY");
 
+#ifdef CONFIG_IP_NF_CONNTRACK_VLAN
 	if (!ip_ct_get_tuple(&inside->ip, *pskb, (*pskb)->nh.iph->ihl*4 +
 	                     sizeof(struct icmphdr) + inside->ip.ihl*4,
 	                     &inner,
+			     __ip_conntrack_proto_find(inside->ip.protocol), vid))
+#else
 			     __ip_conntrack_proto_find(inside->ip.protocol)))
+#endif
 		return 0;
 
 	/* Change inner back to look like incoming packet.  We do the