diff mbox

[5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too

Message ID 1270031934-15940-6-git-send-email-jengelh@medozas.de
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Jan Engelhardt March 31, 2010, 10:38 a.m. UTC
Since Xtables is now reentrant/nestable, the cloned packet can also go
through Xtables and be subject to rules itself.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv4/ip_output.c   |    1 -
 net/ipv6/ip6_output.c  |    1 -
 net/netfilter/xt_TEE.c |   18 ++----------------
 3 files changed, 2 insertions(+), 18 deletions(-)

Comments

Patrick McHardy April 1, 2010, 10:37 a.m. UTC | #1
Jan Engelhardt wrote:
> Since Xtables is now reentrant/nestable, the cloned packet can also go
> through Xtables and be subject to rules itself.

That sounds dangerous if conntrack isn't used to prevent loops.
Is that really useful? For filtering, you can simply apply the
rules before deciding to TEE the packet.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt April 1, 2010, 11:03 a.m. UTC | #2
On Thursday 2010-04-01 12:37, Patrick McHardy wrote:

>Jan Engelhardt wrote:
>> Since Xtables is now reentrant/nestable, the cloned packet can also go
>> through Xtables and be subject to rules itself.
>
>That sounds dangerous if conntrack isn't used to prevent loops.

Conntrack loops are prevented by using a dummy conntrack, just as 
NOTRACK does.

>Is that really useful? For filtering, you can simply apply the
>rules before deciding to TEE the packet.

I can think of a handful of applications:
 - CLASSIFY

 - When the cloned packets gets XFRMed or tunneled, its status switches 
   from "special" to "plain". Doing policy routing on them does not seem 
   so far-fetched.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy April 1, 2010, 11:09 a.m. UTC | #3
Jan Engelhardt wrote:
> On Thursday 2010-04-01 12:37, Patrick McHardy wrote:
> 
>> Jan Engelhardt wrote:
>>> Since Xtables is now reentrant/nestable, the cloned packet can also go
>>> through Xtables and be subject to rules itself.
>> That sounds dangerous if conntrack isn't used to prevent loops.
> 
> Conntrack loops are prevented by using a dummy conntrack, just as 
> NOTRACK does.

My question was about the case without conntrack.

>> Is that really useful? For filtering, you can simply apply the
>> rules before deciding to TEE the packet.
> 
> I can think of a handful of applications:
>  - CLASSIFY

Good point, you should probably reset a couple of skb members
after the skb_copy().

>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>    from "special" to "plain". Doing policy routing on them does not seem 
>    so far-fetched.

Fair enough, provided we can also handle loops when conntrack
isn't used.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt April 1, 2010, 1:15 p.m. UTC | #4
On Thursday 2010-04-01 13:09, Patrick McHardy wrote:

>> Conntrack loops are prevented by using a dummy conntrack, just as 
>> NOTRACK does.
>[...]
>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>    from "special" to "plain". Doing policy routing on them does not seem 
>>    so far-fetched.
>
>My question was about the case without conntrack.

Hm. Do you have any suggestion in countering a case whereby a user
does -I OUTPUT -j TEE without conntrack?

Perhaps making nesting a feature that requires conntrack, such that the 
non-CT case can't loop?

>> I can think of a handful of applications:
>>  - CLASSIFY
>
>Good point, you should probably reset a couple of skb members
>after the skb_copy().

I take it you mean

 nf_reset(skb)
 skb->mark = 0;
 skb_init_secmark(nskb);

Or should we be using skb_alloc and copying the data portion over, like 
ipt_REJECT does since v2.6.24-2931-g9ba99b0?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy April 1, 2010, 1:22 p.m. UTC | #5
Jan Engelhardt wrote:
> On Thursday 2010-04-01 13:09, Patrick McHardy wrote:
> 
>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>> NOTRACK does.
>> [...]
>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>    so far-fetched.
>> My question was about the case without conntrack.
> 
> Hm. Do you have any suggestion in countering a case whereby a user
> does -I OUTPUT -j TEE without conntrack?
> 
> Perhaps making nesting a feature that requires conntrack, such that the 
> non-CT case can't loop?

If we drop the reentrancy thing, what should work is to prevent
using loopback as output device and using something similar to
the recursion counters tunnel devices used to have.

>>> I can think of a handful of applications:
>>>  - CLASSIFY
>> Good point, you should probably reset a couple of skb members
>> after the skb_copy().
> 
> I take it you mean
> 
>  nf_reset(skb)
>  skb->mark = 0;
>  skb_init_secmark(nskb);

Yes, basically. Although I believe the selinux people would be
happier if you kept the original secmark for the copied packets :)

> Or should we be using skb_alloc and copying the data portion over, like 
> ipt_REJECT does since v2.6.24-2931-g9ba99b0?

I guess pskb_copy() would be most optimal since we can modify
the header, but the non-linear area could be shared
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt April 1, 2010, 1:44 p.m. UTC | #6
On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>> NOTRACK does.
>>> [...]
>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>    so far-fetched.
>>> My question was about the case without conntrack.
>> 
>> Hm. Do you have any suggestion in countering a case whereby a user
>> does -I OUTPUT -j TEE without conntrack?
>> 
>> Perhaps making nesting a feature that requires conntrack, such that the 
>> non-CT case can't loop?
>
>If we drop the reentrancy thing, what should work is to prevent
>using loopback as output device and using something similar to
>the recursion counters tunnel devices used to have.

Nah. I'm going to pick a bit from struct skbuff to indicate the
packet was teed so as to avoid that loop.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy April 1, 2010, 1:48 p.m. UTC | #7
Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>> NOTRACK does.
>>>> [...]
>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>    so far-fetched.
>>>> My question was about the case without conntrack.
>>> Hm. Do you have any suggestion in countering a case whereby a user
>>> does -I OUTPUT -j TEE without conntrack?
>>>
>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>> non-CT case can't loop?
>> If we drop the reentrancy thing, what should work is to prevent
>> using loopback as output device and using something similar to
>> the recursion counters tunnel devices used to have.
> 
> Nah. I'm going to pick a bit from struct skbuff to indicate the
> packet was teed so as to avoid that loop.

That's a bad idea, we shouldn't be adding new skb members for something
as peripheral as this module.

What's wrong with adding a reentrancy counter?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt April 1, 2010, 1:59 p.m. UTC | #8
On Thursday 2010-04-01 15:48, Patrick McHardy wrote:
>Jan Engelhardt wrote:
>> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>>> NOTRACK does.
>>>>> [...]
>>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>>    so far-fetched.
>>>>> My question was about the case without conntrack.
>>>> Hm. Do you have any suggestion in countering a case whereby a user
>>>> does -I OUTPUT -j TEE without conntrack?
>>>>
>>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>>> non-CT case can't loop?
>>> If we drop the reentrancy thing, what should work is to prevent
>>> using loopback as output device and using something similar to
>>> the recursion counters tunnel devices used to have.
>> 
>> Nah. I'm going to pick a bit from struct skbuff to indicate the
>> packet was teed so as to avoid that loop.
>
>That's a bad idea, we shouldn't be adding new skb members for something
>as peripheral as this module.

I would have done this, which does not add a member:

	IP6CB(skb)->flags |= IPSKB_CLONED;

>What's wrong with adding a reentrancy counter?

Sounds like a plan.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy April 1, 2010, 2:03 p.m. UTC | #9
Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:48, Patrick McHardy wrote:
>> Jan Engelhardt wrote:
>>> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>>>> NOTRACK does.
>>>>>> [...]
>>>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>>>    so far-fetched.
>>>>>> My question was about the case without conntrack.
>>>>> Hm. Do you have any suggestion in countering a case whereby a user
>>>>> does -I OUTPUT -j TEE without conntrack?
>>>>>
>>>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>>>> non-CT case can't loop?
>>>> If we drop the reentrancy thing, what should work is to prevent
>>>> using loopback as output device and using something similar to
>>>> the recursion counters tunnel devices used to have.
>>> Nah. I'm going to pick a bit from struct skbuff to indicate the
>>> packet was teed so as to avoid that loop.
>> That's a bad idea, we shouldn't be adding new skb members for something
>> as peripheral as this module.
> 
> I would have done this, which does not add a member:
> 
> 	IP6CB(skb)->flags |= IPSKB_CLONED;

This doesn't work, the CB is not preserved across layers
(which f.i. matters if you allow loopback destinations).
Its also not preserved for clones.

>> What's wrong with adding a reentrancy counter?
> 
> Sounds like a plan.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt April 2, 2010, 6:15 p.m. UTC | #10
On Thursday 2010-04-01 16:03, Patrick McHardy wrote:
>>>>[detecting teed packets getting teed again by means of
>>>> iptables -A OUTPUT -j TEE]
>>>
>>> What's wrong with adding a reentrancy counter?
>> 
>> Sounds like a plan.

Should we be using a percpu variable, or is a simplistic
array ok too?

static bool tee_active;

target(...)
{
	if (tee_active[smp_processor_id()])
		return XT_CONTINUE;
	...
	if (tee_tg4_route(...)) {
		tee_active[cpu] = true;
		ip_local_out(skb);
		tee_active[cpu] = false;
	}
}


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt April 6, 2010, 4:14 p.m. UTC | #11
On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>> Or should we be using skb_alloc and copying the data portion over, like 
>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>
>I guess pskb_copy() would be most optimal since we can modify
>the header, but the non-linear area could be shared

Trying to improve my understanding: when doing skb_pull,
does the skb->head that is relevant for pskb_copy move?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy April 6, 2010, 4:37 p.m. UTC | #12
Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>> Or should we be using skb_alloc and copying the data portion over, like 
>>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>> I guess pskb_copy() would be most optimal since we can modify
>> the header, but the non-linear area could be shared
> 
> Trying to improve my understanding: when doing skb_pull,
> does the skb->head that is relevant for pskb_copy move?

skb_pull() only changes skb->data.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt April 7, 2010, 1:26 p.m. UTC | #13
On Tuesday 2010-04-06 18:37, Patrick McHardy wrote:
>Jan Engelhardt wrote:
>> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>> Or should we be using skb_alloc and copying the data portion over, like 
>>>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>>> I guess pskb_copy() would be most optimal since we can modify
>>> the header, but the non-linear area could be shared
>> 
>> Trying to improve my understanding: when doing skb_pull,
>> does the skb->head that is relevant for pskb_copy move?
>
>skb_pull() only changes skb->data.

But how does it interact, with, say, xt_TCPMSS which modifies not
only the L3 header, but also the L4 header?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 0abfdde..f09135e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -309,7 +309,6 @@  int ip_output(struct sk_buff *skb)
 			    ip_finish_output,
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
-EXPORT_SYMBOL_GPL(ip_output);
 
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 307d8bf..7e10f62 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -176,7 +176,6 @@  int ip6_output(struct sk_buff *skb)
 			    ip6_finish_output,
 			    !(IP6CB(skb)->flags & IPSKB_REROUTED));
 }
-EXPORT_SYMBOL_GPL(ip6_output);
 
 /*
  *	xmit an sk_buff (used by TCP)
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index 96dd746..70078f1 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -130,21 +130,8 @@  tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
 	skb->nfctinfo = IP_CT_NEW;
 	nf_conntrack_get(skb->nfct);
 #endif
-	/*
-	 * Xtables is not reentrant currently, so a choice has to be made:
-	 * 1. return absolute verdict for the original and let the cloned
-	 *    packet travel through the chains
-	 * 2. let the original continue travelling and not pass the clone
-	 *    to Xtables.
-	 * #2 is chosen. Normally, we would use ip_local_out for the clone.
-	 * Because iph->check is already correct and we don't pass it to
-	 * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can
-	 * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not
-	 * invoke POSTROUTING on the cloned packet.
-	 */
-	IPCB(skb)->flags |= IPSKB_REROUTED;
 	if (tee_tg_route4(skb, info))
-		ip_output(skb);
+		ip_local_out(skb);
 
 	return XT_CONTINUE;
 }
@@ -199,9 +186,8 @@  tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 		struct ipv6hdr *iph = ipv6_hdr(skb);
 		--iph->hop_limit;
 	}
-	IP6CB(skb)->flags |= IPSKB_REROUTED;
 	if (tee_tg_route6(skb, info))
-		ip6_output(skb);
+		ip6_local_out(skb);
 
 	return XT_CONTINUE;
 }