IPv6: Send an ICMPv6 "Fragment Reassembly Timeout" message when connection tracking is enabled

Message ID 4B62A338.6020106@cn.fujitsu.com
State Not Applicable, archived
Delegated to: David Miller
Commit Message

Shan Wei Jan. 29, 2010, 8:58 a.m. UTC
I have made a patch for an end host with IPv4 connection tracking enabled
to send an ICMP "Fragment Reassembly Timeout" message when defragmentation times out.
This patch makes the same change for IPv6 connection tracking, according to section 4.5
of RFC2460.

Quote Begin:
 Section 4.5 in RFC2460.
   If insufficient fragments are received to complete reassembly of a
   packet within 60 seconds of the reception of the first-arriving
   fragment of that packet, reassembly of that packet must be
   abandoned and all the fragments that have been received for that
   packet must be discarded.  If the first fragment (i.e., the one
   with a Fragment Offset of zero) has been received, an ICMP Time
   Exceeded -- Fragment Reassembly Time Exceeded message should be
   sent to the source of that fragment.
Quote End.
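
The rule quoted above can be sketched in plain userspace C. This is only an
illustration of the decision, not the kernel code: struct frag, struct
frag_queue, fq_add() and fq_expire() are hypothetical names made up for this
sketch, and the 8-entry array is an arbitrary limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified fragment record (not a kernel structure). */
struct frag {
	unsigned int offset;	/* Fragment Offset, in bytes */
	unsigned int len;	/* payload length, in bytes */
};

/* Hypothetical reassembly queue for a single packet. */
struct frag_queue {
	struct frag frags[8];	/* arbitrary limit for the sketch */
	size_t nfrags;
	bool first_in;		/* set once an offset-zero fragment arrives */
};

/* Queue one fragment and remember whether the first fragment has arrived. */
static bool fq_add(struct frag_queue *fq, unsigned int offset, unsigned int len)
{
	assert(fq != NULL);
	if (fq->nfrags >= sizeof(fq->frags) / sizeof(fq->frags[0]))
		return false;
	fq->frags[fq->nfrags].offset = offset;
	fq->frags[fq->nfrags].len = len;
	fq->nfrags++;
	if (offset == 0)
		fq->first_in = true;
	return true;
}

/*
 * The reassembly timer fired: discard everything that was queued and
 * report whether a Time Exceeded message should go back to the source.
 * That is the case only if the offset-zero fragment was received.
 */
static bool fq_expire(struct frag_queue *fq)
{
	bool send_icmp;

	assert(fq != NULL);
	send_icmp = fq->first_in && fq->nfrags > 0;
	fq->nfrags = 0;
	fq->first_in = false;
	return send_icmp;
}
```

A queue that only ever saw a fragment at offset 1280 expires silently, while a
queue that saw the offset-zero fragment asks for the error message. In both
cases the queued fragments are dropped.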

I have tested the patch on both a host and a router.


Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> 
---
 include/linux/skbuff.h                  |    5 ++++
 net/ipv6/netfilter/nf_conntrack_reasm.c |   34 ++++++++++++++++++++++++++++++-
 net/ipv6/route.c                        |    1 +
 3 files changed, 39 insertions(+), 1 deletions(-)

Comments

Patrick McHardy Feb. 3, 2010, 4:42 p.m. UTC | #1
Shan Wei wrote:
> @@ -349,17 +378,20 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
>  	else
>  		fq->q.fragments = skb;
>  
> -	skb->dev = NULL;
>  	fq->q.stamp = skb->tstamp;
>  	fq->q.meat += skb->len;
>  	atomic_add(skb->truesize, &nf_init_frags.mem);
>  
>  	/* The first fragment.
>  	 * nhoffset is obtained from the first fragment, of course.
> +	 * Reserve dev for sending an ICMP "Fragment Reassembly Timeout"
> +	 * message.
>  	 */
>  	if (offset == 0) {
>  		fq->nhoffset = nhoff;
>  		fq->q.last_in |= INET_FRAG_FIRST_IN;
> +	} else {
> +		skb->dev = NULL;
>  	}

We need to store the iif and perform a lookup later just as in IPv4
because the device is not reference counted and might disappear while
the fragments are queued.

Besides this, the patch looks good.
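
The store-the-index-and-look-up-later idea can be sketched in userspace C as
follows. Everything here is a toy model: struct toy_dev, struct toy_net,
struct toy_queue and the toy_*() helpers are hypothetical names, and the flat
array only stands in for a device list. In the kernel, the lookup at expiry
would presumably be something like dev_get_by_index_rcu() under
rcu_read_lock(), as the IPv4 expiry path is understood to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device; not struct net_device. */
struct toy_dev {
	int ifindex;
	bool registered;	/* false once the device has gone away */
};

/* Hypothetical device table standing in for one namespace's devices. */
struct toy_net {
	struct toy_dev devs[4];
	size_t ndevs;
};

/* Hypothetical queue: it remembers the index, never the pointer. */
struct toy_queue {
	int iif;		/* input interface index of the first fragment */
	bool first_in;
};

/* Return the device with this index if it is still registered, else NULL. */
static struct toy_dev *toy_dev_get_by_index(struct toy_net *net, int ifindex)
{
	for (size_t i = 0; i < net->ndevs; i++)
		if (net->devs[i].registered && net->devs[i].ifindex == ifindex)
			return &net->devs[i];
	return NULL;
}

/* At queue time: record where the first fragment came in. */
static void toy_queue_first(struct toy_queue *q, const struct toy_dev *dev)
{
	assert(dev != NULL);
	q->iif = dev->ifindex;
	q->first_in = true;
}

/*
 * At expiry: return the device to send the error through, or NULL when
 * no error should be sent (no first fragment, or the device vanished).
 */
static struct toy_dev *toy_expire(struct toy_net *net, const struct toy_queue *q)
{
	if (!q->first_in)
		return NULL;
	return toy_dev_get_by_index(net, q->iif);
}
```

Because the queue holds only an integer, a device that is unregistered while
fragments are queued simply makes the lookup return NULL, and the expiry path
stays silent instead of touching a stale pointer.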
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shan Wei Feb. 8, 2010, 2:18 p.m. UTC | #2
Patrick McHardy wrote, at 02/04/2010 12:42 AM:
> Shan Wei wrote:
>> @@ -349,17 +378,20 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
>>  	else
>>  		fq->q.fragments = skb;
>>  
>> -	skb->dev = NULL;
>>  	fq->q.stamp = skb->tstamp;
>>  	fq->q.meat += skb->len;
>>  	atomic_add(skb->truesize, &nf_init_frags.mem);
>>  
>>  	/* The first fragment.
>>  	 * nhoffset is obtained from the first fragment, of course.
>> +	 * Reserve dev for sending an ICMP "Fragment Reassembly Timeout"
>> +	 * message.
>>  	 */
>>  	if (offset == 0) {
>>  		fq->nhoffset = nhoff;
>>  		fq->q.last_in |= INET_FRAG_FIRST_IN;
>> +	} else {
>> +		skb->dev = NULL;
>>  	}
> 
> We need to store the iif and perform a lookup later just as in IPv4
> because the device is not reference counted and might disappear while
> the fragments are queued.

There is no net namespace in nf_conntrack_reasm,
so we can't look up the net device from the stored iif.

How about introducing net namespace support to nf_conntrack_reasm?
That would have the following two advantages:
1. nf_init_frags can be deleted, because the net structure includes a netns_frags structure member.

2. Counter values, e.g. IPSTATS_MIB_REASMFAILS, can be recorded when reassembly fails.
   Currently, when IPv6 conntrack fails to reassemble fragments, the original fragments are not forwarded to the IPv6 stack,
   so the counter value can't be recorded. IPv4 conntrack, by contrast, uses the IPv4 defrag code and records
   the counter value correctly.

These are just my thoughts; I have not tried them in practice.
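
To make the proposal a bit more concrete, here is a rough userspace sketch of
per-namespace accounting. It is not kernel code: struct toy_netns, struct
toy_fq and the toy_fq_*() helpers are hypothetical, and reasm_fails merely
stands in for a counter such as IPSTATS_MIB_REASMFAILS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-namespace state replacing one global frags structure. */
struct toy_netns {
	unsigned long mem;		/* bytes held by queued fragments */
	unsigned long reasm_fails;	/* stand-in for a REASMFAILS counter */
};

/* Hypothetical fragment queue that knows which namespace it belongs to. */
struct toy_fq {
	struct toy_netns *net;
	unsigned long truesize;		/* bytes this queue has charged */
};

/* Bind a new, empty queue to its namespace. */
static void toy_fq_init(struct toy_fq *fq, struct toy_netns *net)
{
	assert(fq != NULL && net != NULL);
	fq->net = net;
	fq->truesize = 0;
}

/* Charge a queued fragment to the owning namespace, not to a global. */
static void toy_fq_charge(struct toy_fq *fq, unsigned long truesize)
{
	fq->truesize += truesize;
	fq->net->mem += truesize;
}

/* On timeout: release the memory and count the failure per namespace. */
static void toy_fq_expire(struct toy_fq *fq)
{
	fq->net->mem -= fq->truesize;
	fq->truesize = 0;
	fq->net->reasm_fails++;
}
```

With the namespace reachable from the queue, the expiry path can both update
the right counter and, in the real code, find the right device list for the
iif lookup.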
Patrick McHardy Feb. 8, 2010, 2:20 p.m. UTC | #3
Shan Wei wrote:
> Patrick McHardy wrote, at 02/04/2010 12:42 AM:
>> Shan Wei wrote:
>>> @@ -349,17 +378,20 @@ static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
>>>  	else
>>>  		fq->q.fragments = skb;
>>>  
>>> -	skb->dev = NULL;
>>>  	fq->q.stamp = skb->tstamp;
>>>  	fq->q.meat += skb->len;
>>>  	atomic_add(skb->truesize, &nf_init_frags.mem);
>>>  
>>>  	/* The first fragment.
>>>  	 * nhoffset is obtained from the first fragment, of course.
>>> +	 * Reserve dev for sending an ICMP "Fragment Reassembly Timeout"
>>> +	 * message.
>>>  	 */
>>>  	if (offset == 0) {
>>>  		fq->nhoffset = nhoff;
>>>  		fq->q.last_in |= INET_FRAG_FIRST_IN;
>>> +	} else {
>>> +		skb->dev = NULL;
>>>  	}
>> We need to store the iif and perform a lookup later just as in IPv4
>> because the device is not reference counted and might disappear while
>> the fragments are queued.
> 
> There is no net namespace in nf_conntrack_reasm,
> so we can't look up the net device from the stored iif.
> 
> How about introducing net namespace support to nf_conntrack_reasm?
> That would have the following two advantages:
> 1. nf_init_frags can be deleted, because the net structure includes a netns_frags structure member.
> 
> 2. Counter values, e.g. IPSTATS_MIB_REASMFAILS, can be recorded when reassembly fails.
>    Currently, when IPv6 conntrack fails to reassemble fragments, the original fragments are not forwarded to the IPv6 stack,
>    so the counter value can't be recorded. IPv4 conntrack, by contrast, uses the IPv4 defrag code and records
>    the counter value correctly.
> 
> These are just my thoughts; I have not tried them in practice.

Sounds good to me.


Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ae836fd..33a1784 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -431,6 +431,11 @@  static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 	return (struct rtable *)skb_dst(skb);
 }
 
+static inline struct rt6_info *skb_r6table(const struct sk_buff *skb)
+{
+	return (struct rt6_info *)skb_dst(skb);
+}
+
 extern void kfree_skb(struct sk_buff *skb);
 extern void consume_skb(struct sk_buff *skb);
 extern void	       __kfree_skb(struct sk_buff *skb);
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 312c20a..2be0edc 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -27,10 +27,12 @@ 
 #include <linux/ipv6.h>
 #include <linux/icmpv6.h>
 #include <linux/random.h>
+#include <linux/ipv6_route.h>
 
 #include <net/sock.h>
 #include <net/snmp.h>
 #include <net/inet_frag.h>
+#include <net/ip6_route.h>
 
 #include <net/ipv6.h>
 #include <net/protocol.h>
@@ -160,6 +162,33 @@  static void nf_ct_frag6_expire(unsigned long data)
 
 	fq_kill(fq);
 
+	/* Don't send error if the first segment did not arrive. */
+	if (!(fq->q.last_in & INET_FRAG_FIRST_IN) || !fq->q.fragments)
+		goto out;
+
+	/*
+	 * Only look up the routing table for the head fragment
+	 * when reassembly times out at the PRE_ROUTING hook.
+	 */
+	if (fq->user == IP6_DEFRAG_CONNTRACK_IN) {
+		struct sk_buff *head = fq->q.fragments;
+
+		ip6_route_input(head);
+		if (!skb_dst(head))
+			goto out;
+
+		/*
+		 * Only an end host needs to send an ICMP "Fragment Reassembly
+		 * Timeout" message, per section 4.5 of RFC2460.
+		 */
+		if (!(skb_r6table(head)->rt6i_flags & RTF_LOCAL))
+			goto out;
+
+		/* Send an ICMP "Fragment Reassembly Timeout" message. */
+		icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0,
+			    head->dev);
+	}
+
 out:
 	spin_unlock(&fq->q.lock);
 	fq_put(fq);
@@ -349,17 +378,20 @@  static int nf_ct_frag6_queue(struct nf_ct_frag6_queue *fq, struct sk_buff *skb,
 	else
 		fq->q.fragments = skb;
 
-	skb->dev = NULL;
 	fq->q.stamp = skb->tstamp;
 	fq->q.meat += skb->len;
 	atomic_add(skb->truesize, &nf_init_frags.mem);
 
 	/* The first fragment.
 	 * nhoffset is obtained from the first fragment, of course.
+	 * Reserve dev for sending an ICMP "Fragment Reassembly Timeout"
+	 * message.
 	 */
 	if (offset == 0) {
 		fq->nhoffset = nhoff;
 		fq->q.last_in |= INET_FRAG_FIRST_IN;
+	} else {
+		skb->dev = NULL;
 	}
 	write_lock(&nf_frags.lock);
 	list_move_tail(&fq->q.lru_list, &nf_init_frags.lru_list);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c2bd74c..0980d6c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -802,6 +802,7 @@  void ip6_route_input(struct sk_buff *skb)
 
 	skb_dst_set(skb, fib6_rule_lookup(net, &fl, flags, ip6_pol_route_input));
 }
+EXPORT_SYMBOL(ip6_route_input);
 
 static struct rt6_info *ip6_pol_route_output(struct net *net, struct fib6_table *table,
 					     struct flowi *fl, int flags)