
[RFC] IP: Send a fragment reassembly time exceeded packet when enabling connection track

Message ID 4B3191E7.8060509@cn.fujitsu.com
State RFC, archived
Delegated to: David Miller

Commit Message

Shan Wei Dec. 23, 2009, 3:43 a.m. UTC
By default, a host may send a fragment reassembly time exceeded packet
(ICMP Time Exceeded message with code 1) when fragment reassembly times out.
But when connection tracking is enabled, the host cannot send the packet.

This is because the nf_defrag_ipv4 module, which connection tracking selects, is registered
on the PRE_ROUTING hook and reassembles all accepted fragments there, before routing has been done.
After the defrag timeout, the host cannot send the fragment reassembly time exceeded packet
because the skb has no routing information.

RFC 792 says:
>   If a host reassembling a fragmented datagram cannot complete the
>   reassembly due to missing fragments within its time limit it
>   discards the datagram, and it may send a time exceeded message.
>
>   If fragment zero is not available then no time exceeded need be
>   sent at all.
>
> Read more: http://www.faqs.org/rfcs/rfc792.html#ixzz0aOXRD7Wp

So this patch tries to fix it by filling in the routing information before sending the fragment
reassembly time exceeded packet when the defrag timer expires.

Note:
Local delivery also reassembles fragments, but in that case routing has already been done in
ip_rcv_finish(), so skb_dst(head) is not NULL.


Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
 net/ipv4/ip_fragment.c |   22 +++++++++++++++++++---
 1 files changed, 19 insertions(+), 3 deletions(-)

-- 1.6.3.3 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
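
For reference, the message under discussion is ICMP type 11 (Time Exceeded) with code 1 (fragment reassembly time exceeded), as laid out in RFC 792: type, code, checksum, four unused bytes, then the IP header plus the first 64 bits of the original datagram. Below is a minimal userspace sketch of building such a message and checksumming it. It is not kernel code and not part of the patch; the names inet_checksum, build_frag_time_exceeded and the SKETCH_ constants are made up for illustration (in the kernel, icmp_send() does this work).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from RFC 792; the kernel calls them ICMP_TIME_EXCEEDED and
 * ICMP_EXC_FRAGTIME. The SKETCH_ prefix marks them as local to this example. */
#define SKETCH_ICMP_TIME_EXCEEDED 11
#define SKETCH_ICMP_EXC_FRAGTIME   1

/* RFC 1071 Internet checksum, computed over big-endian 16-bit words. */
static uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)			/* odd length: pad with a zero byte */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Fill out[] with a Time Exceeded (code 1) message quoting the first
 * orig_len bytes of the offending datagram. Returns the message length,
 * or 0 if out[] is too small. */
static size_t build_frag_time_exceeded(uint8_t *out, size_t out_len,
				       const uint8_t *orig, size_t orig_len)
{
	size_t total = 8 + orig_len;
	uint16_t csum;

	if (out_len < total)
		return 0;

	out[0] = SKETCH_ICMP_TIME_EXCEEDED;	/* type */
	out[1] = SKETCH_ICMP_EXC_FRAGTIME;	/* code */
	out[2] = 0;				/* checksum, filled in below */
	out[3] = 0;
	memset(out + 4, 0, 4);			/* unused field */
	memcpy(out + 8, orig, orig_len);	/* quoted header + 8 bytes */

	csum = inet_checksum(out, total);
	out[2] = (uint8_t)(csum >> 8);
	out[3] = (uint8_t)(csum & 0xff);

	/* A correctly checksummed message verifies to zero. */
	assert(inet_checksum(out, total) == 0);
	return total;
}
```

Calling build_frag_time_exceeded() with a 20-byte IP header plus 8 bytes of payload yields a 36-byte message whose first two bytes are 11 and 1.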

Comments

Patrick McHardy Jan. 5, 2010, 5:44 a.m. UTC | #1
Shan Wei wrote:
> Default, a host may send a fragment reassembly time exceeded packet
> (ICMP Time Exceeded Message with code value of 1) when defraging fragments timeout.
> But, when enabling connection track, a host can't send the packet.
> 
> Because, the module of nf_defrag_ipv4 selected by connection track is registered 
> in PRE_ROUTING HOOK and assembles all accepted fragments(here, not begin to routing).
> After defrag timeout, the host can't send fragment reassembly time exceeded packet, 
> because of lack of router information.
> 
> RFC 792 says:
>>>>>   If a host reassembling a fragmented datagram cannot complete the
>>>>>   reassembly due to missing fragments within its time limit it
>>>>>   discards the datagram, and it may send a time exceeded message.
>>>>>
>>>>>   If fragment zero is not available then no time exceeded need be
>>>>>   sent at all.
>>>>>
>>>>>
>>>>> Read more: http://www.faqs.org/rfcs/rfc792.html#ixzz0aOXRD7Wp
> 
> So, the patch try to fix it with filling router information before sending fragment reassembly
> time exceeded packet when defrag timeout.

I guess the question is whether we really want to send an ICMP
message in this case. The above quote applies to end hosts,
while conntrack is also (probably more commonly) used on routers,
which normally shouldn't attempt reassembly. I can see no real
downside to this though except that it makes it quite easy to
discover firewalls, but that shouldn't be a real problem.
Shan Wei Jan. 13, 2010, 3:15 a.m. UTC | #2
Patrick McHardy wrote, at 01/05/2010 01:44 PM:
> Shan Wei wrote:
>> Default, a host may send a fragment reassembly time exceeded packet
>> (ICMP Time Exceeded Message with code value of 1) when defraging fragments timeout.
>> But, when enabling connection track, a host can't send the packet.
>>
>> Because, the module of nf_defrag_ipv4 selected by connection track is registered 
>> in PRE_ROUTING HOOK and assembles all accepted fragments(here, not begin to routing).
>> After defrag timeout, the host can't send fragment reassembly time exceeded packet, 
>> because of lack of router information.
>>
>> RFC 792 says:
>>>>>>   If a host reassembling a fragmented datagram cannot complete the
>>>>>>   reassembly due to missing fragments within its time limit it
>>>>>>   discards the datagram, and it may send a time exceeded message.
>>>>>>
>>>>>>   If fragment zero is not available then no time exceeded need be
>>>>>>   sent at all.
>>>>>>
>>>>>>
>>>>>> Read more: http://www.faqs.org/rfcs/rfc792.html#ixzz0aOXRD7Wp
>> So, the patch try to fix it with filling router information before sending fragment reassembly
>> time exceeded packet when defrag timeout.
> 
> I guess the question is whether we really want to send an ICMP
> message in this case. The above quote applies to end hosts,

Yes, that is exactly what I wanted to ask. :-)
Should end hosts that use conntrack send a fragment reassembly time exceeded message?

> while conntrack is also (probably more commonly) used on routers,
> which normally shouldn't attempt reassembly.  

There are two points:
1. For security reasons, end hosts also use conntrack.

  For example, when a host is attacked through TCP denial-of-service flaws, Red Hat uses the conntrack
  and recent matches to limit TCP connections.

  For details, see the description of the vulnerability:
    http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-4609
    http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-4609

  See Red Hat's solution:
    http://kbase.redhat.com/faq/docs/DOC-18730
 

2. On the latest kernel, a router that uses conntrack reassembles fragments and
  forwards the reassembled packet. This does not match what you said.

  The nf_defrag_ipv4 module is registered on the PRE_ROUTING hook with the highest priority, so the
  routing table is searched only after reassembly completes, and the packet is then forwarded to the
  destination host.


If I am missing something, please tell me.

Thanks.
-----
Shan Wei
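
The ordering argument in point 2 can be modelled in userspace. Hooks registered on the same netfilter hook point run in ascending order of their priority value, so "highest priority" here means the most negative number. The sketch below sorts a few PRE_ROUTING hooks that way. The numeric values are the ones I recall from include/linux/netfilter_ipv4.h (NF_IP_PRI_CONNTRACK_DEFRAG = -400, NF_IP_PRI_RAW = -300, NF_IP_PRI_CONNTRACK = -200, NF_IP_PRI_NAT_DST = -100) and should be checked against the tree in use; struct sketch_hook and sort_hooks() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for a registered hook: a label and its priority. */
struct sketch_hook {
	const char *name;
	int priority;
};

/* Ascending comparison, written to avoid overflow on subtraction. */
static int cmp_priority(const void *a, const void *b)
{
	const struct sketch_hook *x = a;
	const struct sketch_hook *y = b;

	return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Reorder hooks[] into the order in which they would be called. */
static void sort_hooks(struct sketch_hook *hooks, size_t n)
{
	if (n == 0)
		return;
	assert(hooks != NULL);
	qsort(hooks, n, sizeof(*hooks), cmp_priority);
}
```

Sorting { conntrack -200, nat_dst -100, defrag -400, raw -300 } puts defrag first. The route lookup in ip_rcv_finish() runs only after all PRE_ROUTING hooks have returned, which is why a fragment queue filled from this hook has no dst yet.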

Patrick McHardy Jan. 13, 2010, 8:27 a.m. UTC | #3
Shan Wei wrote:
> Patrick McHardy wrote, at 01/05/2010 01:44 PM:
>> Shan Wei wrote:
>>> Default, a host may send a fragment reassembly time exceeded packet
>>> (ICMP Time Exceeded Message with code value of 1) when defraging fragments timeout.
>>> But, when enabling connection track, a host can't send the packet.
>>>
>>> Because, the module of nf_defrag_ipv4 selected by connection track is registered 
>>> in PRE_ROUTING HOOK and assembles all accepted fragments(here, not begin to routing).
>>> After defrag timeout, the host can't send fragment reassembly time exceeded packet, 
>>> because of lack of router information.
>>>
>>> RFC 792 says:
>>>>>>>   If a host reassembling a fragmented datagram cannot complete the
>>>>>>>   reassembly due to missing fragments within its time limit it
>>>>>>>   discards the datagram, and it may send a time exceeded message.
>>>>>>>
>>>>>>>   If fragment zero is not available then no time exceeded need be
>>>>>>>   sent at all.
>>>>>>>
>>>>>>>
>>>>>>> Read more: http://www.faqs.org/rfcs/rfc792.html#ixzz0aOXRD7Wp
>>> So, the patch try to fix it with filling router information before sending fragment reassembly
>>> time exceeded packet when defrag timeout.
>> I guess the question is whether we really want to send an ICMP
>> message in this case. The above quote applies to end hosts,
> 
> Yes, what you guess is what i want to ask. :-)
> Should end hosts which are using conntrack send a fragment reassembly time exceeded message?

Yes, they should.

>> while conntrack is also (probably more commonly) used on routers,
>> which normally shouldn't attempt reassembly.  
> 
> There are two point:
> 1.Take security into account, end hosts also used conntrack. 
> 
>   For example: When a host is attacked by denial of service TCP flaws, RedHat used the conntrack&recent
>   match to limit the TCP connections.
>   
>   About details, see the phenomenon description:
>     http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-4609
>     http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2008-4609
> 
>   See RedHat's solution:
>     http://kbase.redhat.com/faq/docs/DOC-18730

I'm not sure I get the connection to this patch.

> 2.On the latest kernel, a router on which the conntrack is used, reassemble fragments and 
>   forward reassembled intact packet. This implementation is not coincide with what you said.

Yes, that's a necessity for conntrack to work, but it's not what
a router usually does. But it actually does refragment the packet
if it exceeds the MTU of the outgoing interface.
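
As a rough illustration of the refragmentation arithmetic (a userspace sketch, not the kernel's ip_fragment() code): every fragment except the last must carry a payload that is a multiple of 8 bytes, because the fragment offset field in RFC 791 counts 8-byte units. The helper names frag_payload_max, frag_count and frag_offset_units are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Largest payload one fragment can carry: what fits under the MTU after
 * the IP header, rounded down to a multiple of 8. */
static size_t frag_payload_max(size_t mtu, size_t ip_hdr_len)
{
	assert(mtu >= ip_hdr_len + 8);
	return (mtu - ip_hdr_len) & ~(size_t)7;
}

/* Number of packets needed to carry payload_len bytes of IP payload. */
static size_t frag_count(size_t payload_len, size_t mtu, size_t ip_hdr_len)
{
	size_t per_frag = frag_payload_max(mtu, ip_hdr_len);

	if (payload_len + ip_hdr_len <= mtu)
		return 1;	/* fits as is, no fragmentation needed */
	return (payload_len + per_frag - 1) / per_frag;
}

/* Value of the fragment offset field for fragment number idx (0-based). */
static size_t frag_offset_units(size_t idx, size_t mtu, size_t ip_hdr_len)
{
	return idx * frag_payload_max(mtu, ip_hdr_len) / 8;
}
```

With a 1500-byte MTU and a 20-byte header, each fragment carries at most 1480 bytes, so a reassembled datagram with 4000 bytes of payload goes out as three fragments.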

>   nf_defrag_ipv4 module is registered on PRE_ROUTING hook with the highest priority. So, search router table 
>   after completing the reassembly and forward it to destination host.

Shan Wei Jan. 14, 2010, 9:18 a.m. UTC | #4
Patrick McHardy wrote, at 01/13/2010 04:27 PM:
>> Should end hosts which are using conntrack send a fragment reassembly time exceeded message?
> 
> Yes, they should.

OK. Please ignore this patch, because it is wrong.
With the patch, a router would also send a fragment reassembly time exceeded message
when reassembly times out.

I'll make a new patch to fix the problem and submit it after testing.

Patch

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 86964b3..1417cb8 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -38,6 +38,7 @@ 
 #include <net/checksum.h>
 #include <net/inetpeer.h>
 #include <net/inet_frag.h>
+#include <net/route.h>
 #include <linux/tcp.h>
 #include <linux/udp.h>
 #include <linux/inet.h>
@@ -204,12 +205,27 @@  static void ip_expire(unsigned long arg)
 
 	if ((qp->q.last_in & INET_FRAG_FIRST_IN) && qp->q.fragments != NULL) {
 		struct sk_buff *head = qp->q.fragments;
+		const struct iphdr *iph = ip_hdr(head);
 
 		/* Send an ICMP "Fragment Reassembly Timeout" message. */
 		rcu_read_lock();
-		head->dev = dev_get_by_index_rcu(net, qp->iif);
-		if (head->dev)
-			icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
+		if ((head->dev = dev_get_by_index_rcu(net, qp->iif)) == NULL) 
+			goto unlock_out;
+
+		if (skb_dst(head) == NULL) {
+			int err = ip_route_input(head, iph->daddr, iph->saddr, 
+						 iph->tos, head->dev);
+			if (unlikely(err)) {
+				if (err == -EHOSTUNREACH)
+					IP_INC_STATS_BH(net, IPSTATS_MIB_INADDRERRORS);
+				else if (err == -ENETUNREACH)
+					IP_INC_STATS_BH(net, IPSTATS_MIB_INNOROUTES);
+				goto unlock_out;
+			}
+		}
+
+		icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
+unlock_out:
 		rcu_read_unlock();
 	}
 out:
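
The control flow this hunk gives ip_expire() can be summarised with a small userspace model. This is only a sketch of the decision order, with each kernel call replaced by a boolean or an error code; struct expire_state, enum expire_action and expire_decide() are hypothetical names. As noted in reply #4, this logic does not distinguish routers from end hosts, which is why the patch was withdrawn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum expire_action {
	EXPIRE_SEND_ICMP,	/* icmp_send() would be called */
	EXPIRE_NO_FIRST_FRAG,	/* fragment zero missing: stay silent */
	EXPIRE_NO_DEVICE,	/* input device lookup failed */
	EXPIRE_NO_ROUTE,	/* route lookup failed */
};

struct expire_state {
	bool first_frag_seen;	/* models q.last_in & INET_FRAG_FIRST_IN */
	bool have_fragments;	/* models q.fragments != NULL */
	bool dev_found;		/* models dev_get_by_index_rcu() != NULL */
	bool has_dst;		/* models skb_dst(head) != NULL */
	int route_err;		/* models ip_route_input(), 0 on success */
};

static enum expire_action expire_decide(const struct expire_state *s)
{
	assert(s != NULL);

	if (!s->first_frag_seen || !s->have_fragments)
		return EXPIRE_NO_FIRST_FRAG;
	if (!s->dev_found)
		return EXPIRE_NO_DEVICE;
	/* The lookup only happens when the skb has no dst, which is the
	 * case for queues filled from the PRE_ROUTING hook. */
	if (!s->has_dst && s->route_err != 0)
		return EXPIRE_NO_ROUTE;
	return EXPIRE_SEND_ICMP;
}
```

A queue filled during local delivery has has_dst set, so route_err is never consulted and the behaviour matches the code before the patch.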