[net-next] icmp: don't fail on fragment reassembly time exceeded

Message ID 20171012141237.2209-1-mcroce@redhat.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Matteo Croce Oct. 12, 2017, 2:12 p.m. UTC
The ICMP implementation currently replies to an ICMP time exceeded message
(type 11) with an ICMP host unreachable message (type 3, code 1).

However, time exceeded messages can either represent "time to live exceeded
in transit" (code 0) or "fragment reassembly time exceeded" (code 1).

Unconditionally replying to "fragment reassembly time exceeded" with
host unreachable messages might cause unjustified connection resets.
These are now easily triggered because UFO (UDP fragmentation offload)
has been removed, so sending large buffers triggers IP fragmentation.

The issue can be easily reproduced by running many UDP streams,
which is likely to trigger IP fragmentation:

  # start netserver in the test namespace
  ip netns add test
  ip netns exec test netserver

  # create a VETH pair
  ip link add name veth0 type veth peer name veth0 netns test
  ip link set veth0 up
  ip -n test link set veth0 up

  for i in $(seq 20 29); do
      # assign addresses to both ends
      ip addr add dev veth0 192.168.$i.1/24
      ip -n test addr add dev veth0 192.168.$i.2/24

      # start the traffic
      netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
  done

  # wait
  send_data: data send error: No route to host (errno 113)
  netperf: send_omni: send_data failed: No route to host

We need to differentiate instead: if fragment reassembly time exceeded
is reported, silently drop the packet; if time to live exceeded is
reported, maintain the current behaviour.
In both cases, increment the related error count "icmpInTimeExcds".

While at it, fix a typo in a comment, and convert the if statement
into a switch to make it more readable.

Signed-off-by: Matteo Croce <mcroce@redhat.com>
---
 net/ipv4/icmp.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
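
The differentiation described in the commit message can be sketched in
userspace as follows. This is only an illustration, not the kernel code:
the helper should_notify_upper_layer() and the counter in_time_excds are
hypothetical names standing in for the relevant branch of icmp_unreach()
and for the icmpInTimeExcds MIB counter, and the handling of the other
ICMP types is deliberately left out. The numeric constants are the
standard ICMP type and code values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard ICMP type and code values (RFC 792). */
#define ICMP_DEST_UNREACH   3
#define ICMP_TIME_EXCEEDED 11
#define ICMP_PARAMETERPROB 12
#define ICMP_EXC_TTL        0	/* time to live exceeded in transit */
#define ICMP_EXC_FRAGTIME   1	/* fragment reassembly time exceeded */

/* Hypothetical stand-in for the icmpInTimeExcds counter. */
static unsigned long in_time_excds;

/*
 * Hypothetical helper: returns true when the error should be handed to
 * the upper-layer handlers (the existing behaviour), and false when the
 * packet should be silently dropped.
 */
static bool should_notify_upper_layer(uint8_t type, uint8_t code)
{
	switch (type) {
	case ICMP_TIME_EXCEEDED:
		/* Count every time exceeded message, whatever its code. */
		in_time_excds++;
		/* Reassembly timeouts must not surface as socket errors. */
		if (code == ICMP_EXC_FRAGTIME)
			return false;
		break;
	}
	return true;
}
```

With this sketch, a type 11 / code 0 message is still reported, a
type 11 / code 1 message is dropped, and both increment the counter.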

Comments

David Miller Oct. 14, 2017, 6:05 p.m. UTC | #1
From: Matteo Croce <mcroce@redhat.com>
Date: Thu, 12 Oct 2017 16:12:37 +0200

> The ICMP implementation currently replies to an ICMP time exceeded message
> (type 11) with an ICMP host unreachable message (type 3, code 1).
> 
> However, time exceeded messages can either represent "time to live exceeded
> in transit" (code 0) or "fragment reassembly time exceeded" (code 1).
> 
> Unconditionally replying to "fragment reassembly time exceeded" with
> host unreachable messages might cause unjustified connection resets
> which are now easily triggered as UFO has been removed, because, in turn,
> sending large buffers triggers IP fragmentation.
> 
> The issue can be easily reproduced by running a lot of UDP streams
> which is likely to trigger IP fragmentation:
> 
>   # start netserver in the test namespace
>   ip netns add test
>   ip netns exec test netserver
> 
>   # create a VETH pair
>   ip link add name veth0 type veth peer name veth0 netns test
>   ip link set veth0 up
>   ip -n test link set veth0 up
> 
>   for i in $(seq 20 29); do
>       # assign addresses to both ends
>       ip addr add dev veth0 192.168.$i.1/24
>       ip -n test addr add dev veth0 192.168.$i.2/24
> 
>       # start the traffic
>       netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
>   done
> 
>   # wait
>   send_data: data send error: No route to host (errno 113)
>   netperf: send_omni: send_data failed: No route to host
> 
> We need to differentiate instead: if fragment reassembly time exceeded
> is reported, we need to silently drop the packet,
> if time to live exceeded is reported, maintain the current behaviour.
> In both cases increment the related error count "icmpInTimeExcds".
> 
> While at it, fix a typo in a comment, and convert the if statement
> into a switch to make it more readable.
> 
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Looks good, applied, thank you!
Patch

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 681e33998e03..3c1570d3e22f 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -782,7 +782,7 @@  static bool icmp_tag_validation(int proto)
 }
 
 /*
- *	Handle ICMP_DEST_UNREACH, ICMP_TIME_EXCEED, ICMP_QUENCH, and
+ *	Handle ICMP_DEST_UNREACH, ICMP_TIME_EXCEEDED, ICMP_QUENCH, and
  *	ICMP_PARAMETERPROB.
  */
 
@@ -810,7 +810,8 @@  static bool icmp_unreach(struct sk_buff *skb)
 	if (iph->ihl < 5) /* Mangled header, drop. */
 		goto out_err;
 
-	if (icmph->type == ICMP_DEST_UNREACH) {
+	switch (icmph->type) {
+	case ICMP_DEST_UNREACH:
 		switch (icmph->code & 15) {
 		case ICMP_NET_UNREACH:
 		case ICMP_HOST_UNREACH:
@@ -846,8 +847,16 @@  static bool icmp_unreach(struct sk_buff *skb)
 		}
 		if (icmph->code > NR_ICMP_UNREACH)
 			goto out;
-	} else if (icmph->type == ICMP_PARAMETERPROB)
+		break;
+	case ICMP_PARAMETERPROB:
 		info = ntohl(icmph->un.gateway) >> 24;
+		break;
+	case ICMP_TIME_EXCEEDED:
+		__ICMP_INC_STATS(net, ICMP_MIB_INTIMEEXCDS);
+		if (icmph->code == ICMP_EXC_FRAGTIME)
+			goto out;
+		break;
+	}
 
 	/*
 	 *	Throw it at our lower layers