diff mbox

[Regression,w/] Restore network resistance to weird ICMP messages

Message ID CAO1wt+Y7JfeCXnrR5BeZ7K6dErmB3nfdA4mRkFQ21S=VPCkLXg@mail.gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Vicente Jimenez Aguilar Nov. 7, 2016, 11:11 a.m. UTC
Handle weird ICMP fragmentation needed messages with next hop MTU
equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")

In a large corporate network, we spotted this weird ICMP message after
a long troubleshooting. See attached capture file. Those ICMP "network
unreachable - fragmentation needed and don't fragment bit set"
messages are sent by a router that drop 1500 bytes IP packets and fill
the next hop MTU ICMP field with 1500.

Those messages cause the TCP connection to stall but only on newer
kernels. Older kernels set path MTU to 1492 and communicates
successfully.

After checking code and commit history, I spotted how commit
46517008e116 ("ipv4: Kill ip_rt_frag_needed().") from June 2012
changed ICMP messages handling by removing ip_rt_frag_needed function.

The relevant part of the ip_rt_frag_needed function that was removed is:

if (new_mtu < 68 || new_mtu >= old_mtu) {
        /* BSD 4.2 derived systems incorrectly adjust
         * tot_len by the IP header length, and report
         * a zero MTU in the ICMP message.
         */
        if (mtu == 0 &&
            old_mtu >= 68 + (iph->ihl << 2))
                old_mtu -= iph->ihl << 2;
        mtu = guess_mtu(old_mtu);
}


This condition handled the cases when next hop MTU where zero (less
than 68). Now this is handled by the protocol and fixed by commit
68b7107b6298 "ipv4: icmp: Fix pMTU handling for rare case".

But the rarest case when (next hop MTU) new_mtu >= old_mtu (dropped
packet length) was also removed. This commit restores this check.
Instead of using a table lookup like function guess_mtu uses, it just
try to set the path MTU decrementing by 2 bytes the dropped packet
size.

In our case, setting the path MTU to just 1498 (one iteration) worked.
This solution should converge in any case to a good value by small
steps. I don't think there's a need to a more complex solution.

The patched kernel worked perfectly setting the path MTU to 1498 from
the initial default interface value of 1500. This time I don't have a
capture file from inside the affected center, but all received packed
had a maximum size of 1498.

Comments

David Miller Nov. 10, 2016, 1:22 a.m. UTC | #1
From: Vicente Jiménez <googuy@gmail.com>
Date: Mon, 7 Nov 2016 12:11:59 +0100

> From bfc9a00e6b78d8eb60e46dacd7d761669d29a573 Mon Sep 17 00:00:00 2001
> From: Vicente Jimenez Aguilar <googuy@gmail.com>
> Date: Mon, 31 Oct 2016 13:10:29 +0100
> Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case
> 
> Restore network resistance to weird ICMP fragmentation needed messages
> with next hop MTU equal to (or exceeding) dropped packet size
> 
> Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
> Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com>
> ---
>  net/ipv4/icmp.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 38abe70..c0af1d2 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -776,6 +776,7 @@ static bool icmp_unreach(struct sk_buff *skb)
>  	struct icmphdr *icmph;
>  	struct net *net;
>  	u32 info = 0;
> +	unsigned short old_mtu;
>  
>  	net = dev_net(skb_dst(skb)->dev);
>  

Order local variable declarations from longest to shortest line
please.

> +				if ( info >= old_mtu )

There should be no space after the '(' and before the ')' in this
conditional.
diff mbox

Patch

From bfc9a00e6b78d8eb60e46dacd7d761669d29a573 Mon Sep 17 00:00:00 2001
From: Vicente Jimenez Aguilar <googuy@gmail.com>
Date: Mon, 31 Oct 2016 13:10:29 +0100
Subject: [PATCH] ipv4: icmp: Fix pMTU handling for rarest case

Restore network resistance to weird ICMP fragmentation needed messages
with next hop MTU equal to (or exceeding) dropped packet size

Fixes: 46517008e116 ("ipv4: Kill ip_rt_frag_needed().")
Signed-off-by: Vicente Jimenez Aguilar <googuy@gmail.com>
---
 net/ipv4/icmp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 38abe70..c0af1d2 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -776,6 +776,7 @@  static bool icmp_unreach(struct sk_buff *skb)
 	struct icmphdr *icmph;
 	struct net *net;
 	u32 info = 0;
+	unsigned short old_mtu;
 
 	net = dev_net(skb_dst(skb)->dev);
 
@@ -819,6 +820,12 @@  static bool icmp_unreach(struct sk_buff *skb)
 				/* fall through */
 			case 0:
 				info = ntohs(icmph->un.frag.mtu);
+				/* Handle weird case where next hop MTU is
+				 * equal to or exceeding dropped packet size
+				 */
+				old_mtu = ntohs(iph->tot_len);
+				if ( info >= old_mtu )
+					info = old_mtu - 2;
 			}
 			break;
 		case ICMP_SR_FAILED:
-- 
2.7.3