
[net,v3] tcp: warn on bogus MSS and try to amend it

Message ID 83a3345f2fae904fa59794d59703d86851cda7d5.1480718620.git.marcelo.leitner@gmail.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Marcelo Ricardo Leitner Dec. 2, 2016, 10:51 p.m. UTC
There have been some reports lately about TCP connection stalls caused
by NIC drivers that aren't setting gso_size on aggregated packets on rx
path. This causes TCP to assume that the MSS is actually the size of the
aggregated packet, which is invalid.
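The following is a minimal userspace sketch of the estimate described above. It mirrors the `len = skb_shinfo(skb)->gso_size ? : skb->len;` line from the patch, using a stripped-down stand-in for `struct sk_buff`. The names `fake_skb` and `estimate_rcv_mss()` are hypothetical, not kernel API. With a well-behaved driver the per-segment size is used. A driver that aggregates segments but leaves `gso_size` at 0 makes the whole aggregate look like a single segment.

```c
/* Hypothetical stand-in for the two sk_buff fields the estimate reads. */
struct fake_skb {
	unsigned int len;      /* total bytes in the (possibly aggregated) skb */
	unsigned int gso_size; /* per-segment size set on rx aggregation, 0 if unset */
};

/* Same logic as "gso_size ? : skb->len", written in portable C. */
static unsigned int estimate_rcv_mss(const struct fake_skb *skb)
{
	return skb->gso_size ? skb->gso_size : skb->len;
}
```

For 45 aggregated 1448-byte segments, a correct driver yields 1448. A driver that forgets `gso_size` yields 65160, which TCP would then take as the peer's MSS.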

Although the proper fix has to be done in each driver, it is often hard
and cumbersome to debug such a stall, trace it back to this root cause,
and report or fix it.

This patch amends this situation in two ways. First, it adds a warning
when this situation occurs, giving a hint to those trying to debug it.
Second, it limits the maximum probed MSS to the advertised MSS, as it
should never be any higher than that.
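A simplified sketch of the clamping step follows, assuming a reduced view of the socket state. `fake_tcp_sock` and `update_rcv_mss()` are hypothetical names. The two fields stand in for `icsk->icsk_ack.rcv_mss` and `tcp_sk(sk)->advmss`. The sketch returns whether clamping kicked in, which is the condition under which the patch emits its warning.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the receiver state the patch touches. */
struct fake_tcp_sock {
	unsigned int rcv_mss; /* estimated peer MSS (icsk_ack.rcv_mss) */
	unsigned int advmss;  /* MSS we advertised (tcp_sk(sk)->advmss) */
};

/* Grow the estimate from @len, but never past advmss.  Returns true when
 * the clamp changed the value, i.e. when the patch would warn. */
static bool update_rcv_mss(struct fake_tcp_sock *tp, unsigned int len)
{
	if (len < tp->rcv_mss)
		return false; /* the real code takes a more careful path here */

	tp->rcv_mss = len < tp->advmss ? len : tp->advmss;
	return tp->rcv_mss != len;
}
```

A normal 1448-byte segment against an advmss of 1460 is accepted unchanged. A bogus 65160-byte "segment" is clamped to 1460 and reported.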

The result is that the connection may not have optimal performance, but
it shouldn't stall, and the admin will have a hint about what to look
for.

Tested with virtio by forcing gso_size to 0.

v2: updated msg per David's suggestion
v3: use skb_iif to find the interface and also log its name, per Eric
    Dumazet's suggestion. As the skb may be backlogged and the interface
    gone by then, we need to check whether the ifindex still refers to a
    device.
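The v3 lookup can be sketched in userspace as follows. A small static table stands in for the per-namespace ifindex map. `fake_dev_get_by_index()` and `gro_warn_name()` are hypothetical names, and the sketch omits the `rcu_read_lock()`/`dev_get_by_index_rcu()` pairing that the real patch uses. The point it illustrates is the fallback: if no device owns the recorded ifindex any more, the code logs a placeholder name instead of dereferencing a missing device.

```c
#include <stddef.h>

/* Hypothetical device table standing in for the ifindex -> device map. */
struct fake_net_device {
	int ifindex;
	const char *name;
};

static const struct fake_net_device devices[] = {
	{ 1, "lo" },
	{ 2, "eth0" },
};

/* Returns NULL when no device currently owns @ifindex, e.g. because the
 * interface went away while the skb sat in the socket backlog. */
static const struct fake_net_device *fake_dev_get_by_index(int ifindex)
{
	for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++)
		if (devices[i].ifindex == ifindex)
			return &devices[i];
	return NULL;
}

/* Same fallback as the patch: never dereference a missing device. */
static const char *gro_warn_name(int skb_iif)
{
	const struct fake_net_device *dev = fake_dev_get_by_index(skb_iif);

	return dev ? dev->name : "Unknown driver";
}
```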

Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/ipv4/tcp_input.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

David Miller Dec. 4, 2016, 4:37 a.m. UTC | #1
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Fri,  2 Dec 2016 20:51:51 -0200

> @@ -144,7 +144,21 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
>  	 */
>  	len = skb_shinfo(skb)->gso_size ? : skb->len;
>  	if (len >= icsk->icsk_ack.rcv_mss) {
> -		icsk->icsk_ack.rcv_mss = len;
> +		static bool __once __read_mostly;
> +
> +		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> +					       tcp_sk(sk)->advmss);
> +		if (icsk->icsk_ack.rcv_mss != len && !__once) {
> +			struct net_device *dev;
> +
> +			__once = true;
> +
> +			rcu_read_lock();
> +			dev = dev_get_by_index_rcu(sock_net(sk), skb->skb_iif);
> +			pr_warn_once("%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
> +				     dev ? dev->name : "Unknown driver");
> +			rcu_read_unlock();
> +		}

This is almost ready to go.

Since you are doing the 'once' logic by hand, using pr_warn_once() is
redundant.  And while you're at it, why not split this into a helper
function:

static void tcp_gro_dev_warn(struct sock *sk, const struct sk_buff *skb)
{
	static bool __once __read_mostly;

	if (!__once) {
		struct net_device *dev;

		__once = true;

		rcu_read_lock();
		dev = dev_get_by_index_rcu(sock_net(sk), skb->skb_iif);
		pr_warn("%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
			dev ? dev->name : "Unknown driver");
		rcu_read_unlock();
	}
}

And then call that when icsk->icsk_ack.rcv_mss != len; you can even
put an unlikely() around the condition as well.
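The suggested split can be sketched in userspace. All the once-only logic moves into a cold helper, and the hot path only tests the condition, wrapped in `unlikely()`. The `unlikely()` macro here is defined via the GCC/Clang builtin `__builtin_expect`, roughly as the kernel defines it. `gro_dev_warn()`, `measure_rcv_mss()` and the `gro_warnings` counter are hypothetical names for illustration, not the merged kernel code. Because the helper guards itself with a plain `static bool`, a plain `pr_warn()` (here `fprintf()`) is enough, which is why `pr_warn_once()` is redundant.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel macro (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

static unsigned int gro_warnings; /* hypothetical counter to check the sketch */

/* Analogue of the suggested tcp_gro_dev_warn(): prints at most once. */
static void gro_dev_warn(const char *devname)
{
	static bool once;

	if (!once) {
		once = true;
		gro_warnings++;
		fprintf(stderr, "%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
			devname ? devname : "Unknown driver");
	}
}

/* Call site shaped as suggested: clamp, then branch to the cold helper. */
static unsigned int measure_rcv_mss(unsigned int len, unsigned int advmss,
				    const char *devname)
{
	unsigned int rcv_mss = len < advmss ? len : advmss;

	if (unlikely(rcv_mss != len))
		gro_dev_warn(devname);
	return rcv_mss;
}
```

Keeping the helper out of line leaves the common receive path as a single predicted-not-taken branch. The `unlikely()` hint only affects code layout, not correctness.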

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a27b9c0e27c08b4e4aeaff3d0bfdf3ae561ba4d8..042a8a895e97d04afbdc377830537e8fd3b15d1e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -144,7 +144,21 @@  static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 	 */
 	len = skb_shinfo(skb)->gso_size ? : skb->len;
 	if (len >= icsk->icsk_ack.rcv_mss) {
-		icsk->icsk_ack.rcv_mss = len;
+		static bool __once __read_mostly;
+
+		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
+					       tcp_sk(sk)->advmss);
+		if (icsk->icsk_ack.rcv_mss != len && !__once) {
+			struct net_device *dev;
+
+			__once = true;
+
+			rcu_read_lock();
+			dev = dev_get_by_index_rcu(sock_net(sk), skb->skb_iif);
+			pr_warn_once("%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
+				     dev ? dev->name : "Unknown driver");
+			rcu_read_unlock();
+		}
 	} else {
 		/* Otherwise, we make more careful check taking into account,
 		 * that SACKs block is variable.