
"TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised." message with "ethtool -K eth0 gro off"

Message ID 20170203115457.GA5296@localhost.localdomain
State RFC, archived
Delegated to: David Miller

Commit Message

Marcelo Ricardo Leitner Feb. 3, 2017, 11:54 a.m. UTC
On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> On Thu, 2017-02-02 at 05:31 -0800, Eric Dumazet wrote:
> 
> > Anyway, I suspect the test is simply buggy ;)
> > 
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 41dcbd568cbe2403f2a9e659669afe462a42e228..5394a39fcce964a7fe7075b1531a8a1e05550a54 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -164,7 +164,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> >  	if (len >= icsk->icsk_ack.rcv_mss) {
> >  		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> >  					       tcp_sk(sk)->advmss);
> > -		if (unlikely(icsk->icsk_ack.rcv_mss != len))
> > +		if (unlikely(icsk->icsk_ack.rcv_mss != len && skb_is_gso(skb)))
> >  			tcp_gro_dev_warn(sk, skb);
> >  	} else {
> >  		/* Otherwise, we make more careful check taking into account,
> 
> This won't really help.
> 
> Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> peer.
> 
> ip ro add .... advmss 512

I don't follow. With a good driver, how can advmss be smaller than the
MSS used by the remote peer? Even with the route entry above, I get
segments just up to advmss, and no warning.

Though yeah, interesting that this driver doesn't even support GRO. FCS
perhaps?
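
A quick way to check the GRO state on the receiving box is something
like this (just a sketch; the interface name is taken from your report):

	ethtool -k eth0 | grep generic-receive-offload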

Markus, do you have other interfaces in your system? Which MTU do you
use? Also, please try the (untested) patch below to gather more debug
info:

---8<---

Comments

Markus Trippelsdorf Feb. 3, 2017, 12:06 p.m. UTC | #1
On 2017.02.03 at 09:54 -0200, Marcelo Ricardo Leitner wrote:
> On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> > On Thu, 2017-02-02 at 05:31 -0800, Eric Dumazet wrote:
> > 
> > > Anyway, I suspect the test is simply buggy ;)
> > > 
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 41dcbd568cbe2403f2a9e659669afe462a42e228..5394a39fcce964a7fe7075b1531a8a1e05550a54 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -164,7 +164,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> > >  	if (len >= icsk->icsk_ack.rcv_mss) {
> > >  		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> > >  					       tcp_sk(sk)->advmss);
> > > -		if (unlikely(icsk->icsk_ack.rcv_mss != len))
> > > +		if (unlikely(icsk->icsk_ack.rcv_mss != len && skb_is_gso(skb)))
> > >  			tcp_gro_dev_warn(sk, skb);
> > >  	} else {
> > >  		/* Otherwise, we make more careful check taking into account,
> > 
> > This won't really help.
> > 
> > Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> > peer.
> > 
> > ip ro add .... advmss 512
> 
> I don't follow. With a good driver, how can advmss be smaller than the
> MSS used by the remote peer? Even with the route entry above, I get
> segments just up to advmss, and no warning.
> 
> Though yeah, interesting that this driver doesn't even support GRO. FCS
> perhaps?
> 
> Markus, do you have other interfaces in your system? Which MTU do you
> use? Also, please try the (untested) patch below to gather more debug
> info:

No, eth0 is the only interface. MTU = 1500.
Sure, I will try your patch. But I don't know how to reproduce the
issue, so you will have to wait until it triggers again.
Eric Dumazet Feb. 3, 2017, 1:24 p.m. UTC | #2
On Fri, 2017-02-03 at 09:54 -0200, Marcelo Ricardo Leitner wrote:
> On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> > On Thu, 2017-02-02 at 05:31 -0800, Eric Dumazet wrote:
> > 
> > > Anyway, I suspect the test is simply buggy ;)
> > > 
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 41dcbd568cbe2403f2a9e659669afe462a42e228..5394a39fcce964a7fe7075b1531a8a1e05550a54 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -164,7 +164,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> > >  	if (len >= icsk->icsk_ack.rcv_mss) {
> > >  		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> > >  					       tcp_sk(sk)->advmss);
> > > -		if (unlikely(icsk->icsk_ack.rcv_mss != len))
> > > +		if (unlikely(icsk->icsk_ack.rcv_mss != len && skb_is_gso(skb)))
> > >  			tcp_gro_dev_warn(sk, skb);
> > >  	} else {
> > >  		/* Otherwise, we make more careful check taking into account,
> > 
> > This won't really help.
> > 
> > Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> > peer.
> > 
> > ip ro add .... advmss 512
> 
> I don't follow. With a good driver, how can advmss be smaller than the
> MSS used by the remote peer? Even with the route entry above, I get
> segments just up to advmss, and no warning.
> 

A TCP flow has two ends.

Common MTU = 1500

One can have advmss 500, the other one no advmss (or the standard 1460
one)

So if we compare apples and oranges, the result might be shocking ;)

If you want to reproduce this, use the "ip ro add .... advmss 512" hint,
and/or play with sysctl_tcp_mtu_probing.
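
For instance, something along these lines (untested sketch; the
destination is just a placeholder):

	ip route add 192.0.2.0/24 dev eth0 advmss 512
	sysctl -w net.ipv4.tcp_mtu_probing=2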
Marcelo Ricardo Leitner Feb. 3, 2017, 1:53 p.m. UTC | #3
On Fri, Feb 03, 2017 at 05:24:06AM -0800, Eric Dumazet wrote:
> On Fri, 2017-02-03 at 09:54 -0200, Marcelo Ricardo Leitner wrote:
> > On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> > > On Thu, 2017-02-02 at 05:31 -0800, Eric Dumazet wrote:
> > > 
> > > > Anyway, I suspect the test is simply buggy ;)
> > > > 
> > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > index 41dcbd568cbe2403f2a9e659669afe462a42e228..5394a39fcce964a7fe7075b1531a8a1e05550a54 100644
> > > > --- a/net/ipv4/tcp_input.c
> > > > +++ b/net/ipv4/tcp_input.c
> > > > @@ -164,7 +164,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> > > >  	if (len >= icsk->icsk_ack.rcv_mss) {
> > > >  		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> > > >  					       tcp_sk(sk)->advmss);
> > > > -		if (unlikely(icsk->icsk_ack.rcv_mss != len))
> > > > +		if (unlikely(icsk->icsk_ack.rcv_mss != len && skb_is_gso(skb)))
> > > >  			tcp_gro_dev_warn(sk, skb);
> > > >  	} else {
> > > >  		/* Otherwise, we make more careful check taking into account,
> > > 
> > > This won't really help.
> > > 
> > > Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> > > peer.
> > > 
> > > ip ro add .... advmss 512
> > 
> > I don't follow. With a good driver, how can advmss be smaller than the
> > MSS used by the remote peer? Even with the route entry above, I get
> > segments just up to advmss, and no warning.
> > 
> 
> A TCP flow has two ends.

Indeed, though it should mostly be about only one of them.

> 
> Common MTU = 1500
> 
> One can have advmss 500, the other one no advmss (or the standard 1460
> one)

Considering the rx side of peer A: peer A advertises a given MSS to peer
B and should not receive any segment from peer B larger than that.
I'm failing to see how advmss can be smaller than the segment size just
received.

> 
> So if we compare apples and oranges, the result might be shocking ;)

Yes, heh, just not seeing the mix here...

> 
> If you want to reproduce this, use the "ip ro add .... advmss 512" hint,
> and/or play with sysctl_tcp_mtu_probing.

I tried the route with advmss, no luck so far.
Still digging..

  Marcelo
Eric Dumazet Feb. 3, 2017, 2:16 p.m. UTC | #4
On Fri, 2017-02-03 at 11:53 -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Feb 03, 2017 at 05:24:06AM -0800, Eric Dumazet wrote:
> > On Fri, 2017-02-03 at 09:54 -0200, Marcelo Ricardo Leitner wrote:
> > > On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> > > > On Thu, 2017-02-02 at 05:31 -0800, Eric Dumazet wrote:
> > > > 
> > > > > Anyway, I suspect the test is simply buggy ;)
> > > > > 
> > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > > index 41dcbd568cbe2403f2a9e659669afe462a42e228..5394a39fcce964a7fe7075b1531a8a1e05550a54 100644
> > > > > --- a/net/ipv4/tcp_input.c
> > > > > +++ b/net/ipv4/tcp_input.c
> > > > > @@ -164,7 +164,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> > > > >  	if (len >= icsk->icsk_ack.rcv_mss) {
> > > > >  		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> > > > >  					       tcp_sk(sk)->advmss);
> > > > > -		if (unlikely(icsk->icsk_ack.rcv_mss != len))
> > > > > +		if (unlikely(icsk->icsk_ack.rcv_mss != len && skb_is_gso(skb)))
> > > > >  			tcp_gro_dev_warn(sk, skb);
> > > > >  	} else {
> > > > >  		/* Otherwise, we make more careful check taking into account,
> > > > 
> > > > This won't really help.
> > > > 
> > > > Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> > > > peer.
> > > > 
> > > > ip ro add .... advmss 512
> > > 
> > > I don't follow. With a good driver, how can advmss be smaller than the
> > > MSS used by the remote peer? Even with the route entry above, I get
> > > segments just up to advmss, and no warning.
> > > 
> > 
> > A TCP flow has two ends.
> 
> Indeed, though it should mostly be about only one of them.
> 
> > 
> > Common MTU = 1500
> > 
> > One can have advmss 500, the other one no advmss (or the standard 1460
> > one)
> 
> Considering the rx side of peer A: peer A advertises a given MSS to peer
> B and should not receive any segment from peer B larger than that.
> I'm failing to see how advmss can be smaller than the segment size just
> received.

tcp_sk(sk)->advmss records what the peer announced during its SYN (or
SYNACK) message, in the MSS option.

Nothing prevents the peer from changing its mind later.

E.g. starting with MSS 512, then switching later to sending packets of
1024 or 1400 bytes.

So the innocent NIC driver is not the problem here.
Marcelo Ricardo Leitner Feb. 3, 2017, 2:28 p.m. UTC | #5
On Fri, Feb 03, 2017 at 06:16:06AM -0800, Eric Dumazet wrote:
> On Fri, 2017-02-03 at 11:53 -0200, Marcelo Ricardo Leitner wrote:
> > On Fri, Feb 03, 2017 at 05:24:06AM -0800, Eric Dumazet wrote:
> > > On Fri, 2017-02-03 at 09:54 -0200, Marcelo Ricardo Leitner wrote:
> > > > On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> > > > > On Thu, 2017-02-02 at 05:31 -0800, Eric Dumazet wrote:
> > > > > 
> > > > > > Anyway, I suspect the test is simply buggy ;)
> > > > > > 
> > > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > > > index 41dcbd568cbe2403f2a9e659669afe462a42e228..5394a39fcce964a7fe7075b1531a8a1e05550a54 100644
> > > > > > --- a/net/ipv4/tcp_input.c
> > > > > > +++ b/net/ipv4/tcp_input.c
> > > > > > @@ -164,7 +164,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> > > > > >  	if (len >= icsk->icsk_ack.rcv_mss) {
> > > > > >  		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> > > > > >  					       tcp_sk(sk)->advmss);
> > > > > > -		if (unlikely(icsk->icsk_ack.rcv_mss != len))
> > > > > > +		if (unlikely(icsk->icsk_ack.rcv_mss != len && skb_is_gso(skb)))
> > > > > >  			tcp_gro_dev_warn(sk, skb);
> > > > > >  	} else {
> > > > > >  		/* Otherwise, we make more careful check taking into account,
> > > > > 
> > > > > This won't really help.
> > > > > 
> > > > > Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> > > > > peer.
> > > > > 
> > > > > ip ro add .... advmss 512
> > > > 
> > > > I don't follow. With a good driver, how can advmss be smaller than the
> > > > MSS used by the remote peer? Even with the route entry above, I get
> > > > segments just up to advmss, and no warning.
> > > > 
> > > 
> > > A TCP flow has two ends.
> > 
> > Indeed, though it should mostly be about only one of them.
> > 
> > > 
> > > Common MTU = 1500
> > > 
> > > One can have advmss 500, the other one no advmss (or the standard 1460
> > > one)
> > 
> > Considering the rx side of peer A: peer A advertises a given MSS to peer
> > B and should not receive any segment from peer B larger than that.
> > I'm failing to see how advmss can be smaller than the segment size just
> > received.
> 
> tcp_sk(sk)->advmss records what the peer announced during its SYN (or
> SYNACK) message, in the MSS option.
> 
> Nothing prevents the peer from changing its mind later.
> 
> E.g. starting with MSS 512, then switching later to sending packets of
> 1024 or 1400 bytes.

Aren't you mixing the endpoints here? MSS is the largest amount of data
that the peer can receive in a single segment, and not how much it will
send. For the sending part, that depends on what the other peer
announced, and we can have two different MSS values in a single
connection, one for each peer.

If a peer later wants to send larger segments, it can, but it must
respect the MSS advertised by the other peer during the handshake.

> 
> So the innocent NIC driver is not the problem here.
> 
>
Eric Dumazet Feb. 3, 2017, 2:47 p.m. UTC | #6
On Fri, 2017-02-03 at 12:28 -0200, Marcelo Ricardo Leitner wrote:

> Aren't you mixing the endpoints here? MSS is the largest amount of data
> that the peer can receive in a single segment, and not how much it will
> send. For the sending part, that depends on what the other peer
> announced, and we can have two different MSS values in a single
> connection, one for each peer.
> 
> If a peer later wants to send larger segments, it can, but it must
> respect the MSS advertised by the other peer during the handshake.
> 

I am not mixing endpoints, you are.

If you need to be convinced, please grab:
https://patchwork.ozlabs.org/patch/723028/

And just watch "ss -temoi ..."
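
For a single flow that would be something like (sketch; the address is a
placeholder):

	ss -temoi dst 192.0.2.1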
Marcelo Ricardo Leitner Feb. 6, 2017, 9:12 p.m. UTC | #7
On Fri, Feb 03, 2017 at 06:47:33AM -0800, Eric Dumazet wrote:
> On Fri, 2017-02-03 at 12:28 -0200, Marcelo Ricardo Leitner wrote:
> 
> > Aren't you mixing the endpoints here? MSS is the largest amount of data
> > that the peer can receive in a single segment, and not how much it will
> > send. For the sending part, that depends on what the other peer
> > announced, and we can have two different MSS values in a single
> > connection, one for each peer.
> > 
> > If a peer later wants to send larger segments, it can, but it must
> > respect the MSS advertised by the other peer during the handshake.
> > 
> 
> I am not mixing endpoints, you are.
> 
> If you need to be convinced, please grab:
> https://patchwork.ozlabs.org/patch/723028/
> 
> And just watch "ss -temoi ..." 

I still don't get it, but I also hit the warning on my laptop, using
iwlwifi. Not sure what I did in order to trigger it, it was by accident.

  Marcelo
Markus Trippelsdorf March 10, 2017, 12:22 p.m. UTC | #8
On 2017.02.06 at 19:12 -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Feb 03, 2017 at 06:47:33AM -0800, Eric Dumazet wrote:
> > On Fri, 2017-02-03 at 12:28 -0200, Marcelo Ricardo Leitner wrote:
> > 
> > > Aren't you mixing the endpoints here? MSS is the largest amount of data
> > > that the peer can receive in a single segment, and not how much it will
> > > send. For the sending part, that depends on what the other peer
> > > announced, and we can have two different MSS values in a single
> > > connection, one for each peer.
> > > 
> > > If a peer later wants to send larger segments, it can, but it must
> > > respect the MSS advertised by the other peer during the handshake.
> > > 
> > 
> > I am not mixing endpoints, you are.
> > 
> > If you need to be convinced, please grab:
> > https://patchwork.ozlabs.org/patch/723028/
> > 
> > And just watch "ss -temoi ..." 
> 
> I still don't get it, but I also hit the warning on my laptop, using
> iwlwifi. Not sure what I did in order to trigger it, it was by accident.

I have been running with your debugging patch applied since the
beginning of February and was not able to reproduce the issue ever
again. So I think your code is innocent and another bug (which seems to
have been fixed since then) somehow caused the kernel to jump into that
function.
Markus Trippelsdorf March 19, 2017, 12:14 p.m. UTC | #9
On 2017.02.06 at 19:12 -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Feb 03, 2017 at 06:47:33AM -0800, Eric Dumazet wrote:
> > On Fri, 2017-02-03 at 12:28 -0200, Marcelo Ricardo Leitner wrote:
> > 
> > > Aren't you mixing the endpoints here? MSS is the largest amount of data
> > > that the peer can receive in a single segment, and not how much it will
> > > send. For the sending part, that depends on what the other peer
> > > announced, and we can have two different MSS values in a single
> > > connection, one for each peer.
> > > 
> > > If a peer later wants to send larger segments, it can, but it must
> > > respect the MSS advertised by the other peer during the handshake.
> > > 
> > 
> > I am not mixing endpoints, you are.
> > 
> > If you need to be convinced, please grab:
> > https://patchwork.ozlabs.org/patch/723028/
> > 
> > And just watch "ss -temoi ..." 
> 
> I still don't get it, but I also hit the warning on my laptop, using
> iwlwifi. Not sure what I did in order to trigger it, it was by accident.

After many weeks without any warning, I've hit the issue again today:

 TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised. rcv_mss:1448 advmss:1448 len:1460
Eric Dumazet March 19, 2017, 7:20 p.m. UTC | #10
On Sun, 2017-03-19 at 13:14 +0100, Markus Trippelsdorf wrote:
> On 2017.02.06 at 19:12 -0200, Marcelo Ricardo Leitner wrote:
> > On Fri, Feb 03, 2017 at 06:47:33AM -0800, Eric Dumazet wrote:
> > > On Fri, 2017-02-03 at 12:28 -0200, Marcelo Ricardo Leitner wrote:
> > > 
> > > > Aren't you mixing the endpoints here? MSS is the largest amount of data
> > > > that the peer can receive in a single segment, and not how much it will
> > > > send. For the sending part, that depends on what the other peer
> > > > announced, and we can have two different MSS values in a single
> > > > connection, one for each peer.
> > > > 
> > > > If a peer later wants to send larger segments, it can, but it must
> > > > respect the MSS advertised by the other peer during the handshake.
> > > > 
> > > 
> > > I am not mixing endpoints, you are.
> > > 
> > > If you need to be convinced, please grab:
> > > https://patchwork.ozlabs.org/patch/723028/
> > > 
> > > And just watch "ss -temoi ..." 
> > 
> > I still don't get it, but I also hit the warning on my laptop, using
> > iwlwifi. Not sure what I did in order to trigger it, it was by accident.
> 
> After many weeks without any warning, I've hit the issue again today:
> 
>  TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised. rcv_mss:1448 advmss:1448 len:1460
> 

It is very possible the sender suddenly forgot to use TCP timestamps.
(Note that 1460 - 1448 = 12 bytes, exactly the size of the aligned TCP
timestamp option, so a segment sent without timestamps looks 12 bytes
larger than the rcv_mss learned from timestamped segments.)

This warning is a hint, and it cannot assume senders are not dumb.

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bfa165cc455a..eddd5b6a28b1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -128,6 +128,7 @@  int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 
 static void tcp_gro_dev_warn(struct sock *sk, const struct sk_buff *skb)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	static bool __once __read_mostly;
 
 	if (!__once) {
@@ -137,8 +138,9 @@  static void tcp_gro_dev_warn(struct sock *sk, const struct sk_buff *skb)
 
 		rcu_read_lock();
 		dev = dev_get_by_index_rcu(sock_net(sk), skb->skb_iif);
-		pr_warn("%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
-			dev ? dev->name : "Unknown driver");
+		pr_warn("%s: Driver has suspect GRO implementation, TCP performance may be compromised. rcv_mss:%u advmss:%u len:%u\n",
+			dev ? dev->name : "Unknown driver",
+			icsk->icsk_ack.rcv_mss, tcp_sk(sk)->advmss, skb->len);
 		rcu_read_unlock();
 	}
 }
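
With the patch applied, the three extra values end up in the kernel log
next to the warning; a quick way to pull them out afterwards (sketch):

	dmesg | grep 'suspect GRO'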