Message ID: 20170203115457.GA5296@localhost.localdomain
State: RFC, archived
Delegated to: David Miller
On 2017.02.03 at 09:54 -0200, Marcelo Ricardo Leitner wrote:
> On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> > On Thu, 2017-02-02 at 05:31 -0800, Eric Dumazet wrote:
> >
> > > Anyway, I suspect the test is simply buggy ;)
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 41dcbd568cbe2403f2a9e659669afe462a42e228..5394a39fcce964a7fe7075b1531a8a1e05550a54 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -164,7 +164,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
> > >  	if (len >= icsk->icsk_ack.rcv_mss) {
> > >  		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
> > >  					       tcp_sk(sk)->advmss);
> > > -		if (unlikely(icsk->icsk_ack.rcv_mss != len))
> > > +		if (unlikely(icsk->icsk_ack.rcv_mss != len && skb_is_gso(skb)))
> > >  			tcp_gro_dev_warn(sk, skb);
> > >  	} else {
> > >  		/* Otherwise, we make more careful check taking into account,
> >
> > This wont really help.
> >
> > Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> > peer.
> >
> > ip ro add .... advmss 512
>
> I don't follow. With a good driver, how can advmss be smaller than the
> MSS used by the remote peer? Even with the route entry above, I get
> segments just up to advmss, and no warning.
>
> Though yeah, interesting that this driver doesn't even support GRO. FCS
> perhaps?
>
> Markus, do you have other interfaces in your system? Which MTU do you
> use, and please try the (untested) patch below, to gather more debug
> info:

No, eth0 is the only interface. MTU = 1500. Sure, I will try your patch.
But I don't know how to reproduce the issue, so you will have to wait
until it triggers again.
On Fri, 2017-02-03 at 09:54 -0200, Marcelo Ricardo Leitner wrote:
> On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> > On Thu, 2017-02-02 at 05:31 -0800, Eric Dumazet wrote:
> >
> > > Anyway, I suspect the test is simply buggy ;)
> > >
> > > [patch snipped]
> >
> > This wont really help.
> >
> > Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> > peer.
> >
> > ip ro add .... advmss 512
>
> I don't follow. With a good driver, how can advmss be smaller than the
> MSS used by the remote peer? Even with the route entry above, I get
> segments just up to advmss, and no warning.

A TCP flow has two ends.

Common MTU = 1500

One can have advmss 500, the other one no advmss (or the standard 1460
one)

So if we compare apple and orange, result might be shocking ;)

If you want to reproduce this use the "ip ro add .... advmss 512" hint,
and/or play with sysctl_tcp_mtu_probing
On Fri, Feb 03, 2017 at 05:24:06AM -0800, Eric Dumazet wrote:
> On Fri, 2017-02-03 at 09:54 -0200, Marcelo Ricardo Leitner wrote:
> > On Thu, Feb 02, 2017 at 05:59:24AM -0800, Eric Dumazet wrote:
> > > [patch and earlier discussion snipped]
> > >
> > > Our tcp_sk(sk)->advmss can be lower than the MSS used by the remote
> > > peer.
> > >
> > > ip ro add .... advmss 512
> >
> > I don't follow. With a good driver, how can advmss be smaller than the
> > MSS used by the remote peer? Even with the route entry above, I get
> > segments just up to advmss, and no warning.
>
> A TCP flow has two ends.

Indeed, though should be mostly about only one of them.

> Common MTU = 1500
>
> One can have advmss 500, the other one no advmss (or the standard 1460
> one)

Considering the rx side of peer A. Peer A advertises a given MSS to peer
B and should not receive any segment from peer B larger than so. I'm
failing to see how advmss can be smaller than the segment size just
received.

> So if we compare apple and orange, result might be shocking ;)

Yes heh just not seeing the mix here..

> If you want to reproduce this use the "ip ro add .... advmss 512" hint,
> and/or play with sysctl_tcp_mtu_probing

I tried the route with advmss, no luck so far. Still digging..

Marcelo
On Fri, 2017-02-03 at 11:53 -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Feb 03, 2017 at 05:24:06AM -0800, Eric Dumazet wrote:
> > [earlier discussion snipped]
> >
> > One can have advmss 500, the other one no advmss (or the standard 1460
> > one)
>
> Considering the rx side of peer A. Peer A advertises a given MSS to peer
> B and should not receive any segment from peer B larger than so.
> I'm failing to see how advmss can be smaller than the segment size just
> received.

tcp_sk(sk)->advmss records what the peer announced during its SYN (or
SYNACK) message, in the MSS option.

Nothing prevents the peer to change its mind later.

Eg starting with MSS 512, then switch later to sending packets of 1024
or 1400 bytes.

So the innocent NIC driver is not the problem here.
On Fri, Feb 03, 2017 at 06:16:06AM -0800, Eric Dumazet wrote:
> On Fri, 2017-02-03 at 11:53 -0200, Marcelo Ricardo Leitner wrote:
> > [earlier discussion snipped]
> >
> > Considering the rx side of peer A. Peer A advertises a given MSS to peer
> > B and should not receive any segment from peer B larger than so.
> > I'm failing to see how advmss can be smaller than the segment size just
> > received.
>
> tcp_sk(sk)->advmss records what the peer announced during its SYN (or
> SYNACK) message, in the MSS option.
>
> Nothing prevents the peer to change its mind later.
>
> Eg starting with MSS 512, then switch later to sending packets of 1024
> or 1400 bytes.

Aren't you mixing the endpoints here? MSS is the largest amount of data
that the peer can receive in a single segment, and not how much it will
send. For the sending part, that depends on what the other peer
announced, and we can have 2 different MSS in a single connection, one
for each peer.

If a peer later wants to send larger segments, it can, but it must
respect the mss advertised by the other peer during handshake.

> So the innocent NIC driver is not the problem here.
On Fri, 2017-02-03 at 12:28 -0200, Marcelo Ricardo Leitner wrote:
> Aren't you mixing the endpoints here? MSS is the largest amount of data
> that the peer can receive in a single segment, and not how much it will
> send. For the sending part, that depends on what the other peer
> announced, and we can have 2 different MSS in a single connection, one
> for each peer.
>
> If a peer later wants to send larger segments, it can, but it must
> respect the mss advertised by the other peer during handshake.

I am not mixing endpoints, you are.

If you need to be convinced, please grab:
https://patchwork.ozlabs.org/patch/723028/

And just watch "ss -temoi ..."
On Fri, Feb 03, 2017 at 06:47:33AM -0800, Eric Dumazet wrote:
> On Fri, 2017-02-03 at 12:28 -0200, Marcelo Ricardo Leitner wrote:
> > [earlier discussion snipped]
>
> I am not mixing endpoints, you are.
>
> If you need to be convinced, please grab:
> https://patchwork.ozlabs.org/patch/723028/
>
> And just watch "ss -temoi ..."

I still don't get it, but I also hit the warning on my laptop, using
iwlwifi. Not sure what I did in order to trigger it, it was by accident.

Marcelo
On 2017.02.06 at 19:12 -0200, Marcelo Ricardo Leitner wrote:
> [earlier discussion snipped]
>
> I still don't get it, but I also hit the warning on my laptop, using
> iwlwifi. Not sure what I did in order to trigger it, it was by accident.

I have been running with your debugging patch applied since the
beginning of February and was not able to reproduce the issue ever
again. So I think your code is innocent and another bug (that seems to
have been fixed since then) somehow caused the kernel to jump to the
function.
On 2017.02.06 at 19:12 -0200, Marcelo Ricardo Leitner wrote:
> [earlier discussion snipped]
>
> I still don't get it, but I also hit the warning on my laptop, using
> iwlwifi. Not sure what I did in order to trigger it, it was by accident.

After many weeks without any warning, I've hit the issue again today:

TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised. rcv_mss:1448 advmss:1448 len:1460
On Sun, 2017-03-19 at 13:14 +0100, Markus Trippelsdorf wrote:
> [earlier discussion snipped]
>
> After many weeks without any warning, I've hit the issue again today:
>
> TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised. rcv_mss:1448 advmss:1448 len:1460

It is very possible the sender suddenly forgot to use TCP timestamps.

This warning is a hint, and cannot assume senders are not dumb.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bfa165cc455a..eddd5b6a28b1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -128,6 +128,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 
 static void tcp_gro_dev_warn(struct sock *sk, const struct sk_buff *skb)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
 	static bool __once __read_mostly;
 
 	if (!__once) {
@@ -137,8 +138,9 @@ static void tcp_gro_dev_warn(struct sock *sk, const struct sk_buff *skb)
 
 		rcu_read_lock();
 		dev = dev_get_by_index_rcu(sock_net(sk), skb->skb_iif);
-		pr_warn("%s: Driver has suspect GRO implementation, TCP performance may be compromised.\n",
-			dev ? dev->name : "Unknown driver");
+		pr_warn("%s: Driver has suspect GRO implementation, TCP performance may be compromised. rcv_mss:%u advmss:%u len:%u\n",
+			dev ? dev->name : "Unknown driver",
+			icsk->icsk_ack.rcv_mss, tcp_sk(sk)->advmss, skb->len);
 		rcu_read_unlock();
 	}
 }