| Message ID | 1473952353.22679.25.camel@edumazet-glaptop3.roam.corp.google.com |
|---|---|
| State | Accepted, archived |
| Delegated to | David Miller |
From: Eric Dumazet
> Sent: 15 September 2016 16:13
> If a TCP socket gets a large write queue, an overflow can happen
> in a test in __tcp_retransmit_skb() preventing all retransmits.
...
> 	if (atomic_read(&sk->sk_wmem_alloc) >
> -	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> +	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> +		  sk->sk_sndbuf))
> 		return -EAGAIN;

Might it also be better to split that test to (say):

	u32 wmem_alloc = atomic_read(&sk->sk_wmem_alloc);
	if (unlikely(wmem_alloc > sk->sk_sndbuf))
		return -EAGAIN;
	if (unlikely(wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2)))
		return -EAGAIN;

It might even be worth splitting the second test as:

	if (unlikely(wmem_alloc > sk->sk_wmem_queued) &&
	    wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2))
		return -EAGAIN;

	David
On Thu, 2016-09-15 at 15:52 +0000, David Laight wrote:
> From: Eric Dumazet
> > Sent: 15 September 2016 16:13
> > If a TCP socket gets a large write queue, an overflow can happen
> > in a test in __tcp_retransmit_skb() preventing all retransmits.
> ...
> > 	if (atomic_read(&sk->sk_wmem_alloc) >
> > -	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> > +	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> > +		  sk->sk_sndbuf))
> > 		return -EAGAIN;
>
> Might it also be better to split that test to (say):
>
> 	u32 wmem_alloc = atomic_read(&sk->sk_wmem_alloc);
> 	if (unlikely(wmem_alloc > sk->sk_sndbuf))
> 		return -EAGAIN;
> 	if (unlikely(wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2)))
> 		return -EAGAIN;

Well, I find the existing code more readable, but this is just an opinion.

Thanks.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Sep 2016 08:12:33 -0700

> From: Eric Dumazet <edumazet@google.com>
>
> If a TCP socket gets a large write queue, an overflow can happen
> in a test in __tcp_retransmit_skb() preventing all retransmits.
>
> The flow then stalls and resets after timeouts.
>
> Tested:
>
> sysctl -w net.core.wmem_max=1000000000
> netperf -H dest -- -s 1000000000
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdaef7fd6e47..f53d0cca5fa4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2605,7 +2605,8 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
 	if (atomic_read(&sk->sk_wmem_alloc) >
-	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
+	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
+		  sk->sk_sndbuf))
 		return -EAGAIN;
 
 	if (skb_still_in_host_queue(sk, skb))