[net] tcp: fix overflow in __tcp_retransmit_skb()

Message ID 1473952353.22679.25.camel@edumazet-glaptop3.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Sept. 15, 2016, 3:12 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

If a TCP socket gets a large write queue, an overflow can happen
in a test in __tcp_retransmit_skb() preventing all retransmits.

The flow then stalls and resets after timeouts.

Tested:

sysctl -w net.core.wmem_max=1000000000
netperf -H dest -- -s 1000000000

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
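
The overflow itself is ordinary signed-int arithmetic: sk_wmem_queued and
sk_sndbuf are both int, and with SO_SNDBUF pushed toward 2GB (as in the
netperf reproducer above) the sum sk_wmem_queued + (sk_wmem_queued >> 2)
exceeds INT_MAX and wraps negative, so min() picks the negative value and
the check rejects every retransmit. A minimal userspace sketch, with
hypothetical values standing in for the socket fields:

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical: write queue nearly fills a ~2GB SO_SNDBUF. */
		int wmem_queued = 2000000000;

		/* Compute the sum in unsigned to keep this sketch well
		 * defined; the kernel does it in int and builds with
		 * -fno-strict-overflow, so the value wraps the same way. */
		unsigned int sum = (unsigned int)wmem_queued +
				   (unsigned int)(wmem_queued >> 2);

		printf("sum as u32: %u\n", sum);        /* 2500000000 */
		printf("sum as int: %d\n", (int)sum);   /* -1794967296 */
		printf("INT_MAX:    %d\n", INT_MAX);

		/* min(negative, sk_sndbuf) is negative, so
		 * atomic_read(&sk->sk_wmem_alloc) > min(...) always holds
		 * and __tcp_retransmit_skb() always returns -EAGAIN. */
		return 0;
	}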

Comments

David Laight Sept. 15, 2016, 3:52 p.m. UTC | #1
From: Eric Dumazet
> Sent: 15 September 2016 16:13
> If a TCP socket gets a large write queue, an overflow can happen
> in a test in __tcp_retransmit_skb() preventing all retransmits.
...
>  	if (atomic_read(&sk->sk_wmem_alloc) >
> -	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> +	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> +		  sk->sk_sndbuf))
>  		return -EAGAIN;

Might it also be better to split that test to (say):

	u32 wmem_alloc = atomic_read(&sk->sk_wmem_alloc);
	if (unlikely(wmem_alloc > sk->sk_sndbuf))
		return -EAGAIN;
	if (unlikely(wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2)))
		return -EAGAIN;

It might even be worth splitting the second test as:

	if (unlikely(wmem_alloc > sk->sk_wmem_queued)
	    && wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2))
		return -EAGAIN;

	David
Eric Dumazet Sept. 15, 2016, 4:35 p.m. UTC | #2
On Thu, 2016-09-15 at 15:52 +0000, David Laight wrote:
> From: Eric Dumazet
> > Sent: 15 September 2016 16:13
> > If a TCP socket gets a large write queue, an overflow can happen
> > in a test in __tcp_retransmit_skb() preventing all retransmits.
> ...
> >  	if (atomic_read(&sk->sk_wmem_alloc) >
> > -	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> > +	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> > +		  sk->sk_sndbuf))
> >  		return -EAGAIN;
> 
> Might it also be better to split that test to (say):
> 
> 	u32 wmem_alloc = atomic_read(&sk->sk_wmem_alloc);
> 	if (unlikely(wmem_alloc > sk->sk_sndbuf))
> 		return -EAGAIN;
> 	if (unlikely(wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2)))
> 		return -EAGAIN;

Well, I find the existing code more readable, but this is just an
opinion.

Thanks.
David Miller Sept. 17, 2016, 2 p.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Sep 2016 08:12:33 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> If a TCP socket gets a large write queue, an overflow can happen
> in a test in __tcp_retransmit_skb() preventing all retransmits.
> 
> The flow then stalls and resets after timeouts.
> 
> Tested:
> 
> sysctl -w net.core.wmem_max=1000000000
> netperf -H dest -- -s 1000000000
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdaef7fd6e47..f53d0cca5fa4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2605,7 +2605,8 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 	 * copying overhead: fragmentation, tunneling, mangling etc.
 	 */
 	if (atomic_read(&sk->sk_wmem_alloc) >
-	    min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
+	    min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
+		  sk->sk_sndbuf))
 		return -EAGAIN;
 
 	if (skb_still_in_host_queue(sk, skb))
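
Why changing min() to min_t(u32, ...) is sufficient here: min() compares
the two ints as-is, so once the sum wraps negative it becomes the bound,
while min_t(u32, ...) evaluates both sides as u32, so the wrapped sum reads
back as the intended large positive value and sk_sndbuf stays the effective
limit. A simplified sketch of the two helpers (the real kernel macros in
include/linux/kernel.h add strict type checking on top of this shape):

	#define min(x, y)		((x) < (y) ? (x) : (y))
	#define min_t(type, x, y) ({	\
		type __x = (x);		\
		type __y = (y);		\
		__x < __y ? __x : __y; })

	/* Before: min(-1794967296, 2000000000) == -1794967296, and
	 * sk_wmem_alloc (always >= 1) exceeds it, so every call fails.
	 * After:  min_t(u32, -1794967296, 2000000000) converts the wrapped
	 * sum back to 2500000000u, so the result is 2000000000, the bound
	 * the code intended all along. */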