diff mbox series

[SRU,X] tcp: refine memory limit test in tcp_fragment()

Message ID 1561403947-567-1-git-send-email-tyhicks@canonical.com
State New
Headers show
Series [SRU,X] tcp: refine memory limit test in tcp_fragment() | expand

Commit Message

Tyler Hicks June 24, 2019, 7:19 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

tcp_fragment() might be called for skbs in the write queue.

Memory limits might have been exceeded because tcp_sendmsg() only
checks limits at full skb (64KB) boundaries.

Therefore, we need to make sure tcp_fragment() wont punish applications
that might have setup very low SO_SNDBUF values.

Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

CVE-2019-11478

(backported from commit b6653b3629e5b88202be3c9abc44713973f5c4b4)
[tyhicks: Don't enforce the limit on the skb that tcp_send_head points
 as that skb has never been sent out. In newer kernels containing commit
 75c119afe14f ("tcp: implement rb-tree based retransmit queue"), where
 there the retransmission queue is separate from the write queue, this
 skb would be in the write queue.
 With the modified check in this backported patch, we run the risk of
 enforcing the memory limit on an skb that is after tcp_send_head in the
 queue yet has never been sent out. However, an inspection of all
 tcp_fragment() call sites finds that this shouldn't occur and the limit
 will only be enforced on skbs that are up for retransmission.]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
---

I've successfully tested this patch using a slightly modified version of
a packetdrill test that was sent to the netdev list. Without this kernel
change, the test hangs. The test successfully completes with this kernel
change.

 net/ipv4/tcp_output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Thadeu Lima de Souza Cascardo June 24, 2019, 7:30 p.m. UTC | #1
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Jay Vosburgh June 24, 2019, 9:57 p.m. UTC | #2
Tyler Hicks <tyhicks@canonical.com> wrote:

>From: Eric Dumazet <edumazet@google.com>
>
>tcp_fragment() might be called for skbs in the write queue.
>
>Memory limits might have been exceeded because tcp_sendmsg() only
>checks limits at full skb (64KB) boundaries.
>
>Therefore, we need to make sure tcp_fragment() wont punish applications
>that might have setup very low SO_SNDBUF values.
>
>Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Reported-by: Christoph Paasch <cpaasch@apple.com>
>Tested-by: Christoph Paasch <cpaasch@apple.com>
>Signed-off-by: David S. Miller <davem@davemloft.net>
>
>CVE-2019-11478
>
>(backported from commit b6653b3629e5b88202be3c9abc44713973f5c4b4)
>[tyhicks: Don't enforce the limit on the skb that tcp_send_head points
> as that skb has never been sent out. In newer kernels containing commit
> 75c119afe14f ("tcp: implement rb-tree based retransmit queue"), where
> there the retransmission queue is separate from the write queue, this
> skb would be in the write queue.
> With the modified check in this backported patch, we run the risk of
> enforcing the memory limit on an skb that is after tcp_send_head in the
> queue yet has never been sent out. However, an inspection of all
> tcp_fragment() call sites finds that this shouldn't occur and the limit
> will only be enforced on skbs that are up for retransmission.]
>Signed-off-by: Tyler Hicks <tyhicks@canonical.com>

Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>


>---
>
>I've successfully tested this patch using a slightly modified version of
>a packetdrill test that was sent to the netdev list. Without this kernel
>change, the test hangs. The test successfully completes with this kernel
>change.
>
> net/ipv4/tcp_output.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>index ede265fbf7ba..719d2cc8770c 100644
>--- a/net/ipv4/tcp_output.c
>+++ b/net/ipv4/tcp_output.c
>@@ -1163,7 +1163,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
> 	if (nsize < 0)
> 		nsize = 0;
> 
>-	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
>+	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
>+		     skb != tcp_send_head(sk))) {
> 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
> 		return -ENOMEM;
> 	}
>-- 
>2.7.4
>
Stefan Bader June 28, 2019, 2:27 p.m. UTC | #3
On 24.06.19 21:19, Tyler Hicks wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> tcp_fragment() might be called for skbs in the write queue.
> 
> Memory limits might have been exceeded because tcp_sendmsg() only
> checks limits at full skb (64KB) boundaries.
> 
> Therefore, we need to make sure tcp_fragment() wont punish applications
> that might have setup very low SO_SNDBUF values.
> 
> Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Tested-by: Christoph Paasch <cpaasch@apple.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> CVE-2019-11478
> 
> (backported from commit b6653b3629e5b88202be3c9abc44713973f5c4b4)
> [tyhicks: Don't enforce the limit on the skb that tcp_send_head points
>  as that skb has never been sent out. In newer kernels containing commit
>  75c119afe14f ("tcp: implement rb-tree based retransmit queue"), where
>  there the retransmission queue is separate from the write queue, this
>  skb would be in the write queue.
>  With the modified check in this backported patch, we run the risk of
>  enforcing the memory limit on an skb that is after tcp_send_head in the
>  queue yet has never been sent out. However, an inspection of all
>  tcp_fragment() call sites finds that this shouldn't occur and the limit
>  will only be enforced on skbs that are up for retransmission.]
> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
> ---

Just to make it more obvious, Jay's review should count as an ack.
> 
> I've successfully tested this patch using a slightly modified version of
> a packetdrill test that was sent to the netdev list. Without this kernel
> change, the test hangs. The test successfully completes with this kernel
> change.
> 
>  net/ipv4/tcp_output.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ede265fbf7ba..719d2cc8770c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1163,7 +1163,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
>  	if (nsize < 0)
>  		nsize = 0;
>  
> -	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
> +	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
> +		     skb != tcp_send_head(sk))) {
>  		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>  		return -ENOMEM;
>  	}
>
Kleber Sacilotto de Souza July 1, 2019, 9:10 a.m. UTC | #4
On 6/24/19 9:19 PM, Tyler Hicks wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> tcp_fragment() might be called for skbs in the write queue.
> 
> Memory limits might have been exceeded because tcp_sendmsg() only
> checks limits at full skb (64KB) boundaries.
> 
> Therefore, we need to make sure tcp_fragment() wont punish applications
> that might have setup very low SO_SNDBUF values.
> 
> Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Tested-by: Christoph Paasch <cpaasch@apple.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> CVE-2019-11478
> 
> (backported from commit b6653b3629e5b88202be3c9abc44713973f5c4b4)
> [tyhicks: Don't enforce the limit on the skb that tcp_send_head points
>  as that skb has never been sent out. In newer kernels containing commit
>  75c119afe14f ("tcp: implement rb-tree based retransmit queue"), where
>  there the retransmission queue is separate from the write queue, this
>  skb would be in the write queue.
>  With the modified check in this backported patch, we run the risk of
>  enforcing the memory limit on an skb that is after tcp_send_head in the
>  queue yet has never been sent out. However, an inspection of all
>  tcp_fragment() call sites finds that this shouldn't occur and the limit
>  will only be enforced on skbs that are up for retransmission.]
> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
> ---
> 
> I've successfully tested this patch using a slightly modified version of
> a packetdrill test that was sent to the netdev list. Without this kernel
> change, the test hangs. The test successfully completes with this kernel
> change.
> 
>  net/ipv4/tcp_output.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ede265fbf7ba..719d2cc8770c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1163,7 +1163,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
>  	if (nsize < 0)
>  		nsize = 0;
>  
> -	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
> +	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
> +		     skb != tcp_send_head(sk))) {
>  		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>  		return -ENOMEM;
>  	}
> 


This patch has already been applied to xenial/master-next branch. Sending
the applied message only for consistency.

Thanks,
Kleber
diff mbox series

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ede265fbf7ba..719d2cc8770c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1163,7 +1163,8 @@  int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 	if (nsize < 0)
 		nsize = 0;
 
-	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
+	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
+		     skb != tcp_send_head(sk))) {
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
 		return -ENOMEM;
 	}