| Message ID | 20190719185233.242049-1-edumazet@google.com |
|---|---|
| State | Accepted |
| Delegated to | David Miller |
| Series | [net] tcp: be more careful in tcp_fragment() |
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 19 Jul 2019 11:52:33 -0700

> Some applications set tiny SO_SNDBUF values and expect
> TCP to just work. Recent patches to address CVE-2019-11478
> broke them in case of losses, since retransmits might
> be prevented.
>
> We should allow these flows to make progress.
>
> This patch allows the first and last skb in the retransmit queue
> to be split even if memory limits are hit.
>
> It also adds some room due to the fact that tcp_sendmsg()
> and tcp_sendpage() might overshoot sk_wmem_queued by about one full
> TSO skb (64KB size). Note this allowance was already present
> in stable backports for kernels < 4.15.
>
> Note for < 4.15 backports:
> tcp_rtx_queue_tail() will probably look like:
>
> static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
> {
>	struct sk_buff *skb = tcp_send_head(sk);
>
>	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
> }
>
> Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Andrew Prout <aprout@ll.mit.edu>
> Tested-by: Andrew Prout <aprout@ll.mit.edu>
> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> Tested-by: Michal Kubecek <mkubecek@suse.cz>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Christoph Paasch <cpaasch@apple.com>
> Cc: Jonathan Looney <jtl@netflix.com>

Applied and queued up for -stable.
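For context, the class of application the patch protects looks roughly like the sketch below: a client that clamps SO_SNDBUF to a tiny value before connecting. This is an illustrative userspace program, not taken from the report; the 4096-byte buffer, the loopback address, and port 8080 are arbitrary examples.

```c
/* Illustrative sketch (not from the report): a client that clamps
 * SO_SNDBUF to a tiny value, the pattern that tripped the
 * CVE-2019-11478 limit once retransmits needed tcp_fragment().
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int sndbuf = 4096;		/* arbitrary tiny value for illustration */
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port   = htons(8080),	/* hypothetical server */
	};
	char buf[1024];

	inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
	/* Must be set before connect(); the kernel doubles the value
	 * internally to leave room for bookkeeping (man 7 socket). */
	setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
		memset(buf, 'x', sizeof(buf));
		/* Under loss, retransmitting these writes may require
		 * tcp_fragment(); before this patch the memory check
		 * could fail with -ENOMEM and stall the flow. */
		while (write(fd, buf, sizeof(buf)) > 0)
			;
	}
	close(fd);
	return 0;
}
```

With a send buffer this small, sk_wmem_queued sits near sk_sndbuf essentially all the time, which is why the original check could fire even during legitimate loss recovery.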
```diff
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f42d300f0cfaa87520320dd287a7b4750adf7d8a..e5cf514ba118e688ce3b3da66f696abd47e1d10f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1709,6 +1709,11 @@ static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
 	return skb_rb_first(&sk->tcp_rtx_queue);
 }
 
+static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
+{
+	return skb_rb_last(&sk->tcp_rtx_queue);
+}
+
 static inline struct sk_buff *tcp_write_queue_head(const struct sock *sk)
 {
 	return skb_peek(&sk->sk_write_queue);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4af1f5dae9d3e937ef39685355c9d3f19ff3ee3b..6e4afc48d7bba7cded4d3fe38f32ab02328f9e05 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1288,6 +1288,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *buff;
 	int nsize, old_factor;
+	long limit;
 	int nlen;
 	u8 flags;
 
@@ -1298,8 +1299,16 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 	if (nsize < 0)
 		nsize = 0;
 
-	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
-		     tcp_queue != TCP_FRAG_IN_WRITE_QUEUE)) {
+	/* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb.
+	 * We need some allowance to not penalize applications setting small
+	 * SO_SNDBUF values.
+	 * Also allow first and last skb in retransmit queue to be split.
+	 */
+	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
+	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
+		     tcp_queue != TCP_FRAG_IN_WRITE_QUEUE &&
+		     skb != tcp_rtx_queue_head(sk) &&
+		     skb != tcp_rtx_queue_tail(sk))) {
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
 		return -ENOMEM;
 	}
```
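For a rough sense of scale of the new allowance, the sketch below assumes GSO_MAX_SIZE is 65536 (its value in kernels of this era) and treats the SKB_TRUESIZE() overhead, the aligned struct sk_buff plus struct skb_shared_info, as a few hundred bytes; exact figures vary by architecture and kernel version.

```c
/* Back-of-the-envelope numbers for the new limit, assuming
 * GSO_MAX_SIZE == 65536 and a SKB_TRUESIZE() overhead (aligned
 * struct sk_buff + struct skb_shared_info) of a few hundred bytes:
 *
 *   limit = sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE)
 *         ~ sk_sndbuf + 2 * (65536 + ~600)
 *         ~ sk_sndbuf + ~130 KB
 *
 * Since the test compares sk_wmem_queued >> 1 against limit, the
 * refusal only triggers once queued memory exceeds roughly
 * 2 * sk_sndbuf + ~260 KB, and even then never for the head or
 * tail of the retransmit queue.
 */
```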