Message ID | 20191211073419.258820-1-edumazet@google.com |
---|---|
State | Superseded |
Delegated to: | David Miller |
Headers | show |
Series | [net] tcp: do not send empty skb from tcp_write_xmit() | expand |
On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > write queue in error cases") in linux-4.14 stable triggered > various bugs. One of them has been fixed in commit ba2ddb43f270 > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > we still have crashes in some occasions. > > Root-cause is that when tcp_sendmsg() has allocated a fresh > skb and could not append a fragment before being blocked > in sk_stream_wait_memory(), tcp_write_xmit() might be called > and decide to send this fresh and empty skb. > > Sending an empty packet is not only silly, it might have caused > many issues we had in the past with tp->packets_out being > out of sync. > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Christoph Paasch <cpaasch@apple.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Jason Baron <jbaron@akamai.com> > --- > net/ipv4/tcp_output.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > if (tcp_small_queue_check(sk, skb, 0)) > break; > > + /* Argh, we hit an empty skb(), presumably a thread > + * is sleeping in sendmsg()/sk_stream_wait_memory(). > + * We do not want to send a pure-ack packet and have > + * a strange looking rtx queue with empty packet(s). > + */ > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) > + break; > + > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) > break; > > -- Thanks for the fix, Eric! Is there any risk that any current or future bugs that create persistently empty skbs could cause the connection to "freeze", unable to reach the tcp_transmit_skb() call in tcp_write_xmit()? To avoid this risk, would it make sense to delete the empty skb and continue the tcp_write_xmit() transmit loop, rather than breaking out of the loop? Just curious to learn. :-) thanks, neal
On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote: > > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > > write queue in error cases") in linux-4.14 stable triggered > > various bugs. One of them has been fixed in commit ba2ddb43f270 > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > > we still have crashes in some occasions. > > > > Root-cause is that when tcp_sendmsg() has allocated a fresh > > skb and could not append a fragment before being blocked > > in sk_stream_wait_memory(), tcp_write_xmit() might be called > > and decide to send this fresh and empty skb. > > > > Sending an empty packet is not only silly, it might have caused > > many issues we had in the past with tp->packets_out being > > out of sync. > > > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Cc: Christoph Paasch <cpaasch@apple.com> > > Cc: Neal Cardwell <ncardwell@google.com> > > Cc: Jason Baron <jbaron@akamai.com> > > --- > > net/ipv4/tcp_output.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > > if (tcp_small_queue_check(sk, skb, 0)) > > break; > > > > + /* Argh, we hit an empty skb(), presumably a thread > > + * is sleeping in sendmsg()/sk_stream_wait_memory(). > > + * We do not want to send a pure-ack packet and have > > + * a strange looking rtx queue with empty packet(s). > > + */ > > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) > > + break; > > + > > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) > > break; > > > > -- > > Thanks for the fix, Eric! > > Is there any risk that any current or future bugs that create > persistently empty skbs could cause the connection to "freeze", unable > to reach the tcp_transmit_skb() call in tcp_write_xmit()? > > To avoid this risk, would it make sense to delete the empty skb and > continue the tcp_write_xmit() transmit loop, rather than breaking out > of the loop? This 'empty' skb must be the last in the queue. Removing it from the queue would not prevent tcp_write_xmit() from breaking the loop. If we remove it, then we force sendmsg() to re-allocate a fresh skb, which seems more work, and would paper around the bugs that would be un-noticed. Another question to ask is if we need to reconsider how we use tcp_write_queue_empty() as an indicator for 'I have something in the write queue' This is obviously wrong if the write queue has a single empty skb. Maybe we should instead use tp->write_seq != tp->snd_nxt
On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote: > > > > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > > > write queue in error cases") in linux-4.14 stable triggered > > > various bugs. One of them has been fixed in commit ba2ddb43f270 > > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > > > we still have crashes in some occasions. > > > > > > Root-cause is that when tcp_sendmsg() has allocated a fresh > > > skb and could not append a fragment before being blocked > > > in sk_stream_wait_memory(), tcp_write_xmit() might be called > > > and decide to send this fresh and empty skb. > > > > > > Sending an empty packet is not only silly, it might have caused > > > many issues we had in the past with tp->packets_out being > > > out of sync. > > > > > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > Cc: Christoph Paasch <cpaasch@apple.com> > > > Cc: Neal Cardwell <ncardwell@google.com> > > > Cc: Jason Baron <jbaron@akamai.com> > > > --- > > > net/ipv4/tcp_output.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 > > > --- a/net/ipv4/tcp_output.c > > > +++ b/net/ipv4/tcp_output.c > > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > > > if (tcp_small_queue_check(sk, skb, 0)) > > > break; > > > > > > + /* Argh, we hit an empty skb(), presumably a thread > > > + * is sleeping in sendmsg()/sk_stream_wait_memory(). > > > + * We do not want to send a pure-ack packet and have > > > + * a strange looking rtx queue with empty packet(s). > > > + */ > > > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) > > > + break; > > > + > > > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) > > > break; > > > > > > -- > > > > Thanks for the fix, Eric! > > > > Is there any risk that any current or future bugs that create > > persistently empty skbs could cause the connection to "freeze", unable > > to reach the tcp_transmit_skb() call in tcp_write_xmit()? > > > > To avoid this risk, would it make sense to delete the empty skb and > > continue the tcp_write_xmit() transmit loop, rather than breaking out > > of the loop? > > This 'empty' skb must be the last in the queue. > > Removing it from the queue would not prevent tcp_write_xmit() from > breaking the loop. > > If we remove it, then we force sendmsg() to re-allocate a fresh skb, > which seems more work, and would paper around the bugs that would be > un-noticed. > > Another question to ask is if we need to reconsider how we use > tcp_write_queue_empty() as an indicator > for 'I have something in the write queue' > > This is obviously wrong if the write queue has a single empty skb. > > Maybe we should instead use tp->write_seq != tp->snd_nxt Also we use skb_queue_len(&sk->sk_write_queue) == 0 in tcp_sendmsg_locked() and this seems not good. We could have changed tcp_sendmsg_locked() to leave the empty skb in the write queue. ( and we can thus remove tcp_remove_empty_skb() completely since it has caused many problems in stable kernels) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) sock_zerocopy_put_abort(uarg, true); err = sk_stream_error(sk, flags, err); /* make sure we wake any epoll edge trigger waiter */ - if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && + if (unlikely(tp->write_seq == tp->snd_nxt && err == -EAGAIN)) { sk->sk_write_space(sk); tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
On Wed, Dec 11, 2019 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Dec 11, 2019 at 11:39 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Wed, Dec 11, 2019 at 11:17 AM Neal Cardwell <ncardwell@google.com> wrote: > > > > > > On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > > > > write queue in error cases") in linux-4.14 stable triggered > > > > various bugs. One of them has been fixed in commit ba2ddb43f270 > > > > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > > > > we still have crashes in some occasions. > > > > > > > > Root-cause is that when tcp_sendmsg() has allocated a fresh > > > > skb and could not append a fragment before being blocked > > > > in sk_stream_wait_memory(), tcp_write_xmit() might be called > > > > and decide to send this fresh and empty skb. > > > > > > > > Sending an empty packet is not only silly, it might have caused > > > > many issues we had in the past with tp->packets_out being > > > > out of sync. > > > > > > > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > > > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > > > Cc: Christoph Paasch <cpaasch@apple.com> > > > > Cc: Neal Cardwell <ncardwell@google.com> > > > > Cc: Jason Baron <jbaron@akamai.com> > > > > --- > > > > net/ipv4/tcp_output.c | 8 ++++++++ > > > > 1 file changed, 8 insertions(+) > > > > > > > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > > > > index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 > > > > --- a/net/ipv4/tcp_output.c > > > > +++ b/net/ipv4/tcp_output.c > > > > @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, > > > > if (tcp_small_queue_check(sk, skb, 0)) > > > > break; > > > > > > > > + /* Argh, we hit an empty skb(), presumably a thread > > > > + * is sleeping in sendmsg()/sk_stream_wait_memory(). > > > > + * We do not want to send a pure-ack packet and have > > > > + * a strange looking rtx queue with empty packet(s). > > > > + */ > > > > + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) > > > > + break; > > > > + > > > > if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) > > > > break; > > > > > > > > -- > > > > > > Thanks for the fix, Eric! > > > > > > Is there any risk that any current or future bugs that create > > > persistently empty skbs could cause the connection to "freeze", unable > > > to reach the tcp_transmit_skb() call in tcp_write_xmit()? > > > > > > To avoid this risk, would it make sense to delete the empty skb and > > > continue the tcp_write_xmit() transmit loop, rather than breaking out > > > of the loop? > > > > This 'empty' skb must be the last in the queue. > > > > Removing it from the queue would not prevent tcp_write_xmit() from > > breaking the loop. > > > > If we remove it, then we force sendmsg() to re-allocate a fresh skb, > > which seems more work, and would paper around the bugs that would be > > un-noticed. > > > > Another question to ask is if we need to reconsider how we use > > tcp_write_queue_empty() as an indicator > > for 'I have something in the write queue' > > > > This is obviously wrong if the write queue has a single empty skb. > > > > Maybe we should instead use tp->write_seq != tp->snd_nxt > > Also we use skb_queue_len(&sk->sk_write_queue) == 0 in > tcp_sendmsg_locked() and this seems not good. > > We could have changed tcp_sendmsg_locked() to leave the empty skb in > the write queue. > ( and we can thus remove tcp_remove_empty_skb() completely since it > has caused many problems in stable kernels) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 8a39ee79489192c02385aaadc8d1ae969fb55d23..9ba3294de9cb4e929afdc0e00b01b6b534c84af6 > 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -1419,7 +1419,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct > msghdr *msg, size_t size) > sock_zerocopy_put_abort(uarg, true); > err = sk_stream_error(sk, flags, err); > /* make sure we wake any epoll edge trigger waiter */ > - if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && > + if (unlikely(tp->write_seq == tp->snd_nxt && > err == -EAGAIN)) { > sk->sk_write_space(sk); > tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED); Thanks, Eric. I like your idea to audit the TCP calls to tcp_write_queue_empty() and skb_queue_len(&sk->sk_write_queue) to see which ones should be replaced with comparisons of tp->write_seq and tp->snd_nxt (including this one). Good catch! neal
On Wed, Dec 11, 2019 at 2:34 AM Eric Dumazet <edumazet@google.com> wrote: > > Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from > write queue in error cases") in linux-4.14 stable triggered > various bugs. One of them has been fixed in commit ba2ddb43f270 > ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but > we still have crashes in some occasions. > > Root-cause is that when tcp_sendmsg() has allocated a fresh > skb and could not append a fragment before being blocked > in sk_stream_wait_memory(), tcp_write_xmit() might be called > and decide to send this fresh and empty skb. > > Sending an empty packet is not only silly, it might have caused > many issues we had in the past with tp->packets_out being > out of sync. > > Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Christoph Paasch <cpaasch@apple.com> > Cc: Neal Cardwell <ncardwell@google.com> > Cc: Jason Baron <jbaron@akamai.com> > --- Acked-by: Neal Cardwell <ncardwell@google.com> thanks, neal
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b184f03d743715ef4b2d166ceae651529be77953..57f434a8e41ffd6bc584cb4d9e87703491a378c1 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2438,6 +2438,14 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, if (tcp_small_queue_check(sk, skb, 0)) break; + /* Argh, we hit an empty skb(), presumably a thread + * is sleeping in sendmsg()/sk_stream_wait_memory(). + * We do not want to send a pure-ack packet and have + * a strange looking rtx queue with empty packet(s). + */ + if (TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) + break; + if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp))) break;
Backport of commit fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") in linux-4.14 stable triggered various bugs. One of them has been fixed in commit ba2ddb43f270 ("tcp: Don't dequeue SYN/FIN-segments from write-queue"), but we still have crashes in some occasions. Root-cause is that when tcp_sendmsg() has allocated a fresh skb and could not append a fragment before being blocked in sk_stream_wait_memory(), tcp_write_xmit() might be called and decide to send this fresh and empty skb. Sending an empty packet is not only silly, it might have caused many issues we had in the past with tp->packets_out being out of sync. Fixes: c65f7f00c587 ("[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Christoph Paasch <cpaasch@apple.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Jason Baron <jbaron@akamai.com> --- net/ipv4/tcp_output.c | 8 ++++++++ 1 file changed, 8 insertions(+)