[net] tcp: purge write queue upon RST

Message ID 20180227233218.158382-1-soheil.kdev@gmail.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Soheil Hassas Yeganeh Feb. 27, 2018, 11:32 p.m. UTC
From: Soheil Hassas Yeganeh <soheil@google.com>

When the connection is reset, there is no point in
keeping the packets on the write queue until the connection
is closed.

RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
purging the write queue upon RST:
https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07

Moreover, this is essential for a correct MSG_ZEROCOPY
implementation, because userspace cannot call close(fd)
before receiving zerocopy signals even when the connection
is reset.

Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c | 1 +
 1 file changed, 1 insertion(+)
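
The MSG_ZEROCOPY constraint described in the commit message can be illustrated from the userspace side. The following is a minimal sketch, not part of the patch: it assumes the notification format described in the kernel's MSG_ZEROCOPY documentation (completions arrive on the socket error queue as a `struct sock_extended_err` whose `ee_info`..`ee_data` fields give an inclusive range of send calls). The helper names `zc_completed_count` and `zc_drain_completions` are hypothetical, and the fallback constants are only used when the installed headers lack them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/errqueue.h>

/* Fallbacks for older headers; values as listed in the kernel documentation. */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif
#ifndef SO_EE_ORIGIN_ZEROCOPY
#define SO_EE_ORIGIN_ZEROCOPY 5
#endif

/* Hypothetical helper: number of send calls covered by one notification,
 * or -1 if the error-queue entry is not a zerocopy completion. */
long zc_completed_count(const struct sock_extended_err *serr)
{
	assert(serr != NULL);
	if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY || serr->ee_errno != 0)
		return -1;
	/* [ee_info, ee_data] is an inclusive range of 32-bit send counters;
	 * unsigned subtraction keeps the result correct across wraparound. */
	return (long)(serr->ee_data - serr->ee_info) + 1;
}

/* Hypothetical helper: read every pending completion from the error queue
 * without blocking. Returns the number of completed sends, or -1 on error. */
long zc_drain_completions(int fd)
{
	long total = 0;

	for (;;) {
		union {
			char buf[128];
			struct cmsghdr align; /* forces cmsghdr alignment */
		} control;
		struct msghdr msg;
		struct cmsghdr *cm;

		memset(&msg, 0, sizeof(msg));
		msg.msg_control = control.buf;
		msg.msg_controllen = sizeof(control.buf);

		if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
			return (errno == EAGAIN || errno == EWOULDBLOCK) ? total : -1;

		for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
			struct sock_extended_err serr;
			long n;

			/* IPPROTO_IP / IPPROTO_IPV6 equal SOL_IP / SOL_IPV6. */
			if (!((cm->cmsg_level == IPPROTO_IP &&
			       cm->cmsg_type == IP_RECVERR) ||
			      (cm->cmsg_level == IPPROTO_IPV6 &&
			       cm->cmsg_type == IPV6_RECVERR)))
				continue;

			memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
			n = zc_completed_count(&serr);
			if (n > 0)
				total += n;
		}
	}
}
```

An application that sends with `send(fd, buf, len, MSG_ZEROCOPY)` after enabling `SO_ZEROCOPY` would typically call something like `zc_drain_completions(fd)` until every outstanding send is accounted for, and only then reuse its buffers and call close(fd). Going by the commit message, if the kernel keeps reset data on the write queue until close, that wait cannot finish on a reset connection.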

Comments

David Miller Feb. 28, 2018, 4:42 p.m. UTC | #1
From: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
Date: Tue, 27 Feb 2018 18:32:18 -0500

> From: Soheil Hassas Yeganeh <soheil@google.com>
> 
> When the connection is reset, there is no point in
> keeping the packets on the write queue until the connection
> is closed.
> 
> RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
> purging the write queue upon RST:
> https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07
> 
> Moreover, this is essential for a correct MSG_ZEROCOPY
> implementation, because userspace cannot call close(fd)
> before receiving zerocopy signals even when the connection
> is reset.
> 
> Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>

This is one of those "yeah, why have we been doing this all of
this time?" kind of situation.

Let's hope there isn't some subtle side effect, but indeed this
current behavior is broken for MSG_ZEROCOPY.

Applied and queued up for -stable, thanks!

Eric Dumazet Feb. 28, 2018, 4:46 p.m. UTC | #2
On Wed, Feb 28, 2018 at 8:42 AM, David Miller <davem@davemloft.net> wrote:
> From: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
> Date: Tue, 27 Feb 2018 18:32:18 -0500
>
>> From: Soheil Hassas Yeganeh <soheil@google.com>
>>
>> When the connection is reset, there is no point in
>> keeping the packets on the write queue until the connection
>> is closed.
>>
>> RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
>> purging the write queue upon RST:
>> https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07
>>
>> Moreover, this is essential for a correct MSG_ZEROCOPY
>> implementation, because userspace cannot call close(fd)
>> before receiving zerocopy signals even when the connection
>> is reset.
>>
>> Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
>> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
>> Reviewed-by: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Yuchung Cheng <ycheng@google.com>
>> Signed-off-by: Neal Cardwell <ncardwell@google.com>
>
> This is one of those "yeah, why have we been doing this all of
> this time?" kind of situation.
>
> Let's hope there isn't some subtle side effect, but indeed this
> current behavior is broken for MSG_ZEROCOPY.
>

One effect is that for very large queues (more than 100 MB), the queue purge
might take a long time in BH context (while handling a single RST).

But even before the patch, this could also happen from BH context anyway.

We might use workqueue(s) in the future to handle the purge in the
background in process context.
But really this is not urgent.

> Applied and queued up for -stable, thanks!

Thanks David.

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 06b9c4765f42..b17fac2629c3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3998,6 +3998,7 @@  void tcp_reset(struct sock *sk)
 	/* This barrier is coupled with smp_rmb() in tcp_poll() */
 	smp_wmb();
 
+	tcp_write_queue_purge(sk);
 	tcp_done(sk);
 
 	if (!sock_flag(sk, SOCK_DEAD))
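
The ordering change made by this one-line patch can be modelled outside the kernel. The sketch below is a userspace toy, not kernel code: every `toy_*` name is hypothetical, the write queue is a plain linked list, and a counter stands in for the zerocopy completion that would fire when a queued buffer is freed. The `purge_on_reset` flag selects between the behavior before the patch (0) and after it (1).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a buffer sitting on the write queue. */
struct toy_buf {
	struct toy_buf *next;
};

/* Toy stand-in for a socket; all fields are illustrative only. */
struct toy_sock {
	struct toy_buf *write_queue; /* unsent or unacknowledged data */
	int queued;                  /* buffers currently on the queue */
	int completions;             /* completions delivered so far */
	int done;                    /* set once the connection is torn down */
};

/* Queue one buffer for transmission. */
void toy_enqueue(struct toy_sock *sk)
{
	struct toy_buf *b = malloc(sizeof(*b));

	assert(b != NULL);
	b->next = sk->write_queue;
	sk->write_queue = b;
	sk->queued++;
}

/* Free every queued buffer; each free delivers one completion.
 * The cost is linear in the queue length. */
void toy_write_queue_purge(struct toy_sock *sk)
{
	while (sk->write_queue) {
		struct toy_buf *b = sk->write_queue;

		sk->write_queue = b->next;
		free(b);
		sk->queued--;
		sk->completions++;
	}
}

/* Model of reset handling: purge first (patched), then mark done. */
void toy_reset(struct toy_sock *sk, int purge_on_reset)
{
	if (purge_on_reset)
		toy_write_queue_purge(sk);
	sk->done = 1;
}

/* Model of close(): whatever is still queued is released here. */
void toy_close(struct toy_sock *sk)
{
	toy_write_queue_purge(sk);
}
```

With `purge_on_reset` set to 0, a reset leaves every buffer queued and no completion is delivered until `toy_close()` runs, which is the situation the commit message describes as broken for MSG_ZEROCOPY. With it set to 1, all completions are delivered at reset time. The linear loop in `toy_write_queue_purge()` also mirrors the cost concern raised in comment #2 for very large queues.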