[net-next] tcp: allow zerocopy with fastopen

Message ID 20190125161723.75429-1-willemdebruijn.kernel@gmail.com
State Accepted
Delegated to: David Miller

Commit Message

Willem de Bruijn Jan. 25, 2019, 4:17 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

Accept MSG_ZEROCOPY in all the TCP states that allow sendmsg. Remove
the explicit check for ESTABLISHED and CLOSE_WAIT states.

This requires correctly handling zerocopy state (uarg, sk_zckey) in
all paths reachable from other TCP states. Such as the EPIPE case
in sk_stream_wait_connect, which a sendmsg() in incorrect state will
now hit. Most paths are already safe.

Only extension needed is for TCP Fastopen active open. This can build
an skb with data in tcp_send_syn_data. Pass the uarg along with other
fastopen state, so that this skb also generates a zerocopy
notification on release.

Tested with active and passive tcp fastopen packetdrill scripts at
https://github.com/wdebruij/packetdrill/commit/1747eef03d25a2404e8132817d0f1244fd6f129d

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/net/tcp.h     |  1 +
 net/ipv4/tcp.c        | 11 ++++-------
 net/ipv4/tcp_output.c |  1 +
 3 files changed, 6 insertions(+), 7 deletions(-)
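For readers unfamiliar with the TCPF_* idiom, the check removed from tcp_sendmsg_locked() can be modelled in userspace as below. This is only a sketch: the DEMO_* names and old_zerocopy_state_check() are hypothetical stand-ins, and the numeric state values are copied from the kernel's TCP state enum rather than taken from this patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical local copies of a few kernel TCP state numbers. */
enum demo_tcp_state {
	DEMO_TCP_ESTABLISHED = 1,
	DEMO_TCP_SYN_SENT    = 2,
	DEMO_TCP_CLOSE       = 7,
	DEMO_TCP_CLOSE_WAIT  = 8,
};

/* Each state maps to one bit, mirroring the kernel's TCPF_* flags. */
#define DEMO_TCPF(state) (1 << (state))

/*
 * Model of the removed check: any state other than ESTABLISHED or
 * CLOSE_WAIT made a MSG_ZEROCOPY send fail with -EINVAL.
 */
int old_zerocopy_state_check(int state)
{
	int allowed = DEMO_TCPF(DEMO_TCP_ESTABLISHED) |
		      DEMO_TCPF(DEMO_TCP_CLOSE_WAIT);

	assert(state >= 0 && state < 31);	/* keep the shift well defined */

	if (DEMO_TCPF(state) & ~allowed)
		return -EINVAL;
	return 0;
}
```

With the check gone, a fastopen sendmsg() issued from the CLOSE state is no longer rejected up front, and is instead handled by the generic connect and fastopen paths.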

Comments

Eric Dumazet Jan. 25, 2019, 5:06 p.m. UTC | #1
On 01/25/2019 08:17 AM, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Accept MSG_ZEROCOPY in all the TCP states that allow sendmsg. Remove
> the explicit check for ESTABLISHED and CLOSE_WAIT states.
> 
> This requires correctly handling zerocopy state (uarg, sk_zckey) in
> all paths reachable from other TCP states. Such as the EPIPE case
> in sk_stream_wait_connect, which a sendmsg() in incorrect state will
> now hit. Most paths are already safe.
> 
> Only extension needed is for TCP Fastopen active open. This can build
> an skb with data in tcp_send_syn_data. Pass the uarg along with other
> fastopen state, so that this skb also generates a zerocopy
> notification on release.
> 
> Tested with active and passive tcp fastopen packetdrill scripts at
> https://github.com/wdebruij/packetdrill/commit/1747eef03d25a2404e8132817d0f1244fd6f129d
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---

Let's see if syzkaller finds issues with this :)

Signed-off-by: Eric Dumazet <edumazet@google.com>

David Miller Jan. 26, 2019, 6:41 a.m. UTC | #2
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Fri, 25 Jan 2019 11:17:23 -0500

> From: Willem de Bruijn <willemb@google.com>
> 
> Accept MSG_ZEROCOPY in all the TCP states that allow sendmsg. Remove
> the explicit check for ESTABLISHED and CLOSE_WAIT states.
> 
> This requires correctly handling zerocopy state (uarg, sk_zckey) in
> all paths reachable from other TCP states. Such as the EPIPE case
> in sk_stream_wait_connect, which a sendmsg() in incorrect state will
> now hit. Most paths are already safe.
> 
> Only extension needed is for TCP Fastopen active open. This can build
> an skb with data in tcp_send_syn_data. Pass the uarg along with other
> fastopen state, so that this skb also generates a zerocopy
> notification on release.
> 
> Tested with active and passive tcp fastopen packetdrill scripts at
> https://github.com/wdebruij/packetdrill/commit/1747eef03d25a2404e8132817d0f1244fd6f129d
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Applied, thanks.

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5c950180d61be..a6e0355921e1d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1608,6 +1608,7 @@  struct tcp_fastopen_request {
 	struct msghdr			*data;  /* data in MSG_FASTOPEN */
 	size_t				size;
 	int				copied;	/* queued in tcp_connect() */
+	struct ubuf_info		*uarg;
 };
 void tcp_free_fastopen_req(struct tcp_sock *tp);
 void tcp_fastopen_destroy_cipher(struct sock *sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5f099c9d04e5d..12ba21433dd00 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1127,7 +1127,8 @@  void tcp_free_fastopen_req(struct tcp_sock *tp)
 }
 
 static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
-				int *copied, size_t size)
+				int *copied, size_t size,
+				struct ubuf_info *uarg)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_sock *inet = inet_sk(sk);
@@ -1147,6 +1148,7 @@  static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
 		return -ENOBUFS;
 	tp->fastopen_req->data = msg;
 	tp->fastopen_req->size = size;
+	tp->fastopen_req->uarg = uarg;
 
 	if (inet->defer_connect) {
 		err = tcp_connect(sk);
@@ -1186,11 +1188,6 @@  int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	flags = msg->msg_flags;
 
 	if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {
-		if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
-			err = -EINVAL;
-			goto out_err;
-		}
-
 		skb = tcp_write_queue_tail(sk);
 		uarg = sock_zerocopy_realloc(sk, size, skb_zcopy(skb));
 		if (!uarg) {
@@ -1205,7 +1202,7 @@  int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 	if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) &&
 	    !tp->repair) {
-		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
+		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size, uarg);
 		if (err == -EINPROGRESS && copied_syn > 0)
 			goto out;
 		else if (err)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6527f61f59ff1..26a2948dca954 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3455,6 +3455,7 @@  static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 			skb_trim(syn_data, copied);
 			space = copied;
 		}
+		skb_zcopy_set(syn_data, fo->uarg, NULL);
 	}
 	/* No more data pending in inet_wait_for_connect() */
 	if (space == fo->size)