[net-next,1/2] tcp-zerocopy: Return inq along with tcp receive zerocopy.

Message ID 20200214233050.19429-1-arjunroy.kdev@gmail.com
State Accepted
Delegated to: David Miller

Commit Message

Arjun Roy Feb. 14, 2020, 11:30 p.m. UTC
From: Arjun Roy <arjunroy@google.com>

This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using edge-triggered epoll, returning inq along with
the result of TCP receive zerocopy can remove the need for the extra
recvmsg() call that returns -EAGAIN after a successful zerocopy. Since
every successful small RPC read via TCP receive zerocopy would
otherwise be followed by such a recvmsg() call, returning inq can
reduce the number of system calls by approximately half.

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>

---
 include/uapi/linux/tcp.h |  1 +
 net/ipv4/tcp.c           | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

Comments

David Miller Feb. 17, 2020, 3:25 a.m. UTC | #1
From: Arjun Roy <arjunroy.kdev@gmail.com>
Date: Fri, 14 Feb 2020 15:30:49 -0800

> From: Arjun Roy <arjunroy@google.com>
> 
> This patchset is intended to reduce the number of extra system calls
> imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
> this patchset has demonstrated a system call reduction of about 30%
> when coupled with userspace changes.
> 
> For applications using edge-triggered epoll, returning inq along with
> the result of TCP receive zerocopy can remove the need for the extra
> recvmsg() call that returns -EAGAIN after a successful zerocopy. Since
> every successful small RPC read via TCP receive zerocopy would
> otherwise be followed by such a recvmsg() call, returning inq can
> reduce the number of system calls by approximately half.
> 
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>

Applied.
Patch

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 74af1f759cee..19700101cbba 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -343,5 +343,6 @@  struct tcp_zerocopy_receive {
 	__u64 address;		/* in: address of mapping */
 	__u32 length;		/* in/out: number of bytes to map/mapped */
 	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
+	__u32 inq; /* out: amount of bytes in read queue */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f09fbc85b108..947be81b35c5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3658,13 +3658,26 @@  static int do_tcp_getsockopt(struct sock *sk, int level,
 
 		if (get_user(len, optlen))
 			return -EFAULT;
-		if (len != sizeof(zc))
+		if (len < offsetofend(struct tcp_zerocopy_receive, length))
 			return -EINVAL;
+		if (len > sizeof(zc))
+			len = sizeof(zc);
 		if (copy_from_user(&zc, optval, len))
 			return -EFAULT;
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc);
 		release_sock(sk);
+		switch (len) {
+		case sizeof(zc):
+		case offsetofend(struct tcp_zerocopy_receive, inq):
+			goto zerocopy_rcv_inq;
+		case offsetofend(struct tcp_zerocopy_receive, length):
+		default:
+			goto zerocopy_rcv_out;
+		}
+zerocopy_rcv_inq:
+		zc.inq = tcp_inq_hint(sk);
+zerocopy_rcv_out:
 		if (!err && copy_to_user(optval, &zc, len))
 			err = -EFAULT;
 		return err;
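
The optlen handling in the do_tcp_getsockopt() hunk above can be modeled in
userspace roughly as follows. This is a sketch, not kernel code: struct
zc_receive mirrors the patched struct, OFFSETOFEND stands in for the
kernel's offsetofend(), and zc_classify_len() with its enum is a
hypothetical name introduced only for illustration. Equality tests replace
the switch so the sketch also builds on ABIs where the struct carries no
tail padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct tcp_zerocopy_receive after this patch. */
struct zc_receive {
	uint64_t address;
	uint32_t length;
	uint32_t recv_skip_hint;
	uint32_t inq;
};

/* Userspace stand-in for the kernel's offsetofend() helper. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

static_assert(OFFSETOFEND(struct zc_receive, inq) <= sizeof(struct zc_receive),
	      "inq must lie inside the struct");

enum zc_len_result {
	ZC_LEN_INVALID,   /* kernel would return -EINVAL */
	ZC_LEN_NO_INQ,    /* legacy caller: inq is not filled in */
	ZC_LEN_WITH_INQ,  /* caller's buffer covers inq: it is filled in */
};

/* Classify a caller-supplied optlen the way the patched hunk does. */
static enum zc_len_result zc_classify_len(size_t len)
{
	/* The buffer must at least cover the address and length fields. */
	if (len < OFFSETOFEND(struct zc_receive, length))
		return ZC_LEN_INVALID;

	/* Larger buffers are accepted and clamped to the kernel's struct. */
	if (len > sizeof(struct zc_receive))
		len = sizeof(struct zc_receive);

	/* Only lengths that end exactly at inq, or cover the whole struct,
	 * request the inq value; every other valid length skips it. */
	if (len == sizeof(struct zc_receive) ||
	    len == OFFSETOFEND(struct zc_receive, inq))
		return ZC_LEN_WITH_INQ;

	return ZC_LEN_NO_INQ;
}
```

The effect is that binaries built against the older, shorter struct keep
working unchanged, while callers passing the extended struct receive inq.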