Message ID | 20200214233050.19429-1-arjunroy.kdev@gmail.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Series | [net-next,1/2] tcp-zerocopy: Return inq along with tcp receive zerocopy. |
From: Arjun Roy <arjunroy.kdev@gmail.com>
Date: Fri, 14 Feb 2020 15:30:49 -0800

> From: Arjun Roy <arjunroy@google.com>
>
> This patchset is intended to reduce the number of extra system calls
> imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
> this patchset has demonstrated a system call reduction of about 30%
> when coupled with userspace changes.
>
> For applications using edge-triggered epoll, returning inq along with
> the result of tcp receive zerocopy could remove the need to call
> recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking,
> since normally we would need to perform a recvmsg() call for every
> successful small RPC read via TCP receive zerocopy, returning inq can
> reduce the number of system calls performed by approximately half.
>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>

Applied.
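To illustrate the system call saving described in the commit message, here is a minimal userspace sketch. It is not part of the patch. The names `zc_recv_args`, `zc_receive` and `socket_drained` are hypothetical; the struct is a local mirror of `struct tcp_zerocopy_receive` as extended by this series, and the fallback value 35 for `TCP_ZEROCOPY_RECEIVE` is assumed to match `include/uapi/linux/tcp.h`. The sketch also assumes `map` points into a region previously mmap()ed on the socket, as TCP receive zerocopy requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_ZEROCOPY_RECEIVE
#define TCP_ZEROCOPY_RECEIVE 35 /* assumed value from include/uapi/linux/tcp.h */
#endif

/* Hypothetical local mirror of struct tcp_zerocopy_receive after this patch. */
struct zc_recv_args {
	uint64_t address;        /* in: address of mapping */
	uint32_t length;         /* in/out: number of bytes to map/mapped */
	uint32_t recv_skip_hint; /* out: amount of bytes to skip */
	uint32_t inq;            /* out: amount of bytes in read queue */
};

/* The new field must sit right after the three original ones. */
static_assert(offsetof(struct zc_recv_args, inq) == 16,
	      "layout must match the kernel struct");

/* Issue one zerocopy receive; returns 0 on success, -1 with errno set on failure. */
static int zc_receive(int fd, void *map, uint32_t map_len,
		      struct zc_recv_args *zc)
{
	socklen_t zc_len = sizeof(*zc);

	memset(zc, 0, sizeof(*zc)); /* also clears trailing padding */
	zc->address = (uint64_t)(uintptr_t)map;
	zc->length = map_len;
	return getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, zc, &zc_len);
}

/*
 * After a successful call: nonzero if nothing is left to read, so an
 * edge-triggered epoll user can skip the recvmsg() that would only
 * return -EAGAIN and go straight back to epoll_wait().
 */
static int socket_drained(const struct zc_recv_args *zc)
{
	return zc->recv_skip_hint == 0 && zc->inq == 0;
}
```

A caller would check `socket_drained()` only when `zc_receive()` returned 0. If `recv_skip_hint` is nonzero, those bytes still have to be read with a regular recvmsg(); if only `inq` is nonzero, more data is queued and another receive is worthwhile.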
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 74af1f759cee..19700101cbba 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -343,5 +343,6 @@ struct tcp_zerocopy_receive {
 	__u64 address;		/* in: address of mapping */
 	__u32 length;		/* in/out: number of bytes to map/mapped */
 	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
+	__u32 inq; /* out: amount of bytes in read queue */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f09fbc85b108..947be81b35c5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3658,13 +3658,26 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		if (get_user(len, optlen))
 			return -EFAULT;
-		if (len != sizeof(zc))
+		if (len < offsetofend(struct tcp_zerocopy_receive, length))
 			return -EINVAL;
+		if (len > sizeof(zc))
+			len = sizeof(zc);
 		if (copy_from_user(&zc, optval, len))
 			return -EFAULT;
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc);
 		release_sock(sk);
+		switch (len) {
+		case sizeof(zc):
+		case offsetofend(struct tcp_zerocopy_receive, inq):
+			goto zerocopy_rcv_inq;
+		case offsetofend(struct tcp_zerocopy_receive, length):
+		default:
+			goto zerocopy_rcv_out;
+		}
+zerocopy_rcv_inq:
+		zc.inq = tcp_inq_hint(sk);
+zerocopy_rcv_out:
 		if (!err && copy_to_user(optval, &zc, len))
 			err = -EFAULT;
 		return err;
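The length handling in the `net/ipv4/tcp.c` hunk is what keeps older binaries working after the struct grows. The following standalone sketch models that logic in userspace so it can be stepped through. It is an illustration only: `OFFSETOFEND`, `zc_recv_v2` and `zc_negotiate_len` are hypothetical names, and the function returns -1 where the kernel returns -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's offsetofend() helper. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical mirror of struct tcp_zerocopy_receive after this patch. */
struct zc_recv_v2 {
	uint64_t address;
	uint32_t length;
	uint32_t recv_skip_hint;
	uint32_t inq;
};

static_assert(OFFSETOFEND(struct zc_recv_v2, inq) <= sizeof(struct zc_recv_v2),
	      "inq must lie inside the struct");

/*
 * Model of the optlen checks in the hunk above.
 * Returns -1 if the caller's buffer cannot even hold 'length';
 * otherwise returns the number of bytes copied in and out, and sets
 * *fills_inq to 1 when the kernel would populate 'inq', 0 otherwise.
 */
static int zc_negotiate_len(size_t user_len, int *fills_inq)
{
	if (user_len < OFFSETOFEND(struct zc_recv_v2, length))
		return -1;
	if (user_len > sizeof(struct zc_recv_v2))
		user_len = sizeof(struct zc_recv_v2);

	/* Same two cases as the switch: full struct, or exactly up to inq. */
	*fills_inq = user_len == sizeof(struct zc_recv_v2) ||
		     user_len == OFFSETOFEND(struct zc_recv_v2, inq);
	return (int)user_len;
}
```

A binary built against the old 16-byte struct passes `user_len == 16`, falls through to the default case, and gets back only the fields it knows about. Because `copy_to_user()` is bounded by the same `len`, the kernel never writes past the buffer such a caller supplied.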