Message ID | 20200214233050.19429-2-arjunroy.kdev@gmail.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Series | [net-next,1/2] tcp-zerocopy: Return inq along with tcp receive zerocopy. |
From: Arjun Roy <arjunroy.kdev@gmail.com>
Date: Fri, 14 Feb 2020 15:30:50 -0800

> From: Arjun Roy <arjunroy@google.com>
>
> This patchset is intended to reduce the number of extra system calls
> imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
> this patchset has demonstrated a system call reduction of about 30%
> when coupled with userspace changes.
>
> For applications using epoll, returning sk_err along with the result
> of tcp receive zerocopy could remove the need to call
> recvmsg()=-EAGAIN after a spurious wakeup.
>
> Consider a multi-threaded application using epoll. A thread may awaken
> with EPOLLIN but another thread may already be reading. The
> spuriously-awoken thread does not necessarily know that another thread
> 'won'; rather, it may be possible that it was woken up due to the
> presence of an error if there is no data. A zerocopy read receiving 0
> bytes thus would need to be followed up by recvmsg to be sure.
>
> Instead, we return sk_err directly with zerocopy, so the application
> can avoid this extra system call.
>
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>

Applied.
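The commit message describes a user-space decision that the new `err` field makes possible. Below is a minimal sketch of that decision, assuming the caller has already called `getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len)` with `zc_len = sizeof(zc)` and copied the out fields into `struct zc_out`. That struct, the `ZC_*` flags and `zc_next_actions()` are hypothetical names for illustration, not part of the kernel API. Several flags may be set at once, because a call can map data and also report a pending error.

```c
#include <stdint.h>

/* Hypothetical copy of the out fields read back from
 * struct tcp_zerocopy_receive after getsockopt(TCP_ZEROCOPY_RECEIVE). */
struct zc_out {
	uint32_t length; /* bytes mapped by this call */
	uint32_t inq;    /* bytes still in the receive queue */
	int32_t err;     /* 0, or a negative errno taken from the socket */
};

/* Hypothetical follow-up actions; more than one may apply. */
enum {
	ZC_PROCESS_MAPPED = 1 << 0, /* consume the mapped pages */
	ZC_READ_MORE      = 1 << 1, /* queue not drained: read again */
	ZC_HANDLE_ERROR   = 1 << 2, /* a socket error was reported (and consumed) */
};

/* Returns 0 when nothing was mapped, nothing is queued and no error is
 * pending: the wakeup was spurious, so the thread can go back to
 * epoll_wait() without a follow-up recvmsg(). */
static unsigned zc_next_actions(const struct zc_out *zc)
{
	unsigned actions = 0;

	if (zc->length > 0)
		actions |= ZC_PROCESS_MAPPED;
	if (zc->inq > 0)
		actions |= ZC_READ_MORE;
	if (zc->err != 0)
		actions |= ZC_HANDLE_ERROR;
	return actions;
}
```

Before this patch, the `0` case could not be told apart from "an error is pending" without the extra `recvmsg()` the commit message mentions.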
```diff
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 19700101cbba..e1706a7c9d88 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -344,5 +344,6 @@ struct tcp_zerocopy_receive {
 	__u32 length; /* in/out: number of bytes to map/mapped */
 	__u32 recv_skip_hint; /* out: amount of bytes to skip */
 	__u32 inq; /* out: amount of bytes in read queue */
+	__s32 err; /* out: socket error */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 947be81b35c5..0efac228bbdb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3667,14 +3667,20 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc);
 		release_sock(sk);
+		if (len == sizeof(zc))
+			goto zerocopy_rcv_sk_err;
 		switch (len) {
-		case sizeof(zc):
+		case offsetofend(struct tcp_zerocopy_receive, err):
+			goto zerocopy_rcv_sk_err;
 		case offsetofend(struct tcp_zerocopy_receive, inq):
 			goto zerocopy_rcv_inq;
 		case offsetofend(struct tcp_zerocopy_receive, length):
 		default:
 			goto zerocopy_rcv_out;
 		}
+zerocopy_rcv_sk_err:
+		if (!err)
+			zc.err = sock_error(sk);
 zerocopy_rcv_inq:
 		zc.inq = tcp_inq_hint(sk);
 zerocopy_rcv_out:
```
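The patch fills the optional out fields according to how many bytes user space passed as the option length. A caller whose struct ends at `inq` gets `inq` but never has `err` written, while a caller passing the full struct gets both. The following is a user-space simulation of that goto chain, not kernel code. `struct zc_model`, `struct fake_sock`, `fake_sock_error()` and `fill_optional_fields()` are hypothetical stand-ins for `struct tcp_zerocopy_receive`, `struct sock`, `sock_error()` and the relevant part of `do_tcp_getsockopt()`. `offsetofend` is redefined locally to match the kernel macro's meaning (offset of the member plus its size).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct tcp_zerocopy_receive after this patch. */
struct zc_model {
	uint64_t address;
	uint32_t length;
	uint32_t recv_skip_hint;
	uint32_t inq;
	int32_t err;
};

/* Offset of the first byte past MEMBER, like the kernel's offsetofend(). */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical stand-in for the socket state the kernel consults. */
struct fake_sock {
	uint32_t inq; /* what tcp_inq_hint() would report */
	int sk_err;   /* positive errno, like sk->sk_err */
};

/* Models sock_error(): returns the pending error negated and clears it. */
static int fake_sock_error(struct fake_sock *sk)
{
	int e = sk->sk_err;

	sk->sk_err = 0;
	return -e;
}

/* Mirrors the patched dispatch: fill only the out fields that fit in len.
 * err is the result of the zerocopy receive itself. */
static void fill_optional_fields(struct zc_model *zc, size_t len, int err,
				 struct fake_sock *sk)
{
	if (len == sizeof(*zc))
		goto zerocopy_rcv_sk_err;
	switch (len) {
	case offsetofend(struct zc_model, err):
		goto zerocopy_rcv_sk_err;
	case offsetofend(struct zc_model, inq):
		goto zerocopy_rcv_inq;
	case offsetofend(struct zc_model, length):
	default:
		goto zerocopy_rcv_out;
	}
zerocopy_rcv_sk_err:
	if (!err)
		zc->err = fake_sock_error(sk);
zerocopy_rcv_inq:
	zc->inq = sk->inq;
zerocopy_rcv_out:
	return;
}
```

With this layout, `sizeof(zc)` equals `offsetofend(struct tcp_zerocopy_receive, err)`, so keeping `case sizeof(zc):` next to the new case label would be a duplicate case value; moving the full-size check into the `if` ahead of the `switch` avoids that. Because `sock_error()` clears the pending error as it reads it, only callers that ask for `err` consume it; a caller using the shorter struct still sees the error on a later call.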