From patchwork Fri Feb 14 23:30:49 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arjun Roy
X-Patchwork-Id: 1238428
X-Patchwork-Delegate: davem@davemloft.net
From: Arjun Roy
To: davem@davemloft.net, netdev@vger.kernel.org
Cc: arjunroy@google.com, soheil@google.com, edumazet@google.com
Subject: [PATCH net-next 1/2] tcp-zerocopy: Return inq along with tcp receive zerocopy.
Date: Fri, 14 Feb 2020 15:30:49 -0800
Message-Id: <20200214233050.19429-1-arjunroy.kdev@gmail.com>
X-Mailer: git-send-email 2.25.0.265.gbab2e86ba0-goog
X-Mailing-List: netdev@vger.kernel.org

From: Arjun Roy

This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using edge-triggered epoll, returning inq along with
the result of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking,
since normally we would need to perform a recvmsg() call for every
successful small RPC read via TCP receive zerocopy, returning inq can
reduce the number of system calls performed by approximately half.

Signed-off-by: Arjun Roy
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
---
 include/uapi/linux/tcp.h |  1 +
 net/ipv4/tcp.c           | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 74af1f759cee..19700101cbba 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -343,5 +343,6 @@ struct tcp_zerocopy_receive {
 	__u64 address;		/* in: address of mapping */
 	__u32 length;		/* in/out: number of bytes to map/mapped */
 	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
+	__u32 inq; /* out: amount of bytes in read queue */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f09fbc85b108..947be81b35c5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3658,13 +3658,26 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 
 		if (get_user(len, optlen))
 			return -EFAULT;
-		if (len != sizeof(zc))
+		if (len < offsetofend(struct tcp_zerocopy_receive, length))
 			return -EINVAL;
+		if (len > sizeof(zc))
+			len = sizeof(zc);
 		if (copy_from_user(&zc, optval, len))
 			return -EFAULT;
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc);
 		release_sock(sk);
+		switch (len) {
+		case sizeof(zc):
+		case offsetofend(struct tcp_zerocopy_receive, inq):
+			goto zerocopy_rcv_inq;
+		case offsetofend(struct tcp_zerocopy_receive, length):
+		default:
+			goto zerocopy_rcv_out;
+		}
+zerocopy_rcv_inq:
+		zc.inq = tcp_inq_hint(sk);
+zerocopy_rcv_out:
 		if (!err && copy_to_user(optval, &zc, len))
 			err = -EFAULT;
 		return err;

From patchwork Fri Feb 14 23:30:50 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arjun Roy
X-Patchwork-Id: 1238429
X-Patchwork-Delegate: davem@davemloft.net
From: Arjun Roy
To: davem@davemloft.net, netdev@vger.kernel.org
Cc: arjunroy@google.com, soheil@google.com, edumazet@google.com
Subject: [PATCH net-next 2/2] tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.
Date: Fri, 14 Feb 2020 15:30:50 -0800
Message-Id: <20200214233050.19429-2-arjunroy.kdev@gmail.com>
X-Mailer: git-send-email 2.25.0.265.gbab2e86ba0-goog
In-Reply-To: <20200214233050.19429-1-arjunroy.kdev@gmail.com>
References: <20200214233050.19429-1-arjunroy.kdev@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

From: Arjun Roy

This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy.
For ping-pong RPC style workloads, this patchset has demonstrated a
system call reduction of about 30% when coupled with userspace changes.

For applications using epoll, returning sk_err along with the result of
tcp receive zerocopy could remove the need to call recvmsg()=-EAGAIN
after a spurious wakeup.

Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN but another thread may already be reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; rather, it may be possible that it was woken up due to the
presence of an error if there is no data. A zerocopy read receiving 0
bytes thus would need to be followed up by recvmsg to be sure.

Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.

Signed-off-by: Arjun Roy
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
---
 include/uapi/linux/tcp.h | 1 +
 net/ipv4/tcp.c           | 8 +++++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 19700101cbba..e1706a7c9d88 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -344,5 +344,6 @@ struct tcp_zerocopy_receive {
 	__u32 length;		/* in/out: number of bytes to map/mapped */
 	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
 	__u32 inq; /* out: amount of bytes in read queue */
+	__s32 err; /* out: socket error */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 947be81b35c5..0efac228bbdb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3667,14 +3667,20 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc);
 		release_sock(sk);
+		if (len == sizeof(zc))
+			goto zerocopy_rcv_sk_err;
 		switch (len) {
-		case sizeof(zc):
+		case offsetofend(struct tcp_zerocopy_receive, err):
+			goto zerocopy_rcv_sk_err;
 		case offsetofend(struct tcp_zerocopy_receive, inq):
 			goto zerocopy_rcv_inq;
 		case offsetofend(struct tcp_zerocopy_receive, length):
 		default:
 			goto zerocopy_rcv_out;
 		}
+zerocopy_rcv_sk_err:
+		if (!err)
+			zc.err = sock_error(sk);
 zerocopy_rcv_inq:
 		zc.inq = tcp_inq_hint(sk);
 zerocopy_rcv_out:
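
Note on the option-length handling in the two patches above: the way
optlen selects which output fields get filled can be modelled in plain
userspace C. The sketch below is only an illustration, assuming the
struct layout shown in the diffs. The names struct zc_model,
OFFSETOFEND, enum zc_fill and zc_fields_for_len() are hypothetical and
introduced here; none of them is kernel code. The helper reports which
fields the patched getsockopt() path would set for a given optlen and
does not perform a zerocopy receive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct tcp_zerocopy_receive after both patches
 * (layout assumed from the diffs; hypothetical name). */
struct zc_model {
	uint64_t address;        /* in: address of mapping */
	uint32_t length;         /* in/out: number of bytes to map/mapped */
	uint32_t recv_skip_hint; /* out: amount of bytes to skip */
	uint32_t inq;            /* out: amount of bytes in read queue */
	int32_t err;             /* out: socket error */
};

/* Local stand-in for the kernel's offsetofend() macro. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* With err appended, the struct has no tail padding: 8 + 4 * 4 bytes. */
static_assert(sizeof(struct zc_model) == 24, "unexpected struct layout");

enum zc_fill { ZC_REJECT, ZC_FILL_NONE, ZC_FILL_INQ, ZC_FILL_INQ_ERR };

/* Hypothetical helper: classify optlen the way the patched code does. */
enum zc_fill zc_fields_for_len(size_t len)
{
	/* Too short to hold address and length: -EINVAL in the patch. */
	if (len < OFFSETOFEND(struct zc_model, length))
		return ZC_REJECT;
	/* Oversized buffers are clamped to the kernel's struct size. */
	if (len > sizeof(struct zc_model))
		len = sizeof(struct zc_model);
	/* Full struct, or exactly up to err: sk_err and inq are filled.
	 * (The patch writes zc.err only if the receive itself returned 0.) */
	if (len == sizeof(struct zc_model) ||
	    len == OFFSETOFEND(struct zc_model, err))
		return ZC_FILL_INQ_ERR;
	/* Exactly up to inq: only inq is filled. */
	if (len == OFFSETOFEND(struct zc_model, inq))
		return ZC_FILL_INQ;
	/* Any other accepted length: neither new field is written. */
	return ZC_FILL_NONE;
}
```

Under this model a caller passing 16 bytes (the pre-patch struct size)
gets neither new field, a caller passing 20 bytes gets inq only, and a
caller passing 24 bytes or more gets both inq and err.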