From patchwork Thu Nov 14 17:32:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1194982 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwH3G3cz9sPF for ; Fri, 15 Nov 2019 04:22:22 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id B5019100EE8CF; Thu, 14 Nov 2019 09:23:46 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id CD426100EE8CD for ; Thu, 14 Nov 2019 09:23:44 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIou-00061D-KK; Thu, 14 Nov 2019 18:22:12 +0100 From: Florian Westphal To: Date: Thu, 14 Nov 2019 18:32:11 +0100 Message-Id: <20191114173225.21199-1-fw@strlen.de> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 Message-ID-Hash: DRZ6JPRXBSJJMA4WE4JOEH3ES7GD7632 X-Message-ID-Hash: DRZ6JPRXBSJJMA4WE4JOEH3ES7GD7632 X-MailFrom: fw@breakpoint.cc 
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header
X-Mailman-Version: 3.1.1
Precedence: list
Subject: [MPTCP] [RFC] mptcp: wmem accounting and nonblocking io support
List-Id: Discussions regarding MPTCP upstreaming
Archived-At:
List-Archive:
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:

This (large, sigh) series fixes poll handling in mptcp.

The first patch extends the test suite with an mmap-based mode to check
large, blocking writes. This uncovered a minor problem with the earlier
v2 wmem accounting patch series -- we would happily take a lot more data
than sndbuf allowed, as we only limited based on what the subflow could
accept. So with a 4k sndbuf we could easily accept 256kb or even more.

This patch doesn't change the default test suite behaviour, however: you
need to use "-b 4096" and/or "-m mmap" to enable this mode.

The second patch moves the test suite to nonblocking io. This breaks
mptcp because mptcp_poll can signal EPOLLIN when it shouldn't, so
userspace gets -EAGAIN even though poll told it otherwise.

Patches 3/4/5/6 are an update of the last wmem accounting series.

The remaining patches fix the nonblocking io behaviour. mptcp_poll is
made stand-alone, i.e. it no longer calls __tcp_poll on the subflow
sockets and only considers mptcp_sk state.

After this series the selftest works again and the mptcp sk rtx queue is
limited by msk wmem.

The patches can't easily be rebased/merged, so I propose to squash this
myself and send a pull request when done.
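
For readers unfamiliar with the failure mode described above (poll reports
readiness, the following call still returns -EAGAIN), here is a minimal
userspace sketch of a write loop that treats a POLLOUT wakeup as a hint
only. It is not taken from mptcp_connect.c; set_nonblock_checked() and
write_all_nonblock() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode and report failure. */
static int set_nonblock_checked(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: write all of buf to a non-blocking fd.
 * A POLLOUT wakeup is only a hint, so EAGAIN after poll() is retried
 * instead of being treated as an error.
 * Returns 0 on success, -1 on error with errno set.
 */
static int write_all_nonblock(int fd, const char *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL || len == 0);

	while (off < len) {
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };
		ssize_t n;

		if (poll(&pfd, 1, -1) < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (pfd.revents & (POLLERR | POLLNVAL)) {
			errno = EIO;
			return -1;
		}

		n = write(fd, buf + off, len - off);
		if (n < 0) {
			/* spurious wakeup or interruption: poll again */
			if (errno == EAGAIN || errno == EWOULDBLOCK ||
			    errno == EINTR)
				continue;
			return -1;
		}
		off += (size_t)n;
	}
	return 0;
}
```

A loop like this keeps working when poll is over-eager, but it then spins
between poll() and write(), which is one reason a spurious EPOLLOUT is
worth fixing in the kernel rather than papering over in userspace.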
The following changes since commit d1dbb32dc58df543e89f4004c1a0b96fe8acf99b:

  subflow: wake parent mptcp socket on subflow state change (2019-11-14 13:03:44 +0000)

are available in the Git repository at:

  git://git.breakpoint.cc/fw/mptcp-next.git tcp_poll_removal_06

for you to fetch changes up to fa583a84bcf9550783ac6a229ab584d68764659e:

  sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace (2019-11-14 17:56:13 +0100)

----------------------------------------------------------------
Florian Westphal (14):
      selftest: add mmap-write support
      selftests: make sockets non-blocking for default poll mode
      mptcp: add wmem_queued accounting
      mptcp: allow partial cleaning of rtx head dfrag
      mptcp: add and use mptcp RTX flag
      sendmsg: block until mptcp sk is writeable
      subflow: sk_data_ready: make wakeup on tcp sock conditional
      mptcp: add and use mptcp_subflow_get_retrans
      mptcp: sendmsg: transmit on backup if other subflows have been closed
      recv: make DATA_READY reflect ssk in-sequence state
      sendmsg: clear SEND_SPACE if write caused wmem to grow too large
      mptcp_poll: don't consider subflow socket state anymore
      sendmsg: don't restart mptcp_sendmsg_frag
      sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace

 net/mptcp/options.c                                |   2 +-
 net/mptcp/protocol.c                               | 295 ++++++++++++++++-----
 net/mptcp/protocol.h                               |   4 +-
 net/mptcp/subflow.c                                |  12 +-
 tools/testing/selftests/net/mptcp/mptcp_connect.c  | 268 ++++++++++++++++++-
 tools/testing/selftests/net/mptcp/mptcp_connect.sh |  36 ++-
 6 files changed, 544 insertions(+), 73 deletions(-)

From patchwork Thu Nov 14 17:32:13 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1194984
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1;
helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwL1nTbz9sP6 for ; Fri, 15 Nov 2019 04:22:26 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C4F29100DC3D2; Thu, 14 Nov 2019 09:23:53 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 8C1AD100DC3D0 for ; Thu, 14 Nov 2019 09:23:51 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIp2-00061V-Uj; Thu, 14 Nov 2019 18:22:20 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:13 +0100 Message-Id: <20191114173225.21199-3-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: VBBOKSMGEFMTDPNA3MQFFD5POA7QHXQJ X-Message-ID-Hash: VBBOKSMGEFMTDPNA3MQFFD5POA7QHXQJ X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 02/14] selftests: make sockets 
non-blocking for default poll mode
List-Id: Discussions regarding MPTCP upstreaming
Archived-At:
List-Archive:
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:

This change makes the tests fail because mptcp_poll may signal POLLIN
when no data is there and POLLOUT when it should not.

The rest of the series addresses this and makes the selftest work again.

Signed-off-by: Florian Westphal
---
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index ea8e08b1f481..ab52468a4b51 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -279,6 +279,15 @@ static ssize_t do_rnd_read(const int fd, char *buf, const size_t len)
 	return read(fd, buf, cap);
 }
 
+static void set_nonblock(int fd)
+{
+	int flags = fcntl(fd, F_GETFL);
+	if (flags == -1)
+		return;
+
+	fcntl(fd, F_SETFL, flags | O_NONBLOCK);
+}
+
 static int copyfd_io_poll(int infd, int peerfd, int outfd)
 {
 	struct pollfd fds = {
@@ -288,6 +297,8 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd)
 	unsigned int woff = 0, wlen = 0;
 	char wbuf[8192];
 
+	set_nonblock(peerfd);
+
 	for (;;) {
 		char rbuf[8192];
 		ssize_t len;

From patchwork Thu Nov 14 17:32:14 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1194985
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de
Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384
(256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwQ2KjDz9s7T for ; Fri, 15 Nov 2019 04:22:30 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id DA958100DC3D2; Thu, 14 Nov 2019 09:23:57 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id AFFA5100EE8CD for ; Thu, 14 Nov 2019 09:23:55 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIp7-00061g-37; Thu, 14 Nov 2019 18:22:25 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:14 +0100 Message-Id: <20191114173225.21199-4-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: BHJGF2B2BU662IVAAOOGMM65MEJSR7PI X-Message-ID-Hash: BHJGF2B2BU662IVAAOOGMM65MEJSR7PI X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 03/14] mptcp: add wmem_queued accounting List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Peer could ack data at TCP level but refrain from sending mptcp-level ACKs. 
This could result in the mptcp socket backlog growing indefinitely.

We should thus block mptcp_sendmsg until the peer has acked some of the
sent data.

In order to be able to do so, increment the mptcp socket wmem_queued
counter on memory allocation and decrement it when releasing the memory
on mptcp-level ack reception.

Because TCP performs sndbuf auto-tuning up to tcp_wmem[2], make this the
mptcp sk_sndbuf limit.

In the future we could experiment with autotuning as TCP does in
tcp_sndbuf_expand().

Signed-off-by: Florian Westphal
---
 net/mptcp/protocol.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1a432abfb176..9ad5bd5c2437 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -139,8 +139,11 @@ static inline bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk,
 
 static void dfrag_clear(struct sock *sk, struct mptcp_data_frag *dfrag)
 {
+	int len = dfrag->data_len + dfrag->overhead;
+
 	list_del(&dfrag->list);
-	sk_mem_uncharge(sk, dfrag->data_len + dfrag->overhead);
+	sk_mem_uncharge(sk, len);
+	sk_wmem_queued_add(sk, -len);
 	put_page(dfrag->page);
 }
 
@@ -304,6 +307,9 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	if (!dfrag_collapsed) {
 		get_page(dfrag->page);
 		list_add_tail(&dfrag->list, &msk->rtx_queue);
+		sk_wmem_queued_add(sk, frag_truesize);
+	} else {
+		sk_wmem_queued_add(sk, ret);
 	}
 
 	/* charge data on mptcp rtx queue to the master socket
@@ -711,6 +717,7 @@ static int mptcp_init_sock(struct sock *sk)
 		return ret;
 
 	sk_sockets_allocated_inc(sk);
+	sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[2];
 	return 0;
 }
 

From patchwork Thu Nov 14 17:32:15 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1194986
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results:
ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwW13WCz9s7T for ; Fri, 15 Nov 2019 04:22:35 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E286B100DC3CB; Thu, 14 Nov 2019 09:24:02 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id A43BB100EE8CE for ; Thu, 14 Nov 2019 09:24:00 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpB-00061n-86; Thu, 14 Nov 2019 18:22:29 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:15 +0100 Message-Id: <20191114173225.21199-5-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: YHRNMVDTETFMTQGTJ4JYCSMILW4UTWOE X-Message-ID-Hash: YHRNMVDTETFMTQGTJ4JYCSMILW4UTWOE X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header 
X-Mailman-Version: 3.1.1
Precedence: list
Subject: [MPTCP] [RFC 04/14] mptcp: allow partial cleaning of rtx head dfrag
List-Id: Discussions regarding MPTCP upstreaming
Archived-At:
List-Archive:
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:

After adding wmem accounting for the mptcp socket we could get into a
situation where the mptcp socket can't transmit more data, and
mptcp_clean_una doesn't reduce wmem even if snd_una has advanced because
it currently will only remove entire dfrags.

Allow advancing the dfrag head sequence and reduce wmem, even though
this isn't correct (as we can't release the page).

Because we will soon block on mptcp sk in case wmem is too large, call
sk_stream_write_space() in case we reduced the backlog so a userspace
task blocked in sendmsg or poll will be woken up.

This isn't an issue if the send buffer is large, but it is when
SO_SNDBUF is used to reduce it to a lower value.

Note we can still get a deadlock for low SO_SNDBUF values in case both
sides of the connection write to the socket: both could be blocked due
to wmem being too small -- and the current mptcp stack will only
increment mptcp ack_seq on recv.

This doesn't happen with the selftest as it uses poll() and will always
call recv if there is data to read.
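
The idea of trimming a partially acked head fragment can be sketched as a
small userspace model. This is an illustration only, not the kernel code:
struct frag, seq_after64() and clean_una_model() are hypothetical names,
the per-fragment overhead is left out, and the sketch assumes the acked
byte count of the head is snd_una - data_seq, which is this example's own
reading of "advance the head sequence" rather than a transcription of the
hunk in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct mptcp_data_frag (illustrative only). */
struct frag {
	uint64_t data_seq;	/* first sequence number covered */
	uint32_t data_len;	/* payload bytes still charged to wmem */
};

/* Wrap-safe "a is after b", same idea as the kernel's after64(). */
static bool seq_after64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/*
 * Drop fully acked frags from the front of q[*head..n) and trim a
 * partially acked head. Returns true if wmem shrank, which is the
 * point where a blocked writer would be woken up.
 */
static bool clean_una_model(struct frag *q, size_t n, size_t *head,
			    uint64_t snd_una, uint64_t *wmem)
{
	bool cleaned = false;

	while (*head < n &&
	       !seq_after64(q[*head].data_seq + q[*head].data_len, snd_una)) {
		assert(*wmem >= q[*head].data_len);
		*wmem -= q[*head].data_len;
		(*head)++;
		cleaned = true;
	}

	if (*head < n && seq_after64(snd_una, q[*head].data_seq)) {
		/* assumption of this sketch: acked bytes of the head frag */
		uint64_t acked = snd_una - q[*head].data_seq;

		assert(acked < q[*head].data_len && *wmem >= acked);
		q[*head].data_seq += acked;
		q[*head].data_len -= (uint32_t)acked;
		*wmem -= acked;
		cleaned = true;
	}

	return cleaned;
}
```

With two 100-byte fragments queued and snd_una at 150, the first fragment
is dropped and the second is trimmed to cover [150, 200), so wmem falls
from 200 to 50 instead of staying at 100 until the whole fragment is acked.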

Signed-off-by: Florian Westphal
---
 net/mptcp/protocol.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9ad5bd5c2437..a09ea93896c7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -137,13 +137,18 @@ static inline bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk,
 		df->data_seq + df->data_len == msk->write_seq;
 }
 
+static void dfrag_uncharge(struct sock *sk, int len)
+{
+	sk_mem_uncharge(sk, len);
+	sk_wmem_queued_add(sk, -len);
+}
+
 static void dfrag_clear(struct sock *sk, struct mptcp_data_frag *dfrag)
 {
 	int len = dfrag->data_len + dfrag->overhead;
 
 	list_del(&dfrag->list);
-	sk_mem_uncharge(sk, len);
-	sk_wmem_queued_add(sk, -len);
+	dfrag_uncharge(sk, len);
 	put_page(dfrag->page);
 }
 
@@ -152,14 +157,31 @@ static void mptcp_clean_una(struct sock *sk)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_data_frag *dtmp, *dfrag;
 	u64 snd_una = atomic64_read(&msk->snd_una);
+	bool cleaned = false;
 
 	list_for_each_entry_safe(dfrag, dtmp, &msk->rtx_queue, list) {
 		if (after64(dfrag->data_seq + dfrag->data_len, snd_una))
 			break;
 
 		dfrag_clear(sk, dfrag);
+		cleaned = true;
+	}
+
+	dfrag = mptcp_rtx_head(sk);
+	if (dfrag && after64(snd_una, dfrag->data_seq)) {
+		u64 delta = dfrag->data_seq + dfrag->data_len - snd_una;
+
+		dfrag->data_seq += delta;
+		dfrag->data_len -= delta;
+
+		dfrag_uncharge(sk, delta);
+		cleaned = true;
+	}
+
+	if (cleaned) {
+		sk_mem_reclaim_partial(sk);
+		sk_stream_write_space(sk);
 	}
-	sk_mem_reclaim_partial(sk);
 }
 
 /* ensure we get enough memory for the frag hdr, beyond some minimal amount of

From patchwork Thu Nov 14 17:32:16 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1194987
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no
SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwY3yY9z9sP4 for ; Fri, 15 Nov 2019 04:22:37 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E98DE100DC3D2; Thu, 14 Nov 2019 09:24:04 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 05A7A100DC3D0 for ; Thu, 14 Nov 2019 09:24:04 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpF-00061u-D2; Thu, 14 Nov 2019 18:22:33 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:16 +0100 Message-Id: <20191114173225.21199-6-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: NWRHIBXYCZ6CMLBBQX4HPOVFR4JJONLK X-Message-ID-Hash: NWRHIBXYCZ6CMLBBQX4HPOVFR4JJONLK X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 
Precedence: list
Subject: [MPTCP] [RFC 05/14] mptcp: add and use mptcp RTX flag
List-Id: Discussions regarding MPTCP upstreaming
Archived-At:
List-Archive:
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:

This is needed in the (unlikely) case that userspace is blocked in
mptcp_sendmsg because wmem is exhausted. In that case, only the rtx work
queue will clean the rtx backlog, but it could take several milliseconds
until it runs next.

So, allow it to get scheduled as soon as possible so wmem can be
reclaimed if the mptcp socket sndbuf is exhausted.

Because such quick-schedule should not cause retransmits, add a flag
that indicates when the work queue has been scheduled on behalf of the
retransmit timer.

Signed-off-by: Florian Westphal
---
 net/mptcp/options.c  |  2 +-
 net/mptcp/protocol.c | 26 ++++++++++++++++++++------
 net/mptcp/protocol.h |  3 ++-
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 9a18a3670cdf..035ac67541a0 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -583,7 +583,7 @@ static void update_una(struct mptcp_sock *msk,
 		old_snd_una = atomic64_cmpxchg(&msk->snd_una, snd_una,
 					       new_snd_una);
 		if (old_snd_una == snd_una) {
-			mptcp_reset_timer((struct sock *)msk);
+			mptcp_data_acked((struct sock *)msk);
 			break;
 		}
 	}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a09ea93896c7..2144e80b8704 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -40,7 +40,7 @@ static bool mptcp_timer_pending(struct sock *sk)
 	return timer_pending(&inet_csk(sk)->icsk_retransmit_timer);
 }
 
-void mptcp_reset_timer(struct sock *sk)
+static void mptcp_reset_timer(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	unsigned long tout;
@@ -52,6 +52,15 @@ void mptcp_reset_timer(struct sock *sk)
 	sk_reset_timer(sk, &icsk->icsk_retransmit_timer, jiffies + tout);
 }
 
+void mptcp_data_acked(struct sock *sk)
+{
+	mptcp_reset_timer(sk);
+
+	if (!sk_stream_is_writeable(sk) &&
+	    schedule_work(&mptcp_sk(sk)->rtx_work))
+		sock_hold(sk);
+}
+
 static void mptcp_stop_timer(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -623,6 +632,7 @@ static void mptcp_retransmit_handler(struct sock *sk)
 	if (atomic64_read(&msk->snd_una) == msk->write_seq) {
 		mptcp_stop_timer(sk);
 	} else {
+		set_bit(MPTCP_WORK_RTX, &msk->flags);
 		if (schedule_work(&msk->rtx_work))
 			sock_hold(sk);
 	}
@@ -647,7 +657,7 @@ static void mptcp_retransmit_timer(struct timer_list *t)
 	sock_put(sk);
 }
 
-static void mptcp_retransmit(struct work_struct *work)
+static void mptcp_worker(struct work_struct *work)
 {
 	int orig_len, orig_offset, ret, mss_now = 0, size_goal = 0;
 	struct mptcp_data_frag *dfrag;
@@ -663,6 +673,10 @@ static void mptcp_retransmit(struct work_struct *work)
 
 	lock_sock(sk);
 	mptcp_clean_una(sk);
+
+	if (!test_and_clear_bit(MPTCP_WORK_RTX, &msk->flags))
+		goto unlock;
+
 	dfrag = mptcp_rtx_head(sk);
 	if (!dfrag)
 		goto unlock;
@@ -715,7 +729,7 @@ static int __mptcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&msk->conn_list);
 	INIT_LIST_HEAD(&msk->rtx_queue);
 
-	INIT_WORK(&msk->rtx_work, mptcp_retransmit);
+	INIT_WORK(&msk->rtx_work, mptcp_worker);
 
 	/* re-use the csk retrans timer for MPTCP-level retrans */
 	timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0);
@@ -755,7 +769,7 @@ static void __mptcp_clear_xmit(struct sock *sk)
 		dfrag_clear(sk, dfrag);
 }
 
-static void mptcp_cancel_rtx_work(struct sock *sk)
+static void mptcp_cancel_work(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
@@ -792,7 +806,7 @@ static void mptcp_close(struct sock *sk, long timeout)
 	__mptcp_clear_xmit(sk);
 	release_sock(sk);
 
-	mptcp_cancel_rtx_work(sk);
+	mptcp_cancel_work(sk);
 
 	sk_common_release(sk);
 }
@@ -802,7 +816,7 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	lock_sock(sk);
 	__mptcp_clear_xmit(sk);
 	release_sock(sk);
-	mptcp_cancel_rtx_work(sk);
+	mptcp_cancel_work(sk);
 	return tcp_disconnect(sk, flags);
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 63ff8bd8a098..6e23da8c5024 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -76,6 +76,7 @@
 
 /* MPTCP socket flags */
 #define MPTCP_DATA_READY	BIT(0)
+#define MPTCP_WORK_RTX		BIT(1)
 
 static inline __be32 mptcp_option(u8 subopt, u8 len, u8 nib, u8 field)
 {
@@ -290,7 +291,7 @@ void mptcp_get_options(const struct sk_buff *skb,
 
 void mptcp_finish_connect(struct sock *sk, int mp_capable);
 void mptcp_finish_join(struct sock *sk);
-void mptcp_reset_timer(struct sock *sk);
+void mptcp_data_acked(struct sock *sk);
 
 int mptcp_token_new_request(struct request_sock *req);
 void mptcp_token_destroy_request(u32 token);

From patchwork Thu Nov 14 17:32:17 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1194988
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de
Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwd1fjtz9s7T for ; Fri, 15 Nov 2019 04:22:41 +1100 (AEDT)
Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id F0FD1100DC3CF; Thu, 14 Nov 2019 09:24:08 -0800 (PST)
Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver=
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher
TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 36211100EEBB6 for ; Thu, 14 Nov 2019 09:24:08 -0800 (PST)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpJ-000622-Ho; Thu, 14 Nov 2019 18:22:37 +0100
From: Florian Westphal
To:
Cc: Florian Westphal
Date: Thu, 14 Nov 2019 18:32:17 +0100
Message-Id: <20191114173225.21199-7-fw@strlen.de>
X-Mailer: git-send-email 2.23.0
In-Reply-To: <20191114173225.21199-1-fw@strlen.de>
References: <20191114173225.21199-1-fw@strlen.de>
MIME-Version: 1.0
Message-ID-Hash: EISEJKMFRME6CM3YYX3DFNXC64BPB5M5
X-Message-ID-Hash: EISEJKMFRME6CM3YYX3DFNXC64BPB5M5
X-MailFrom: fw@breakpoint.cc
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header
X-Mailman-Version: 3.1.1
Precedence: list
Subject: [MPTCP] [RFC 06/14] sendmsg: block until mptcp sk is writeable
List-Id: Discussions regarding MPTCP upstreaming
Archived-At:
List-Archive:
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:

This disables transmit of new data until the peer has acked enough mptcp
data to get below the wspace write threshold (more than half of wspace
upperlimit is available again).

Also have poll not report EPOLLOUT in this case; it's not relevant if a
subflow is writeable.

The latter is a temporary workaround that is needed because mptcp_poll
walks the subflows and calls __tcp_poll on each of them. Because the
subflow ssk is usually writable, we have to undo that if the mptcp
sndbuf is exhausted.

This won't be needed anymore once __tcp_poll is removed, I am working on
this.
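
The two conditions used by this patch can be modelled in userspace as
below. This is a sketch under assumptions: struct sk_model and the
model_*() names are hypothetical, and the formulas follow my reading of
sk_stream_memory_free() and sk_stream_is_writeable() in include/net/sock.h
around this kernel version. The real helpers also consult per-protocol
hooks, so treat the exact thresholds as approximate.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the two struct sock fields involved. */
struct sk_model {
	int sndbuf;		/* plays the role of sk->sk_sndbuf */
	int wmem_queued;	/* plays the role of sk->sk_wmem_queued */
};

/* sendmsg may proceed: queued bytes are strictly below the limit. */
static bool model_memory_free(const struct sk_model *sk)
{
	assert(sk->sndbuf >= 0 && sk->wmem_queued >= 0);
	return sk->wmem_queued < sk->sndbuf;
}

/*
 * poll may report EPOLLOUT: the free space is at least half of what is
 * currently queued. This is stricter than model_memory_free(), which
 * gives some hysteresis between "may send" and "wake up pollers".
 */
static bool model_is_writeable(const struct sk_model *sk)
{
	int wspace = sk->sndbuf - sk->wmem_queued;
	int min_wspace = sk->wmem_queued >> 1;

	assert(sk->sndbuf >= 0 && sk->wmem_queued >= 0);
	return wspace >= min_wspace;
}
```

In this model a 4096-byte sndbuf with 3000 bytes queued lets sendmsg
proceed but does not yet count as writeable for poll; that only happens
once the queue has drained further, for example to 2048 bytes.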

Signed-off-by: Florian Westphal
---
 net/mptcp/protocol.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2144e80b8704..83be407e1dd6 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -406,6 +406,18 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		return ret;
 	}
 
+	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
+
+	mptcp_clean_una(sk);
+
+	while (!sk_stream_memory_free(sk)) {
+		ret = sk_stream_wait_memory(sk, &timeo);
+		if (ret)
+			goto out;
+
+		mptcp_clean_una(sk);
+	}
+
 	ssk = mptcp_subflow_get(msk);
 	if (!ssk) {
 		release_sock(sk);
@@ -421,8 +433,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	pr_debug("conn_list->subflow=%p", ssk);
 
 	lock_sock(ssk);
-	mptcp_clean_una(sk);
-	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 	while (msg_data_left(msg)) {
 		ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now,
 					 &size_goal);
@@ -1312,6 +1322,10 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock,
 		tcp_sock = mptcp_subflow_tcp_socket(subflow);
 		ret |= __tcp_poll(tcp_sock->sk);
 	}
+
+	if (!sk_stream_is_writeable(sk))
+		ret &= ~(EPOLLOUT|EPOLLWRNORM);
+
 	release_sock(sk);
 
 	return ret;

From patchwork Thu Nov 14 17:32:18 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1194989
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de
Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange
X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwl21Rmz9sP6 for ; Fri, 15 Nov 2019 04:22:47 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 044AA100EEBB6; Thu, 14 Nov 2019 09:24:14 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 51B36100EE8CD for ; Thu, 14 Nov 2019 09:24:12 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpN-00062A-MV; Thu, 14 Nov 2019 18:22:41 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:18 +0100 Message-Id: <20191114173225.21199-8-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: ZFLOZ5JYI46WMRJ27YTDUHI5TFF6J445 X-Message-ID-Hash: ZFLOZ5JYI46WMRJ27YTDUHI5TFF6J445 X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 07/14] subflow: sk_data_ready: make wakeup on tcp sock conditional List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: No need for this unless we don't have a parent or the socket is not mp capable. 
Only the mptcp socket will be waiting for events. In case the mptcp socket connected to a tcp-only peer, we're in fallback mode and need to wakeup the parent too. Signed-off-by: Florian Westphal --- net/mptcp/subflow.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index ff38d54392cd..976e49349276 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -646,10 +646,13 @@ static void subflow_data_ready(struct sock *sk) struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); struct sock *parent = subflow->conn; - subflow->tcp_sk_data_ready(sk); + if (!parent || !(subflow->mp_capable || subflow->mp_join)) { + subflow->tcp_sk_data_ready(sk); - if (!parent || !(subflow->mp_capable || subflow->mp_join)) + if (parent) + parent->sk_data_ready(parent); return; + } /* always propagate the EoF */ if (mptcp_subflow_data_available(sk) || subflow->rx_eof) { From patchwork Thu Nov 14 17:32:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1194990 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwq2QXHz9sP6 for ; Fri, 15 Nov 2019 04:22:51 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 0CA70100DC3CB; Thu, 14 Nov 2019 09:24:19 -0800 (PST) Received-SPF: Pass 
(mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 7EAA7100DC3CF for ; Thu, 14 Nov 2019 09:24:16 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpR-00062M-Ry; Thu, 14 Nov 2019 18:22:45 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:19 +0100 Message-Id: <20191114173225.21199-9-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: WOG456LZDUUJM67NPUKZHW5AICIBFU63 X-Message-ID-Hash: WOG456LZDUUJM67NPUKZHW5AICIBFU63 X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 08/14] mptcp: add and use mptcp_subflow_get_retrans List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Instead of having retransmit worker grab first subflow on the list, make it always return NULL, unless either the first non-backup subflow on the list is idle or all normal subflows have already been removed. In the latter case, the first idle backup subflow is used. Rationale is that it makes no sense to attempt to retransmit at mptcp level if there is still unsent data in its write queue. V2: always return NULL when first non-idle ssk is seen. 
Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 83be407e1dd6..c5cf19a4b9f0 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -667,6 +667,37 @@ static void mptcp_retransmit_timer(struct timer_list *t) sock_put(sk); } +/* Find an idle subflow. Return NULL if there is unacked data at tcp + * level. + * + * A backup subflow is returned only if that's the only kind available. + */ +static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + struct sock *backup = NULL; + + sock_owned_by_me((const struct sock *)msk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_socket(subflow)->sk; + + /* still data outstanding at TCP level? Don't retransmit. */ + if (!tcp_write_queue_empty(ssk)) + return NULL; + + if (subflow->backup) { + if (!backup) + backup = ssk; + continue; + } + + return ssk; + } + + return backup; +} + static void mptcp_worker(struct work_struct *work) { int orig_len, orig_offset, ret, mss_now = 0, size_goal = 0; @@ -691,7 +722,7 @@ static void mptcp_worker(struct work_struct *work) if (!dfrag) goto unlock; - ssk = mptcp_subflow_get(msk); + ssk = mptcp_subflow_get_retrans(msk); if (!ssk) goto reset_unlock; From patchwork Thu Nov 14 17:32:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1194991 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from
ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwv2prvz9sP6 for ; Fri, 15 Nov 2019 04:22:55 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 148D0100DC3CF; Thu, 14 Nov 2019 09:24:23 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id A2036100DC3CF for ; Thu, 14 Nov 2019 09:24:20 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpW-00062T-0g; Thu, 14 Nov 2019 18:22:50 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:20 +0100 Message-Id: <20191114173225.21199-10-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: KCLW6RPCJUC5MTG7P7GRNXZQTT3KL5JW X-Message-ID-Hash: KCLW6RPCJUC5MTG7P7GRNXZQTT3KL5JW X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 09/14] mptcp: sendmsg: transmit on backup if other subflows have been closed List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: 
Currently we always pick the first ssk on the list and then have mptcp_sendmsg_frag wait until more space becomes available in case that ssk has no write space available. Instead check the first subflow on the list. If no more write space is available, then we need to either return -EAGAIN to userspace (nonblock case), or we need to wait until a subflow becomes available. This is done by blocking the current thread via sk_stream_wait_memory() and then make the subflow sk_write_space() unblock the parent mptcp socket. We can't acquire the mptcp socket lock from the subflow callbacks, but we can use the mptcp_sk->flags -- MPTCP_SEND_SPACE flag is added for this purpose. If it gets set, then at least one subflow has become available for writing. v2: dumb-down the selection: just pick the first ssk on the list and make mptcp socket block if it has no wspace. Backup is only used if no non-backup subflow exists. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 69 +++++++++++++++++++++++++++++++++++++++----- net/mptcp/protocol.h | 1 + net/mptcp/subflow.c | 5 +++- 3 files changed, 67 insertions(+), 8 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index c5cf19a4b9f0..6fb178067a4a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -384,6 +384,43 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, return ret; } +static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + struct sock *backup = NULL; + + sock_owned_by_me((const struct sock *)msk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_socket(subflow)->sk; + + if (!sk_stream_memory_free(ssk)) { + struct socket *sock = ssk->sk_socket; + + if (sock) { + clear_bit(MPTCP_SEND_SPACE, &msk->flags); + smp_mb__after_atomic(); + + /* enables sk->write_space() callbacks */ + set_bit(SOCK_NOSPACE, &sock->flags); + } + + return NULL; + } + + if (subflow->backup) { + if (!backup) + 
backup = ssk; + + continue; + } + + return ssk; + } + + return backup; +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { int mss_now = 0, size_goal = 0, ret = 0; @@ -418,18 +455,28 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) mptcp_clean_una(sk); } - ssk = mptcp_subflow_get(msk); - if (!ssk) { - release_sock(sk); - return -ENOTCONN; - } - - if (!msg_data_left(msg)) { + if (unlikely(!msg_data_left(msg))) { + ssk = mptcp_subflow_get(msk); pr_debug("empty send"); ret = sock_sendmsg(ssk->sk_socket, msg); goto out; } + ssk = mptcp_subflow_get_send(msk); + while (!ssk) { + ret = sk_stream_wait_memory(sk, &timeo); + if (ret) + goto out; + + mptcp_clean_una(sk); + + ssk = mptcp_subflow_get_send(msk); + if (list_empty(&msk->conn_list)) { + ret = -ENOTCONN; + goto out; + } + } + pr_debug("conn_list->subflow=%p", ssk); lock_sock(ssk); @@ -1123,6 +1170,13 @@ bool mptcp_sk_is_subflow(const struct sock *sk) return subflow->mp_join == 1; } +static bool mptcp_memory_free(const struct sock *sk, int wake) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + + return wake ? 
test_bit(MPTCP_SEND_SPACE, &msk->flags) : true; +} + static struct proto mptcp_prot = { .name = "MPTCP", .owner = THIS_MODULE, @@ -1143,6 +1197,7 @@ static struct proto mptcp_prot = { .sockets_allocated = &mptcp_sockets_allocated, .memory_allocated = &tcp_memory_allocated, .memory_pressure = &tcp_memory_pressure, + .stream_memory_free = mptcp_memory_free, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_tcp_wmem), .sysctl_mem = sysctl_tcp_mem, .obj_size = sizeof(struct mptcp_sock), diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 6e23da8c5024..ce5c5de6a5eb 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -77,6 +77,7 @@ /* MPTCP socket flags */ #define MPTCP_DATA_READY BIT(0) #define MPTCP_WORK_RTX BIT(1) +#define MPTCP_SEND_SPACE BIT(2) static inline __be32 mptcp_option(u8 subopt, u8 len, u8 nib, u8 field) { diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 976e49349276..32082c6e8552 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -670,8 +670,11 @@ static void subflow_write_space(struct sock *sk) struct sock *parent = subflow->conn; sk_stream_write_space(sk); - if (parent) + if (parent && sk_stream_is_writeable(sk)) { + set_bit(MPTCP_SEND_SPACE, &mptcp_sk(parent)->flags); + smp_mb__after_atomic(); sk_stream_write_space(parent); + } } int mptcp_subflow_connect(struct sock *sk, struct sockaddr *local, From patchwork Thu Nov 14 17:32:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1194992 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org 
[198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSwz6wj1z9sP4 for ; Fri, 15 Nov 2019 04:22:59 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 283E1100DC3D2; Thu, 14 Nov 2019 09:24:27 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id C27C9100EE8CD for ; Thu, 14 Nov 2019 09:24:24 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpa-00062a-5S; Thu, 14 Nov 2019 18:22:54 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:21 +0100 Message-Id: <20191114173225.21199-11-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: 2TYBNL54CG3F542IJS6TJ4NVPQHVGSY3 X-Message-ID-Hash: 2TYBNL54CG3F542IJS6TJ4NVPQHVGSY3 X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 10/14] recv: make DATA_READY reflect ssk in-sequence state List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: In order to make mptcp_poll independent of the 
subflows, we need to keep the mptcp DATA_READY flag in sync, i.e., if it is set, at least one ssk has in-sequence data. If it is cleared, no further data is available. Avoid the unconditional clearing on recv entry. Instead make sure the flag is cleared on exit if there is no more in-sequence data available. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 51 +++++++++++++++++++++++++++++--------------- 1 file changed, 34 insertions(+), 17 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 6fb178067a4a..b8f936c78ed3 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -556,8 +556,10 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, { struct mptcp_sock *msk = mptcp_sk(sk); struct mptcp_subflow_context *subflow; + bool more_data_avail = false; struct mptcp_read_arg arg; read_descriptor_t desc; + bool wait_data = false; struct socket *ssock; struct tcp_sock *tp; bool done = false; @@ -590,10 +592,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, u32 map_remaining; int bytes_read; - smp_mb__before_atomic(); - clear_bit(MPTCP_DATA_READY, &msk->flags); - smp_mb__after_atomic(); - ssk = mptcp_subflow_recv_lookup(msk); pr_debug("msk=%p ssk=%p", msk, ssk); if (!ssk) @@ -603,7 +601,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, tp = tcp_sk(ssk); lock_sock(ssk); - while (mptcp_subflow_data_available(ssk) && !done) { + do { /* try to read as much data as available */ map_remaining = subflow->map_data_len - mptcp_subflow_get_map_offset(subflow); @@ -614,8 +612,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, if (bytes_read < 0) { if (!copied) copied = bytes_read; - done = true; - continue; + goto next; } pr_debug("msk ack_seq=%llx -> %llx", msk->ack_seq, @@ -624,23 +621,27 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, copied += bytes_read; if (copied >= len) { done = true; - continue; + 
goto next; } if (tp->urg_data && tp->urg_seq == tp->copied_seq) { pr_err("Urgent data present, cannot proceed"); done = true; - continue; + goto next; } - } +next: + more_data_avail = mptcp_subflow_data_available(ssk); + } while (more_data_avail && !done); release_sock(ssk); continue; wait_for_data: + more_data_avail = false; + /* only the master socket status is relevant here. The exit * conditions mirror closely tcp_recvmsg() */ if (copied >= target) - break; + goto out; if (copied) { if (sk->sk_err || @@ -648,36 +649,52 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, (sk->sk_shutdown & RCV_SHUTDOWN) || !timeo || signal_pending(current)) - break; + goto out; } else { if (sk->sk_err) { copied = sock_error(sk); - break; + goto out; } if (sk->sk_shutdown & RCV_SHUTDOWN) - break; + goto out; if (sk->sk_state == TCP_CLOSE) { copied = -ENOTCONN; - break; + goto out; } if (!timeo) { copied = -EAGAIN; - break; + goto out; } if (signal_pending(current)) { copied = sock_intr_errno(timeo); - break; + goto out; } } pr_debug("block timeout %ld", timeo); + wait_data = true; mptcp_wait_data(sk, &timeo); } +out: + if (more_data_avail) { + if (!test_bit(MPTCP_DATA_READY, &msk->flags)) + set_bit(MPTCP_DATA_READY, &msk->flags); + } else if (!wait_data) { + clear_bit(MPTCP_DATA_READY, &msk->flags); + + /* .. race-breaker: ssk might get new data after last + * data_available() returns false. 
+ */ + ssk = mptcp_subflow_recv_lookup(msk); + if (unlikely(ssk)) + set_bit(MPTCP_DATA_READY, &msk->flags); + } + release_sock(sk); return copied; } From patchwork Thu Nov 14 17:32:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1194993 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSx33m9Rz9s7T for ; Fri, 15 Nov 2019 04:23:03 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 2FAB2100DC3D0; Thu, 14 Nov 2019 09:24:31 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id DED17100EEBB6 for ; Thu, 14 Nov 2019 09:24:28 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpe-00062i-AM; Thu, 14 Nov 2019 18:22:58 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:22 +0100 Message-Id: <20191114173225.21199-12-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: 
<20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: C65WTBLFO2HXWU2JJSUUCPONQQLAUXMW X-Message-ID-Hash: C65WTBLFO2HXWU2JJSUUCPONQQLAUXMW X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 11/14] sendmsg: clear SEND_SPACE if write caused wmem to grow too large List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This is needed when we get rid of __tcp_poll and let mptcp_poll only rely on msk->flags -- we need to avoid EPOLLOUT if the write filled too much sndbuf space. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index b8f936c78ed3..201e256a3221 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -421,6 +421,22 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) return backup; } +static void ssk_check_wmem(struct mptcp_sock *msk, struct sock *ssk) +{ + struct socket *sock; + + if (likely(sk_stream_is_writeable(ssk))) + return; + + sock = READ_ONCE(ssk->sk_socket); + + if (sock) { + clear_bit(MPTCP_SEND_SPACE, &msk->flags); + smp_mb__after_atomic(); + set_bit(SOCK_NOSPACE, &sock->flags); + } +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { int mss_now = 0, size_goal = 0, ret = 0; @@ -500,6 +516,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) mptcp_reset_timer(sk); } + ssk_check_wmem(msk, ssk); release_sock(ssk); out: From patchwork Thu Nov 14 17:32:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1194994 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSx83CdLz9s7T for ; Fri, 15 Nov 2019 04:23:07 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 36F62100DC3CF; Thu, 14 Nov 2019 09:24:35 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 13CE1100EE8CD for ; Thu, 14 Nov 2019 09:24:33 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpi-00062p-F4; Thu, 14 Nov 2019 18:23:02 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:23 +0100 Message-Id: <20191114173225.21199-13-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: AMQGEGDUMI5QRGWA4NMUJQXGBGN6TCNV X-Message-ID-Hash: AMQGEGDUMI5QRGWA4NMUJQXGBGN6TCNV X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: 
dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 12/14] mptcp_poll: don't consider subflow socket state anymore List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: After previous patch, we do not need to call __tcp_poll anymore, we can use msk->flags instead to see if we have data available on a socket. SEND_SPACE flag indicates when a subflow has enough space to accept more data, it gets cleared on mptcp_sendmsg() return in case ssk runs below the free watermark. "net: tcp: add __tcp_poll helper" can now be removed. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 201e256a3221..7d3bf189b407 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -189,7 +189,10 @@ static void mptcp_clean_una(struct sock *sk) if (cleaned) { sk_mem_reclaim_partial(sk); - sk_stream_write_space(sk); + + /* Only wake up writers if a subflow is ready */ + if (test_bit(MPTCP_SEND_SPACE, &msk->flags)) + sk_stream_write_space(sk); } } @@ -852,6 +855,7 @@ static int __mptcp_init_sock(struct sock *sk) INIT_LIST_HEAD(&msk->rtx_queue); INIT_WORK(&msk->rtx_work, mptcp_worker); + __set_bit(MPTCP_SEND_SPACE, &msk->flags); /* re-use the csk retrans timer for MPTCP-level retrans */ timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0); @@ -1416,39 +1420,35 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, static __poll_t mptcp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait) { - struct mptcp_subflow_context *subflow; const struct mptcp_sock *msk; struct sock 
*sk = sock->sk; struct socket *ssock; - __poll_t ret = 0; + __poll_t mask = 0; msk = mptcp_sk(sk); lock_sock(sk); ssock = __mptcp_fallback_get_ref(msk); if (ssock) { release_sock(sk); - ret = ssock->ops->poll(file, ssock, wait); + mask = ssock->ops->poll(file, ssock, wait); sock_put(ssock->sk); - return ret; + return mask; } release_sock(sk); sock_poll_wait(file, sock, wait); lock_sock(sk); - mptcp_for_each_subflow(msk, subflow) { - struct socket *tcp_sock; - - tcp_sock = mptcp_subflow_tcp_socket(subflow); - ret |= __tcp_poll(tcp_sock->sk); - } - - if (!sk_stream_is_writeable(sk)) - ret &= ~(EPOLLOUT|EPOLLWRNORM); + if (test_bit(MPTCP_DATA_READY, &msk->flags)) + mask = EPOLLIN | EPOLLRDNORM; + if (sk_stream_is_writeable(sk) && test_bit(MPTCP_SEND_SPACE, &msk->flags)) + mask |= EPOLLOUT | EPOLLWRNORM; + if (sk->sk_shutdown & RCV_SHUTDOWN) + mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; release_sock(sk); - return ret; + return mask; } static int mptcp_shutdown(struct socket *sock, int how) From patchwork Thu Nov 14 17:32:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1194995 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSxD4LzQz9s7T for ; Fri, 15 Nov 2019 04:23:12 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 
3EAE9100DC3CF; Thu, 14 Nov 2019 09:24:40 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 39FEE100EEBB6 for ; Thu, 14 Nov 2019 09:24:37 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpm-00062w-Jp; Thu, 14 Nov 2019 18:23:06 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Thu, 14 Nov 2019 18:32:24 +0100 Message-Id: <20191114173225.21199-14-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191114173225.21199-1-fw@strlen.de> References: <20191114173225.21199-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: RY3YAZSZRSNF2CNENJTTHT2M3M6PPKIN X-Message-ID-Hash: RY3YAZSZRSNF2CNENJTTHT2M3M6PPKIN X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [RFC 13/14] sendmsg: don't restart mptcp_sendmsg_frag List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This function calls do_tcp_sendpages which already has such a loop. When tcp sendbuffer runs out of space and non-blocking io is used, do_tcp_sendpages will return early because it can't sleep. No -EAGAIN is returned, as some data was sent. When mptcp_sendmsg_frag is called again, next call will either return -EAGAIN immediately or it will only send a few more bytes. Simplify this and leave all 'allocate another skb?' logic to tcp. 
This would need to be spread over multiple changes, so I'd propose I do the squash myself and send a pull request for the updated branch, if that's fine with you. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 25 ++++++++----------------- 1 file changed, 8 insertions(+), 17 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 7d3bf189b407..fbbff667e07a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -499,14 +499,10 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) pr_debug("conn_list->subflow=%p", ssk); lock_sock(ssk); - while (msg_data_left(msg)) { - ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now, - &size_goal); - if (ret < 0) - break; - - copied += ret; - } + ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now, + &size_goal); + if (ret > 0) + copied = ret; mptcp_set_timeout(sk, ssk); if (copied) { @@ -789,7 +785,6 @@ static void mptcp_worker(struct work_struct *work) struct sock *ssk, *sk; struct mptcp_sock *msk; u64 orig_write_seq; - size_t copied = 0; struct msghdr msg; long timeo = 0; @@ -816,20 +811,16 @@ static void mptcp_worker(struct work_struct *work) orig_len = dfrag->data_len; orig_offset = dfrag->offset; orig_write_seq = dfrag->data_seq; - while (dfrag->data_len > 0) { - ret = mptcp_sendmsg_frag(sk, ssk, &msg, dfrag, &timeo, &mss_now, - &size_goal); - if (ret < 0) - break; + ret = mptcp_sendmsg_frag(sk, ssk, &msg, dfrag, &timeo, &mss_now, + &size_goal); + if (ret > 0) { MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RETRANSSEGS); - copied += ret; dfrag->data_len -= ret; dfrag->offset += ret; - } - if (copied) tcp_push(ssk, msg.msg_flags, mss_now, tcp_sk(ssk)->nonagle, size_goal); + } dfrag->data_seq = orig_write_seq; dfrag->offset = orig_offset; From patchwork Thu Nov 14 17:32:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1194996 Return-Path: 
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=)
Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de
Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47DSxJ0yCNz9s7T for ; Fri, 15 Nov 2019 04:23:16 +1100 (AEDT)
Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 4665C100DC3D0; Thu, 14 Nov 2019 09:24:43 -0800 (PST)
Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver=
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 60C2E100DC3D0 for ; Thu, 14 Nov 2019 09:24:41 -0800 (PST)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iVIpq-000633-Os; Thu, 14 Nov 2019 18:23:10 +0100
From: Florian Westphal
To:
Cc: Florian Westphal
Date: Thu, 14 Nov 2019 18:32:25 +0100
Message-Id: <20191114173225.21199-15-fw@strlen.de>
X-Mailer: git-send-email 2.23.0
In-Reply-To: <20191114173225.21199-1-fw@strlen.de>
References: <20191114173225.21199-1-fw@strlen.de>
MIME-Version: 1.0
Message-ID-Hash: CDD3JKTJGFVDVM7L363LIGPP6Y7UIUKN
X-Message-ID-Hash: CDD3JKTJGFVDVM7L363LIGPP6Y7UIUKN
X-MailFrom: fw@breakpoint.cc
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation;
 administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header
X-Mailman-Version: 3.1.1
Precedence: list
Subject: [MPTCP] [RFC 14/14] sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace
List-Id: Discussions regarding MPTCP upstreaming
Archived-At:
List-Archive:
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:

If userspace uses SO_SNDBUF to set a value, then truncate the buffer
according to the remaining wmem.

Otherwise, running the selftest script with "-m mmap -b 4096" shows very
large successful write() calls, as we're only limited by how much data
the underlying tcp flow is willing to accept.

NB: It might make sense to carry SO_SNDBUF down to the subflows, but
that's left out for now.

Signed-off-by: Florian Westphal
---
 net/mptcp/protocol.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index fbbff667e07a..eb3499ca4f36 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -498,6 +498,15 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	pr_debug("conn_list->subflow=%p", ssk);
 
+	if (unlikely(sk->sk_userlocks & SOCK_SNDBUF_LOCK)) {
+		int limit = sk_stream_wspace(sk);
+
+		if (WARN_ON_ONCE(limit <= 0))
+			limit = 1;
+
+		iov_iter_truncate(&msg->msg_iter, limit);
+	}
+
 	lock_sock(ssk);
 	ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now,
 				 &size_goal);
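
For readers who want to check the arithmetic of the truncation in the
hunk above, here is a minimal userspace sketch. It assumes that write
space is the send buffer size minus the bytes already queued, and it
mirrors the "limit <= 0 becomes 1" fallback. The toy_* struct and
functions are hypothetical and only model the logic; they are not kernel
APIs.

```c
#include <stddef.h>

/* Hypothetical model of the mptcp socket fields involved; not a kernel type. */
struct toy_msk {
	int sndbuf;		/* configured send buffer size in bytes */
	int wmem_queued;	/* bytes already queued for sending */
	int sndbuf_locked;	/* nonzero if userspace set SO_SNDBUF */
};

/* Assumed definition of write space: buffer size minus queued bytes. */
static int toy_wspace(const struct toy_msk *msk)
{
	return msk->sndbuf - msk->wmem_queued;
}

/*
 * Returns how many of the len bytes a single sendmsg call would be
 * allowed to take: untouched unless userspace locked the buffer size,
 * otherwise capped at the remaining write space (at least 1 byte).
 */
static size_t toy_truncate_len(const struct toy_msk *msk, size_t len)
{
	int limit;

	if (!msk->sndbuf_locked)
		return len;

	limit = toy_wspace(msk);
	if (limit <= 0)
		limit = 1;

	return len > (size_t)limit ? (size_t)limit : len;
}
```

With sndbuf = 4096 and 1024 bytes already queued, a 256 KiB write is cut
down to 3072 bytes in this model; without the lock flag it passes through
unchanged.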