From patchwork Mon Nov 18 21:45:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196989 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2ZW0vjxz9sP4 for ; Tue, 19 Nov 2019 08:45:55 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 79C32100DC431; Mon, 18 Nov 2019 13:46:49 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 74772100DC42C for ; Mon, 18 Nov 2019 13:46:47 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqE-0008HK-22; Mon, 18 Nov 2019 22:45:50 +0100 From: Florian Westphal To: Date: Mon, 18 Nov 2019 22:45:25 +0100 Message-Id: <20191118214538.21931-1-fw@strlen.de> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 Message-ID-Hash: WB37DKMOZQRIOKC7YHYDCRZV3SYP77AL X-Message-ID-Hash: WB37DKMOZQRIOKC7YHYDCRZV3SYP77AL X-MailFrom: 
fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2] mptcp: wmem accounting and nonblocking io support List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This is the second iteration of the mptcp poll rework. Most patches have no changes, the ones with changes have these highlighted in the changelog (6/9/10). This addresses comments from Paolo. I've also axed the 'empty send' part as per comments from Paolo and Mat. There were doubts about the repeated invocations of mptcp_sendmsg_frag() (RFC patch 13/14, "sendmsg: don't restart mptcp_sendmsg_frag"), so I have dropped that patch from this iteration -- mptcp_sendmsg_frag needs rework which isn't related to this patch series. Old cover letter: The first patch extends the test suite with a mmap-based mode to check large, blocking writes. This uncovered a minor problem with the earlier v2 wmem accounting patch series -- we would happily take a lot more data than sndbuf allowed, as we only limited based on what the subflow could accept. So with a 4k sndbuf we could easily accept 256kb or even more. This patch doesn't change the default test suite behaviour, however; you need to use "-b 4096" and/or "-m mmap" to enable this mode. The second patch moves the test suite to nonblocking io; this breaks mptcp because mptcp_poll can signal EPOLLIN when it shouldn't, so userspace gets -EAGAIN even though poll told it otherwise. Patches 3/4/5/6 are an update vs. last wmem accounting series. Remaining patches fix the nonblocking io behaviour. mptcp_poll is made to be stand-alone, i.e. it no longer calls __tcp_poll on the subflow sockets and only considers mptcp_sk state. 
After this series the selftest works again and mptcp sk rtx queue is limited by msk wmem. The patches can't easily be rebased/merged so I propose that I would squash this myself and send a pull request when done. The following changes since commit c925affd3fa9dd18e6c326cff510bd30087f0ac8: subflow: wake parent mptcp socket on subflow state change (2019-11-18 02:44:11 +0000) are available in the Git repository at: git://git.breakpoint.cc/fw/mptcp-next.git tcp_poll_removal_08 for you to fetch changes up to 2bbcfcd464c6ff1626800cc2e7c02085e02f35e8: sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace (2019-11-18 17:45:44 +0100) ---------------------------------------------------------------- Florian Westphal (13): selftest: add mmap-write support selftests: make sockets non-blocking for default poll mode mptcp: add wmem_queued accounting mptcp: allow partial cleaning of rtx head dfrag mptcp: add and use mptcp RTX flag sendmsg: block until mptcp sk is writeable subflow: sk_data_ready: make wakeup on tcp sock conditional mptcp: add and use mptcp_subflow_get_retrans mptcp: sendmsg: transmit on backup if other subflows have been closed recv: make DATA_READY reflect ssk in-sequence state sendmsg: clear SEND_SPACE if write caused wmem to grow too large mptcp_poll: don't consider subflow socket state anymore sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace net/mptcp/options.c | 2 +- net/mptcp/protocol.c | 249 ++++++++++++++++--- net/mptcp/protocol.h | 4 +- net/mptcp/subflow.c | 12 +- tools/testing/selftests/net/mptcp/mptcp_connect.c | 268 ++++++++++++++++++++- tools/testing/selftests/net/mptcp/mptcp_connect.sh | 36 ++- 6 files changed, 518 insertions(+), 53 deletions(-) From patchwork Mon Nov 18 21:45:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196991 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: 
X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2Zb6pfpz9sPT for ; Tue, 19 Nov 2019 08:45:59 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 9A005100DC2A8; Mon, 18 Nov 2019 13:46:53 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id E0BDA100DC431 for ; Mon, 18 Nov 2019 13:46:51 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqI-0008HY-Ff; Mon, 18 Nov 2019 22:45:54 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:27 +0100 Message-Id: <20191118214538.21931-3-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: 7XTBCJJN5MEYMQJWTQ7VQQ7EXIJSIXWU X-Message-ID-Hash: 7XTBCJJN5MEYMQJWTQ7VQQ7EXIJSIXWU X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; 
nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 02/13] selftests: make sockets non-blocking for default poll mode List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This change makes tests fail because mptcp_poll may signal POLLIN when no data is there and POLLOUT when it should not. Rest of series addresses this and makes selftest work again. Signed-off-by: Florian Westphal --- tools/testing/selftests/net/mptcp/mptcp_connect.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c index ea8e08b1f481..ab52468a4b51 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -279,6 +279,15 @@ static ssize_t do_rnd_read(const int fd, char *buf, const size_t len) return read(fd, buf, cap); } +static void set_nonblock(int fd) +{ + int flags = fcntl(fd, F_GETFL); + if (flags == -1) + return; + + fcntl(fd, F_SETFL, flags | O_NONBLOCK); +} + static int copyfd_io_poll(int infd, int peerfd, int outfd) { struct pollfd fds = { @@ -288,6 +297,8 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd) unsigned int woff = 0, wlen = 0; char wbuf[8192]; + set_nonblock(peerfd); + for (;;) { char rbuf[8192]; ssize_t len; From patchwork Mon Nov 18 21:45:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196992 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; 
helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2Zg2gCfz9sP4 for ; Tue, 19 Nov 2019 08:46:03 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id B02BC100DC2A5; Mon, 18 Nov 2019 13:46:57 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 0FB7D100DC2A5 for ; Mon, 18 Nov 2019 13:46:56 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqM-0008Hg-MT; Mon, 18 Nov 2019 22:45:58 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:28 +0100 Message-Id: <20191118214538.21931-4-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: I3YQGYNWJWP2BGCEPVE422Z6IGOTD6RT X-Message-ID-Hash: I3YQGYNWJWP2BGCEPVE422Z6IGOTD6RT X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 03/13] mptcp: add wmem_queued accounting 
List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: A peer could ack data at TCP level but refrain from sending mptcp-level ACKs. This could result in the mptcp socket backlog growing indefinitely. We should thus block mptcp_sendmsg until the peer has acked some of the sent data. In order to be able to do so, increment the mptcp socket wmem_queued counter on memory allocation and decrement it when releasing the memory on mptcp-level ack reception. Because TCP performs sndbuf auto-tuning up to tcp_wmem[2], make this the mptcp sk_sndbuf limit. In the future we could experiment with autotuning as TCP does in tcp_sndbuf_expand(). Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 1a432abfb176..9ad5bd5c2437 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -139,8 +139,11 @@ static inline bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk, static void dfrag_clear(struct sock *sk, struct mptcp_data_frag *dfrag) { + int len = dfrag->data_len + dfrag->overhead; + list_del(&dfrag->list); - sk_mem_uncharge(sk, dfrag->data_len + dfrag->overhead); + sk_mem_uncharge(sk, len); + sk_wmem_queued_add(sk, -len); put_page(dfrag->page); } @@ -304,6 +307,9 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, if (!dfrag_collapsed) { get_page(dfrag->page); list_add_tail(&dfrag->list, &msk->rtx_queue); + sk_wmem_queued_add(sk, frag_truesize); + } else { + sk_wmem_queued_add(sk, ret); } /* charge data on mptcp rtx queue to the master socket @@ -711,6 +717,7 @@ static int mptcp_init_sock(struct sock *sk) return ret; sk_sockets_allocated_inc(sk); + sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[2]; return 0; } From patchwork Mon Nov 18 21:45:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 
7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196993 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2Zm07Tsz9sP4 for ; Tue, 19 Nov 2019 08:46:07 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id BA995100DC3FD; Mon, 18 Nov 2019 13:47:01 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 32E7D100DC431 for ; Mon, 18 Nov 2019 13:47:00 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqQ-0008Hx-TT; Mon, 18 Nov 2019 22:46:02 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:29 +0100 Message-Id: <20191118214538.21931-5-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: HWIRXAMFW4QZHQPCU6CQAP3ED33SOJLU X-Message-ID-Hash: HWIRXAMFW4QZHQPCU6CQAP3ED33SOJLU X-MailFrom: fw@breakpoint.cc 
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 04/13] mptcp: allow partial cleaning of rtx head dfrag List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: After adding wmem accounting for the mptcp socket we could get into a situation where the mptcp socket can't transmit more data, and mptcp_clean_una doesn't reduce wmem even if snd_una has advanced because it currently will only remove entire dfrags. Allow advancing the dfrag head sequence and reduce wmem, even though this isn't correct (as we can't release the page). Because we will soon block on mptcp sk in case wmem is too large, call sk_stream_write_space() in case we reduced the backlog so a userspace task blocked in sendmsg or poll will be woken up. This isn't an issue if the send buffer is large, but it is when SO_SNDBUF is used to reduce it to a lower value. Note we can still get a deadlock for low SO_SNDBUF values in case both sides of the connection write to the socket: both could be blocked due to wmem being too small -- and current mptcp stack will only increment mptcp ack_seq on recv. This doesn't happen with the selftest as it uses poll() and will always call recv if there is data to read. 
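For illustration only, the partial-cleaning arithmetic can be modelled in plain userspace C. The names dfrag_model and dfrag_trim_acked are hypothetical and not part of the patch. The sketch assumes that the amount to release is the acknowledged prefix of the head fragment, i.e. snd_una - data_seq, which is an assumption about the intent rather than a transcription of the diff; it also ignores sequence wraparound and page handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the two fields of the rtx head
 * fragment that matter here; this is not the kernel structure.
 */
struct dfrag_model {
	uint64_t data_seq;	/* sequence number of the first queued byte */
	uint64_t data_len;	/* bytes still accounted to the socket */
};

/* Drop the acknowledged prefix of the head fragment and return the number
 * of bytes that could be uncharged from wmem.
 * Assumption: the released amount is snd_una - data_seq, capped at data_len.
 */
static uint64_t dfrag_trim_acked(struct dfrag_model *d, uint64_t snd_una)
{
	uint64_t delta;

	assert(d);

	/* Nothing in this fragment has been acknowledged yet. */
	if (snd_una <= d->data_seq)
		return 0;

	delta = snd_una - d->data_seq;
	if (delta > d->data_len)
		delta = d->data_len;

	d->data_seq += delta;
	d->data_len -= delta;
	return delta;
}
```

A fragment covering 1000..1399 with snd_una at 1100 would shrink to 1100..1399 under this model and release 100 bytes, which is what allows a blocked writer to make progress before the whole fragment is acked.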
Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 9ad5bd5c2437..a09ea93896c7 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -137,13 +137,18 @@ static inline bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk, df->data_seq + df->data_len == msk->write_seq; } +static void dfrag_uncharge(struct sock *sk, int len) +{ + sk_mem_uncharge(sk, len); + sk_wmem_queued_add(sk, -len); +} + static void dfrag_clear(struct sock *sk, struct mptcp_data_frag *dfrag) { int len = dfrag->data_len + dfrag->overhead; list_del(&dfrag->list); - sk_mem_uncharge(sk, len); - sk_wmem_queued_add(sk, -len); + dfrag_uncharge(sk, len); put_page(dfrag->page); } @@ -152,14 +157,31 @@ static void mptcp_clean_una(struct sock *sk) struct mptcp_sock *msk = mptcp_sk(sk); struct mptcp_data_frag *dtmp, *dfrag; u64 snd_una = atomic64_read(&msk->snd_una); + bool cleaned = false; list_for_each_entry_safe(dfrag, dtmp, &msk->rtx_queue, list) { if (after64(dfrag->data_seq + dfrag->data_len, snd_una)) break; dfrag_clear(sk, dfrag); + cleaned = true; + } + + dfrag = mptcp_rtx_head(sk); + if (dfrag && after64(snd_una, dfrag->data_seq)) { + u64 delta = dfrag->data_seq + dfrag->data_len - snd_una; + + dfrag->data_seq += delta; + dfrag->data_len -= delta; + + dfrag_uncharge(sk, delta); + cleaned = true; + } + + if (cleaned) { + sk_mem_reclaim_partial(sk); + sk_stream_write_space(sk); } - sk_mem_reclaim_partial(sk); } /* ensure we get enough memory for the frag hdr, beyond some minimal amount of From patchwork Mon Nov 18 21:45:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196994 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org 
Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2Zq2w1sz9s4Y for ; Tue, 19 Nov 2019 08:46:11 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C69B4100DC2A8; Mon, 18 Nov 2019 13:47:05 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 5BC66100DC2A8 for ; Mon, 18 Nov 2019 13:47:04 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqV-0008I4-2R; Mon, 18 Nov 2019 22:46:07 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:30 +0100 Message-Id: <20191118214538.21931-6-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: YK3QOLMAO23KFJTYIGAIAKLZ5EOVBYDD X-Message-ID-Hash: YK3QOLMAO23KFJTYIGAIAKLZ5EOVBYDD X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; 
suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 05/13] mptcp: add and use mptcp RTX flag List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This is needed in the (unlikely) case that userspace is blocked in mptcp_sendmsg because wmem is exhausted. In that case, only the rtx work queue will clean the rtx backlog, but it could take several milliseconds until it runs next. So, allow it to get scheduled as soon as possible so wmem can be reclaimed if the mptcp socket sndbuf is exhausted. Because such quick-schedule should not cause retransmits, add a flag that indicates when the work queue has been scheduled on behalf of the retransmit timer. Signed-off-by: Florian Westphal --- net/mptcp/options.c | 2 +- net/mptcp/protocol.c | 26 ++++++++++++++++++++------ net/mptcp/protocol.h | 3 ++- 3 files changed, 23 insertions(+), 8 deletions(-) diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 9a18a3670cdf..035ac67541a0 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -583,7 +583,7 @@ static void update_una(struct mptcp_sock *msk, old_snd_una = atomic64_cmpxchg(&msk->snd_una, snd_una, new_snd_una); if (old_snd_una == snd_una) { - mptcp_reset_timer((struct sock *)msk); + mptcp_data_acked((struct sock *)msk); break; } } diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index a09ea93896c7..2144e80b8704 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -40,7 +40,7 @@ static bool mptcp_timer_pending(struct sock *sk) return timer_pending(&inet_csk(sk)->icsk_retransmit_timer); } -void mptcp_reset_timer(struct sock *sk) +static void mptcp_reset_timer(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); unsigned long tout; @@ -52,6 +52,15 @@ void mptcp_reset_timer(struct sock *sk) sk_reset_timer(sk, &icsk->icsk_retransmit_timer, jiffies + tout); } +void mptcp_data_acked(struct sock *sk) +{ + 
mptcp_reset_timer(sk); + + if (!sk_stream_is_writeable(sk) && + schedule_work(&mptcp_sk(sk)->rtx_work)) + sock_hold(sk); +} + static void mptcp_stop_timer(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); @@ -623,6 +632,7 @@ static void mptcp_retransmit_handler(struct sock *sk) if (atomic64_read(&msk->snd_una) == msk->write_seq) { mptcp_stop_timer(sk); } else { + set_bit(MPTCP_WORK_RTX, &msk->flags); if (schedule_work(&msk->rtx_work)) sock_hold(sk); } @@ -647,7 +657,7 @@ static void mptcp_retransmit_timer(struct timer_list *t) sock_put(sk); } -static void mptcp_retransmit(struct work_struct *work) +static void mptcp_worker(struct work_struct *work) { int orig_len, orig_offset, ret, mss_now = 0, size_goal = 0; struct mptcp_data_frag *dfrag; @@ -663,6 +673,10 @@ static void mptcp_retransmit(struct work_struct *work) lock_sock(sk); mptcp_clean_una(sk); + + if (!test_and_clear_bit(MPTCP_WORK_RTX, &msk->flags)) + goto unlock; + dfrag = mptcp_rtx_head(sk); if (!dfrag) goto unlock; @@ -715,7 +729,7 @@ static int __mptcp_init_sock(struct sock *sk) INIT_LIST_HEAD(&msk->conn_list); INIT_LIST_HEAD(&msk->rtx_queue); - INIT_WORK(&msk->rtx_work, mptcp_retransmit); + INIT_WORK(&msk->rtx_work, mptcp_worker); /* re-use the csk retrans timer for MPTCP-level retrans */ timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0); @@ -755,7 +769,7 @@ static void __mptcp_clear_xmit(struct sock *sk) dfrag_clear(sk, dfrag); } -static void mptcp_cancel_rtx_work(struct sock *sk) +static void mptcp_cancel_work(struct sock *sk) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -792,7 +806,7 @@ static void mptcp_close(struct sock *sk, long timeout) __mptcp_clear_xmit(sk); release_sock(sk); - mptcp_cancel_rtx_work(sk); + mptcp_cancel_work(sk); sk_common_release(sk); } @@ -802,7 +816,7 @@ static int mptcp_disconnect(struct sock *sk, int flags) lock_sock(sk); __mptcp_clear_xmit(sk); release_sock(sk); - mptcp_cancel_rtx_work(sk); + mptcp_cancel_work(sk); return 
tcp_disconnect(sk, flags); } diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 63ff8bd8a098..6e23da8c5024 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -76,6 +76,7 @@ /* MPTCP socket flags */ #define MPTCP_DATA_READY BIT(0) +#define MPTCP_WORK_RTX BIT(1) static inline __be32 mptcp_option(u8 subopt, u8 len, u8 nib, u8 field) { @@ -290,7 +291,7 @@ void mptcp_get_options(const struct sk_buff *skb, void mptcp_finish_connect(struct sock *sk, int mp_capable); void mptcp_finish_join(struct sock *sk); -void mptcp_reset_timer(struct sock *sk); +void mptcp_data_acked(struct sock *sk); int mptcp_token_new_request(struct request_sock *req); void mptcp_token_destroy_request(u32 token); From patchwork Mon Nov 18 21:45:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196995 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2Zx07zbz9s4Y for ; Tue, 19 Nov 2019 08:46:16 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id CF829100DC2A4; Mon, 18 Nov 2019 13:47:10 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from 
Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id C12DC100DC42F for ; Mon, 18 Nov 2019 13:47:08 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqZ-0008IC-7h; Mon, 18 Nov 2019 22:46:11 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:31 +0100 Message-Id: <20191118214538.21931-7-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: EGEO6LNV5WXR5QRDWXOKXQNQQV5UNSTY X-Message-ID-Hash: EGEO6LNV5WXR5QRDWXOKXQNQQV5UNSTY X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 06/13] sendmsg: block until mptcp sk is writeable List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This disables transmit of new data until the peer has acked enough mptcp data to get below the wspace write threshold (more than half of wspace upper limit is available again). Also have poll not report EPOLLOUT in this case; it's not relevant whether a subflow is writeable. The latter is a temporary workaround that is needed because mptcp_poll walks the subflows and calls __tcp_poll on each of them. Because subflow ssk is usually writable, the EPOLLOUT needs to be removed if the mptcp sndbuf is exhausted. This is only needed until __tcp_poll is removed later in the series. v2: drop the "empty send" part as per comments from Paolo and Mat. 
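As a rough userspace model of the two checks this patch relies on: the struct and the model_* names below are hypothetical, and the thresholds mirror one reading of the generic stream-socket helpers (sk_stream_memory_free(), sk_stream_wspace(), sk_stream_min_wspace()). Treat the exact thresholds as an assumption and consult include/net/sock.h for the authoritative definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct sock fields involved. */
struct sk_model {
	int sndbuf;		/* models sk->sk_sndbuf */
	int wmem_queued;	/* models sk->sk_wmem_queued */
};

/* Free space left in the send buffer. */
static int model_wspace(const struct sk_model *sk)
{
	return sk->sndbuf - sk->wmem_queued;
}

/* Minimum free space required before the socket counts as writeable
 * (assumed to be half of what is currently queued).
 */
static int model_min_wspace(const struct sk_model *sk)
{
	return sk->wmem_queued >> 1;
}

/* Condition the sendmsg wait loop polls: any room left at all. */
static bool model_memory_free(const struct sk_model *sk)
{
	return sk->wmem_queued < sk->sndbuf;
}

/* Condition poll uses before reporting EPOLLOUT. */
static bool model_is_writeable(const struct sk_model *sk)
{
	assert(sk->sndbuf >= 0 && sk->wmem_queued >= 0);
	return model_wspace(sk) >= model_min_wspace(sk);
}
```

The point of having two conditions is hysteresis: a sender that filled the buffer is not reported writeable again the moment a single byte is freed, which avoids waking userspace for tiny writes.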
Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 2144e80b8704..1515c3bad751 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -406,23 +406,27 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) return ret; } + timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); + + mptcp_clean_una(sk); + + while (!sk_stream_memory_free(sk)) { + ret = sk_stream_wait_memory(sk, &timeo); + if (ret) + goto out; + + mptcp_clean_una(sk); + } + ssk = mptcp_subflow_get(msk); if (!ssk) { release_sock(sk); return -ENOTCONN; } - if (!msg_data_left(msg)) { - pr_debug("empty send"); - ret = sock_sendmsg(ssk->sk_socket, msg); - goto out; - } - pr_debug("conn_list->subflow=%p", ssk); lock_sock(ssk); - mptcp_clean_una(sk); - timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); while (msg_data_left(msg)) { ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now, &size_goal); @@ -1312,6 +1316,10 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, tcp_sock = mptcp_subflow_tcp_socket(subflow); ret |= __tcp_poll(tcp_sock->sk); } + + if (!sk_stream_is_writeable(sk)) + ret &= ~(EPOLLOUT|EPOLLWRNORM); + release_sock(sk); return ret; From patchwork Mon Nov 18 21:45:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196996 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org 
(ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2b141lLz9s4Y for ; Tue, 19 Nov 2019 08:46:21 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id D8025100DC2A4; Mon, 18 Nov 2019 13:47:15 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id BE6C1100DC42F for ; Mon, 18 Nov 2019 13:47:13 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqd-0008IJ-KH; Mon, 18 Nov 2019 22:46:15 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:32 +0100 Message-Id: <20191118214538.21931-8-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: RFCBBBJRGVYAMRP3EYJJU3RCCQO4TWZC X-Message-ID-Hash: RFCBBBJRGVYAMRP3EYJJU3RCCQO4TWZC X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 07/13] subflow: sk_data_ready: make wakeup on tcp sock conditional List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: No need for this 
unless we don't have a parent or the socket is not mp capable. Only the mptcp socket will be waiting for events. In case the mptcp socket connected to a tcp-only peer, we're in fallback mode and need to wakeup the parent too. Signed-off-by: Florian Westphal --- net/mptcp/subflow.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index ff38d54392cd..976e49349276 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -646,10 +646,13 @@ static void subflow_data_ready(struct sock *sk) struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); struct sock *parent = subflow->conn; - subflow->tcp_sk_data_ready(sk); + if (!parent || !(subflow->mp_capable || subflow->mp_join)) { + subflow->tcp_sk_data_ready(sk); - if (!parent || !(subflow->mp_capable || subflow->mp_join)) + if (parent) + parent->sk_data_ready(parent); return; + } /* always propagate the EoF */ if (mptcp_subflow_data_available(sk) || subflow->rx_eof) { From patchwork Mon Nov 18 21:45:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196997 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2b60pccz9s4Y for ; Tue, 19 Nov 2019 08:46:26 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost 
[IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E0783100DC2A5; Mon, 18 Nov 2019 13:47:19 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 27BB5100DC2A8 for ; Mon, 18 Nov 2019 13:47:17 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqh-0008IQ-Qk; Mon, 18 Nov 2019 22:46:19 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:33 +0100 Message-Id: <20191118214538.21931-9-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: NZ6BGQWD7AMCBJQJ4AC6SISIX7WCTGLE X-Message-ID-Hash: NZ6BGQWD7AMCBJQJ4AC6SISIX7WCTGLE X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 08/13] mptcp: add and use mptcp_subflow_get_retrans List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Instead of having retransmit worker grab first subflow on the list, make it always return NULL, unless either the first non-backup subflow on the list is idle or all normal subflows have already been removed. In the latter case, the first idle backup subflow is used. 
The rationale is that it makes no sense to attempt to retransmit at mptcp level if a subflow still has unsent data in its write queue. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 33 ++++++++++++++++++++++++++++++++- 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 1515c3bad751..d0b050f6611e 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -661,6 +661,37 @@ static void mptcp_retransmit_timer(struct timer_list *t) sock_put(sk); } +/* Find an idle subflow. Return NULL if there is unacked data at tcp + * level. + * + * A backup subflow is returned only if that's the only kind available. + */ +static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + struct sock *backup = NULL; + + sock_owned_by_me((const struct sock *)msk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_socket(subflow)->sk; + + /* still data outstanding at TCP level? Don't retransmit.
*/ + if (!tcp_write_queue_empty(ssk)) + return NULL; + + if (subflow->backup) { + if (!backup) + backup = ssk; + continue; + } + + return ssk; + } + + return backup; +} + static void mptcp_worker(struct work_struct *work) { int orig_len, orig_offset, ret, mss_now = 0, size_goal = 0; @@ -685,7 +716,7 @@ static void mptcp_worker(struct work_struct *work) if (!dfrag) goto unlock; - ssk = mptcp_subflow_get(msk); + ssk = mptcp_subflow_get_retrans(msk); if (!ssk) goto reset_unlock; From patchwork Mon Nov 18 21:45:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196998 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2bB6C2fz9s4Y for ; Tue, 19 Nov 2019 08:46:30 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E8D1F100DC2A7; Mon, 18 Nov 2019 13:47:23 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 5E421100DC2A8 for ; 
Mon, 18 Nov 2019 13:47:21 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqm-0008IX-07; Mon, 18 Nov 2019 22:46:24 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:34 +0100 Message-Id: <20191118214538.21931-10-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: X3VXS6XL5CEMCQDIXLSDUYPCJNGECS7I X-Message-ID-Hash: X3VXS6XL5CEMCQDIXLSDUYPCJNGECS7I X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 09/13] mptcp: sendmsg: transmit on backup if other subflows have been closed List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Currently we always pick the first ssk on the list and then have mptcp_sendmsg_frag wait until more space becomes available in case that ssk has no write space available. Instead check the first subflow on the list. If no more write space is available, then we need to either return -EAGAIN to userspace (nonblock case), or we need to wait until a subflow becomes available. This is done by blocking the current thread via sk_stream_wait_memory() and then make the subflow sk_write_space() unblock the parent mptcp socket. We can't acquire the mptcp socket lock from the subflow callbacks, but we can use the mptcp_sk->flags -- MPTCP_SEND_SPACE flag is added for this purpose. If it gets set, then at least one subflow has become available for writing. v1: dumb-down the selection: just pick the first ssk on the list and make mptcp socket block if it has no wspace. 
Backup is only used if no non-backup subflow exists. v2: avoid another while loop and fold !ssk condition with wmem check on parent mptcp socket (Paolo). Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 58 +++++++++++++++++++++++++++++++++++++++----- net/mptcp/protocol.h | 1 + net/mptcp/subflow.c | 5 +++- 3 files changed, 57 insertions(+), 7 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index d0b050f6611e..be927f456a18 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -384,6 +384,43 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, return ret; } +static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + struct sock *backup = NULL; + + sock_owned_by_me((const struct sock *)msk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_socket(subflow)->sk; + + if (!sk_stream_memory_free(ssk)) { + struct socket *sock = ssk->sk_socket; + + if (sock) { + clear_bit(MPTCP_SEND_SPACE, &msk->flags); + smp_mb__after_atomic(); + + /* enables sk->write_space() callbacks */ + set_bit(SOCK_NOSPACE, &sock->flags); + } + + return NULL; + } + + if (subflow->backup) { + if (!backup) + backup = ssk; + + continue; + } + + return ssk; + } + + return backup; +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { int mss_now = 0, size_goal = 0, ret = 0; @@ -410,18 +447,19 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) mptcp_clean_una(sk); - while (!sk_stream_memory_free(sk)) { + ssk = mptcp_subflow_get_send(msk); + while (!sk_stream_memory_free(sk) || !ssk) { ret = sk_stream_wait_memory(sk, &timeo); if (ret) goto out; mptcp_clean_una(sk); - } - ssk = mptcp_subflow_get(msk); - if (!ssk) { - release_sock(sk); - return -ENOTCONN; + ssk = mptcp_subflow_get_send(msk); + if (list_empty(&msk->conn_list)) { + ret = -ENOTCONN; + goto out; + } } pr_debug("conn_list->subflow=%p", ssk); @@ -1117,6 
+1155,13 @@ bool mptcp_sk_is_subflow(const struct sock *sk) return subflow->mp_join == 1; } +static bool mptcp_memory_free(const struct sock *sk, int wake) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + + return wake ? test_bit(MPTCP_SEND_SPACE, &msk->flags) : true; +} + static struct proto mptcp_prot = { .name = "MPTCP", .owner = THIS_MODULE, @@ -1137,6 +1182,7 @@ static struct proto mptcp_prot = { .sockets_allocated = &mptcp_sockets_allocated, .memory_allocated = &tcp_memory_allocated, .memory_pressure = &tcp_memory_pressure, + .stream_memory_free = mptcp_memory_free, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_tcp_wmem), .sysctl_mem = sysctl_tcp_mem, .obj_size = sizeof(struct mptcp_sock), diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 6e23da8c5024..ce5c5de6a5eb 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -77,6 +77,7 @@ /* MPTCP socket flags */ #define MPTCP_DATA_READY BIT(0) #define MPTCP_WORK_RTX BIT(1) +#define MPTCP_SEND_SPACE BIT(2) static inline __be32 mptcp_option(u8 subopt, u8 len, u8 nib, u8 field) { diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 976e49349276..32082c6e8552 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -670,8 +670,11 @@ static void subflow_write_space(struct sock *sk) struct sock *parent = subflow->conn; sk_stream_write_space(sk); - if (parent) + if (parent && sk_stream_is_writeable(sk)) { + set_bit(MPTCP_SEND_SPACE, &mptcp_sk(parent)->flags); + smp_mb__after_atomic(); sk_stream_write_space(parent); + } } int mptcp_subflow_connect(struct sock *sk, struct sockaddr *local, From patchwork Mon Nov 18 21:45:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1196999 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none 
(no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2bG1jSjz9s4Y for ; Tue, 19 Nov 2019 08:46:34 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 09E2F100DC2A6; Mon, 18 Nov 2019 13:47:28 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 719AB100DC431 for ; Mon, 18 Nov 2019 13:47:26 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqq-0008Ie-6M; Mon, 18 Nov 2019 22:46:28 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:35 +0100 Message-Id: <20191118214538.21931-11-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: 43GJKKRNT7RA7XSOH4IIJBVRHFWI3NBZ X-Message-ID-Hash: 43GJKKRNT7RA7XSOH4IIJBVRHFWI3NBZ X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: 
list Subject: [MPTCP] [PATCH v2 10/13] recv: make DATA_READY reflect ssk in-sequence state List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: In order to make mptcp_poll independent of the subflows, we need to keep the mptcp DATA_READY flag in sync, i.e., if it is set, at least one ssk has in-sequence data. If it is cleared, no further data is available. Avoid the unconditional clearing on recv entry. Instead, make sure the flag is cleared on exit if there is no more in-sequence data available. v2: - add back 'done = true' assignment (Paolo) - keep 'break' statements instead of 'goto out'. (Paolo) Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index be927f456a18..8b22cf245580 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -541,8 +541,10 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, { struct mptcp_sock *msk = mptcp_sk(sk); struct mptcp_subflow_context *subflow; + bool more_data_avail = false; struct mptcp_read_arg arg; read_descriptor_t desc; + bool wait_data = false; struct socket *ssock; struct tcp_sock *tp; bool done = false; @@ -575,10 +577,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, u32 map_remaining; int bytes_read; - smp_mb__before_atomic(); - clear_bit(MPTCP_DATA_READY, &msk->flags); - smp_mb__after_atomic(); - ssk = mptcp_subflow_recv_lookup(msk); pr_debug("msk=%p ssk=%p", msk, ssk); if (!ssk) @@ -588,7 +586,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, tp = tcp_sk(ssk); lock_sock(ssk); - while (mptcp_subflow_data_available(ssk) && !done) { + do { /* try to read as much data as available */ map_remaining = subflow->map_data_len - mptcp_subflow_get_map_offset(subflow); @@ -600,7 +598,7 @@ static int
mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, if (!copied) copied = bytes_read; done = true; - continue; + goto next; } pr_debug("msk ack_seq=%llx -> %llx", msk->ack_seq, @@ -609,18 +607,22 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, copied += bytes_read; if (copied >= len) { done = true; - continue; + goto next; } if (tp->urg_data && tp->urg_seq == tp->copied_seq) { pr_err("Urgent data present, cannot proceed"); done = true; - continue; + goto next; } - } +next: + more_data_avail = mptcp_subflow_data_available(ssk); + } while (more_data_avail && !done); release_sock(ssk); continue; wait_for_data: + more_data_avail = false; + /* only the master socket status is relevant here. The exit * conditions mirror closely tcp_recvmsg() */ @@ -660,9 +662,24 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, } pr_debug("block timeout %ld", timeo); + wait_data = true; mptcp_wait_data(sk, &timeo); } + if (more_data_avail) { + if (!test_bit(MPTCP_DATA_READY, &msk->flags)) + set_bit(MPTCP_DATA_READY, &msk->flags); + } else if (!wait_data) { + clear_bit(MPTCP_DATA_READY, &msk->flags); + + /* .. race-breaker: ssk might get new data after last + * data_available() returns false. 
+ */ + ssk = mptcp_subflow_recv_lookup(msk); + if (unlikely(ssk)) + set_bit(MPTCP_DATA_READY, &msk->flags); + } + release_sock(sk); return copied; } From patchwork Mon Nov 18 21:45:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1197000 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2bJ58Vwz9s4Y for ; Tue, 19 Nov 2019 08:46:36 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 11CF7100DC2A8; Mon, 18 Nov 2019 13:47:31 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id ADDAE100DC2A7 for ; Mon, 18 Nov 2019 13:47:29 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqu-0008Im-Bv; Mon, 18 Nov 2019 22:46:32 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:36 +0100 Message-Id: 
<20191118214538.21931-12-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: AAJOXX2LQ4FDCHQHANKI6SS65EB56DXZ X-Message-ID-Hash: AAJOXX2LQ4FDCHQHANKI6SS65EB56DXZ X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 11/13] sendmsg: clear SEND_SPACE if write caused wmem to grow too large List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This is needed when we get rid of __tcp_poll and let mptcp_poll only rely on msk->flags -- we need to avoid EPOLLOUT if the write filled too much sndbuf space. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 8b22cf245580..9f2cbf2b89fb 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -421,6 +421,22 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) return backup; } +static void ssk_check_wmem(struct mptcp_sock *msk, struct sock *ssk) +{ + struct socket *sock; + + if (likely(sk_stream_is_writeable(ssk))) + return; + + sock = READ_ONCE(ssk->sk_socket); + + if (sock) { + clear_bit(MPTCP_SEND_SPACE, &msk->flags); + smp_mb__after_atomic(); + set_bit(SOCK_NOSPACE, &sock->flags); + } +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { int mss_now = 0, size_goal = 0, ret = 0; @@ -485,6 +501,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) mptcp_reset_timer(sk); } + ssk_check_wmem(msk, ssk); release_sock(ssk); out: From patchwork Mon Nov 18 
21:45:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1197001 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2bQ5KyXz9s4Y for ; Tue, 19 Nov 2019 08:46:42 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 8152E100DC2A8; Mon, 18 Nov 2019 13:47:36 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id CE52B100DC2A6 for ; Mon, 18 Nov 2019 13:47:33 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWoqy-0008It-Hm; Mon, 18 Nov 2019 22:46:36 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:37 +0100 Message-Id: <20191118214538.21931-13-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: 
WRGYJKIKBVE6CJV2K7E3PJU3F4PRYRTU X-Message-ID-Hash: WRGYJKIKBVE6CJV2K7E3PJU3F4PRYRTU X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 12/13] mptcp_poll: don't consider subflow socket state anymore List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: After previous patch, we do not need to call __tcp_poll anymore, we can use msk->flags instead to see if we have data available on a socket. SEND_SPACE flag indicates when a subflow has enough space to accept more data, it gets cleared on mptcp_sendmsg() return in case ssk runs below the free watermark. The commit "net: tcp: add __tcp_poll helper" can now be removed. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 9f2cbf2b89fb..b36d1b89cb34 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -189,7 +189,10 @@ static void mptcp_clean_una(struct sock *sk) if (cleaned) { sk_mem_reclaim_partial(sk); - sk_stream_write_space(sk); + + /* Only wake up writers if a subflow is ready */ + if (test_bit(MPTCP_SEND_SPACE, &msk->flags)) + sk_stream_write_space(sk); } } @@ -837,6 +840,7 @@ static int __mptcp_init_sock(struct sock *sk) INIT_LIST_HEAD(&msk->rtx_queue); INIT_WORK(&msk->rtx_work, mptcp_worker); + __set_bit(MPTCP_SEND_SPACE, &msk->flags); /* re-use the csk retrans timer for MPTCP-level retrans */ timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0); @@ -1401,39 +1405,35 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, static __poll_t 
mptcp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait) { - struct mptcp_subflow_context *subflow; const struct mptcp_sock *msk; struct sock *sk = sock->sk; struct socket *ssock; - __poll_t ret = 0; + __poll_t mask = 0; msk = mptcp_sk(sk); lock_sock(sk); ssock = __mptcp_fallback_get_ref(msk); if (ssock) { release_sock(sk); - ret = ssock->ops->poll(file, ssock, wait); + mask = ssock->ops->poll(file, ssock, wait); sock_put(ssock->sk); - return ret; + return mask; } release_sock(sk); sock_poll_wait(file, sock, wait); lock_sock(sk); - mptcp_for_each_subflow(msk, subflow) { - struct socket *tcp_sock; - - tcp_sock = mptcp_subflow_tcp_socket(subflow); - ret |= __tcp_poll(tcp_sock->sk); - } - - if (!sk_stream_is_writeable(sk)) - ret &= ~(EPOLLOUT|EPOLLWRNORM); + if (test_bit(MPTCP_DATA_READY, &msk->flags)) + mask = EPOLLIN | EPOLLRDNORM; + if (sk_stream_is_writeable(sk) && test_bit(MPTCP_SEND_SPACE, &msk->flags)) + mask |= EPOLLOUT | EPOLLWRNORM; + if (sk->sk_shutdown & RCV_SHUTDOWN) + mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; release_sock(sk); - return ret; + return mask; } static int mptcp_shutdown(struct socket *sock, int how) From patchwork Mon Nov 18 21:45:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 1197002 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=strlen.de Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No 
client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 47H2bT1QnCz9s4Y for ; Tue, 19 Nov 2019 08:46:45 +1100 (AEDT) Received: from new-ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 89959100DC2A6; Mon, 18 Nov 2019 13:47:39 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a0a:51c0:0:12e:520::1; helo=chamillionaire.breakpoint.cc; envelope-from=fw@breakpoint.cc; receiver= Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 03951100DC2AA for ; Mon, 18 Nov 2019 13:47:38 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1iWor2-0008J0-N5; Mon, 18 Nov 2019 22:46:40 +0100 From: Florian Westphal To: Cc: Florian Westphal Date: Mon, 18 Nov 2019 22:45:38 +0100 Message-Id: <20191118214538.21931-14-fw@strlen.de> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191118214538.21931-1-fw@strlen.de> References: <20191118214538.21931-1-fw@strlen.de> MIME-Version: 1.0 Message-ID-Hash: QK3TTVFCYGPB45UMVH7MKOPFMQDQAUHZ X-Message-ID-Hash: QK3TTVFCYGPB45UMVH7MKOPFMQDQAUHZ X-MailFrom: fw@breakpoint.cc X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 13/13] sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: If userspace uses SO_SNDBUF to set a value, then truncate the buffer according to remaining wmem. 
Otherwise, running the selftest script with "-m mmap -b 4096" shows very large successful write() calls, as we're only limited by how much data the underlying tcp flow is willing to accept. NB: It might make sense to carry SO_SNDBUF down to the subflows, but that's left out for now. Signed-off-by: Florian Westphal --- net/mptcp/protocol.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index b36d1b89cb34..e158b30f9cab 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -483,6 +483,15 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) pr_debug("conn_list->subflow=%p", ssk); + if (unlikely(sk->sk_userlocks & SOCK_SNDBUF_LOCK)) { + int limit = sk_stream_wspace(sk); + + if (WARN_ON_ONCE(limit <= 0)) + limit = 1; + + iov_iter_truncate(&msg->msg_iter, limit); + } + lock_sock(ssk); while (msg_data_left(msg)) { ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now,