From patchwork Mon Jan 11 10:05:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 1424486 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JJl7aFmf; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DDq9w1kbmz9sXV for ; Mon, 11 Jan 2021 21:06:44 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 00A12100EBBBE; Mon, 11 Jan 2021 02:06:38 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=216.205.24.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=pabeni@redhat.com; receiver= Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id B3512100EBBAA for ; Mon, 11 Jan 2021 02:06:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1610359594; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dQxMuQI3c2F8vJ5ue/Jzh02w7xQk4Ex7cOaQI7TKIZ4=; b=JJl7aFmf1ROPvD3WdkPD/81SXj7uBxBUdR8XEiviMyDOyym+k6PE6+eN0aS73dvJp6a72g ycalf+vZvDGyOUS+JUuHOn+E+PuFup6zaCwszeE/Uju37kfpXqbrOEadDed/IBioOsJ4rJ RHd79aPc8B9qWa8YYVyPAkWsde9EukQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-187-eupwz-jINKeKLr-yupWZdA-1; Mon, 11 Jan 2021 05:06:33 -0500 X-MC-Unique: eupwz-jINKeKLr-yupWZdA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 071368144E0 for ; Mon, 11 Jan 2021 10:06:32 +0000 (UTC) Received: from gerbillo.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6A5175D6D5 for ; Mon, 11 Jan 2021 10:06:31 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.01.org Date: Mon, 11 Jan 2021 11:05:39 +0100 Message-Id: <7dd1113da19bbf0c033fcb1046430c4d074edd31.1610359105.git.pabeni@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pabeni@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Message-ID-Hash: NOLDAKBAUAYTHAK6KCOHT57SAPQUTLDN X-Message-ID-Hash: NOLDAKBAUAYTHAK6KCOHT57SAPQUTLDN X-MailFrom: pabeni@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 mptcp-next 1/4] mptcp: always graft subflow socket to parent. List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Currently incoming subflow uses link to the parent socket, while outgoing one link to a per subflow socket. The latter is not really needed, except at initial connect() time and for the first subflow. Always graft the outgoing subflow to the parent socket and free the unneeded ones early. This allows some code cleanup, reduces the amount of memory used and will simplify the next patch Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 36 ++++++++++-------------------------- net/mptcp/protocol.h | 1 + net/mptcp/subflow.c | 3 +++ 3 files changed, 14 insertions(+), 26 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 5977e9e083be..ada69495a51b 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -114,11 +114,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk) list_add(&subflow->node, &msk->conn_list); sock_hold(ssock->sk); subflow->request_mptcp = 1; - - /* accept() will wait on first subflow sk_wq, and we always wakes up - * via msk->sk_socket - */ - RCU_INIT_POINTER(msk->first->sk_wq, &sk->sk_socket->wq); + mptcp_sock_graft(msk->first, sk->sk_socket); return 0; } @@ -2116,9 +2112,6 @@ static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk) void __mptcp_close_ssk(struct sock *sk, struct sock *ssk, struct mptcp_subflow_context *subflow) { - bool dispose_socket = false; - struct socket *sock; - list_del(&subflow->node); lock_sock_nested(ssk, SINGLE_DEPTH_NESTING); @@ -2126,11 +2119,8 @@ void __mptcp_close_ssk(struct sock *sk, struct sock *ssk, /* if we are invoked by the msk cleanup code, the subflow is * already orphaned */ - sock = ssk->sk_socket; - if (sock) { - dispose_socket = sock != sk->sk_socket; + if (ssk->sk_socket) sock_orphan(ssk); - } subflow->disposable = 1; @@ -2148,8 +2138,6 @@ void __mptcp_close_ssk(struct sock *sk, struct sock *ssk, __sock_put(ssk); } release_sock(ssk); - if (dispose_socket) - iput(SOCK_INODE(sock)); sock_put(ssk); } @@ -2540,6 +2528,12 @@ static void __mptcp_destroy_sock(struct sock *sk) pr_debug("msk=%p", msk); + /* dispose the ancillatory tcp socket, if any */ + if (msk->subflow) { + iput(SOCK_INODE(msk->subflow)); + msk->subflow = NULL; + } + /* be sure to always acquire the join list lock, to sync vs * mptcp_finish_join(). */ @@ -2590,20 +2584,10 @@ static void mptcp_close(struct sock *sk, long timeout) inet_csk(sk)->icsk_mtup.probe_timestamp = tcp_jiffies32; list_for_each_entry(subflow, &mptcp_sk(sk)->conn_list, node) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); - bool slow, dispose_socket; - struct socket *sock; + bool slow = lock_sock_fast(ssk); - slow = lock_sock_fast(ssk); - sock = ssk->sk_socket; - dispose_socket = sock && sock != sk->sk_socket; sock_orphan(ssk); unlock_sock_fast(ssk, slow); - - /* for the outgoing subflows we additionally need to free - * the associated socket - */ - if (dispose_socket) - iput(SOCK_INODE(sock)); } sock_orphan(sk); @@ -3042,7 +3026,7 @@ void mptcp_finish_connect(struct sock *ssk) mptcp_rcv_space_init(msk, ssk); } -static void mptcp_sock_graft(struct sock *sk, struct socket *parent) +void mptcp_sock_graft(struct sock *sk, struct socket *parent) { write_lock_bh(&sk->sk_callback_lock); rcu_assign_pointer(sk->sk_wq, &parent->wq); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index d6400ad2d615..65d200a1072b 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -473,6 +473,7 @@ void mptcp_subflow_shutdown(struct sock *sk, struct sock *ssk, int how); void __mptcp_close_ssk(struct sock *sk, struct sock *ssk, struct mptcp_subflow_context *subflow); void mptcp_subflow_reset(struct sock *ssk); +void mptcp_sock_graft(struct sock *sk, struct socket *parent); /* called with sk socket lock held */ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc, diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 278cbe3e539e..22313710d769 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1159,6 +1159,9 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc, if (err && err != -EINPROGRESS) goto failed_unlink; + /* discard the subflow socket */ + mptcp_sock_graft(ssk, sk->sk_socket); + iput(SOCK_INODE(sf)); return err; failed_unlink: From patchwork Mon Jan 11 10:05:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 1424482 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=hqPt/BzA; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DDq9r4NKHz9sRR for ; Mon, 11 Jan 2021 21:06:40 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 11836100EBBC4; Mon, 11 Jan 2021 02:06:39 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=63.128.21.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=pabeni@redhat.com; receiver= Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 75C91100EBBAA for ; Mon, 11 Jan 2021 02:06:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1610359595; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7ccBZJOXJN3jBgTg8g4W51X0zMgz/cbiPoSgWdy5Bk4=; b=hqPt/BzA3AatG1jvYREhPyI0gir38djz95L5r4rjF3gwGnH6aH61tBpT6/1sXVyLVEVaqb ZPwBt7j2AVdCS9ziQqejV85bsu6Lc6c4vzT7UtjO56NgtttGdyKnDfkZ6z3IGhD6gSK8ra gjEvHwIf1L+/G09XzrfZUW27kHCjlx8= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-388-vZYi-4OcNaONgPJN4rVV8g-1; Mon, 11 Jan 2021 05:06:33 -0500 X-MC-Unique: vZYi-4OcNaONgPJN4rVV8g-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EE4208144F5 for ; Mon, 11 Jan 2021 10:06:32 +0000 (UTC) Received: from gerbillo.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5DB1F6A8FE for ; Mon, 11 Jan 2021 10:06:32 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.01.org Date: Mon, 11 Jan 2021 11:05:40 +0100 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pabeni@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Message-ID-Hash: URSJUSBLYTLWOBKLJA7PQXXF7FLUGNLH X-Message-ID-Hash: URSJUSBLYTLWOBKLJA7PQXXF7FLUGNLH X-MailFrom: pabeni@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 mptcp-next 2/4] mptcp: re-enable sndbuf autotune List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: After commit 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks"), MPTCP never sets the flag bit SOCK_NOSPACE on its subflow. As a side effect, autotune never takes place, as it happens inside tcp_new_space(), which in turn is called only when the mentioned bit is set. Let's sendmsg() set the subflows NOSPACE bit when looking for more memory. Additionally, cleanup the sndbuf propagation from subflow into the msk, leveraging the subflow write space callback and dropping a bunch of duplicate code. This also makes the SNDBUF_LIMITED chrono relevant again for MPTCP subflows. Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni --- v1 -> v2: - leverage the previous commit, so we can always simply set the nospace bit on the msk/parent socket --- net/mptcp/protocol.c | 57 +++++++++++++++++++------------------------- net/mptcp/protocol.h | 20 ++++++++++++++++ net/mptcp/subflow.c | 10 +++++++- 3 files changed, 53 insertions(+), 34 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index ada69495a51b..a64b2f6fb17b 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -730,10 +730,14 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk) void __mptcp_flush_join_list(struct mptcp_sock *msk) { + struct mptcp_subflow_context *subflow; + if (likely(list_empty(&msk->join_list))) return; spin_lock_bh(&msk->join_list_lock); + list_for_each_entry(subflow, &msk->join_list, node) + mptcp_propagate_sndbuf((struct sock *)msk, mptcp_subflow_tcp_sock(subflow)); list_splice_tail_init(&msk->join_list, &msk->conn_list); spin_unlock_bh(&msk->join_list_lock); } @@ -1033,13 +1037,7 @@ static void __mptcp_clean_una(struct sock *sk) __mptcp_update_wmem(sk); sk_mem_reclaim_partial(sk); } - - if (sk_stream_is_writeable(sk)) { - /* pairs with memory barrier in mptcp_poll */ - smp_mb(); - if (test_and_clear_bit(MPTCP_NOSPACE, &msk->flags)) - sk_stream_write_space(sk); - } + mptcp_write_space(sk); } if (snd_una == READ_ONCE(msk->snd_nxt)) { @@ -1358,8 +1356,7 @@ struct subflow_send_info { u64 ratio; }; -static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk, - u32 *sndbuf) +static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) { struct subflow_send_info send_info[2]; struct mptcp_subflow_context *subflow; @@ -1370,24 +1367,17 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk, sock_owned_by_me((struct sock *)msk); - *sndbuf = 0; if (__mptcp_check_fallback(msk)) { if (!msk->first) return NULL; - *sndbuf = msk->first->sk_sndbuf; return sk_stream_memory_free(msk->first) ? msk->first : NULL; } /* re-use last subflow, if the burst allow that */ if (msk->last_snd && msk->snd_burst > 0 && sk_stream_memory_free(msk->last_snd) && - mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) { - mptcp_for_each_subflow(msk, subflow) { - ssk = mptcp_subflow_tcp_sock(subflow); - *sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf); - } + mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) return msk->last_snd; - } /* pick the subflow with the lower wmem/wspace ratio */ for (i = 0; i < 2; ++i) { @@ -1400,7 +1390,6 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk, continue; nr_active += !subflow->backup; - *sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf); if (!sk_stream_memory_free(subflow->tcp_sock)) continue; @@ -1421,7 +1410,8 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk, send_info[1].ssk, send_info[1].ratio); /* pick the best backup if no other subflow is active */ - if (!nr_active) + msk->use_backup = !nr_active; + if (msk->use_backup) send_info[0].ssk = send_info[1].ssk; if (send_info[0].ssk) { @@ -1430,6 +1420,7 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk, sk_stream_wspace(msk->last_snd)); return msk->last_snd; } + return NULL; } @@ -1450,7 +1441,6 @@ static void mptcp_push_pending(struct sock *sk, unsigned int flags) }; struct mptcp_data_frag *dfrag; int len, copied = 0; - u32 sndbuf; while ((dfrag = mptcp_send_head(sk))) { info.sent = dfrag->already_sent; @@ -1461,12 +1451,7 @@ static void mptcp_push_pending(struct sock *sk, unsigned int flags) prev_ssk = ssk; __mptcp_flush_join_list(msk); - ssk = mptcp_subflow_get_send(msk, &sndbuf); - - /* do auto tuning */ - if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK) && - sndbuf > READ_ONCE(sk->sk_sndbuf)) - WRITE_ONCE(sk->sk_sndbuf, sndbuf); + ssk = mptcp_subflow_get_send(msk); /* try to keep the subflow socket lock across * consecutive xmit on the same socket @@ -1533,11 +1518,6 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk) while (len > 0) { int ret = 0; - /* do auto tuning */ - if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK) && - ssk->sk_sndbuf > READ_ONCE(sk->sk_sndbuf)) - WRITE_ONCE(sk->sk_sndbuf, ssk->sk_sndbuf); - if (unlikely(mptcp_must_reclaim_memory(sk, ssk))) { __mptcp_update_wmem(sk); sk_mem_reclaim_partial(sk); @@ -1575,6 +1555,15 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk) } } +static void mptcp_set_nospace(struct sock *sk) +{ + /* enable autotune */ + set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); + + /* will be cleared on avail space */ + set_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags); +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -1676,7 +1665,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) continue; wait_for_memory: - set_bit(MPTCP_NOSPACE, &msk->flags); + mptcp_set_nospace(sk); mptcp_push_pending(sk, msg->msg_flags); ret = sk_stream_wait_memory(sk, &timeo); if (ret) @@ -2347,6 +2336,7 @@ static int __mptcp_init_sock(struct sock *sk) msk->tx_pending_data = 0; msk->size_goal_cache = TCP_BASE_MSS; + msk->use_backup = false; msk->ack_hint = NULL; msk->first = NULL; inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss; @@ -3269,6 +3259,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, mptcp_copy_inaddrs(newsk, msk->first); mptcp_rcv_space_init(msk, msk->first); + mptcp_propagate_sndbuf(newsk, msk->first); /* set ssk->sk_socket of accept()ed flows to mptcp socket. * This is needed so NOSPACE flag can be set from tcp stack. @@ -3309,7 +3300,7 @@ static __poll_t mptcp_check_writeable(struct mptcp_sock *msk) if (sk_stream_is_writeable(sk)) return EPOLLOUT | EPOLLWRNORM; - set_bit(MPTCP_NOSPACE, &msk->flags); + mptcp_set_nospace(sk); smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */ if (sk_stream_is_writeable(sk)) return EPOLLOUT | EPOLLWRNORM; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 65d200a1072b..adc56bcbdf68 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -248,6 +248,7 @@ struct mptcp_sock { bool snd_data_fin_enable; bool rcv_fastclose; bool use_64bit_ack; /* Set when we received a 64-bit DSN */ + bool use_backup; /* only backup subflow available */ spinlock_t join_list_lock; struct sock *ack_hint; struct work_struct work; @@ -522,6 +523,25 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk) READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt); } +static inline bool mptcp_propagate_sndbuf(struct sock *sk, struct sock *ssk) +{ + if ((sk->sk_userlocks & SOCK_SNDBUF_LOCK) || ssk->sk_sndbuf <= READ_ONCE(sk->sk_sndbuf)) + return false; + + WRITE_ONCE(sk->sk_sndbuf, ssk->sk_sndbuf); + return true; +} + +static inline void mptcp_write_space(struct sock *sk) +{ + if (sk_stream_is_writeable(sk)) { + /* pairs with memory barrier in mptcp_poll */ + smp_mb(); + if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags)) + sk_stream_write_space(sk); + } +} + void mptcp_destroy_common(struct mptcp_sock *msk); void __init mptcp_token_init(void); diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 22313710d769..31cc362a4638 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -343,6 +343,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb) if (subflow->conn_finished) return; + mptcp_propagate_sndbuf(parent, sk); subflow->rel_write_seq = 1; subflow->conn_finished = 1; subflow->ssn_offset = TCP_SKB_CB(skb)->seq; @@ -1040,7 +1041,13 @@ static void subflow_data_ready(struct sock *sk) static void subflow_write_space(struct sock *ssk) { - /* we take action in __mptcp_clean_una() */ + struct socket *sock = ssk->sk_socket; + + if (mptcp_propagate_sndbuf(mptcp_subflow_ctx(ssk)->conn, ssk)) + mptcp_write_space(ssk); + + if (sk_stream_is_writeable(ssk) && sock) + clear_bit(SOCK_NOSPACE, &sock->flags); } static struct inet_connection_sock_af_ops * @@ -1302,6 +1309,7 @@ static void subflow_state_change(struct sock *sk) __subflow_state_change(sk); if (subflow_simultaneous_connect(sk)) { + mptcp_propagate_sndbuf(parent, sk); mptcp_do_fallback(sk); mptcp_rcv_space_init(mptcp_sk(parent), sk); pr_fallback(mptcp_sk(parent)); From patchwork Mon Jan 11 10:05:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 1424485 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=KmHk2pqM; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DDq9v4RbKz9sRR for ; Mon, 11 Jan 2021 21:06:42 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 1D36E100EBBD0; Mon, 11 Jan 2021 02:06:39 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=63.128.21.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=pabeni@redhat.com; receiver= Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 8D29A100EBBB9 for ; Mon, 11 Jan 2021 02:06:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1610359596; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=veOn3er4sT4zJMu25vHdU9TX0TQHThlqCp39cAsXRLE=; b=KmHk2pqMCMF0WJI1So1bNB+9KVduAfyOW/jL+mec4DiH5mSuYp3Jp3vVTx8gtZsFS12Ro+ Xb71cRU90ccdIb30mesoyg9cOwHIcDL1/fd4EsXtCNthK6OIUTSbEwu9kl9AelG4xNHNkJ 8EIv3hNaKybj8futKFuQy3yJSq4bEPA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-416-IJ9n5A5wM2uYIcWjtSRx4A-1; Mon, 11 Jan 2021 05:06:34 -0500 X-MC-Unique: IJ9n5A5wM2uYIcWjtSRx4A-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E063F107ACF9 for ; Mon, 11 Jan 2021 10:06:33 +0000 (UTC) Received: from gerbillo.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id 507D25D6D5 for ; Mon, 11 Jan 2021 10:06:33 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.01.org Date: Mon, 11 Jan 2021 11:05:41 +0100 Message-Id: <0bd545cbc71005e983baa13f0a62e6d6e90c3920.1610359105.git.pabeni@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pabeni@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Message-ID-Hash: GQRRZKSBLLSOTVQA5XZYIRNZ4NYLWBNB X-Message-ID-Hash: GQRRZKSBLLSOTVQA5XZYIRNZ4NYLWBNB X-MailFrom: pabeni@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 mptcp-next 3/4] mptcp: do not queue excessive data on subflows. List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: The current packet scheduler can enqueue up to sndbuf data on each subflow. If the send buffer is large and the subflows are not symmetric, this could lead to suboptimal aggregate bandwidth utilization. Limit the amount of queued data to the maximum cwnd. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index a64b2f6fb17b..510b87a3553b 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1390,7 +1390,7 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) continue; nr_active += !subflow->backup; - if (!sk_stream_memory_free(subflow->tcp_sock)) + if (!sk_stream_memory_free(subflow->tcp_sock) || !tcp_sk(ssk)->snd_wnd) continue; pace = READ_ONCE(ssk->sk_pacing_rate); @@ -1417,7 +1417,7 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) if (send_info[0].ssk) { msk->last_snd = send_info[0].ssk; msk->snd_burst = min_t(int, MPTCP_SEND_BURST_SIZE, - sk_stream_wspace(msk->last_snd)); + tcp_sk(msk->last_snd)->snd_wnd); return msk->last_snd; } From patchwork Mon Jan 11 10:05:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 1424484 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=C9AZHjBm; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4DDq9v4Rc4z9sWL for ; Mon, 11 Jan 2021 21:06:42 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 29A7F100EBBD6; Mon, 11 Jan 2021 02:06:40 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=63.128.21.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=pabeni@redhat.com; receiver= Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id B4768100EBBC0 for ; Mon, 11 Jan 2021 02:06:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1610359597; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bhqa10ElV4mALmSoAbBs95k1stu0i4JkU8yVajhR3c0=; b=C9AZHjBmROwZ5yfKV4e/xdV6P+NaUgvh1YZqirtywDv0f4Ii7kb54/T15nnjNjcTUqeLOE Jcmr3RHYgbLxzCRYnpOMlbLcTFlsv3LzBHCakj4VauId5VvUzGcKKsQDIQ54GsUZrridvZ 3YdlPKa5expPbUnqOzdAJd8G6DYwBHk= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-594-RgwPqn_0OaqhRK_CtmROFQ-1; Mon, 11 Jan 2021 05:06:35 -0500 X-MC-Unique: RgwPqn_0OaqhRK_CtmROFQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D4D208144E8 for ; Mon, 11 Jan 2021 10:06:34 +0000 (UTC) Received: from gerbillo.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id 437055D6D5 for ; Mon, 11 Jan 2021 10:06:34 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.01.org Date: Mon, 11 Jan 2021 11:05:42 +0100 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pabeni@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Message-ID-Hash: WQ5K4SBXAT6MR6D644M5XD5YT4L5BDRH X-Message-ID-Hash: WQ5K4SBXAT6MR6D644M5XD5YT4L5BDRH X-MailFrom: pabeni@redhat.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH v2 mptcp-next 4/4] mptcp: schedule work for better snd subflow selection List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Otherwise the packet scheduler policy will not be enforced when pushing pending data at MPTCP-level ack reception time. Signed-off-by: Paolo Abeni --- I'm very sad to re-introduce the worker usage for the datapath, but I could not find an easy way to avoid it. I'm thinking of some weird schema involving a dummy percpu napi instance serving/processing a queue of mptcp sockets instead of actual packets. Any BH user can enqueue an MPTCP subflow there to delegate some action to the latter. The above will require also overriding the tcp_release_cb() with something able to process this delegated events. Yep, crazy, but possibly doable without any core change and possibly could be used to avoid the worker usage even for data fin processing/sending. --- net/mptcp/protocol.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 510b87a3553b..0791421a971f 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2244,6 +2244,7 @@ static void mptcp_worker(struct work_struct *work) if (unlikely(state == TCP_CLOSE)) goto unlock; + mptcp_push_pending(sk, 0); mptcp_check_data_fin_ack(sk); __mptcp_flush_join_list(msk); @@ -2903,10 +2904,14 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk) if (!mptcp_send_head(sk)) return; - if (!sock_owned_by_user(sk)) - __mptcp_subflow_push_pending(sk, ssk); - else + if (!sock_owned_by_user(sk)) { + if (mptcp_subflow_get_send(mptcp_sk(sk)) == ssk) + __mptcp_subflow_push_pending(sk, ssk); + else + mptcp_schedule_work(sk); + } else { set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags); + } } #define MPTCP_DEFERRED_ALL (TCPF_WRITE_TIMER_DEFERRED)