From patchwork Mon Jan 11 22:43:25 2021
From: Paolo Abeni <pabeni@redhat.com>
To: mptcp@lists.01.org
Date: Mon, 11 Jan 2021 23:43:25 +0100
Message-Id: <06d7dd5b5a2a5da7507bcd54d61f6b06a2ee919c.1610404441.git.pabeni@redhat.com>
Subject: [MPTCP] [RFC PATCH 1/2] mptcp: implement deferred action infrastructure

On MPTCP-level ack reception, the packet scheduler may select a
subflow other than the current one. Prior to this commit we relied on
the workqueue to trigger actions on such a subflow.

This changeset introduces an infrastructure that allows any MPTCP
subflow to schedule actions (MPTCP xmit) on other subflows without
resorting to (multiple) process reschedules. A dummy NAPI instance is
used instead. When MPTCP needs to trigger an action on a different
subflow, it enqueues the target subflow on the NAPI backlog and
schedules that instance as needed.

The dummy NAPI poll method walks the subflow backlog and tries to
acquire the (BH) socket lock on each of them. If the socket is owned
by user space, the action will be completed by the sock release cb;
otherwise the push is started immediately.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Help with the commit prose to make this change more upstream-palatable
is more than welcome! ;)
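As a reviewer aid, here is a minimal userspace analogue of the
defer-to-lock-owner pattern the diff below implements. Every name in it
is invented for the sketch; it only models the "run the action now if
the sock is unowned, otherwise let the release callback finish it"
logic, and deliberately ignores the per-CPU NAPI queueing:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_sock {
	pthread_mutex_t lock;   /* plays the role of the BH sock spinlock */
	bool owned_by_user;     /* sock_owned_by_user() stand-in */
	atomic_bool deferred;   /* subflow->deferred_status stand-in */
};

static void process_deferred(struct fake_sock *sk)
{
	printf("pushing pending data\n");
	atomic_store(&sk->deferred, false); /* mptcp_subflow_deferred_done() */
}

/* mptcp_subflow_defer() plus mptcp_napi_poll(), collapsed into one step */
static void defer_action(struct fake_sock *sk)
{
	atomic_store(&sk->deferred, true);
	pthread_mutex_lock(&sk->lock);      /* bh_lock_sock_nested() */
	if (!sk->owned_by_user)
		process_deferred(sk);
	/* else: the owner sees the flag in its release callback */
	pthread_mutex_unlock(&sk->lock);
}

/* tcp_release_cb_override() stand-in, run by the owner on release */
static void release_cb(struct fake_sock *sk)
{
	if (atomic_load(&sk->deferred))
		process_deferred(sk);
}

int main(void)
{
	struct fake_sock sk = { .lock = PTHREAD_MUTEX_INITIALIZER };

	defer_action(&sk);       /* nobody owns the sock: runs immediately */

	sk.owned_by_user = true; /* simulate a process owning the sock */
	defer_action(&sk);       /* only records the pending action */
	sk.owned_by_user = false;
	release_cb(&sk);         /* the owner completes it on release */
	return 0;
}

The real patch keeps one queue (and one dummy NAPI instance) per CPU so
that the scheduling subflow never blocks on the target subflow's lock.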
---
 net/mptcp/protocol.c | 86 ++++++++++++++++++++++++++++++++++++++++++++
 net/mptcp/protocol.h | 52 +++++++++++++++++++++++++++
 net/mptcp/subflow.c  |  2 ++
 3 files changed, 140 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0791421a971f..3d5ac817b2fb 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2959,6 +2959,30 @@ static void mptcp_release_cb(struct sock *sk)
 	}
 }
 
+static void mptcp_subflow_process_deferred(struct sock *ssk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct sock *sk = subflow->conn;
+
+	mptcp_data_lock(sk);
+	if (!sock_owned_by_user(sk))
+		__mptcp_subflow_push_pending(sk, ssk);
+	else
+		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
+	mptcp_data_unlock(sk);
+	mptcp_subflow_deferred_done(subflow);
+}
+
+static void tcp_release_cb_override(struct sock *ssk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+
+	if (mptcp_subflow_has_deferred_action(subflow))
+		mptcp_subflow_process_deferred(ssk);
+
+	tcp_release_cb(ssk);
+}
+
 static int mptcp_hash(struct sock *sk)
 {
 	/* should never be called,
@@ -3111,6 +3135,8 @@ static struct proto mptcp_prot = {
 	.no_autobind	= true,
 };
 
+static struct proto tcp_prot_override;
+
 static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sock->sk);
@@ -3265,6 +3291,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 		mptcp_copy_inaddrs(newsk, msk->first);
 		mptcp_rcv_space_init(msk, msk->first);
 		mptcp_propagate_sndbuf(newsk, msk->first);
+		mptcp_subflow_ops_override(msk->first);
 
 		/* set ssk->sk_socket of accept()ed flows to mptcp socket.
 		 * This is needed so NOSPACE flag can be set from tcp stack.
@@ -3375,13 +3402,58 @@ static struct inet_protosw mptcp_protosw = {
 #define MPTCP_USE_SLAB 1
 #endif
 
+DEFINE_PER_CPU(struct mptcp_deferred_action, mptcp_deferred_actions);
+
+static int mptcp_napi_poll(struct napi_struct *napi, int budget)
+{
+	struct mptcp_deferred_action *deferred;
+	struct mptcp_subflow_context *subflow;
+	int work_done = 0;
+
+	deferred = container_of(napi, struct mptcp_deferred_action, napi);
+	while ((subflow = mptcp_subflow_deferred_next(deferred)) != NULL) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		bh_lock_sock_nested(ssk);
+		if (!sock_owned_by_user(ssk))
+			mptcp_subflow_process_deferred(ssk);
+
+		/* if the sock is locked the deferred status will be cleared
+		 * by tcp_release_cb_override
+		 */
+		bh_unlock_sock(ssk);
+
+		if (++work_done == budget)
+			return budget;
+	}
+
+	/* always provide a 0 'work_done' argument, so that napi_complete_done
+	 * will not try accessing the NULL napi->dev ptr
+	 */
+	napi_complete_done(napi, 0);
+	return work_done;
+}
+
 void __init mptcp_proto_init(void)
 {
+	int cpu;
+
 	mptcp_prot.h.hashinfo = tcp_prot.h.hashinfo;
 
 	if (percpu_counter_init(&mptcp_sockets_allocated, 0, GFP_KERNEL))
 		panic("Failed to allocate MPTCP pcpu counter\n");
 
+	for_each_possible_cpu(cpu) {
+		struct mptcp_deferred_action *deferred = per_cpu_ptr(&mptcp_deferred_actions, cpu);
+
+		INIT_LIST_HEAD(&deferred->head);
+		netif_tx_napi_add(init_net.loopback_dev, &deferred->napi, mptcp_napi_poll,
+				  NAPI_POLL_WEIGHT);
+		napi_enable(&deferred->napi);
+	}
+
+	tcp_prot_override = tcp_prot;
+	tcp_prot_override.release_cb = tcp_release_cb_override;
+
 	mptcp_subflow_init();
 	mptcp_pm_init();
 	mptcp_token_init();
@@ -3420,6 +3492,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
 #endif
 };
 
+static struct proto tcpv6_prot_override;
 static struct proto mptcp_v6_prot;
 
 static void mptcp_v6_destroy(struct sock *sk)
@@ -3446,6 +3519,9 @@ int __init mptcp_proto_v6_init(void)
 	mptcp_v6_prot.destroy = mptcp_v6_destroy;
 	mptcp_v6_prot.obj_size = sizeof(struct mptcp6_sock);
 
+	tcpv6_prot_override = tcpv6_prot;
+	tcpv6_prot_override.release_cb = tcp_release_cb_override;
+
 	err = proto_register(&mptcp_v6_prot, MPTCP_USE_SLAB);
 	if (err)
 		return err;
@@ -3457,3 +3533,13 @@ int __init mptcp_proto_v6_init(void)
 	return err;
 }
 #endif
+
+void mptcp_subflow_ops_override(struct sock *ssk)
+{
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+	if (ssk->sk_prot == &tcpv6_prot)
+		ssk->sk_prot = &tcpv6_prot_override;
+	else
+#endif
+		ssk->sk_prot = &tcp_prot_override;
+}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index adc56bcbdf68..702f0e137d8a 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -379,6 +379,13 @@ enum mptcp_data_avail {
 	MPTCP_SUBFLOW_OOO_DATA
 };
 
+struct mptcp_deferred_action {
+	struct napi_struct napi;
+	struct list_head head;
+};
+
+DECLARE_PER_CPU(struct mptcp_deferred_action, mptcp_deferred_actions);
+
 /* MPTCP subflow context */
 struct mptcp_subflow_context {
 	struct list_head node;/* conn_list of subflows */
@@ -416,6 +423,9 @@ struct mptcp_subflow_context {
 	u8	local_id;
 	u8	remote_id;
 
+	long	deferred_status;
+	struct	list_head deferred_node;
+
 	struct	sock *tcp_sock;	    /* tcp sk backpointer */
 	struct	sock *conn;	    /* parent mptcp_sock */
 	const	struct inet_connection_sock_af_ops *icsk_af_ops;
@@ -464,6 +474,48 @@ static inline void mptcp_add_pending_subflow(struct mptcp_sock *msk,
 	spin_unlock_bh(&msk->join_list_lock);
 }
 
+void mptcp_subflow_ops_override(struct sock *ssk);
+
+static inline void mptcp_subflow_defer(struct mptcp_subflow_context *subflow)
+{
+	struct mptcp_deferred_action *deferred;
+	bool schedule;
+
+	if (!test_and_set_bit(1, &subflow->deferred_status)) {
+		local_bh_disable();
+		deferred = this_cpu_ptr(&mptcp_deferred_actions);
+		schedule = list_empty(&deferred->head);
+		list_add_tail(&subflow->deferred_node, &deferred->head);
+		if (schedule)
+			napi_schedule(&deferred->napi);
+		local_bh_enable();
+	}
+}
+
+static inline struct mptcp_subflow_context *
+mptcp_subflow_deferred_next(struct mptcp_deferred_action *deferred)
+{
+	struct mptcp_subflow_context *ret;
+
+	if (list_empty(&deferred->head))
+		return NULL;
+
+	ret = list_first_entry(&deferred->head, struct mptcp_subflow_context, deferred_node);
+	list_del_init(&ret->deferred_node);
+	return ret;
+}
+
+static inline bool mptcp_subflow_has_deferred_action(const struct mptcp_subflow_context *subflow)
+{
+	return test_bit(1, &subflow->deferred_status);
+}
+
+static inline void mptcp_subflow_deferred_done(struct mptcp_subflow_context *subflow)
+{
+	clear_bit(1, &subflow->deferred_status);
+	list_del_init(&subflow->deferred_node);
+}
+
 int mptcp_is_enabled(struct net *net);
 unsigned int mptcp_get_add_addr_timeout(struct net *net);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 31cc362a4638..1e22f0dca5e6 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1261,6 +1261,7 @@ int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock)
 	*new_sock = sf;
 	sock_hold(sk);
 	subflow->conn = sk;
+	mptcp_subflow_ops_override(sf->sk);
 
 	return 0;
 }
@@ -1277,6 +1278,7 @@ static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk,
 	rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
 	INIT_LIST_HEAD(&ctx->node);
+	INIT_LIST_HEAD(&ctx->deferred_node);
 
 	pr_debug("subflow=%p", ctx);

From patchwork Mon Jan 11 22:43:26 2021
From: Paolo Abeni <pabeni@redhat.com>
To: mptcp@lists.01.org
Date: Mon, 11 Jan 2021 23:43:26 +0100
Message-Id: <34918ed732944158fd7e5ce697f9f88644bb2c59.1610404441.git.pabeni@redhat.com>
Subject: [MPTCP] [RFC PATCH 2/2] mptcp: leverage the deferred actions for packet scheduling

This change leverages the infrastructure introduced by the previous
patch to avoid invoking the MPTCP worker to spool the pending data
when the packet scheduler picks a subflow other than the one currently
processing the incoming MPTCP-level ack.

Additionally, we can further refine the subflow selection by invoking
the packet scheduler for each chunk of data, even inside
__mptcp_subflow_push_pending().

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
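To make the new data path easier to follow, here is a compilable toy
model of the reworked chunk loop. Every identifier below is a stand-in
invented for this sketch; only the control flow mirrors
__mptcp_subflow_push_pending() after this patch:

#include <stdio.h>

struct sock { const char *name; };

static struct sock sub_a = { "subflow A" }, sub_b = { "subflow B" };
static int sent;

/* mptcp_subflow_get_send() stand-in: switches subflow after two chunks */
static struct sock *scheduler_pick(void)
{
	return sent < 2 ? &sub_a : &sub_b;
}

/* mptcp_sendmsg_frag() stand-in */
static int send_chunk(struct sock *ssk)
{
	printf("chunk %d on %s\n", ++sent, ssk->name);
	return 1;
}

/* mptcp_subflow_defer() stand-in */
static void defer_to(struct sock *ssk)
{
	printf("deferring remaining data to %s\n", ssk->name);
}

static void push_pending(struct sock *ssk, int chunks)
{
	int first = 1;

	while (chunks-- > 0) {
		/* the caller already ran the scheduler for the first chunk;
		 * re-run it for each later chunk so that a different subflow
		 * can be picked mid-transfer
		 */
		struct sock *xmit_ssk = first ? ssk : scheduler_pick();

		if (!xmit_ssk)
			return;             /* no subflow can send right now */
		if (xmit_ssk != ssk) {
			defer_to(xmit_ssk); /* hand off via the NAPI backlog */
			return;
		}
		if (send_chunk(ssk) <= 0)
			return;
		first = 0;
	}
}

int main(void)
{
	push_pending(&sub_a, 4); /* two chunks on A, then hand off to B */
	return 0;
}

Re-running the scheduler only from the second chunk onwards keeps the
common single-subflow case on the fast path the caller has already set
up.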
 net/mptcp/protocol.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3d5ac817b2fb..af45affc0a17 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1508,7 +1508,9 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_sendmsg_info info;
 	struct mptcp_data_frag *dfrag;
+	struct sock *xmit_ssk;
 	int len, copied = 0;
+	bool first = true;
 
 	info.flags = 0;
 	while ((dfrag = mptcp_send_head(sk))) {
@@ -1518,6 +1520,18 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
 		while (len > 0) {
 			int ret = 0;
 
+			/* the caller already invoked the packet scheduler,
+			 * check for a different subflow usage only after
+			 * spooling the first chunk of data
+			 */
+			xmit_ssk = first ? ssk : mptcp_subflow_get_send(mptcp_sk(sk));
+			if (!xmit_ssk)
+				goto out;
+			if (xmit_ssk != ssk) {
+				mptcp_subflow_defer(mptcp_subflow_ctx(xmit_ssk));
+				goto out;
+			}
+
 			if (unlikely(mptcp_must_reclaim_memory(sk, ssk))) {
 				__mptcp_update_wmem(sk);
 				sk_mem_reclaim_partial(sk);
@@ -1536,6 +1550,7 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
 			msk->tx_pending_data -= ret;
 			copied += ret;
 			len -= ret;
+			first = false;
 		}
 		WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
 	}
@@ -2244,7 +2259,6 @@ static void mptcp_worker(struct work_struct *work)
 	if (unlikely(state == TCP_CLOSE))
 		goto unlock;
 
-	mptcp_push_pending(sk, 0);
 	mptcp_check_data_fin_ack(sk);
 	__mptcp_flush_join_list(msk);
@@ -2905,10 +2919,12 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
 		return;
 
 	if (!sock_owned_by_user(sk)) {
-		if (mptcp_subflow_get_send(mptcp_sk(sk)) == ssk)
+		struct sock *xmit_ssk = mptcp_subflow_get_send(mptcp_sk(sk));
+
+		if (xmit_ssk == ssk)
 			__mptcp_subflow_push_pending(sk, ssk);
-		else
-			mptcp_schedule_work(sk);
+		else if (xmit_ssk)
+			mptcp_subflow_defer(mptcp_subflow_ctx(xmit_ssk));
 	} else {
 		set_bit(MPTCP_PUSH_PENDING, &mptcp_sk(sk)->flags);
 	}